stable.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Michael Ellerman <mpe@ellerman.id.au>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: [PATCH 3.15 43/66] powerpc/mm: Check paca psize is up to date for huge mappings
Date: Fri,  4 Jul 2014 15:14:42 -0700
Message-ID: <20140704221424.768053884@linuxfoundation.org>
In-Reply-To: <20140704221422.813435485@linuxfoundation.org>

3.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michael Ellerman <mpe@ellerman.id.au>

commit 09567e7fd44291bfc08accfdd67ad8f467842332 upstream.

We have a bug in our hugepage handling which manifests as an infinite
loop of hash faults. If the fault is being taken in the kernel it will
typically trigger the softlockup detector or the RCU stall detector.

The bug is as follows:

 1. mmap(0xa0000000, ..., MAP_FIXED | MAP_HUGETLB | MAP_ANONYMOUS ..)
 2. Slice code converts the slice psize to 16M.
 3. The code on lines 539-540 of slice.c in slice_get_unmapped_area()
    synchronises the mm->context with the paca->context, so the paca slice
    mask is updated to include the 16M slice.
 4. Either:
    * mmap() fails because there are no huge pages available.
    * mmap() succeeds and the mapping is then munmapped.
    In both cases the slice psize remains at 16M in both the paca & mm.
 5. mmap(0xa0000000, ..., MAP_FIXED | MAP_ANONYMOUS ..)
 6. The slice psize is converted back to 64K. Because of the check on line 539
    of slice.c we DO NOT update the paca->context. The paca slice mask is now
    out of sync with the mm slice mask.
 7. User/kernel accesses 0xa0000000.
 8. The SLB miss handler slb_allocate_realmode() **uses the paca slice mask**
    to create an SLB entry and inserts it in the SLB.
 9. With the 16M SLB entry in place the hardware does a hash lookup; no entry
    is found, so a data access exception is generated.
10. The data access handler calls do_page_fault() -> handle_mm_fault().
11. __handle_mm_fault() creates a THP mapping with do_huge_pmd_anonymous_page().
12. The hardware retries the access; there is still nothing in the hash table,
    so once again a data access exception is generated.
13. hash_page() calls into __hash_page_thp() and inserts a mapping in the
    hash. Although the THP mapping maps 16M, the hashing is done using 64K
    as the segment page size.
14. hash_page() returns immediately after calling __hash_page_thp(), skipping
    over the code at line 1125, so the mismatch between the
    paca->context and mm->context is not detected.
15. The hardware retries the access; the hash it generates using the 16M
    SLB entry does NOT match the hash we inserted.
16. We take another data access and go into __hash_page_thp().
17. We see a valid entry in the hpte_slot_array and so we call updatepp(),
    which succeeds.
18. Goto 15.
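
The user-visible part of the sequence above can be sketched as a small C
program. This is a hypothetical reproducer written for illustration, not
the trinity test case that found the bug: the function name, the
MAP_PRIVATE flag and the 16M length are our assumptions, and we would only
expect it to misbehave on a powerpc64 kernel with 64K pages and THP enabled
that lacks this fix. MAP_FIXED silently replaces anything already mapped at
0xa0000000, so it should only be run in a throwaway process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>

/* Address from the description; 16M matches the huge slice psize. */
#define REPRO_ADDR	((void *)0xa0000000UL)
#define REPRO_LEN	(16UL << 20)

/*
 * Hypothetical reproducer: returns 0 once the final store completes,
 * or -1 if the second mmap() fails. On an affected kernel the store is
 * expected to loop on hash faults instead of returning.
 */
int repro_stale_paca_psize(void)
{
	/* Hugetlb mapping: the slice is converted to 16M and the paca
	 * copy is synced. Failure (no huge pages) is fine as well. */
	void *p = mmap(REPRO_ADDR, REPRO_LEN, PROT_READ | PROT_WRITE,
		       MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
		       -1, 0);
	if (p != MAP_FAILED)
		munmap(p, REPRO_LEN);	/* slice psize stays at 16M */

	/* Normal mapping at the same address: mm goes back to 64K, but
	 * the paca copy is not updated. */
	p = mmap(REPRO_ADDR, REPRO_LEN, PROT_READ | PROT_WRITE,
		 MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Touch the mapping so the SLB miss and hash fault path runs. */
	*(volatile char *)p = 1;

	munmap(p, REPRO_LEN);
	return 0;
}
```

On a fixed kernel, or on another architecture, the call is expected to
simply return 0; on an affected kernel we would expect the final store to
spin in the hash fault loop described above.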

We could fix this in two ways. The first would be to remove or modify
the check on line 539 of slice.c.

The second option is to cause the check of paca psize in hash_page() on
line 1125 to also be done for THP pages.

We prefer the latter, because the check & update of the paca psize is
not done until we know it's necessary. It's also done only on the
current cpu, so we don't need to IPI all other cpus.
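
The cost argument can be illustrated with a toy model in which each CPU
caches a single slice page size. This is our reading of the two options,
not code from the patch: the first option is modelled as refreshing every
CPU's copy up front (an IPI per CPU in the kernel), the second as the
faulting CPU checking and refreshing only its own copy. The names
eager_refresh(), lazy_refresh(), struct cpu_cache and NCPUS are
hypothetical.

```c
#include <assert.h>

/*
 * Toy comparison of the two candidate fixes; not kernel code. Each CPU
 * caches one slice page size, and NCPUS is arbitrary.
 */
#define NCPUS 8

struct cpu_cache {
	int psize;
};

/*
 * First option, as we read it: refresh every CPU's copy as soon as the
 * mm changes, which in the kernel means interrupting (IPIing) the other
 * CPUs. Returns the number of CPUs disturbed.
 */
int eager_refresh(struct cpu_cache cpus[], int ncpus, int mm_psize)
{
	for (int i = 0; i < ncpus; i++)
		cpus[i].psize = mm_psize;
	return ncpus;
}

/*
 * Second option (the approach the patch takes): only the CPU taking the
 * hash fault compares its copy with mm and refreshes it on a mismatch.
 * Returns 1 if a refresh was needed, 0 otherwise.
 */
int lazy_refresh(struct cpu_cache *self, int mm_psize)
{
	if (self->psize == mm_psize)
		return 0;
	self->psize = mm_psize;
	return 1;
}
```

eager_refresh() always disturbs all ncpus CPUs, whether or not they will
ever touch the slice again, while lazy_refresh() does work only on a CPU
that actually takes a fault with a stale copy, and then only once.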

Without further rearranging the code, the simplest fix is to pull the
code that checks the paca psize out into a helper and call it in two
places: first for THP/hugetlb mappings, and second for all other
mappings, as before.
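
To make the control-flow change concrete, here is a toy user-space model
of hash_page() before and after the patch. It is a sketch under loud
assumptions: a context is reduced to one page size per 256MB low slice,
slb_flush_and_rebolt() and the actual hashing are omitted, and every name
(struct model_ctx, model_hash_page(), model_replay() and so on) is
hypothetical rather than kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Toy user-space model of the bug and the fix; none of this is kernel
 * code. A "context" is reduced to one page size per 256MB low slice.
 */
#define NSLICES		16
#define SLICE_SHIFT	28

enum { PSIZE_64K = 0, PSIZE_16M = 1 };

struct model_ctx {
	int slice_psize[NSLICES];
};

static struct model_ctx mm_ctx;		/* stands in for mm->context */
static struct model_ctx paca_ctx;	/* stands in for get_paca()->context */

static int slice_of(unsigned long ea)
{
	return (int)((ea >> SLICE_SHIFT) % NSLICES);
}

/* Same shape as the user_region branch of check_paca_psize(). */
static void model_check_paca_psize(unsigned long ea, int psize)
{
	if (psize != paca_ctx.slice_psize[slice_of(ea)])
		paca_ctx = mm_ctx;	/* slb_flush_and_rebolt() omitted */
}

/* hash_page() reduced to its two exits; "fixed" adds the new call. */
static void model_hash_page(unsigned long ea, bool huge, bool fixed)
{
	int psize = mm_ctx.slice_psize[slice_of(ea)];

	if (huge) {
		/* __hash_page_thp()/__hash_page_huge() would run here. */
		if (fixed)
			model_check_paca_psize(ea, psize);
		return;			/* the early "goto bail" */
	}
	/* Normal 4K/64K hashing would happen here. */
	model_check_paca_psize(ea, psize);
}

/* Replays the 16M -> 64K sequence, then one THP fault.
 * Returns true if the per-CPU copy ends up matching mm. */
bool model_replay(bool fixed)
{
	unsigned long ea = 0xa0000000UL;
	int s = slice_of(ea);

	memset(&mm_ctx, 0, sizeof(mm_ctx));	/* every slice at 64K */
	mm_ctx.slice_psize[s] = PSIZE_16M;	/* hugetlb mmap */
	paca_ctx = mm_ctx;			/* synced on the way up */
	mm_ctx.slice_psize[s] = PSIZE_64K;	/* not synced on the way down */

	model_hash_page(ea, true, fixed);	/* THP fault */
	return memcmp(&mm_ctx, &paca_ctx, sizeof(mm_ctx)) == 0;
}
```

With fixed == false, model_replay() returns false: the huge-page path
leaves the stale 16M entry in the per-CPU copy, which is the state that
keeps the real fault loop going. With fixed == true the early exit runs
the same check as the normal path and the two copies agree again.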

Thanks to Dave Jones for trinity, which originally found this bug.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/powerpc/mm/hash_utils_64.c |   31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -964,6 +964,22 @@ void hash_failure_debug(unsigned long ea
 		trap, vsid, ssize, psize, lpsize, pte);
 }
 
+static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
+			     int psize, bool user_region)
+{
+	if (user_region) {
+		if (psize != get_paca_psize(ea)) {
+			get_paca()->context = mm->context;
+			slb_flush_and_rebolt();
+		}
+	} else if (get_paca()->vmalloc_sllp !=
+		   mmu_psize_defs[mmu_vmalloc_psize].sllp) {
+		get_paca()->vmalloc_sllp =
+			mmu_psize_defs[mmu_vmalloc_psize].sllp;
+		slb_vmalloc_update();
+	}
+}
+
 /* Result code is:
  *  0 - handled
  *  1 - normal page fault
@@ -1085,6 +1101,8 @@ int hash_page(unsigned long ea, unsigned
 			WARN_ON(1);
 		}
 #endif
+		check_paca_psize(ea, mm, psize, user_region);
+
 		goto bail;
 	}
 
@@ -1125,17 +1143,8 @@ int hash_page(unsigned long ea, unsigned
 #endif
 		}
 	}
-	if (user_region) {
-		if (psize != get_paca_psize(ea)) {
-			get_paca()->context = mm->context;
-			slb_flush_and_rebolt();
-		}
-	} else if (get_paca()->vmalloc_sllp !=
-		   mmu_psize_defs[mmu_vmalloc_psize].sllp) {
-		get_paca()->vmalloc_sllp =
-			mmu_psize_defs[mmu_vmalloc_psize].sllp;
-		slb_vmalloc_update();
-	}
+
+	check_paca_psize(ea, mm, psize, user_region);
 #endif /* CONFIG_PPC_64K_PAGES */
 
 #ifdef CONFIG_PPC_HAS_HASH_64K