All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Cliff Wickman <cpw@sgi.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Mel Gorman <mel@csn.ul.ie>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Dave Hansen <dave.hansen@intel.com>,
	David Sterba <dsterba@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [ 26/44] mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas
Date: Wed,  5 Jun 2013 14:12:24 -0700	[thread overview]
Message-ID: <20130605211224.615407261@linuxfoundation.org> (raw)
In-Reply-To: <20130605211221.858177087@linuxfoundation.org>

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Cliff Wickman <cpw@sgi.com>

commit a9ff785e4437c83d2179161e012f5bdfbd6381f0 upstream.

A panic can be caused by simply cat'ing /proc/<pid>/smaps while an
application has a VM_PFNMAP range.  It happened in-house when a
benchmarker was trying to decipher the memory layout of his program.

/proc/<pid>/smaps and similar walks through a user page table should not
be looking at VM_PFNMAP areas.

Certain tests in walk_page_range() (specifically split_huge_page_pmd())
assume that all the mapped PFN's are backed with page structures.  And
this is not usually true for VM_PFNMAP areas.  This can result in panics
on kernel page faults when attempting to address those page structures.

There are a half dozen callers of walk_page_range() that walk through a
task's entire page table (as N.  Horiguchi pointed out).  So rather than
change all of them, this patch changes just walk_page_range() to ignore
VM_PFNMAP areas.

The logic of hugetlb_vma() is moved back into walk_page_range(), as we
want to test any vma in the range.

VM_PFNMAP areas are used by:
- graphics memory manager   gpu/drm/drm_gem.c
- global reference unit     sgi-gru/grufile.c
- sgi special memory        char/mspec.c
- and probably several out-of-tree modules

[akpm@linux-foundation.org: remove now-unused hugetlb_vma() stub]
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/pagewalk.c |   70 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 36 insertions(+), 34 deletions(-)

--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -127,28 +127,7 @@ static int walk_hugetlb_range(struct vm_
 	return 0;
 }
 
-static struct vm_area_struct* hugetlb_vma(unsigned long addr, struct mm_walk *walk)
-{
-	struct vm_area_struct *vma;
-
-	/* We don't need vma lookup at all. */
-	if (!walk->hugetlb_entry)
-		return NULL;
-
-	VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem));
-	vma = find_vma(walk->mm, addr);
-	if (vma && vma->vm_start <= addr && is_vm_hugetlb_page(vma))
-		return vma;
-
-	return NULL;
-}
-
 #else /* CONFIG_HUGETLB_PAGE */
-static struct vm_area_struct* hugetlb_vma(unsigned long addr, struct mm_walk *walk)
-{
-	return NULL;
-}
-
 static int walk_hugetlb_range(struct vm_area_struct *vma,
 			      unsigned long addr, unsigned long end,
 			      struct mm_walk *walk)
@@ -199,30 +178,53 @@ int walk_page_range(unsigned long addr,
 	if (!walk->mm)
 		return -EINVAL;
 
+	VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem));
+
 	pgd = pgd_offset(walk->mm, addr);
 	do {
-		struct vm_area_struct *vma;
+		struct vm_area_struct *vma = NULL;
 
 		next = pgd_addr_end(addr, end);
 
 		/*
-		 * handle hugetlb vma individually because pagetable walk for
-		 * the hugetlb page is dependent on the architecture and
-		 * we can't handled it in the same manner as non-huge pages.
+		 * This function was not intended to be vma based.
+		 * But there are vma special cases to be handled:
+		 * - hugetlb vma's
+		 * - VM_PFNMAP vma's
 		 */
-		vma = hugetlb_vma(addr, walk);
+		vma = find_vma(walk->mm, addr);
 		if (vma) {
-			if (vma->vm_end < next)
+			/*
+			 * There are no page structures backing a VM_PFNMAP
+			 * range, so do not allow split_huge_page_pmd().
+			 */
+			if ((vma->vm_start <= addr) &&
+			    (vma->vm_flags & VM_PFNMAP)) {
 				next = vma->vm_end;
+				pgd = pgd_offset(walk->mm, next);
+				continue;
+			}
 			/*
-			 * Hugepage is very tightly coupled with vma, so
-			 * walk through hugetlb entries within a given vma.
+			 * Handle hugetlb vma individually because pagetable
+			 * walk for the hugetlb page is dependent on the
+			 * architecture and we can't handled it in the same
+			 * manner as non-huge pages.
 			 */
-			err = walk_hugetlb_range(vma, addr, next, walk);
-			if (err)
-				break;
-			pgd = pgd_offset(walk->mm, next);
-			continue;
+			if (walk->hugetlb_entry && (vma->vm_start <= addr) &&
+			    is_vm_hugetlb_page(vma)) {
+				if (vma->vm_end < next)
+					next = vma->vm_end;
+				/*
+				 * Hugepage is very tightly coupled with vma,
+				 * so walk through hugetlb entries within a
+				 * given vma.
+				 */
+				err = walk_hugetlb_range(vma, addr, next, walk);
+				if (err)
+					break;
+				pgd = pgd_offset(walk->mm, next);
+				continue;
+			}
 		}
 
 		if (pgd_none_or_clear_bad(pgd)) {



  parent reply	other threads:[~2013-06-05 21:22 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-05 21:11 [ 00/44] 3.4.48-stable review Greg Kroah-Hartman
2013-06-05 21:11 ` [ 01/44] avr32: fix relocation check for signed 18-bit offset Greg Kroah-Hartman
2013-06-05 21:12 ` [ 02/44] ARM: plat-orion: Fix num_resources and id for ge10 and ge11 Greg Kroah-Hartman
2013-06-05 21:12 ` [ 03/44] staging: vt6656: use free_netdev instead of kfree Greg Kroah-Hartman
2013-06-05 21:12 ` [ 04/44] usb: option: Add Telewell TW-LTE 4G Greg Kroah-Hartman
2013-06-05 21:12 ` [ 05/44] USB: option: add device IDs for Dell 5804 (Novatel E371) WWAN card Greg Kroah-Hartman
2013-06-05 21:12 ` [ 06/44] USB: ftdi_sio: Add support for Newport CONEX motor drivers Greg Kroah-Hartman
2013-06-05 21:12 ` [ 07/44] USB: cxacru: potential underflow in cxacru_cm_get_array() Greg Kroah-Hartman
2013-06-05 21:12 ` [ 08/44] TTY: Fix tty miss restart after we turn off flow-control Greg Kroah-Hartman
2013-06-05 21:12 ` [ 09/44] USB: Blacklisted Cinterions PLxx WWAN Interface Greg Kroah-Hartman
2013-06-05 21:12 ` [ 10/44] USB: reset resume quirk needed by a hub Greg Kroah-Hartman
2013-06-05 21:12 ` [ 11/44] USB: xHCI: override bogus bulk wMaxPacketSize values Greg Kroah-Hartman
2013-06-05 21:12 ` [ 12/44] USB: UHCI: fix for suspend of virtual HP controller Greg Kroah-Hartman
2013-06-05 21:12 ` [ 13/44] cifs: only set ops for inodes in I_NEW state Greg Kroah-Hartman
2013-06-05 21:12 ` [ 14/44] fat: fix possible overflow for fat_clusters Greg Kroah-Hartman
2013-06-05 21:12 ` [ 15/44] perf: net_dropmonitor: Fix trace parameter order Greg Kroah-Hartman
2013-06-05 21:12 ` [ 16/44] perf: net_dropmonitor: Fix symbol-relative addresses Greg Kroah-Hartman
2013-06-05 21:12 ` [ 17/44] ocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap() Greg Kroah-Hartman
2013-06-05 21:12 ` [ 18/44] Kirkwood: Enable PCIe port 1 on QNAP TS-11x/TS-21x Greg Kroah-Hartman
2013-06-05 21:12 ` [ 19/44] drivers/leds/leds-ot200.c: fix error caused by shifted mask Greg Kroah-Hartman
2013-06-05 21:12 ` [ 20/44] mm compaction: fix of improper cache flush in migration code Greg Kroah-Hartman
2013-06-05 21:12 ` [ 21/44] klist: del waiter from klist_remove_waiters before wakeup waitting process Greg Kroah-Hartman
2013-06-05 21:12 ` [ 22/44] wait: fix false timeouts when using wait_event_timeout() Greg Kroah-Hartman
2013-06-05 21:12 ` [ 23/44] nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary Greg Kroah-Hartman
2013-06-05 21:12 ` [ 24/44] mm: mmu_notifier: re-fix freed page still mapped in secondary MMU Greg Kroah-Hartman
2013-06-05 21:12 ` [ 25/44] drivers/block/brd.c: fix brd_lookup_page() race Greg Kroah-Hartman
2013-06-05 21:12 ` Greg Kroah-Hartman [this message]
2013-06-05 21:12 ` [ 27/44] mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer Greg Kroah-Hartman
2013-06-05 21:12 ` [ 28/44] iscsi-target: fix heap buffer overflow on error Greg Kroah-Hartman
2013-06-05 21:12 ` [ 29/44] NFSv4: Fix a thinko in nfs4_try_open_cached Greg Kroah-Hartman
2013-06-05 21:12 ` [ 30/44] xfs: kill suid/sgid through the truncate path Greg Kroah-Hartman
2013-06-05 21:12 ` [ 31/44] drm/radeon: fix card_posted check for newer asics Greg Kroah-Hartman
2013-06-05 21:12 ` [ 32/44] cifs: fix potential buffer overrun when composing a new options string Greg Kroah-Hartman
2013-06-05 21:12 ` [ 33/44] USB: io_ti: Fix NULL dereference in chase_port() Greg Kroah-Hartman
2013-06-05 21:12 ` [ 34/44] ata_piix: add PCI IDs for Intel BayTail Greg Kroah-Hartman
2013-06-05 21:12 ` [ 35/44] libata: make ata_exec_internal_sg honor DMADIR Greg Kroah-Hartman
2013-06-05 21:12 ` [ 36/44] m68k/mac: Fix unexpected interrupt with CONFIG_EARLY_PRINTK Greg Kroah-Hartman
2013-06-05 21:12 ` [ 37/44] xen/events: Handle VIRQ_TIMER before any other hardirq in event loop Greg Kroah-Hartman
2013-06-05 21:12 ` [ 38/44] jfs: fix a couple races Greg Kroah-Hartman
2013-06-05 21:12 ` [ 39/44] xen-netback: remove skb in xen_netbk_alloc_page Greg Kroah-Hartman
2013-06-05 21:12 ` [ 40/44] iommu/amd: Re-enable IOMMU event log interrupt after handling Greg Kroah-Hartman
2013-06-05 21:12 ` [ 41/44] iommu/amd: Workaround for ERBT1312 Greg Kroah-Hartman
2013-06-05 21:12 ` [ 42/44] x86, um: Correct syscall table type attributes breaking gcc 4.8 Greg Kroah-Hartman
2013-06-05 21:12 ` [ 43/44] mac80211: close AP_VLAN interfaces before unregistering all Greg Kroah-Hartman
2013-06-05 21:12 ` [ 44/44] thinkpad-acpi: recognize latest V-Series using DMI_BIOS_VENDOR Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130605211224.615407261@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=cpw@sgi.com \
    --cc=dave.hansen@intel.com \
    --cc=dsterba@suse.cz \
    --cc=hannes@cmpxchg.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kosaki.motohiro@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mel@csn.ul.ie \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.