From: Dave Hansen <dave@sr71.net>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, kirill.shutemov@linux.intel.com,
	Dave Hansen <dave@sr71.net>
Subject: [PATCH 08/10] mm: pagewalk: add locked pte walker
Date: Mon, 02 Jun 2014 14:36:55 -0700
Message-ID: <20140602213655.B913463C@viggo.jf.intel.com>
In-Reply-To: <20140602213644.925A26D0@viggo.jf.intel.com>


From: Dave Hansen <dave.hansen@linux.intel.com>

Neither the locking nor the splitting logic needed for
transparent huge pages is trivial.  We end up having to teach
each of the page walkers about it individually, and have the same
pattern copied across several of them.

This patch introduces a new handler: ->locked_single_entry.  It
does two things: it handles the page table locking, including the
difference between pmds and ptes, and it lets you have a single
handler for large and small pages.

This greatly simplifies the handlers.  For now, only two of the
walk_page_range() users are converted (/proc/pid/smaps and
/proc/pid/numa_maps, in the next two patches of this series).  I
believe a few more of them can be converted going forward.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---
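
To illustrate the "single handler for large and small pages" idea
described above, here is a minimal user-space sketch. It is not kernel
code: struct demo_walk, the DEMO_* sizes, and every demo_* function are
hypothetical stand-ins for struct mm_walk, PAGE_SIZE/HPAGE_PMD_SIZE and
a real handler, and the 512-pages-per-huge-page ratio is an assumption
chosen for the example. The point is only that the handler looks at the
size it is passed rather than at the page table level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's PAGE_SIZE and HPAGE_PMD_SIZE. */
#define DEMO_PAGE_SIZE  4096UL
#define DEMO_HPAGE_SIZE (512UL * DEMO_PAGE_SIZE)

/* Hypothetical, heavily simplified model of struct mm_walk. */
struct demo_walk {
	int (*locked_single_entry)(void *entry, unsigned long addr,
				   unsigned long size, struct demo_walk *walk);
	void *private;
};

struct demo_stats {
	unsigned long bytes;	/* total bytes covered by visited entries */
	unsigned long entries;	/* number of handler invocations */
};

/* One handler for both sizes: it only consumes the size it was given. */
int demo_count_entry(void *entry, unsigned long addr,
		     unsigned long size, struct demo_walk *walk)
{
	struct demo_stats *stats = walk->private;

	(void)entry;
	(void)addr;
	assert(size == DEMO_PAGE_SIZE || size == DEMO_HPAGE_SIZE);
	stats->bytes += size;
	stats->entries++;
	return 0;
}

/* Feed the handler one huge entry followed by nr_small small entries. */
struct demo_stats demo_run(unsigned long nr_small)
{
	struct demo_stats stats = { 0, 0 };
	struct demo_walk walk = { demo_count_entry, &stats };
	unsigned long addr = 0, i;

	walk.locked_single_entry(NULL, addr, DEMO_HPAGE_SIZE, &walk);
	addr += DEMO_HPAGE_SIZE;
	for (i = 0; i < nr_small; i++, addr += DEMO_PAGE_SIZE)
		walk.locked_single_entry(NULL, addr, DEMO_PAGE_SIZE, &walk);
	return stats;
}
```

Calling demo_run(3) invokes the same handler four times, once with the
huge size and three times with the small size, with no size-specific
code path in the handler itself.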

 b/include/linux/mm.h |    7 +++++++
 b/mm/pagewalk.c      |   43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff -puN include/linux/mm.h~mm-pagewalk-add-locked-walker include/linux/mm.h
--- a/include/linux/mm.h~mm-pagewalk-add-locked-walker	2014-06-02 14:20:20.963882275 -0700
+++ b/include/linux/mm.h	2014-06-02 14:20:20.969882545 -0700
@@ -1096,6 +1096,11 @@ void unmap_vmas(struct mmu_gather *tlb,
  *	       pmd_trans_huge() pmds.  They may simply choose to
  *	       split_huge_page() instead of handling it explicitly.
  * @pte_entry: if set, called for each non-empty PTE (4th-level) entry
+ * @locked_single_entry: if set, called for each pmd or pte entry. The
+ *			  page table lock for the entry is also acquired
+ *			  such that the handler does not have to worry
+ *			  about the entry disappearing (or being split in
+ *			  the case of a pmd_trans_huge).
  * @pte_hole: if set, called for each hole at all levels
  * @hugetlb_entry: if set, called for each hugetlb entry
  *		   *Caution*: The caller must hold mmap_sem() if @hugetlb_entry
@@ -1112,6 +1117,8 @@ struct mm_walk {
 			 unsigned long next, struct mm_walk *walk);
 	int (*pte_entry)(pte_t *pte, unsigned long addr,
 			 unsigned long next, struct mm_walk *walk);
+	int (*locked_single_entry)(pte_t *pte, unsigned long addr,
+			 unsigned long pte_size, struct mm_walk *walk);
 	int (*pte_hole)(unsigned long addr, unsigned long next,
 			struct mm_walk *walk);
 	int (*hugetlb_entry)(pte_t *pte, unsigned long hmask,
diff -puN mm/pagewalk.c~mm-pagewalk-add-locked-walker mm/pagewalk.c
--- a/mm/pagewalk.c~mm-pagewalk-add-locked-walker	2014-06-02 14:20:20.965882364 -0700
+++ b/mm/pagewalk.c	2014-06-02 14:20:20.969882545 -0700
@@ -57,6 +57,40 @@ static int walk_pte_range(pmd_t *pmd, un
 	return err;
 }
 
+static int walk_single_entry_locked(pmd_t *pmd, unsigned long addr,
+				    unsigned long end, struct mm_walk *walk)
+{
+	int ret = 0;
+	struct vm_area_struct *vma = walk->vma;
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
+		ret = walk->locked_single_entry((pte_t *)pmd, addr,
+						HPAGE_PMD_SIZE, walk);
+		spin_unlock(ptl);
+		return ret;
+	}
+
+	/*
+	 * See pmd_none_or_trans_huge_or_clear_bad() for a
+	 * description of the races we are avoiding with this.
+	 * Note that this essentially acts as if the pmd were
+	 * NULL (empty).
+	 */
+	if (pmd_trans_unstable(pmd))
+		return 0;
+
+	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	for (; addr != end; addr += PAGE_SIZE) {
+		ret = walk->locked_single_entry(pte++, addr, PAGE_SIZE, walk);
+		if (ret)
+			break;
+	}
+	pte_unmap_unlock(pte - 1, ptl);
+	return ret;
+}
+
 static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 			  struct mm_walk *walk)
 {
@@ -77,6 +111,15 @@ again:
 			continue;
 		}
 		/*
+		 * A ->locked_single_entry must be able to handle
+		 * arbitrary (well, pmd or pte-sized) sizes
+		 */
+		if (walk->locked_single_entry)
+			err = walk_single_entry_locked(pmd, addr, next, walk);
+		if (err)
+			break;
+
+		/*
 		 * This implies that each ->pmd_entry() handler
 		 * needs to know about pmd_trans_huge() pmds
 		 */
_
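
The control flow of the small-page path (take the lock once, call the
handler per entry, stop at the first nonzero return, then drop the
lock) can be modelled in user space as below. This is a sketch under
stated assumptions, not the kernel implementation: struct model_lock is
a plain counter standing in for a spinlock_t, the "page table" is an
int array, and all model_* names are hypothetical. The sketch advances
the entry pointer as part of each call, so "entry - 1" always names the
last visited entry, including after an early exit.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical lock model: a counter instead of a spinlock_t. */
struct model_lock {
	int held;
};

/* Hypothetical, simplified model of struct mm_walk. */
struct model_walk {
	int (*locked_single_entry)(int *entry, unsigned long addr,
				   unsigned long size, struct model_walk *walk);
	struct model_lock ptl;
	unsigned long visited;	/* number of handler invocations */
};

/* Example handler: report an error on the first negative entry. */
int model_stop_on_negative(int *entry, unsigned long addr,
			   unsigned long size, struct model_walk *walk)
{
	(void)addr;
	(void)size;
	assert(walk->ptl.held == 1);	/* handler always runs under the lock */
	walk->visited++;
	return *entry < 0 ? -1 : 0;
}

/* Models the pte loop: lock once, visit entries, stop on first error. */
int model_walk_entries(int *table, unsigned long addr, unsigned long end,
		       struct model_walk *walk)
{
	int *entry = table;
	int ret = 0;

	walk->ptl.held++;		/* stand-in for taking the lock */
	for (; addr != end; addr += MODEL_PAGE_SIZE) {
		ret = walk->locked_single_entry(entry++, addr,
						MODEL_PAGE_SIZE, walk);
		if (ret)
			break;
	}
	assert(entry > table);		/* entry - 1 is the last visited entry */
	walk->ptl.held--;		/* stand-in for dropping the lock */
	return ret;
}
```

With a four-entry table whose third entry is negative, the walk stops
after three handler calls, returns the handler's error, and leaves the
lock counter at zero.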
