From: Dave Hansen <dave@linux.vnet.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Michael J Wolf <mjwolf@us.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Dave Hansen <dave@linux.vnet.ibm.com>
Subject: [RFC][PATCH 2/6] pagewalk: only split huge pages when necessary
Date: Mon, 31 Jan 2011 16:33:59 -0800
Message-ID: <20110201003359.8DDFF665@kernel>
In-Reply-To: <20110201003357.D6F0BE0D@kernel>


Right now, if a mm_walk has either ->pte_entry or ->pmd_entry
set, it will unconditionally split any transparent huge pages
it runs into.  In practice, that means that anyone doing a

	cat /proc/$pid/smaps

will unconditionally break down every huge page in the process
and depend on khugepaged to re-collapse it later.  This is
fairly suboptimal.

This patch changes that behavior.  It makes each of the three
existing ->pmd_entry handlers responsible for breaking down
THPs itself.  Also, the _generic_ code will never break down
a THP unless a ->pte_entry handler is actually set.

This means that the ->pmd_entry handlers can now choose to
deal with THPs without breaking them down.


---
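
To make the new control flow easier to follow, here is a small
userspace model of the walk.  It is only a sketch and not kernel
code: every name in it (toy_pmd, toy_walk, toy_split,
toy_walk_pmds, count_huge, visit_ptes, toy_splits_for) is made
up for illustration, and it ignores locking, pte_hole, and the
re-check after the split.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a pmd entry; NOT the kernel's pmd_t. */
struct toy_pmd {
	bool none;	/* nothing mapped here */
	bool huge;	/* maps a transparent huge page */
};

/* Toy stand-in for struct mm_walk, plus counters for the demo. */
struct toy_walk {
	int (*pmd_entry)(struct toy_pmd *pmd, struct toy_walk *walk);
	int (*pte_entry)(struct toy_pmd *pmd, struct toy_walk *walk);
	int splits;	/* huge pmds that were split */
	int huge_seen;	/* huge pmds a handler saw intact */
};

/* Hypothetical analogue of split_huge_page_pmd(). */
static void toy_split(struct toy_pmd *pmd, struct toy_walk *walk)
{
	if (pmd->huge) {
		pmd->huge = false;
		walk->splits++;
	}
}

/* Mirrors the ordering the patch introduces in walk_pmd_range(). */
static int toy_walk_pmds(struct toy_pmd *pmds, size_t n,
			 struct toy_walk *walk)
{
	int err = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct toy_pmd *pmd = &pmds[i];

		if (pmd->none)
			continue;
		/* The handler may see a huge pmd here. */
		if (walk->pmd_entry)
			err = walk->pmd_entry(pmd, walk);
		if (err)
			break;
		/* No ->pte_entry: nothing forces a split. */
		if (!walk->pte_entry)
			continue;
		toy_split(pmd, walk);
		err = walk->pte_entry(pmd, walk);
		if (err)
			break;
	}
	return err;
}

/* A THP-aware pmd handler: counts huge pmds, leaves them intact. */
static int count_huge(struct toy_pmd *pmd, struct toy_walk *walk)
{
	if (pmd->huge)
		walk->huge_seen++;
	return 0;
}

/* A pte-level handler: it must never be handed a huge pmd. */
static int visit_ptes(struct toy_pmd *pmd, struct toy_walk *walk)
{
	(void)walk;
	assert(!pmd->huge);
	return 0;
}

/* Walk four toy pmds (two of them huge) and report the splits. */
static int toy_splits_for(bool with_pte_entry)
{
	struct toy_pmd pmds[] = {
		{ false, true }, { true, false },
		{ false, false }, { false, true },
	};
	struct toy_walk walk = {
		.pmd_entry = count_huge,
		.pte_entry = with_pte_entry ? visit_ptes : NULL,
	};
	int err = toy_walk_pmds(pmds, 4, &walk);

	assert(err == 0);
	return walk.splits;
}
```

Calling toy_splits_for(false) models a walker that sets only
->pmd_entry, so both huge pmds stay intact; toy_splits_for(true)
models one that also sets ->pte_entry, so both get split.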

 linux-2.6.git-dave/fs/proc/task_mmu.c |    6 ++++++
 linux-2.6.git-dave/mm/pagewalk.c      |   24 ++++++++++++++++++++----
 2 files changed, 26 insertions(+), 4 deletions(-)

diff -puN mm/pagewalk.c~pagewalk-dont-always-split-thp mm/pagewalk.c
--- linux-2.6.git/mm/pagewalk.c~pagewalk-dont-always-split-thp	2011-01-27 10:57:02.309914973 -0800
+++ linux-2.6.git-dave/mm/pagewalk.c	2011-01-27 10:57:02.317914965 -0800
@@ -33,19 +33,35 @@ static int walk_pmd_range(pud_t *pud, un
 
 	pmd = pmd_offset(pud, addr);
 	do {
+	again:
 		next = pmd_addr_end(addr, end);
-		split_huge_page_pmd(walk->mm, pmd);
-		if (pmd_none_or_clear_bad(pmd)) {
+		if (pmd_none(*pmd)) {
 			if (walk->pte_hole)
 				err = walk->pte_hole(addr, next, walk);
 			if (err)
 				break;
 			continue;
 		}
+		/*
+		 * This implies that each ->pmd_entry() handler
+		 * needs to know about pmd_trans_huge() pmds
+		 */
 		if (walk->pmd_entry)
 			err = walk->pmd_entry(pmd, addr, next, walk);
-		if (!err && walk->pte_entry)
-			err = walk_pte_range(pmd, addr, next, walk);
+		if (err)
+			break;
+
+		/*
+		 * Check this here so we only break down trans_huge
+		 * pages when we _need_ to
+		 */
+		if (!walk->pte_entry)
+			continue;
+
+		split_huge_page_pmd(walk->mm, pmd);
+		if (pmd_none_or_clear_bad(pmd))
+			goto again;
+		err = walk_pte_range(pmd, addr, next, walk);
 		if (err)
 			break;
 	} while (pmd++, addr = next, addr != end);
diff -puN fs/proc/task_mmu.c~pagewalk-dont-always-split-thp fs/proc/task_mmu.c
--- linux-2.6.git/fs/proc/task_mmu.c~pagewalk-dont-always-split-thp	2011-01-27 10:57:02.313914969 -0800
+++ linux-2.6.git-dave/fs/proc/task_mmu.c	2011-01-27 10:57:02.321914961 -0800
@@ -343,6 +343,8 @@ static int smaps_pte_range(pmd_t *pmd, u
 	struct page *page;
 	int mapcount;
 
+	split_huge_page_pmd(walk->mm, pmd);
+
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		ptent = *pte;
@@ -467,6 +469,8 @@ static int clear_refs_pte_range(pmd_t *p
 	spinlock_t *ptl;
 	struct page *page;
 
+	split_huge_page_pmd(walk->mm, pmd);
+
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		ptent = *pte;
@@ -623,6 +627,8 @@ static int pagemap_pte_range(pmd_t *pmd,
 	pte_t *pte;
 	int err = 0;
 
+	split_huge_page_pmd(walk->mm, pmd);
+
 	/* find the first VMA at or above 'addr' */
 	vma = find_vma(walk->mm, addr);
 	for (; addr != end; addr += PAGE_SIZE) {
_


