From: Dave Hansen <dave@linux.vnet.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Michael J Wolf <mjwolf@us.ibm.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Dave Hansen <dave@linux.vnet.ibm.com>
Subject: [RFC][PATCH 2/6] pagewalk: only split huge pages when necessary
Date: Mon, 31 Jan 2011 16:33:59 -0800
Message-ID: <20110201003359.8DDFF665@kernel>
In-Reply-To: <20110201003357.D6F0BE0D@kernel>

Right now, if a mm_walk has either ->pte_entry or ->pmd_entry
set, it will unconditionally split any transparent huge pages
it runs into. In practice, that means that anyone doing a

	cat /proc/$pid/smaps

will unconditionally break down every huge page in the process
and depend on khugepaged to re-collapse it later. This is
fairly suboptimal.

This patch changes that behavior. It teaches each ->pmd_entry
handler (there are three) that it must break down the THPs
itself. Also, the _generic_ code will never break down
a THP unless a ->pte_entry handler is actually set.

This means that the ->pmd_entry handlers can now choose to
deal with THPs without breaking them down.
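
To make the new control flow easier to follow, here is a minimal user-space sketch of the decision the generic walker makes after this patch. It is not kernel code: every `toy_*` name is hypothetical, a pmd is reduced to a three-state enum, and the `goto again` retry in the real hunk (taken when the pmd turns out to be none or bad after the split) is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel state; none of these are real kernel types. */
enum toy_pmd_state { TOY_PMD_NONE, TOY_PMD_HUGE, TOY_PMD_TABLE };

struct toy_walk {
	bool has_pmd_entry;	/* models walk->pmd_entry != NULL */
	bool has_pte_entry;	/* models walk->pte_entry != NULL */
	bool pmd_entry_splits;	/* handler calls the split helper itself */
	int splits;		/* number of huge pmds broken down */
	int holes;		/* number of pte_hole-style events */
	int pte_walks;		/* number of pte-level walks started */
};

/* Models split_huge_page_pmd(): only a huge pmd is affected. */
static void toy_split(enum toy_pmd_state *pmd, struct toy_walk *walk)
{
	if (*pmd == TOY_PMD_HUGE) {
		*pmd = TOY_PMD_TABLE;
		walk->splits++;
	}
}

/* Models the post-patch loop body of walk_pmd_range(). */
static void toy_walk_pmd_range(enum toy_pmd_state *pmds, size_t n,
			       struct toy_walk *walk)
{
	for (size_t i = 0; i < n; i++) {
		if (pmds[i] == TOY_PMD_NONE) {
			walk->holes++;
			continue;
		}
		/* The handler, not the generic code, decides whether to split. */
		if (walk->has_pmd_entry && walk->pmd_entry_splits)
			toy_split(&pmds[i], walk);
		/* Generic code splits only when a pte-level walk is needed. */
		if (!walk->has_pte_entry)
			continue;
		toy_split(&pmds[i], walk);
		walk->pte_walks++;
	}
}
```

With only `has_pmd_entry` set and `pmd_entry_splits` false, a walk over huge pmds leaves them intact; setting `has_pte_entry` forces the split, as in the pre-patch behavior.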
---
linux-2.6.git-dave/fs/proc/task_mmu.c | 6 ++++++
linux-2.6.git-dave/mm/pagewalk.c | 24 ++++++++++++++++++++----
2 files changed, 26 insertions(+), 4 deletions(-)
diff -puN mm/pagewalk.c~pagewalk-dont-always-split-thp mm/pagewalk.c
--- linux-2.6.git/mm/pagewalk.c~pagewalk-dont-always-split-thp 2011-01-27 10:57:02.309914973 -0800
+++ linux-2.6.git-dave/mm/pagewalk.c 2011-01-27 10:57:02.317914965 -0800
@@ -33,19 +33,35 @@ static int walk_pmd_range(pud_t *pud, un
 	pmd = pmd_offset(pud, addr);
 	do {
+ again:
 		next = pmd_addr_end(addr, end);
-		split_huge_page_pmd(walk->mm, pmd);
-		if (pmd_none_or_clear_bad(pmd)) {
+		if (pmd_none(*pmd)) {
 			if (walk->pte_hole)
 				err = walk->pte_hole(addr, next, walk);
 			if (err)
 				break;
 			continue;
 		}
+		/*
+		 * This implies that each ->pmd_entry() handler
+		 * needs to know about pmd_trans_huge() pmds
+		 */
 		if (walk->pmd_entry)
 			err = walk->pmd_entry(pmd, addr, next, walk);
-		if (!err && walk->pte_entry)
-			err = walk_pte_range(pmd, addr, next, walk);
+		if (err)
+			break;
+
+		/*
+		 * Check this here so we only break down trans_huge
+		 * pages when we _need_ to
+		 */
+		if (!walk->pte_entry)
+			continue;
+
+		split_huge_page_pmd(walk->mm, pmd);
+		if (pmd_none_or_clear_bad(pmd))
+			goto again;
+		err = walk_pte_range(pmd, addr, next, walk);
 		if (err)
 			break;
 	} while (pmd++, addr = next, addr != end);
diff -puN fs/proc/task_mmu.c~pagewalk-dont-always-split-thp fs/proc/task_mmu.c
--- linux-2.6.git/fs/proc/task_mmu.c~pagewalk-dont-always-split-thp 2011-01-27 10:57:02.313914969 -0800
+++ linux-2.6.git-dave/fs/proc/task_mmu.c 2011-01-27 10:57:02.321914961 -0800
@@ -343,6 +343,8 @@ static int smaps_pte_range(pmd_t *pmd, u
 	struct page *page;
 	int mapcount;
+	split_huge_page_pmd(walk->mm, pmd);
+
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		ptent = *pte;
@@ -467,6 +469,8 @@ static int clear_refs_pte_range(pmd_t *p
 	spinlock_t *ptl;
 	struct page *page;
+	split_huge_page_pmd(walk->mm, pmd);
+
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		ptent = *pte;
@@ -623,6 +627,8 @@ static int pagemap_pte_range(pmd_t *pmd,
 	pte_t *pte;
 	int err = 0;
+	split_huge_page_pmd(walk->mm, pmd);
+
 	/* find the first VMA at or above 'addr' */
 	vma = find_vma(walk->mm, addr);
 	for (; addr != end; addr += PAGE_SIZE) {
_
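
The three handlers touched above all take the simple route and split first. As an illustration of the other option the changelog mentions, the following user-space sketch contrasts a split-first handler with one that accounts for a huge pmd in a single step. It is only a toy model under stated assumptions: the `toy_*` names, the 4096-byte page size, and the 512 entries per pmd are all hypothetical and are not taken from this patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE		4096UL
#define TOY_PTRS_PER_PMD	512UL	/* assumption: x86-64-like geometry */

struct toy_pmd {
	bool huge;				/* models pmd_trans_huge(*pmd) */
	bool present[TOY_PTRS_PER_PMD];		/* per-pte presence when not huge */
};

struct toy_stats {
	unsigned long resident;	/* bytes accounted so far */
	int splits;		/* huge pmds broken down */
};

/* Models split_huge_page_pmd(): a huge pmd becomes 512 present ptes. */
static void toy_split(struct toy_pmd *pmd, struct toy_stats *st)
{
	if (!pmd->huge)
		return;
	pmd->huge = false;
	for (size_t i = 0; i < TOY_PTRS_PER_PMD; i++)
		pmd->present[i] = true;
	st->splits++;
}

/* Walks the pte level and accounts each present page. */
static void toy_count_ptes(const struct toy_pmd *pmd, struct toy_stats *st)
{
	for (size_t i = 0; i < TOY_PTRS_PER_PMD; i++)
		if (pmd->present[i])
			st->resident += TOY_PAGE_SIZE;
}

/* Style used by the handlers in this patch: split, then walk ptes. */
static void toy_handler_split_first(struct toy_pmd *pmd, struct toy_stats *st)
{
	toy_split(pmd, st);
	toy_count_ptes(pmd, st);
}

/* Alternative: account the whole huge pmd at once, leaving it intact. */
static void toy_handler_thp_aware(struct toy_pmd *pmd, struct toy_stats *st)
{
	if (pmd->huge) {
		st->resident += TOY_PTRS_PER_PMD * TOY_PAGE_SIZE;
		return;
	}
	toy_count_ptes(pmd, st);
}
```

Both handlers report the same resident size for a huge pmd; only the split-first one changes the pmd as a side effect.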