* [PATCH 1/7] mm/pagewalk: remove pgd_entry() and pud_entry()
2014-06-06 22:58 [PATCH -mm 0/7] mm/pagewalk: standardize current users, move pmd locking, apply to mincore Naoya Horiguchi
@ 2014-06-06 22:58 ` Naoya Horiguchi
2014-06-06 22:58 ` [PATCH 2/7] mm/pagewalk: replace mm_walk->skip with more general mm_walk->control Naoya Horiguchi
` (5 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Naoya Horiguchi @ 2014-06-06 22:58 UTC (permalink / raw)
To: linux-mm
Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
linux-kernel
Currently no user of the page table walker sets ->pgd_entry() or ->pud_entry(),
so checking for their existence in each loop iteration just wastes CPU cycles.
Let's remove these callbacks to reduce the overhead.
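For reference, a minimal (purely illustrative) user of the walker after this
change only needs the remaining callbacks; the names below are made up for
the example:

/* illustrative only: count present ptes in [start, end) */
static int count_pte(pte_t *pte, unsigned long addr,
		     unsigned long next, struct mm_walk *walk)
{
	if (pte_present(*pte))
		(*(unsigned long *)walk->private)++;
	return 0;
}

static unsigned long count_present(struct mm_struct *mm,
				   unsigned long start, unsigned long end)
{
	unsigned long nr = 0;
	struct mm_walk count_walk = {
		.pte_entry = count_pte,
		.mm = mm,
		.private = &nr,
	};

	down_read(&mm->mmap_sem);
	walk_page_range(start, end, &count_walk);
	up_read(&mm->mmap_sem);
	return nr;
}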
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
include/linux/mm.h | 6 ------
mm/pagewalk.c | 18 +-----------------
2 files changed, 1 insertion(+), 23 deletions(-)
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/include/linux/mm.h v3.15-rc8-mmots-2014-06-03-16-28/include/linux/mm.h
index 563c79ea07bd..b4aa6579f2b1 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/include/linux/mm.h
+++ v3.15-rc8-mmots-2014-06-03-16-28/include/linux/mm.h
@@ -1092,8 +1092,6 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
/**
* mm_walk - callbacks for walk_page_range
- * @pgd_entry: if set, called for each non-empty PGD (top-level) entry
- * @pud_entry: if set, called for each non-empty PUD (2nd-level) entry
* @pmd_entry: if set, called for each non-empty PMD (3rd-level) entry
* this handler is required to be able to handle
* pmd_trans_huge() pmds. They may simply choose to
@@ -1115,10 +1113,6 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
* (see the comment on walk_page_range() for more details)
*/
struct mm_walk {
- int (*pgd_entry)(pgd_t *pgd, unsigned long addr,
- unsigned long next, struct mm_walk *walk);
- int (*pud_entry)(pud_t *pud, unsigned long addr,
- unsigned long next, struct mm_walk *walk);
int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
unsigned long next, struct mm_walk *walk);
int (*pte_entry)(pte_t *pte, unsigned long addr,
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/pagewalk.c v3.15-rc8-mmots-2014-06-03-16-28/mm/pagewalk.c
index b2a075ffb96e..15c7585e8684 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/pagewalk.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/mm/pagewalk.c
@@ -115,14 +115,6 @@ static int walk_pud_range(pgd_t *pgd, unsigned long addr,
continue;
}
- if (walk->pud_entry) {
- err = walk->pud_entry(pud, addr, next, walk);
- if (skip_lower_level_walking(walk))
- continue;
- if (err)
- break;
- }
-
if (walk->pmd_entry || walk->pte_entry) {
err = walk_pmd_range(pud, addr, next, walk);
if (err)
@@ -152,15 +144,7 @@ static int walk_pgd_range(unsigned long addr, unsigned long end,
continue;
}
- if (walk->pgd_entry) {
- err = walk->pgd_entry(pgd, addr, next, walk);
- if (skip_lower_level_walking(walk))
- continue;
- if (err)
- break;
- }
-
- if (walk->pud_entry || walk->pmd_entry || walk->pte_entry) {
+ if (walk->pmd_entry || walk->pte_entry) {
err = walk_pud_range(pgd, addr, next, walk);
if (err)
break;
--
1.9.3
* [PATCH 2/7] mm/pagewalk: replace mm_walk->skip with more general mm_walk->control
2014-06-06 22:58 [PATCH -mm 0/7] mm/pagewalk: standardize current users, move pmd locking, apply to mincore Naoya Horiguchi
2014-06-06 22:58 ` [PATCH 1/7] mm/pagewalk: remove pgd_entry() and pud_entry() Naoya Horiguchi
@ 2014-06-06 22:58 ` Naoya Horiguchi
2014-06-09 20:01 ` Dave Hansen
2014-06-06 22:58 ` [PATCH 3/7] madvise: cleanup swapin_walk_pmd_entry() Naoya Horiguchi
` (4 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Naoya Horiguchi @ 2014-06-06 22:58 UTC (permalink / raw)
To: linux-mm
Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
linux-kernel
Originally mm_walk->skip was used to determine whether we walk over a vma
or not. But this is not enough, because one of the page table walker's callers,
subpage_mark_vma_nohuge(), will need another behavior, PTWALK_BREAK, which
lets us break out of the current loop and continue from the beginning of the
next loop.
To implement this behavior and make it extensible for future users, this patch
replaces mm_walk->skip with the more flexible mm_walk->control, and changes its
default value to PTWALK_NEXT (which is equivalent to the present
walk->skip == 1), because PTWALK_NEXT provides the behavior that is most
likely to be wanted globally.
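For example (an illustrative sketch only, with a made-up callback name), a
pmd_entry() callback that handles trans_huge pmds itself and wants the walker
to descend to the pte level otherwise would do:

static int example_pmd(pmd_t *pmd, unsigned long addr,
		       unsigned long next, struct mm_walk *walk)
{
	spinlock_t *ptl;

	if (pmd_trans_huge_lock(pmd, walk->vma, &ptl) == 1) {
		/* handle the huge pmd here; walk->control stays at
		 * PTWALK_NEXT, so the walker moves on to the next pmd */
		spin_unlock(ptl);
		return 0;
	}
	/* not a thp: ask the walker to call pte_entry() on each pte */
	walk->control = PTWALK_DOWN;
	return 0;
}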
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
fs/proc/task_mmu.c | 20 +++++++-------
include/linux/mm.h | 13 ++++++---
mm/memcontrol.c | 5 ++--
mm/mempolicy.c | 3 +--
mm/pagewalk.c | 79 +++++++++++++++++++++++++++++++++---------------------
5 files changed, 71 insertions(+), 49 deletions(-)
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/fs/proc/task_mmu.c v3.15-rc8-mmots-2014-06-03-16-28/fs/proc/task_mmu.c
index fa6d6a4e85b3..2864028ae2f8 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/fs/proc/task_mmu.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/fs/proc/task_mmu.c
@@ -502,9 +502,8 @@ static int smaps_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
spin_unlock(ptl);
mss->anonymous_thp += HPAGE_PMD_SIZE;
- /* don't call smaps_pte() */
- walk->skip = 1;
- }
+ } else
+ walk->control = PTWALK_DOWN;
return 0;
}
@@ -762,13 +761,14 @@ static int clear_refs_test_walk(unsigned long start, unsigned long end,
* Writing 4 to /proc/pid/clear_refs affects all pages.
*/
if (cp->type == CLEAR_REFS_ANON && vma->vm_file)
- walk->skip = 1;
+ return 0;
if (cp->type == CLEAR_REFS_MAPPED && !vma->vm_file)
- walk->skip = 1;
+ return 0;
if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
if (vma->vm_flags & VM_SOFTDIRTY)
vma->vm_flags &= ~VM_SOFTDIRTY;
}
+ walk->control = PTWALK_DOWN;
return 0;
}
@@ -1006,9 +1006,8 @@ static int pagemap_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
break;
}
spin_unlock(ptl);
- /* don't call pagemap_pte() */
- walk->skip = 1;
- }
+ } else
+ walk->control = PTWALK_DOWN;
return err;
}
@@ -1288,9 +1287,8 @@ static int gather_pmd_stats(pmd_t *pmd, unsigned long addr,
gather_stats(page, md, pte_dirty(huge_pte),
HPAGE_PMD_SIZE/PAGE_SIZE);
spin_unlock(ptl);
- /* don't call gather_pte_stats() */
- walk->skip = 1;
- }
+ } else
+ walk->control = PTWALK_DOWN;
return 0;
}
#ifdef CONFIG_HUGETLB_PAGE
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/include/linux/mm.h v3.15-rc8-mmots-2014-06-03-16-28/include/linux/mm.h
index b4aa6579f2b1..43449eba3032 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/include/linux/mm.h
+++ v3.15-rc8-mmots-2014-06-03-16-28/include/linux/mm.h
@@ -1106,8 +1106,7 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
* right now." 0 means "skip the current vma."
* @mm: mm_struct representing the target process of page table walk
* @vma: vma currently walked
- * @skip: internal control flag which is set when we skip the lower
- * level entries.
+ * @control: walk control flag
* @private: private data for callbacks' use
*
* (see the comment on walk_page_range() for more details)
@@ -1125,10 +1124,18 @@ struct mm_walk {
struct mm_walk *walk);
struct mm_struct *mm;
struct vm_area_struct *vma;
- int skip;
+ int control;
void *private;
};
+enum mm_walk_control {
+ PTWALK_NEXT = 0, /* Go to the next entry in the same level or
+ * the next vma. This is default behavior. */
+ PTWALK_DOWN, /* Go down to lower level */
+ PTWALK_BREAK, /* Break current loop and continue from the
+ * next loop */
+};
+
int walk_page_range(unsigned long addr, unsigned long end,
struct mm_walk *walk);
int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk);
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/memcontrol.c v3.15-rc8-mmots-2014-06-03-16-28/mm/memcontrol.c
index 6970857ba0c8..aeab82bce739 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/memcontrol.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/mm/memcontrol.c
@@ -6729,9 +6729,8 @@ static int mem_cgroup_count_precharge_pmd(pmd_t *pmd,
if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
mc.precharge += HPAGE_PMD_NR;
spin_unlock(ptl);
- /* don't call mem_cgroup_count_precharge_pte() */
- walk->skip = 1;
- }
+ } else
+ walk->control = PTWALK_DOWN;
return 0;
}
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/mempolicy.c v3.15-rc8-mmots-2014-06-03-16-28/mm/mempolicy.c
index cf3b995b21d0..cc9dc8c06bcb 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/mempolicy.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/mm/mempolicy.c
@@ -596,7 +596,6 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
}
qp->prev = vma;
- walk->skip = 1;
if (vma->vm_flags & VM_PFNMAP)
return 0;
@@ -610,7 +609,7 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
((flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) &&
vma_migratable(vma)))
/* queue pages from current vma */
- walk->skip = 0;
+ walk->control = PTWALK_DOWN;
return 0;
}
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/pagewalk.c v3.15-rc8-mmots-2014-06-03-16-28/mm/pagewalk.c
index 15c7585e8684..385efd59178f 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/pagewalk.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/mm/pagewalk.c
@@ -3,22 +3,12 @@
#include <linux/sched.h>
#include <linux/hugetlb.h>
-/*
- * Check the current skip status of page table walker.
- *
- * Here what I mean by skip is to skip lower level walking, and that was
- * determined for each entry independently. For example, when walk_pmd_range
- * handles a pmd_trans_huge we don't have to walk over ptes under that pmd,
- * and the skipping does not affect the walking over ptes under other pmds.
- * That's why we reset @walk->skip after tested.
- */
-static bool skip_lower_level_walking(struct mm_walk *walk)
+static int get_reset_walk_control(struct mm_walk *walk)
{
- if (walk->skip) {
- walk->skip = 0;
- return true;
- }
- return false;
+ int ret = walk->control;
+ /* Reset to default value */
+ walk->control = PTWALK_NEXT;
+ return ret;
}
static int walk_pte_range(pmd_t *pmd, unsigned long addr,
@@ -47,7 +37,18 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr,
err = walk->pte_entry(pte, addr, addr + PAGE_SIZE, walk);
if (err)
break;
+ switch (get_reset_walk_control(walk)) {
+ case PTWALK_NEXT:
+ continue;
+ case PTWALK_DOWN:
+ break;
+ case PTWALK_BREAK:
+ goto out_unlock;
+ default:
+ BUG();
+ }
} while (pte++, addr += PAGE_SIZE, addr < end);
+out_unlock:
pte_unmap_unlock(orig_pte, ptl);
cond_resched();
return addr == end ? 0 : err;
@@ -75,10 +76,16 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
if (walk->pmd_entry) {
err = walk->pmd_entry(pmd, addr, next, walk);
- if (skip_lower_level_walking(walk))
- continue;
if (err)
break;
+ switch (get_reset_walk_control(walk)) {
+ case PTWALK_NEXT:
+ continue;
+ case PTWALK_DOWN:
+ break;
+ default:
+ BUG();
+ }
}
if (walk->pte_entry) {
@@ -204,13 +211,13 @@ static inline int walk_hugetlb_range(unsigned long addr, unsigned long end,
#endif /* CONFIG_HUGETLB_PAGE */
/*
- * Decide whether we really walk over the current vma on [@start, @end)
- * or skip it. When we skip it, we set @walk->skip to 1.
- * The return value is used to control the page table walking to
- * continue (for zero) or not (for non-zero).
+ * Decide whether we really walk over the current vma on [@start, @end) or
+ * skip it. If we walk over it, we should set @walk->control to PTWALK_DOWN.
+ * Otherwise, we skip it. The return value is used to control the current
+ * walking to continue (for zero) or terminate (for non-zero).
*
- * Default check (only VM_PFNMAP check for now) is used when the caller
- * doesn't define test_walk() callback.
+ * We fall through to the default check if the caller doesn't define its own
+ * test_walk() callback.
*/
static int walk_page_test(unsigned long start, unsigned long end,
struct mm_walk *walk)
@@ -224,8 +231,8 @@ static int walk_page_test(unsigned long start, unsigned long end,
* Do not walk over vma(VM_PFNMAP), because we have no valid struct
* page backing a VM_PFNMAP range. See also commit a9ff785e4437.
*/
- if (vma->vm_flags & VM_PFNMAP)
- walk->skip = 1;
+ if (!(vma->vm_flags & VM_PFNMAP))
+ walk->control = PTWALK_DOWN;
return 0;
}
@@ -266,7 +273,7 @@ static int __walk_page_range(unsigned long start, unsigned long end,
* defines test_walk(), pmd_entry(), and pte_entry(), then callbacks are
* called in the order of test_walk(), pmd_entry(), and pte_entry().
* If you don't want to go down to lower level at some point and move to
- * the next entry in the same level, you set @walk->skip to 1.
+ * the next entry in the same level, set @walk->control to PTWALK_NEXT (the default).
* For example if you succeed to handle some pmd entry as trans_huge entry,
* you need not call walk_pte_range() any more, so set it to avoid that.
* We can't determine whether to go down to lower level with the return
@@ -310,10 +317,16 @@ int walk_page_range(unsigned long start, unsigned long end,
next = min(end, vma->vm_end);
err = walk_page_test(start, next, walk);
- if (skip_lower_level_walking(walk))
- continue;
if (err)
break;
+ switch (get_reset_walk_control(walk)) {
+ case PTWALK_NEXT:
+ continue;
+ case PTWALK_DOWN:
+ break;
+ default:
+ BUG();
+ }
}
err = __walk_page_range(start, next, walk);
if (err)
@@ -333,9 +346,15 @@ int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk)
VM_BUG_ON(!vma);
walk->vma = vma;
err = walk_page_test(vma->vm_start, vma->vm_end, walk);
- if (skip_lower_level_walking(walk))
- return 0;
if (err)
return err;
+ switch (get_reset_walk_control(walk)) {
+ case PTWALK_NEXT:
+ return 0;
+ case PTWALK_DOWN:
+ break;
+ default:
+ BUG();
+ }
return __walk_page_range(vma->vm_start, vma->vm_end, walk);
}
--
1.9.3
* Re: [PATCH 2/7] mm/pagewalk: replace mm_walk->skip with more general mm_walk->control
2014-06-06 22:58 ` [PATCH 2/7] mm/pagewalk: replace mm_walk->skip with more general mm_walk->control Naoya Horiguchi
@ 2014-06-09 20:01 ` Dave Hansen
2014-06-09 21:29 ` Naoya Horiguchi
[not found] ` <1402349339-n9udlcv2@n-horiguchi@ah.jp.nec.com>
0 siblings, 2 replies; 18+ messages in thread
From: Dave Hansen @ 2014-06-09 20:01 UTC (permalink / raw)
To: Naoya Horiguchi, linux-mm
Cc: Andrew Morton, Hugh Dickins, Kirill A. Shutemov, linux-kernel
On 06/06/2014 03:58 PM, Naoya Horiguchi wrote:
> +enum mm_walk_control {
> + PTWALK_NEXT = 0, /* Go to the next entry in the same level or
> + * the next vma. This is default behavior. */
> + PTWALK_DOWN, /* Go down to lower level */
> + PTWALK_BREAK, /* Break current loop and continue from the
> + * next loop */
> +};
I think this is a bad idea.
The page walker should be for the common cases of walking page tables,
and it should be simple. It *HAS* to be better (shorter/faster) than if
someone was to just open-code a page table walk, or it's not really useful.
The only place this is used is in the ppc walker, and it saves a single
line of code, but requires some comments to explain what is going on:
arch/powerpc/mm/subpage-prot.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
So, it adds infrastructure, but saves a single line of code. Seems like
a bad trade off to me. :(
* Re: [PATCH 2/7] mm/pagewalk: replace mm_walk->skip with more general mm_walk->control
2014-06-09 20:01 ` Dave Hansen
@ 2014-06-09 21:29 ` Naoya Horiguchi
[not found] ` <1402349339-n9udlcv2@n-horiguchi@ah.jp.nec.com>
1 sibling, 0 replies; 18+ messages in thread
From: Naoya Horiguchi @ 2014-06-09 21:29 UTC (permalink / raw)
To: Dave Hansen
Cc: linux-mm, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
linux-kernel
On Mon, Jun 09, 2014 at 01:01:44PM -0700, Dave Hansen wrote:
> On 06/06/2014 03:58 PM, Naoya Horiguchi wrote:
> > +enum mm_walk_control {
> > + PTWALK_NEXT = 0, /* Go to the next entry in the same level or
> > + * the next vma. This is default behavior. */
> > + PTWALK_DOWN, /* Go down to lower level */
> > + PTWALK_BREAK, /* Break current loop and continue from the
> > + * next loop */
> > +};
>
> I think this is a bad idea.
>
> The page walker should be for the common cases of walking page tables,
> and it should be simple. It *HAS* to be better (shorter/faster) than if
> someone was to just open-code a page table walk, or it's not really useful.
>
> The only place this is used is in the ppc walker, and it saves a single
> line of code, but requires some comments to explain what is going on:
>
> arch/powerpc/mm/subpage-prot.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> So, it adds infrastructure, but saves a single line of code. Seems like
> a bad trade off to me. :(
Right, thank you.
What I felt uneasy about was that, after moving pmd locking into common code,
we need to unlock/re-lock before/after split_huge_page_pmd(), like this:
static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
unsigned long end, struct mm_walk *walk)
{
struct vm_area_struct *vma = walk->vma;
+ spin_unlock(walk->ptl);
split_huge_page_pmd(vma, addr, pmd);
+ spin_lock(walk->ptl);
return 0;
}
I thought it was straightforward but dirty, but my workaround in this patch
was dirty too. So I'm fine with giving up the control stuff and taking this one.
BTW, after moving the pmd locking, PTWALK_DOWN is used only by walk_page_test()
and ->test_walk(), so we can completely remove ->skip by giving that role
to a positive return value. It's cleaner.
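A rough sketch of that idea (purely hypothetical, not part of this series)
would be a ->test_walk() like:

/* hypothetical: a positive return value means "skip this vma" */
static int foo_test_walk(unsigned long start, unsigned long end,
			 struct mm_walk *walk)
{
	if (walk->vma->vm_flags & VM_PFNMAP)
		return 1;	/* skip this vma, not an error */
	return 0;		/* walk this vma */
}

with walk_page_range() treating a positive return as "move on to the next vma"
and a negative one as an error, so no separate control field is needed there.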
Thanks,
Naoya Horiguchi
[parent not found: <1402349339-n9udlcv2@n-horiguchi@ah.jp.nec.com>]
* Re: [PATCH 2/7] mm/pagewalk: replace mm_walk->skip with more general mm_walk->control
[not found] ` <1402349339-n9udlcv2@n-horiguchi@ah.jp.nec.com>
@ 2014-06-09 21:51 ` Dave Hansen
0 siblings, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2014-06-09 21:51 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: linux-mm, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
linux-kernel
On 06/09/2014 02:29 PM, Naoya Horiguchi wrote:
> static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
> unsigned long end, struct mm_walk *walk)
> {
> struct vm_area_struct *vma = walk->vma;
> + spin_unlock(walk->ptl);
> split_huge_page_pmd(vma, addr, pmd);
> + spin_lock(walk->ptl);
> return 0;
> }
>
> I thought it was straightforward but dirty, but my workaround in this patch
> was dirty too. So I'm fine with giving up the control stuff and taking this one.
I think there's essentially no way to fix this with the current
handlers. This needs the locks to not be held, and everything else
needs them held so that they don't have to do it themselves.
Instead of a flag to control the walk directly, we could have one that
controls whether the locks are held, although that seems quite prone to
breakage.
I think this is rare-enough code that we can live with the hack that
you've got above, although we need to run it by the ppc folks.
* [PATCH 3/7] madvise: cleanup swapin_walk_pmd_entry()
2014-06-06 22:58 [PATCH -mm 0/7] mm/pagewalk: standardize current users, move pmd locking, apply to mincore Naoya Horiguchi
2014-06-06 22:58 ` [PATCH 1/7] mm/pagewalk: remove pgd_entry() and pud_entry() Naoya Horiguchi
2014-06-06 22:58 ` [PATCH 2/7] mm/pagewalk: replace mm_walk->skip with more general mm_walk->control Naoya Horiguchi
@ 2014-06-06 22:58 ` Naoya Horiguchi
2014-06-06 22:58 ` [PATCH 4/7] memcg: separate mem_cgroup_move_charge_pte_range() Naoya Horiguchi
` (3 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Naoya Horiguchi @ 2014-06-06 22:58 UTC (permalink / raw)
To: linux-mm
Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
linux-kernel
With the recent updates to the page table walker, we can make more use of the
common walking code. Unlike many other users, this swapin walk expects to
handle swap entries, so we should be careful about ptl locking: the swapin
operation, read_swap_cache_async(), could cause page reclaim, so we can't
keep holding the ptl throughout the pte loop.
In order to handle the ptl properly in pte_entry(), this patch adds two new
members to struct mm_walk.
This cleanup is necessary to get to the final form of the page table walker,
where all caller-specific work should be done on leaf entries (IOW,
pmd_entry() should be used only for trans_huge pmds).
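With these two members, a pte_entry() callback that has to drop the lock
around a sleeping operation can follow this pattern (an illustrative sketch
of what swapin_walk_pte_entry() below does):

static int example_pte(pte_t *pte, unsigned long addr,
		       unsigned long end, struct mm_walk *walk)
{
	pte_t *orig_pte = pte - ((addr & (PMD_SIZE - 1)) >> PAGE_SHIFT);

	pte_unmap_unlock(orig_pte, walk->ptl);
	/* ... sleepable work, e.g. read_swap_cache_async() ... */
	pte_offset_map(walk->pmd, addr & PMD_MASK);
	spin_lock(walk->ptl);
	return 0;
}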
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hugh Dickins <hughd@google.com>
---
include/linux/mm.h | 4 ++++
mm/madvise.c | 54 +++++++++++++++++++++++-------------------------------
mm/pagewalk.c | 5 +++--
3 files changed, 30 insertions(+), 33 deletions(-)
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/include/linux/mm.h v3.15-rc8-mmots-2014-06-03-16-28/include/linux/mm.h
index 43449eba3032..a94166b1b48b 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/include/linux/mm.h
+++ v3.15-rc8-mmots-2014-06-03-16-28/include/linux/mm.h
@@ -1106,6 +1106,8 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
* right now." 0 means "skip the current vma."
* @mm: mm_struct representing the target process of page table walk
* @vma: vma currently walked
+ * @pmd: current pmd entry
+ * @ptl: page table lock associated with current entry
* @control: walk control flag
* @private: private data for callbacks' use
*
@@ -1124,6 +1126,8 @@ struct mm_walk {
struct mm_walk *walk);
struct mm_struct *mm;
struct vm_area_struct *vma;
+ pmd_t *pmd;
+ spinlock_t *ptl;
int control;
void *private;
};
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/madvise.c v3.15-rc8-mmots-2014-06-03-16-28/mm/madvise.c
index a402f8fdc68e..06b390a6fbbd 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/madvise.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/mm/madvise.c
@@ -135,38 +135,31 @@ static long madvise_behavior(struct vm_area_struct *vma,
}
#ifdef CONFIG_SWAP
-static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
+/*
+ * Assuming that page table walker holds page table lock.
+ */
+static int swapin_walk_pte_entry(pte_t *pte, unsigned long start,
unsigned long end, struct mm_walk *walk)
{
- pte_t *orig_pte;
- struct vm_area_struct *vma = walk->private;
- unsigned long index;
-
- if (pmd_none_or_trans_huge_or_clear_bad(pmd))
- return 0;
-
- for (index = start; index != end; index += PAGE_SIZE) {
- pte_t pte;
- swp_entry_t entry;
- struct page *page;
- spinlock_t *ptl;
-
- orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl);
- pte = *(orig_pte + ((index - start) / PAGE_SIZE));
- pte_unmap_unlock(orig_pte, ptl);
-
- if (pte_present(pte) || pte_none(pte) || pte_file(pte))
- continue;
- entry = pte_to_swp_entry(pte);
- if (unlikely(non_swap_entry(entry)))
- continue;
-
- page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
- vma, index);
- if (page)
- page_cache_release(page);
- }
+ pte_t ptent;
+ pte_t *orig_pte = pte - ((start & (PMD_SIZE - 1)) >> PAGE_SHIFT);
+ swp_entry_t entry;
+ struct page *page;
+ ptent = *pte;
+ pte_unmap_unlock(orig_pte, walk->ptl);
+ if (pte_present(ptent) || pte_none(ptent) || pte_file(ptent))
+ goto lock;
+ entry = pte_to_swp_entry(ptent);
+ if (unlikely(non_swap_entry(entry)))
+ goto lock;
+ page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
+ walk->vma, start);
+ if (page)
+ page_cache_release(page);
+lock:
+ pte_offset_map(walk->pmd, start & PMD_MASK);
+ spin_lock(walk->ptl);
return 0;
}
@@ -175,8 +168,7 @@ static void force_swapin_readahead(struct vm_area_struct *vma,
{
struct mm_walk walk = {
.mm = vma->vm_mm,
- .pmd_entry = swapin_walk_pmd_entry,
- .private = vma,
+ .pte_entry = swapin_walk_pte_entry,
};
walk_page_range(start, end, &walk);
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/pagewalk.c v3.15-rc8-mmots-2014-06-03-16-28/mm/pagewalk.c
index 385efd59178f..8d71e09a36ea 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/pagewalk.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/mm/pagewalk.c
@@ -20,7 +20,8 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr,
spinlock_t *ptl;
int err = 0;
- orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+ walk->pmd = pmd;
+ orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &walk->ptl);
do {
if (pte_none(*pte)) {
if (walk->pte_hole)
@@ -49,7 +50,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr,
}
} while (pte++, addr += PAGE_SIZE, addr < end);
out_unlock:
- pte_unmap_unlock(orig_pte, ptl);
+ pte_unmap_unlock(orig_pte, walk->ptl);
cond_resched();
return addr == end ? 0 : err;
}
--
1.9.3
* [PATCH 4/7] memcg: separate mem_cgroup_move_charge_pte_range()
2014-06-06 22:58 [PATCH -mm 0/7] mm/pagewalk: standardize current users, move pmd locking, apply to mincore Naoya Horiguchi
` (2 preceding siblings ...)
2014-06-06 22:58 ` [PATCH 3/7] madvise: cleanup swapin_walk_pmd_entry() Naoya Horiguchi
@ 2014-06-06 22:58 ` Naoya Horiguchi
2014-06-06 22:58 ` [PATCH 5/7] arch/powerpc/mm/subpage-prot.c: cleanup subpage_walk_pmd_entry() Naoya Horiguchi
` (2 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Naoya Horiguchi @ 2014-06-06 22:58 UTC (permalink / raw)
To: linux-mm
Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
linux-kernel
mem_cgroup_move_charge_pte_range() handles both ptes and pmds, which does not
match the standardized walker interface, so let's clean it up. One tricky part
is the retry, which is performed when we detect !mc.precharge. In that case we
retry the same entry, so we don't have to leave the pte loop. By rewriting this
retry inside the pte loop, we can separate pmd_entry() and pte_entry(), which
is what we need.
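Condensed from the patch below, the retry inside the new pte_entry() callback
takes roughly this shape:

retry:
	if (!mc.precharge) {
		/* drop the pte lock so mem_cgroup_do_precharge() can sleep */
		pte_unmap_unlock(orig_pte, walk->ptl);
		cond_resched();
		ret = mem_cgroup_do_precharge(1);
		pte_offset_map(walk->pmd, addr & PMD_MASK);
		spin_lock(walk->ptl);
		if (!ret)
			goto retry;	/* got a precharge, redo this pte */
		return ret;
	}
	/* ... move the charge for *pte ... */

so the "retry the same entry" logic never has to leave the pte loop.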
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
mm/memcontrol.c | 128 +++++++++++++++++++++++++++++---------------------------
1 file changed, 66 insertions(+), 62 deletions(-)
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/memcontrol.c v3.15-rc8-mmots-2014-06-03-16-28/mm/memcontrol.c
index aeab82bce739..3b1692d2bca3 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/memcontrol.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/mm/memcontrol.c
@@ -6880,14 +6880,72 @@ static void mem_cgroup_cancel_attach(struct cgroup_subsys_state *css,
mem_cgroup_clear_mc();
}
-static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
+static int mem_cgroup_move_charge_pte(pte_t *pte,
unsigned long addr, unsigned long end,
struct mm_walk *walk)
{
int ret = 0;
struct vm_area_struct *vma = walk->vma;
- pte_t *pte;
- spinlock_t *ptl;
+ union mc_target target;
+ struct page *page;
+ struct page_cgroup *pc;
+ swp_entry_t ent;
+
+retry:
+ if (!mc.precharge) {
+ pte_t *orig_pte = pte - ((addr & (PMD_SIZE - 1)) >> PAGE_SHIFT);
+ pte_unmap_unlock(orig_pte, walk->ptl);
+ cond_resched();
+ /*
+ * We have consumed all precharges we got in can_attach().
+ * We try charge one by one, but don't do any additional
+ * charges to mc.to if we have failed in charge once in attach()
+ * phase.
+ */
+ ret = mem_cgroup_do_precharge(1);
+ pte_offset_map(walk->pmd, addr & PMD_MASK);
+ spin_lock(walk->ptl);
+ if (!ret)
+ goto retry;
+ return ret;
+ }
+
+ switch (get_mctgt_type(vma, addr, *pte, &target)) {
+ case MC_TARGET_PAGE:
+ page = target.page;
+ if (isolate_lru_page(page))
+ goto put;
+ pc = lookup_page_cgroup(page);
+ if (!mem_cgroup_move_account(page, 1, pc,
+ mc.from, mc.to)) {
+ mc.precharge--;
+ /* we uncharge from mc.from later. */
+ mc.moved_charge++;
+ }
+ putback_lru_page(page);
+put: /* get_mctgt_type() gets the page */
+ put_page(page);
+ break;
+ case MC_TARGET_SWAP:
+ ent = target.ent;
+ if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) {
+ mc.precharge--;
+ /* we fixup refcnts and charges later. */
+ mc.moved_swap++;
+ }
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+static int mem_cgroup_move_charge_pmd(pmd_t *pmd,
+ unsigned long addr, unsigned long end,
+ struct mm_walk *walk)
+{
+ struct vm_area_struct *vma = walk->vma;
enum mc_target_type target_type;
union mc_target target;
struct page *page;
@@ -6923,71 +6981,17 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
put_page(page);
}
spin_unlock(ptl);
- return 0;
- }
-
- if (pmd_trans_unstable(pmd))
- return 0;
-retry:
- pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
- for (; addr != end; addr += PAGE_SIZE) {
- pte_t ptent = *(pte++);
- swp_entry_t ent;
-
- if (!mc.precharge)
- break;
-
- switch (get_mctgt_type(vma, addr, ptent, &target)) {
- case MC_TARGET_PAGE:
- page = target.page;
- if (isolate_lru_page(page))
- goto put;
- pc = lookup_page_cgroup(page);
- if (!mem_cgroup_move_account(page, 1, pc,
- mc.from, mc.to)) {
- mc.precharge--;
- /* we uncharge from mc.from later. */
- mc.moved_charge++;
- }
- putback_lru_page(page);
-put: /* get_mctgt_type() gets the page */
- put_page(page);
- break;
- case MC_TARGET_SWAP:
- ent = target.ent;
- if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) {
- mc.precharge--;
- /* we fixup refcnts and charges later. */
- mc.moved_swap++;
- }
- break;
- default:
- break;
- }
- }
- pte_unmap_unlock(pte - 1, ptl);
- cond_resched();
-
- if (addr != end) {
- /*
- * We have consumed all precharges we got in can_attach().
- * We try charge one by one, but don't do any additional
- * charges to mc.to if we have failed in charge once in attach()
- * phase.
- */
- ret = mem_cgroup_do_precharge(1);
- if (!ret)
- goto retry;
- }
-
- return ret;
+ } else
+ walk->control = PTWALK_DOWN;
+ return 0;
}
static void mem_cgroup_move_charge(struct mm_struct *mm)
{
struct vm_area_struct *vma;
struct mm_walk mem_cgroup_move_charge_walk = {
- .pmd_entry = mem_cgroup_move_charge_pte_range,
+ .pmd_entry = mem_cgroup_move_charge_pmd,
+ .pte_entry = mem_cgroup_move_charge_pte,
.mm = mm,
};
--
1.9.3
* [PATCH 5/7] arch/powerpc/mm/subpage-prot.c: cleanup subpage_walk_pmd_entry()
2014-06-06 22:58 [PATCH -mm 0/7] mm/pagewalk: standardize current users, move pmd locking, apply to mincore Naoya Horiguchi
` (3 preceding siblings ...)
2014-06-06 22:58 ` [PATCH 4/7] memcg: separate mem_cgroup_move_charge_pte_range() Naoya Horiguchi
@ 2014-06-06 22:58 ` Naoya Horiguchi
2014-06-06 22:58 ` [PATCH 6/7] mm/pagewalk: move pmd_trans_huge_lock() from callbacks to common code Naoya Horiguchi
2014-06-06 22:58 ` [PATCH 7/7] mincore: apply page table walker on do_mincore() Naoya Horiguchi
6 siblings, 0 replies; 18+ messages in thread
From: Naoya Horiguchi @ 2014-06-06 22:58 UTC (permalink / raw)
To: linux-mm
Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
linux-kernel
Currently subpage_mark_vma_nohuge() uses the page table walker to find thps and
then splits them. But this can be done by the page table walker itself, so
let's rewrite it in a more suitable way. No functional change.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
arch/powerpc/mm/subpage-prot.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/arch/powerpc/mm/subpage-prot.c v3.15-rc8-mmots-2014-06-03-16-28/arch/powerpc/mm/subpage-prot.c
index fa9fb5b4c66c..555cfe15371d 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/arch/powerpc/mm/subpage-prot.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/arch/powerpc/mm/subpage-prot.c
@@ -131,11 +131,10 @@ static void subpage_prot_clear(unsigned long addr, unsigned long len)
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
+static int subpage_walk_pte(pte_t *pte, unsigned long addr,
unsigned long end, struct mm_walk *walk)
{
- struct vm_area_struct *vma = walk->vma;
- split_huge_page_pmd(vma, addr, pmd);
+ walk->control = PTWALK_BREAK;
return 0;
}
@@ -143,9 +142,14 @@ static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long addr,
unsigned long len)
{
struct vm_area_struct *vma;
+ /*
+ * What this walk wants is to split all thps under this mm. The page
+ * table walker internally splits thps just before it tries to call
+ * .pte_entry() on them, so let's utilize that.
+ */
struct mm_walk subpage_proto_walk = {
.mm = mm,
- .pmd_entry = subpage_walk_pmd_entry,
+ .pte_entry = subpage_walk_pte,
};
/*
--
1.9.3
* [PATCH 6/7] mm/pagewalk: move pmd_trans_huge_lock() from callbacks to common code
2014-06-06 22:58 [PATCH -mm 0/7] mm/pagewalk: standardize current users, move pmd locking, apply to mincore Naoya Horiguchi
` (4 preceding siblings ...)
2014-06-06 22:58 ` [PATCH 5/7] arch/powerpc/mm/subpage-prot.c: cleanup subpage_walk_pmd_entry() Naoya Horiguchi
@ 2014-06-06 22:58 ` Naoya Horiguchi
2014-06-09 20:04 ` Dave Hansen
2014-06-06 22:58 ` [PATCH 7/7] mincore: apply page table walker on do_mincore() Naoya Horiguchi
6 siblings, 1 reply; 18+ messages in thread
From: Naoya Horiguchi @ 2014-06-06 22:58 UTC (permalink / raw)
To: linux-mm
Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
linux-kernel
Now all of the current users of the page table walker are canonicalized, i.e.
pmd_entry() handles only trans_huge pmd entries, and pte_entry() handles pte
entries. So we can factor out more common code.
This patch moves the pmd_trans_huge_lock() in each pmd_entry() into the
pagewalk core.
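After this change a pmd_entry() callback is invoked only for trans_huge pmds,
with the pmd lock taken and released by walk_pmd_range(), so a typical handler
shrinks to something like (illustrative sketch, handle_thp() is a made-up
helper):

static int example_pmd(pmd_t *pmd, unsigned long addr,
		       unsigned long next, struct mm_walk *walk)
{
	/* the walker already holds the thp's page table lock here */
	handle_thp(walk->vma, addr, pmd);
	return 0;
}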
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
fs/proc/task_mmu.c | 65 ++++++++++++++++++------------------------------------
mm/memcontrol.c | 53 ++++++++++++++------------------------------
mm/pagewalk.c | 25 +++++++++++++++++----
3 files changed, 59 insertions(+), 84 deletions(-)
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/fs/proc/task_mmu.c v3.15-rc8-mmots-2014-06-03-16-28/fs/proc/task_mmu.c
index 2864028ae2f8..0b45bb9f3351 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/fs/proc/task_mmu.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/fs/proc/task_mmu.c
@@ -496,14 +496,8 @@ static int smaps_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
struct mm_walk *walk)
{
struct mem_size_stats *mss = walk->private;
- spinlock_t *ptl;
-
- if (pmd_trans_huge_lock(pmd, walk->vma, &ptl) == 1) {
- smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
- spin_unlock(ptl);
- mss->anonymous_thp += HPAGE_PMD_SIZE;
- } else
- walk->control = PTWALK_DOWN;
+ smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
+ mss->anonymous_thp += HPAGE_PMD_SIZE;
return 0;
}
@@ -983,31 +977,22 @@ static int pagemap_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
struct vm_area_struct *vma = walk->vma;
struct pagemapread *pm = walk->private;
pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
- spinlock_t *ptl;
+ int pmd_flags2;
- if (!vma)
- return err;
- if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
- int pmd_flags2;
-
- if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
- pmd_flags2 = __PM_SOFT_DIRTY;
- else
- pmd_flags2 = 0;
+ if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
+ pmd_flags2 = __PM_SOFT_DIRTY;
+ else
+ pmd_flags2 = 0;
- for (; addr != end; addr += PAGE_SIZE) {
- unsigned long offset;
+ for (; addr != end; addr += PAGE_SIZE) {
+ unsigned long offset;
- offset = (addr & ~PAGEMAP_WALK_MASK) >>
- PAGE_SHIFT;
- thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
- err = add_to_pagemap(addr, &pme, pm);
- if (err)
- break;
- }
- spin_unlock(ptl);
- } else
- walk->control = PTWALK_DOWN;
+ offset = (addr & ~PAGEMAP_WALK_MASK) >> PAGE_SHIFT;
+ thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
+ err = add_to_pagemap(addr, &pme, pm);
+ if (err)
+ break;
+ }
return err;
}
@@ -1276,19 +1261,13 @@ static int gather_pmd_stats(pmd_t *pmd, unsigned long addr,
{
struct numa_maps *md = walk->private;
struct vm_area_struct *vma = walk->vma;
- spinlock_t *ptl;
-
- if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
- pte_t huge_pte = *(pte_t *)pmd;
- struct page *page;
-
- page = can_gather_numa_stats(huge_pte, vma, addr);
- if (page)
- gather_stats(page, md, pte_dirty(huge_pte),
- HPAGE_PMD_SIZE/PAGE_SIZE);
- spin_unlock(ptl);
- } else
- walk->control = PTWALK_DOWN;
+ pte_t huge_pte = *(pte_t *)pmd;
+ struct page *page;
+
+ page = can_gather_numa_stats(huge_pte, vma, addr);
+ if (page)
+ gather_stats(page, md, pte_dirty(huge_pte),
+ HPAGE_PMD_SIZE/PAGE_SIZE);
return 0;
}
#ifdef CONFIG_HUGETLB_PAGE
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/memcontrol.c v3.15-rc8-mmots-2014-06-03-16-28/mm/memcontrol.c
index 3b1692d2bca3..bb987cb9e043 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/memcontrol.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/mm/memcontrol.c
@@ -6723,14 +6723,9 @@ static int mem_cgroup_count_precharge_pmd(pmd_t *pmd,
struct mm_walk *walk)
{
struct vm_area_struct *vma = walk->vma;
- spinlock_t *ptl;
- if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
- if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
- mc.precharge += HPAGE_PMD_NR;
- spin_unlock(ptl);
- } else
- walk->control = PTWALK_DOWN;
+ if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
+ mc.precharge += HPAGE_PMD_NR;
return 0;
}
@@ -6951,38 +6946,22 @@ static int mem_cgroup_move_charge_pmd(pmd_t *pmd,
struct page *page;
struct page_cgroup *pc;
- /*
- * We don't take compound_lock() here but no race with splitting thp
- * happens because:
- * - if pmd_trans_huge_lock() returns 1, the relevant thp is not
- * under splitting, which means there's no concurrent thp split,
- * - if another thread runs into split_huge_page() just after we
- * entered this if-block, the thread must wait for page table lock
- * to be unlocked in __split_huge_page_splitting(), where the main
- * part of thp split is not executed yet.
- */
- if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
- if (mc.precharge < HPAGE_PMD_NR) {
- spin_unlock(ptl);
- return 0;
- }
- target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
- if (target_type == MC_TARGET_PAGE) {
- page = target.page;
- if (!isolate_lru_page(page)) {
- pc = lookup_page_cgroup(page);
- if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
- pc, mc.from, mc.to)) {
- mc.precharge -= HPAGE_PMD_NR;
- mc.moved_charge += HPAGE_PMD_NR;
- }
- putback_lru_page(page);
+ if (mc.precharge < HPAGE_PMD_NR)
+ return 0;
+ target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
+ if (target_type == MC_TARGET_PAGE) {
+ page = target.page;
+ if (!isolate_lru_page(page)) {
+ pc = lookup_page_cgroup(page);
+ if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
+ pc, mc.from, mc.to)) {
+ mc.precharge -= HPAGE_PMD_NR;
+ mc.moved_charge += HPAGE_PMD_NR;
}
- put_page(page);
+ putback_lru_page(page);
}
- spin_unlock(ptl);
- } else
- walk->control = PTWALK_DOWN;
+ put_page(page);
+ }
return 0;
}
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/pagewalk.c v3.15-rc8-mmots-2014-06-03-16-28/mm/pagewalk.c
index 8d71e09a36ea..879cee00eb70 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/pagewalk.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/mm/pagewalk.c
@@ -61,6 +61,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
pmd_t *pmd;
unsigned long next;
int err = 0;
+ spinlock_t *ptl;
pmd = pmd_offset(pud, addr);
do {
@@ -75,8 +76,22 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
continue;
}
+ /*
+ * We don't take compound_lock() here but no race with splitting
+ * thp happens because:
+ * - if pmd_trans_huge_lock() returns 1, the relevant thp is
+ * not under splitting, which means there's no concurrent
+ * thp split,
+ * - if another thread runs into split_huge_page() just after
+ * we entered this if-block, the thread must wait for page
+ * table lock to be unlocked in __split_huge_page_splitting(),
+ * where the main part of thp split is not executed yet.
+ */
if (walk->pmd_entry) {
- err = walk->pmd_entry(pmd, addr, next, walk);
+ if (pmd_trans_huge_lock(pmd, walk->vma, &ptl) == 1) {
+ err = walk->pmd_entry(pmd, addr, next, walk);
+ spin_unlock(ptl);
+ }
if (err)
break;
switch (get_reset_walk_control(walk)) {
@@ -286,9 +301,11 @@ static int __walk_page_range(unsigned long start, unsigned long end,
* outside a vma. If you want to access to some caller-specific data from
* callbacks, @walk->private should be helpful.
*
- * The callers should hold @walk->mm->mmap_sem. Note that the lower level
- * iterators can take page table lock in lowest level iteration and/or
- * in split_huge_page_pmd().
+ * Locking:
+ * Callers of walk_page_range() and walk_page_vma() should hold
+ * @walk->mm->mmap_sem, because these functions traverse the vma list and/or
+ * access the vma's data. The page table lock is held while pmd_entry()
+ * and pte_entry() run.
*/
int walk_page_range(unsigned long start, unsigned long end,
struct mm_walk *walk)
--
1.9.3
* Re: [PATCH 6/7] mm/pagewalk: move pmd_trans_huge_lock() from callbacks to common code
2014-06-06 22:58 ` [PATCH 6/7] mm/pagewalk: move pmd_trans_huge_lock() from callbacks to common code Naoya Horiguchi
@ 2014-06-09 20:04 ` Dave Hansen
2014-06-09 21:35 ` Naoya Horiguchi
0 siblings, 1 reply; 18+ messages in thread
From: Dave Hansen @ 2014-06-09 20:04 UTC (permalink / raw)
To: Naoya Horiguchi, linux-mm
Cc: Andrew Morton, Hugh Dickins, Kirill A. Shutemov, linux-kernel
On 06/06/2014 03:58 PM, Naoya Horiguchi wrote:
> @@ -6723,14 +6723,9 @@ static int mem_cgroup_count_precharge_pmd(pmd_t *pmd,
> struct mm_walk *walk)
> {
> struct vm_area_struct *vma = walk->vma;
> - spinlock_t *ptl;
>
> - if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> - if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
> - mc.precharge += HPAGE_PMD_NR;
> - spin_unlock(ptl);
> - } else
> - walk->control = PTWALK_DOWN;
> + if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
> + mc.precharge += HPAGE_PMD_NR;
> return 0;
> }
I guess my series did two things:
1. move page table walking to the walk_page_range() code
2. make a new walk handler that can take arbitrarily-sized ptes
This does (1) quite nicely and has some nice code savings. I still
think (2) has some value, and like my approach, but this is definitely a
step in the right direction.
* Re: [PATCH 6/7] mm/pagewalk: move pmd_trans_huge_lock() from callbacks to common code
2014-06-09 20:04 ` Dave Hansen
@ 2014-06-09 21:35 ` Naoya Horiguchi
0 siblings, 0 replies; 18+ messages in thread
From: Naoya Horiguchi @ 2014-06-09 21:35 UTC (permalink / raw)
To: Dave Hansen
Cc: linux-mm, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
linux-kernel
On Mon, Jun 09, 2014 at 01:04:08PM -0700, Dave Hansen wrote:
> On 06/06/2014 03:58 PM, Naoya Horiguchi wrote:
> > @@ -6723,14 +6723,9 @@ static int mem_cgroup_count_precharge_pmd(pmd_t *pmd,
> > struct mm_walk *walk)
> > {
> > struct vm_area_struct *vma = walk->vma;
> > - spinlock_t *ptl;
> >
> > - if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> > - if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
> > - mc.precharge += HPAGE_PMD_NR;
> > - spin_unlock(ptl);
> > - } else
> > - walk->control = PTWALK_DOWN;
> > + if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
> > + mc.precharge += HPAGE_PMD_NR;
> > return 0;
> > }
>
> I guess my series did two things:
> 1. move page table walking to the walk_page_range() code
> 2. make a new walk handler that can take arbitrarily-sized ptes
>
> This does (1) quite nicely and has some nice code savings. I still
> think (2) has some value, and like my approach, but this is definitely a
> step in the right direction.
Thank you. And yes, I'm planning to add (2) to this series in a later version.
Naoya Horiguchi
* [PATCH 7/7] mincore: apply page table walker on do_mincore()
2014-06-06 22:58 [PATCH -mm 0/7] mm/pagewalk: standardize current users, move pmd locking, apply to mincore Naoya Horiguchi
` (5 preceding siblings ...)
2014-06-06 22:58 ` [PATCH 6/7] mm/pagewalk: move pmd_trans_huge_lock() from callbacks to common code Naoya Horiguchi
@ 2014-06-06 22:58 ` Naoya Horiguchi
2014-06-12 22:04 ` Andrew Morton
2014-06-16 15:24 ` Sasha Levin
6 siblings, 2 replies; 18+ messages in thread
From: Naoya Horiguchi @ 2014-06-06 22:58 UTC (permalink / raw)
To: linux-mm
Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
linux-kernel
This patch makes do_mincore() use walk_page_vma(), which removes many lines
of code by reusing the common page table walking code.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
mm/huge_memory.c | 20 ------
mm/mincore.c | 192 +++++++++++++++++++------------------------------------
2 files changed, 65 insertions(+), 147 deletions(-)
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/huge_memory.c v3.15-rc8-mmots-2014-06-03-16-28/mm/huge_memory.c
index 6fd0668d4e1d..2671a9621d0e 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/huge_memory.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/mm/huge_memory.c
@@ -1379,26 +1379,6 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
return ret;
}
-int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long addr, unsigned long end,
- unsigned char *vec)
-{
- spinlock_t *ptl;
- int ret = 0;
-
- if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
- /*
- * All logical pages in the range are present
- * if backed by a huge page.
- */
- spin_unlock(ptl);
- memset(vec, 1, (end - addr) >> PAGE_SHIFT);
- ret = 1;
- }
-
- return ret;
-}
-
int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
unsigned long old_addr,
unsigned long new_addr, unsigned long old_end,
diff --git v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/mincore.c v3.15-rc8-mmots-2014-06-03-16-28/mm/mincore.c
index 725c80961048..6ca7b1fd62fe 100644
--- v3.15-rc8-mmots-2014-06-03-16-28.orig/mm/mincore.c
+++ v3.15-rc8-mmots-2014-06-03-16-28/mm/mincore.c
@@ -19,38 +19,27 @@
#include <asm/uaccess.h>
#include <asm/pgtable.h>
-static void mincore_hugetlb_page_range(struct vm_area_struct *vma,
- unsigned long addr, unsigned long end,
- unsigned char *vec)
+static int mincore_hugetlb(pte_t *pte, unsigned long addr,
+ unsigned long end, struct mm_walk *walk)
{
+ int err = 0;
#ifdef CONFIG_HUGETLB_PAGE
- struct hstate *h;
+ unsigned char present;
+ unsigned char *vec = walk->private;
- h = hstate_vma(vma);
- while (1) {
- unsigned char present;
- pte_t *ptep;
- /*
- * Huge pages are always in RAM for now, but
- * theoretically it needs to be checked.
- */
- ptep = huge_pte_offset(current->mm,
- addr & huge_page_mask(h));
- present = ptep && !huge_pte_none(huge_ptep_get(ptep));
- while (1) {
- *vec = present;
- vec++;
- addr += PAGE_SIZE;
- if (addr == end)
- return;
- /* check hugepage border */
- if (!(addr & ~huge_page_mask(h)))
- break;
- }
- }
+ /*
+ * Hugepages mapped by a user process are always in RAM and never
+ * swapped out, but theoretically this needs to be checked.
+ */
+ present = pte && !huge_pte_none(huge_ptep_get(pte));
+ for (; addr != end; vec++, addr += PAGE_SIZE)
+ *vec = present;
+ cond_resched();
+ walk->private += (end - addr) >> PAGE_SHIFT;
#else
BUG();
#endif
+ return err;
}
/*
@@ -94,10 +83,11 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
return present;
}
-static void mincore_unmapped_range(struct vm_area_struct *vma,
- unsigned long addr, unsigned long end,
- unsigned char *vec)
+static int mincore_hole(unsigned long addr, unsigned long end,
+ struct mm_walk *walk)
{
+ struct vm_area_struct *vma = walk->vma;
+ unsigned char *vec = walk->private;
unsigned long nr = (end - addr) >> PAGE_SHIFT;
int i;
@@ -111,110 +101,49 @@ static void mincore_unmapped_range(struct vm_area_struct *vma,
for (i = 0; i < nr; i++)
vec[i] = 0;
}
+ walk->private += nr;
+ return 0;
}
-static void mincore_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long addr, unsigned long end,
- unsigned char *vec)
+static int mincore_pte(pte_t *pte, unsigned long addr, unsigned long end,
+ struct mm_walk *walk)
{
- unsigned long next;
- spinlock_t *ptl;
- pte_t *ptep;
-
- ptep = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
- do {
- pte_t pte = *ptep;
- pgoff_t pgoff;
-
- next = addr + PAGE_SIZE;
- if (pte_none(pte))
- mincore_unmapped_range(vma, addr, next, vec);
- else if (pte_present(pte))
+ struct vm_area_struct *vma = walk->vma;
+ unsigned char *vec = walk->private;
+ pgoff_t pgoff;
+
+ if (pte_present(*pte))
+ *vec = 1;
+ else if (pte_file(*pte)) {
+ pgoff = pte_to_pgoff(*pte);
+ *vec = mincore_page(vma->vm_file->f_mapping, pgoff);
+ } else { /* pte is a swap entry */
+ swp_entry_t entry = pte_to_swp_entry(*pte);
+
+ if (is_migration_entry(entry)) {
+ /* migration entries are always uptodate */
*vec = 1;
- else if (pte_file(pte)) {
- pgoff = pte_to_pgoff(pte);
- *vec = mincore_page(vma->vm_file->f_mapping, pgoff);
- } else { /* pte is a swap entry */
- swp_entry_t entry = pte_to_swp_entry(pte);
-
- if (is_migration_entry(entry)) {
- /* migration entries are always uptodate */
- *vec = 1;
- } else {
+ } else {
#ifdef CONFIG_SWAP
- pgoff = entry.val;
- *vec = mincore_page(swap_address_space(entry),
- pgoff);
+ pgoff = entry.val;
+ *vec = mincore_page(swap_address_space(entry),
+ pgoff);
#else
- WARN_ON(1);
- *vec = 1;
+ WARN_ON(1);
+ *vec = 1;
#endif
- }
}
- vec++;
- } while (ptep++, addr = next, addr != end);
- pte_unmap_unlock(ptep - 1, ptl);
-}
-
-static void mincore_pmd_range(struct vm_area_struct *vma, pud_t *pud,
- unsigned long addr, unsigned long end,
- unsigned char *vec)
-{
- unsigned long next;
- pmd_t *pmd;
-
- pmd = pmd_offset(pud, addr);
- do {
- next = pmd_addr_end(addr, end);
- if (pmd_trans_huge(*pmd)) {
- if (mincore_huge_pmd(vma, pmd, addr, next, vec)) {
- vec += (next - addr) >> PAGE_SHIFT;
- continue;
- }
- /* fall through */
- }
- if (pmd_none_or_trans_huge_or_clear_bad(pmd))
- mincore_unmapped_range(vma, addr, next, vec);
- else
- mincore_pte_range(vma, pmd, addr, next, vec);
- vec += (next - addr) >> PAGE_SHIFT;
- } while (pmd++, addr = next, addr != end);
-}
-
-static void mincore_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
- unsigned long addr, unsigned long end,
- unsigned char *vec)
-{
- unsigned long next;
- pud_t *pud;
-
- pud = pud_offset(pgd, addr);
- do {
- next = pud_addr_end(addr, end);
- if (pud_none_or_clear_bad(pud))
- mincore_unmapped_range(vma, addr, next, vec);
- else
- mincore_pmd_range(vma, pud, addr, next, vec);
- vec += (next - addr) >> PAGE_SHIFT;
- } while (pud++, addr = next, addr != end);
+ }
+ walk->private++;
+ return 0;
}
-static void mincore_page_range(struct vm_area_struct *vma,
- unsigned long addr, unsigned long end,
- unsigned char *vec)
+static int mincore_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
+ struct mm_walk *walk)
{
- unsigned long next;
- pgd_t *pgd;
-
- pgd = pgd_offset(vma->vm_mm, addr);
- do {
- next = pgd_addr_end(addr, end);
- if (pgd_none_or_clear_bad(pgd))
- mincore_unmapped_range(vma, addr, next, vec);
- else
- mincore_pud_range(vma, pgd, addr, next, vec);
- vec += (next - addr) >> PAGE_SHIFT;
- } while (pgd++, addr = next, addr != end);
+ memset(walk->private, 1, (end - addr) >> PAGE_SHIFT);
+ walk->private += (end - addr) >> PAGE_SHIFT;
+ return 0;
}
/*
@@ -226,6 +155,7 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
{
struct vm_area_struct *vma;
unsigned long end;
+ int err;
vma = find_vma(current->mm, addr);
if (!vma || addr < vma->vm_start)
@@ -233,12 +163,20 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
- if (is_vm_hugetlb_page(vma))
- mincore_hugetlb_page_range(vma, addr, end, vec);
+ struct mm_walk mincore_walk = {
+ .pmd_entry = mincore_pmd,
+ .pte_entry = mincore_pte,
+ .pte_hole = mincore_hole,
+ .hugetlb_entry = mincore_hugetlb,
+ .mm = vma->vm_mm,
+ .vma = vma,
+ .private = vec,
+ };
+ err = walk_page_vma(vma, &mincore_walk);
+ if (err < 0)
+ return err;
else
- mincore_page_range(vma, addr, end, vec);
-
- return (end - addr) >> PAGE_SHIFT;
+ return (end - addr) >> PAGE_SHIFT;
}
/*
--
1.9.3
* Re: [PATCH 7/7] mincore: apply page table walker on do_mincore()
2014-06-06 22:58 ` [PATCH 7/7] mincore: apply page table walker on do_mincore() Naoya Horiguchi
@ 2014-06-12 22:04 ` Andrew Morton
2014-06-12 23:33 ` Naoya Horiguchi
2014-06-16 15:24 ` Sasha Levin
1 sibling, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2014-06-12 22:04 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: linux-mm, Dave Hansen, Hugh Dickins, Kirill A. Shutemov,
linux-kernel
On Fri, 6 Jun 2014 18:58:40 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> @@ -233,12 +163,20 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
>
> end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
>
> - if (is_vm_hugetlb_page(vma))
> - mincore_hugetlb_page_range(vma, addr, end, vec);
> + struct mm_walk mincore_walk = {
> + .pmd_entry = mincore_pmd,
> + .pte_entry = mincore_pte,
> + .pte_hole = mincore_hole,
> + .hugetlb_entry = mincore_hugetlb,
> + .mm = vma->vm_mm,
> + .vma = vma,
> + .private = vec,
> + };
> + err = walk_page_vma(vma, &mincore_walk);
> + if (err < 0)
> + return err;
> else
> - mincore_page_range(vma, addr, end, vec);
> -
> - return (end - addr) >> PAGE_SHIFT;
> + return (end - addr) >> PAGE_SHIFT;
> }
>
> /*
Please review carefully.
From: Andrew Morton <akpm@linux-foundation.org>
Subject: mincore-apply-page-table-walker-on-do_mincore-fix
mm/mincore.c: In function 'do_mincore':
mm/mincore.c:166: warning: ISO C90 forbids mixed declarations and code
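(The warning fires because the original patch declares mincore_walk after
executable statements, roughly:

	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
	struct mm_walk mincore_walk = { ... };	/* declaration after a statement */

which gcc flags via -Wdeclaration-after-statement; the fix below hoists the
declaration to the top of the function.)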
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/mincore.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)
diff -puN mm/huge_memory.c~mincore-apply-page-table-walker-on-do_mincore-fix mm/huge_memory.c
diff -puN mm/mincore.c~mincore-apply-page-table-walker-on-do_mincore-fix mm/mincore.c
--- a/mm/mincore.c~mincore-apply-page-table-walker-on-do_mincore-fix
+++ a/mm/mincore.c
@@ -151,32 +151,34 @@ static int mincore_pmd(pmd_t *pmd, unsig
* all the arguments, we hold the mmap semaphore: we should
* just return the amount of info we're asked for.
*/
-static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *vec)
+static long do_mincore(unsigned long addr, unsigned long pages,
+ unsigned char *vec)
{
struct vm_area_struct *vma;
- unsigned long end;
int err;
-
- vma = find_vma(current->mm, addr);
- if (!vma || addr < vma->vm_start)
- return -ENOMEM;
-
- end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
-
struct mm_walk mincore_walk = {
.pmd_entry = mincore_pmd,
.pte_entry = mincore_pte,
.pte_hole = mincore_hole,
.hugetlb_entry = mincore_hugetlb,
- .mm = vma->vm_mm,
- .vma = vma,
.private = vec,
};
+
+ vma = find_vma(current->mm, addr);
+ if (!vma || addr < vma->vm_start)
+ return -ENOMEM;
+ mincore_walk.vma = vma;
+ mincore_walk.mm = vma->vm_mm;
+
err = walk_page_vma(vma, &mincore_walk);
- if (err < 0)
+ if (err < 0) {
return err;
- else
+ } else {
+ unsigned long end;
+
+ end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
return (end - addr) >> PAGE_SHIFT;
+ }
}
/*
_
* Re: [PATCH 7/7] mincore: apply page table walker on do_mincore()
2014-06-12 22:04 ` Andrew Morton
@ 2014-06-12 23:33 ` Naoya Horiguchi
0 siblings, 0 replies; 18+ messages in thread
From: Naoya Horiguchi @ 2014-06-12 23:33 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, Dave Hansen, Hugh Dickins, Kirill A. Shutemov,
linux-kernel
On Thu, Jun 12, 2014 at 03:04:43PM -0700, Andrew Morton wrote:
> On Fri, 6 Jun 2014 18:58:40 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
>
> > @@ -233,12 +163,20 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
> >
> > end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
> >
> > - if (is_vm_hugetlb_page(vma))
> > - mincore_hugetlb_page_range(vma, addr, end, vec);
> > + struct mm_walk mincore_walk = {
> > + .pmd_entry = mincore_pmd,
> > + .pte_entry = mincore_pte,
> > + .pte_hole = mincore_hole,
> > + .hugetlb_entry = mincore_hugetlb,
> > + .mm = vma->vm_mm,
> > + .vma = vma,
> > + .private = vec,
> > + };
> > + err = walk_page_vma(vma, &mincore_walk);
> > + if (err < 0)
> > + return err;
> > else
> > - mincore_page_range(vma, addr, end, vec);
> > -
> > - return (end - addr) >> PAGE_SHIFT;
> > + return (end - addr) >> PAGE_SHIFT;
> > }
> >
> > /*
>
> Please review carefully.
>
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mincore-apply-page-table-walker-on-do_mincore-fix
>
> mm/mincore.c: In function 'do_mincore':
> mm/mincore.c:166: warning: ISO C90 forbids mixed declarations and code
Yes, this warning is fixed in patch 11/11 of v2.
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> mm/mincore.c | 28 +++++++++++++++-------------
> 1 file changed, 15 insertions(+), 13 deletions(-)
>
> diff -puN mm/huge_memory.c~mincore-apply-page-table-walker-on-do_mincore-fix mm/huge_memory.c
> diff -puN mm/mincore.c~mincore-apply-page-table-walker-on-do_mincore-fix mm/mincore.c
> --- a/mm/mincore.c~mincore-apply-page-table-walker-on-do_mincore-fix
> +++ a/mm/mincore.c
> @@ -151,32 +151,34 @@ static int mincore_pmd(pmd_t *pmd, unsig
> * all the arguments, we hold the mmap semaphore: we should
> * just return the amount of info we're asked for.
> */
> -static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *vec)
> +static long do_mincore(unsigned long addr, unsigned long pages,
> + unsigned char *vec)
> {
> struct vm_area_struct *vma;
> - unsigned long end;
> int err;
> -
> - vma = find_vma(current->mm, addr);
> - if (!vma || addr < vma->vm_start)
> - return -ENOMEM;
> -
> - end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
> -
> struct mm_walk mincore_walk = {
> .pmd_entry = mincore_pmd,
> .pte_entry = mincore_pte,
> .pte_hole = mincore_hole,
> .hugetlb_entry = mincore_hugetlb,
> - .mm = vma->vm_mm,
> - .vma = vma,
> .private = vec,
> };
> +
> + vma = find_vma(current->mm, addr);
> + if (!vma || addr < vma->vm_start)
> + return -ENOMEM;
> + mincore_walk.vma = vma;
> + mincore_walk.mm = vma->vm_mm;
> +
> err = walk_page_vma(vma, &mincore_walk);
> - if (err < 0)
> + if (err < 0) {
> return err;
> - else
> + } else {
> + unsigned long end;
> +
> + end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
> return (end - addr) >> PAGE_SHIFT;
> + }
Doesn't this min() with vma->vm_end break mincore(2)'s behavior?
The return value of do_mincore() is subtracted from pages in sys_mincore,
and the while loop repeats until pages reaches 0, so if a user calls
mincore(2) over a virtual address range that is not fully backed by a vma,
the return value is underestimated. I think that in mincore's walk we should
set vec to 0 for such hole regions (not backed by any vma), so that they are
counted in the return value.
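For reference, a simplified sketch of the caller side (roughly what the
sys_mincore() retry loop does; setup and error handling omitted). The value
returned by do_mincore() drives both the copy to user space and the next
start address, so an underestimated count makes the loop resume at the
wrong place:
	while (pages) {
		down_read(&current->mm->mmap_sem);
		retval = do_mincore(start, min(pages, (unsigned long)PAGE_SIZE), tmp);
		up_read(&current->mm->mmap_sem);
		if (retval <= 0)
			break;
		/* retval entries of tmp[] were filled in by do_mincore() */
		if (copy_to_user(vec, tmp, retval)) {
			retval = -EFAULT;
			break;
		}
		pages -= retval;
		vec += retval;
		start += retval << PAGE_SHIFT;
		retval = 0;
	}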
Thanks,
Naoya Horiguchi
* Re: [PATCH 7/7] mincore: apply page table walker on do_mincore()
2014-06-06 22:58 ` [PATCH 7/7] mincore: apply page table walker on do_mincore() Naoya Horiguchi
2014-06-12 22:04 ` Andrew Morton
@ 2014-06-16 15:24 ` Sasha Levin
2014-06-16 16:44 ` Naoya Horiguchi
1 sibling, 1 reply; 18+ messages in thread
From: Sasha Levin @ 2014-06-16 15:24 UTC (permalink / raw)
To: Naoya Horiguchi, linux-mm
Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
linux-kernel, Dave Jones
On 06/06/2014 06:58 PM, Naoya Horiguchi wrote:
> This patch makes do_mincore() use walk_page_vma(), which reduces many lines
> of code by using common page table walk code.
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Hi Naoya,
This patch is causing a few issues on -next:
[ 367.679282] BUG: sleeping function called from invalid context at mm/mincore.c:37
[ 367.683618] in_atomic(): 1, irqs_disabled(): 0, pid: 10386, name: trinity-c13
[ 367.686236] 2 locks held by trinity-c13/10386:
[ 367.688006] #0: (&mm->mmap_sem){++++++}, at: SyS_mincore (mm/mincore.c:161 mm/mincore.c:245 mm/mincore.c:213)
[ 367.693232] #1: (&(ptlock_ptr(page))->rlock){+.+.-.}, at: __walk_page_range (mm/pagewalk.c:209 mm/pagewalk.c:262)
[ 367.698436] Preemption disabled __walk_page_range (mm/pagewalk.c:209 mm/pagewalk.c:262)
[ 367.700421]
[ 367.700803] CPU: 13 PID: 10386 Comm: trinity-c13 Not tainted 3.15.0-next-20140616-sasha-00025-g0fd1f7d #655
[ 367.703605] ffff88010bd08000 ffff88010bddbdd8 ffffffffb7514111 0000000000000002
[ 367.706819] 0000000000000000 ffff88010bddbe08 ffffffffb419ca64 ffff8801b209dc38
[ 367.710416] 00007fcc85400000 0000000000000000 ffff88010bddbef0 ffff88010bddbe38
[ 367.713101] Call Trace:
[ 367.714248] dump_stack (lib/dump_stack.c:52)
[ 367.716428] __might_sleep (kernel/sched/core.c:7080)
[ 367.723608] mincore_hugetlb (mm/mincore.c:37)
[ 367.725609] ? __walk_page_range (mm/pagewalk.c:209 mm/pagewalk.c:262)
[ 367.727712] __walk_page_range (include/linux/spinlock.h:343 mm/pagewalk.c:211 mm/pagewalk.c:262)
[ 367.729098] walk_page_vma (mm/pagewalk.c:376)
[ 367.731343] SyS_mincore (mm/mincore.c:178 mm/mincore.c:245 mm/mincore.c:213)
[ 367.733621] ? mincore_hugetlb (mm/mincore.c:144)
[ 367.735712] ? mincore_hole (mm/mincore.c:110)
[ 367.737022] ? mincore_page (mm/mincore.c:88)
[ 367.739416] ? copy_page_range (mm/mincore.c:24)
[ 367.741634] tracesys (arch/x86/kernel/entry_64.S:542)
And:
[ 391.118663] BUG: unable to handle kernel paging request at ffff880142aca000
[ 391.118663] IP: mincore_hole (mm/mincore.c:99 (discriminator 2))
[ 391.118663] PGD 3bbcd067 PUD 70574e067 PMD 705738067 PTE 8000000142aca060
[ 391.118663] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 391.118663] Dumping ftrace buffer:
[ 391.118663] (ftrace buffer empty)
[ 391.118663] Modules linked in:
[ 391.118663] CPU: 4 PID: 9695 Comm: trinity-c566 Not tainted 3.15.0-next-20140616-sasha-00025-g0fd1f7d #655
[ 391.118663] task: ffff880044a5b000 ti: ffff880044adc000 task.ti: ffff880044adc000
[ 391.118663] RIP: mincore_hole (mm/mincore.c:99 (discriminator 2))
[ 391.118663] RSP: 0000:ffff880044adfd48 EFLAGS: 00010246
[ 391.118663] RAX: 0000000000000000 RBX: 0000000000007137 RCX: 0000005b107d3134
[ 391.118663] RDX: 0000000000000001 RSI: ffffffffb4b2dbe8 RDI: 0000000000000000
[ 391.118663] RBP: ffff880044adfd88 R08: 00000000000163fc R09: 0000000000000000
[ 391.118663] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000200
[ 391.118663] R13: ffff880142aca000 R14: ffff8800cadc0000 R15: 0000000000000001
[ 391.118663] FS: 00007fcc8afe0700(0000) GS:ffff880144e00000(0000) knlGS:0000000000000000
[ 391.118663] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 391.118663] CR2: ffff880142aca000 CR3: 0000000044a76000 CR4: 00000000000006a0
[ 391.118663] Stack:
[ 391.118663] ffff880044adfef0 ffff880142aca000 0000000000000000 00007fcc46a00000
[ 391.118663] 00007fcc4baee000 ffff880044adfef0 00007fcc46800000 ffff8800cad251a0
[ 391.118663] ffff880044adfe38 ffffffffb42d1c8d ffff880000000000 ffffffffb41a0038
[ 391.118663] Call Trace:
[ 391.118663] walk_pgd_range (mm/pagewalk.c:73 mm/pagewalk.c:141 mm/pagewalk.c:170)
[ 391.118663] ? preempt_count_sub (kernel/sched/core.c:2602)
[ 391.118663] __walk_page_range (mm/pagewalk.c:264)
[ 391.118663] ? SyS_mincore (mm/mincore.c:161 mm/mincore.c:245 mm/mincore.c:213)
[ 391.118663] walk_page_vma (mm/pagewalk.c:376)
[ 391.118663] SyS_mincore (mm/mincore.c:178 mm/mincore.c:245 mm/mincore.c:213)
[ 391.118663] ? mincore_hugetlb (mm/mincore.c:144)
[ 391.118663] ? mincore_hole (mm/mincore.c:110)
[ 391.118663] ? mincore_page (mm/mincore.c:88)
[ 391.118663] ? copy_page_range (mm/mincore.c:24)
[ 391.118663] tracesys (arch/x86/kernel/entry_64.S:542)
[ 391.118663] Code: 4d 85 e4 74 57 0f 1f 40 00 49 8b 86 a0 00 00 00 48 89 de 41 83 c7 01 4c 03 6d c8 48 83 c3 01 48 8b b8 f8 01 00 00 e8 ae fe ff ff <41> 88 45 00 4d 63 ef 4d 39 ec 77 d2 48 8b 4d c0 48 8b 49 50 48
All code
========
0: 4d 85 e4 test %r12,%r12
3: 74 57 je 0x5c
5: 0f 1f 40 00 nopl 0x0(%rax)
9: 49 8b 86 a0 00 00 00 mov 0xa0(%r14),%rax
10: 48 89 de mov %rbx,%rsi
13: 41 83 c7 01 add $0x1,%r15d
17: 4c 03 6d c8 add -0x38(%rbp),%r13
1b: 48 83 c3 01 add $0x1,%rbx
1f: 48 8b b8 f8 01 00 00 mov 0x1f8(%rax),%rdi
26: e8 ae fe ff ff callq 0xfffffffffffffed9
2b:* 41 88 45 00 mov %al,0x0(%r13) <-- trapping instruction
2f: 4d 63 ef movslq %r15d,%r13
32: 4d 39 ec cmp %r13,%r12
35: 77 d2 ja 0x9
37: 48 8b 4d c0 mov -0x40(%rbp),%rcx
3b: 48 8b 49 50 mov 0x50(%rcx),%rcx
3f: 48 rex.W
...
Code starting with the faulting instruction
===========================================
0: 41 88 45 00 mov %al,0x0(%r13)
4: 4d 63 ef movslq %r15d,%r13
7: 4d 39 ec cmp %r13,%r12
a: 77 d2 ja 0xffffffffffffffde
c: 48 8b 4d c0 mov -0x40(%rbp),%rcx
10: 48 8b 49 50 mov 0x50(%rcx),%rcx
14: 48 rex.W
...
[ 391.118663] RIP mincore_hole (mm/mincore.c:99 (discriminator 2))
[ 391.118663] RSP <ffff880044adfd48>
[ 391.118663] CR2: ffff880142aca000
Thanks,
Sasha
* Re: [PATCH 7/7] mincore: apply page table walker on do_mincore()
2014-06-16 15:24 ` Sasha Levin
@ 2014-06-16 16:44 ` Naoya Horiguchi
2014-06-16 21:14 ` Sasha Levin
0 siblings, 1 reply; 18+ messages in thread
From: Naoya Horiguchi @ 2014-06-16 16:44 UTC (permalink / raw)
To: Sasha Levin
Cc: linux-mm, Dave Hansen, Andrew Morton, Hugh Dickins,
Kirill A. Shutemov, linux-kernel, Dave Jones
Hi Sasha,
Thanks for the bug report.
On Mon, Jun 16, 2014 at 11:24:16AM -0400, Sasha Levin wrote:
> On 06/06/2014 06:58 PM, Naoya Horiguchi wrote:
> > This patch makes do_mincore() use walk_page_vma(), which reduces many lines
> > of code by using common page table walk code.
> >
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>
> Hi Naoya,
>
> This patch is causing a few issues on -next:
>
> [ 367.679282] BUG: sleeping function called from invalid context at mm/mincore.c:37
The cond_resched() in mincore_hugetlb() triggered this. The common pagewalk
code already does the cond_resched(), so I should have removed it from
mincore_hugetlb().
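For context, a rough sketch of the walker side (not the exact -mm code, and
with the callback arguments abridged) of why sleeping there is invalid:
->hugetlb_entry() is invoked with the huge page table spinlock held, so the
callback runs in atomic context and must not call cond_resched():
	/*
	 * Rough sketch: the walker takes the huge pte spinlock before
	 * calling the per-user callback, so mincore_hugetlb() must not
	 * sleep between lock and unlock.
	 */
	ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
	err = walk->hugetlb_entry(pte, addr, next, walk);
	spin_unlock(ptl);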
...
> And:
>
> [ 391.118663] BUG: unable to handle kernel paging request at ffff880142aca000
> [ 391.118663] IP: mincore_hole (mm/mincore.c:99 (discriminator 2))
walk->pte_hole cannot assume walk->vma != NULL, so I should've checked it
in mincore_hole() before using walk->vma.
Could you try the following fixes?
Thanks,
Naoya Horiguchi
---
diff --git a/mm/mincore.c b/mm/mincore.c
index d8a5e9f62268..3261788369bd 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -34,7 +34,6 @@ static int mincore_hugetlb(pte_t *pte, unsigned long addr,
present = pte && !huge_pte_none(huge_ptep_get(pte));
for (; addr != end; vec++, addr += PAGE_SIZE)
*vec = present;
- cond_resched();
walk->private += (end - addr) >> PAGE_SHIFT;
#else
BUG();
@@ -91,7 +90,7 @@ static int mincore_hole(unsigned long addr, unsigned long end,
unsigned long nr = (end - addr) >> PAGE_SHIFT;
int i;
- if (vma->vm_file) {
+ if (vma && vma->vm_file) {
pgoff_t pgoff;
pgoff = linear_page_index(vma, addr);
* Re: [PATCH 7/7] mincore: apply page table walker on do_mincore()
2014-06-16 16:44 ` Naoya Horiguchi
@ 2014-06-16 21:14 ` Sasha Levin
0 siblings, 0 replies; 18+ messages in thread
From: Sasha Levin @ 2014-06-16 21:14 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: linux-mm, Dave Hansen, Andrew Morton, Hugh Dickins,
Kirill A. Shutemov, linux-kernel, Dave Jones
On 06/16/2014 12:44 PM, Naoya Horiguchi wrote:
> Hi Sasha,
>
> Thanks for bug reporting.
>
> On Mon, Jun 16, 2014 at 11:24:16AM -0400, Sasha Levin wrote:
>> On 06/06/2014 06:58 PM, Naoya Horiguchi wrote:
>>> This patch makes do_mincore() use walk_page_vma(), which reduces many lines
>>> of code by using common page table walk code.
>>>
>>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>
>> Hi Naoya,
>>
>> This patch is causing a few issues on -next:
>>
>> [ 367.679282] BUG: sleeping function called from invalid context at mm/mincore.c:37
>
> cond_resched() in mincore_hugetlb() triggered this. This is done in common
> pagewalk code, so I should have removed it.
>
> ...
>> And:
>>
>> [ 391.118663] BUG: unable to handle kernel paging request at ffff880142aca000
>> [ 391.118663] IP: mincore_hole (mm/mincore.c:99 (discriminator 2))
>
> walk->pte_hole cannot assume walk->vma != NULL, so I should've checked it
> in mincore_hole() before using walk->vma.
>
> Could you try the following fixes?
That solved those two, but I'm seeing new ones:
[ 650.352956] BUG: unable to handle kernel paging request at ffff8802fdf03000
[ 650.352956] IP: mincore_hole (mm/mincore.c:101 (discriminator 2))
[ 650.352956] PGD 23bcd067 PUD 704b48067 PMD 704958067 PTE 80000002fdf03060
[ 650.352956] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 650.352956] Dumping ftrace buffer:
[ 650.352956] (ftrace buffer empty)
[ 650.352956] Modules linked in:
[ 650.352956] CPU: 12 PID: 15403 Comm: trinity-c363 Tainted: G W 3.15.0-next-20140616-sasha-00025-g0fd1f7d-dirty #657
[ 650.352956] task: ffff88027caf3000 ti: ffff880279d5c000 task.ti: ffff880279d5c000
[ 650.352956] RIP: mincore_hole (mm/mincore.c:101 (discriminator 2))
[ 650.352956] RSP: 0018:ffff880279d5fd48 EFLAGS: 00010202
[ 650.352956] RAX: 0000000000000001 RBX: 00007f2445400000 RCX: 0000000000000000
[ 650.352956] RDX: 0000000000000000 RSI: 00007f2445400000 RDI: 00007f2445200000
[ 650.352956] RBP: ffff880279d5fd88 R08: 0000000000000001 R09: ffff880000000100
[ 650.352956] R10: 0000000000000001 R11: 00007f2444126000 R12: 00007f2480000000
[ 650.352956] R13: ffff8802fdf03000 R14: 0000000000000200 R15: ffff8804e32f2000
[ 650.352956] FS: 00007f24899ec700(0000) GS:ffff8802ff000000(0000) knlGS:0000000000000000
[ 650.352956] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 650.352956] CR2: ffff8802fdf03000 CR3: 000000027c39b000 CR4: 00000000000006a0
[ 650.352956] DR0: 00000000006df000 DR1: 0000000000000000 DR2: 0000000000000000
[ 650.352956] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 650.352956] Stack:
[ 650.352956] ffff880279d5fef0 0000000000000000 0000000000000000 00007f2445400000
[ 650.352956] 00007f2480000000 ffff880279d5fef0 00007f2445200000 ffff8804e30fd148
[ 650.352956] ffff880279d5fe38 ffffffff9c2d1c4d ffff880200000000 ffffffff9c1a0038
[ 650.352956] Call Trace:
[ 650.352956] walk_pgd_range (mm/pagewalk.c:73 mm/pagewalk.c:141 mm/pagewalk.c:170)
[ 650.352956] ? preempt_count_sub (kernel/sched/core.c:2602)
[ 650.352956] __walk_page_range (mm/pagewalk.c:264)
[ 650.352956] ? SyS_mincore (mm/mincore.c:160 mm/mincore.c:244 mm/mincore.c:212)
[ 650.352956] walk_page_vma (mm/pagewalk.c:376)
[ 650.352956] SyS_mincore (mm/mincore.c:177 mm/mincore.c:244 mm/mincore.c:212)
[ 650.352956] ? mincore_hugetlb (mm/mincore.c:143)
[ 650.352956] ? mincore_hole (mm/mincore.c:109)
[ 650.352956] ? mincore_page (mm/mincore.c:87)
[ 650.352956] ? copy_page_range (mm/mincore.c:24)
[ 650.352956] tracesys (arch/x86/kernel/entry_64.S:542)
[ 650.352956] Code: 87 a0 00 00 00 48 83 c3 01 48 8b b8 f8 01 00 00 e8 ab fe ff ff 48 8b 55 c8 88 02 49 63 c4 49 39 c6 77 cd eb 14 0f 1f 00 83 c0 01 <41> c6 44 15 00 00 48 63 d0 49 39 d6 77 ef 48 8b 55 c0 4c 8b 6a
All code
========
0: 87 a0 00 00 00 48 xchg %esp,0x48000000(%rax)
6: 83 c3 01 add $0x1,%ebx
9: 48 8b b8 f8 01 00 00 mov 0x1f8(%rax),%rdi
10: e8 ab fe ff ff callq 0xfffffffffffffec0
15: 48 8b 55 c8 mov -0x38(%rbp),%rdx
19: 88 02 mov %al,(%rdx)
1b: 49 63 c4 movslq %r12d,%rax
1e: 49 39 c6 cmp %rax,%r14
21: 77 cd ja 0xfffffffffffffff0
23: eb 14 jmp 0x39
25: 0f 1f 00 nopl (%rax)
28: 83 c0 01 add $0x1,%eax
2b:* 41 c6 44 15 00 00 movb $0x0,0x0(%r13,%rdx,1) <-- trapping instruction
31: 48 63 d0 movslq %eax,%rdx
34: 49 39 d6 cmp %rdx,%r14
37: 77 ef ja 0x28
39: 48 8b 55 c0 mov -0x40(%rbp),%rdx
3d: 4c 8b 6a 00 mov 0x0(%rdx),%r13
Code starting with the faulting instruction
===========================================
0: 41 c6 44 15 00 00 movb $0x0,0x0(%r13,%rdx,1)
6: 48 63 d0 movslq %eax,%rdx
9: 49 39 d6 cmp %rdx,%r14
c: 77 ef ja 0xfffffffffffffffd
e: 48 8b 55 c0 mov -0x40(%rbp),%rdx
12: 4c 8b 6a 00 mov 0x0(%rdx),%r13
[ 650.352956] RIP mincore_hole (mm/mincore.c:101 (discriminator 2))
[ 650.352956] RSP <ffff880279d5fd48>
[ 650.352956] CR2: ffff8802fdf03000
Thanks,
Sasha