linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/2] Soft-dirty page tracker improvemens
@ 2013-07-30 20:41 Cyrill Gorcunov
  2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
  2013-07-30 20:41 ` [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages Cyrill Gorcunov
  0 siblings, 2 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-07-30 20:41 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, luto, gorcunov, xemul, akpm, mpm, xiaoguangrong,
	mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar

Hi, as being reported by Andy, there are a couple of situations
when soft-dirty bit will be lost, in paricular when page we're
tracking is going to swap and when file page get reclaimed. In
this series both problems are aimed.

One more hardness which remains is the scenario when vma area
(which has soft-dirty bit set in appropriate pte entries) get
unmapped then new one mapped in-place. I'm working on it now
hope to provide a patch soon.

Thanks,
	Cyrill

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-07-30 20:41 [patch 0/2] Soft-dirty page tracker improvemens Cyrill Gorcunov
@ 2013-07-30 20:41 ` Cyrill Gorcunov
  2013-07-31  8:16   ` Pavel Emelyanov
                     ` (5 more replies)
  2013-07-30 20:41 ` [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages Cyrill Gorcunov
  1 sibling, 6 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-07-30 20:41 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, luto, gorcunov, xemul, akpm, mpm, xiaoguangrong,
	mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar

[-- Attachment #1: pte-sft-dirty-swap-4 --]
[-- Type: text/plain, Size: 9111 bytes --]

Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY
bit set get swapped out, the bit is getting lost and no longer
available when pte read back.

To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is
saved in pte entry for the page being swapped out. When such page
is to be read back from a swap cache we check for bit presence
and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
bit back.

One of the problem was to find a place in pte entry where we can
save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
_PAGE_PSE was chosen for that, it doesn't intersect with swap
entry format stored in pte.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/x86/include/asm/pgtable.h       |   15 +++++++++++++++
 arch/x86/include/asm/pgtable_types.h |   13 +++++++++++++
 fs/proc/task_mmu.c                   |   21 +++++++++++++++------
 include/asm-generic/pgtable.h        |   15 +++++++++++++++
 include/linux/swapops.h              |    2 ++
 mm/memory.c                          |    2 ++
 mm/rmap.c                            |    6 +++++-
 mm/swapfile.c                        |   19 +++++++++++++++++--
 8 files changed, 84 insertions(+), 9 deletions(-)

Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
 	return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
 }
 
+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
+{
+	return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
+}
+
+static inline int pte_swp_soft_dirty(pte_t pte)
+{
+	return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
+}
+
+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
+{
+	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
+}
+
 /*
  * Mask out unsupported bits in a present pgprot.  Non-present pgprots
  * can use those bits for other purposes, so leave them be.
Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
@@ -67,6 +67,19 @@
 #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 0))
 #endif
 
+/*
+ * Tracking soft dirty bit when a page goes to a swap is tricky.
+ * We need a bit which can be stored in pte _and_ not conflict
+ * with swap entry format. On x86 bits 6 and 7 are *not* involved
+ * into swap entry computation, but bit 6 is used for nonlinear
+ * file mapping, so we borrow bit 7 for soft dirty tracking.
+ */
+#ifdef CONFIG_MEM_SOFT_DIRTY
+#define _PAGE_SWP_SOFT_DIRTY	_PAGE_PSE
+#else
+#define _PAGE_SWP_SOFT_DIRTY	(_AT(pteval_t, 0))
+#endif
+
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
 #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
 #else
Index: linux-2.6.git/fs/proc/task_mmu.c
===================================================================
--- linux-2.6.git.orig/fs/proc/task_mmu.c
+++ linux-2.6.git/fs/proc/task_mmu.c
@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
 	 * of how soft-dirty works.
 	 */
 	pte_t ptent = *pte;
-	ptent = pte_wrprotect(ptent);
-	ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
+
+	if (pte_present(ptent)) {
+		ptent = pte_wrprotect(ptent);
+		ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
+	} else if (is_swap_pte(ptent)) {
+		ptent = pte_swp_clear_soft_dirty(ptent);
+	}
+
 	set_pte_at(vma->vm_mm, addr, pte, ptent);
 #endif
 }
@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		ptent = *pte;
-		if (!pte_present(ptent))
-			continue;
 
 		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
 			clear_soft_dirty(vma, addr, pte);
 			continue;
 		}
 
+		if (!pte_present(ptent))
+			continue;
+
 		page = vm_normal_page(vma, addr, ptent);
 		if (!page)
 			continue;
@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
 		flags = PM_PRESENT;
 		page = vm_normal_page(vma, addr, pte);
 	} else if (is_swap_pte(pte)) {
-		swp_entry_t entry = pte_to_swp_entry(pte);
-
+		swp_entry_t entry;
+		if (pte_swp_soft_dirty(pte))
+			flags2 |= __PM_SOFT_DIRTY;
+		entry = pte_to_swp_entry(pte);
 		frame = swp_type(entry) |
 			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
 		flags = PM_SWAP;
Index: linux-2.6.git/include/asm-generic/pgtable.h
===================================================================
--- linux-2.6.git.orig/include/asm-generic/pgtable.h
+++ linux-2.6.git/include/asm-generic/pgtable.h
@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
 {
 	return pmd;
 }
+
+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
+{
+	return pte;
+}
+
+static inline int pte_swp_soft_dirty(pte_t pte)
+{
+	return 0;
+}
+
+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
+{
+	return pte;
+}
 #endif
 
 #ifndef __HAVE_PFNMAP_TRACKING
Index: linux-2.6.git/include/linux/swapops.h
===================================================================
--- linux-2.6.git.orig/include/linux/swapops.h
+++ linux-2.6.git/include/linux/swapops.h
@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
 	swp_entry_t arch_entry;
 
 	BUG_ON(pte_file(pte));
+	if (pte_swp_soft_dirty(pte))
+		pte = pte_swp_clear_soft_dirty(pte);
 	arch_entry = __pte_to_swp_entry(pte);
 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
 }
Index: linux-2.6.git/mm/memory.c
===================================================================
--- linux-2.6.git.orig/mm/memory.c
+++ linux-2.6.git/mm/memory.c
@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
 		exclusive = 1;
 	}
 	flush_icache_page(vma, page);
+	if (pte_swp_soft_dirty(orig_pte))
+		pte = pte_mksoft_dirty(pte);
 	set_pte_at(mm, address, page_table, pte);
 	if (page == swapcache)
 		do_page_add_anon_rmap(page, vma, address, exclusive);
Index: linux-2.6.git/mm/rmap.c
===================================================================
--- linux-2.6.git.orig/mm/rmap.c
+++ linux-2.6.git/mm/rmap.c
@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
 			   swp_entry_to_pte(make_hwpoison_entry(page)));
 	} else if (PageAnon(page)) {
 		swp_entry_t entry = { .val = page_private(page) };
+		pte_t swp_pte;
 
 		if (PageSwapCache(page)) {
 			/*
@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
 			BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
 			entry = make_migration_entry(page, pte_write(pteval));
 		}
-		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
+		swp_pte = swp_entry_to_pte(entry);
+		if (pte_soft_dirty(pteval))
+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
+		set_pte_at(mm, address, pte, swp_pte);
 		BUG_ON(pte_file(*pte));
 	} else if (IS_ENABLED(CONFIG_MIGRATION) &&
 		   (TTU_ACTION(flags) == TTU_MIGRATION)) {
Index: linux-2.6.git/mm/swapfile.c
===================================================================
--- linux-2.6.git.orig/mm/swapfile.c
+++ linux-2.6.git/mm/swapfile.c
@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
 }
 #endif /* CONFIG_HIBERNATION */
 
+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
+{
+#ifdef CONFIG_MEM_SOFT_DIRTY
+	/*
+	 * When pte keeps soft dirty bit the pte generated
+	 * from swap entry does not has it, still it's same
+	 * pte from logical point of view.
+	 */
+	pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
+	return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
+#else
+	return pte_same(pte, swp_pte);
+#endif
+}
+
 /*
  * No need to decide whether this PTE shares the swap entry with others,
  * just let do_wp_page work it out if a write is requested later - to
@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
 	}
 
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
-	if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
+	if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
 		mem_cgroup_cancel_charge_swapin(memcg);
 		ret = 0;
 		goto out;
@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
 		 * swapoff spends a _lot_ of time in this loop!
 		 * Test inline before going to call unuse_pte.
 		 */
-		if (unlikely(pte_same(*pte, swp_pte))) {
+		if (unlikely(maybe_same_pte(*pte, swp_pte))) {
 			pte_unmap(pte);
 			ret = unuse_pte(vma, pmd, addr, entry, page);
 			if (ret)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages
  2013-07-30 20:41 [patch 0/2] Soft-dirty page tracker improvemens Cyrill Gorcunov
  2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
@ 2013-07-30 20:41 ` Cyrill Gorcunov
  2013-07-31  8:16   ` Pavel Emelyanov
  2013-08-07 20:28   ` Andrew Morton
  1 sibling, 2 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-07-30 20:41 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, luto, gorcunov, xemul, akpm, mpm, xiaoguangrong,
	mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar

[-- Attachment #1: pte-sft-dirty-file-2 --]
[-- Type: text/plain, Size: 9938 bytes --]

Andy reported that if file page get reclaimed we loose soft-dirty bit
if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address
get encoded into pte entry. Thus when #pf happens on such non-present
pte we can restore it back.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/x86/include/asm/pgtable-2level.h |   48 +++++++++++++++++++++++++++++++++-
 arch/x86/include/asm/pgtable-3level.h |    3 ++
 arch/x86/include/asm/pgtable.h        |   15 ++++++++++
 arch/x86/include/asm/pgtable_types.h  |    4 ++
 fs/proc/task_mmu.c                    |    2 +
 include/asm-generic/pgtable.h         |   15 ++++++++++
 mm/fremap.c                           |   11 +++++--
 mm/memory.c                           |   11 +++++--
 mm/rmap.c                             |    8 ++++-
 9 files changed, 107 insertions(+), 10 deletions(-)

Index: linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable-2level.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
@@ -55,9 +55,53 @@ static inline pmd_t native_pmdp_get_and_
 #define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp)
 #endif
 
+#ifdef CONFIG_MEM_SOFT_DIRTY
+
+/*
+ * Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE, _PAGE_BIT_SOFT_DIRTY and
+ * _PAGE_BIT_PROTNONE are taken, split up the 28 bits of offset
+ * into this range.
+ */
+#define PTE_FILE_MAX_BITS	28
+#define PTE_FILE_SHIFT1		(_PAGE_BIT_PRESENT + 1)
+#define PTE_FILE_SHIFT2		(_PAGE_BIT_FILE + 1)
+#define PTE_FILE_SHIFT3		(_PAGE_BIT_PROTNONE + 1)
+#define PTE_FILE_SHIFT4		(_PAGE_BIT_SOFT_DIRTY + 1)
+#define PTE_FILE_BITS1		(PTE_FILE_SHIFT2 - PTE_FILE_SHIFT1 - 1)
+#define PTE_FILE_BITS2		(PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1)
+#define PTE_FILE_BITS3		(PTE_FILE_SHIFT4 - PTE_FILE_SHIFT3 - 1)
+
+#define pte_to_pgoff(pte)						\
+	((((pte).pte_low >> (PTE_FILE_SHIFT1))				\
+	  & ((1U << PTE_FILE_BITS1) - 1)))				\
+	+ ((((pte).pte_low >> (PTE_FILE_SHIFT2))			\
+	    & ((1U << PTE_FILE_BITS2) - 1))				\
+	   << (PTE_FILE_BITS1))						\
+	+ ((((pte).pte_low >> (PTE_FILE_SHIFT3))			\
+	    & ((1U << PTE_FILE_BITS3) - 1))				\
+	   << (PTE_FILE_BITS1 + PTE_FILE_BITS2))			\
+	+ ((((pte).pte_low >> (PTE_FILE_SHIFT4)))			\
+	    << (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3))
+
+#define pgoff_to_pte(off)						\
+	((pte_t) { .pte_low =						\
+	 ((((off)) & ((1U << PTE_FILE_BITS1) - 1)) << PTE_FILE_SHIFT1)	\
+	 + ((((off) >> PTE_FILE_BITS1)					\
+	     & ((1U << PTE_FILE_BITS2) - 1))				\
+	    << PTE_FILE_SHIFT2)						\
+	 + ((((off) >> (PTE_FILE_BITS1 + PTE_FILE_BITS2))		\
+	     & ((1U << PTE_FILE_BITS3) - 1))				\
+	    << PTE_FILE_SHIFT3)						\
+	 + ((((off) >>							\
+	      (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)))	\
+	    << PTE_FILE_SHIFT4)						\
+	 + _PAGE_FILE })
+
+#else /* CONFIG_MEM_SOFT_DIRTY */
+
 /*
  * Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE and _PAGE_BIT_PROTNONE are taken,
- * split up the 29 bits of offset into this range:
+ * split up the 29 bits of offset into this range.
  */
 #define PTE_FILE_MAX_BITS	29
 #define PTE_FILE_SHIFT1		(_PAGE_BIT_PRESENT + 1)
@@ -88,6 +132,8 @@ static inline pmd_t native_pmdp_get_and_
 	    << PTE_FILE_SHIFT3)						\
 	 + _PAGE_FILE })
 
+#endif /* CONFIG_MEM_SOFT_DIRTY */
+
 /* Encode and de-code a swap entry */
 #if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
Index: linux-2.6.git/arch/x86/include/asm/pgtable-3level.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable-3level.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable-3level.h
@@ -179,6 +179,9 @@ static inline pmd_t native_pmdp_get_and_
 /*
  * Bits 0, 6 and 7 are taken in the low part of the pte,
  * put the 32 bits of offset into the high part.
+ *
+ * For soft-dirty tracking 11 bit is taken from
+ * the low part of pte as well.
  */
 #define pte_to_pgoff(pte) ((pte).pte_high)
 #define pgoff_to_pte(off)						\
Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
@@ -329,6 +329,21 @@ static inline pte_t pte_swp_clear_soft_d
 	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
 }
 
+static inline pte_t pte_file_clear_soft_dirty(pte_t pte)
+{
+	return pte_clear_flags(pte, _PAGE_SOFT_DIRTY);
+}
+
+static inline pte_t pte_file_mksoft_dirty(pte_t pte)
+{
+	return pte_set_flags(pte, _PAGE_SOFT_DIRTY);
+}
+
+static inline int pte_file_soft_dirty(pte_t pte)
+{
+	return pte_flags(pte) & _PAGE_SOFT_DIRTY;
+}
+
 /*
  * Mask out unsupported bits in a present pgprot.  Non-present pgprots
  * can use those bits for other purposes, so leave them be.
Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
@@ -61,8 +61,10 @@
  * they do not conflict with each other.
  */
 
+#define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_HIDDEN
+
 #ifdef CONFIG_MEM_SOFT_DIRTY
-#define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 1) << _PAGE_BIT_HIDDEN)
+#define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 1) << _PAGE_BIT_SOFT_DIRTY)
 #else
 #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 0))
 #endif
Index: linux-2.6.git/fs/proc/task_mmu.c
===================================================================
--- linux-2.6.git.orig/fs/proc/task_mmu.c
+++ linux-2.6.git/fs/proc/task_mmu.c
@@ -736,6 +736,8 @@ static inline void clear_soft_dirty(stru
 		ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
 	} else if (is_swap_pte(ptent)) {
 		ptent = pte_swp_clear_soft_dirty(ptent);
+	} else if (pte_file(ptent)) {
+		ptent = pte_file_clear_soft_dirty(ptent);
 	}
 
 	set_pte_at(vma->vm_mm, addr, pte, ptent);
Index: linux-2.6.git/include/asm-generic/pgtable.h
===================================================================
--- linux-2.6.git.orig/include/asm-generic/pgtable.h
+++ linux-2.6.git/include/asm-generic/pgtable.h
@@ -432,6 +432,21 @@ static inline pte_t pte_swp_clear_soft_d
 {
 	return pte;
 }
+
+static inline pte_t pte_file_clear_soft_dirty(pte_t pte)
+{
+       return pte;
+}
+
+static inline pte_t pte_file_mksoft_dirty(pte_t pte)
+{
+       return pte;
+}
+
+static inline int pte_file_soft_dirty(pte_t pte)
+{
+       return 0;
+}
 #endif
 
 #ifndef __HAVE_PFNMAP_TRACKING
Index: linux-2.6.git/mm/fremap.c
===================================================================
--- linux-2.6.git.orig/mm/fremap.c
+++ linux-2.6.git/mm/fremap.c
@@ -57,17 +57,22 @@ static int install_file_pte(struct mm_st
 		unsigned long addr, unsigned long pgoff, pgprot_t prot)
 {
 	int err = -ENOMEM;
-	pte_t *pte;
+	pte_t *pte, ptfile;
 	spinlock_t *ptl;
 
 	pte = get_locked_pte(mm, addr, &ptl);
 	if (!pte)
 		goto out;
 
-	if (!pte_none(*pte))
+	ptfile = pgoff_to_pte(pgoff);
+
+	if (!pte_none(*pte)) {
+		if (pte_present(*pte) && pte_soft_dirty(*pte))
+			pte_file_mksoft_dirty(ptfile);
 		zap_pte(mm, vma, addr, pte);
+	}
 
-	set_pte_at(mm, addr, pte, pgoff_to_pte(pgoff));
+	set_pte_at(mm, addr, pte, ptfile);
 	/*
 	 * We don't need to run update_mmu_cache() here because the "file pte"
 	 * being installed by install_file_pte() is not a real pte - it's a
Index: linux-2.6.git/mm/memory.c
===================================================================
--- linux-2.6.git.orig/mm/memory.c
+++ linux-2.6.git/mm/memory.c
@@ -1141,9 +1141,12 @@ again:
 				continue;
 			if (unlikely(details) && details->nonlinear_vma
 			    && linear_page_index(details->nonlinear_vma,
-						addr) != page->index)
-				set_pte_at(mm, addr, pte,
-					   pgoff_to_pte(page->index));
+						addr) != page->index) {
+				pte_t ptfile = pgoff_to_pte(page->index);
+				if (pte_soft_dirty(ptent))
+					pte_file_mksoft_dirty(ptfile);
+				set_pte_at(mm, addr, pte, ptfile);
+			}
 			if (PageAnon(page))
 				rss[MM_ANONPAGES]--;
 			else {
@@ -3410,6 +3413,8 @@ static int __do_fault(struct mm_struct *
 		entry = mk_pte(page, vma->vm_page_prot);
 		if (flags & FAULT_FLAG_WRITE)
 			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		else if (pte_file(orig_pte) && pte_file_soft_dirty(orig_pte))
+			pte_mksoft_dirty(entry);
 		if (anon) {
 			inc_mm_counter_fast(mm, MM_ANONPAGES);
 			page_add_new_anon_rmap(page, vma, address);
Index: linux-2.6.git/mm/rmap.c
===================================================================
--- linux-2.6.git.orig/mm/rmap.c
+++ linux-2.6.git/mm/rmap.c
@@ -1405,8 +1405,12 @@ static int try_to_unmap_cluster(unsigned
 		pteval = ptep_clear_flush(vma, address, pte);
 
 		/* If nonlinear, store the file page offset in the pte. */
-		if (page->index != linear_page_index(vma, address))
-			set_pte_at(mm, address, pte, pgoff_to_pte(page->index));
+		if (page->index != linear_page_index(vma, address)) {
+			pte_t ptfile = pgoff_to_pte(page->index);
+			if (pte_soft_dirty(pteval))
+				pte_file_mksoft_dirty(ptfile);
+			set_pte_at(mm, address, pte, ptfile);
+		}
 
 		/* Move the dirty bit to the physical page now the pte is gone. */
 		if (pte_dirty(pteval))

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
@ 2013-07-31  8:16   ` Pavel Emelyanov
  2013-08-01  0:51   ` Minchan Kim
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 34+ messages in thread
From: Pavel Emelyanov @ 2013-07-31  8:16 UTC (permalink / raw)
  To: Cyrill Gorcunov, akpm
  Cc: linux-mm, linux-kernel, luto, gorcunov, mpm, xiaoguangrong,
	mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar

On 07/31/2013 12:41 AM, Cyrill Gorcunov wrote:

> Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY
> bit set get swapped out, the bit is getting lost and no longer
> available when pte read back.
> 
> To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is
> saved in pte entry for the page being swapped out. When such page
> is to be read back from a swap cache we check for bit presence
> and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
> bit back.
> 
> One of the problem was to find a place in pte entry where we can
> save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
> _PAGE_PSE was chosen for that, it doesn't intersect with swap
> entry format stored in pte.
> 
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>

Acked-by: Pavel Emelyanov <xemul@parallels.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages
  2013-07-30 20:41 ` [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages Cyrill Gorcunov
@ 2013-07-31  8:16   ` Pavel Emelyanov
  2013-08-07 20:28   ` Andrew Morton
  1 sibling, 0 replies; 34+ messages in thread
From: Pavel Emelyanov @ 2013-07-31  8:16 UTC (permalink / raw)
  To: Cyrill Gorcunov, akpm
  Cc: linux-mm, linux-kernel, luto, gorcunov, mpm, xiaoguangrong,
	mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar

On 07/31/2013 12:41 AM, Cyrill Gorcunov wrote:

> Andy reported that if file page get reclaimed we loose soft-dirty bit
> if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address
> get encoded into pte entry. Thus when #pf happens on such non-present
> pte we can restore it back.
> 
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>

Acked-by: Pavel Emelyanov <xemul@parallels.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
  2013-07-31  8:16   ` Pavel Emelyanov
@ 2013-08-01  0:51   ` Minchan Kim
  2013-08-01  5:53     ` Cyrill Gorcunov
  2013-08-05  1:48   ` Wanpeng Li
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 34+ messages in thread
From: Minchan Kim @ 2013-08-01  0:51 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-mm, linux-kernel, luto, gorcunov, xemul, akpm, mpm,
	xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar

Hello,

On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
> Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY
> bit set get swapped out, the bit is getting lost and no longer
> available when pte read back.
> 
> To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is
> saved in pte entry for the page being swapped out. When such page
> is to be read back from a swap cache we check for bit presence
> and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
> bit back.
> 
> One of the problem was to find a place in pte entry where we can
> save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
> _PAGE_PSE was chosen for that, it doesn't intersect with swap
> entry format stored in pte.
> 
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Matt Mackall <mpm@selenic.com>
> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  arch/x86/include/asm/pgtable.h       |   15 +++++++++++++++
>  arch/x86/include/asm/pgtable_types.h |   13 +++++++++++++
>  fs/proc/task_mmu.c                   |   21 +++++++++++++++------
>  include/asm-generic/pgtable.h        |   15 +++++++++++++++
>  include/linux/swapops.h              |    2 ++
>  mm/memory.c                          |    2 ++
>  mm/rmap.c                            |    6 +++++-
>  mm/swapfile.c                        |   19 +++++++++++++++++--
>  8 files changed, 84 insertions(+), 9 deletions(-)
> 
> Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
> +++ linux-2.6.git/arch/x86/include/asm/pgtable.h
> @@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>  	return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
>  }
>  
> +static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
> +{
> +	return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> +}
> +
> +static inline int pte_swp_soft_dirty(pte_t pte)
> +{
> +	return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
> +}
> +
> +static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> +{
> +	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> +}
> +
>  /*
>   * Mask out unsupported bits in a present pgprot.  Non-present pgprots
>   * can use those bits for other purposes, so leave them be.
> Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
> +++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
> @@ -67,6 +67,19 @@
>  #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 0))
>  #endif
>  
> +/*
> + * Tracking soft dirty bit when a page goes to a swap is tricky.
> + * We need a bit which can be stored in pte _and_ not conflict
> + * with swap entry format. On x86 bits 6 and 7 are *not* involved
> + * into swap entry computation, but bit 6 is used for nonlinear
> + * file mapping, so we borrow bit 7 for soft dirty tracking.
> + */
> +#ifdef CONFIG_MEM_SOFT_DIRTY
> +#define _PAGE_SWP_SOFT_DIRTY	_PAGE_PSE
> +#else
> +#define _PAGE_SWP_SOFT_DIRTY	(_AT(pteval_t, 0))
> +#endif
> +
>  #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
>  #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
>  #else
> Index: linux-2.6.git/fs/proc/task_mmu.c
> ===================================================================
> --- linux-2.6.git.orig/fs/proc/task_mmu.c
> +++ linux-2.6.git/fs/proc/task_mmu.c
> @@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
>  	 * of how soft-dirty works.
>  	 */
>  	pte_t ptent = *pte;
> -	ptent = pte_wrprotect(ptent);
> -	ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
> +
> +	if (pte_present(ptent)) {
> +		ptent = pte_wrprotect(ptent);
> +		ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
> +	} else if (is_swap_pte(ptent)) {
> +		ptent = pte_swp_clear_soft_dirty(ptent);
> +	}
> +
>  	set_pte_at(vma->vm_mm, addr, pte, ptent);
>  #endif
>  }
> @@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
>  	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>  	for (; addr != end; pte++, addr += PAGE_SIZE) {
>  		ptent = *pte;
> -		if (!pte_present(ptent))
> -			continue;
>  
>  		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
>  			clear_soft_dirty(vma, addr, pte);
>  			continue;
>  		}
>  
> +		if (!pte_present(ptent))
> +			continue;
> +
>  		page = vm_normal_page(vma, addr, ptent);
>  		if (!page)
>  			continue;
> @@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
>  		flags = PM_PRESENT;
>  		page = vm_normal_page(vma, addr, pte);
>  	} else if (is_swap_pte(pte)) {
> -		swp_entry_t entry = pte_to_swp_entry(pte);
> -
> +		swp_entry_t entry;
> +		if (pte_swp_soft_dirty(pte))
> +			flags2 |= __PM_SOFT_DIRTY;
> +		entry = pte_to_swp_entry(pte);
>  		frame = swp_type(entry) |
>  			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
>  		flags = PM_SWAP;
> Index: linux-2.6.git/include/asm-generic/pgtable.h
> ===================================================================
> --- linux-2.6.git.orig/include/asm-generic/pgtable.h
> +++ linux-2.6.git/include/asm-generic/pgtable.h
> @@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>  {
>  	return pmd;
>  }
> +
> +static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
> +{
> +	return pte;
> +}
> +
> +static inline int pte_swp_soft_dirty(pte_t pte)
> +{
> +	return 0;
> +}
> +
> +static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> +{
> +	return pte;
> +}
>  #endif
>  
>  #ifndef __HAVE_PFNMAP_TRACKING
> Index: linux-2.6.git/include/linux/swapops.h
> ===================================================================
> --- linux-2.6.git.orig/include/linux/swapops.h
> +++ linux-2.6.git/include/linux/swapops.h
> @@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
>  	swp_entry_t arch_entry;
>  
>  	BUG_ON(pte_file(pte));
> +	if (pte_swp_soft_dirty(pte))
> +		pte = pte_swp_clear_soft_dirty(pte);

Why do you remove soft-dirty flag whenever pte_to_swp_entry is called?
Isn't there any problem if we use mincore?

>  	arch_entry = __pte_to_swp_entry(pte);
>  	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
>  }
> Index: linux-2.6.git/mm/memory.c
> ===================================================================
> --- linux-2.6.git.orig/mm/memory.c
> +++ linux-2.6.git/mm/memory.c
> @@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
>  		exclusive = 1;
>  	}
>  	flush_icache_page(vma, page);
> +	if (pte_swp_soft_dirty(orig_pte))
> +		pte = pte_mksoft_dirty(pte);
>  	set_pte_at(mm, address, page_table, pte);
>  	if (page == swapcache)
>  		do_page_add_anon_rmap(page, vma, address, exclusive);
> Index: linux-2.6.git/mm/rmap.c
> ===================================================================
> --- linux-2.6.git.orig/mm/rmap.c
> +++ linux-2.6.git/mm/rmap.c
> @@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
>  			   swp_entry_to_pte(make_hwpoison_entry(page)));
>  	} else if (PageAnon(page)) {
>  		swp_entry_t entry = { .val = page_private(page) };
> +		pte_t swp_pte;
>  
>  		if (PageSwapCache(page)) {
>  			/*
> @@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
>  			BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
>  			entry = make_migration_entry(page, pte_write(pteval));
>  		}
> -		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
> +		swp_pte = swp_entry_to_pte(entry);
> +		if (pte_soft_dirty(pteval))
> +			swp_pte = pte_swp_mksoft_dirty(swp_pte);
> +		set_pte_at(mm, address, pte, swp_pte);
>  		BUG_ON(pte_file(*pte));
>  	} else if (IS_ENABLED(CONFIG_MIGRATION) &&
>  		   (TTU_ACTION(flags) == TTU_MIGRATION)) {
> Index: linux-2.6.git/mm/swapfile.c
> ===================================================================
> --- linux-2.6.git.orig/mm/swapfile.c
> +++ linux-2.6.git/mm/swapfile.c
> @@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
>  }
>  #endif /* CONFIG_HIBERNATION */
>  
> +static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)

Nitpick.
If maybe_same_pte is used widely, it looks good to me
but it's used for only swapoff at the moment so I think pte_swap_same
would be better name.

> +{
> +#ifdef CONFIG_MEM_SOFT_DIRTY
> +	/*
> +	 * When pte keeps soft dirty bit the pte generated
> +	 * from swap entry does not has it, still it's same
> +	 * pte from logical point of view.
> +	 */
> +	pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
> +	return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
> +#else
> +	return pte_same(pte, swp_pte);
> +#endif
> +}
> +
>  /*
>   * No need to decide whether this PTE shares the swap entry with others,
>   * just let do_wp_page work it out if a write is requested later - to
> @@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
>  	}
>  
>  	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> -	if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
> +	if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
>  		mem_cgroup_cancel_charge_swapin(memcg);
>  		ret = 0;
>  		goto out;
> @@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
>  		 * swapoff spends a _lot_ of time in this loop!
>  		 * Test inline before going to call unuse_pte.
>  		 */
> -		if (unlikely(pte_same(*pte, swp_pte))) {
> +		if (unlikely(maybe_same_pte(*pte, swp_pte))) {
>  			pte_unmap(pte);
>  			ret = unuse_pte(vma, pmd, addr, entry, page);
>  			if (ret)
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-08-01  0:51   ` Minchan Kim
@ 2013-08-01  5:53     ` Cyrill Gorcunov
  2013-08-01  6:16       ` Minchan Kim
  0 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-01  5:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong,
	mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar

On Thu, Aug 01, 2013 at 09:51:32AM +0900, Minchan Kim wrote:
> > Index: linux-2.6.git/include/linux/swapops.h
> > ===================================================================
> > --- linux-2.6.git.orig/include/linux/swapops.h
> > +++ linux-2.6.git/include/linux/swapops.h
> > @@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
> >  	swp_entry_t arch_entry;
> >  
> >  	BUG_ON(pte_file(pte));
> > +	if (pte_swp_soft_dirty(pte))
> > +		pte = pte_swp_clear_soft_dirty(pte);
> 
> Why do you remove soft-dirty flag whenever pte_to_swp_entry is called?
> Isn't there any problem if we use mincore?

No, there is no problem. pte_to_swp_entry caller when we know that pte
we're decoding is having swap format (except the case in swap code which
figures out the number of bits allowed for offset). Still since this bit
is set on "higher" level than __swp_type/__swp_offset helpers it should
be cleaned before the value from pte comes to "one level down" helpers
function.

> > +static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
> 
> Nitpick.
> If maybe_same_pte is used widely, it looks good to me
> but it's used for only swapoff at the moment so I think pte_swap_same
> would be better name.

I don't see much difference, but sure, lets rename it on top once series
in -mm tree, sounds good?

	Cyrill

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-08-01  5:53     ` Cyrill Gorcunov
@ 2013-08-01  6:16       ` Minchan Kim
  2013-08-01  6:28         ` Cyrill Gorcunov
  0 siblings, 1 reply; 34+ messages in thread
From: Minchan Kim @ 2013-08-01  6:16 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong,
	mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar

On Thu, Aug 01, 2013 at 09:53:03AM +0400, Cyrill Gorcunov wrote:
> On Thu, Aug 01, 2013 at 09:51:32AM +0900, Minchan Kim wrote:
> > > Index: linux-2.6.git/include/linux/swapops.h
> > > ===================================================================
> > > --- linux-2.6.git.orig/include/linux/swapops.h
> > > +++ linux-2.6.git/include/linux/swapops.h
> > > @@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
> > >  	swp_entry_t arch_entry;
> > >  
> > >  	BUG_ON(pte_file(pte));
> > > +	if (pte_swp_soft_dirty(pte))
> > > +		pte = pte_swp_clear_soft_dirty(pte);
> > 
> > Why do you remove soft-dirty flag whenever pte_to_swp_entry is called?
> > Isn't there any problem if we use mincore?
> 
> No, there is no problem. pte_to_swp_entry caller when we know that pte
> we're decoding is having swap format (except the case in swap code which
> figures out the number of bits allowed for offset). Still since this bit
> is set on "higher" level than __swp_type/__swp_offset helpers it should
> be cleaned before the value from pte comes to "one level down" helpers
> function.

I don't get it. Could you correct me with below example?

Process A context
        try_to_unmap
                swp_pte = swp_entry_to_pte /* change generic swp into arch swap */
                swp_pte = pte_swp_mksoft_dirty(swp_pte);
                set_pte_at(, swp_pte);

Process A context
        ..
        mincore_pte_range
                pte_to_swp_entry
                        pte = pte_swp_clear_soft_dirty  <=== 1)
                        change arch swp with generic swp
                mincore_page 

Process B want to know dirty state of the page
        ..
        pagemap_read
        pte_to_pagemap_entry
        is_swap_pte
                if (pte_swap_soft_dirty(pte)) <=== but failed by 1)

So, Process B can't get the dirty status from process A's the page.

> 
> > > +static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
> > 
> > Nitpick.
> > If maybe_same_pte is used widely, it looks good to me
> > but it's used for only swapoff at the moment so I think pte_swap_same
> > would be better name.
> 
> I don't see much difference, but sure, lets rename it on top once series
> in -mm tree, sounds good?
> 
> 	Cyrill
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-08-01  6:16       ` Minchan Kim
@ 2013-08-01  6:28         ` Cyrill Gorcunov
  2013-08-01  6:37           ` Minchan Kim
  0 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-01  6:28 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong,
	mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar

On Thu, Aug 01, 2013 at 03:16:32PM +0900, Minchan Kim wrote:
> 
> I don't get it. Could you correct me with below example?
> 
> Process A context
>         try_to_unmap
>                 swp_pte = swp_entry_to_pte /* change generic swp into arch swap */
>                 swp_pte = pte_swp_mksoft_dirty(swp_pte);
>                 set_pte_at(, swp_pte);
> 
> Process A context
>         ..
>         mincore_pte_range
		pte_t pte = *ptep;	<-- local copy of the pte value, in memory it remains the same
						with swap softdirty bit set
>                 pte_to_swp_entry
>                         pte = pte_swp_clear_soft_dirty  <=== 1)
>                         change arch swp with generic swp
>                 mincore_page 
> 
> Process B want to know dirty state of the page
>         ..
>         pagemap_read
>         pte_to_pagemap_entry
>         is_swap_pte
>                 if (pte_swap_soft_dirty(pte)) <=== but failed by 1)
> 
> So, Process B can't get the dirty status from process A's the page.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-08-01  6:28         ` Cyrill Gorcunov
@ 2013-08-01  6:37           ` Minchan Kim
  2013-08-01  6:38             ` Cyrill Gorcunov
  0 siblings, 1 reply; 34+ messages in thread
From: Minchan Kim @ 2013-08-01  6:37 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong,
	mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar

On Thu, Aug 01, 2013 at 10:28:14AM +0400, Cyrill Gorcunov wrote:
> On Thu, Aug 01, 2013 at 03:16:32PM +0900, Minchan Kim wrote:
> > 
> > I don't get it. Could you correct me with below example?
> > 
> > Process A context
> >         try_to_unmap
> >                 swp_pte = swp_entry_to_pte /* change generic swp into arch swap */
> >                 swp_pte = pte_swp_mksoft_dirty(swp_pte);
> >                 set_pte_at(, swp_pte);
> > 
> > Process A context
> >         ..
> >         mincore_pte_range
> 		pte_t pte = *ptep;	<-- local copy of the pte value, in memory it remains the same
> 						with swap softdirty bit set

Argh, I missed that. Thank you!

Reviewed-by: Minchan Kim <minchan@kernel.org>
-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-08-01  6:37           ` Minchan Kim
@ 2013-08-01  6:38             ` Cyrill Gorcunov
  0 siblings, 0 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-01  6:38 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong,
	mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar

On Thu, Aug 01, 2013 at 03:37:06PM +0900, Minchan Kim wrote:
> 
> Reviewed-by: Minchan Kim <minchan@kernel.org>

Thanks a lot for review, Minchan!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
  2013-07-31  8:16   ` Pavel Emelyanov
  2013-08-01  0:51   ` Minchan Kim
@ 2013-08-05  1:48   ` Wanpeng Li
  2013-08-05  1:48   ` Wanpeng Li
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-08-05  1:48 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-mm, linux-kernel, luto, gorcunov, xemul, akpm, mpm,
	xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar

On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
>Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY
>bit set get swapped out, the bit is getting lost and no longer
>available when pte read back.
>
>To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is
>saved in pte entry for the page being swapped out. When such page
>is to be read back from a swap cache we check for bit presence
>and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
>bit back.
>
>One of the problem was to find a place in pte entry where we can
>save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
>_PAGE_PSE was chosen for that, it doesn't intersect with swap
>entry format stored in pte.
>
>Reported-by: Andy Lutomirski <luto@amacapital.net>
>Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>Cc: Pavel Emelyanov <xemul@parallels.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Matt Mackall <mpm@selenic.com>
>Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>Cc: Marcelo Tosatti <mtosatti@redhat.com>
>Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>---
> arch/x86/include/asm/pgtable.h       |   15 +++++++++++++++
> arch/x86/include/asm/pgtable_types.h |   13 +++++++++++++
> fs/proc/task_mmu.c                   |   21 +++++++++++++++------
> include/asm-generic/pgtable.h        |   15 +++++++++++++++
> include/linux/swapops.h              |    2 ++
> mm/memory.c                          |    2 ++
> mm/rmap.c                            |    6 +++++-
> mm/swapfile.c                        |   19 +++++++++++++++++--
> 8 files changed, 84 insertions(+), 9 deletions(-)
>
>Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
>===================================================================
>--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
>+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
>@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> 	return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
> }
>
>+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>+{
>+	return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>+}
>+
>+static inline int pte_swp_soft_dirty(pte_t pte)
>+{
>+	return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
>+}
>+
>+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>+{
>+	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>+}
>+
> /*
>  * Mask out unsupported bits in a present pgprot.  Non-present pgprots
>  * can use those bits for other purposes, so leave them be.
>Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>===================================================================
>--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
>+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>@@ -67,6 +67,19 @@
> #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 0))
> #endif
>
>+/*
>+ * Tracking soft dirty bit when a page goes to a swap is tricky.
>+ * We need a bit which can be stored in pte _and_ not conflict
>+ * with swap entry format. On x86 bits 6 and 7 are *not* involved
>+ * into swap entry computation, but bit 6 is used for nonlinear
>+ * file mapping, so we borrow bit 7 for soft dirty tracking.
>+ */
>+#ifdef CONFIG_MEM_SOFT_DIRTY
>+#define _PAGE_SWP_SOFT_DIRTY	_PAGE_PSE
>+#else
>+#define _PAGE_SWP_SOFT_DIRTY	(_AT(pteval_t, 0))
>+#endif
>+
> #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
> #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
> #else
>Index: linux-2.6.git/fs/proc/task_mmu.c
>===================================================================
>--- linux-2.6.git.orig/fs/proc/task_mmu.c
>+++ linux-2.6.git/fs/proc/task_mmu.c
>@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
> 	 * of how soft-dirty works.
> 	 */
> 	pte_t ptent = *pte;
>-	ptent = pte_wrprotect(ptent);
>-	ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>+
>+	if (pte_present(ptent)) {
>+		ptent = pte_wrprotect(ptent);
>+		ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>+	} else if (is_swap_pte(ptent)) {
>+		ptent = pte_swp_clear_soft_dirty(ptent);
>+	}
>+
> 	set_pte_at(vma->vm_mm, addr, pte, ptent);
> #endif
> }
>@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
> 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> 	for (; addr != end; pte++, addr += PAGE_SIZE) {
> 		ptent = *pte;
>-		if (!pte_present(ptent))
>-			continue;
>
> 		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
> 			clear_soft_dirty(vma, addr, pte);
> 			continue;
> 		}
>
>+		if (!pte_present(ptent))
>+			continue;
>+
> 		page = vm_normal_page(vma, addr, ptent);
> 		if (!page)
> 			continue;
>@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
> 		flags = PM_PRESENT;
> 		page = vm_normal_page(vma, addr, pte);
> 	} else if (is_swap_pte(pte)) {
>-		swp_entry_t entry = pte_to_swp_entry(pte);
>-
>+		swp_entry_t entry;
>+		if (pte_swp_soft_dirty(pte))
>+			flags2 |= __PM_SOFT_DIRTY;
>+		entry = pte_to_swp_entry(pte);
> 		frame = swp_type(entry) |
> 			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
> 		flags = PM_SWAP;
>Index: linux-2.6.git/include/asm-generic/pgtable.h
>===================================================================
>--- linux-2.6.git.orig/include/asm-generic/pgtable.h
>+++ linux-2.6.git/include/asm-generic/pgtable.h
>@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> {
> 	return pmd;
> }
>+
>+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>+{
>+	return pte;
>+}
>+
>+static inline int pte_swp_soft_dirty(pte_t pte)
>+{
>+	return 0;
>+}
>+
>+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>+{
>+	return pte;
>+}
> #endif
>
> #ifndef __HAVE_PFNMAP_TRACKING
>Index: linux-2.6.git/include/linux/swapops.h
>===================================================================
>--- linux-2.6.git.orig/include/linux/swapops.h
>+++ linux-2.6.git/include/linux/swapops.h
>@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
> 	swp_entry_t arch_entry;
>
> 	BUG_ON(pte_file(pte));
>+	if (pte_swp_soft_dirty(pte))
>+		pte = pte_swp_clear_soft_dirty(pte);
> 	arch_entry = __pte_to_swp_entry(pte);
> 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
> }
>Index: linux-2.6.git/mm/memory.c
>===================================================================
>--- linux-2.6.git.orig/mm/memory.c
>+++ linux-2.6.git/mm/memory.c
>@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
> 		exclusive = 1;
> 	}
> 	flush_icache_page(vma, page);
>+	if (pte_swp_soft_dirty(orig_pte))
>+		pte = pte_mksoft_dirty(pte);

entry = pte_to_swp_entry(orig_pte);
orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared. 

> 	set_pte_at(mm, address, page_table, pte);
> 	if (page == swapcache)
> 		do_page_add_anon_rmap(page, vma, address, exclusive);
>Index: linux-2.6.git/mm/rmap.c
>===================================================================
>--- linux-2.6.git.orig/mm/rmap.c
>+++ linux-2.6.git/mm/rmap.c
>@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
> 			   swp_entry_to_pte(make_hwpoison_entry(page)));
> 	} else if (PageAnon(page)) {
> 		swp_entry_t entry = { .val = page_private(page) };
>+		pte_t swp_pte;
>
> 		if (PageSwapCache(page)) {
> 			/*
>@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
> 			BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
> 			entry = make_migration_entry(page, pte_write(pteval));
> 		}
>-		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>+		swp_pte = swp_entry_to_pte(entry);
>+		if (pte_soft_dirty(pteval))
>+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
>+		set_pte_at(mm, address, pte, swp_pte);
> 		BUG_ON(pte_file(*pte));
> 	} else if (IS_ENABLED(CONFIG_MIGRATION) &&
> 		   (TTU_ACTION(flags) == TTU_MIGRATION)) {
>Index: linux-2.6.git/mm/swapfile.c
>===================================================================
>--- linux-2.6.git.orig/mm/swapfile.c
>+++ linux-2.6.git/mm/swapfile.c
>@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
> }
> #endif /* CONFIG_HIBERNATION */
>
>+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
>+{
>+#ifdef CONFIG_MEM_SOFT_DIRTY
>+	/*
>+	 * When pte keeps soft dirty bit the pte generated
>+	 * from swap entry does not has it, still it's same
>+	 * pte from logical point of view.
>+	 */
>+	pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
>+	return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
>+#else
>+	return pte_same(pte, swp_pte);
>+#endif
>+}
>+
> /*
>  * No need to decide whether this PTE shares the swap entry with others,
>  * just let do_wp_page work it out if a write is requested later - to
>@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
> 	}
>
> 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>-	if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
>+	if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
> 		mem_cgroup_cancel_charge_swapin(memcg);
> 		ret = 0;
> 		goto out;
>@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
> 		 * swapoff spends a _lot_ of time in this loop!
> 		 * Test inline before going to call unuse_pte.
> 		 */
>-		if (unlikely(pte_same(*pte, swp_pte))) {
>+		if (unlikely(maybe_same_pte(*pte, swp_pte))) {
> 			pte_unmap(pte);
> 			ret = unuse_pte(vma, pmd, addr, entry, page);
> 			if (ret)
>
>--
>To unsubscribe, send a message with 'unsubscribe linux-mm' in
>the body to majordomo@kvack.org.  For more info on Linux MM,
>see: http://www.linux-mm.org/ .
>Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
                     ` (2 preceding siblings ...)
  2013-08-05  1:48   ` Wanpeng Li
@ 2013-08-05  1:48   ` Wanpeng Li
       [not found]   ` <51ff047d.2768310a.2fc4.340fSMTPIN_ADDED_BROKEN@mx.google.com>
  2013-08-07 20:21   ` Andrew Morton
  5 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-08-05  1:48 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-mm, linux-kernel, luto, gorcunov, xemul, akpm, mpm,
	xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar

On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
>Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY
>bit set get swapped out, the bit is getting lost and no longer
>available when pte read back.
>
>To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is
>saved in pte entry for the page being swapped out. When such page
>is to be read back from a swap cache we check for bit presence
>and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
>bit back.
>
>One of the problem was to find a place in pte entry where we can
>save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
>_PAGE_PSE was chosen for that, it doesn't intersect with swap
>entry format stored in pte.
>
>Reported-by: Andy Lutomirski <luto@amacapital.net>
>Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>Cc: Pavel Emelyanov <xemul@parallels.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Matt Mackall <mpm@selenic.com>
>Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>Cc: Marcelo Tosatti <mtosatti@redhat.com>
>Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>---
> arch/x86/include/asm/pgtable.h       |   15 +++++++++++++++
> arch/x86/include/asm/pgtable_types.h |   13 +++++++++++++
> fs/proc/task_mmu.c                   |   21 +++++++++++++++------
> include/asm-generic/pgtable.h        |   15 +++++++++++++++
> include/linux/swapops.h              |    2 ++
> mm/memory.c                          |    2 ++
> mm/rmap.c                            |    6 +++++-
> mm/swapfile.c                        |   19 +++++++++++++++++--
> 8 files changed, 84 insertions(+), 9 deletions(-)
>
>Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
>===================================================================
>--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
>+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
>@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> 	return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
> }
>
>+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>+{
>+	return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>+}
>+
>+static inline int pte_swp_soft_dirty(pte_t pte)
>+{
>+	return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
>+}
>+
>+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>+{
>+	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>+}
>+
> /*
>  * Mask out unsupported bits in a present pgprot.  Non-present pgprots
>  * can use those bits for other purposes, so leave them be.
>Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>===================================================================
>--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
>+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>@@ -67,6 +67,19 @@
> #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 0))
> #endif
>
>+/*
>+ * Tracking soft dirty bit when a page goes to a swap is tricky.
>+ * We need a bit which can be stored in pte _and_ not conflict
>+ * with swap entry format. On x86 bits 6 and 7 are *not* involved
>+ * into swap entry computation, but bit 6 is used for nonlinear
>+ * file mapping, so we borrow bit 7 for soft dirty tracking.
>+ */
>+#ifdef CONFIG_MEM_SOFT_DIRTY
>+#define _PAGE_SWP_SOFT_DIRTY	_PAGE_PSE
>+#else
>+#define _PAGE_SWP_SOFT_DIRTY	(_AT(pteval_t, 0))
>+#endif
>+
> #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
> #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
> #else
>Index: linux-2.6.git/fs/proc/task_mmu.c
>===================================================================
>--- linux-2.6.git.orig/fs/proc/task_mmu.c
>+++ linux-2.6.git/fs/proc/task_mmu.c
>@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
> 	 * of how soft-dirty works.
> 	 */
> 	pte_t ptent = *pte;
>-	ptent = pte_wrprotect(ptent);
>-	ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>+
>+	if (pte_present(ptent)) {
>+		ptent = pte_wrprotect(ptent);
>+		ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>+	} else if (is_swap_pte(ptent)) {
>+		ptent = pte_swp_clear_soft_dirty(ptent);
>+	}
>+
> 	set_pte_at(vma->vm_mm, addr, pte, ptent);
> #endif
> }
>@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
> 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> 	for (; addr != end; pte++, addr += PAGE_SIZE) {
> 		ptent = *pte;
>-		if (!pte_present(ptent))
>-			continue;
>
> 		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
> 			clear_soft_dirty(vma, addr, pte);
> 			continue;
> 		}
>
>+		if (!pte_present(ptent))
>+			continue;
>+
> 		page = vm_normal_page(vma, addr, ptent);
> 		if (!page)
> 			continue;
>@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
> 		flags = PM_PRESENT;
> 		page = vm_normal_page(vma, addr, pte);
> 	} else if (is_swap_pte(pte)) {
>-		swp_entry_t entry = pte_to_swp_entry(pte);
>-
>+		swp_entry_t entry;
>+		if (pte_swp_soft_dirty(pte))
>+			flags2 |= __PM_SOFT_DIRTY;
>+		entry = pte_to_swp_entry(pte);
> 		frame = swp_type(entry) |
> 			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
> 		flags = PM_SWAP;
>Index: linux-2.6.git/include/asm-generic/pgtable.h
>===================================================================
>--- linux-2.6.git.orig/include/asm-generic/pgtable.h
>+++ linux-2.6.git/include/asm-generic/pgtable.h
>@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> {
> 	return pmd;
> }
>+
>+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>+{
>+	return pte;
>+}
>+
>+static inline int pte_swp_soft_dirty(pte_t pte)
>+{
>+	return 0;
>+}
>+
>+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>+{
>+	return pte;
>+}
> #endif
>
> #ifndef __HAVE_PFNMAP_TRACKING
>Index: linux-2.6.git/include/linux/swapops.h
>===================================================================
>--- linux-2.6.git.orig/include/linux/swapops.h
>+++ linux-2.6.git/include/linux/swapops.h
>@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
> 	swp_entry_t arch_entry;
>
> 	BUG_ON(pte_file(pte));
>+	if (pte_swp_soft_dirty(pte))
>+		pte = pte_swp_clear_soft_dirty(pte);
> 	arch_entry = __pte_to_swp_entry(pte);
> 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
> }
>Index: linux-2.6.git/mm/memory.c
>===================================================================
>--- linux-2.6.git.orig/mm/memory.c
>+++ linux-2.6.git/mm/memory.c
>@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
> 		exclusive = 1;
> 	}
> 	flush_icache_page(vma, page);
>+	if (pte_swp_soft_dirty(orig_pte))
>+		pte = pte_mksoft_dirty(pte);

entry = pte_to_swp_entry(orig_pte);
orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared. 

> 	set_pte_at(mm, address, page_table, pte);
> 	if (page == swapcache)
> 		do_page_add_anon_rmap(page, vma, address, exclusive);
>Index: linux-2.6.git/mm/rmap.c
>===================================================================
>--- linux-2.6.git.orig/mm/rmap.c
>+++ linux-2.6.git/mm/rmap.c
>@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
> 			   swp_entry_to_pte(make_hwpoison_entry(page)));
> 	} else if (PageAnon(page)) {
> 		swp_entry_t entry = { .val = page_private(page) };
>+		pte_t swp_pte;
>
> 		if (PageSwapCache(page)) {
> 			/*
>@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
> 			BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
> 			entry = make_migration_entry(page, pte_write(pteval));
> 		}
>-		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>+		swp_pte = swp_entry_to_pte(entry);
>+		if (pte_soft_dirty(pteval))
>+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
>+		set_pte_at(mm, address, pte, swp_pte);
> 		BUG_ON(pte_file(*pte));
> 	} else if (IS_ENABLED(CONFIG_MIGRATION) &&
> 		   (TTU_ACTION(flags) == TTU_MIGRATION)) {
>Index: linux-2.6.git/mm/swapfile.c
>===================================================================
>--- linux-2.6.git.orig/mm/swapfile.c
>+++ linux-2.6.git/mm/swapfile.c
>@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
> }
> #endif /* CONFIG_HIBERNATION */
>
>+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
>+{
>+#ifdef CONFIG_MEM_SOFT_DIRTY
>+	/*
>+	 * When pte keeps soft dirty bit the pte generated
>+	 * from swap entry does not has it, still it's same
>+	 * pte from logical point of view.
>+	 */
>+	pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
>+	return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
>+#else
>+	return pte_same(pte, swp_pte);
>+#endif
>+}
>+
> /*
>  * No need to decide whether this PTE shares the swap entry with others,
>  * just let do_wp_page work it out if a write is requested later - to
>@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
> 	}
>
> 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>-	if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
>+	if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
> 		mem_cgroup_cancel_charge_swapin(memcg);
> 		ret = 0;
> 		goto out;
>@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
> 		 * swapoff spends a _lot_ of time in this loop!
> 		 * Test inline before going to call unuse_pte.
> 		 */
>-		if (unlikely(pte_same(*pte, swp_pte))) {
>+		if (unlikely(maybe_same_pte(*pte, swp_pte))) {
> 			pte_unmap(pte);
> 			ret = unuse_pte(vma, pmd, addr, entry, page);
> 			if (ret)
>
>--
>To unsubscribe, send a message with 'unsubscribe linux-mm' in
>the body to majordomo@kvack.org.  For more info on Linux MM,
>see: http://www.linux-mm.org/ .
>Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
       [not found]   ` <51ff047d.2768310a.2fc4.340fSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2013-08-05  2:17     ` Minchan Kim
  2013-08-05  2:38       ` Wanpeng Li
                         ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Minchan Kim @ 2013-08-05  2:17 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
	akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar

Hello Wanpeng,

On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote:
> On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
> >Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY
> >bit set get swapped out, the bit is getting lost and no longer
> >available when pte read back.
> >
> >To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is
> >saved in pte entry for the page being swapped out. When such page
> >is to be read back from a swap cache we check for bit presence
> >and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
> >bit back.
> >
> >One of the problem was to find a place in pte entry where we can
> >save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
> >_PAGE_PSE was chosen for that, it doesn't intersect with swap
> >entry format stored in pte.
> >
> >Reported-by: Andy Lutomirski <luto@amacapital.net>
> >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> >Cc: Pavel Emelyanov <xemul@parallels.com>
> >Cc: Andrew Morton <akpm@linux-foundation.org>
> >Cc: Matt Mackall <mpm@selenic.com>
> >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> >Cc: Marcelo Tosatti <mtosatti@redhat.com>
> >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> >Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> >Cc: Peter Zijlstra <peterz@infradead.org>
> >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >---
> > arch/x86/include/asm/pgtable.h       |   15 +++++++++++++++
> > arch/x86/include/asm/pgtable_types.h |   13 +++++++++++++
> > fs/proc/task_mmu.c                   |   21 +++++++++++++++------
> > include/asm-generic/pgtable.h        |   15 +++++++++++++++
> > include/linux/swapops.h              |    2 ++
> > mm/memory.c                          |    2 ++
> > mm/rmap.c                            |    6 +++++-
> > mm/swapfile.c                        |   19 +++++++++++++++++--
> > 8 files changed, 84 insertions(+), 9 deletions(-)
> >
> >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
> >===================================================================
> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
> >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
> >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> > 	return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
> > }
> >
> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
> >+{
> >+	return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> >+}
> >+
> >+static inline int pte_swp_soft_dirty(pte_t pte)
> >+{
> >+	return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
> >+}
> >+
> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> >+{
> >+	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> >+}
> >+
> > /*
> >  * Mask out unsupported bits in a present pgprot.  Non-present pgprots
> >  * can use those bits for other purposes, so leave them be.
> >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
> >===================================================================
> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
> >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
> >@@ -67,6 +67,19 @@
> > #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 0))
> > #endif
> >
> >+/*
> >+ * Tracking soft dirty bit when a page goes to a swap is tricky.
> >+ * We need a bit which can be stored in pte _and_ not conflict
> >+ * with swap entry format. On x86 bits 6 and 7 are *not* involved
> >+ * into swap entry computation, but bit 6 is used for nonlinear
> >+ * file mapping, so we borrow bit 7 for soft dirty tracking.
> >+ */
> >+#ifdef CONFIG_MEM_SOFT_DIRTY
> >+#define _PAGE_SWP_SOFT_DIRTY	_PAGE_PSE
> >+#else
> >+#define _PAGE_SWP_SOFT_DIRTY	(_AT(pteval_t, 0))
> >+#endif
> >+
> > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
> > #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
> > #else
> >Index: linux-2.6.git/fs/proc/task_mmu.c
> >===================================================================
> >--- linux-2.6.git.orig/fs/proc/task_mmu.c
> >+++ linux-2.6.git/fs/proc/task_mmu.c
> >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
> > 	 * of how soft-dirty works.
> > 	 */
> > 	pte_t ptent = *pte;
> >-	ptent = pte_wrprotect(ptent);
> >-	ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
> >+
> >+	if (pte_present(ptent)) {
> >+		ptent = pte_wrprotect(ptent);
> >+		ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
> >+	} else if (is_swap_pte(ptent)) {
> >+		ptent = pte_swp_clear_soft_dirty(ptent);
> >+	}
> >+
> > 	set_pte_at(vma->vm_mm, addr, pte, ptent);
> > #endif
> > }
> >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
> > 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> > 	for (; addr != end; pte++, addr += PAGE_SIZE) {
> > 		ptent = *pte;
> >-		if (!pte_present(ptent))
> >-			continue;
> >
> > 		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
> > 			clear_soft_dirty(vma, addr, pte);
> > 			continue;
> > 		}
> >
> >+		if (!pte_present(ptent))
> >+			continue;
> >+
> > 		page = vm_normal_page(vma, addr, ptent);
> > 		if (!page)
> > 			continue;
> >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
> > 		flags = PM_PRESENT;
> > 		page = vm_normal_page(vma, addr, pte);
> > 	} else if (is_swap_pte(pte)) {
> >-		swp_entry_t entry = pte_to_swp_entry(pte);
> >-
> >+		swp_entry_t entry;
> >+		if (pte_swp_soft_dirty(pte))
> >+			flags2 |= __PM_SOFT_DIRTY;
> >+		entry = pte_to_swp_entry(pte);
> > 		frame = swp_type(entry) |
> > 			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
> > 		flags = PM_SWAP;
> >Index: linux-2.6.git/include/asm-generic/pgtable.h
> >===================================================================
> >--- linux-2.6.git.orig/include/asm-generic/pgtable.h
> >+++ linux-2.6.git/include/asm-generic/pgtable.h
> >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> > {
> > 	return pmd;
> > }
> >+
> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
> >+{
> >+	return pte;
> >+}
> >+
> >+static inline int pte_swp_soft_dirty(pte_t pte)
> >+{
> >+	return 0;
> >+}
> >+
> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> >+{
> >+	return pte;
> >+}
> > #endif
> >
> > #ifndef __HAVE_PFNMAP_TRACKING
> >Index: linux-2.6.git/include/linux/swapops.h
> >===================================================================
> >--- linux-2.6.git.orig/include/linux/swapops.h
> >+++ linux-2.6.git/include/linux/swapops.h
> >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
> > 	swp_entry_t arch_entry;
> >
> > 	BUG_ON(pte_file(pte));
> >+	if (pte_swp_soft_dirty(pte))
> >+		pte = pte_swp_clear_soft_dirty(pte);
> > 	arch_entry = __pte_to_swp_entry(pte);
> > 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
> > }
> >Index: linux-2.6.git/mm/memory.c
> >===================================================================
> >--- linux-2.6.git.orig/mm/memory.c
> >+++ linux-2.6.git/mm/memory.c
> >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
> > 		exclusive = 1;
> > 	}
> > 	flush_icache_page(vma, page);
> >+	if (pte_swp_soft_dirty(orig_pte))
> >+		pte = pte_mksoft_dirty(pte);
> 
> entry = pte_to_swp_entry(orig_pte);
> orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared. 

You seem to walk same way with me.
Please look at my stupid questions in this thread.

> 
> > 	set_pte_at(mm, address, page_table, pte);
> > 	if (page == swapcache)
> > 		do_page_add_anon_rmap(page, vma, address, exclusive);
> >Index: linux-2.6.git/mm/rmap.c
> >===================================================================
> >--- linux-2.6.git.orig/mm/rmap.c
> >+++ linux-2.6.git/mm/rmap.c
> >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
> > 			   swp_entry_to_pte(make_hwpoison_entry(page)));
> > 	} else if (PageAnon(page)) {
> > 		swp_entry_t entry = { .val = page_private(page) };
> >+		pte_t swp_pte;
> >
> > 		if (PageSwapCache(page)) {
> > 			/*
> >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
> > 			BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
> > 			entry = make_migration_entry(page, pte_write(pteval));
> > 		}
> >-		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
> >+		swp_pte = swp_entry_to_pte(entry);
> >+		if (pte_soft_dirty(pteval))
> >+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
> >+		set_pte_at(mm, address, pte, swp_pte);
> > 		BUG_ON(pte_file(*pte));
> > 	} else if (IS_ENABLED(CONFIG_MIGRATION) &&
> > 		   (TTU_ACTION(flags) == TTU_MIGRATION)) {
> >Index: linux-2.6.git/mm/swapfile.c
> >===================================================================
> >--- linux-2.6.git.orig/mm/swapfile.c
> >+++ linux-2.6.git/mm/swapfile.c
> >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
> > }
> > #endif /* CONFIG_HIBERNATION */
> >
> >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
> >+{
> >+#ifdef CONFIG_MEM_SOFT_DIRTY
> >+	/*
> >+	 * When pte keeps soft dirty bit the pte generated
> >+	 * from swap entry does not has it, still it's same
> >+	 * pte from logical point of view.
> >+	 */
> >+	pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
> >+	return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
> >+#else
> >+	return pte_same(pte, swp_pte);
> >+#endif
> >+}
> >+
> > /*
> >  * No need to decide whether this PTE shares the swap entry with others,
> >  * just let do_wp_page work it out if a write is requested later - to
> >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
> > 	}
> >
> > 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> >-	if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
> >+	if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
> > 		mem_cgroup_cancel_charge_swapin(memcg);
> > 		ret = 0;
> > 		goto out;
> >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
> > 		 * swapoff spends a _lot_ of time in this loop!
> > 		 * Test inline before going to call unuse_pte.
> > 		 */
> >-		if (unlikely(pte_same(*pte, swp_pte))) {
> >+		if (unlikely(maybe_same_pte(*pte, swp_pte))) {
> > 			pte_unmap(pte);
> > 			ret = unuse_pte(vma, pmd, addr, entry, page);
> > 			if (ret)
> >
> >--
> >To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >the body to majordomo@kvack.org.  For more info on Linux MM,
> >see: http://www.linux-mm.org/ .
> >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-08-05  2:17     ` Minchan Kim
@ 2013-08-05  2:38       ` Wanpeng Li
  2013-08-05  2:38       ` Wanpeng Li
       [not found]       ` <51ff1053.ab47310a.5d3f.566cSMTPIN_ADDED_BROKEN@mx.google.com>
  2 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-08-05  2:38 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
	akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar

Hi Minchan,

On Mon, Aug 05, 2013 at 11:17:15AM +0900, Minchan Kim wrote:
>Hello Wanpeng,
>
>On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote:
>> On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
>> >Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY
>> >bit set get swapped out, the bit is getting lost and no longer
>> >available when pte read back.
>> >
>> >To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is
>> >saved in pte entry for the page being swapped out. When such page
>> >is to be read back from a swap cache we check for bit presence
>> >and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
>> >bit back.
>> >
>> >One of the problem was to find a place in pte entry where we can
>> >save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
>> >_PAGE_PSE was chosen for that, it doesn't intersect with swap
>> >entry format stored in pte.
>> >
>> >Reported-by: Andy Lutomirski <luto@amacapital.net>
>> >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>> >Cc: Pavel Emelyanov <xemul@parallels.com>
>> >Cc: Andrew Morton <akpm@linux-foundation.org>
>> >Cc: Matt Mackall <mpm@selenic.com>
>> >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> >Cc: Marcelo Tosatti <mtosatti@redhat.com>
>> >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>> >Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>> >Cc: Peter Zijlstra <peterz@infradead.org>
>> >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >---
>> > arch/x86/include/asm/pgtable.h       |   15 +++++++++++++++
>> > arch/x86/include/asm/pgtable_types.h |   13 +++++++++++++
>> > fs/proc/task_mmu.c                   |   21 +++++++++++++++------
>> > include/asm-generic/pgtable.h        |   15 +++++++++++++++
>> > include/linux/swapops.h              |    2 ++
>> > mm/memory.c                          |    2 ++
>> > mm/rmap.c                            |    6 +++++-
>> > mm/swapfile.c                        |   19 +++++++++++++++++--
>> > 8 files changed, 84 insertions(+), 9 deletions(-)
>> >
>> >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >===================================================================
>> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
>> >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> > 	return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
>> > }
>> >
>> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >+{
>> >+	return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >+}
>> >+
>> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >+{
>> >+	return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
>> >+}
>> >+
>> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >+{
>> >+	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >+}
>> >+
>> > /*
>> >  * Mask out unsupported bits in a present pgprot.  Non-present pgprots
>> >  * can use those bits for other purposes, so leave them be.
>> >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >===================================================================
>> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
>> >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >@@ -67,6 +67,19 @@
>> > #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 0))
>> > #endif
>> >
>> >+/*
>> >+ * Tracking soft dirty bit when a page goes to a swap is tricky.
>> >+ * We need a bit which can be stored in pte _and_ not conflict
>> >+ * with swap entry format. On x86 bits 6 and 7 are *not* involved
>> >+ * into swap entry computation, but bit 6 is used for nonlinear
>> >+ * file mapping, so we borrow bit 7 for soft dirty tracking.
>> >+ */
>> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >+#define _PAGE_SWP_SOFT_DIRTY	_PAGE_PSE
>> >+#else
>> >+#define _PAGE_SWP_SOFT_DIRTY	(_AT(pteval_t, 0))
>> >+#endif
>> >+
>> > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
>> > #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
>> > #else
>> >Index: linux-2.6.git/fs/proc/task_mmu.c
>> >===================================================================
>> >--- linux-2.6.git.orig/fs/proc/task_mmu.c
>> >+++ linux-2.6.git/fs/proc/task_mmu.c
>> >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
>> > 	 * of how soft-dirty works.
>> > 	 */
>> > 	pte_t ptent = *pte;
>> >-	ptent = pte_wrprotect(ptent);
>> >-	ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >+
>> >+	if (pte_present(ptent)) {
>> >+		ptent = pte_wrprotect(ptent);
>> >+		ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >+	} else if (is_swap_pte(ptent)) {
>> >+		ptent = pte_swp_clear_soft_dirty(ptent);
>> >+	}
>> >+
>> > 	set_pte_at(vma->vm_mm, addr, pte, ptent);
>> > #endif
>> > }
>> >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
>> > 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> > 	for (; addr != end; pte++, addr += PAGE_SIZE) {
>> > 		ptent = *pte;
>> >-		if (!pte_present(ptent))
>> >-			continue;
>> >
>> > 		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
>> > 			clear_soft_dirty(vma, addr, pte);
>> > 			continue;
>> > 		}
>> >
>> >+		if (!pte_present(ptent))
>> >+			continue;
>> >+
>> > 		page = vm_normal_page(vma, addr, ptent);
>> > 		if (!page)
>> > 			continue;
>> >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
>> > 		flags = PM_PRESENT;
>> > 		page = vm_normal_page(vma, addr, pte);
>> > 	} else if (is_swap_pte(pte)) {
>> >-		swp_entry_t entry = pte_to_swp_entry(pte);
>> >-
>> >+		swp_entry_t entry;
>> >+		if (pte_swp_soft_dirty(pte))
>> >+			flags2 |= __PM_SOFT_DIRTY;
>> >+		entry = pte_to_swp_entry(pte);
>> > 		frame = swp_type(entry) |
>> > 			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
>> > 		flags = PM_SWAP;
>> >Index: linux-2.6.git/include/asm-generic/pgtable.h
>> >===================================================================
>> >--- linux-2.6.git.orig/include/asm-generic/pgtable.h
>> >+++ linux-2.6.git/include/asm-generic/pgtable.h
>> >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> > {
>> > 	return pmd;
>> > }
>> >+
>> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >+{
>> >+	return pte;
>> >+}
>> >+
>> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >+{
>> >+	return 0;
>> >+}
>> >+
>> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >+{
>> >+	return pte;
>> >+}
>> > #endif
>> >
>> > #ifndef __HAVE_PFNMAP_TRACKING
>> >Index: linux-2.6.git/include/linux/swapops.h
>> >===================================================================
>> >--- linux-2.6.git.orig/include/linux/swapops.h
>> >+++ linux-2.6.git/include/linux/swapops.h
>> >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
>> > 	swp_entry_t arch_entry;
>> >
>> > 	BUG_ON(pte_file(pte));
>> >+	if (pte_swp_soft_dirty(pte))
>> >+		pte = pte_swp_clear_soft_dirty(pte);
>> > 	arch_entry = __pte_to_swp_entry(pte);
>> > 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
>> > }
>> >Index: linux-2.6.git/mm/memory.c
>> >===================================================================
>> >--- linux-2.6.git.orig/mm/memory.c
>> >+++ linux-2.6.git/mm/memory.c
>> >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
>> > 		exclusive = 1;
>> > 	}
>> > 	flush_icache_page(vma, page);
>> >+	if (pte_swp_soft_dirty(orig_pte))
>> >+		pte = pte_mksoft_dirty(pte);
>> 
>> entry = pte_to_swp_entry(orig_pte);
>> orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared. 
>
>You seem to walk same way with me.
>Please look at my stupid questions in this thread.
>

I see your discussion with Cyrill, however, pte_to_swp_entry and pte_swp_soft_dirty
both against orig_pte, where I miss? ;-)

>> 
>> > 	set_pte_at(mm, address, page_table, pte);
>> > 	if (page == swapcache)
>> > 		do_page_add_anon_rmap(page, vma, address, exclusive);
>> >Index: linux-2.6.git/mm/rmap.c
>> >===================================================================
>> >--- linux-2.6.git.orig/mm/rmap.c
>> >+++ linux-2.6.git/mm/rmap.c
>> >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
>> > 			   swp_entry_to_pte(make_hwpoison_entry(page)));
>> > 	} else if (PageAnon(page)) {
>> > 		swp_entry_t entry = { .val = page_private(page) };
>> >+		pte_t swp_pte;
>> >
>> > 		if (PageSwapCache(page)) {
>> > 			/*
>> >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
>> > 			BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
>> > 			entry = make_migration_entry(page, pte_write(pteval));
>> > 		}
>> >-		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>> >+		swp_pte = swp_entry_to_pte(entry);
>> >+		if (pte_soft_dirty(pteval))
>> >+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
>> >+		set_pte_at(mm, address, pte, swp_pte);
>> > 		BUG_ON(pte_file(*pte));
>> > 	} else if (IS_ENABLED(CONFIG_MIGRATION) &&
>> > 		   (TTU_ACTION(flags) == TTU_MIGRATION)) {
>> >Index: linux-2.6.git/mm/swapfile.c
>> >===================================================================
>> >--- linux-2.6.git.orig/mm/swapfile.c
>> >+++ linux-2.6.git/mm/swapfile.c
>> >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
>> > }
>> > #endif /* CONFIG_HIBERNATION */
>> >
>> >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
>> >+{
>> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >+	/*
>> >+	 * When pte keeps soft dirty bit the pte generated
>> >+	 * from swap entry does not has it, still it's same
>> >+	 * pte from logical point of view.
>> >+	 */
>> >+	pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
>> >+	return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
>> >+#else
>> >+	return pte_same(pte, swp_pte);
>> >+#endif
>> >+}
>> >+
>> > /*
>> >  * No need to decide whether this PTE shares the swap entry with others,
>> >  * just let do_wp_page work it out if a write is requested later - to
>> >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
>> > 	}
>> >
>> > 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> >-	if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
>> >+	if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
>> > 		mem_cgroup_cancel_charge_swapin(memcg);
>> > 		ret = 0;
>> > 		goto out;
>> >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
>> > 		 * swapoff spends a _lot_ of time in this loop!
>> > 		 * Test inline before going to call unuse_pte.
>> > 		 */
>> >-		if (unlikely(pte_same(*pte, swp_pte))) {
>> >+		if (unlikely(maybe_same_pte(*pte, swp_pte))) {
>> > 			pte_unmap(pte);
>> > 			ret = unuse_pte(vma, pmd, addr, entry, page);
>> > 			if (ret)
>> >
>> >--
>> >To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >the body to majordomo@kvack.org.  For more info on Linux MM,
>> >see: http://www.linux-mm.org/ .
>> >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> 
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
>-- 
>Kind regards,
>Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-08-05  2:17     ` Minchan Kim
  2013-08-05  2:38       ` Wanpeng Li
@ 2013-08-05  2:38       ` Wanpeng Li
       [not found]       ` <51ff1053.ab47310a.5d3f.566cSMTPIN_ADDED_BROKEN@mx.google.com>
  2 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-08-05  2:38 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
	akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar

Hi Minchan,

On Mon, Aug 05, 2013 at 11:17:15AM +0900, Minchan Kim wrote:
>Hello Wanpeng,
>
>On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote:
>> On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
>> >Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY
>> >bit set get swapped out, the bit is getting lost and no longer
>> >available when pte read back.
>> >
>> >To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is
>> >saved in pte entry for the page being swapped out. When such page
>> >is to be read back from a swap cache we check for bit presence
>> >and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
>> >bit back.
>> >
>> >One of the problem was to find a place in pte entry where we can
>> >save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
>> >_PAGE_PSE was chosen for that, it doesn't intersect with swap
>> >entry format stored in pte.
>> >
>> >Reported-by: Andy Lutomirski <luto@amacapital.net>
>> >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>> >Cc: Pavel Emelyanov <xemul@parallels.com>
>> >Cc: Andrew Morton <akpm@linux-foundation.org>
>> >Cc: Matt Mackall <mpm@selenic.com>
>> >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> >Cc: Marcelo Tosatti <mtosatti@redhat.com>
>> >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>> >Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>> >Cc: Peter Zijlstra <peterz@infradead.org>
>> >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >---
>> > arch/x86/include/asm/pgtable.h       |   15 +++++++++++++++
>> > arch/x86/include/asm/pgtable_types.h |   13 +++++++++++++
>> > fs/proc/task_mmu.c                   |   21 +++++++++++++++------
>> > include/asm-generic/pgtable.h        |   15 +++++++++++++++
>> > include/linux/swapops.h              |    2 ++
>> > mm/memory.c                          |    2 ++
>> > mm/rmap.c                            |    6 +++++-
>> > mm/swapfile.c                        |   19 +++++++++++++++++--
>> > 8 files changed, 84 insertions(+), 9 deletions(-)
>> >
>> >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >===================================================================
>> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
>> >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> > 	return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
>> > }
>> >
>> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >+{
>> >+	return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >+}
>> >+
>> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >+{
>> >+	return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
>> >+}
>> >+
>> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >+{
>> >+	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >+}
>> >+
>> > /*
>> >  * Mask out unsupported bits in a present pgprot.  Non-present pgprots
>> >  * can use those bits for other purposes, so leave them be.
>> >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >===================================================================
>> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
>> >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >@@ -67,6 +67,19 @@
>> > #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 0))
>> > #endif
>> >
>> >+/*
>> >+ * Tracking soft dirty bit when a page goes to a swap is tricky.
>> >+ * We need a bit which can be stored in pte _and_ not conflict
>> >+ * with swap entry format. On x86 bits 6 and 7 are *not* involved
>> >+ * into swap entry computation, but bit 6 is used for nonlinear
>> >+ * file mapping, so we borrow bit 7 for soft dirty tracking.
>> >+ */
>> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >+#define _PAGE_SWP_SOFT_DIRTY	_PAGE_PSE
>> >+#else
>> >+#define _PAGE_SWP_SOFT_DIRTY	(_AT(pteval_t, 0))
>> >+#endif
>> >+
>> > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
>> > #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
>> > #else
>> >Index: linux-2.6.git/fs/proc/task_mmu.c
>> >===================================================================
>> >--- linux-2.6.git.orig/fs/proc/task_mmu.c
>> >+++ linux-2.6.git/fs/proc/task_mmu.c
>> >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
>> > 	 * of how soft-dirty works.
>> > 	 */
>> > 	pte_t ptent = *pte;
>> >-	ptent = pte_wrprotect(ptent);
>> >-	ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >+
>> >+	if (pte_present(ptent)) {
>> >+		ptent = pte_wrprotect(ptent);
>> >+		ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >+	} else if (is_swap_pte(ptent)) {
>> >+		ptent = pte_swp_clear_soft_dirty(ptent);
>> >+	}
>> >+
>> > 	set_pte_at(vma->vm_mm, addr, pte, ptent);
>> > #endif
>> > }
>> >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
>> > 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> > 	for (; addr != end; pte++, addr += PAGE_SIZE) {
>> > 		ptent = *pte;
>> >-		if (!pte_present(ptent))
>> >-			continue;
>> >
>> > 		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
>> > 			clear_soft_dirty(vma, addr, pte);
>> > 			continue;
>> > 		}
>> >
>> >+		if (!pte_present(ptent))
>> >+			continue;
>> >+
>> > 		page = vm_normal_page(vma, addr, ptent);
>> > 		if (!page)
>> > 			continue;
>> >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
>> > 		flags = PM_PRESENT;
>> > 		page = vm_normal_page(vma, addr, pte);
>> > 	} else if (is_swap_pte(pte)) {
>> >-		swp_entry_t entry = pte_to_swp_entry(pte);
>> >-
>> >+		swp_entry_t entry;
>> >+		if (pte_swp_soft_dirty(pte))
>> >+			flags2 |= __PM_SOFT_DIRTY;
>> >+		entry = pte_to_swp_entry(pte);
>> > 		frame = swp_type(entry) |
>> > 			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
>> > 		flags = PM_SWAP;
>> >Index: linux-2.6.git/include/asm-generic/pgtable.h
>> >===================================================================
>> >--- linux-2.6.git.orig/include/asm-generic/pgtable.h
>> >+++ linux-2.6.git/include/asm-generic/pgtable.h
>> >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> > {
>> > 	return pmd;
>> > }
>> >+
>> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >+{
>> >+	return pte;
>> >+}
>> >+
>> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >+{
>> >+	return 0;
>> >+}
>> >+
>> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >+{
>> >+	return pte;
>> >+}
>> > #endif
>> >
>> > #ifndef __HAVE_PFNMAP_TRACKING
>> >Index: linux-2.6.git/include/linux/swapops.h
>> >===================================================================
>> >--- linux-2.6.git.orig/include/linux/swapops.h
>> >+++ linux-2.6.git/include/linux/swapops.h
>> >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
>> > 	swp_entry_t arch_entry;
>> >
>> > 	BUG_ON(pte_file(pte));
>> >+	if (pte_swp_soft_dirty(pte))
>> >+		pte = pte_swp_clear_soft_dirty(pte);
>> > 	arch_entry = __pte_to_swp_entry(pte);
>> > 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
>> > }
>> >Index: linux-2.6.git/mm/memory.c
>> >===================================================================
>> >--- linux-2.6.git.orig/mm/memory.c
>> >+++ linux-2.6.git/mm/memory.c
>> >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
>> > 		exclusive = 1;
>> > 	}
>> > 	flush_icache_page(vma, page);
>> >+	if (pte_swp_soft_dirty(orig_pte))
>> >+		pte = pte_mksoft_dirty(pte);
>> 
>> entry = pte_to_swp_entry(orig_pte);
>> orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared. 
>
>You seem to walk same way with me.
>Please look at my stupid questions in this thread.
>

I see your discussion with Cyrill, however, pte_to_swp_entry and pte_swp_soft_dirty
both against orig_pte, where I miss? ;-)

>> 
>> > 	set_pte_at(mm, address, page_table, pte);
>> > 	if (page == swapcache)
>> > 		do_page_add_anon_rmap(page, vma, address, exclusive);
>> >Index: linux-2.6.git/mm/rmap.c
>> >===================================================================
>> >--- linux-2.6.git.orig/mm/rmap.c
>> >+++ linux-2.6.git/mm/rmap.c
>> >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
>> > 			   swp_entry_to_pte(make_hwpoison_entry(page)));
>> > 	} else if (PageAnon(page)) {
>> > 		swp_entry_t entry = { .val = page_private(page) };
>> >+		pte_t swp_pte;
>> >
>> > 		if (PageSwapCache(page)) {
>> > 			/*
>> >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
>> > 			BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
>> > 			entry = make_migration_entry(page, pte_write(pteval));
>> > 		}
>> >-		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>> >+		swp_pte = swp_entry_to_pte(entry);
>> >+		if (pte_soft_dirty(pteval))
>> >+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
>> >+		set_pte_at(mm, address, pte, swp_pte);
>> > 		BUG_ON(pte_file(*pte));
>> > 	} else if (IS_ENABLED(CONFIG_MIGRATION) &&
>> > 		   (TTU_ACTION(flags) == TTU_MIGRATION)) {
>> >Index: linux-2.6.git/mm/swapfile.c
>> >===================================================================
>> >--- linux-2.6.git.orig/mm/swapfile.c
>> >+++ linux-2.6.git/mm/swapfile.c
>> >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
>> > }
>> > #endif /* CONFIG_HIBERNATION */
>> >
>> >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
>> >+{
>> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >+	/*
>> >+	 * When pte keeps soft dirty bit the pte generated
>> >+	 * from swap entry does not has it, still it's same
>> >+	 * pte from logical point of view.
>> >+	 */
>> >+	pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
>> >+	return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
>> >+#else
>> >+	return pte_same(pte, swp_pte);
>> >+#endif
>> >+}
>> >+
>> > /*
>> >  * No need to decide whether this PTE shares the swap entry with others,
>> >  * just let do_wp_page work it out if a write is requested later - to
>> >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
>> > 	}
>> >
>> > 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> >-	if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
>> >+	if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
>> > 		mem_cgroup_cancel_charge_swapin(memcg);
>> > 		ret = 0;
>> > 		goto out;
>> >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
>> > 		 * swapoff spends a _lot_ of time in this loop!
>> > 		 * Test inline before going to call unuse_pte.
>> > 		 */
>> >-		if (unlikely(pte_same(*pte, swp_pte))) {
>> >+		if (unlikely(maybe_same_pte(*pte, swp_pte))) {
>> > 			pte_unmap(pte);
>> > 			ret = unuse_pte(vma, pmd, addr, entry, page);
>> > 			if (ret)
>> >
>> >--
>> >To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >the body to majordomo@kvack.org.  For more info on Linux MM,
>> >see: http://www.linux-mm.org/ .
>> >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> 
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
>-- 
>Kind regards,
>Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
       [not found]       ` <51ff1053.ab47310a.5d3f.566cSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2013-08-05  2:54         ` Minchan Kim
  2013-08-05  2:58           ` Wanpeng Li
                             ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Minchan Kim @ 2013-08-05  2:54 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
	akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar

On Mon, Aug 05, 2013 at 10:38:58AM +0800, Wanpeng Li wrote:
> Hi Minchan,
> 
> On Mon, Aug 05, 2013 at 11:17:15AM +0900, Minchan Kim wrote:
> >Hello Wanpeng,
> >
> >On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote:
> >> On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
> >> >Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY
> >> >bit set get swapped out, the bit is getting lost and no longer
> >> >available when pte read back.
> >> >
> >> >To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is
> >> >saved in pte entry for the page being swapped out. When such page
> >> >is to be read back from a swap cache we check for bit presence
> >> >and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
> >> >bit back.
> >> >
> >> >One of the problem was to find a place in pte entry where we can
> >> >save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
> >> >_PAGE_PSE was chosen for that, it doesn't intersect with swap
> >> >entry format stored in pte.
> >> >
> >> >Reported-by: Andy Lutomirski <luto@amacapital.net>
> >> >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> >> >Cc: Pavel Emelyanov <xemul@parallels.com>
> >> >Cc: Andrew Morton <akpm@linux-foundation.org>
> >> >Cc: Matt Mackall <mpm@selenic.com>
> >> >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> >> >Cc: Marcelo Tosatti <mtosatti@redhat.com>
> >> >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> >> >Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> >> >Cc: Peter Zijlstra <peterz@infradead.org>
> >> >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> >---
> >> > arch/x86/include/asm/pgtable.h       |   15 +++++++++++++++
> >> > arch/x86/include/asm/pgtable_types.h |   13 +++++++++++++
> >> > fs/proc/task_mmu.c                   |   21 +++++++++++++++------
> >> > include/asm-generic/pgtable.h        |   15 +++++++++++++++
> >> > include/linux/swapops.h              |    2 ++
> >> > mm/memory.c                          |    2 ++
> >> > mm/rmap.c                            |    6 +++++-
> >> > mm/swapfile.c                        |   19 +++++++++++++++++--
> >> > 8 files changed, 84 insertions(+), 9 deletions(-)
> >> >
> >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
> >> >===================================================================
> >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
> >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
> >> >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> >> > 	return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
> >> > }
> >> >
> >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
> >> >+{
> >> >+	return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> >> >+}
> >> >+
> >> >+static inline int pte_swp_soft_dirty(pte_t pte)
> >> >+{
> >> >+	return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
> >> >+}
> >> >+
> >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> >> >+{
> >> >+	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> >> >+}
> >> >+
> >> > /*
> >> >  * Mask out unsupported bits in a present pgprot.  Non-present pgprots
> >> >  * can use those bits for other purposes, so leave them be.
> >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
> >> >===================================================================
> >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
> >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
> >> >@@ -67,6 +67,19 @@
> >> > #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 0))
> >> > #endif
> >> >
> >> >+/*
> >> >+ * Tracking soft dirty bit when a page goes to a swap is tricky.
> >> >+ * We need a bit which can be stored in pte _and_ not conflict
> >> >+ * with swap entry format. On x86 bits 6 and 7 are *not* involved
> >> >+ * into swap entry computation, but bit 6 is used for nonlinear
> >> >+ * file mapping, so we borrow bit 7 for soft dirty tracking.
> >> >+ */
> >> >+#ifdef CONFIG_MEM_SOFT_DIRTY
> >> >+#define _PAGE_SWP_SOFT_DIRTY	_PAGE_PSE
> >> >+#else
> >> >+#define _PAGE_SWP_SOFT_DIRTY	(_AT(pteval_t, 0))
> >> >+#endif
> >> >+
> >> > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
> >> > #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
> >> > #else
> >> >Index: linux-2.6.git/fs/proc/task_mmu.c
> >> >===================================================================
> >> >--- linux-2.6.git.orig/fs/proc/task_mmu.c
> >> >+++ linux-2.6.git/fs/proc/task_mmu.c
> >> >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
> >> > 	 * of how soft-dirty works.
> >> > 	 */
> >> > 	pte_t ptent = *pte;
> >> >-	ptent = pte_wrprotect(ptent);
> >> >-	ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
> >> >+
> >> >+	if (pte_present(ptent)) {
> >> >+		ptent = pte_wrprotect(ptent);
> >> >+		ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
> >> >+	} else if (is_swap_pte(ptent)) {
> >> >+		ptent = pte_swp_clear_soft_dirty(ptent);
> >> >+	}
> >> >+
> >> > 	set_pte_at(vma->vm_mm, addr, pte, ptent);
> >> > #endif
> >> > }
> >> >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
> >> > 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> >> > 	for (; addr != end; pte++, addr += PAGE_SIZE) {
> >> > 		ptent = *pte;
> >> >-		if (!pte_present(ptent))
> >> >-			continue;
> >> >
> >> > 		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
> >> > 			clear_soft_dirty(vma, addr, pte);
> >> > 			continue;
> >> > 		}
> >> >
> >> >+		if (!pte_present(ptent))
> >> >+			continue;
> >> >+
> >> > 		page = vm_normal_page(vma, addr, ptent);
> >> > 		if (!page)
> >> > 			continue;
> >> >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
> >> > 		flags = PM_PRESENT;
> >> > 		page = vm_normal_page(vma, addr, pte);
> >> > 	} else if (is_swap_pte(pte)) {
> >> >-		swp_entry_t entry = pte_to_swp_entry(pte);
> >> >-
> >> >+		swp_entry_t entry;
> >> >+		if (pte_swp_soft_dirty(pte))
> >> >+			flags2 |= __PM_SOFT_DIRTY;
> >> >+		entry = pte_to_swp_entry(pte);
> >> > 		frame = swp_type(entry) |
> >> > 			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
> >> > 		flags = PM_SWAP;
> >> >Index: linux-2.6.git/include/asm-generic/pgtable.h
> >> >===================================================================
> >> >--- linux-2.6.git.orig/include/asm-generic/pgtable.h
> >> >+++ linux-2.6.git/include/asm-generic/pgtable.h
> >> >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> >> > {
> >> > 	return pmd;
> >> > }
> >> >+
> >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
> >> >+{
> >> >+	return pte;
> >> >+}
> >> >+
> >> >+static inline int pte_swp_soft_dirty(pte_t pte)
> >> >+{
> >> >+	return 0;
> >> >+}
> >> >+
> >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> >> >+{
> >> >+	return pte;
> >> >+}
> >> > #endif
> >> >
> >> > #ifndef __HAVE_PFNMAP_TRACKING
> >> >Index: linux-2.6.git/include/linux/swapops.h
> >> >===================================================================
> >> >--- linux-2.6.git.orig/include/linux/swapops.h
> >> >+++ linux-2.6.git/include/linux/swapops.h
> >> >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
> >> > 	swp_entry_t arch_entry;
> >> >
> >> > 	BUG_ON(pte_file(pte));
> >> >+	if (pte_swp_soft_dirty(pte))
> >> >+		pte = pte_swp_clear_soft_dirty(pte);
> >> > 	arch_entry = __pte_to_swp_entry(pte);
> >> > 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
> >> > }
> >> >Index: linux-2.6.git/mm/memory.c
> >> >===================================================================
> >> >--- linux-2.6.git.orig/mm/memory.c
> >> >+++ linux-2.6.git/mm/memory.c
> >> >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
> >> > 		exclusive = 1;
> >> > 	}
> >> > 	flush_icache_page(vma, page);
> >> >+	if (pte_swp_soft_dirty(orig_pte))
> >> >+		pte = pte_mksoft_dirty(pte);
> >> 
> >> entry = pte_to_swp_entry(orig_pte);
> >> orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared. 
> >
> >You seem to walk same way with me.
> >Please look at my stupid questions in this thread.
> >
> 
> I see your discussion with Cyrill, however, pte_to_swp_entry and pte_swp_soft_dirty
> both against orig_pte, where I miss? ;-)

pte_to_swp_entry is passed orig_pte by vaule, not a pointer
so although pte_to_swp_entry clear out _PTE_SWP_SOFT_DIRTY, it does it in local-copy.
So orig_pte is never changed.

> 
> >> 
> >> > 	set_pte_at(mm, address, page_table, pte);
> >> > 	if (page == swapcache)
> >> > 		do_page_add_anon_rmap(page, vma, address, exclusive);
> >> >Index: linux-2.6.git/mm/rmap.c
> >> >===================================================================
> >> >--- linux-2.6.git.orig/mm/rmap.c
> >> >+++ linux-2.6.git/mm/rmap.c
> >> >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
> >> > 			   swp_entry_to_pte(make_hwpoison_entry(page)));
> >> > 	} else if (PageAnon(page)) {
> >> > 		swp_entry_t entry = { .val = page_private(page) };
> >> >+		pte_t swp_pte;
> >> >
> >> > 		if (PageSwapCache(page)) {
> >> > 			/*
> >> >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
> >> > 			BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
> >> > 			entry = make_migration_entry(page, pte_write(pteval));
> >> > 		}
> >> >-		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
> >> >+		swp_pte = swp_entry_to_pte(entry);
> >> >+		if (pte_soft_dirty(pteval))
> >> >+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
> >> >+		set_pte_at(mm, address, pte, swp_pte);
> >> > 		BUG_ON(pte_file(*pte));
> >> > 	} else if (IS_ENABLED(CONFIG_MIGRATION) &&
> >> > 		   (TTU_ACTION(flags) == TTU_MIGRATION)) {
> >> >Index: linux-2.6.git/mm/swapfile.c
> >> >===================================================================
> >> >--- linux-2.6.git.orig/mm/swapfile.c
> >> >+++ linux-2.6.git/mm/swapfile.c
> >> >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
> >> > }
> >> > #endif /* CONFIG_HIBERNATION */
> >> >
> >> >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
> >> >+{
> >> >+#ifdef CONFIG_MEM_SOFT_DIRTY
> >> >+	/*
> >> >+	 * When pte keeps soft dirty bit the pte generated
> >> >+	 * from swap entry does not has it, still it's same
> >> >+	 * pte from logical point of view.
> >> >+	 */
> >> >+	pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
> >> >+	return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
> >> >+#else
> >> >+	return pte_same(pte, swp_pte);
> >> >+#endif
> >> >+}
> >> >+
> >> > /*
> >> >  * No need to decide whether this PTE shares the swap entry with others,
> >> >  * just let do_wp_page work it out if a write is requested later - to
> >> >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
> >> > 	}
> >> >
> >> > 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> >> >-	if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
> >> >+	if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
> >> > 		mem_cgroup_cancel_charge_swapin(memcg);
> >> > 		ret = 0;
> >> > 		goto out;
> >> >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
> >> > 		 * swapoff spends a _lot_ of time in this loop!
> >> > 		 * Test inline before going to call unuse_pte.
> >> > 		 */
> >> >-		if (unlikely(pte_same(*pte, swp_pte))) {
> >> >+		if (unlikely(maybe_same_pte(*pte, swp_pte))) {
> >> > 			pte_unmap(pte);
> >> > 			ret = unuse_pte(vma, pmd, addr, entry, page);
> >> > 			if (ret)
> >> >
> >> >--
> >> >To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> >the body to majordomo@kvack.org.  For more info on Linux MM,
> >> >see: http://www.linux-mm.org/ .
> >> >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >> 
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majordomo@kvack.org.  For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >
> >-- 
> >Kind regards,
> >Minchan Kim
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-08-05  2:54         ` Minchan Kim
@ 2013-08-05  2:58           ` Wanpeng Li
  2013-08-05  2:58           ` Wanpeng Li
       [not found]           ` <51ff14e9.87ef440a.1424.ffffe470SMTPIN_ADDED_BROKEN@mx.google.com>
  2 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-08-05  2:58 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
	akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar

On Mon, Aug 05, 2013 at 11:54:37AM +0900, Minchan Kim wrote:
>On Mon, Aug 05, 2013 at 10:38:58AM +0800, Wanpeng Li wrote:
>> Hi Minchan,
>> 
>> On Mon, Aug 05, 2013 at 11:17:15AM +0900, Minchan Kim wrote:
>> >Hello Wanpeng,
>> >
>> >On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote:
>> >> On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
>> >> >Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY
>> >> >bit set get swapped out, the bit is getting lost and no longer
>> >> >available when pte read back.
>> >> >
>> >> >To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is
>> >> >saved in pte entry for the page being swapped out. When such page
>> >> >is to be read back from a swap cache we check for bit presence
>> >> >and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
>> >> >bit back.
>> >> >
>> >> >One of the problem was to find a place in pte entry where we can
>> >> >save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
>> >> >_PAGE_PSE was chosen for that, it doesn't intersect with swap
>> >> >entry format stored in pte.
>> >> >
>> >> >Reported-by: Andy Lutomirski <luto@amacapital.net>
>> >> >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>> >> >Cc: Pavel Emelyanov <xemul@parallels.com>
>> >> >Cc: Andrew Morton <akpm@linux-foundation.org>
>> >> >Cc: Matt Mackall <mpm@selenic.com>
>> >> >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> >> >Cc: Marcelo Tosatti <mtosatti@redhat.com>
>> >> >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>> >> >Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>> >> >Cc: Peter Zijlstra <peterz@infradead.org>
>> >> >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >> >---
>> >> > arch/x86/include/asm/pgtable.h       |   15 +++++++++++++++
>> >> > arch/x86/include/asm/pgtable_types.h |   13 +++++++++++++
>> >> > fs/proc/task_mmu.c                   |   21 +++++++++++++++------
>> >> > include/asm-generic/pgtable.h        |   15 +++++++++++++++
>> >> > include/linux/swapops.h              |    2 ++
>> >> > mm/memory.c                          |    2 ++
>> >> > mm/rmap.c                            |    6 +++++-
>> >> > mm/swapfile.c                        |   19 +++++++++++++++++--
>> >> > 8 files changed, 84 insertions(+), 9 deletions(-)
>> >> >
>> >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
>> >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >> >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> >> > 	return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
>> >> > }
>> >> >
>> >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >> >+{
>> >> >+	return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >> >+}
>> >> >+
>> >> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+	return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
>> >> >+}
>> >> >+
>> >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >> >+}
>> >> >+
>> >> > /*
>> >> >  * Mask out unsupported bits in a present pgprot.  Non-present pgprots
>> >> >  * can use those bits for other purposes, so leave them be.
>> >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
>> >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >> >@@ -67,6 +67,19 @@
>> >> > #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 0))
>> >> > #endif
>> >> >
>> >> >+/*
>> >> >+ * Tracking soft dirty bit when a page goes to a swap is tricky.
>> >> >+ * We need a bit which can be stored in pte _and_ not conflict
>> >> >+ * with swap entry format. On x86 bits 6 and 7 are *not* involved
>> >> >+ * into swap entry computation, but bit 6 is used for nonlinear
>> >> >+ * file mapping, so we borrow bit 7 for soft dirty tracking.
>> >> >+ */
>> >> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >> >+#define _PAGE_SWP_SOFT_DIRTY	_PAGE_PSE
>> >> >+#else
>> >> >+#define _PAGE_SWP_SOFT_DIRTY	(_AT(pteval_t, 0))
>> >> >+#endif
>> >> >+
>> >> > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
>> >> > #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
>> >> > #else
>> >> >Index: linux-2.6.git/fs/proc/task_mmu.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/fs/proc/task_mmu.c
>> >> >+++ linux-2.6.git/fs/proc/task_mmu.c
>> >> >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
>> >> > 	 * of how soft-dirty works.
>> >> > 	 */
>> >> > 	pte_t ptent = *pte;
>> >> >-	ptent = pte_wrprotect(ptent);
>> >> >-	ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >> >+
>> >> >+	if (pte_present(ptent)) {
>> >> >+		ptent = pte_wrprotect(ptent);
>> >> >+		ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >> >+	} else if (is_swap_pte(ptent)) {
>> >> >+		ptent = pte_swp_clear_soft_dirty(ptent);
>> >> >+	}
>> >> >+
>> >> > 	set_pte_at(vma->vm_mm, addr, pte, ptent);
>> >> > #endif
>> >> > }
>> >> >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
>> >> > 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> >> > 	for (; addr != end; pte++, addr += PAGE_SIZE) {
>> >> > 		ptent = *pte;
>> >> >-		if (!pte_present(ptent))
>> >> >-			continue;
>> >> >
>> >> > 		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
>> >> > 			clear_soft_dirty(vma, addr, pte);
>> >> > 			continue;
>> >> > 		}
>> >> >
>> >> >+		if (!pte_present(ptent))
>> >> >+			continue;
>> >> >+
>> >> > 		page = vm_normal_page(vma, addr, ptent);
>> >> > 		if (!page)
>> >> > 			continue;
>> >> >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
>> >> > 		flags = PM_PRESENT;
>> >> > 		page = vm_normal_page(vma, addr, pte);
>> >> > 	} else if (is_swap_pte(pte)) {
>> >> >-		swp_entry_t entry = pte_to_swp_entry(pte);
>> >> >-
>> >> >+		swp_entry_t entry;
>> >> >+		if (pte_swp_soft_dirty(pte))
>> >> >+			flags2 |= __PM_SOFT_DIRTY;
>> >> >+		entry = pte_to_swp_entry(pte);
>> >> > 		frame = swp_type(entry) |
>> >> > 			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
>> >> > 		flags = PM_SWAP;
>> >> >Index: linux-2.6.git/include/asm-generic/pgtable.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/include/asm-generic/pgtable.h
>> >> >+++ linux-2.6.git/include/asm-generic/pgtable.h
>> >> >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> >> > {
>> >> > 	return pmd;
>> >> > }
>> >> >+
>> >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >> >+{
>> >> >+	return pte;
>> >> >+}
>> >> >+
>> >> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+	return 0;
>> >> >+}
>> >> >+
>> >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+	return pte;
>> >> >+}
>> >> > #endif
>> >> >
>> >> > #ifndef __HAVE_PFNMAP_TRACKING
>> >> >Index: linux-2.6.git/include/linux/swapops.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/include/linux/swapops.h
>> >> >+++ linux-2.6.git/include/linux/swapops.h
>> >> >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
>> >> > 	swp_entry_t arch_entry;
>> >> >
>> >> > 	BUG_ON(pte_file(pte));
>> >> >+	if (pte_swp_soft_dirty(pte))
>> >> >+		pte = pte_swp_clear_soft_dirty(pte);
>> >> > 	arch_entry = __pte_to_swp_entry(pte);
>> >> > 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
>> >> > }
>> >> >Index: linux-2.6.git/mm/memory.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/mm/memory.c
>> >> >+++ linux-2.6.git/mm/memory.c
>> >> >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
>> >> > 		exclusive = 1;
>> >> > 	}
>> >> > 	flush_icache_page(vma, page);
>> >> >+	if (pte_swp_soft_dirty(orig_pte))
>> >> >+		pte = pte_mksoft_dirty(pte);
>> >> 
>> >> entry = pte_to_swp_entry(orig_pte);
>> >> orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared. 
>> >
>> >You seem to walk same way with me.
>> >Please look at my stupid questions in this thread.
>> >
>> 
>> I see your discussion with Cyrill, however, pte_to_swp_entry and pte_swp_soft_dirty
>> both against orig_pte, where I miss? ;-)
>
>pte_to_swp_entry is passed orig_pte by vaule, not a pointer
>so although pte_to_swp_entry clear out _PTE_SWP_SOFT_DIRTY, it does it in local-copy.
>So orig_pte is never changed.

Ouch! Thanks for pointing out. ;-)

Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

>
>> 
>> >> 
>> >> > 	set_pte_at(mm, address, page_table, pte);
>> >> > 	if (page == swapcache)
>> >> > 		do_page_add_anon_rmap(page, vma, address, exclusive);
>> >> >Index: linux-2.6.git/mm/rmap.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/mm/rmap.c
>> >> >+++ linux-2.6.git/mm/rmap.c
>> >> >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
>> >> > 			   swp_entry_to_pte(make_hwpoison_entry(page)));
>> >> > 	} else if (PageAnon(page)) {
>> >> > 		swp_entry_t entry = { .val = page_private(page) };
>> >> >+		pte_t swp_pte;
>> >> >
>> >> > 		if (PageSwapCache(page)) {
>> >> > 			/*
>> >> >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
>> >> > 			BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
>> >> > 			entry = make_migration_entry(page, pte_write(pteval));
>> >> > 		}
>> >> >-		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>> >> >+		swp_pte = swp_entry_to_pte(entry);
>> >> >+		if (pte_soft_dirty(pteval))
>> >> >+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
>> >> >+		set_pte_at(mm, address, pte, swp_pte);
>> >> > 		BUG_ON(pte_file(*pte));
>> >> > 	} else if (IS_ENABLED(CONFIG_MIGRATION) &&
>> >> > 		   (TTU_ACTION(flags) == TTU_MIGRATION)) {
>> >> >Index: linux-2.6.git/mm/swapfile.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/mm/swapfile.c
>> >> >+++ linux-2.6.git/mm/swapfile.c
>> >> >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
>> >> > }
>> >> > #endif /* CONFIG_HIBERNATION */
>> >> >
>> >> >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
>> >> >+{
>> >> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >> >+	/*
>> >> >+	 * When pte keeps soft dirty bit the pte generated
>> >> >+	 * from swap entry does not has it, still it's same
>> >> >+	 * pte from logical point of view.
>> >> >+	 */
>> >> >+	pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
>> >> >+	return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
>> >> >+#else
>> >> >+	return pte_same(pte, swp_pte);
>> >> >+#endif
>> >> >+}
>> >> >+
>> >> > /*
>> >> >  * No need to decide whether this PTE shares the swap entry with others,
>> >> >  * just let do_wp_page work it out if a write is requested later - to
>> >> >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
>> >> > 	}
>> >> >
>> >> > 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> >> >-	if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
>> >> >+	if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
>> >> > 		mem_cgroup_cancel_charge_swapin(memcg);
>> >> > 		ret = 0;
>> >> > 		goto out;
>> >> >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
>> >> > 		 * swapoff spends a _lot_ of time in this loop!
>> >> > 		 * Test inline before going to call unuse_pte.
>> >> > 		 */
>> >> >-		if (unlikely(pte_same(*pte, swp_pte))) {
>> >> >+		if (unlikely(maybe_same_pte(*pte, swp_pte))) {
>> >> > 			pte_unmap(pte);
>> >> > 			ret = unuse_pte(vma, pmd, addr, entry, page);
>> >> > 			if (ret)
>> >> >
>> >> >--
>> >> >To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >> >the body to majordomo@kvack.org.  For more info on Linux MM,
>> >> >see: http://www.linux-mm.org/ .
>> >> >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >> 
>> >> --
>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >> see: http://www.linux-mm.org/ .
>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >
>> >-- 
>> >Kind regards,
>> >Minchan Kim
>> 
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
>-- 
>Kind regards,
>Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-08-05  2:54         ` Minchan Kim
  2013-08-05  2:58           ` Wanpeng Li
@ 2013-08-05  2:58           ` Wanpeng Li
       [not found]           ` <51ff14e9.87ef440a.1424.ffffe470SMTPIN_ADDED_BROKEN@mx.google.com>
  2 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-08-05  2:58 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
	akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar

On Mon, Aug 05, 2013 at 11:54:37AM +0900, Minchan Kim wrote:
>On Mon, Aug 05, 2013 at 10:38:58AM +0800, Wanpeng Li wrote:
>> Hi Minchan,
>> 
>> On Mon, Aug 05, 2013 at 11:17:15AM +0900, Minchan Kim wrote:
>> >Hello Wanpeng,
>> >
>> >On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote:
>> >> On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
>> >> >Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY
>> >> >bit set get swapped out, the bit is getting lost and no longer
>> >> >available when pte read back.
>> >> >
>> >> >To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is
>> >> >saved in pte entry for the page being swapped out. When such page
>> >> >is to be read back from a swap cache we check for bit presence
>> >> >and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
>> >> >bit back.
>> >> >
>> >> >One of the problem was to find a place in pte entry where we can
>> >> >save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
>> >> >_PAGE_PSE was chosen for that, it doesn't intersect with swap
>> >> >entry format stored in pte.
>> >> >
>> >> >Reported-by: Andy Lutomirski <luto@amacapital.net>
>> >> >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>> >> >Cc: Pavel Emelyanov <xemul@parallels.com>
>> >> >Cc: Andrew Morton <akpm@linux-foundation.org>
>> >> >Cc: Matt Mackall <mpm@selenic.com>
>> >> >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> >> >Cc: Marcelo Tosatti <mtosatti@redhat.com>
>> >> >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>> >> >Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>> >> >Cc: Peter Zijlstra <peterz@infradead.org>
>> >> >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >> >---
>> >> > arch/x86/include/asm/pgtable.h       |   15 +++++++++++++++
>> >> > arch/x86/include/asm/pgtable_types.h |   13 +++++++++++++
>> >> > fs/proc/task_mmu.c                   |   21 +++++++++++++++------
>> >> > include/asm-generic/pgtable.h        |   15 +++++++++++++++
>> >> > include/linux/swapops.h              |    2 ++
>> >> > mm/memory.c                          |    2 ++
>> >> > mm/rmap.c                            |    6 +++++-
>> >> > mm/swapfile.c                        |   19 +++++++++++++++++--
>> >> > 8 files changed, 84 insertions(+), 9 deletions(-)
>> >> >
>> >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
>> >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >> >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> >> > 	return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
>> >> > }
>> >> >
>> >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >> >+{
>> >> >+	return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >> >+}
>> >> >+
>> >> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+	return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
>> >> >+}
>> >> >+
>> >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >> >+}
>> >> >+
>> >> > /*
>> >> >  * Mask out unsupported bits in a present pgprot.  Non-present pgprots
>> >> >  * can use those bits for other purposes, so leave them be.
>> >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
>> >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >> >@@ -67,6 +67,19 @@
>> >> > #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 0))
>> >> > #endif
>> >> >
>> >> >+/*
>> >> >+ * Tracking soft dirty bit when a page goes to a swap is tricky.
>> >> >+ * We need a bit which can be stored in pte _and_ not conflict
>> >> >+ * with swap entry format. On x86 bits 6 and 7 are *not* involved
>> >> >+ * into swap entry computation, but bit 6 is used for nonlinear
>> >> >+ * file mapping, so we borrow bit 7 for soft dirty tracking.
>> >> >+ */
>> >> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >> >+#define _PAGE_SWP_SOFT_DIRTY	_PAGE_PSE
>> >> >+#else
>> >> >+#define _PAGE_SWP_SOFT_DIRTY	(_AT(pteval_t, 0))
>> >> >+#endif
>> >> >+
>> >> > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
>> >> > #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
>> >> > #else
>> >> >Index: linux-2.6.git/fs/proc/task_mmu.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/fs/proc/task_mmu.c
>> >> >+++ linux-2.6.git/fs/proc/task_mmu.c
>> >> >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
>> >> > 	 * of how soft-dirty works.
>> >> > 	 */
>> >> > 	pte_t ptent = *pte;
>> >> >-	ptent = pte_wrprotect(ptent);
>> >> >-	ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >> >+
>> >> >+	if (pte_present(ptent)) {
>> >> >+		ptent = pte_wrprotect(ptent);
>> >> >+		ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >> >+	} else if (is_swap_pte(ptent)) {
>> >> >+		ptent = pte_swp_clear_soft_dirty(ptent);
>> >> >+	}
>> >> >+
>> >> > 	set_pte_at(vma->vm_mm, addr, pte, ptent);
>> >> > #endif
>> >> > }
>> >> >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
>> >> > 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> >> > 	for (; addr != end; pte++, addr += PAGE_SIZE) {
>> >> > 		ptent = *pte;
>> >> >-		if (!pte_present(ptent))
>> >> >-			continue;
>> >> >
>> >> > 		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
>> >> > 			clear_soft_dirty(vma, addr, pte);
>> >> > 			continue;
>> >> > 		}
>> >> >
>> >> >+		if (!pte_present(ptent))
>> >> >+			continue;
>> >> >+
>> >> > 		page = vm_normal_page(vma, addr, ptent);
>> >> > 		if (!page)
>> >> > 			continue;
>> >> >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
>> >> > 		flags = PM_PRESENT;
>> >> > 		page = vm_normal_page(vma, addr, pte);
>> >> > 	} else if (is_swap_pte(pte)) {
>> >> >-		swp_entry_t entry = pte_to_swp_entry(pte);
>> >> >-
>> >> >+		swp_entry_t entry;
>> >> >+		if (pte_swp_soft_dirty(pte))
>> >> >+			flags2 |= __PM_SOFT_DIRTY;
>> >> >+		entry = pte_to_swp_entry(pte);
>> >> > 		frame = swp_type(entry) |
>> >> > 			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
>> >> > 		flags = PM_SWAP;
>> >> >Index: linux-2.6.git/include/asm-generic/pgtable.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/include/asm-generic/pgtable.h
>> >> >+++ linux-2.6.git/include/asm-generic/pgtable.h
>> >> >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> >> > {
>> >> > 	return pmd;
>> >> > }
>> >> >+
>> >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >> >+{
>> >> >+	return pte;
>> >> >+}
>> >> >+
>> >> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+	return 0;
>> >> >+}
>> >> >+
>> >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+	return pte;
>> >> >+}
>> >> > #endif
>> >> >
>> >> > #ifndef __HAVE_PFNMAP_TRACKING
>> >> >Index: linux-2.6.git/include/linux/swapops.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/include/linux/swapops.h
>> >> >+++ linux-2.6.git/include/linux/swapops.h
>> >> >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
>> >> > 	swp_entry_t arch_entry;
>> >> >
>> >> > 	BUG_ON(pte_file(pte));
>> >> >+	if (pte_swp_soft_dirty(pte))
>> >> >+		pte = pte_swp_clear_soft_dirty(pte);
>> >> > 	arch_entry = __pte_to_swp_entry(pte);
>> >> > 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
>> >> > }
>> >> >Index: linux-2.6.git/mm/memory.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/mm/memory.c
>> >> >+++ linux-2.6.git/mm/memory.c
>> >> >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
>> >> > 		exclusive = 1;
>> >> > 	}
>> >> > 	flush_icache_page(vma, page);
>> >> >+	if (pte_swp_soft_dirty(orig_pte))
>> >> >+		pte = pte_mksoft_dirty(pte);
>> >> 
>> >> entry = pte_to_swp_entry(orig_pte);
>> >> orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared. 
>> >
>> >You seem to walk same way with me.
>> >Please look at my stupid questions in this thread.
>> >
>> 
>> I see your discussion with Cyrill, however, pte_to_swp_entry and pte_swp_soft_dirty
>> both against orig_pte, where I miss? ;-)
>
>pte_to_swp_entry is passed orig_pte by vaule, not a pointer
>so although pte_to_swp_entry clear out _PTE_SWP_SOFT_DIRTY, it does it in local-copy.
>So orig_pte is never changed.

Ouch! Thanks for pointing out. ;-)

Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

>
>> 
>> >> 
>> >> > 	set_pte_at(mm, address, page_table, pte);
>> >> > 	if (page == swapcache)
>> >> > 		do_page_add_anon_rmap(page, vma, address, exclusive);
>> >> >Index: linux-2.6.git/mm/rmap.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/mm/rmap.c
>> >> >+++ linux-2.6.git/mm/rmap.c
>> >> >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
>> >> > 			   swp_entry_to_pte(make_hwpoison_entry(page)));
>> >> > 	} else if (PageAnon(page)) {
>> >> > 		swp_entry_t entry = { .val = page_private(page) };
>> >> >+		pte_t swp_pte;
>> >> >
>> >> > 		if (PageSwapCache(page)) {
>> >> > 			/*
>> >> >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
>> >> > 			BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
>> >> > 			entry = make_migration_entry(page, pte_write(pteval));
>> >> > 		}
>> >> >-		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>> >> >+		swp_pte = swp_entry_to_pte(entry);
>> >> >+		if (pte_soft_dirty(pteval))
>> >> >+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
>> >> >+		set_pte_at(mm, address, pte, swp_pte);
>> >> > 		BUG_ON(pte_file(*pte));
>> >> > 	} else if (IS_ENABLED(CONFIG_MIGRATION) &&
>> >> > 		   (TTU_ACTION(flags) == TTU_MIGRATION)) {
>> >> >Index: linux-2.6.git/mm/swapfile.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/mm/swapfile.c
>> >> >+++ linux-2.6.git/mm/swapfile.c
>> >> >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
>> >> > }
>> >> > #endif /* CONFIG_HIBERNATION */
>> >> >
>> >> >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
>> >> >+{
>> >> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >> >+	/*
>> >> >+	 * When pte keeps soft dirty bit the pte generated
>> >> >+	 * from swap entry does not has it, still it's same
>> >> >+	 * pte from logical point of view.
>> >> >+	 */
>> >> >+	pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
>> >> >+	return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
>> >> >+#else
>> >> >+	return pte_same(pte, swp_pte);
>> >> >+#endif
>> >> >+}
>> >> >+
>> >> > /*
>> >> >  * No need to decide whether this PTE shares the swap entry with others,
>> >> >  * just let do_wp_page work it out if a write is requested later - to
>> >> >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
>> >> > 	}
>> >> >
>> >> > 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> >> >-	if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
>> >> >+	if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
>> >> > 		mem_cgroup_cancel_charge_swapin(memcg);
>> >> > 		ret = 0;
>> >> > 		goto out;
>> >> >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
>> >> > 		 * swapoff spends a _lot_ of time in this loop!
>> >> > 		 * Test inline before going to call unuse_pte.
>> >> > 		 */
>> >> >-		if (unlikely(pte_same(*pte, swp_pte))) {
>> >> >+		if (unlikely(maybe_same_pte(*pte, swp_pte))) {
>> >> > 			pte_unmap(pte);
>> >> > 			ret = unuse_pte(vma, pmd, addr, entry, page);
>> >> > 			if (ret)
>> >> >
>> >> >--
>> >> >To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >> >the body to majordomo@kvack.org.  For more info on Linux MM,
>> >> >see: http://www.linux-mm.org/ .
>> >> >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >> 
>> >> --
>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >> see: http://www.linux-mm.org/ .
>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >
>> >-- 
>> >Kind regards,
>> >Minchan Kim
>> 
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
>-- 
>Kind regards,
>Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
       [not found]           ` <51ff14e9.87ef440a.1424.ffffe470SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2013-08-05  5:43             ` Cyrill Gorcunov
  0 siblings, 0 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-05  5:43 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Minchan Kim, linux-mm, linux-kernel, luto, xemul, akpm, mpm,
	xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar

On Mon, Aug 05, 2013 at 10:58:35AM +0800, Wanpeng Li wrote:
> >
> >pte_to_swp_entry is passed orig_pte by vaule, not a pointer
> >so although pte_to_swp_entry clear out _PTE_SWP_SOFT_DIRTY, it does it in local-copy.
> >So orig_pte is never changed.
> 
> Ouch! Thanks for pointing out. ;-)
> 
> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

Yeah, it's a bit tricky. Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
                     ` (4 preceding siblings ...)
       [not found]   ` <51ff047d.2768310a.2fc4.340fSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2013-08-07 20:21   ` Andrew Morton
  2013-08-07 20:29     ` Cyrill Gorcunov
  2013-08-10 17:48     ` James Bottomley
  5 siblings, 2 replies; 34+ messages in thread
From: Andrew Morton @ 2013-08-07 20:21 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-mm, linux-kernel, luto, gorcunov, xemul, mpm, xiaoguangrong,
	mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar

On Wed, 31 Jul 2013 00:41:55 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY
> bit set get swapped out, the bit is getting lost and no longer
> available when pte read back.
> 
> To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is
> saved in pte entry for the page being swapped out. When such page
> is to be read back from a swap cache we check for bit presence
> and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
> bit back.
> 
> One of the problem was to find a place in pte entry where we can
> save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
> _PAGE_PSE was chosen for that, it doesn't intersect with swap
> entry format stored in pte.

So the implication is that if another architecture wants to support
this (and, realistically, wants to support CRIU), that architecture
must find a spare pte bit to implement _PTE_SWP_SOFT_DIRTY.  Yes?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages
  2013-07-30 20:41 ` [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages Cyrill Gorcunov
  2013-07-31  8:16   ` Pavel Emelyanov
@ 2013-08-07 20:28   ` Andrew Morton
  2013-08-07 20:31     ` Cyrill Gorcunov
  2013-08-08 14:51     ` Cyrill Gorcunov
  1 sibling, 2 replies; 34+ messages in thread
From: Andrew Morton @ 2013-08-07 20:28 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-mm, linux-kernel, luto, gorcunov, xemul, mpm, xiaoguangrong,
	mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar

On Wed, 31 Jul 2013 00:41:56 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> +#define pte_to_pgoff(pte)						\
> +	((((pte).pte_low >> (PTE_FILE_SHIFT1))				\
> +	  & ((1U << PTE_FILE_BITS1) - 1)))				\
> +	+ ((((pte).pte_low >> (PTE_FILE_SHIFT2))			\
> +	    & ((1U << PTE_FILE_BITS2) - 1))				\
> +	   << (PTE_FILE_BITS1))						\
> +	+ ((((pte).pte_low >> (PTE_FILE_SHIFT3))			\
> +	    & ((1U << PTE_FILE_BITS3) - 1))				\
> +	   << (PTE_FILE_BITS1 + PTE_FILE_BITS2))			\
> +	+ ((((pte).pte_low >> (PTE_FILE_SHIFT4)))			\
> +	    << (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3))
> +
> +#define pgoff_to_pte(off)						\
> +	((pte_t) { .pte_low =						\
> +	 ((((off)) & ((1U << PTE_FILE_BITS1) - 1)) << PTE_FILE_SHIFT1)	\
> +	 + ((((off) >> PTE_FILE_BITS1)					\
> +	     & ((1U << PTE_FILE_BITS2) - 1))				\
> +	    << PTE_FILE_SHIFT2)						\
> +	 + ((((off) >> (PTE_FILE_BITS1 + PTE_FILE_BITS2))		\
> +	     & ((1U << PTE_FILE_BITS3) - 1))				\
> +	    << PTE_FILE_SHIFT3)						\
> +	 + ((((off) >>							\
> +	      (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)))	\
> +	    << PTE_FILE_SHIFT4)						\
> +	 + _PAGE_FILE })

Good god.

I wonder if these can be turned into out-of-line functions in some form
which humans can understand.

or

#define pte_to_pgoff(pte)
	frob(pte, PTE_FILE_SHIFT1, PTE_FILE_BITS1) +
	frob(PTE_FILE_SHIFT2, PTE_FILE_BITS2) +
	frob(PTE_FILE_SHIFT3, PTE_FILE_BITS3) +
	frob(PTE_FILE_SHIFT4, PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-08-07 20:21   ` Andrew Morton
@ 2013-08-07 20:29     ` Cyrill Gorcunov
  2013-08-10 17:48     ` James Bottomley
  1 sibling, 0 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-07 20:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, luto, xemul, mpm, xiaoguangrong, mtosatti,
	kosaki.motohiro, sfr, peterz, aneesh.kumar

On Wed, Aug 07, 2013 at 01:21:56PM -0700, Andrew Morton wrote:
> > 
> > One of the problem was to find a place in pte entry where we can
> > save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
> > _PAGE_PSE was chosen for that, it doesn't intersect with swap
> > entry format stored in pte.
> 
> So the implication is that if another architecture wants to support
> this (and, realistically, wants to support CRIU), that architecture
> must find a spare pte bit to implement _PTE_SWP_SOFT_DIRTY.  Yes?

Exactly.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages
  2013-08-07 20:28   ` Andrew Morton
@ 2013-08-07 20:31     ` Cyrill Gorcunov
  2013-08-08 14:51     ` Cyrill Gorcunov
  1 sibling, 0 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-07 20:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, luto, xemul, mpm, xiaoguangrong, mtosatti,
	kosaki.motohiro, sfr, peterz, aneesh.kumar

On Wed, Aug 07, 2013 at 01:28:12PM -0700, Andrew Morton wrote:
> 
> Good god.
> 
> I wonder if these can be turned into out-of-line functions in some form
> which humans can understand.
> 
> or
> 
> #define pte_to_pgoff(pte)
> 	frob(pte, PTE_FILE_SHIFT1, PTE_FILE_BITS1) +
> 	frob(PTE_FILE_SHIFT2, PTE_FILE_BITS2) +
> 	frob(PTE_FILE_SHIFT3, PTE_FILE_BITS3) +
> 	frob(PTE_FILE_SHIFT4, PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)

I copied this code from existing one, not mine invention ;)
I'll clean it up on top.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages
  2013-08-07 20:28   ` Andrew Morton
  2013-08-07 20:31     ` Cyrill Gorcunov
@ 2013-08-08 14:51     ` Cyrill Gorcunov
  2013-08-12 21:57       ` Andrew Morton
  1 sibling, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-08 14:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, luto, xemul, mpm, xiaoguangrong, mtosatti,
	kosaki.motohiro, sfr, peterz, aneesh.kumar

[-- Attachment #1: Type: text/plain, Size: 576 bytes --]

On Wed, Aug 07, 2013 at 01:28:12PM -0700, Andrew Morton wrote:
> 
> Good god.
> 
> I wonder if these can be turned into out-of-line functions in some form
> which humans can understand.
> 
> or
> 
> #define pte_to_pgoff(pte)
> 	frob(pte, PTE_FILE_SHIFT1, PTE_FILE_BITS1) +
> 	frob(PTE_FILE_SHIFT2, PTE_FILE_BITS2) +
> 	frob(PTE_FILE_SHIFT3, PTE_FILE_BITS3) +
> 	frob(PTE_FILE_SHIFT4, PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)

Hi, here is what I ended up with. Please take a look (I decided to post
patch in the thread since it's related to the context of the mails).

[-- Attachment #2: pte-sft-dirty-file-cleanup-2 --]
[-- Type: text/plain, Size: 5559 bytes --]

From: Cyrill Gorcunov <gorcunov@gmail.com>
Subject: mm: Cleanup pte_to_pgoff and pgoff_to_pte helpers

Andrew asked if there a way to make pte_to_pgoff
and pgoff_to_pte macro helpers somehow more readable.

With this patch it should be more understandable what
is happening with bits when they come to and from pte
entry.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
---

Guys, is there a reason for "if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE"
test present in this pgtable-2level.h file at all? I can't imagine
where it can be false on x86.

 arch/x86/include/asm/pgtable-2level.h |   82 +++++++++++++++++-----------------
 1 file changed, 41 insertions(+), 41 deletions(-)

Index: linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable-2level.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
@@ -55,6 +55,9 @@ static inline pmd_t native_pmdp_get_and_
 #define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp)
 #endif
 
+#define _mfrob(v,r,m,l)		((((v) >> (r)) & (m)) << (l))
+#define __frob(v,r,l)		(((v) >> (r)) << (l))
+
 #ifdef CONFIG_MEM_SOFT_DIRTY
 
 /*
@@ -71,31 +74,27 @@ static inline pmd_t native_pmdp_get_and_
 #define PTE_FILE_BITS2		(PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1)
 #define PTE_FILE_BITS3		(PTE_FILE_SHIFT4 - PTE_FILE_SHIFT3 - 1)
 
-#define pte_to_pgoff(pte)						\
-	((((pte).pte_low >> (PTE_FILE_SHIFT1))				\
-	  & ((1U << PTE_FILE_BITS1) - 1)))				\
-	+ ((((pte).pte_low >> (PTE_FILE_SHIFT2))			\
-	    & ((1U << PTE_FILE_BITS2) - 1))				\
-	   << (PTE_FILE_BITS1))						\
-	+ ((((pte).pte_low >> (PTE_FILE_SHIFT3))			\
-	    & ((1U << PTE_FILE_BITS3) - 1))				\
-	   << (PTE_FILE_BITS1 + PTE_FILE_BITS2))			\
-	+ ((((pte).pte_low >> (PTE_FILE_SHIFT4)))			\
-	    << (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3))
-
-#define pgoff_to_pte(off)						\
-	((pte_t) { .pte_low =						\
-	 ((((off)) & ((1U << PTE_FILE_BITS1) - 1)) << PTE_FILE_SHIFT1)	\
-	 + ((((off) >> PTE_FILE_BITS1)					\
-	     & ((1U << PTE_FILE_BITS2) - 1))				\
-	    << PTE_FILE_SHIFT2)						\
-	 + ((((off) >> (PTE_FILE_BITS1 + PTE_FILE_BITS2))		\
-	     & ((1U << PTE_FILE_BITS3) - 1))				\
-	    << PTE_FILE_SHIFT3)						\
-	 + ((((off) >>							\
-	      (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)))	\
-	    << PTE_FILE_SHIFT4)						\
-	 + _PAGE_FILE })
+#define PTE_FILE_MASK1		((1U << PTE_FILE_BITS1) - 1)
+#define PTE_FILE_MASK2		((1U << PTE_FILE_BITS2) - 1)
+#define PTE_FILE_MASK3		((1U << PTE_FILE_BITS3) - 1)
+
+#define PTE_FILE_LSHIFT2	(PTE_FILE_BITS1)
+#define PTE_FILE_LSHIFT3	(PTE_FILE_BITS1 + PTE_FILE_BITS2)
+#define PTE_FILE_LSHIFT4	(PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)
+
+#define pte_to_pgoff(pte)							    \
+	(_mfrob((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0)		  + \
+	 _mfrob((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \
+	 _mfrob((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_MASK3, PTE_FILE_LSHIFT3) + \
+	 __frob((pte).pte_low, PTE_FILE_SHIFT4, PTE_FILE_LSHIFT4))
+
+#define pgoff_to_pte(off)							\
+	((pte_t) { .pte_low =							\
+	_mfrob(off,                0, PTE_FILE_MASK1, PTE_FILE_SHIFT1)	+	\
+	_mfrob(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2)	+	\
+	_mfrob(off, PTE_FILE_LSHIFT3, PTE_FILE_MASK3, PTE_FILE_SHIFT3)	+	\
+	__frob(off, PTE_FILE_LSHIFT4, PTE_FILE_SHIFT4)			+	\
+	_PAGE_FILE })
 
 #else /* CONFIG_MEM_SOFT_DIRTY */
 
@@ -115,22 +114,23 @@ static inline pmd_t native_pmdp_get_and_
 #define PTE_FILE_BITS1		(PTE_FILE_SHIFT2 - PTE_FILE_SHIFT1 - 1)
 #define PTE_FILE_BITS2		(PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1)
 
-#define pte_to_pgoff(pte)						\
-	((((pte).pte_low >> PTE_FILE_SHIFT1)				\
-	  & ((1U << PTE_FILE_BITS1) - 1))				\
-	 + ((((pte).pte_low >> PTE_FILE_SHIFT2)				\
-	     & ((1U << PTE_FILE_BITS2) - 1)) << PTE_FILE_BITS1)		\
-	 + (((pte).pte_low >> PTE_FILE_SHIFT3)				\
-	    << (PTE_FILE_BITS1 + PTE_FILE_BITS2)))
-
-#define pgoff_to_pte(off)						\
-	((pte_t) { .pte_low =						\
-	 (((off) & ((1U << PTE_FILE_BITS1) - 1)) << PTE_FILE_SHIFT1)	\
-	 + ((((off) >> PTE_FILE_BITS1) & ((1U << PTE_FILE_BITS2) - 1))	\
-	    << PTE_FILE_SHIFT2)						\
-	 + (((off) >> (PTE_FILE_BITS1 + PTE_FILE_BITS2))		\
-	    << PTE_FILE_SHIFT3)						\
-	 + _PAGE_FILE })
+#define PTE_FILE_MASK1		((1U << PTE_FILE_BITS1) - 1)
+#define PTE_FILE_MASK2		((1U << PTE_FILE_BITS2) - 1)
+
+#define PTE_FILE_LSHIFT2	(PTE_FILE_BITS1)
+#define PTE_FILE_LSHIFT3	(PTE_FILE_BITS1 + PTE_FILE_BITS2)
+
+#define pte_to_pgoff(pte)							    \
+	(_mfrob((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0)		  + \
+	 _mfrob((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \
+	 __frob((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_LSHIFT3))
+
+#define pgoff_to_pte(off)							\
+	((pte_t) { .pte_low =							\
+	_mfrob(off,                0, PTE_FILE_MASK1, PTE_FILE_SHIFT1)	+	\
+	_mfrob(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2)	+	\
+	__frob(off, PTE_FILE_LSHIFT3, PTE_FILE_SHIFT3)			+	\
+	_PAGE_FILE })
 
 #endif /* CONFIG_MEM_SOFT_DIRTY */
 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
  2013-08-07 20:21   ` Andrew Morton
  2013-08-07 20:29     ` Cyrill Gorcunov
@ 2013-08-10 17:48     ` James Bottomley
  1 sibling, 0 replies; 34+ messages in thread
From: James Bottomley @ 2013-08-10 17:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
	mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar

On Wed, 2013-08-07 at 13:21 -0700, Andrew Morton wrote:
> On Wed, 31 Jul 2013 00:41:55 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> 
> > Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY
> > bit set get swapped out, the bit is getting lost and no longer
> > available when pte read back.
> > 
> > To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is
> > saved in pte entry for the page being swapped out. When such page
> > is to be read back from a swap cache we check for bit presence
> > and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
> > bit back.
> > 
> > One of the problem was to find a place in pte entry where we can
> > save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The
> > _PAGE_PSE was chosen for that, it doesn't intersect with swap
> > entry format stored in pte.
> 
> So the implication is that if another architecture wants to support
> this (and, realistically, wants to support CRIU),

To be clear, CRIU is usable for basic checkpoint/restore without soft
dirty.  It's using CRIU as an engine for process migration between nodes
that won't work efficiently without soft dirty.  What happens without
soft dirty is that we have to freeze the source process state, transfer
the bits and then begin execution on the target ... that means the
process can be suspended for minutes (and means that customers notice
and your SLAs get blown).  Using soft dirty, we can iteratively build up
the process image on the target while the source process is still
executing meaning the actual transfer between source and target takes
only seconds (when the delta is small enough, we freeze the source,
transfer the remaining changed bits and begin on the target).

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages
  2013-08-08 14:51     ` Cyrill Gorcunov
@ 2013-08-12 21:57       ` Andrew Morton
  2013-08-12 22:28         ` Andy Lutomirski
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Morton @ 2013-08-12 21:57 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-mm, linux-kernel, luto, xemul, mpm, xiaoguangrong, mtosatti,
	kosaki.motohiro, sfr, peterz, aneesh.kumar, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner

On Thu, 8 Aug 2013 18:51:20 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> On Wed, Aug 07, 2013 at 01:28:12PM -0700, Andrew Morton wrote:
> > 
> > Good god.
> > 
> > I wonder if these can be turned into out-of-line functions in some form
> > which humans can understand.
> > 
> > or
> > 
> > #define pte_to_pgoff(pte)
> > 	frob(pte, PTE_FILE_SHIFT1, PTE_FILE_BITS1) +
> > 	frob(PTE_FILE_SHIFT2, PTE_FILE_BITS2) +
> > 	frob(PTE_FILE_SHIFT3, PTE_FILE_BITS3) +
> > 	frob(PTE_FILE_SHIFT4, PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)
> 
> Hi, here is what I ended up with. Please take a look (I decided to post
> patch in the thread since it's related to the context of the mails).

You could have #undefed _mfrob and __frob after using them, but whatever.  

I saved this patch to wave at the x86 guys for 3.12.  I plan to merge
mm-save-soft-dirty-bits-on-file-pages.patch for 3.11.

> Guys, is there a reason for "if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE"
> test present in this pgtable-2level.h file at all? I can't imagine
> where it can be false on x86.

I doubt if "Guys" read this.  x86 maintainers cc'ed.





From: Cyrill Gorcunov <gorcunov@gmail.com>
Subject: arch/x86/include/asm/pgtable-2level.h: clean up pte_to_pgoff and pgoff_to_pte helpers

Andrew asked if there a way to make pte_to_pgoff and pgoff_to_pte macro
helpers somehow more readable.

With this patch it should be more understandable what is happening with
bits when they come to and from pte entry.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/include/asm/pgtable-2level.h |   82 ++++++++++++------------
 1 file changed, 41 insertions(+), 41 deletions(-)

diff -puN arch/x86/include/asm/pgtable-2level.h~arch-x86-include-asm-pgtable-2levelh-clean-up-pte_to_pgoff-and-pgoff_to_pte-helpers arch/x86/include/asm/pgtable-2level.h
--- a/arch/x86/include/asm/pgtable-2level.h~arch-x86-include-asm-pgtable-2levelh-clean-up-pte_to_pgoff-and-pgoff_to_pte-helpers
+++ a/arch/x86/include/asm/pgtable-2level.h
@@ -55,6 +55,9 @@ static inline pmd_t native_pmdp_get_and_
 #define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp)
 #endif
 
+#define _mfrob(v,r,m,l)		((((v) >> (r)) & (m)) << (l))
+#define __frob(v,r,l)		(((v) >> (r)) << (l))
+
 #ifdef CONFIG_MEM_SOFT_DIRTY
 
 /*
@@ -71,31 +74,27 @@ static inline pmd_t native_pmdp_get_and_
 #define PTE_FILE_BITS2		(PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1)
 #define PTE_FILE_BITS3		(PTE_FILE_SHIFT4 - PTE_FILE_SHIFT3 - 1)
 
-#define pte_to_pgoff(pte)						\
-	((((pte).pte_low >> (PTE_FILE_SHIFT1))				\
-	  & ((1U << PTE_FILE_BITS1) - 1)))				\
-	+ ((((pte).pte_low >> (PTE_FILE_SHIFT2))			\
-	    & ((1U << PTE_FILE_BITS2) - 1))				\
-	   << (PTE_FILE_BITS1))						\
-	+ ((((pte).pte_low >> (PTE_FILE_SHIFT3))			\
-	    & ((1U << PTE_FILE_BITS3) - 1))				\
-	   << (PTE_FILE_BITS1 + PTE_FILE_BITS2))			\
-	+ ((((pte).pte_low >> (PTE_FILE_SHIFT4)))			\
-	    << (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3))
-
-#define pgoff_to_pte(off)						\
-	((pte_t) { .pte_low =						\
-	 ((((off)) & ((1U << PTE_FILE_BITS1) - 1)) << PTE_FILE_SHIFT1)	\
-	 + ((((off) >> PTE_FILE_BITS1)					\
-	     & ((1U << PTE_FILE_BITS2) - 1))				\
-	    << PTE_FILE_SHIFT2)						\
-	 + ((((off) >> (PTE_FILE_BITS1 + PTE_FILE_BITS2))		\
-	     & ((1U << PTE_FILE_BITS3) - 1))				\
-	    << PTE_FILE_SHIFT3)						\
-	 + ((((off) >>							\
-	      (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)))	\
-	    << PTE_FILE_SHIFT4)						\
-	 + _PAGE_FILE })
+#define PTE_FILE_MASK1		((1U << PTE_FILE_BITS1) - 1)
+#define PTE_FILE_MASK2		((1U << PTE_FILE_BITS2) - 1)
+#define PTE_FILE_MASK3		((1U << PTE_FILE_BITS3) - 1)
+
+#define PTE_FILE_LSHIFT2	(PTE_FILE_BITS1)
+#define PTE_FILE_LSHIFT3	(PTE_FILE_BITS1 + PTE_FILE_BITS2)
+#define PTE_FILE_LSHIFT4	(PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)
+
+#define pte_to_pgoff(pte)							    \
+	(_mfrob((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0)		  + \
+	 _mfrob((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \
+	 _mfrob((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_MASK3, PTE_FILE_LSHIFT3) + \
+	 __frob((pte).pte_low, PTE_FILE_SHIFT4, PTE_FILE_LSHIFT4))
+
+#define pgoff_to_pte(off)							\
+	((pte_t) { .pte_low =							\
+	_mfrob(off,                0, PTE_FILE_MASK1, PTE_FILE_SHIFT1)	+	\
+	_mfrob(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2)	+	\
+	_mfrob(off, PTE_FILE_LSHIFT3, PTE_FILE_MASK3, PTE_FILE_SHIFT3)	+	\
+	__frob(off, PTE_FILE_LSHIFT4, PTE_FILE_SHIFT4)			+	\
+	_PAGE_FILE })
 
 #else /* CONFIG_MEM_SOFT_DIRTY */
 
@@ -115,22 +114,23 @@ static inline pmd_t native_pmdp_get_and_
 #define PTE_FILE_BITS1		(PTE_FILE_SHIFT2 - PTE_FILE_SHIFT1 - 1)
 #define PTE_FILE_BITS2		(PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1)
 
-#define pte_to_pgoff(pte)						\
-	((((pte).pte_low >> PTE_FILE_SHIFT1)				\
-	  & ((1U << PTE_FILE_BITS1) - 1))				\
-	 + ((((pte).pte_low >> PTE_FILE_SHIFT2)				\
-	     & ((1U << PTE_FILE_BITS2) - 1)) << PTE_FILE_BITS1)		\
-	 + (((pte).pte_low >> PTE_FILE_SHIFT3)				\
-	    << (PTE_FILE_BITS1 + PTE_FILE_BITS2)))
-
-#define pgoff_to_pte(off)						\
-	((pte_t) { .pte_low =						\
-	 (((off) & ((1U << PTE_FILE_BITS1) - 1)) << PTE_FILE_SHIFT1)	\
-	 + ((((off) >> PTE_FILE_BITS1) & ((1U << PTE_FILE_BITS2) - 1))	\
-	    << PTE_FILE_SHIFT2)						\
-	 + (((off) >> (PTE_FILE_BITS1 + PTE_FILE_BITS2))		\
-	    << PTE_FILE_SHIFT3)						\
-	 + _PAGE_FILE })
+#define PTE_FILE_MASK1		((1U << PTE_FILE_BITS1) - 1)
+#define PTE_FILE_MASK2		((1U << PTE_FILE_BITS2) - 1)
+
+#define PTE_FILE_LSHIFT2	(PTE_FILE_BITS1)
+#define PTE_FILE_LSHIFT3	(PTE_FILE_BITS1 + PTE_FILE_BITS2)
+
+#define pte_to_pgoff(pte)							    \
+	(_mfrob((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0)		  + \
+	 _mfrob((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \
+	 __frob((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_LSHIFT3))
+
+#define pgoff_to_pte(off)							\
+	((pte_t) { .pte_low =							\
+	_mfrob(off,                0, PTE_FILE_MASK1, PTE_FILE_SHIFT1)	+	\
+	_mfrob(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2)	+	\
+	__frob(off, PTE_FILE_LSHIFT3, PTE_FILE_SHIFT3)			+	\
+	_PAGE_FILE })
 
 #endif /* CONFIG_MEM_SOFT_DIRTY */
 
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages
  2013-08-12 21:57       ` Andrew Morton
@ 2013-08-12 22:28         ` Andy Lutomirski
  2013-08-12 22:37           ` Andrew Morton
  2013-08-13  5:02           ` Cyrill Gorcunov
  0 siblings, 2 replies; 34+ messages in thread
From: Andy Lutomirski @ 2013-08-12 22:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Cyrill Gorcunov, linux-mm, linux-kernel, xemul, mpm,
	xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar, Ingo Molnar, H. Peter Anvin, Thomas Gleixner

On Mon, Aug 12, 2013 at 2:57 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 8 Aug 2013 18:51:20 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>
>> On Wed, Aug 07, 2013 at 01:28:12PM -0700, Andrew Morton wrote:
>> >
>> > Good god.
>> >
>> > I wonder if these can be turned into out-of-line functions in some form
>> > which humans can understand.
>> >
>> > or
>> >
>> > #define pte_to_pgoff(pte)
>> >     frob(pte, PTE_FILE_SHIFT1, PTE_FILE_BITS1) +
>> >     frob(PTE_FILE_SHIFT2, PTE_FILE_BITS2) +
>> >     frob(PTE_FILE_SHIFT3, PTE_FILE_BITS3) +
>> >     frob(PTE_FILE_SHIFT4, PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)
>>
>> Hi, here is what I ended up with. Please take a look (I decided to post
>> patch in the thread since it's related to the context of the mails).
>
> You could have #undefed _mfrob and __frob after using them, but whatever.
>
> I saved this patch to wave at the x86 guys for 3.12.  I plan to merge
> mm-save-soft-dirty-bits-on-file-pages.patch for 3.11.
>
>> Guys, is there a reason for "if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE"
>> test present in this pgtable-2level.h file at all? I can't imagine
>> where it can be false on x86.
>
> I doubt if "Guys" read this.  x86 maintainers cc'ed.
>
>
>
>
>
> From: Cyrill Gorcunov <gorcunov@gmail.com>
> Subject: arch/x86/include/asm/pgtable-2level.h: clean up pte_to_pgoff and pgoff_to_pte helpers
>
> Andrew asked if there a way to make pte_to_pgoff and pgoff_to_pte macro
> helpers somehow more readable.
>
> With this patch it should be more understandable what is happening with
> bits when they come to and from pte entry.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  arch/x86/include/asm/pgtable-2level.h |   82 ++++++++++++------------
>  1 file changed, 41 insertions(+), 41 deletions(-)
>
> diff -puN arch/x86/include/asm/pgtable-2level.h~arch-x86-include-asm-pgtable-2levelh-clean-up-pte_to_pgoff-and-pgoff_to_pte-helpers arch/x86/include/asm/pgtable-2level.h
> --- a/arch/x86/include/asm/pgtable-2level.h~arch-x86-include-asm-pgtable-2levelh-clean-up-pte_to_pgoff-and-pgoff_to_pte-helpers
> +++ a/arch/x86/include/asm/pgtable-2level.h
> @@ -55,6 +55,9 @@ static inline pmd_t native_pmdp_get_and_
>  #define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp)
>  #endif
>
> +#define _mfrob(v,r,m,l)                ((((v) >> (r)) & (m)) << (l))
> +#define __frob(v,r,l)          (((v) >> (r)) << (l))
> +
>  #ifdef CONFIG_MEM_SOFT_DIRTY
>

If I'm understanding this right, the idea is to take the bits in the
range a..b of v and stick them at c..d, where a-b == c-d.  Would it
make sense to change this to look something like

#define __frob(v, inmsb, inlsb, outlsb) ((v >> inlsb) & ((1<<(inmsb -
inlsb + 1)-1) << outlsb)

For extra fun, there could be an __unfrob macro that takes the same
inmsg, inlsb, outlsb parameters but undoes it so that it's (more)
clear that the operations that are supposed to be inverses are indeed
inverses.

--Andy

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages
  2013-08-12 22:28         ` Andy Lutomirski
@ 2013-08-12 22:37           ` Andrew Morton
  2013-08-13  5:02           ` Cyrill Gorcunov
  1 sibling, 0 replies; 34+ messages in thread
From: Andrew Morton @ 2013-08-12 22:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Cyrill Gorcunov, linux-mm, linux-kernel, xemul, mpm,
	xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar, Ingo Molnar, H. Peter Anvin, Thomas Gleixner

On Mon, 12 Aug 2013 15:28:06 -0700 Andy Lutomirski <luto@amacapital.net> wrote:

> > +#define _mfrob(v,r,m,l)                ((((v) >> (r)) & (m)) << (l))
> > +#define __frob(v,r,l)          (((v) >> (r)) << (l))
> > +
> >  #ifdef CONFIG_MEM_SOFT_DIRTY
> >
> 
> If I'm understanding this right, the idea is to take the bits in the
> range a..b of v and stick them at c..d, where a-b == c-d.  Would it
> make sense to change this to look something like
> 
> #define __frob(v, inmsb, inlsb, outlsb) ((v >> inlsb) & ((1<<(inmsb -
> inlsb + 1)-1) << outlsb)
> 
> For extra fun, there could be an __unfrob macro that takes the same
> inmsg, inlsb, outlsb parameters but undoes it so that it's (more)
> clear that the operations that are supposed to be inverses are indeed
> inverses.

hm, I seem to remember writing
drivers/net/ethernet/3com/3c59x.c:BFINS() and BFEXT() shortly after the
invention of the electronic computer.

I'm kinda surprised that we don't already have something like this in
kernel.h or somewhere - there's surely a ton of code which does such
things.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages
  2013-08-12 22:28         ` Andy Lutomirski
  2013-08-12 22:37           ` Andrew Morton
@ 2013-08-13  5:02           ` Cyrill Gorcunov
  2013-08-13 15:14             ` H. Peter Anvin
  1 sibling, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-13  5:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, linux-mm, linux-kernel, xemul, mpm, xiaoguangrong,
	mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner

On Mon, Aug 12, 2013 at 03:28:06PM -0700, Andy Lutomirski wrote:
> >
> > You could have #undefed _mfrob and __frob after using them, but whatever.

Sure, for some reason I forgot to do that. Will send update on top.

> > I saved this patch to wave at the x86 guys for 3.12.  I plan to merge
> > mm-save-soft-dirty-bits-on-file-pages.patch for 3.11.
> >
> >> Guys, is there a reason for "if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE"
> >> test present in this pgtable-2level.h file at all? I can't imagine
> >> where it can be false on x86.
> >
> > I doubt if "Guys" read this.  x86 maintainers cc'ed.

Thanks!

> > +#define _mfrob(v,r,m,l)                ((((v) >> (r)) & (m)) << (l))
> > +#define __frob(v,r,l)          (((v) >> (r)) << (l))
> > +
> >  #ifdef CONFIG_MEM_SOFT_DIRTY
> 
> If I'm understanding this right, the idea is to take the bits in the
> range a..b of v and stick them at c..d, where a-b == c-d.  Would it
> make sense to change this to look something like
> 
> #define __frob(v, inmsb, inlsb, outlsb) ((v >> inlsb) & ((1<<(inmsb -
> inlsb + 1)-1) << outlsb)

There is a case when you don't need a mask completely. And because this
pte conversion is on hot path and time critical I kept generated code
as it was (even if that lead to slightly less clear source code).

> For extra fun, there could be an __unfrob macro that takes the same
> inmsg, inlsb, outlsb parameters but undoes it so that it's (more)
> clear that the operations that are supposed to be inverses are indeed
> inverses.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages
  2013-08-13  5:02           ` Cyrill Gorcunov
@ 2013-08-13 15:14             ` H. Peter Anvin
  2013-08-13 15:37               ` Cyrill Gorcunov
  0 siblings, 1 reply; 34+ messages in thread
From: H. Peter Anvin @ 2013-08-13 15:14 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Andy Lutomirski, Andrew Morton, linux-mm, linux-kernel, xemul,
	mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar, Ingo Molnar, Thomas Gleixner

On 08/12/2013 10:02 PM, Cyrill Gorcunov wrote:
> 
> There is a case when you don't need a mask completely. And because this
> pte conversion is on hot path and time critical I kept generated code
> as it was (even if that lead to slightly less clear source code).
> 

Does it actually matter, generated-code-wise, or is the compiler smart
enough to figure it out?  The reason I'm asking is because it makes the
code much harder to follow.

The other thing is can we please pretty please call it something other
than "frob"?

	-hpa




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages
  2013-08-13 15:14             ` H. Peter Anvin
@ 2013-08-13 15:37               ` Cyrill Gorcunov
  2013-08-13 16:43                 ` H. Peter Anvin
  0 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-13 15:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Andrew Morton, linux-mm, linux-kernel, xemul,
	mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar, Ingo Molnar, Thomas Gleixner

On Tue, Aug 13, 2013 at 08:14:39AM -0700, H. Peter Anvin wrote:
> On 08/12/2013 10:02 PM, Cyrill Gorcunov wrote:
> > 
> > There is a case when you don't need a mask completely. And because this
> > pte conversion is on hot path and time critical I kept generated code
> > as it was (even if that lead to slightly less clear source code).
> > 
> 
> Does it actually matter, generated-code-wise, or is the compiler smart
> enough to figure it out?  The reason I'm asking is because it makes the

gcc-4.7.2 is smart enough to suppress useless masking (ie ((1u << 31) - 1))
completely but I don't know if this can be assumed for all gcc series.

> code much harder to follow.

I see. OK, I'll try to prepare more readable macro helpers.

> 
> The other thing is can we please pretty please call it something other
> than "frob"?

Sure.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages
  2013-08-13 15:37               ` Cyrill Gorcunov
@ 2013-08-13 16:43                 ` H. Peter Anvin
  2013-08-13 21:28                   ` Cyrill Gorcunov
  0 siblings, 1 reply; 34+ messages in thread
From: H. Peter Anvin @ 2013-08-13 16:43 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Andy Lutomirski, Andrew Morton, linux-mm, linux-kernel, xemul,
	mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar, Ingo Molnar, Thomas Gleixner

On 08/13/2013 08:37 AM, Cyrill Gorcunov wrote:
>>
>> Does it actually matter, generated-code-wise, or is the compiler smart
>> enough to figure it out?  The reason I'm asking is because it makes the
> 
> gcc-4.7.2 is smart enough to suppress useless masking (ie ((1u << 31) - 1))
> completely but I don't know if this can be assumed for all gcc series.
> 

I would be highly surprised if it wasn't the case for any gcc we care about.

	-hpa


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages
  2013-08-13 16:43                 ` H. Peter Anvin
@ 2013-08-13 21:28                   ` Cyrill Gorcunov
  0 siblings, 0 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-13 21:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Andrew Morton, linux-mm, linux-kernel, xemul,
	mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
	aneesh.kumar, Ingo Molnar, Thomas Gleixner

On Tue, Aug 13, 2013 at 09:43:23AM -0700, H. Peter Anvin wrote:
> On 08/13/2013 08:37 AM, Cyrill Gorcunov wrote:
> >>
> >> Does it actually matter, generated-code-wise, or is the compiler smart
> >> enough to figure it out?  The reason I'm asking is because it makes the
> > 
> > gcc-4.7.2 is smart enough to suppress useless masking (ie ((1u << 31) - 1))
> > completely but I don't know if this can be assumed for all gcc series.
> > 
> 
> I would be highly surprised if it wasn't the case for any gcc we care about.

Does below one looks better? (Btw, what about the snippet we have there as well

#if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
#define PTE_FILE_SHIFT2		(_PAGE_BIT_FILE + 1)
#define PTE_FILE_SHIFT3		(_PAGE_BIT_PROTNONE + 1)
#else
#define PTE_FILE_SHIFT2		(_PAGE_BIT_PROTNONE + 1)
#define PTE_FILE_SHIFT3		(_PAGE_BIT_FILE + 1)
#endif

where

#define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL	-> 8
#define _PAGE_BIT_FILE		_PAGE_BIT_DIRTY		-> 6

so I wonder where the cases on x86 when _PAGE_BIT_FILE > _PAGE_BIT_PROTNONE,
what i'm missing here?)

---
 arch/x86/include/asm/pgtable-2level.h |   37 +++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 16 deletions(-)

Index: linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable-2level.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
@@ -55,8 +55,11 @@ static inline pmd_t native_pmdp_get_and_
 #define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp)
 #endif
 
-#define _mfrob(v,r,m,l)		((((v) >> (r)) & (m)) << (l))
-#define __frob(v,r,l)		(((v) >> (r)) << (l))
+/*
+ * For readable bitfield manipulations.
+ */
+#define PTE_FILE_NOMASK		(-1U)
+#define __bfop(v,r,m,l)		((((v) >> (r)) & (m)) << (l))
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
 
@@ -83,17 +86,17 @@ static inline pmd_t native_pmdp_get_and_
 #define PTE_FILE_LSHIFT4	(PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)
 
 #define pte_to_pgoff(pte)							    \
-	(_mfrob((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0)		  + \
-	 _mfrob((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \
-	 _mfrob((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_MASK3, PTE_FILE_LSHIFT3) + \
-	 __frob((pte).pte_low, PTE_FILE_SHIFT4, PTE_FILE_LSHIFT4))
+	(__bfop((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0)		  + \
+	 __bfop((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \
+	 __bfop((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_MASK3, PTE_FILE_LSHIFT3) + \
+	 __bfop((pte).pte_low, PTE_FILE_SHIFT4, PTE_FILE_NOMASK, PTE_FILE_LSHIFT4))
 
 #define pgoff_to_pte(off)							\
 	((pte_t) { .pte_low =							\
-	_mfrob(off,                0, PTE_FILE_MASK1, PTE_FILE_SHIFT1)	+	\
-	_mfrob(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2)	+	\
-	_mfrob(off, PTE_FILE_LSHIFT3, PTE_FILE_MASK3, PTE_FILE_SHIFT3)	+	\
-	__frob(off, PTE_FILE_LSHIFT4, PTE_FILE_SHIFT4)			+	\
+	__bfop(off,                0, PTE_FILE_MASK1, PTE_FILE_SHIFT1)	+	\
+	__bfop(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2)	+	\
+	__bfop(off, PTE_FILE_LSHIFT3, PTE_FILE_MASK3, PTE_FILE_SHIFT3)	+	\
+	__bfop(off, PTE_FILE_LSHIFT4, PTE_FILE_NOMASK, PTE_FILE_SHIFT4)	+	\
 	_PAGE_FILE })
 
 #else /* CONFIG_MEM_SOFT_DIRTY */
@@ -121,19 +124,21 @@ static inline pmd_t native_pmdp_get_and_
 #define PTE_FILE_LSHIFT3	(PTE_FILE_BITS1 + PTE_FILE_BITS2)
 
 #define pte_to_pgoff(pte)							    \
-	(_mfrob((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0)		  + \
-	 _mfrob((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \
-	 __frob((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_LSHIFT3))
+	(__bfop((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0)		  + \
+	 __bfop((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \
+	 __bfop((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_NOMASK, PTE_FILE_LSHIFT3))
 
 #define pgoff_to_pte(off)							\
 	((pte_t) { .pte_low =							\
-	_mfrob(off,                0, PTE_FILE_MASK1, PTE_FILE_SHIFT1)	+	\
-	_mfrob(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2)	+	\
-	__frob(off, PTE_FILE_LSHIFT3, PTE_FILE_SHIFT3)			+	\
+	__bfop(off,                0, PTE_FILE_MASK1, PTE_FILE_SHIFT1)	+	\
+	__bfop(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2)	+	\
+	__bfop(off, PTE_FILE_LSHIFT3, PTE_FILE_NOMASK, PTE_FILE_SHIFT3)	+	\
 	_PAGE_FILE })
 
 #endif /* CONFIG_MEM_SOFT_DIRTY */
 
+#undef __bfop
+
 /* Encode and de-code a swap entry */
 #if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2013-08-13 21:29 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-07-30 20:41 [patch 0/2] Soft-dirty page tracker improvemens Cyrill Gorcunov
2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
2013-07-31  8:16   ` Pavel Emelyanov
2013-08-01  0:51   ` Minchan Kim
2013-08-01  5:53     ` Cyrill Gorcunov
2013-08-01  6:16       ` Minchan Kim
2013-08-01  6:28         ` Cyrill Gorcunov
2013-08-01  6:37           ` Minchan Kim
2013-08-01  6:38             ` Cyrill Gorcunov
2013-08-05  1:48   ` Wanpeng Li
2013-08-05  1:48   ` Wanpeng Li
     [not found]   ` <51ff047d.2768310a.2fc4.340fSMTPIN_ADDED_BROKEN@mx.google.com>
2013-08-05  2:17     ` Minchan Kim
2013-08-05  2:38       ` Wanpeng Li
2013-08-05  2:38       ` Wanpeng Li
     [not found]       ` <51ff1053.ab47310a.5d3f.566cSMTPIN_ADDED_BROKEN@mx.google.com>
2013-08-05  2:54         ` Minchan Kim
2013-08-05  2:58           ` Wanpeng Li
2013-08-05  2:58           ` Wanpeng Li
     [not found]           ` <51ff14e9.87ef440a.1424.ffffe470SMTPIN_ADDED_BROKEN@mx.google.com>
2013-08-05  5:43             ` Cyrill Gorcunov
2013-08-07 20:21   ` Andrew Morton
2013-08-07 20:29     ` Cyrill Gorcunov
2013-08-10 17:48     ` James Bottomley
2013-07-30 20:41 ` [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages Cyrill Gorcunov
2013-07-31  8:16   ` Pavel Emelyanov
2013-08-07 20:28   ` Andrew Morton
2013-08-07 20:31     ` Cyrill Gorcunov
2013-08-08 14:51     ` Cyrill Gorcunov
2013-08-12 21:57       ` Andrew Morton
2013-08-12 22:28         ` Andy Lutomirski
2013-08-12 22:37           ` Andrew Morton
2013-08-13  5:02           ` Cyrill Gorcunov
2013-08-13 15:14             ` H. Peter Anvin
2013-08-13 15:37               ` Cyrill Gorcunov
2013-08-13 16:43                 ` H. Peter Anvin
2013-08-13 21:28                   ` Cyrill Gorcunov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).