* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
@ 2013-07-31 8:16 ` Pavel Emelyanov
2013-08-01 0:51 ` Minchan Kim
` (4 subsequent siblings)
5 siblings, 0 replies; 34+ messages in thread
From: Pavel Emelyanov @ 2013-07-31 8:16 UTC (permalink / raw)
To: Cyrill Gorcunov, akpm
Cc: linux-mm, linux-kernel, luto, gorcunov, mpm, xiaoguangrong,
mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar
On 07/31/2013 12:41 AM, Cyrill Gorcunov wrote:
> Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY
> bit set gets swapped out, the bit is lost and no longer
> available when the pte is read back.
>
> To resolve this we introduce a _PTE_SWP_SOFT_DIRTY bit, which is
> saved in the pte entry for the page being swapped out. When such a page
> is read back from the swap cache we check for the bit's presence,
> and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
> bit.
>
> One of the problems was to find a place in the pte entry where we
> can save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap.
> _PAGE_PSE was chosen for that; it doesn't intersect with the swap
> entry format stored in the pte.
>
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
2013-07-31 8:16 ` Pavel Emelyanov
@ 2013-08-01 0:51 ` Minchan Kim
2013-08-01 5:53 ` Cyrill Gorcunov
2013-08-05 1:48 ` Wanpeng Li
` (3 subsequent siblings)
5 siblings, 1 reply; 34+ messages in thread
From: Minchan Kim @ 2013-08-01 0:51 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: linux-mm, linux-kernel, luto, gorcunov, xemul, akpm, mpm,
xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
aneesh.kumar
Hello,
On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
> Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY
> bit set gets swapped out, the bit is lost and no longer
> available when the pte is read back.
>
> To resolve this we introduce a _PTE_SWP_SOFT_DIRTY bit, which is
> saved in the pte entry for the page being swapped out. When such a page
> is read back from the swap cache we check for the bit's presence,
> and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
> bit.
>
> One of the problems was to find a place in the pte entry where we
> can save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap.
> _PAGE_PSE was chosen for that; it doesn't intersect with the swap
> entry format stored in the pte.
>
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Matt Mackall <mpm@selenic.com>
> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> ---
> arch/x86/include/asm/pgtable.h | 15 +++++++++++++++
> arch/x86/include/asm/pgtable_types.h | 13 +++++++++++++
> fs/proc/task_mmu.c | 21 +++++++++++++++------
> include/asm-generic/pgtable.h | 15 +++++++++++++++
> include/linux/swapops.h | 2 ++
> mm/memory.c | 2 ++
> mm/rmap.c | 6 +++++-
> mm/swapfile.c | 19 +++++++++++++++++--
> 8 files changed, 84 insertions(+), 9 deletions(-)
>
> Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
> +++ linux-2.6.git/arch/x86/include/asm/pgtable.h
> @@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
> }
>
> +static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
> +{
> + return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> +}
> +
> +static inline int pte_swp_soft_dirty(pte_t pte)
> +{
> + return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
> +}
> +
> +static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> +{
> + return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> +}
> +
> /*
> * Mask out unsupported bits in a present pgprot. Non-present pgprots
> * can use those bits for other purposes, so leave them be.
> Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
> +++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
> @@ -67,6 +67,19 @@
> #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0))
> #endif
>
> +/*
> + * Tracking the soft dirty bit when a page goes into swap is tricky.
> + * We need a bit which can be stored in the pte _and_ not conflict
> + * with the swap entry format. On x86 bits 6 and 7 are *not* involved
> + * in swap entry computation, but bit 6 is used for nonlinear
> + * file mapping, so we borrow bit 7 for soft dirty tracking.
> + */
> +#ifdef CONFIG_MEM_SOFT_DIRTY
> +#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE
> +#else
> +#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0))
> +#endif
> +
> #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
> #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX)
> #else
> Index: linux-2.6.git/fs/proc/task_mmu.c
> ===================================================================
> --- linux-2.6.git.orig/fs/proc/task_mmu.c
> +++ linux-2.6.git/fs/proc/task_mmu.c
> @@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
> * of how soft-dirty works.
> */
> pte_t ptent = *pte;
> - ptent = pte_wrprotect(ptent);
> - ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
> +
> + if (pte_present(ptent)) {
> + ptent = pte_wrprotect(ptent);
> + ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
> + } else if (is_swap_pte(ptent)) {
> + ptent = pte_swp_clear_soft_dirty(ptent);
> + }
> +
> set_pte_at(vma->vm_mm, addr, pte, ptent);
> #endif
> }
> @@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
> pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> for (; addr != end; pte++, addr += PAGE_SIZE) {
> ptent = *pte;
> - if (!pte_present(ptent))
> - continue;
>
> if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
> clear_soft_dirty(vma, addr, pte);
> continue;
> }
>
> + if (!pte_present(ptent))
> + continue;
> +
> page = vm_normal_page(vma, addr, ptent);
> if (!page)
> continue;
> @@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
> flags = PM_PRESENT;
> page = vm_normal_page(vma, addr, pte);
> } else if (is_swap_pte(pte)) {
> - swp_entry_t entry = pte_to_swp_entry(pte);
> -
> + swp_entry_t entry;
> + if (pte_swp_soft_dirty(pte))
> + flags2 |= __PM_SOFT_DIRTY;
> + entry = pte_to_swp_entry(pte);
> frame = swp_type(entry) |
> (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
> flags = PM_SWAP;
> Index: linux-2.6.git/include/asm-generic/pgtable.h
> ===================================================================
> --- linux-2.6.git.orig/include/asm-generic/pgtable.h
> +++ linux-2.6.git/include/asm-generic/pgtable.h
> @@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> {
> return pmd;
> }
> +
> +static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
> +{
> + return pte;
> +}
> +
> +static inline int pte_swp_soft_dirty(pte_t pte)
> +{
> + return 0;
> +}
> +
> +static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> +{
> + return pte;
> +}
> #endif
>
> #ifndef __HAVE_PFNMAP_TRACKING
> Index: linux-2.6.git/include/linux/swapops.h
> ===================================================================
> --- linux-2.6.git.orig/include/linux/swapops.h
> +++ linux-2.6.git/include/linux/swapops.h
> @@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
> swp_entry_t arch_entry;
>
> BUG_ON(pte_file(pte));
> + if (pte_swp_soft_dirty(pte))
> + pte = pte_swp_clear_soft_dirty(pte);
Why do you remove soft-dirty flag whenever pte_to_swp_entry is called?
Isn't there any problem if we use mincore?
> arch_entry = __pte_to_swp_entry(pte);
> return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
> }
> Index: linux-2.6.git/mm/memory.c
> ===================================================================
> --- linux-2.6.git.orig/mm/memory.c
> +++ linux-2.6.git/mm/memory.c
> @@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
> exclusive = 1;
> }
> flush_icache_page(vma, page);
> + if (pte_swp_soft_dirty(orig_pte))
> + pte = pte_mksoft_dirty(pte);
> set_pte_at(mm, address, page_table, pte);
> if (page == swapcache)
> do_page_add_anon_rmap(page, vma, address, exclusive);
> Index: linux-2.6.git/mm/rmap.c
> ===================================================================
> --- linux-2.6.git.orig/mm/rmap.c
> +++ linux-2.6.git/mm/rmap.c
> @@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
> swp_entry_to_pte(make_hwpoison_entry(page)));
> } else if (PageAnon(page)) {
> swp_entry_t entry = { .val = page_private(page) };
> + pte_t swp_pte;
>
> if (PageSwapCache(page)) {
> /*
> @@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
> BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
> entry = make_migration_entry(page, pte_write(pteval));
> }
> - set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
> + swp_pte = swp_entry_to_pte(entry);
> + if (pte_soft_dirty(pteval))
> + swp_pte = pte_swp_mksoft_dirty(swp_pte);
> + set_pte_at(mm, address, pte, swp_pte);
> BUG_ON(pte_file(*pte));
> } else if (IS_ENABLED(CONFIG_MIGRATION) &&
> (TTU_ACTION(flags) == TTU_MIGRATION)) {
> Index: linux-2.6.git/mm/swapfile.c
> ===================================================================
> --- linux-2.6.git.orig/mm/swapfile.c
> +++ linux-2.6.git/mm/swapfile.c
> @@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
> }
> #endif /* CONFIG_HIBERNATION */
>
> +static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
Nitpick.
If maybe_same_pte were used more widely the name would look fine to me,
but at the moment it's used only for swapoff, so I think pte_swap_same
would be a better name.
> +{
> +#ifdef CONFIG_MEM_SOFT_DIRTY
> + /*
> + * When the pte keeps the soft dirty bit, the pte generated
> + * from the swap entry does not have it; still it's the same
> + * pte from a logical point of view.
> + */
> + pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
> + return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
> +#else
> + return pte_same(pte, swp_pte);
> +#endif
> +}
> +
> /*
> * No need to decide whether this PTE shares the swap entry with others,
> * just let do_wp_page work it out if a write is requested later - to
> @@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
> }
>
> pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> - if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
> + if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
> mem_cgroup_cancel_charge_swapin(memcg);
> ret = 0;
> goto out;
> @@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
> * swapoff spends a _lot_ of time in this loop!
> * Test inline before going to call unuse_pte.
> */
> - if (unlikely(pte_same(*pte, swp_pte))) {
> + if (unlikely(maybe_same_pte(*pte, swp_pte))) {
> pte_unmap(pte);
> ret = unuse_pte(vma, pmd, addr, entry, page);
> if (ret)
>
--
Kind regards,
Minchan Kim
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-08-01 0:51 ` Minchan Kim
@ 2013-08-01 5:53 ` Cyrill Gorcunov
2013-08-01 6:16 ` Minchan Kim
0 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-01 5:53 UTC (permalink / raw)
To: Minchan Kim
Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong,
mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar
On Thu, Aug 01, 2013 at 09:51:32AM +0900, Minchan Kim wrote:
> > Index: linux-2.6.git/include/linux/swapops.h
> > ===================================================================
> > --- linux-2.6.git.orig/include/linux/swapops.h
> > +++ linux-2.6.git/include/linux/swapops.h
> > @@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
> > swp_entry_t arch_entry;
> >
> > BUG_ON(pte_file(pte));
> > + if (pte_swp_soft_dirty(pte))
> > + pte = pte_swp_clear_soft_dirty(pte);
>
> Why do you remove soft-dirty flag whenever pte_to_swp_entry is called?
> Isn't there any problem if we use mincore?
No, there is no problem. pte_to_swp_entry is called when we know the pte
we're decoding has the swap format (except for the case in the swap code
which figures out the number of bits allowed for the offset). Still, since
this bit is set at a "higher" level than the __swp_type/__swp_offset
helpers, it should be cleared before the pte value is handed down to those
"one level down" helpers.
> > +static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
>
> Nitpick.
> If maybe_same_pte is used widely, it looks good to me
> but it's used for only swapoff at the moment so I think pte_swap_same
> would be better name.
I don't see much difference, but sure, let's rename it on top once the
series is in the -mm tree. Sounds good?
Cyrill
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-08-01 5:53 ` Cyrill Gorcunov
@ 2013-08-01 6:16 ` Minchan Kim
2013-08-01 6:28 ` Cyrill Gorcunov
0 siblings, 1 reply; 34+ messages in thread
From: Minchan Kim @ 2013-08-01 6:16 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong,
mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar
On Thu, Aug 01, 2013 at 09:53:03AM +0400, Cyrill Gorcunov wrote:
> On Thu, Aug 01, 2013 at 09:51:32AM +0900, Minchan Kim wrote:
> > > Index: linux-2.6.git/include/linux/swapops.h
> > > ===================================================================
> > > --- linux-2.6.git.orig/include/linux/swapops.h
> > > +++ linux-2.6.git/include/linux/swapops.h
> > > @@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
> > > swp_entry_t arch_entry;
> > >
> > > BUG_ON(pte_file(pte));
> > > + if (pte_swp_soft_dirty(pte))
> > > + pte = pte_swp_clear_soft_dirty(pte);
> >
> > Why do you remove soft-dirty flag whenever pte_to_swp_entry is called?
> > Isn't there any problem if we use mincore?
>
> No, there is no problem. pte_to_swp_entry caller when we know that pte
> we're decoding is having swap format (except the case in swap code which
> figures out the number of bits allowed for offset). Still since this bit
> is set on "higher" level than __swp_type/__swp_offset helpers it should
> be cleaned before the value from pte comes to "one level down" helpers
> function.
I don't get it. Could you correct me on the example below?
Process A context
try_to_unmap
swp_pte = swp_entry_to_pte /* change generic swp into arch swap */
swp_pte = pte_swp_mksoft_dirty(swp_pte);
set_pte_at(, swp_pte);
Process A context
..
mincore_pte_range
pte_to_swp_entry
pte = pte_swp_clear_soft_dirty <=== 1)
change arch swp with generic swp
mincore_page
Process B want to know dirty state of the page
..
pagemap_read
pte_to_pagemap_entry
is_swap_pte
if (pte_swap_soft_dirty(pte)) <=== but failed by 1)
So, Process B can't get the dirty status from process A's page.
>
> > > +static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
> >
> > Nitpick.
> > If maybe_same_pte is used widely, it looks good to me
> > but it's used for only swapoff at the moment so I think pte_swap_same
> > would be better name.
>
> I don't see much difference, but sure, lets rename it on top once series
> in -mm tree, sounds good?
>
> Cyrill
>
--
Kind regards,
Minchan Kim
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-08-01 6:16 ` Minchan Kim
@ 2013-08-01 6:28 ` Cyrill Gorcunov
2013-08-01 6:37 ` Minchan Kim
0 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-01 6:28 UTC (permalink / raw)
To: Minchan Kim
Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong,
mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar
On Thu, Aug 01, 2013 at 03:16:32PM +0900, Minchan Kim wrote:
>
> I don't get it. Could you correct me with below example?
>
> Process A context
> try_to_unmap
> swp_pte = swp_entry_to_pte /* change generic swp into arch swap */
> swp_pte = pte_swp_mksoft_dirty(swp_pte);
> set_pte_at(, swp_pte);
>
> Process A context
> ..
> mincore_pte_range
pte_t pte = *ptep; <-- local copy of the pte value; the pte in memory stays the same,
with the swap soft-dirty bit still set
> pte_to_swp_entry
> pte = pte_swp_clear_soft_dirty <=== 1)
> change arch swp with generic swp
> mincore_page
>
> Process B want to know dirty state of the page
> ..
> pagemap_read
> pte_to_pagemap_entry
> is_swap_pte
> if (pte_swap_soft_dirty(pte)) <=== but failed by 1)
>
> So, Process B can't get the dirty status from process A's the page.
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-08-01 6:28 ` Cyrill Gorcunov
@ 2013-08-01 6:37 ` Minchan Kim
2013-08-01 6:38 ` Cyrill Gorcunov
0 siblings, 1 reply; 34+ messages in thread
From: Minchan Kim @ 2013-08-01 6:37 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong,
mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar
On Thu, Aug 01, 2013 at 10:28:14AM +0400, Cyrill Gorcunov wrote:
> On Thu, Aug 01, 2013 at 03:16:32PM +0900, Minchan Kim wrote:
> >
> > I don't get it. Could you correct me with below example?
> >
> > Process A context
> > try_to_unmap
> > swp_pte = swp_entry_to_pte /* change generic swp into arch swap */
> > swp_pte = pte_swp_mksoft_dirty(swp_pte);
> > set_pte_at(, swp_pte);
> >
> > Process A context
> > ..
> > mincore_pte_range
> pte_t pte = *ptep; <-- local copy of the pte value; the pte in memory stays the same,
> with the swap soft-dirty bit still set
Argh, I missed that. Thank you!
Reviewed-by: Minchan Kim <minchan@kernel.org>
--
Kind regards,
Minchan Kim
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-08-01 6:37 ` Minchan Kim
@ 2013-08-01 6:38 ` Cyrill Gorcunov
0 siblings, 0 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-01 6:38 UTC (permalink / raw)
To: Minchan Kim
Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong,
mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar
On Thu, Aug 01, 2013 at 03:37:06PM +0900, Minchan Kim wrote:
>
> Reviewed-by: Minchan Kim <minchan@kernel.org>
Thanks a lot for review, Minchan!
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
2013-07-31 8:16 ` Pavel Emelyanov
2013-08-01 0:51 ` Minchan Kim
@ 2013-08-05 1:48 ` Wanpeng Li
2013-08-05 1:48 ` Wanpeng Li
` (2 subsequent siblings)
5 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-08-05 1:48 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: linux-mm, linux-kernel, luto, gorcunov, xemul, akpm, mpm,
xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
aneesh.kumar
On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
>Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY
>bit set gets swapped out, the bit is lost and no longer
>available when the pte is read back.
>
>To resolve this we introduce a _PTE_SWP_SOFT_DIRTY bit, which is
>saved in the pte entry for the page being swapped out. When such a page
>is read back from the swap cache we check for the bit's presence,
>and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
>bit.
>
>One of the problems was to find a place in the pte entry where we
>can save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap.
>_PAGE_PSE was chosen for that; it doesn't intersect with the swap
>entry format stored in the pte.
>
>Reported-by: Andy Lutomirski <luto@amacapital.net>
>Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>Cc: Pavel Emelyanov <xemul@parallels.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Matt Mackall <mpm@selenic.com>
>Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>Cc: Marcelo Tosatti <mtosatti@redhat.com>
>Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>---
> arch/x86/include/asm/pgtable.h | 15 +++++++++++++++
> arch/x86/include/asm/pgtable_types.h | 13 +++++++++++++
> fs/proc/task_mmu.c | 21 +++++++++++++++------
> include/asm-generic/pgtable.h | 15 +++++++++++++++
> include/linux/swapops.h | 2 ++
> mm/memory.c | 2 ++
> mm/rmap.c | 6 +++++-
> mm/swapfile.c | 19 +++++++++++++++++--
> 8 files changed, 84 insertions(+), 9 deletions(-)
>
>Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
>===================================================================
>--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
>+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
>@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
> }
>
>+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>+{
>+ return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>+}
>+
>+static inline int pte_swp_soft_dirty(pte_t pte)
>+{
>+ return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
>+}
>+
>+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>+{
>+ return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>+}
>+
> /*
> * Mask out unsupported bits in a present pgprot. Non-present pgprots
> * can use those bits for other purposes, so leave them be.
>Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>===================================================================
>--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
>+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>@@ -67,6 +67,19 @@
> #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0))
> #endif
>
>+/*
>+ * Tracking the soft dirty bit when a page goes into swap is tricky.
>+ * We need a bit which can be stored in the pte _and_ not conflict
>+ * with the swap entry format. On x86 bits 6 and 7 are *not* involved
>+ * in swap entry computation, but bit 6 is used for nonlinear
>+ * file mapping, so we borrow bit 7 for soft dirty tracking.
>+ */
>+#ifdef CONFIG_MEM_SOFT_DIRTY
>+#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE
>+#else
>+#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0))
>+#endif
>+
> #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
> #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX)
> #else
>Index: linux-2.6.git/fs/proc/task_mmu.c
>===================================================================
>--- linux-2.6.git.orig/fs/proc/task_mmu.c
>+++ linux-2.6.git/fs/proc/task_mmu.c
>@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
> * of how soft-dirty works.
> */
> pte_t ptent = *pte;
>- ptent = pte_wrprotect(ptent);
>- ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>+
>+ if (pte_present(ptent)) {
>+ ptent = pte_wrprotect(ptent);
>+ ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>+ } else if (is_swap_pte(ptent)) {
>+ ptent = pte_swp_clear_soft_dirty(ptent);
>+ }
>+
> set_pte_at(vma->vm_mm, addr, pte, ptent);
> #endif
> }
>@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
> pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> for (; addr != end; pte++, addr += PAGE_SIZE) {
> ptent = *pte;
>- if (!pte_present(ptent))
>- continue;
>
> if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
> clear_soft_dirty(vma, addr, pte);
> continue;
> }
>
>+ if (!pte_present(ptent))
>+ continue;
>+
> page = vm_normal_page(vma, addr, ptent);
> if (!page)
> continue;
>@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
> flags = PM_PRESENT;
> page = vm_normal_page(vma, addr, pte);
> } else if (is_swap_pte(pte)) {
>- swp_entry_t entry = pte_to_swp_entry(pte);
>-
>+ swp_entry_t entry;
>+ if (pte_swp_soft_dirty(pte))
>+ flags2 |= __PM_SOFT_DIRTY;
>+ entry = pte_to_swp_entry(pte);
> frame = swp_type(entry) |
> (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
> flags = PM_SWAP;
>Index: linux-2.6.git/include/asm-generic/pgtable.h
>===================================================================
>--- linux-2.6.git.orig/include/asm-generic/pgtable.h
>+++ linux-2.6.git/include/asm-generic/pgtable.h
>@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> {
> return pmd;
> }
>+
>+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>+{
>+ return pte;
>+}
>+
>+static inline int pte_swp_soft_dirty(pte_t pte)
>+{
>+ return 0;
>+}
>+
>+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>+{
>+ return pte;
>+}
> #endif
>
> #ifndef __HAVE_PFNMAP_TRACKING
>Index: linux-2.6.git/include/linux/swapops.h
>===================================================================
>--- linux-2.6.git.orig/include/linux/swapops.h
>+++ linux-2.6.git/include/linux/swapops.h
>@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
> swp_entry_t arch_entry;
>
> BUG_ON(pte_file(pte));
>+ if (pte_swp_soft_dirty(pte))
>+ pte = pte_swp_clear_soft_dirty(pte);
> arch_entry = __pte_to_swp_entry(pte);
> return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
> }
>Index: linux-2.6.git/mm/memory.c
>===================================================================
>--- linux-2.6.git.orig/mm/memory.c
>+++ linux-2.6.git/mm/memory.c
>@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
> exclusive = 1;
> }
> flush_icache_page(vma, page);
>+ if (pte_swp_soft_dirty(orig_pte))
>+ pte = pte_mksoft_dirty(pte);
entry = pte_to_swp_entry(orig_pte);
orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared.
> set_pte_at(mm, address, page_table, pte);
> if (page == swapcache)
> do_page_add_anon_rmap(page, vma, address, exclusive);
>Index: linux-2.6.git/mm/rmap.c
>===================================================================
>--- linux-2.6.git.orig/mm/rmap.c
>+++ linux-2.6.git/mm/rmap.c
>@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
> swp_entry_to_pte(make_hwpoison_entry(page)));
> } else if (PageAnon(page)) {
> swp_entry_t entry = { .val = page_private(page) };
>+ pte_t swp_pte;
>
> if (PageSwapCache(page)) {
> /*
>@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
> BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
> entry = make_migration_entry(page, pte_write(pteval));
> }
>- set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>+ swp_pte = swp_entry_to_pte(entry);
>+ if (pte_soft_dirty(pteval))
>+ swp_pte = pte_swp_mksoft_dirty(swp_pte);
>+ set_pte_at(mm, address, pte, swp_pte);
> BUG_ON(pte_file(*pte));
> } else if (IS_ENABLED(CONFIG_MIGRATION) &&
> (TTU_ACTION(flags) == TTU_MIGRATION)) {
>Index: linux-2.6.git/mm/swapfile.c
>===================================================================
>--- linux-2.6.git.orig/mm/swapfile.c
>+++ linux-2.6.git/mm/swapfile.c
>@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
> }
> #endif /* CONFIG_HIBERNATION */
>
>+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
>+{
>+#ifdef CONFIG_MEM_SOFT_DIRTY
>+ /*
>+ * When the pte keeps the soft dirty bit, the pte generated
>+ * from the swap entry does not have it; still it's the same
>+ * pte from a logical point of view.
>+ */
>+ pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
>+ return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
>+#else
>+ return pte_same(pte, swp_pte);
>+#endif
>+}
>+
> /*
> * No need to decide whether this PTE shares the swap entry with others,
> * just let do_wp_page work it out if a write is requested later - to
>@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
> }
>
> pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>- if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
>+ if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
> mem_cgroup_cancel_charge_swapin(memcg);
> ret = 0;
> goto out;
>@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
> * swapoff spends a _lot_ of time in this loop!
> * Test inline before going to call unuse_pte.
> */
>- if (unlikely(pte_same(*pte, swp_pte))) {
>+ if (unlikely(maybe_same_pte(*pte, swp_pte))) {
> pte_unmap(pte);
> ret = unuse_pte(vma, pmd, addr, entry, page);
> if (ret)
>
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
` (2 preceding siblings ...)
2013-08-05 1:48 ` Wanpeng Li
@ 2013-08-05 1:48 ` Wanpeng Li
[not found] ` <51ff047d.2768310a.2fc4.340fSMTPIN_ADDED_BROKEN@mx.google.com>
2013-08-07 20:21 ` Andrew Morton
5 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-08-05 1:48 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: linux-mm, linux-kernel, luto, gorcunov, xemul, akpm, mpm,
xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
aneesh.kumar
On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
>Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY
>bit set gets swapped out, the bit is lost and no longer
>available when the pte is read back.
>
>To resolve this we introduce the _PTE_SWP_SOFT_DIRTY bit, which is
>saved in the pte entry for a page being swapped out. When such a page
>is read back from the swap cache we check for the bit's presence,
>and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
>bit.
>
>One of the problems was finding a place in the pte entry where we can
>save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap.
>_PAGE_PSE was chosen for that; it doesn't intersect with the swap
>entry format stored in the pte.
>
>Reported-by: Andy Lutomirski <luto@amacapital.net>
>Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>Cc: Pavel Emelyanov <xemul@parallels.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Matt Mackall <mpm@selenic.com>
>Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>Cc: Marcelo Tosatti <mtosatti@redhat.com>
>Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>---
> arch/x86/include/asm/pgtable.h | 15 +++++++++++++++
> arch/x86/include/asm/pgtable_types.h | 13 +++++++++++++
> fs/proc/task_mmu.c | 21 +++++++++++++++------
> include/asm-generic/pgtable.h | 15 +++++++++++++++
> include/linux/swapops.h | 2 ++
> mm/memory.c | 2 ++
> mm/rmap.c | 6 +++++-
> mm/swapfile.c | 19 +++++++++++++++++--
> 8 files changed, 84 insertions(+), 9 deletions(-)
>
>Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
>===================================================================
>--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
>+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
>@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
> }
>
>+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>+{
>+ return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>+}
>+
>+static inline int pte_swp_soft_dirty(pte_t pte)
>+{
>+ return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
>+}
>+
>+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>+{
>+ return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>+}
>+
> /*
> * Mask out unsupported bits in a present pgprot. Non-present pgprots
> * can use those bits for other purposes, so leave them be.
>Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>===================================================================
>--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
>+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>@@ -67,6 +67,19 @@
> #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0))
> #endif
>
>+/*
>+ * Tracking soft dirty bit when a page goes to a swap is tricky.
>+ * We need a bit which can be stored in pte _and_ not conflict
>+ * with swap entry format. On x86 bits 6 and 7 are *not* involved
>+ * into swap entry computation, but bit 6 is used for nonlinear
>+ * file mapping, so we borrow bit 7 for soft dirty tracking.
>+ */
>+#ifdef CONFIG_MEM_SOFT_DIRTY
>+#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE
>+#else
>+#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0))
>+#endif
>+
> #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
> #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX)
> #else
>Index: linux-2.6.git/fs/proc/task_mmu.c
>===================================================================
>--- linux-2.6.git.orig/fs/proc/task_mmu.c
>+++ linux-2.6.git/fs/proc/task_mmu.c
>@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
> * of how soft-dirty works.
> */
> pte_t ptent = *pte;
>- ptent = pte_wrprotect(ptent);
>- ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>+
>+ if (pte_present(ptent)) {
>+ ptent = pte_wrprotect(ptent);
>+ ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>+ } else if (is_swap_pte(ptent)) {
>+ ptent = pte_swp_clear_soft_dirty(ptent);
>+ }
>+
> set_pte_at(vma->vm_mm, addr, pte, ptent);
> #endif
> }
>@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
> pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> for (; addr != end; pte++, addr += PAGE_SIZE) {
> ptent = *pte;
>- if (!pte_present(ptent))
>- continue;
>
> if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
> clear_soft_dirty(vma, addr, pte);
> continue;
> }
>
>+ if (!pte_present(ptent))
>+ continue;
>+
> page = vm_normal_page(vma, addr, ptent);
> if (!page)
> continue;
>@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
> flags = PM_PRESENT;
> page = vm_normal_page(vma, addr, pte);
> } else if (is_swap_pte(pte)) {
>- swp_entry_t entry = pte_to_swp_entry(pte);
>-
>+ swp_entry_t entry;
>+ if (pte_swp_soft_dirty(pte))
>+ flags2 |= __PM_SOFT_DIRTY;
>+ entry = pte_to_swp_entry(pte);
> frame = swp_type(entry) |
> (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
> flags = PM_SWAP;
>Index: linux-2.6.git/include/asm-generic/pgtable.h
>===================================================================
>--- linux-2.6.git.orig/include/asm-generic/pgtable.h
>+++ linux-2.6.git/include/asm-generic/pgtable.h
>@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> {
> return pmd;
> }
>+
>+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>+{
>+ return pte;
>+}
>+
>+static inline int pte_swp_soft_dirty(pte_t pte)
>+{
>+ return 0;
>+}
>+
>+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>+{
>+ return pte;
>+}
> #endif
>
> #ifndef __HAVE_PFNMAP_TRACKING
>Index: linux-2.6.git/include/linux/swapops.h
>===================================================================
>--- linux-2.6.git.orig/include/linux/swapops.h
>+++ linux-2.6.git/include/linux/swapops.h
>@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
> swp_entry_t arch_entry;
>
> BUG_ON(pte_file(pte));
>+ if (pte_swp_soft_dirty(pte))
>+ pte = pte_swp_clear_soft_dirty(pte);
> arch_entry = __pte_to_swp_entry(pte);
> return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
> }
>Index: linux-2.6.git/mm/memory.c
>===================================================================
>--- linux-2.6.git.orig/mm/memory.c
>+++ linux-2.6.git/mm/memory.c
>@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
> exclusive = 1;
> }
> flush_icache_page(vma, page);
>+ if (pte_swp_soft_dirty(orig_pte))
>+ pte = pte_mksoft_dirty(pte);
entry = pte_to_swp_entry(orig_pte);
orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared by this point.
> set_pte_at(mm, address, page_table, pte);
> if (page == swapcache)
> do_page_add_anon_rmap(page, vma, address, exclusive);
>Index: linux-2.6.git/mm/rmap.c
>===================================================================
>--- linux-2.6.git.orig/mm/rmap.c
>+++ linux-2.6.git/mm/rmap.c
>@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
> swp_entry_to_pte(make_hwpoison_entry(page)));
> } else if (PageAnon(page)) {
> swp_entry_t entry = { .val = page_private(page) };
>+ pte_t swp_pte;
>
> if (PageSwapCache(page)) {
> /*
>@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
> BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
> entry = make_migration_entry(page, pte_write(pteval));
> }
>- set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>+ swp_pte = swp_entry_to_pte(entry);
>+ if (pte_soft_dirty(pteval))
>+ swp_pte = pte_swp_mksoft_dirty(swp_pte);
>+ set_pte_at(mm, address, pte, swp_pte);
> BUG_ON(pte_file(*pte));
> } else if (IS_ENABLED(CONFIG_MIGRATION) &&
> (TTU_ACTION(flags) == TTU_MIGRATION)) {
>Index: linux-2.6.git/mm/swapfile.c
>===================================================================
>--- linux-2.6.git.orig/mm/swapfile.c
>+++ linux-2.6.git/mm/swapfile.c
>@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
> }
> #endif /* CONFIG_HIBERNATION */
>
>+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
>+{
>+#ifdef CONFIG_MEM_SOFT_DIRTY
>+ /*
>+ * When pte keeps soft dirty bit the pte generated
>+ * from swap entry does not has it, still it's same
>+ * pte from logical point of view.
>+ */
>+ pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
>+ return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
>+#else
>+ return pte_same(pte, swp_pte);
>+#endif
>+}
>+
> /*
> * No need to decide whether this PTE shares the swap entry with others,
> * just let do_wp_page work it out if a write is requested later - to
>@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
> }
>
> pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>- if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
>+ if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
> mem_cgroup_cancel_charge_swapin(memcg);
> ret = 0;
> goto out;
>@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
> * swapoff spends a _lot_ of time in this loop!
> * Test inline before going to call unuse_pte.
> */
>- if (unlikely(pte_same(*pte, swp_pte))) {
>+ if (unlikely(maybe_same_pte(*pte, swp_pte))) {
> pte_unmap(pte);
> ret = unuse_pte(vma, pmd, addr, entry, page);
> if (ret)
>
^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <51ff047d.2768310a.2fc4.340fSMTPIN_ADDED_BROKEN@mx.google.com>]
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
[not found] ` <51ff047d.2768310a.2fc4.340fSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2013-08-05 2:17 ` Minchan Kim
2013-08-05 2:38 ` Wanpeng Li
` (2 more replies)
0 siblings, 3 replies; 34+ messages in thread
From: Minchan Kim @ 2013-08-05 2:17 UTC (permalink / raw)
To: Wanpeng Li
Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
aneesh.kumar
Hello Wanpeng,
On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote:
> On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
> >Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY
> >bit set gets swapped out, the bit is lost and no longer
> >available when the pte is read back.
> >
> >To resolve this we introduce the _PTE_SWP_SOFT_DIRTY bit, which is
> >saved in the pte entry for a page being swapped out. When such a page
> >is read back from the swap cache we check for the bit's presence,
> >and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
> >bit.
> >
> >One of the problems was finding a place in the pte entry where we can
> >save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap.
> >_PAGE_PSE was chosen for that; it doesn't intersect with the swap
> >entry format stored in the pte.
> >
> >Reported-by: Andy Lutomirski <luto@amacapital.net>
> >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> >Cc: Pavel Emelyanov <xemul@parallels.com>
> >Cc: Andrew Morton <akpm@linux-foundation.org>
> >Cc: Matt Mackall <mpm@selenic.com>
> >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> >Cc: Marcelo Tosatti <mtosatti@redhat.com>
> >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> >Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> >Cc: Peter Zijlstra <peterz@infradead.org>
> >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >---
> > arch/x86/include/asm/pgtable.h | 15 +++++++++++++++
> > arch/x86/include/asm/pgtable_types.h | 13 +++++++++++++
> > fs/proc/task_mmu.c | 21 +++++++++++++++------
> > include/asm-generic/pgtable.h | 15 +++++++++++++++
> > include/linux/swapops.h | 2 ++
> > mm/memory.c | 2 ++
> > mm/rmap.c | 6 +++++-
> > mm/swapfile.c | 19 +++++++++++++++++--
> > 8 files changed, 84 insertions(+), 9 deletions(-)
> >
> >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
> >===================================================================
> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
> >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
> >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> > return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
> > }
> >
> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
> >+{
> >+ return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> >+}
> >+
> >+static inline int pte_swp_soft_dirty(pte_t pte)
> >+{
> >+ return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
> >+}
> >+
> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> >+{
> >+ return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> >+}
> >+
> > /*
> > * Mask out unsupported bits in a present pgprot. Non-present pgprots
> > * can use those bits for other purposes, so leave them be.
> >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
> >===================================================================
> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
> >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
> >@@ -67,6 +67,19 @@
> > #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0))
> > #endif
> >
> >+/*
> >+ * Tracking soft dirty bit when a page goes to a swap is tricky.
> >+ * We need a bit which can be stored in pte _and_ not conflict
> >+ * with swap entry format. On x86 bits 6 and 7 are *not* involved
> >+ * into swap entry computation, but bit 6 is used for nonlinear
> >+ * file mapping, so we borrow bit 7 for soft dirty tracking.
> >+ */
> >+#ifdef CONFIG_MEM_SOFT_DIRTY
> >+#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE
> >+#else
> >+#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0))
> >+#endif
> >+
> > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
> > #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX)
> > #else
> >Index: linux-2.6.git/fs/proc/task_mmu.c
> >===================================================================
> >--- linux-2.6.git.orig/fs/proc/task_mmu.c
> >+++ linux-2.6.git/fs/proc/task_mmu.c
> >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
> > * of how soft-dirty works.
> > */
> > pte_t ptent = *pte;
> >- ptent = pte_wrprotect(ptent);
> >- ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
> >+
> >+ if (pte_present(ptent)) {
> >+ ptent = pte_wrprotect(ptent);
> >+ ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
> >+ } else if (is_swap_pte(ptent)) {
> >+ ptent = pte_swp_clear_soft_dirty(ptent);
> >+ }
> >+
> > set_pte_at(vma->vm_mm, addr, pte, ptent);
> > #endif
> > }
> >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> > for (; addr != end; pte++, addr += PAGE_SIZE) {
> > ptent = *pte;
> >- if (!pte_present(ptent))
> >- continue;
> >
> > if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
> > clear_soft_dirty(vma, addr, pte);
> > continue;
> > }
> >
> >+ if (!pte_present(ptent))
> >+ continue;
> >+
> > page = vm_normal_page(vma, addr, ptent);
> > if (!page)
> > continue;
> >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
> > flags = PM_PRESENT;
> > page = vm_normal_page(vma, addr, pte);
> > } else if (is_swap_pte(pte)) {
> >- swp_entry_t entry = pte_to_swp_entry(pte);
> >-
> >+ swp_entry_t entry;
> >+ if (pte_swp_soft_dirty(pte))
> >+ flags2 |= __PM_SOFT_DIRTY;
> >+ entry = pte_to_swp_entry(pte);
> > frame = swp_type(entry) |
> > (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
> > flags = PM_SWAP;
> >Index: linux-2.6.git/include/asm-generic/pgtable.h
> >===================================================================
> >--- linux-2.6.git.orig/include/asm-generic/pgtable.h
> >+++ linux-2.6.git/include/asm-generic/pgtable.h
> >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> > {
> > return pmd;
> > }
> >+
> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
> >+{
> >+ return pte;
> >+}
> >+
> >+static inline int pte_swp_soft_dirty(pte_t pte)
> >+{
> >+ return 0;
> >+}
> >+
> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> >+{
> >+ return pte;
> >+}
> > #endif
> >
> > #ifndef __HAVE_PFNMAP_TRACKING
> >Index: linux-2.6.git/include/linux/swapops.h
> >===================================================================
> >--- linux-2.6.git.orig/include/linux/swapops.h
> >+++ linux-2.6.git/include/linux/swapops.h
> >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
> > swp_entry_t arch_entry;
> >
> > BUG_ON(pte_file(pte));
> >+ if (pte_swp_soft_dirty(pte))
> >+ pte = pte_swp_clear_soft_dirty(pte);
> > arch_entry = __pte_to_swp_entry(pte);
> > return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
> > }
> >Index: linux-2.6.git/mm/memory.c
> >===================================================================
> >--- linux-2.6.git.orig/mm/memory.c
> >+++ linux-2.6.git/mm/memory.c
> >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
> > exclusive = 1;
> > }
> > flush_icache_page(vma, page);
> >+ if (pte_swp_soft_dirty(orig_pte))
> >+ pte = pte_mksoft_dirty(pte);
>
> entry = pte_to_swp_entry(orig_pte);
> orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared.
You seem to have walked the same path I did.
Please look at my stupid questions in this thread.
>
> > set_pte_at(mm, address, page_table, pte);
> > if (page == swapcache)
> > do_page_add_anon_rmap(page, vma, address, exclusive);
> >Index: linux-2.6.git/mm/rmap.c
> >===================================================================
> >--- linux-2.6.git.orig/mm/rmap.c
> >+++ linux-2.6.git/mm/rmap.c
> >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
> > swp_entry_to_pte(make_hwpoison_entry(page)));
> > } else if (PageAnon(page)) {
> > swp_entry_t entry = { .val = page_private(page) };
> >+ pte_t swp_pte;
> >
> > if (PageSwapCache(page)) {
> > /*
> >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
> > BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
> > entry = make_migration_entry(page, pte_write(pteval));
> > }
> >- set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
> >+ swp_pte = swp_entry_to_pte(entry);
> >+ if (pte_soft_dirty(pteval))
> >+ swp_pte = pte_swp_mksoft_dirty(swp_pte);
> >+ set_pte_at(mm, address, pte, swp_pte);
> > BUG_ON(pte_file(*pte));
> > } else if (IS_ENABLED(CONFIG_MIGRATION) &&
> > (TTU_ACTION(flags) == TTU_MIGRATION)) {
> >Index: linux-2.6.git/mm/swapfile.c
> >===================================================================
> >--- linux-2.6.git.orig/mm/swapfile.c
> >+++ linux-2.6.git/mm/swapfile.c
> >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
> > }
> > #endif /* CONFIG_HIBERNATION */
> >
> >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
> >+{
> >+#ifdef CONFIG_MEM_SOFT_DIRTY
> >+ /*
> >+ * When pte keeps soft dirty bit the pte generated
> >+ * from swap entry does not has it, still it's same
> >+ * pte from logical point of view.
> >+ */
> >+ pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
> >+ return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
> >+#else
> >+ return pte_same(pte, swp_pte);
> >+#endif
> >+}
> >+
> > /*
> > * No need to decide whether this PTE shares the swap entry with others,
> > * just let do_wp_page work it out if a write is requested later - to
> >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
> > }
> >
> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> >- if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
> >+ if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
> > mem_cgroup_cancel_charge_swapin(memcg);
> > ret = 0;
> > goto out;
> >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
> > * swapoff spends a _lot_ of time in this loop!
> > * Test inline before going to call unuse_pte.
> > */
> >- if (unlikely(pte_same(*pte, swp_pte))) {
> >+ if (unlikely(maybe_same_pte(*pte, swp_pte))) {
> > pte_unmap(pte);
> > ret = unuse_pte(vma, pmd, addr, entry, page);
> > if (ret)
> >
--
Kind regards,
Minchan Kim
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-08-05 2:17 ` Minchan Kim
@ 2013-08-05 2:38 ` Wanpeng Li
2013-08-05 2:38 ` Wanpeng Li
[not found] ` <51ff1053.ab47310a.5d3f.566cSMTPIN_ADDED_BROKEN@mx.google.com>
2 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-08-05 2:38 UTC (permalink / raw)
To: Minchan Kim
Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
aneesh.kumar
Hi Minchan,
On Mon, Aug 05, 2013 at 11:17:15AM +0900, Minchan Kim wrote:
>Hello Wanpeng,
>
>On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote:
>> On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
>> >Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY
>> >bit set gets swapped out, the bit is lost and no longer
>> >available when the pte is read back.
>> >
>> >To resolve this we introduce the _PTE_SWP_SOFT_DIRTY bit, which is
>> >saved in the pte entry for a page being swapped out. When such a page
>> >is read back from the swap cache we check for the bit's presence,
>> >and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY
>> >bit.
>> >
>> >One of the problems was finding a place in the pte entry where we can
>> >save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap.
>> >_PAGE_PSE was chosen for that; it doesn't intersect with the swap
>> >entry format stored in the pte.
>> >
>> >Reported-by: Andy Lutomirski <luto@amacapital.net>
>> >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>> >Cc: Pavel Emelyanov <xemul@parallels.com>
>> >Cc: Andrew Morton <akpm@linux-foundation.org>
>> >Cc: Matt Mackall <mpm@selenic.com>
>> >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> >Cc: Marcelo Tosatti <mtosatti@redhat.com>
>> >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>> >Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>> >Cc: Peter Zijlstra <peterz@infradead.org>
>> >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >---
>> > arch/x86/include/asm/pgtable.h | 15 +++++++++++++++
>> > arch/x86/include/asm/pgtable_types.h | 13 +++++++++++++
>> > fs/proc/task_mmu.c | 21 +++++++++++++++------
>> > include/asm-generic/pgtable.h | 15 +++++++++++++++
>> > include/linux/swapops.h | 2 ++
>> > mm/memory.c | 2 ++
>> > mm/rmap.c | 6 +++++-
>> > mm/swapfile.c | 19 +++++++++++++++++--
>> > 8 files changed, 84 insertions(+), 9 deletions(-)
>> >
>> >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >===================================================================
>> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
>> >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> > return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
>> > }
>> >
>> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >+{
>> >+ return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >+}
>> >+
>> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >+{
>> >+ return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
>> >+}
>> >+
>> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >+{
>> >+ return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >+}
>> >+
>> > /*
>> > * Mask out unsupported bits in a present pgprot. Non-present pgprots
>> > * can use those bits for other purposes, so leave them be.
>> >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >===================================================================
>> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
>> >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >@@ -67,6 +67,19 @@
>> > #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0))
>> > #endif
>> >
>> >+/*
>> >+ * Tracking soft dirty bit when a page goes to a swap is tricky.
>> >+ * We need a bit which can be stored in pte _and_ not conflict
>> >+ * with swap entry format. On x86 bits 6 and 7 are *not* involved
>> >+ * into swap entry computation, but bit 6 is used for nonlinear
>> >+ * file mapping, so we borrow bit 7 for soft dirty tracking.
>> >+ */
>> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >+#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE
>> >+#else
>> >+#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0))
>> >+#endif
>> >+
>> > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
>> > #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX)
>> > #else
>> >Index: linux-2.6.git/fs/proc/task_mmu.c
>> >===================================================================
>> >--- linux-2.6.git.orig/fs/proc/task_mmu.c
>> >+++ linux-2.6.git/fs/proc/task_mmu.c
>> >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
>> > * of how soft-dirty works.
>> > */
>> > pte_t ptent = *pte;
>> >- ptent = pte_wrprotect(ptent);
>> >- ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >+
>> >+ if (pte_present(ptent)) {
>> >+ ptent = pte_wrprotect(ptent);
>> >+ ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >+ } else if (is_swap_pte(ptent)) {
>> >+ ptent = pte_swp_clear_soft_dirty(ptent);
>> >+ }
>> >+
>> > set_pte_at(vma->vm_mm, addr, pte, ptent);
>> > #endif
>> > }
>> >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
>> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> > for (; addr != end; pte++, addr += PAGE_SIZE) {
>> > ptent = *pte;
>> >- if (!pte_present(ptent))
>> >- continue;
>> >
>> > if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
>> > clear_soft_dirty(vma, addr, pte);
>> > continue;
>> > }
>> >
>> >+ if (!pte_present(ptent))
>> >+ continue;
>> >+
>> > page = vm_normal_page(vma, addr, ptent);
>> > if (!page)
>> > continue;
>> >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
>> > flags = PM_PRESENT;
>> > page = vm_normal_page(vma, addr, pte);
>> > } else if (is_swap_pte(pte)) {
>> >- swp_entry_t entry = pte_to_swp_entry(pte);
>> >-
>> >+ swp_entry_t entry;
>> >+ if (pte_swp_soft_dirty(pte))
>> >+ flags2 |= __PM_SOFT_DIRTY;
>> >+ entry = pte_to_swp_entry(pte);
>> > frame = swp_type(entry) |
>> > (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
>> > flags = PM_SWAP;
>> >Index: linux-2.6.git/include/asm-generic/pgtable.h
>> >===================================================================
>> >--- linux-2.6.git.orig/include/asm-generic/pgtable.h
>> >+++ linux-2.6.git/include/asm-generic/pgtable.h
>> >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> > {
>> > return pmd;
>> > }
>> >+
>> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >+{
>> >+ return pte;
>> >+}
>> >+
>> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >+{
>> >+ return 0;
>> >+}
>> >+
>> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >+{
>> >+ return pte;
>> >+}
>> > #endif
>> >
>> > #ifndef __HAVE_PFNMAP_TRACKING
>> >Index: linux-2.6.git/include/linux/swapops.h
>> >===================================================================
>> >--- linux-2.6.git.orig/include/linux/swapops.h
>> >+++ linux-2.6.git/include/linux/swapops.h
>> >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
>> > swp_entry_t arch_entry;
>> >
>> > BUG_ON(pte_file(pte));
>> >+ if (pte_swp_soft_dirty(pte))
>> >+ pte = pte_swp_clear_soft_dirty(pte);
>> > arch_entry = __pte_to_swp_entry(pte);
>> > return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
>> > }
>> >Index: linux-2.6.git/mm/memory.c
>> >===================================================================
>> >--- linux-2.6.git.orig/mm/memory.c
>> >+++ linux-2.6.git/mm/memory.c
>> >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
>> > exclusive = 1;
>> > }
>> > flush_icache_page(vma, page);
>> >+ if (pte_swp_soft_dirty(orig_pte))
>> >+ pte = pte_mksoft_dirty(pte);
>>
>> entry = pte_to_swp_entry(orig_pte);
>> orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared.
>
>You seem to have walked the same path I did.
>Please look at my stupid questions in this thread.
>
I see your discussion with Cyrill; however, pte_to_swp_entry() and
pte_swp_soft_dirty() both operate on orig_pte, so what am I missing? ;-)
>>
>> > set_pte_at(mm, address, page_table, pte);
>> > if (page == swapcache)
>> > do_page_add_anon_rmap(page, vma, address, exclusive);
>> >Index: linux-2.6.git/mm/rmap.c
>> >===================================================================
>> >--- linux-2.6.git.orig/mm/rmap.c
>> >+++ linux-2.6.git/mm/rmap.c
>> >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
>> > swp_entry_to_pte(make_hwpoison_entry(page)));
>> > } else if (PageAnon(page)) {
>> > swp_entry_t entry = { .val = page_private(page) };
>> >+ pte_t swp_pte;
>> >
>> > if (PageSwapCache(page)) {
>> > /*
>> >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
>> > BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
>> > entry = make_migration_entry(page, pte_write(pteval));
>> > }
>> >- set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>> >+ swp_pte = swp_entry_to_pte(entry);
>> >+ if (pte_soft_dirty(pteval))
>> >+ swp_pte = pte_swp_mksoft_dirty(swp_pte);
>> >+ set_pte_at(mm, address, pte, swp_pte);
>> > BUG_ON(pte_file(*pte));
>> > } else if (IS_ENABLED(CONFIG_MIGRATION) &&
>> > (TTU_ACTION(flags) == TTU_MIGRATION)) {
>> >Index: linux-2.6.git/mm/swapfile.c
>> >===================================================================
>> >--- linux-2.6.git.orig/mm/swapfile.c
>> >+++ linux-2.6.git/mm/swapfile.c
>> >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
>> > }
>> > #endif /* CONFIG_HIBERNATION */
>> >
>> >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
>> >+{
>> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >+ /*
>> >+ * When pte keeps soft dirty bit the pte generated
>> >+ * from swap entry does not has it, still it's same
>> >+ * pte from logical point of view.
>> >+ */
>> >+ pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
>> >+ return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
>> >+#else
>> >+ return pte_same(pte, swp_pte);
>> >+#endif
>> >+}
>> >+
>> > /*
>> > * No need to decide whether this PTE shares the swap entry with others,
>> > * just let do_wp_page work it out if a write is requested later - to
>> >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
>> > }
>> >
>> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> >- if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
>> >+ if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
>> > mem_cgroup_cancel_charge_swapin(memcg);
>> > ret = 0;
>> > goto out;
>> >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
>> > * swapoff spends a _lot_ of time in this loop!
>> > * Test inline before going to call unuse_pte.
>> > */
>> >- if (unlikely(pte_same(*pte, swp_pte))) {
>> >+ if (unlikely(maybe_same_pte(*pte, swp_pte))) {
>> > pte_unmap(pte);
>> > ret = unuse_pte(vma, pmd, addr, entry, page);
>> > if (ret)
>> >
>
>--
>Kind regards,
>Minchan Kim
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-08-05 2:17 ` Minchan Kim
2013-08-05 2:38 ` Wanpeng Li
@ 2013-08-05 2:38 ` Wanpeng Li
[not found] ` <51ff1053.ab47310a.5d3f.566cSMTPIN_ADDED_BROKEN@mx.google.com>
2 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-08-05 2:38 UTC (permalink / raw)
To: Minchan Kim
Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
aneesh.kumar
Hi Minchan,
On Mon, Aug 05, 2013 at 11:17:15AM +0900, Minchan Kim wrote:
>Hello Wanpeng,
>
>On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote:
>> On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
>> >Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY
>> >bit set gets swapped out, the bit is lost and no longer
>> >available when the pte is read back.
>> >
>> >To resolve this we introduce the _PTE_SWP_SOFT_DIRTY bit, which is
>> >saved in the pte entry for the page being swapped out. When such a page
>> >is read back from the swap cache we check for the bit's presence,
>> >and if it is there we clear it and restore the former _PAGE_SOFT_DIRTY
>> >bit.
>> >
>> >One of the problems was finding a place in the pte entry where we can
>> >save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap. _PAGE_PSE
>> >was chosen for that; it doesn't intersect with the swap
>> >entry format stored in the pte.
>> >
>> >Reported-by: Andy Lutomirski <luto@amacapital.net>
>> >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>> >Cc: Pavel Emelyanov <xemul@parallels.com>
>> >Cc: Andrew Morton <akpm@linux-foundation.org>
>> >Cc: Matt Mackall <mpm@selenic.com>
>> >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> >Cc: Marcelo Tosatti <mtosatti@redhat.com>
>> >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>> >Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>> >Cc: Peter Zijlstra <peterz@infradead.org>
>> >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >---
>> > arch/x86/include/asm/pgtable.h | 15 +++++++++++++++
>> > arch/x86/include/asm/pgtable_types.h | 13 +++++++++++++
>> > fs/proc/task_mmu.c | 21 +++++++++++++++------
>> > include/asm-generic/pgtable.h | 15 +++++++++++++++
>> > include/linux/swapops.h | 2 ++
>> > mm/memory.c | 2 ++
>> > mm/rmap.c | 6 +++++-
>> > mm/swapfile.c | 19 +++++++++++++++++--
>> > 8 files changed, 84 insertions(+), 9 deletions(-)
>> >
>> >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >===================================================================
>> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
>> >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> > return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
>> > }
>> >
>> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >+{
>> >+ return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >+}
>> >+
>> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >+{
>> >+ return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
>> >+}
>> >+
>> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >+{
>> >+ return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >+}
>> >+
>> > /*
>> > * Mask out unsupported bits in a present pgprot. Non-present pgprots
>> > * can use those bits for other purposes, so leave them be.
>> >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >===================================================================
>> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
>> >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >@@ -67,6 +67,19 @@
>> > #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0))
>> > #endif
>> >
>> >+/*
>> >+ * Tracking the soft dirty bit when a page goes to swap is tricky.
>> >+ * We need a bit which can be stored in the pte _and_ not conflict
>> >+ * with the swap entry format. On x86 bits 6 and 7 are *not* involved
>> >+ * in swap entry computation, but bit 6 is used for nonlinear
>> >+ * file mapping, so we borrow bit 7 for soft dirty tracking.
>> >+ */
>> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >+#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE
>> >+#else
>> >+#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0))
>> >+#endif
>> >+
>> > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
>> > #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX)
>> > #else
>> >Index: linux-2.6.git/fs/proc/task_mmu.c
>> >===================================================================
>> >--- linux-2.6.git.orig/fs/proc/task_mmu.c
>> >+++ linux-2.6.git/fs/proc/task_mmu.c
>> >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
>> > * of how soft-dirty works.
>> > */
>> > pte_t ptent = *pte;
>> >- ptent = pte_wrprotect(ptent);
>> >- ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >+
>> >+ if (pte_present(ptent)) {
>> >+ ptent = pte_wrprotect(ptent);
>> >+ ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >+ } else if (is_swap_pte(ptent)) {
>> >+ ptent = pte_swp_clear_soft_dirty(ptent);
>> >+ }
>> >+
>> > set_pte_at(vma->vm_mm, addr, pte, ptent);
>> > #endif
>> > }
>> >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
>> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> > for (; addr != end; pte++, addr += PAGE_SIZE) {
>> > ptent = *pte;
>> >- if (!pte_present(ptent))
>> >- continue;
>> >
>> > if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
>> > clear_soft_dirty(vma, addr, pte);
>> > continue;
>> > }
>> >
>> >+ if (!pte_present(ptent))
>> >+ continue;
>> >+
>> > page = vm_normal_page(vma, addr, ptent);
>> > if (!page)
>> > continue;
>> >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
>> > flags = PM_PRESENT;
>> > page = vm_normal_page(vma, addr, pte);
>> > } else if (is_swap_pte(pte)) {
>> >- swp_entry_t entry = pte_to_swp_entry(pte);
>> >-
>> >+ swp_entry_t entry;
>> >+ if (pte_swp_soft_dirty(pte))
>> >+ flags2 |= __PM_SOFT_DIRTY;
>> >+ entry = pte_to_swp_entry(pte);
>> > frame = swp_type(entry) |
>> > (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
>> > flags = PM_SWAP;
>> >Index: linux-2.6.git/include/asm-generic/pgtable.h
>> >===================================================================
>> >--- linux-2.6.git.orig/include/asm-generic/pgtable.h
>> >+++ linux-2.6.git/include/asm-generic/pgtable.h
>> >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> > {
>> > return pmd;
>> > }
>> >+
>> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >+{
>> >+ return pte;
>> >+}
>> >+
>> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >+{
>> >+ return 0;
>> >+}
>> >+
>> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >+{
>> >+ return pte;
>> >+}
>> > #endif
>> >
>> > #ifndef __HAVE_PFNMAP_TRACKING
>> >Index: linux-2.6.git/include/linux/swapops.h
>> >===================================================================
>> >--- linux-2.6.git.orig/include/linux/swapops.h
>> >+++ linux-2.6.git/include/linux/swapops.h
>> >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
>> > swp_entry_t arch_entry;
>> >
>> > BUG_ON(pte_file(pte));
>> >+ if (pte_swp_soft_dirty(pte))
>> >+ pte = pte_swp_clear_soft_dirty(pte);
>> > arch_entry = __pte_to_swp_entry(pte);
>> > return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
>> > }
>> >Index: linux-2.6.git/mm/memory.c
>> >===================================================================
>> >--- linux-2.6.git.orig/mm/memory.c
>> >+++ linux-2.6.git/mm/memory.c
>> >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
>> > exclusive = 1;
>> > }
>> > flush_icache_page(vma, page);
>> >+ if (pte_swp_soft_dirty(orig_pte))
>> >+ pte = pte_mksoft_dirty(pte);
>>
>> entry = pte_to_swp_entry(orig_pte);
>> orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared.
>
>You seem to walk same way with me.
>Please look at my stupid questions in this thread.
>
I saw your discussion with Cyrill; however, pte_to_swp_entry and pte_swp_soft_dirty
are both applied to orig_pte, so what am I missing? ;-)
>>
>> > set_pte_at(mm, address, page_table, pte);
>> > if (page == swapcache)
>> > do_page_add_anon_rmap(page, vma, address, exclusive);
>> >Index: linux-2.6.git/mm/rmap.c
>> >===================================================================
>> >--- linux-2.6.git.orig/mm/rmap.c
>> >+++ linux-2.6.git/mm/rmap.c
>> >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
>> > swp_entry_to_pte(make_hwpoison_entry(page)));
>> > } else if (PageAnon(page)) {
>> > swp_entry_t entry = { .val = page_private(page) };
>> >+ pte_t swp_pte;
>> >
>> > if (PageSwapCache(page)) {
>> > /*
>> >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
>> > BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
>> > entry = make_migration_entry(page, pte_write(pteval));
>> > }
>> >- set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>> >+ swp_pte = swp_entry_to_pte(entry);
>> >+ if (pte_soft_dirty(pteval))
>> >+ swp_pte = pte_swp_mksoft_dirty(swp_pte);
>> >+ set_pte_at(mm, address, pte, swp_pte);
>> > BUG_ON(pte_file(*pte));
>> > } else if (IS_ENABLED(CONFIG_MIGRATION) &&
>> > (TTU_ACTION(flags) == TTU_MIGRATION)) {
>> >Index: linux-2.6.git/mm/swapfile.c
>> >===================================================================
>> >--- linux-2.6.git.orig/mm/swapfile.c
>> >+++ linux-2.6.git/mm/swapfile.c
>> >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
>> > }
>> > #endif /* CONFIG_HIBERNATION */
>> >
>> >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
>> >+{
>> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >+ /*
>> >+ * When the pte keeps the soft dirty bit, the pte generated
>> >+ * from the swap entry does not have it, but it is still the
>> >+ * same pte from a logical point of view.
>> >+ */
>> >+ pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
>> >+ return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
>> >+#else
>> >+ return pte_same(pte, swp_pte);
>> >+#endif
>> >+}
>> >+
>> > /*
>> > * No need to decide whether this PTE shares the swap entry with others,
>> > * just let do_wp_page work it out if a write is requested later - to
>> >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
>> > }
>> >
>> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> >- if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
>> >+ if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
>> > mem_cgroup_cancel_charge_swapin(memcg);
>> > ret = 0;
>> > goto out;
>> >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
>> > * swapoff spends a _lot_ of time in this loop!
>> > * Test inline before going to call unuse_pte.
>> > */
>> >- if (unlikely(pte_same(*pte, swp_pte))) {
>> >+ if (unlikely(maybe_same_pte(*pte, swp_pte))) {
>> > pte_unmap(pte);
>> > ret = unuse_pte(vma, pmd, addr, entry, page);
>> > if (ret)
>> >
>
>--
>Kind regards,
>Minchan Kim
^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <51ff1053.ab47310a.5d3f.566cSMTPIN_ADDED_BROKEN@mx.google.com>]
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
[not found] ` <51ff1053.ab47310a.5d3f.566cSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2013-08-05 2:54 ` Minchan Kim
2013-08-05 2:58 ` Wanpeng Li
` (2 more replies)
0 siblings, 3 replies; 34+ messages in thread
From: Minchan Kim @ 2013-08-05 2:54 UTC (permalink / raw)
To: Wanpeng Li
Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
aneesh.kumar
On Mon, Aug 05, 2013 at 10:38:58AM +0800, Wanpeng Li wrote:
> Hi Minchan,
>
> On Mon, Aug 05, 2013 at 11:17:15AM +0900, Minchan Kim wrote:
> >Hello Wanpeng,
> >
> >On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote:
> >> On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
> >> >Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY
> >> >bit set gets swapped out, the bit is lost and no longer
> >> >available when the pte is read back.
> >> >
> >> >To resolve this we introduce the _PTE_SWP_SOFT_DIRTY bit, which is
> >> >saved in the pte entry for the page being swapped out. When such a page
> >> >is read back from the swap cache we check for the bit's presence,
> >> >and if it is there we clear it and restore the former _PAGE_SOFT_DIRTY
> >> >bit.
> >> >
> >> >One of the problems was finding a place in the pte entry where we can
> >> >save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap. _PAGE_PSE
> >> >was chosen for that; it doesn't intersect with the swap
> >> >entry format stored in the pte.
> >> >
> >> >Reported-by: Andy Lutomirski <luto@amacapital.net>
> >> >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> >> >Cc: Pavel Emelyanov <xemul@parallels.com>
> >> >Cc: Andrew Morton <akpm@linux-foundation.org>
> >> >Cc: Matt Mackall <mpm@selenic.com>
> >> >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> >> >Cc: Marcelo Tosatti <mtosatti@redhat.com>
> >> >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> >> >Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> >> >Cc: Peter Zijlstra <peterz@infradead.org>
> >> >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> >---
> >> > arch/x86/include/asm/pgtable.h | 15 +++++++++++++++
> >> > arch/x86/include/asm/pgtable_types.h | 13 +++++++++++++
> >> > fs/proc/task_mmu.c | 21 +++++++++++++++------
> >> > include/asm-generic/pgtable.h | 15 +++++++++++++++
> >> > include/linux/swapops.h | 2 ++
> >> > mm/memory.c | 2 ++
> >> > mm/rmap.c | 6 +++++-
> >> > mm/swapfile.c | 19 +++++++++++++++++--
> >> > 8 files changed, 84 insertions(+), 9 deletions(-)
> >> >
> >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
> >> >===================================================================
> >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
> >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
> >> >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> >> > return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
> >> > }
> >> >
> >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
> >> >+{
> >> >+ return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> >> >+}
> >> >+
> >> >+static inline int pte_swp_soft_dirty(pte_t pte)
> >> >+{
> >> >+ return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
> >> >+}
> >> >+
> >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> >> >+{
> >> >+ return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> >> >+}
> >> >+
> >> > /*
> >> > * Mask out unsupported bits in a present pgprot. Non-present pgprots
> >> > * can use those bits for other purposes, so leave them be.
> >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
> >> >===================================================================
> >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
> >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
> >> >@@ -67,6 +67,19 @@
> >> > #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0))
> >> > #endif
> >> >
> >> >+/*
> >> >+ * Tracking the soft dirty bit when a page goes to swap is tricky.
> >> >+ * We need a bit which can be stored in the pte _and_ not conflict
> >> >+ * with the swap entry format. On x86 bits 6 and 7 are *not* involved
> >> >+ * in swap entry computation, but bit 6 is used for nonlinear
> >> >+ * file mapping, so we borrow bit 7 for soft dirty tracking.
> >> >+ */
> >> >+#ifdef CONFIG_MEM_SOFT_DIRTY
> >> >+#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE
> >> >+#else
> >> >+#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0))
> >> >+#endif
> >> >+
> >> > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
> >> > #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX)
> >> > #else
> >> >Index: linux-2.6.git/fs/proc/task_mmu.c
> >> >===================================================================
> >> >--- linux-2.6.git.orig/fs/proc/task_mmu.c
> >> >+++ linux-2.6.git/fs/proc/task_mmu.c
> >> >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
> >> > * of how soft-dirty works.
> >> > */
> >> > pte_t ptent = *pte;
> >> >- ptent = pte_wrprotect(ptent);
> >> >- ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
> >> >+
> >> >+ if (pte_present(ptent)) {
> >> >+ ptent = pte_wrprotect(ptent);
> >> >+ ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
> >> >+ } else if (is_swap_pte(ptent)) {
> >> >+ ptent = pte_swp_clear_soft_dirty(ptent);
> >> >+ }
> >> >+
> >> > set_pte_at(vma->vm_mm, addr, pte, ptent);
> >> > #endif
> >> > }
> >> >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
> >> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> >> > for (; addr != end; pte++, addr += PAGE_SIZE) {
> >> > ptent = *pte;
> >> >- if (!pte_present(ptent))
> >> >- continue;
> >> >
> >> > if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
> >> > clear_soft_dirty(vma, addr, pte);
> >> > continue;
> >> > }
> >> >
> >> >+ if (!pte_present(ptent))
> >> >+ continue;
> >> >+
> >> > page = vm_normal_page(vma, addr, ptent);
> >> > if (!page)
> >> > continue;
> >> >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
> >> > flags = PM_PRESENT;
> >> > page = vm_normal_page(vma, addr, pte);
> >> > } else if (is_swap_pte(pte)) {
> >> >- swp_entry_t entry = pte_to_swp_entry(pte);
> >> >-
> >> >+ swp_entry_t entry;
> >> >+ if (pte_swp_soft_dirty(pte))
> >> >+ flags2 |= __PM_SOFT_DIRTY;
> >> >+ entry = pte_to_swp_entry(pte);
> >> > frame = swp_type(entry) |
> >> > (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
> >> > flags = PM_SWAP;
> >> >Index: linux-2.6.git/include/asm-generic/pgtable.h
> >> >===================================================================
> >> >--- linux-2.6.git.orig/include/asm-generic/pgtable.h
> >> >+++ linux-2.6.git/include/asm-generic/pgtable.h
> >> >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
> >> > {
> >> > return pmd;
> >> > }
> >> >+
> >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
> >> >+{
> >> >+ return pte;
> >> >+}
> >> >+
> >> >+static inline int pte_swp_soft_dirty(pte_t pte)
> >> >+{
> >> >+ return 0;
> >> >+}
> >> >+
> >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> >> >+{
> >> >+ return pte;
> >> >+}
> >> > #endif
> >> >
> >> > #ifndef __HAVE_PFNMAP_TRACKING
> >> >Index: linux-2.6.git/include/linux/swapops.h
> >> >===================================================================
> >> >--- linux-2.6.git.orig/include/linux/swapops.h
> >> >+++ linux-2.6.git/include/linux/swapops.h
> >> >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
> >> > swp_entry_t arch_entry;
> >> >
> >> > BUG_ON(pte_file(pte));
> >> >+ if (pte_swp_soft_dirty(pte))
> >> >+ pte = pte_swp_clear_soft_dirty(pte);
> >> > arch_entry = __pte_to_swp_entry(pte);
> >> > return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
> >> > }
> >> >Index: linux-2.6.git/mm/memory.c
> >> >===================================================================
> >> >--- linux-2.6.git.orig/mm/memory.c
> >> >+++ linux-2.6.git/mm/memory.c
> >> >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
> >> > exclusive = 1;
> >> > }
> >> > flush_icache_page(vma, page);
> >> >+ if (pte_swp_soft_dirty(orig_pte))
> >> >+ pte = pte_mksoft_dirty(pte);
> >>
> >> entry = pte_to_swp_entry(orig_pte);
> >> orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared.
> >
> >You seem to walk same way with me.
> >Please look at my stupid questions in this thread.
> >
>
> I saw your discussion with Cyrill; however, pte_to_swp_entry and pte_swp_soft_dirty
> are both applied to orig_pte, so what am I missing? ;-)
pte_to_swp_entry is passed orig_pte by value, not by pointer,
so although pte_to_swp_entry clears out _PTE_SWP_SOFT_DIRTY, it does so only in a local copy.
So orig_pte is never changed.
>
> >>
> >> > set_pte_at(mm, address, page_table, pte);
> >> > if (page == swapcache)
> >> > do_page_add_anon_rmap(page, vma, address, exclusive);
> >> >Index: linux-2.6.git/mm/rmap.c
> >> >===================================================================
> >> >--- linux-2.6.git.orig/mm/rmap.c
> >> >+++ linux-2.6.git/mm/rmap.c
> >> >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
> >> > swp_entry_to_pte(make_hwpoison_entry(page)));
> >> > } else if (PageAnon(page)) {
> >> > swp_entry_t entry = { .val = page_private(page) };
> >> >+ pte_t swp_pte;
> >> >
> >> > if (PageSwapCache(page)) {
> >> > /*
> >> >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
> >> > BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
> >> > entry = make_migration_entry(page, pte_write(pteval));
> >> > }
> >> >- set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
> >> >+ swp_pte = swp_entry_to_pte(entry);
> >> >+ if (pte_soft_dirty(pteval))
> >> >+ swp_pte = pte_swp_mksoft_dirty(swp_pte);
> >> >+ set_pte_at(mm, address, pte, swp_pte);
> >> > BUG_ON(pte_file(*pte));
> >> > } else if (IS_ENABLED(CONFIG_MIGRATION) &&
> >> > (TTU_ACTION(flags) == TTU_MIGRATION)) {
> >> >Index: linux-2.6.git/mm/swapfile.c
> >> >===================================================================
> >> >--- linux-2.6.git.orig/mm/swapfile.c
> >> >+++ linux-2.6.git/mm/swapfile.c
> >> >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
> >> > }
> >> > #endif /* CONFIG_HIBERNATION */
> >> >
> >> >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
> >> >+{
> >> >+#ifdef CONFIG_MEM_SOFT_DIRTY
> >> >+ /*
> >> >+ * When the pte keeps the soft dirty bit, the pte generated
> >> >+ * from the swap entry does not have it, but it is still the
> >> >+ * same pte from a logical point of view.
> >> >+ */
> >> >+ pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
> >> >+ return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
> >> >+#else
> >> >+ return pte_same(pte, swp_pte);
> >> >+#endif
> >> >+}
> >> >+
> >> > /*
> >> > * No need to decide whether this PTE shares the swap entry with others,
> >> > * just let do_wp_page work it out if a write is requested later - to
> >> >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
> >> > }
> >> >
> >> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> >> >- if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
> >> >+ if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
> >> > mem_cgroup_cancel_charge_swapin(memcg);
> >> > ret = 0;
> >> > goto out;
> >> >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
> >> > * swapoff spends a _lot_ of time in this loop!
> >> > * Test inline before going to call unuse_pte.
> >> > */
> >> >- if (unlikely(pte_same(*pte, swp_pte))) {
> >> >+ if (unlikely(maybe_same_pte(*pte, swp_pte))) {
> >> > pte_unmap(pte);
> >> > ret = unuse_pte(vma, pmd, addr, entry, page);
> >> > if (ret)
> >> >
> >
> >--
> >Kind regards,
> >Minchan Kim
>
--
Kind regards,
Minchan Kim
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-08-05 2:54 ` Minchan Kim
@ 2013-08-05 2:58 ` Wanpeng Li
2013-08-05 2:58 ` Wanpeng Li
[not found] ` <51ff14e9.87ef440a.1424.ffffe470SMTPIN_ADDED_BROKEN@mx.google.com>
2 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-08-05 2:58 UTC (permalink / raw)
To: Minchan Kim
Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
aneesh.kumar
On Mon, Aug 05, 2013 at 11:54:37AM +0900, Minchan Kim wrote:
>On Mon, Aug 05, 2013 at 10:38:58AM +0800, Wanpeng Li wrote:
>> Hi Minchan,
>>
>> On Mon, Aug 05, 2013 at 11:17:15AM +0900, Minchan Kim wrote:
>> >Hello Wanpeng,
>> >
>> >On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote:
>> >> On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
>> >> >Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY
>> >> >bit set gets swapped out, the bit is lost and no longer
>> >> >available when the pte is read back.
>> >> >
>> >> >To resolve this we introduce the _PTE_SWP_SOFT_DIRTY bit, which is
>> >> >saved in the pte entry for the page being swapped out. When such a page
>> >> >is read back from the swap cache we check for the bit's presence,
>> >> >and if it is there we clear it and restore the former _PAGE_SOFT_DIRTY
>> >> >bit.
>> >> >
>> >> >One of the problems was finding a place in the pte entry where we can
>> >> >save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap. _PAGE_PSE
>> >> >was chosen for that; it doesn't intersect with the swap
>> >> >entry format stored in the pte.
>> >> >
>> >> >Reported-by: Andy Lutomirski <luto@amacapital.net>
>> >> >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>> >> >Cc: Pavel Emelyanov <xemul@parallels.com>
>> >> >Cc: Andrew Morton <akpm@linux-foundation.org>
>> >> >Cc: Matt Mackall <mpm@selenic.com>
>> >> >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> >> >Cc: Marcelo Tosatti <mtosatti@redhat.com>
>> >> >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>> >> >Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>> >> >Cc: Peter Zijlstra <peterz@infradead.org>
>> >> >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >> >---
>> >> > arch/x86/include/asm/pgtable.h | 15 +++++++++++++++
>> >> > arch/x86/include/asm/pgtable_types.h | 13 +++++++++++++
>> >> > fs/proc/task_mmu.c | 21 +++++++++++++++------
>> >> > include/asm-generic/pgtable.h | 15 +++++++++++++++
>> >> > include/linux/swapops.h | 2 ++
>> >> > mm/memory.c | 2 ++
>> >> > mm/rmap.c | 6 +++++-
>> >> > mm/swapfile.c | 19 +++++++++++++++++--
>> >> > 8 files changed, 84 insertions(+), 9 deletions(-)
>> >> >
>> >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
>> >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >> >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> >> > return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
>> >> > }
>> >> >
>> >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >> >+{
>> >> >+ return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >> >+}
>> >> >+
>> >> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+ return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
>> >> >+}
>> >> >+
>> >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+ return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >> >+}
>> >> >+
>> >> > /*
>> >> > * Mask out unsupported bits in a present pgprot. Non-present pgprots
>> >> > * can use those bits for other purposes, so leave them be.
>> >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
>> >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >> >@@ -67,6 +67,19 @@
>> >> > #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0))
>> >> > #endif
>> >> >
>> >> >+/*
>> >> >+ * Tracking the soft dirty bit when a page goes to swap is tricky.
>> >> >+ * We need a bit which can be stored in the pte _and_ not conflict
>> >> >+ * with the swap entry format. On x86 bits 6 and 7 are *not* involved
>> >> >+ * in swap entry computation, but bit 6 is used for nonlinear
>> >> >+ * file mapping, so we borrow bit 7 for soft dirty tracking.
>> >> >+ */
>> >> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >> >+#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE
>> >> >+#else
>> >> >+#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0))
>> >> >+#endif
>> >> >+
>> >> > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
>> >> > #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX)
>> >> > #else
>> >> >Index: linux-2.6.git/fs/proc/task_mmu.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/fs/proc/task_mmu.c
>> >> >+++ linux-2.6.git/fs/proc/task_mmu.c
>> >> >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
>> >> > * of how soft-dirty works.
>> >> > */
>> >> > pte_t ptent = *pte;
>> >> >- ptent = pte_wrprotect(ptent);
>> >> >- ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >> >+
>> >> >+ if (pte_present(ptent)) {
>> >> >+ ptent = pte_wrprotect(ptent);
>> >> >+ ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >> >+ } else if (is_swap_pte(ptent)) {
>> >> >+ ptent = pte_swp_clear_soft_dirty(ptent);
>> >> >+ }
>> >> >+
>> >> > set_pte_at(vma->vm_mm, addr, pte, ptent);
>> >> > #endif
>> >> > }
>> >> >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
>> >> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> >> > for (; addr != end; pte++, addr += PAGE_SIZE) {
>> >> > ptent = *pte;
>> >> >- if (!pte_present(ptent))
>> >> >- continue;
>> >> >
>> >> > if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
>> >> > clear_soft_dirty(vma, addr, pte);
>> >> > continue;
>> >> > }
>> >> >
>> >> >+ if (!pte_present(ptent))
>> >> >+ continue;
>> >> >+
>> >> > page = vm_normal_page(vma, addr, ptent);
>> >> > if (!page)
>> >> > continue;
>> >> >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
>> >> > flags = PM_PRESENT;
>> >> > page = vm_normal_page(vma, addr, pte);
>> >> > } else if (is_swap_pte(pte)) {
>> >> >- swp_entry_t entry = pte_to_swp_entry(pte);
>> >> >-
>> >> >+ swp_entry_t entry;
>> >> >+ if (pte_swp_soft_dirty(pte))
>> >> >+ flags2 |= __PM_SOFT_DIRTY;
>> >> >+ entry = pte_to_swp_entry(pte);
>> >> > frame = swp_type(entry) |
>> >> > (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
>> >> > flags = PM_SWAP;
>> >> >Index: linux-2.6.git/include/asm-generic/pgtable.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/include/asm-generic/pgtable.h
>> >> >+++ linux-2.6.git/include/asm-generic/pgtable.h
>> >> >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> >> > {
>> >> > return pmd;
>> >> > }
>> >> >+
>> >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >> >+{
>> >> >+ return pte;
>> >> >+}
>> >> >+
>> >> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+ return 0;
>> >> >+}
>> >> >+
>> >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+ return pte;
>> >> >+}
>> >> > #endif
>> >> >
>> >> > #ifndef __HAVE_PFNMAP_TRACKING
>> >> >Index: linux-2.6.git/include/linux/swapops.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/include/linux/swapops.h
>> >> >+++ linux-2.6.git/include/linux/swapops.h
>> >> >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
>> >> > swp_entry_t arch_entry;
>> >> >
>> >> > BUG_ON(pte_file(pte));
>> >> >+ if (pte_swp_soft_dirty(pte))
>> >> >+ pte = pte_swp_clear_soft_dirty(pte);
>> >> > arch_entry = __pte_to_swp_entry(pte);
>> >> > return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
>> >> > }
>> >> >Index: linux-2.6.git/mm/memory.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/mm/memory.c
>> >> >+++ linux-2.6.git/mm/memory.c
>> >> >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
>> >> > exclusive = 1;
>> >> > }
>> >> > flush_icache_page(vma, page);
>> >> >+ if (pte_swp_soft_dirty(orig_pte))
>> >> >+ pte = pte_mksoft_dirty(pte);
>> >>
>> >> entry = pte_to_swp_entry(orig_pte);
>> >> orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared.
>> >
>> >You seem to walk same way with me.
>> >Please look at my stupid questions in this thread.
>> >
>>
>> I see your discussion with Cyrill; however, pte_to_swp_entry and pte_swp_soft_dirty
>> both operate against orig_pte, so where do I miss? ;-)
>
>pte_to_swp_entry is passed orig_pte by value, not by pointer,
>so although pte_to_swp_entry clears out _PTE_SWP_SOFT_DIRTY, it does so in a local copy.
>So orig_pte is never changed.
Ouch! Thanks for pointing out. ;-)
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
>
>>
>> >>
>> >> > set_pte_at(mm, address, page_table, pte);
>> >> > if (page == swapcache)
>> >> > do_page_add_anon_rmap(page, vma, address, exclusive);
>> >> >Index: linux-2.6.git/mm/rmap.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/mm/rmap.c
>> >> >+++ linux-2.6.git/mm/rmap.c
>> >> >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
>> >> > swp_entry_to_pte(make_hwpoison_entry(page)));
>> >> > } else if (PageAnon(page)) {
>> >> > swp_entry_t entry = { .val = page_private(page) };
>> >> >+ pte_t swp_pte;
>> >> >
>> >> > if (PageSwapCache(page)) {
>> >> > /*
>> >> >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
>> >> > BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
>> >> > entry = make_migration_entry(page, pte_write(pteval));
>> >> > }
>> >> >- set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>> >> >+ swp_pte = swp_entry_to_pte(entry);
>> >> >+ if (pte_soft_dirty(pteval))
>> >> >+ swp_pte = pte_swp_mksoft_dirty(swp_pte);
>> >> >+ set_pte_at(mm, address, pte, swp_pte);
>> >> > BUG_ON(pte_file(*pte));
>> >> > } else if (IS_ENABLED(CONFIG_MIGRATION) &&
>> >> > (TTU_ACTION(flags) == TTU_MIGRATION)) {
>> >> >Index: linux-2.6.git/mm/swapfile.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/mm/swapfile.c
>> >> >+++ linux-2.6.git/mm/swapfile.c
>> >> >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
>> >> > }
>> >> > #endif /* CONFIG_HIBERNATION */
>> >> >
>> >> >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
>> >> >+{
>> >> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >> >+ /*
>> >> >+ * When the pte keeps the soft dirty bit, the pte generated
>> >> >+ * from the swap entry does not have it; still, it is the same
>> >> >+ * pte from a logical point of view.
>> >> >+ */
>> >> >+ pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
>> >> >+ return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
>> >> >+#else
>> >> >+ return pte_same(pte, swp_pte);
>> >> >+#endif
>> >> >+}
>> >> >+
>> >> > /*
>> >> > * No need to decide whether this PTE shares the swap entry with others,
>> >> > * just let do_wp_page work it out if a write is requested later - to
>> >> >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
>> >> > }
>> >> >
>> >> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> >> >- if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
>> >> >+ if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
>> >> > mem_cgroup_cancel_charge_swapin(memcg);
>> >> > ret = 0;
>> >> > goto out;
>> >> >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
>> >> > * swapoff spends a _lot_ of time in this loop!
>> >> > * Test inline before going to call unuse_pte.
>> >> > */
>> >> >- if (unlikely(pte_same(*pte, swp_pte))) {
>> >> >+ if (unlikely(maybe_same_pte(*pte, swp_pte))) {
>> >> > pte_unmap(pte);
>> >> > ret = unuse_pte(vma, pmd, addr, entry, page);
>> >> > if (ret)
>> >> >
>> >> >--
>> >> >To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >> >the body to majordomo@kvack.org. For more info on Linux MM,
>> >> >see: http://www.linux-mm.org/ .
>> >> >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >>
>> >
>> >--
>> >Kind regards,
>> >Minchan Kim
>>
>
>--
>Kind regards,
>Minchan Kim
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-08-05 2:54 ` Minchan Kim
2013-08-05 2:58 ` Wanpeng Li
@ 2013-08-05 2:58 ` Wanpeng Li
[not found] ` <51ff14e9.87ef440a.1424.ffffe470SMTPIN_ADDED_BROKEN@mx.google.com>
2 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-08-05 2:58 UTC (permalink / raw)
To: Minchan Kim
Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
aneesh.kumar
On Mon, Aug 05, 2013 at 11:54:37AM +0900, Minchan Kim wrote:
>On Mon, Aug 05, 2013 at 10:38:58AM +0800, Wanpeng Li wrote:
>> Hi Minchan,
>>
>> On Mon, Aug 05, 2013 at 11:17:15AM +0900, Minchan Kim wrote:
>> >Hello Wanpeng,
>> >
>> >On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote:
>> >> On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote:
>> >> >Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY
>> >> >bit set gets swapped out, the bit is lost and no longer
>> >> >available when the pte is read back.
>> >> >
>> >> >To resolve this we introduce the _PTE_SWP_SOFT_DIRTY bit, which is
>> >> >saved in the pte entry for the page being swapped out. When such a page
>> >> >is to be read back from the swap cache we check for the bit's presence,
>> >> >and if it is there we clear it and restore the former _PAGE_SOFT_DIRTY
>> >> >bit.
>> >> >
>> >> >One of the problems was to find a place in the pte entry where we can
>> >> >save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap. _PAGE_PSE
>> >> >was chosen for that; it does not intersect with the swap entry
>> >> >format stored in the pte.
>> >> >
>> >> >Reported-by: Andy Lutomirski <luto@amacapital.net>
>> >> >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>> >> >Cc: Pavel Emelyanov <xemul@parallels.com>
>> >> >Cc: Andrew Morton <akpm@linux-foundation.org>
>> >> >Cc: Matt Mackall <mpm@selenic.com>
>> >> >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> >> >Cc: Marcelo Tosatti <mtosatti@redhat.com>
>> >> >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>> >> >Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>> >> >Cc: Peter Zijlstra <peterz@infradead.org>
>> >> >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >> >---
>> >> > arch/x86/include/asm/pgtable.h | 15 +++++++++++++++
>> >> > arch/x86/include/asm/pgtable_types.h | 13 +++++++++++++
>> >> > fs/proc/task_mmu.c | 21 +++++++++++++++------
>> >> > include/asm-generic/pgtable.h | 15 +++++++++++++++
>> >> > include/linux/swapops.h | 2 ++
>> >> > mm/memory.c | 2 ++
>> >> > mm/rmap.c | 6 +++++-
>> >> > mm/swapfile.c | 19 +++++++++++++++++--
>> >> > 8 files changed, 84 insertions(+), 9 deletions(-)
>> >> >
>> >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h
>> >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h
>> >> >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> >> > return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
>> >> > }
>> >> >
>> >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >> >+{
>> >> >+ return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >> >+}
>> >> >+
>> >> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+ return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY;
>> >> >+}
>> >> >+
>> >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+ return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
>> >> >+}
>> >> >+
>> >> > /*
>> >> > * Mask out unsupported bits in a present pgprot. Non-present pgprots
>> >> > * can use those bits for other purposes, so leave them be.
>> >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
>> >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
>> >> >@@ -67,6 +67,19 @@
>> >> > #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0))
>> >> > #endif
>> >> >
>> >> >+/*
>> >> >+ * Tracking the soft dirty bit when a page goes to swap is tricky.
>> >> >+ * We need a bit which can be stored in the pte _and_ not conflict
>> >> >+ * with the swap entry format. On x86 bits 6 and 7 are *not* involved
>> >> >+ * in swap entry computation, but bit 6 is used for nonlinear
>> >> >+ * file mapping, so we borrow bit 7 for soft dirty tracking.
>> >> >+ */
>> >> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >> >+#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE
>> >> >+#else
>> >> >+#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0))
>> >> >+#endif
>> >> >+
>> >> > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
>> >> > #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX)
>> >> > #else
>> >> >Index: linux-2.6.git/fs/proc/task_mmu.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/fs/proc/task_mmu.c
>> >> >+++ linux-2.6.git/fs/proc/task_mmu.c
>> >> >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru
>> >> > * of how soft-dirty works.
>> >> > */
>> >> > pte_t ptent = *pte;
>> >> >- ptent = pte_wrprotect(ptent);
>> >> >- ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >> >+
>> >> >+ if (pte_present(ptent)) {
>> >> >+ ptent = pte_wrprotect(ptent);
>> >> >+ ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
>> >> >+ } else if (is_swap_pte(ptent)) {
>> >> >+ ptent = pte_swp_clear_soft_dirty(ptent);
>> >> >+ }
>> >> >+
>> >> > set_pte_at(vma->vm_mm, addr, pte, ptent);
>> >> > #endif
>> >> > }
>> >> >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p
>> >> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> >> > for (; addr != end; pte++, addr += PAGE_SIZE) {
>> >> > ptent = *pte;
>> >> >- if (!pte_present(ptent))
>> >> >- continue;
>> >> >
>> >> > if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
>> >> > clear_soft_dirty(vma, addr, pte);
>> >> > continue;
>> >> > }
>> >> >
>> >> >+ if (!pte_present(ptent))
>> >> >+ continue;
>> >> >+
>> >> > page = vm_normal_page(vma, addr, ptent);
>> >> > if (!page)
>> >> > continue;
>> >> >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap
>> >> > flags = PM_PRESENT;
>> >> > page = vm_normal_page(vma, addr, pte);
>> >> > } else if (is_swap_pte(pte)) {
>> >> >- swp_entry_t entry = pte_to_swp_entry(pte);
>> >> >-
>> >> >+ swp_entry_t entry;
>> >> >+ if (pte_swp_soft_dirty(pte))
>> >> >+ flags2 |= __PM_SOFT_DIRTY;
>> >> >+ entry = pte_to_swp_entry(pte);
>> >> > frame = swp_type(entry) |
>> >> > (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
>> >> > flags = PM_SWAP;
>> >> >Index: linux-2.6.git/include/asm-generic/pgtable.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/include/asm-generic/pgtable.h
>> >> >+++ linux-2.6.git/include/asm-generic/pgtable.h
>> >> >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd
>> >> > {
>> >> > return pmd;
>> >> > }
>> >> >+
>> >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
>> >> >+{
>> >> >+ return pte;
>> >> >+}
>> >> >+
>> >> >+static inline int pte_swp_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+ return 0;
>> >> >+}
>> >> >+
>> >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
>> >> >+{
>> >> >+ return pte;
>> >> >+}
>> >> > #endif
>> >> >
>> >> > #ifndef __HAVE_PFNMAP_TRACKING
>> >> >Index: linux-2.6.git/include/linux/swapops.h
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/include/linux/swapops.h
>> >> >+++ linux-2.6.git/include/linux/swapops.h
>> >> >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent
>> >> > swp_entry_t arch_entry;
>> >> >
>> >> > BUG_ON(pte_file(pte));
>> >> >+ if (pte_swp_soft_dirty(pte))
>> >> >+ pte = pte_swp_clear_soft_dirty(pte);
>> >> > arch_entry = __pte_to_swp_entry(pte);
>> >> > return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
>> >> > }
>> >> >Index: linux-2.6.git/mm/memory.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/mm/memory.c
>> >> >+++ linux-2.6.git/mm/memory.c
>> >> >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct
>> >> > exclusive = 1;
>> >> > }
>> >> > flush_icache_page(vma, page);
>> >> >+ if (pte_swp_soft_dirty(orig_pte))
>> >> >+ pte = pte_mksoft_dirty(pte);
>> >>
>> >> entry = pte_to_swp_entry(orig_pte);
>> >> orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared.
>> >
>> >You seem to walk same way with me.
>> >Please look at my stupid questions in this thread.
>> >
>>
>> I see your discussion with Cyrill; however, pte_to_swp_entry and pte_swp_soft_dirty
>> both operate against orig_pte, so where do I miss? ;-)
>
>pte_to_swp_entry is passed orig_pte by value, not by pointer,
>so although pte_to_swp_entry clears out _PTE_SWP_SOFT_DIRTY, it does so in a local copy.
>So orig_pte is never changed.
Ouch! Thanks for pointing out. ;-)
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
>
>>
>> >>
>> >> > set_pte_at(mm, address, page_table, pte);
>> >> > if (page == swapcache)
>> >> > do_page_add_anon_rmap(page, vma, address, exclusive);
>> >> >Index: linux-2.6.git/mm/rmap.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/mm/rmap.c
>> >> >+++ linux-2.6.git/mm/rmap.c
>> >> >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page,
>> >> > swp_entry_to_pte(make_hwpoison_entry(page)));
>> >> > } else if (PageAnon(page)) {
>> >> > swp_entry_t entry = { .val = page_private(page) };
>> >> >+ pte_t swp_pte;
>> >> >
>> >> > if (PageSwapCache(page)) {
>> >> > /*
>> >> >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page,
>> >> > BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION);
>> >> > entry = make_migration_entry(page, pte_write(pteval));
>> >> > }
>> >> >- set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>> >> >+ swp_pte = swp_entry_to_pte(entry);
>> >> >+ if (pte_soft_dirty(pteval))
>> >> >+ swp_pte = pte_swp_mksoft_dirty(swp_pte);
>> >> >+ set_pte_at(mm, address, pte, swp_pte);
>> >> > BUG_ON(pte_file(*pte));
>> >> > } else if (IS_ENABLED(CONFIG_MIGRATION) &&
>> >> > (TTU_ACTION(flags) == TTU_MIGRATION)) {
>> >> >Index: linux-2.6.git/mm/swapfile.c
>> >> >===================================================================
>> >> >--- linux-2.6.git.orig/mm/swapfile.c
>> >> >+++ linux-2.6.git/mm/swapfile.c
>> >> >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type,
>> >> > }
>> >> > #endif /* CONFIG_HIBERNATION */
>> >> >
>> >> >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte)
>> >> >+{
>> >> >+#ifdef CONFIG_MEM_SOFT_DIRTY
>> >> >+ /*
>> >> >+ * When the pte keeps the soft dirty bit, the pte generated
>> >> >+ * from the swap entry does not have it; still, it is the same
>> >> >+ * pte from a logical point of view.
>> >> >+ */
>> >> >+ pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte);
>> >> >+ return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty);
>> >> >+#else
>> >> >+ return pte_same(pte, swp_pte);
>> >> >+#endif
>> >> >+}
>> >> >+
>> >> > /*
>> >> > * No need to decide whether this PTE shares the swap entry with others,
>> >> > * just let do_wp_page work it out if a write is requested later - to
>> >> >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru
>> >> > }
>> >> >
>> >> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> >> >- if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
>> >> >+ if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
>> >> > mem_cgroup_cancel_charge_swapin(memcg);
>> >> > ret = 0;
>> >> > goto out;
>> >> >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are
>> >> > * swapoff spends a _lot_ of time in this loop!
>> >> > * Test inline before going to call unuse_pte.
>> >> > */
>> >> >- if (unlikely(pte_same(*pte, swp_pte))) {
>> >> >+ if (unlikely(maybe_same_pte(*pte, swp_pte))) {
>> >> > pte_unmap(pte);
>> >> > ret = unuse_pte(vma, pmd, addr, entry, page);
>> >> > if (ret)
>> >> >
>> >>
>> >
>> >--
>> >Kind regards,
>> >Minchan Kim
>>
>
>--
>Kind regards,
>Minchan Kim
^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <51ff14e9.87ef440a.1424.ffffe470SMTPIN_ADDED_BROKEN@mx.google.com>]
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
[not found] ` <51ff14e9.87ef440a.1424.ffffe470SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2013-08-05 5:43 ` Cyrill Gorcunov
0 siblings, 0 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-05 5:43 UTC (permalink / raw)
To: Wanpeng Li
Cc: Minchan Kim, linux-mm, linux-kernel, luto, xemul, akpm, mpm,
xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
aneesh.kumar
On Mon, Aug 05, 2013 at 10:58:35AM +0800, Wanpeng Li wrote:
> >
> >pte_to_swp_entry is passed orig_pte by value, not by pointer,
> >so although pte_to_swp_entry clears out _PTE_SWP_SOFT_DIRTY, it does so in a local copy.
> >So orig_pte is never changed.
>
> Ouch! Thanks for pointing out. ;-)
>
> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Yeah, it's a bit tricky. Thanks.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
` (4 preceding siblings ...)
[not found] ` <51ff047d.2768310a.2fc4.340fSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2013-08-07 20:21 ` Andrew Morton
2013-08-07 20:29 ` Cyrill Gorcunov
2013-08-10 17:48 ` James Bottomley
5 siblings, 2 replies; 34+ messages in thread
From: Andrew Morton @ 2013-08-07 20:21 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: linux-mm, linux-kernel, luto, gorcunov, xemul, mpm, xiaoguangrong,
mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar
On Wed, 31 Jul 2013 00:41:55 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY
> bit set gets swapped out, the bit is lost and no longer
> available when the pte is read back.
>
> To resolve this we introduce the _PTE_SWP_SOFT_DIRTY bit, which is
> saved in the pte entry for the page being swapped out. When such a page
> is to be read back from the swap cache we check for the bit's presence,
> and if it is there we clear it and restore the former _PAGE_SOFT_DIRTY
> bit.
>
> One of the problems was to find a place in the pte entry where we can
> save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap. _PAGE_PSE
> was chosen for that; it does not intersect with the swap entry
> format stored in the pte.
So the implication is that if another architecture wants to support
this (and, realistically, wants to support CRIU), that architecture
must find a spare pte bit to implement _PTE_SWP_SOFT_DIRTY. Yes?
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-08-07 20:21 ` Andrew Morton
@ 2013-08-07 20:29 ` Cyrill Gorcunov
2013-08-10 17:48 ` James Bottomley
1 sibling, 0 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2013-08-07 20:29 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-kernel, luto, xemul, mpm, xiaoguangrong, mtosatti,
kosaki.motohiro, sfr, peterz, aneesh.kumar
On Wed, Aug 07, 2013 at 01:21:56PM -0700, Andrew Morton wrote:
> >
> > One of the problems was to find a place in the pte entry where we can
> > save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap. _PAGE_PSE
> > was chosen for that; it does not intersect with the swap entry
> > format stored in the pte.
>
> So the implication is that if another architecture wants to support
> this (and, realistically, wants to support CRIU), that architecture
> must find a spare pte bit to implement _PTE_SWP_SOFT_DIRTY. Yes?
Exactly.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages
2013-08-07 20:21 ` Andrew Morton
2013-08-07 20:29 ` Cyrill Gorcunov
@ 2013-08-10 17:48 ` James Bottomley
1 sibling, 0 replies; 34+ messages in thread
From: James Bottomley @ 2013-08-10 17:48 UTC (permalink / raw)
To: Andrew Morton
Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul,
mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz,
aneesh.kumar
On Wed, 2013-08-07 at 13:21 -0700, Andrew Morton wrote:
> On Wed, 31 Jul 2013 00:41:55 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>
> > Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY
> > bit set gets swapped out, the bit is lost and no longer
> > available when the pte is read back.
> >
> > To resolve this we introduce the _PTE_SWP_SOFT_DIRTY bit, which is
> > saved in the pte entry for the page being swapped out. When such a page
> > is to be read back from the swap cache we check for the bit's presence,
> > and if it is there we clear it and restore the former _PAGE_SOFT_DIRTY
> > bit.
> >
> > One of the problems was to find a place in the pte entry where we can
> > save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap. _PAGE_PSE
> > was chosen for that; it does not intersect with the swap entry
> > format stored in the pte.
>
> So the implication is that if another architecture wants to support
> this (and, realistically, wants to support CRIU),
To be clear, CRIU is usable for basic checkpoint/restore without soft
dirty. It's using CRIU as an engine for process migration between nodes
that won't work efficiently without soft dirty. What happens without
soft dirty is that we have to freeze the source process state, transfer
the bits and then begin execution on the target ... that means the
process can be suspended for minutes (and means that customers notice
and your SLAs get blown). Using soft dirty, we can iteratively build up
the process image on the target while the source process is still
executing meaning the actual transfer between source and target takes
only seconds (when the delta is small enough, we freeze the source,
transfer the remaining changed bits and begin on the target).
James
^ permalink raw reply [flat|nested] 34+ messages in thread