* [patch 0/2] Soft-dirty page tracker improvemens @ 2013-07-30 20:41 Cyrill Gorcunov 2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov 2013-07-30 20:41 ` [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages Cyrill Gorcunov 0 siblings, 2 replies; 28+ messages in thread From: Cyrill Gorcunov @ 2013-07-30 20:41 UTC (permalink / raw) To: linux-mm Cc: linux-kernel, luto, gorcunov, xemul, akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar Hi, as being reported by Andy, there are a couple of situations when soft-dirty bit will be lost, in paricular when page we're tracking is going to swap and when file page get reclaimed. In this series both problems are aimed. One more hardness which remains is the scenario when vma area (which has soft-dirty bit set in appropriate pte entries) get unmapped then new one mapped in-place. I'm working on it now hope to provide a patch soon. Thanks, Cyrill ^ permalink raw reply [flat|nested] 28+ messages in thread
* [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages 2013-07-30 20:41 [patch 0/2] Soft-dirty page tracker improvemens Cyrill Gorcunov @ 2013-07-30 20:41 ` Cyrill Gorcunov 2013-07-31 8:16 ` Pavel Emelyanov ` (3 more replies) 2013-07-30 20:41 ` [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages Cyrill Gorcunov 1 sibling, 4 replies; 28+ messages in thread From: Cyrill Gorcunov @ 2013-07-30 20:41 UTC (permalink / raw) To: linux-mm Cc: linux-kernel, luto, gorcunov, xemul, akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar [-- Attachment #1: pte-sft-dirty-swap-4 --] [-- Type: text/plain, Size: 8886 bytes --] Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY bit set get swapped out, the bit is getting lost and no longer available when pte read back. To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is saved in pte entry for the page being swapped out. When such page is to be read back from a swap cache we check for bit presence and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY bit back. One of the problem was to find a place in pte entry where we can save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The _PAGE_PSE was chosen for that, it doesn't intersect with swap entry format stored in pte. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> --- arch/x86/include/asm/pgtable.h | 15 +++++++++++++++ arch/x86/include/asm/pgtable_types.h | 13 +++++++++++++ fs/proc/task_mmu.c | 21 +++++++++++++++------ include/asm-generic/pgtable.h | 15 +++++++++++++++ include/linux/swapops.h | 2 ++ mm/memory.c | 2 ++ mm/rmap.c | 6 +++++- mm/swapfile.c | 19 +++++++++++++++++-- 8 files changed, 84 insertions(+), 9 deletions(-) Index: linux-2.6.git/arch/x86/include/asm/pgtable.h =================================================================== --- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h +++ linux-2.6.git/arch/x86/include/asm/pgtable.h @@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY); } +static inline pte_t pte_swp_mksoft_dirty(pte_t pte) +{ + return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY); +} + +static inline int pte_swp_soft_dirty(pte_t pte) +{ + return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY; +} + +static inline pte_t pte_swp_clear_soft_dirty(pte_t pte) +{ + return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY); +} + /* * Mask out unsupported bits in a present pgprot. Non-present pgprots * can use those bits for other purposes, so leave them be. Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h =================================================================== --- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h +++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h @@ -67,6 +67,19 @@ #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0)) #endif +/* + * Tracking soft dirty bit when a page goes to a swap is tricky. + * We need a bit which can be stored in pte _and_ not conflict + * with swap entry format. On x86 bits 6 and 7 are *not* involved + * into swap entry computation, but bit 6 is used for nonlinear + * file mapping, so we borrow bit 7 for soft dirty tracking. + */ +#ifdef CONFIG_MEM_SOFT_DIRTY +#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE +#else +#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0)) +#endif + #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX) #else Index: linux-2.6.git/fs/proc/task_mmu.c =================================================================== --- linux-2.6.git.orig/fs/proc/task_mmu.c +++ linux-2.6.git/fs/proc/task_mmu.c @@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru * of how soft-dirty works. */ pte_t ptent = *pte; - ptent = pte_wrprotect(ptent); - ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY); + + if (pte_present(ptent)) { + ptent = pte_wrprotect(ptent); + ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY); + } else if (is_swap_pte(ptent)) { + ptent = pte_swp_clear_soft_dirty(ptent); + } + set_pte_at(vma->vm_mm, addr, pte, ptent); #endif } @@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); for (; addr != end; pte++, addr += PAGE_SIZE) { ptent = *pte; - if (!pte_present(ptent)) - continue; if (cp->type == CLEAR_REFS_SOFT_DIRTY) { clear_soft_dirty(vma, addr, pte); continue; } + if (!pte_present(ptent)) + continue; + page = vm_normal_page(vma, addr, ptent); if (!page) continue; @@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap flags = PM_PRESENT; page = vm_normal_page(vma, addr, pte); } else if (is_swap_pte(pte)) { - swp_entry_t entry = pte_to_swp_entry(pte); - + swp_entry_t entry; + if (pte_swp_soft_dirty(pte)) + flags2 |= __PM_SOFT_DIRTY; + entry = pte_to_swp_entry(pte); frame = swp_type(entry) | (swp_offset(entry) << MAX_SWAPFILES_SHIFT); flags = PM_SWAP; Index: linux-2.6.git/include/asm-generic/pgtable.h =================================================================== --- linux-2.6.git.orig/include/asm-generic/pgtable.h +++ linux-2.6.git/include/asm-generic/pgtable.h @@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd { return pmd; } + +static inline pte_t pte_swp_mksoft_dirty(pte_t pte) +{ + return pte; +} + +static inline int pte_swp_soft_dirty(pte_t pte) +{ + return 0; +} + +static inline pte_t pte_swp_clear_soft_dirty(pte_t pte) +{ + return pte; +} #endif #ifndef __HAVE_PFNMAP_TRACKING Index: linux-2.6.git/include/linux/swapops.h =================================================================== --- linux-2.6.git.orig/include/linux/swapops.h +++ linux-2.6.git/include/linux/swapops.h @@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent swp_entry_t arch_entry; BUG_ON(pte_file(pte)); + if (pte_swp_soft_dirty(pte)) + pte = pte_swp_clear_soft_dirty(pte); arch_entry = __pte_to_swp_entry(pte); return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry)); } Index: linux-2.6.git/mm/memory.c =================================================================== --- linux-2.6.git.orig/mm/memory.c +++ linux-2.6.git/mm/memory.c @@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct exclusive = 1; } flush_icache_page(vma, page); + if (pte_swp_soft_dirty(orig_pte)) + pte = pte_mksoft_dirty(pte); set_pte_at(mm, address, page_table, pte); if (page == swapcache) do_page_add_anon_rmap(page, vma, address, exclusive); Index: linux-2.6.git/mm/rmap.c =================================================================== --- linux-2.6.git.orig/mm/rmap.c +++ linux-2.6.git/mm/rmap.c @@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page, swp_entry_to_pte(make_hwpoison_entry(page))); } else if (PageAnon(page)) { swp_entry_t entry = { .val = page_private(page) }; + pte_t swp_pte; if (PageSwapCache(page)) { /* @@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page, BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION); entry = make_migration_entry(page, pte_write(pteval)); } - set_pte_at(mm, address, pte, swp_entry_to_pte(entry)); + swp_pte = swp_entry_to_pte(entry); + if (pte_soft_dirty(pteval)) + swp_pte = pte_swp_mksoft_dirty(swp_pte); + set_pte_at(mm, address, pte, swp_pte); BUG_ON(pte_file(*pte)); } else if (IS_ENABLED(CONFIG_MIGRATION) && (TTU_ACTION(flags) == TTU_MIGRATION)) { Index: linux-2.6.git/mm/swapfile.c =================================================================== --- linux-2.6.git.orig/mm/swapfile.c +++ linux-2.6.git/mm/swapfile.c @@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type, } #endif /* CONFIG_HIBERNATION */ +static inline int maybe_same_pte(pte_t pte, pte_t swp_pte) +{ +#ifdef CONFIG_MEM_SOFT_DIRTY + /* + * When pte keeps soft dirty bit the pte generated + * from swap entry does not has it, still it's same + * pte from logical point of view. + */ + pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte); + return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty); +#else + return pte_same(pte, swp_pte); +#endif +} + /* * No need to decide whether this PTE shares the swap entry with others, * just let do_wp_page work it out if a write is requested later - to @@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru } pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) { + if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) { mem_cgroup_cancel_charge_swapin(memcg); ret = 0; goto out; @@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are * swapoff spends a _lot_ of time in this loop! * Test inline before going to call unuse_pte. */ - if (unlikely(pte_same(*pte, swp_pte))) { + if (unlikely(maybe_same_pte(*pte, swp_pte))) { pte_unmap(pte); ret = unuse_pte(vma, pmd, addr, entry, page); if (ret) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages 2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov @ 2013-07-31 8:16 ` Pavel Emelyanov 2013-08-01 0:51 ` Minchan Kim ` (2 subsequent siblings) 3 siblings, 0 replies; 28+ messages in thread From: Pavel Emelyanov @ 2013-07-31 8:16 UTC (permalink / raw) To: Cyrill Gorcunov, akpm Cc: linux-mm, linux-kernel, luto, gorcunov, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On 07/31/2013 12:41 AM, Cyrill Gorcunov wrote: > Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY > bit set get swapped out, the bit is getting lost and no longer > available when pte read back. > > To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is > saved in pte entry for the page being swapped out. When such page > is to be read back from a swap cache we check for bit presence > and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY > bit back. > > One of the problem was to find a place in pte entry where we can > save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The > _PAGE_PSE was chosen for that, it doesn't intersect with swap > entry format stored in pte. > > Reported-by: Andy Lutomirski <luto@amacapital.net> > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages 2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov 2013-07-31 8:16 ` Pavel Emelyanov @ 2013-08-01 0:51 ` Minchan Kim 2013-08-01 5:53 ` Cyrill Gorcunov [not found] ` <51ff047d.2768310a.2fc4.340fSMTPIN_ADDED_BROKEN@mx.google.com> 2013-08-07 20:21 ` Andrew Morton 3 siblings, 1 reply; 28+ messages in thread From: Minchan Kim @ 2013-08-01 0:51 UTC (permalink / raw) To: Cyrill Gorcunov Cc: linux-mm, linux-kernel, luto, gorcunov, xemul, akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar Hello, On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote: > Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY > bit set get swapped out, the bit is getting lost and no longer > available when pte read back. > > To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is > saved in pte entry for the page being swapped out. When such page > is to be read back from a swap cache we check for bit presence > and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY > bit back. > > One of the problem was to find a place in pte entry where we can > save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The > _PAGE_PSE was chosen for that, it doesn't intersect with swap > entry format stored in pte. > > Reported-by: Andy Lutomirski <luto@amacapital.net> > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> > Cc: Pavel Emelyanov <xemul@parallels.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Matt Mackall <mpm@selenic.com> > Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> > Cc: Marcelo Tosatti <mtosatti@redhat.com> > Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> > Cc: Stephen Rothwell <sfr@canb.auug.org.au> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > --- > arch/x86/include/asm/pgtable.h | 15 +++++++++++++++ > arch/x86/include/asm/pgtable_types.h | 13 +++++++++++++ > fs/proc/task_mmu.c | 21 +++++++++++++++------ > include/asm-generic/pgtable.h | 15 +++++++++++++++ > include/linux/swapops.h | 2 ++ > mm/memory.c | 2 ++ > mm/rmap.c | 6 +++++- > mm/swapfile.c | 19 +++++++++++++++++-- > 8 files changed, 84 insertions(+), 9 deletions(-) > > Index: linux-2.6.git/arch/x86/include/asm/pgtable.h > =================================================================== > --- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h > +++ linux-2.6.git/arch/x86/include/asm/pgtable.h > @@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd > return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY); > } > > +static inline pte_t pte_swp_mksoft_dirty(pte_t pte) > +{ > + return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY); > +} > + > +static inline int pte_swp_soft_dirty(pte_t pte) > +{ > + return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY; > +} > + > +static inline pte_t pte_swp_clear_soft_dirty(pte_t pte) > +{ > + return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY); > +} > + > /* > * Mask out unsupported bits in a present pgprot. Non-present pgprots > * can use those bits for other purposes, so leave them be. > Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h > =================================================================== > --- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h > +++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h > @@ -67,6 +67,19 @@ > #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0)) > #endif > > +/* > + * Tracking soft dirty bit when a page goes to a swap is tricky. > + * We need a bit which can be stored in pte _and_ not conflict > + * with swap entry format. On x86 bits 6 and 7 are *not* involved > + * into swap entry computation, but bit 6 is used for nonlinear > + * file mapping, so we borrow bit 7 for soft dirty tracking. > + */ > +#ifdef CONFIG_MEM_SOFT_DIRTY > +#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE > +#else > +#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0)) > +#endif > + > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) > #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX) > #else > Index: linux-2.6.git/fs/proc/task_mmu.c > =================================================================== > --- linux-2.6.git.orig/fs/proc/task_mmu.c > +++ linux-2.6.git/fs/proc/task_mmu.c > @@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru > * of how soft-dirty works. > */ > pte_t ptent = *pte; > - ptent = pte_wrprotect(ptent); > - ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY); > + > + if (pte_present(ptent)) { > + ptent = pte_wrprotect(ptent); > + ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY); > + } else if (is_swap_pte(ptent)) { > + ptent = pte_swp_clear_soft_dirty(ptent); > + } > + > set_pte_at(vma->vm_mm, addr, pte, ptent); > #endif > } > @@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); > for (; addr != end; pte++, addr += PAGE_SIZE) { > ptent = *pte; > - if (!pte_present(ptent)) > - continue; > > if (cp->type == CLEAR_REFS_SOFT_DIRTY) { > clear_soft_dirty(vma, addr, pte); > continue; > } > > + if (!pte_present(ptent)) > + continue; > + > page = vm_normal_page(vma, addr, ptent); > if (!page) > continue; > @@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap > flags = PM_PRESENT; > page = vm_normal_page(vma, addr, pte); > } else if (is_swap_pte(pte)) { > - swp_entry_t entry = pte_to_swp_entry(pte); > - > + swp_entry_t entry; > + if (pte_swp_soft_dirty(pte)) > + flags2 |= __PM_SOFT_DIRTY; > + entry = pte_to_swp_entry(pte); > frame = swp_type(entry) | > (swp_offset(entry) << MAX_SWAPFILES_SHIFT); > flags = PM_SWAP; > Index: linux-2.6.git/include/asm-generic/pgtable.h > =================================================================== > --- linux-2.6.git.orig/include/asm-generic/pgtable.h > +++ linux-2.6.git/include/asm-generic/pgtable.h > @@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd > { > return pmd; > } > + > +static inline pte_t pte_swp_mksoft_dirty(pte_t pte) > +{ > + return pte; > +} > + > +static inline int pte_swp_soft_dirty(pte_t pte) > +{ > + return 0; > +} > + > +static inline pte_t pte_swp_clear_soft_dirty(pte_t pte) > +{ > + return pte; > +} > #endif > > #ifndef __HAVE_PFNMAP_TRACKING > Index: linux-2.6.git/include/linux/swapops.h > =================================================================== > --- linux-2.6.git.orig/include/linux/swapops.h > +++ linux-2.6.git/include/linux/swapops.h > @@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent > swp_entry_t arch_entry; > > BUG_ON(pte_file(pte)); > + if (pte_swp_soft_dirty(pte)) > + pte = pte_swp_clear_soft_dirty(pte); Why do you remove soft-dirty flag whenever pte_to_swp_entry is called? Isn't there any problem if we use mincore? > arch_entry = __pte_to_swp_entry(pte); > return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry)); > } > Index: linux-2.6.git/mm/memory.c > =================================================================== > --- linux-2.6.git.orig/mm/memory.c > +++ linux-2.6.git/mm/memory.c > @@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct > exclusive = 1; > } > flush_icache_page(vma, page); > + if (pte_swp_soft_dirty(orig_pte)) > + pte = pte_mksoft_dirty(pte); > set_pte_at(mm, address, page_table, pte); > if (page == swapcache) > do_page_add_anon_rmap(page, vma, address, exclusive); > Index: linux-2.6.git/mm/rmap.c > =================================================================== > --- linux-2.6.git.orig/mm/rmap.c > +++ linux-2.6.git/mm/rmap.c > @@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page, > swp_entry_to_pte(make_hwpoison_entry(page))); > } else if (PageAnon(page)) { > swp_entry_t entry = { .val = page_private(page) }; > + pte_t swp_pte; > > if (PageSwapCache(page)) { > /* > @@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page, > BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION); > entry = make_migration_entry(page, pte_write(pteval)); > } > - set_pte_at(mm, address, pte, swp_entry_to_pte(entry)); > + swp_pte = swp_entry_to_pte(entry); > + if (pte_soft_dirty(pteval)) > + swp_pte = pte_swp_mksoft_dirty(swp_pte); > + set_pte_at(mm, address, pte, swp_pte); > BUG_ON(pte_file(*pte)); > } else if (IS_ENABLED(CONFIG_MIGRATION) && > (TTU_ACTION(flags) == TTU_MIGRATION)) { > Index: linux-2.6.git/mm/swapfile.c > =================================================================== > --- linux-2.6.git.orig/mm/swapfile.c > +++ linux-2.6.git/mm/swapfile.c > @@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type, > } > #endif /* CONFIG_HIBERNATION */ > > +static inline int maybe_same_pte(pte_t pte, pte_t swp_pte) Nitpick. If maybe_same_pte is used widely, it looks good to me but it's used for only swapoff at the moment so I think pte_swap_same would be better name. > +{ > +#ifdef CONFIG_MEM_SOFT_DIRTY > + /* > + * When pte keeps soft dirty bit the pte generated > + * from swap entry does not has it, still it's same > + * pte from logical point of view. > + */ > + pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte); > + return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty); > +#else > + return pte_same(pte, swp_pte); > +#endif > +} > + > /* > * No need to decide whether this PTE shares the swap entry with others, > * just let do_wp_page work it out if a write is requested later - to > @@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru > } > > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); > - if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) { > + if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) { > mem_cgroup_cancel_charge_swapin(memcg); > ret = 0; > goto out; > @@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are > * swapoff spends a _lot_ of time in this loop! > * Test inline before going to call unuse_pte. > */ > - if (unlikely(pte_same(*pte, swp_pte))) { > + if (unlikely(maybe_same_pte(*pte, swp_pte))) { > pte_unmap(pte); > ret = unuse_pte(vma, pmd, addr, entry, page); > if (ret) > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages 2013-08-01 0:51 ` Minchan Kim @ 2013-08-01 5:53 ` Cyrill Gorcunov 2013-08-01 6:16 ` Minchan Kim 0 siblings, 1 reply; 28+ messages in thread From: Cyrill Gorcunov @ 2013-08-01 5:53 UTC (permalink / raw) To: Minchan Kim Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On Thu, Aug 01, 2013 at 09:51:32AM +0900, Minchan Kim wrote: > > Index: linux-2.6.git/include/linux/swapops.h > > =================================================================== > > --- linux-2.6.git.orig/include/linux/swapops.h > > +++ linux-2.6.git/include/linux/swapops.h > > @@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent > > swp_entry_t arch_entry; > > > > BUG_ON(pte_file(pte)); > > + if (pte_swp_soft_dirty(pte)) > > + pte = pte_swp_clear_soft_dirty(pte); > > Why do you remove soft-dirty flag whenever pte_to_swp_entry is called? > Isn't there any problem if we use mincore? No, there is no problem. pte_to_swp_entry caller when we know that pte we're decoding is having swap format (except the case in swap code which figures out the number of bits allowed for offset). Still since this bit is set on "higher" level than __swp_type/__swp_offset helpers it should be cleaned before the value from pte comes to "one level down" helpers function. > > +static inline int maybe_same_pte(pte_t pte, pte_t swp_pte) > > Nitpick. > If maybe_same_pte is used widely, it looks good to me > but it's used for only swapoff at the moment so I think pte_swap_same > would be better name. I don't see much difference, but sure, lets rename it on top once series in -mm tree, sounds good? Cyrill ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages 2013-08-01 5:53 ` Cyrill Gorcunov @ 2013-08-01 6:16 ` Minchan Kim 2013-08-01 6:28 ` Cyrill Gorcunov 0 siblings, 1 reply; 28+ messages in thread From: Minchan Kim @ 2013-08-01 6:16 UTC (permalink / raw) To: Cyrill Gorcunov Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On Thu, Aug 01, 2013 at 09:53:03AM +0400, Cyrill Gorcunov wrote: > On Thu, Aug 01, 2013 at 09:51:32AM +0900, Minchan Kim wrote: > > > Index: linux-2.6.git/include/linux/swapops.h > > > =================================================================== > > > --- linux-2.6.git.orig/include/linux/swapops.h > > > +++ linux-2.6.git/include/linux/swapops.h > > > @@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent > > > swp_entry_t arch_entry; > > > > > > BUG_ON(pte_file(pte)); > > > + if (pte_swp_soft_dirty(pte)) > > > + pte = pte_swp_clear_soft_dirty(pte); > > > > Why do you remove soft-dirty flag whenever pte_to_swp_entry is called? > > Isn't there any problem if we use mincore? > > No, there is no problem. pte_to_swp_entry caller when we know that pte > we're decoding is having swap format (except the case in swap code which > figures out the number of bits allowed for offset). Still since this bit > is set on "higher" level than __swp_type/__swp_offset helpers it should > be cleaned before the value from pte comes to "one level down" helpers > function. I don't get it. Could you correct me with below example? Process A context try_to_unmap swp_pte = swp_entry_to_pte /* change generic swp into arch swap */ swp_pte = pte_swp_mksoft_dirty(swp_pte); set_pte_at(, swp_pte); Process A context .. mincore_pte_range pte_to_swp_entry pte = pte_swp_clear_soft_dirty <=== 1) change arch swp with generic swp mincore_page Process B want to know dirty state of the page .. pagemap_read pte_to_pagemap_entry is_swap_pte if (pte_swap_soft_dirty(pte)) <=== but failed by 1) So, Process B can't get the dirty status from process A's the page. > > > > +static inline int maybe_same_pte(pte_t pte, pte_t swp_pte) > > > > Nitpick. > > If maybe_same_pte is used widely, it looks good to me > > but it's used for only swapoff at the moment so I think pte_swap_same > > would be better name. > > I don't see much difference, but sure, lets rename it on top once series > in -mm tree, sounds good? > > Cyrill > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages 2013-08-01 6:16 ` Minchan Kim @ 2013-08-01 6:28 ` Cyrill Gorcunov 2013-08-01 6:37 ` Minchan Kim 0 siblings, 1 reply; 28+ messages in thread From: Cyrill Gorcunov @ 2013-08-01 6:28 UTC (permalink / raw) To: Minchan Kim Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On Thu, Aug 01, 2013 at 03:16:32PM +0900, Minchan Kim wrote: > > I don't get it. Could you correct me with below example? > > Process A context > try_to_unmap > swp_pte = swp_entry_to_pte /* change generic swp into arch swap */ > swp_pte = pte_swp_mksoft_dirty(swp_pte); > set_pte_at(, swp_pte); > > Process A context > .. > mincore_pte_range pte_t pte = *ptep; <-- local copy of the pte value, in memory it remains the same with swap softdirty bit set > pte_to_swp_entry > pte = pte_swp_clear_soft_dirty <=== 1) > change arch swp with generic swp > mincore_page > > Process B want to know dirty state of the page > .. > pagemap_read > pte_to_pagemap_entry > is_swap_pte > if (pte_swap_soft_dirty(pte)) <=== but failed by 1) > > So, Process B can't get the dirty status from process A's the page. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages 2013-08-01 6:28 ` Cyrill Gorcunov @ 2013-08-01 6:37 ` Minchan Kim 2013-08-01 6:38 ` Cyrill Gorcunov 0 siblings, 1 reply; 28+ messages in thread From: Minchan Kim @ 2013-08-01 6:37 UTC (permalink / raw) To: Cyrill Gorcunov Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On Thu, Aug 01, 2013 at 10:28:14AM +0400, Cyrill Gorcunov wrote: > On Thu, Aug 01, 2013 at 03:16:32PM +0900, Minchan Kim wrote: > > > > I don't get it. Could you correct me with below example? > > > > Process A context > > try_to_unmap > > swp_pte = swp_entry_to_pte /* change generic swp into arch swap */ > > swp_pte = pte_swp_mksoft_dirty(swp_pte); > > set_pte_at(, swp_pte); > > > > Process A context > > .. > > mincore_pte_range > pte_t pte = *ptep; <-- local copy of the pte value, in memory it remains the same > with swap softdirty bit set Argh, I missed that. Thank you! Reviewed-by: Minchan Kim <minchan@kernel.org> -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages 2013-08-01 6:37 ` Minchan Kim @ 2013-08-01 6:38 ` Cyrill Gorcunov 0 siblings, 0 replies; 28+ messages in thread From: Cyrill Gorcunov @ 2013-08-01 6:38 UTC (permalink / raw) To: Minchan Kim Cc: linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On Thu, Aug 01, 2013 at 03:37:06PM +0900, Minchan Kim wrote: > > Reviewed-by: Minchan Kim <minchan@kernel.org> Thanks a lot for review, Minchan! ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <51ff047d.2768310a.2fc4.340fSMTPIN_ADDED_BROKEN@mx.google.com>]
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages [not found] ` <51ff047d.2768310a.2fc4.340fSMTPIN_ADDED_BROKEN@mx.google.com> @ 2013-08-05 2:17 ` Minchan Kim [not found] ` <51ff1053.ab47310a.5d3f.566cSMTPIN_ADDED_BROKEN@mx.google.com> 0 siblings, 1 reply; 28+ messages in thread From: Minchan Kim @ 2013-08-05 2:17 UTC (permalink / raw) To: Wanpeng Li Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul, akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar Hello Wanpeng, On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote: > On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote: > >Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY > >bit set get swapped out, the bit is getting lost and no longer > >available when pte read back. > > > >To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is > >saved in pte entry for the page being swapped out. When such page > >is to be read back from a swap cache we check for bit presence > >and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY > >bit back. > > > >One of the problem was to find a place in pte entry where we can > >save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The > >_PAGE_PSE was chosen for that, it doesn't intersect with swap > >entry format stored in pte. > > > >Reported-by: Andy Lutomirski <luto@amacapital.net> > >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> > >Cc: Pavel Emelyanov <xemul@parallels.com> > >Cc: Andrew Morton <akpm@linux-foundation.org> > >Cc: Matt Mackall <mpm@selenic.com> > >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> > >Cc: Marcelo Tosatti <mtosatti@redhat.com> > >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> > >Cc: Stephen Rothwell <sfr@canb.auug.org.au> > >Cc: Peter Zijlstra <peterz@infradead.org> > >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > >--- > > arch/x86/include/asm/pgtable.h | 15 +++++++++++++++ > > arch/x86/include/asm/pgtable_types.h | 13 +++++++++++++ > > fs/proc/task_mmu.c | 21 +++++++++++++++------ > > include/asm-generic/pgtable.h | 15 +++++++++++++++ > > include/linux/swapops.h | 2 ++ > > mm/memory.c | 2 ++ > > mm/rmap.c | 6 +++++- > > mm/swapfile.c | 19 +++++++++++++++++-- > > 8 files changed, 84 insertions(+), 9 deletions(-) > > > >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h > >=================================================================== > >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h > >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h > >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd > > return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY); > > } > > > >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte) > >+{ > >+ return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY); > >+} > >+ > >+static inline int pte_swp_soft_dirty(pte_t pte) > >+{ > >+ return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY; > >+} > >+ > >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte) > >+{ > >+ return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY); > >+} > >+ > > /* > > * Mask out unsupported bits in a present pgprot. Non-present pgprots > > * can use those bits for other purposes, so leave them be. > >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h > >=================================================================== > >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h > >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h > >@@ -67,6 +67,19 @@ > > #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0)) > > #endif > > > >+/* > >+ * Tracking soft dirty bit when a page goes to a swap is tricky. > >+ * We need a bit which can be stored in pte _and_ not conflict > >+ * with swap entry format. On x86 bits 6 and 7 are *not* involved > >+ * into swap entry computation, but bit 6 is used for nonlinear > >+ * file mapping, so we borrow bit 7 for soft dirty tracking. > >+ */ > >+#ifdef CONFIG_MEM_SOFT_DIRTY > >+#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE > >+#else > >+#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0)) > >+#endif > >+ > > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) > > #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX) > > #else > >Index: linux-2.6.git/fs/proc/task_mmu.c > >=================================================================== > >--- linux-2.6.git.orig/fs/proc/task_mmu.c > >+++ linux-2.6.git/fs/proc/task_mmu.c > >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru > > * of how soft-dirty works. > > */ > > pte_t ptent = *pte; > >- ptent = pte_wrprotect(ptent); > >- ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY); > >+ > >+ if (pte_present(ptent)) { > >+ ptent = pte_wrprotect(ptent); > >+ ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY); > >+ } else if (is_swap_pte(ptent)) { > >+ ptent = pte_swp_clear_soft_dirty(ptent); > >+ } > >+ > > set_pte_at(vma->vm_mm, addr, pte, ptent); > > #endif > > } > >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p > > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); > > for (; addr != end; pte++, addr += PAGE_SIZE) { > > ptent = *pte; > >- if (!pte_present(ptent)) > >- continue; > > > > if (cp->type == CLEAR_REFS_SOFT_DIRTY) { > > clear_soft_dirty(vma, addr, pte); > > continue; > > } > > > >+ if (!pte_present(ptent)) > >+ continue; > >+ > > page = vm_normal_page(vma, addr, ptent); > > if (!page) > > continue; > >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap > > flags = PM_PRESENT; > > page = vm_normal_page(vma, addr, pte); > > } else if (is_swap_pte(pte)) { > >- swp_entry_t entry = pte_to_swp_entry(pte); > >- > >+ swp_entry_t entry; > >+ if (pte_swp_soft_dirty(pte)) > >+ flags2 |= __PM_SOFT_DIRTY; > >+ entry = pte_to_swp_entry(pte); > > frame = swp_type(entry) | > > (swp_offset(entry) << MAX_SWAPFILES_SHIFT); > > flags = PM_SWAP; > >Index: linux-2.6.git/include/asm-generic/pgtable.h > >=================================================================== > >--- linux-2.6.git.orig/include/asm-generic/pgtable.h > >+++ linux-2.6.git/include/asm-generic/pgtable.h > >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd > > { > > return pmd; > > } > >+ > >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte) > >+{ > >+ return pte; > >+} > >+ > >+static inline int pte_swp_soft_dirty(pte_t pte) > >+{ > >+ return 0; > >+} > >+ > >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte) > >+{ > >+ return pte; > >+} > > #endif > > > > #ifndef __HAVE_PFNMAP_TRACKING > >Index: linux-2.6.git/include/linux/swapops.h > >=================================================================== > >--- linux-2.6.git.orig/include/linux/swapops.h > >+++ linux-2.6.git/include/linux/swapops.h > >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent > > swp_entry_t arch_entry; > > > > BUG_ON(pte_file(pte)); > >+ if (pte_swp_soft_dirty(pte)) > >+ pte = pte_swp_clear_soft_dirty(pte); > > arch_entry = __pte_to_swp_entry(pte); > > return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry)); > > } > >Index: linux-2.6.git/mm/memory.c > >=================================================================== > >--- linux-2.6.git.orig/mm/memory.c > >+++ linux-2.6.git/mm/memory.c > >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct > > exclusive = 1; > > } > > flush_icache_page(vma, page); > >+ if (pte_swp_soft_dirty(orig_pte)) > >+ pte = pte_mksoft_dirty(pte); > > entry = pte_to_swp_entry(orig_pte); > orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared. You seem to walk same way with me. Please look at my stupid questions in this thread. > > > set_pte_at(mm, address, page_table, pte); > > if (page == swapcache) > > do_page_add_anon_rmap(page, vma, address, exclusive); > >Index: linux-2.6.git/mm/rmap.c > >=================================================================== > >--- linux-2.6.git.orig/mm/rmap.c > >+++ linux-2.6.git/mm/rmap.c > >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page, > > swp_entry_to_pte(make_hwpoison_entry(page))); > > } else if (PageAnon(page)) { > > swp_entry_t entry = { .val = page_private(page) }; > >+ pte_t swp_pte; > > > > if (PageSwapCache(page)) { > > /* > >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page, > > BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION); > > entry = make_migration_entry(page, pte_write(pteval)); > > } > >- set_pte_at(mm, address, pte, swp_entry_to_pte(entry)); > >+ swp_pte = swp_entry_to_pte(entry); > >+ if (pte_soft_dirty(pteval)) > >+ swp_pte = pte_swp_mksoft_dirty(swp_pte); > >+ set_pte_at(mm, address, pte, swp_pte); > > BUG_ON(pte_file(*pte)); > > } else if (IS_ENABLED(CONFIG_MIGRATION) && > > (TTU_ACTION(flags) == TTU_MIGRATION)) { > >Index: linux-2.6.git/mm/swapfile.c > >=================================================================== > >--- linux-2.6.git.orig/mm/swapfile.c > >+++ linux-2.6.git/mm/swapfile.c > >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type, > > } > > #endif /* CONFIG_HIBERNATION */ > > > >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte) > >+{ > >+#ifdef CONFIG_MEM_SOFT_DIRTY > >+ /* > >+ * When pte keeps soft dirty bit the pte generated > >+ * from swap entry does not has it, still it's same > >+ * pte from logical point of view. > >+ */ > >+ pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte); > >+ return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty); > >+#else > >+ return pte_same(pte, swp_pte); > >+#endif > >+} > >+ > > /* > > * No need to decide whether this PTE shares the swap entry with others, > > * just let do_wp_page work it out if a write is requested later - to > >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru > > } > > > > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); > >- if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) { > >+ if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) { > > mem_cgroup_cancel_charge_swapin(memcg); > > ret = 0; > > goto out; > >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are > > * swapoff spends a _lot_ of time in this loop! > > * Test inline before going to call unuse_pte. > > */ > >- if (unlikely(pte_same(*pte, swp_pte))) { > >+ if (unlikely(maybe_same_pte(*pte, swp_pte))) { > > pte_unmap(pte); > > ret = unuse_pte(vma, pmd, addr, entry, page); > > if (ret) > > > >-- > >To unsubscribe, send a message with 'unsubscribe linux-mm' in > >the body to majordomo@kvack.org. For more info on Linux MM, > >see: http://www.linux-mm.org/ . > >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <51ff1053.ab47310a.5d3f.566cSMTPIN_ADDED_BROKEN@mx.google.com>]
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages [not found] ` <51ff1053.ab47310a.5d3f.566cSMTPIN_ADDED_BROKEN@mx.google.com> @ 2013-08-05 2:54 ` Minchan Kim [not found] ` <51ff14e9.87ef440a.1424.ffffe470SMTPIN_ADDED_BROKEN@mx.google.com> 0 siblings, 1 reply; 28+ messages in thread From: Minchan Kim @ 2013-08-05 2:54 UTC (permalink / raw) To: Wanpeng Li Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul, akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On Mon, Aug 05, 2013 at 10:38:58AM +0800, Wanpeng Li wrote: > Hi Minchan, > > On Mon, Aug 05, 2013 at 11:17:15AM +0900, Minchan Kim wrote: > >Hello Wanpeng, > > > >On Mon, Aug 05, 2013 at 09:48:29AM +0800, Wanpeng Li wrote: > >> On Wed, Jul 31, 2013 at 12:41:55AM +0400, Cyrill Gorcunov wrote: > >> >Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY > >> >bit set get swapped out, the bit is getting lost and no longer > >> >available when pte read back. > >> > > >> >To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is > >> >saved in pte entry for the page being swapped out. When such page > >> >is to be read back from a swap cache we check for bit presence > >> >and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY > >> >bit back. > >> > > >> >One of the problem was to find a place in pte entry where we can > >> >save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The > >> >_PAGE_PSE was chosen for that, it doesn't intersect with swap > >> >entry format stored in pte. > >> > > >> >Reported-by: Andy Lutomirski <luto@amacapital.net> > >> >Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> > >> >Cc: Pavel Emelyanov <xemul@parallels.com> > >> >Cc: Andrew Morton <akpm@linux-foundation.org> > >> >Cc: Matt Mackall <mpm@selenic.com> > >> >Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> > >> >Cc: Marcelo Tosatti <mtosatti@redhat.com> > >> >Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> > >> >Cc: Stephen Rothwell <sfr@canb.auug.org.au> > >> >Cc: Peter Zijlstra <peterz@infradead.org> > >> >Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > >> >--- > >> > arch/x86/include/asm/pgtable.h | 15 +++++++++++++++ > >> > arch/x86/include/asm/pgtable_types.h | 13 +++++++++++++ > >> > fs/proc/task_mmu.c | 21 +++++++++++++++------ > >> > include/asm-generic/pgtable.h | 15 +++++++++++++++ > >> > include/linux/swapops.h | 2 ++ > >> > mm/memory.c | 2 ++ > >> > mm/rmap.c | 6 +++++- > >> > mm/swapfile.c | 19 +++++++++++++++++-- > >> > 8 files changed, 84 insertions(+), 9 deletions(-) > >> > > >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable.h > >> >=================================================================== > >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h > >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable.h > >> >@@ -314,6 +314,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd > >> > return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY); > >> > } > >> > > >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte) > >> >+{ > >> >+ return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY); > >> >+} > >> >+ > >> >+static inline int pte_swp_soft_dirty(pte_t pte) > >> >+{ > >> >+ return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY; > >> >+} > >> >+ > >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte) > >> >+{ > >> >+ return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY); > >> >+} > >> >+ > >> > /* > >> > * Mask out unsupported bits in a present pgprot. Non-present pgprots > >> > * can use those bits for other purposes, so leave them be. > >> >Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h > >> >=================================================================== > >> >--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h > >> >+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h > >> >@@ -67,6 +67,19 @@ > >> > #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0)) > >> > #endif > >> > > >> >+/* > >> >+ * Tracking soft dirty bit when a page goes to a swap is tricky. > >> >+ * We need a bit which can be stored in pte _and_ not conflict > >> >+ * with swap entry format. On x86 bits 6 and 7 are *not* involved > >> >+ * into swap entry computation, but bit 6 is used for nonlinear > >> >+ * file mapping, so we borrow bit 7 for soft dirty tracking. > >> >+ */ > >> >+#ifdef CONFIG_MEM_SOFT_DIRTY > >> >+#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE > >> >+#else > >> >+#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0)) > >> >+#endif > >> >+ > >> > #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) > >> > #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX) > >> > #else > >> >Index: linux-2.6.git/fs/proc/task_mmu.c > >> >=================================================================== > >> >--- linux-2.6.git.orig/fs/proc/task_mmu.c > >> >+++ linux-2.6.git/fs/proc/task_mmu.c > >> >@@ -730,8 +730,14 @@ static inline void clear_soft_dirty(stru > >> > * of how soft-dirty works. > >> > */ > >> > pte_t ptent = *pte; > >> >- ptent = pte_wrprotect(ptent); > >> >- ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY); > >> >+ > >> >+ if (pte_present(ptent)) { > >> >+ ptent = pte_wrprotect(ptent); > >> >+ ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY); > >> >+ } else if (is_swap_pte(ptent)) { > >> >+ ptent = pte_swp_clear_soft_dirty(ptent); > >> >+ } > >> >+ > >> > set_pte_at(vma->vm_mm, addr, pte, ptent); > >> > #endif > >> > } > >> >@@ -752,14 +758,15 @@ static int clear_refs_pte_range(pmd_t *p > >> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); > >> > for (; addr != end; pte++, addr += PAGE_SIZE) { > >> > ptent = *pte; > >> >- if (!pte_present(ptent)) > >> >- continue; > >> > > >> > if (cp->type == CLEAR_REFS_SOFT_DIRTY) { > >> > clear_soft_dirty(vma, addr, pte); > >> > continue; > >> > } > >> > > >> >+ if (!pte_present(ptent)) > >> >+ continue; > >> >+ > >> > page = vm_normal_page(vma, addr, ptent); > >> > if (!page) > >> > continue; > >> >@@ -930,8 +937,10 @@ static void pte_to_pagemap_entry(pagemap > >> > flags = PM_PRESENT; > >> > page = vm_normal_page(vma, addr, pte); > >> > } else if (is_swap_pte(pte)) { > >> >- swp_entry_t entry = pte_to_swp_entry(pte); > >> >- > >> >+ swp_entry_t entry; > >> >+ if (pte_swp_soft_dirty(pte)) > >> >+ flags2 |= __PM_SOFT_DIRTY; > >> >+ entry = pte_to_swp_entry(pte); > >> > frame = swp_type(entry) | > >> > (swp_offset(entry) << MAX_SWAPFILES_SHIFT); > >> > flags = PM_SWAP; > >> >Index: linux-2.6.git/include/asm-generic/pgtable.h > >> >=================================================================== > >> >--- linux-2.6.git.orig/include/asm-generic/pgtable.h > >> >+++ linux-2.6.git/include/asm-generic/pgtable.h > >> >@@ -417,6 +417,21 @@ static inline pmd_t pmd_mksoft_dirty(pmd > >> > { > >> > return pmd; > >> > } > >> >+ > >> >+static inline pte_t pte_swp_mksoft_dirty(pte_t pte) > >> >+{ > >> >+ return pte; > >> >+} > >> >+ > >> >+static inline int pte_swp_soft_dirty(pte_t pte) > >> >+{ > >> >+ return 0; > >> >+} > >> >+ > >> >+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte) > >> >+{ > >> >+ return pte; > >> >+} > >> > #endif > >> > > >> > #ifndef __HAVE_PFNMAP_TRACKING > >> >Index: linux-2.6.git/include/linux/swapops.h > >> >=================================================================== > >> >--- linux-2.6.git.orig/include/linux/swapops.h > >> >+++ linux-2.6.git/include/linux/swapops.h > >> >@@ -67,6 +67,8 @@ static inline swp_entry_t pte_to_swp_ent > >> > swp_entry_t arch_entry; > >> > > >> > BUG_ON(pte_file(pte)); > >> >+ if (pte_swp_soft_dirty(pte)) > >> >+ pte = pte_swp_clear_soft_dirty(pte); > >> > arch_entry = __pte_to_swp_entry(pte); > >> > return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry)); > >> > } > >> >Index: linux-2.6.git/mm/memory.c > >> >=================================================================== > >> >--- linux-2.6.git.orig/mm/memory.c > >> >+++ linux-2.6.git/mm/memory.c > >> >@@ -3115,6 +3115,8 @@ static int do_swap_page(struct mm_struct > >> > exclusive = 1; > >> > } > >> > flush_icache_page(vma, page); > >> >+ if (pte_swp_soft_dirty(orig_pte)) > >> >+ pte = pte_mksoft_dirty(pte); > >> > >> entry = pte_to_swp_entry(orig_pte); > >> orig_pte's _PTE_SWP_SOFT_DIRTY bit has already been cleared. > > > >You seem to walk same way with me. > >Please look at my stupid questions in this thread. > > > > I see your discussion with Cyrill, however, pte_to_swp_entry and pte_swp_soft_dirty > both against orig_pte, where I miss? ;-) pte_to_swp_entry is passed orig_pte by vaule, not a pointer so although pte_to_swp_entry clear out _PTE_SWP_SOFT_DIRTY, it does it in local-copy. So orig_pte is never changed. > > >> > >> > set_pte_at(mm, address, page_table, pte); > >> > if (page == swapcache) > >> > do_page_add_anon_rmap(page, vma, address, exclusive); > >> >Index: linux-2.6.git/mm/rmap.c > >> >=================================================================== > >> >--- linux-2.6.git.orig/mm/rmap.c > >> >+++ linux-2.6.git/mm/rmap.c > >> >@@ -1236,6 +1236,7 @@ int try_to_unmap_one(struct page *page, > >> > swp_entry_to_pte(make_hwpoison_entry(page))); > >> > } else if (PageAnon(page)) { > >> > swp_entry_t entry = { .val = page_private(page) }; > >> >+ pte_t swp_pte; > >> > > >> > if (PageSwapCache(page)) { > >> > /* > >> >@@ -1264,7 +1265,10 @@ int try_to_unmap_one(struct page *page, > >> > BUG_ON(TTU_ACTION(flags) != TTU_MIGRATION); > >> > entry = make_migration_entry(page, pte_write(pteval)); > >> > } > >> >- set_pte_at(mm, address, pte, swp_entry_to_pte(entry)); > >> >+ swp_pte = swp_entry_to_pte(entry); > >> >+ if (pte_soft_dirty(pteval)) > >> >+ swp_pte = pte_swp_mksoft_dirty(swp_pte); > >> >+ set_pte_at(mm, address, pte, swp_pte); > >> > BUG_ON(pte_file(*pte)); > >> > } else if (IS_ENABLED(CONFIG_MIGRATION) && > >> > (TTU_ACTION(flags) == TTU_MIGRATION)) { > >> >Index: linux-2.6.git/mm/swapfile.c > >> >=================================================================== > >> >--- linux-2.6.git.orig/mm/swapfile.c > >> >+++ linux-2.6.git/mm/swapfile.c > >> >@@ -866,6 +866,21 @@ unsigned int count_swap_pages(int type, > >> > } > >> > #endif /* CONFIG_HIBERNATION */ > >> > > >> >+static inline int maybe_same_pte(pte_t pte, pte_t swp_pte) > >> >+{ > >> >+#ifdef CONFIG_MEM_SOFT_DIRTY > >> >+ /* > >> >+ * When pte keeps soft dirty bit the pte generated > >> >+ * from swap entry does not has it, still it's same > >> >+ * pte from logical point of view. > >> >+ */ > >> >+ pte_t swp_pte_dirty = pte_swp_mksoft_dirty(swp_pte); > >> >+ return pte_same(pte, swp_pte) || pte_same(pte, swp_pte_dirty); > >> >+#else > >> >+ return pte_same(pte, swp_pte); > >> >+#endif > >> >+} > >> >+ > >> > /* > >> > * No need to decide whether this PTE shares the swap entry with others, > >> > * just let do_wp_page work it out if a write is requested later - to > >> >@@ -892,7 +907,7 @@ static int unuse_pte(struct vm_area_stru > >> > } > >> > > >> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); > >> >- if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) { > >> >+ if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) { > >> > mem_cgroup_cancel_charge_swapin(memcg); > >> > ret = 0; > >> > goto out; > >> >@@ -947,7 +962,7 @@ static int unuse_pte_range(struct vm_are > >> > * swapoff spends a _lot_ of time in this loop! > >> > * Test inline before going to call unuse_pte. > >> > */ > >> >- if (unlikely(pte_same(*pte, swp_pte))) { > >> >+ if (unlikely(maybe_same_pte(*pte, swp_pte))) { > >> > pte_unmap(pte); > >> > ret = unuse_pte(vma, pmd, addr, entry, page); > >> > if (ret) > >> > > >> >-- > >> >To unsubscribe, send a message with 'unsubscribe linux-mm' in > >> >the body to majordomo@kvack.org. For more info on Linux MM, > >> >see: http://www.linux-mm.org/ . > >> >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > >> > >> -- > >> To unsubscribe, send a message with 'unsubscribe linux-mm' in > >> the body to majordomo@kvack.org. For more info on Linux MM, > >> see: http://www.linux-mm.org/ . > >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > > > >-- > >Kind regards, > >Minchan Kim > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <51ff14e9.87ef440a.1424.ffffe470SMTPIN_ADDED_BROKEN@mx.google.com>]
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages [not found] ` <51ff14e9.87ef440a.1424.ffffe470SMTPIN_ADDED_BROKEN@mx.google.com> @ 2013-08-05 5:43 ` Cyrill Gorcunov 0 siblings, 0 replies; 28+ messages in thread From: Cyrill Gorcunov @ 2013-08-05 5:43 UTC (permalink / raw) To: Wanpeng Li Cc: Minchan Kim, linux-mm, linux-kernel, luto, xemul, akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On Mon, Aug 05, 2013 at 10:58:35AM +0800, Wanpeng Li wrote: > > > >pte_to_swp_entry is passed orig_pte by vaule, not a pointer > >so although pte_to_swp_entry clear out _PTE_SWP_SOFT_DIRTY, it does it in local-copy. > >So orig_pte is never changed. > > Ouch! Thanks for pointing out. ;-) > > Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Yeah, it's a bit tricky. Thanks. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages 2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov ` (2 preceding siblings ...) [not found] ` <51ff047d.2768310a.2fc4.340fSMTPIN_ADDED_BROKEN@mx.google.com> @ 2013-08-07 20:21 ` Andrew Morton 2013-08-07 20:29 ` Cyrill Gorcunov 2013-08-10 17:48 ` James Bottomley 3 siblings, 2 replies; 28+ messages in thread From: Andrew Morton @ 2013-08-07 20:21 UTC (permalink / raw) To: Cyrill Gorcunov Cc: linux-mm, linux-kernel, luto, gorcunov, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On Wed, 31 Jul 2013 00:41:55 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote: > Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY > bit set get swapped out, the bit is getting lost and no longer > available when pte read back. > > To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is > saved in pte entry for the page being swapped out. When such page > is to be read back from a swap cache we check for bit presence > and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY > bit back. > > One of the problem was to find a place in pte entry where we can > save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The > _PAGE_PSE was chosen for that, it doesn't intersect with swap > entry format stored in pte. So the implication is that if another architecture wants to support this (and, realistically, wants to support CRIU), that architecture must find a spare pte bit to implement _PTE_SWP_SOFT_DIRTY. Yes? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages 2013-08-07 20:21 ` Andrew Morton @ 2013-08-07 20:29 ` Cyrill Gorcunov 2013-08-10 17:48 ` James Bottomley 1 sibling, 0 replies; 28+ messages in thread From: Cyrill Gorcunov @ 2013-08-07 20:29 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, luto, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On Wed, Aug 07, 2013 at 01:21:56PM -0700, Andrew Morton wrote: > > > > One of the problem was to find a place in pte entry where we can > > save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The > > _PAGE_PSE was chosen for that, it doesn't intersect with swap > > entry format stored in pte. > > So the implication is that if another architecture wants to support > this (and, realistically, wants to support CRIU), that architecture > must find a spare pte bit to implement _PTE_SWP_SOFT_DIRTY. Yes? Exactly. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages 2013-08-07 20:21 ` Andrew Morton 2013-08-07 20:29 ` Cyrill Gorcunov @ 2013-08-10 17:48 ` James Bottomley 1 sibling, 0 replies; 28+ messages in thread From: James Bottomley @ 2013-08-10 17:48 UTC (permalink / raw) To: Andrew Morton Cc: Cyrill Gorcunov, linux-mm, linux-kernel, luto, gorcunov, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On Wed, 2013-08-07 at 13:21 -0700, Andrew Morton wrote: > On Wed, 31 Jul 2013 00:41:55 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote: > > > Andy Lutomirski reported that in case if a page with _PAGE_SOFT_DIRTY > > bit set get swapped out, the bit is getting lost and no longer > > available when pte read back. > > > > To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is > > saved in pte entry for the page being swapped out. When such page > > is to be read back from a swap cache we check for bit presence > > and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY > > bit back. > > > > One of the problem was to find a place in pte entry where we can > > save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The > > _PAGE_PSE was chosen for that, it doesn't intersect with swap > > entry format stored in pte. > > So the implication is that if another architecture wants to support > this (and, realistically, wants to support CRIU), To be clear, CRIU is usable for basic checkpoint/restore without soft dirty. It's using CRIU as an engine for process migration between nodes that won't work efficiently without soft dirty. What happens without soft dirty is that we have to freeze the source process state, transfer the bits and then begin execution on the target ... that means the process can be suspended for minutes (and means that customers notice and your SLAs get blown). Using soft dirty, we can iteratively build up the process image on the target while the source process is still executing meaning the actual transfer between source and target takes only seconds (when the delta is small enough, we freeze the source, transfer the remaining changed bits and begin on the target). James ^ permalink raw reply [flat|nested] 28+ messages in thread
* [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages 2013-07-30 20:41 [patch 0/2] Soft-dirty page tracker improvemens Cyrill Gorcunov 2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov @ 2013-07-30 20:41 ` Cyrill Gorcunov 2013-07-31 8:16 ` Pavel Emelyanov 2013-08-07 20:28 ` Andrew Morton 1 sibling, 2 replies; 28+ messages in thread From: Cyrill Gorcunov @ 2013-07-30 20:41 UTC (permalink / raw) To: linux-mm Cc: linux-kernel, luto, gorcunov, xemul, akpm, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar [-- Attachment #1: pte-sft-dirty-file-2 --] [-- Type: text/plain, Size: 9713 bytes --] Andy reported that if file page get reclaimed we loose soft-dirty bit if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address get encoded into pte entry. Thus when #pf happens on such non-present pte we can restore it back. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> --- arch/x86/include/asm/pgtable-2level.h | 48 +++++++++++++++++++++++++++++++++- arch/x86/include/asm/pgtable-3level.h | 3 ++ arch/x86/include/asm/pgtable.h | 15 ++++++++++ arch/x86/include/asm/pgtable_types.h | 4 ++ fs/proc/task_mmu.c | 2 + include/asm-generic/pgtable.h | 15 ++++++++++ mm/fremap.c | 11 +++++-- mm/memory.c | 11 +++++-- mm/rmap.c | 8 ++++- 9 files changed, 107 insertions(+), 10 deletions(-) Index: linux-2.6.git/arch/x86/include/asm/pgtable-2level.h =================================================================== --- linux-2.6.git.orig/arch/x86/include/asm/pgtable-2level.h +++ linux-2.6.git/arch/x86/include/asm/pgtable-2level.h @@ -55,9 +55,53 @@ static inline pmd_t native_pmdp_get_and_ #define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp) #endif +#ifdef CONFIG_MEM_SOFT_DIRTY + +/* + * Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE, _PAGE_BIT_SOFT_DIRTY and + * _PAGE_BIT_PROTNONE are taken, split up the 28 bits of offset + * into this range. + */ +#define PTE_FILE_MAX_BITS 28 +#define PTE_FILE_SHIFT1 (_PAGE_BIT_PRESENT + 1) +#define PTE_FILE_SHIFT2 (_PAGE_BIT_FILE + 1) +#define PTE_FILE_SHIFT3 (_PAGE_BIT_PROTNONE + 1) +#define PTE_FILE_SHIFT4 (_PAGE_BIT_SOFT_DIRTY + 1) +#define PTE_FILE_BITS1 (PTE_FILE_SHIFT2 - PTE_FILE_SHIFT1 - 1) +#define PTE_FILE_BITS2 (PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1) +#define PTE_FILE_BITS3 (PTE_FILE_SHIFT4 - PTE_FILE_SHIFT3 - 1) + +#define pte_to_pgoff(pte) \ + ((((pte).pte_low >> (PTE_FILE_SHIFT1)) \ + & ((1U << PTE_FILE_BITS1) - 1))) \ + + ((((pte).pte_low >> (PTE_FILE_SHIFT2)) \ + & ((1U << PTE_FILE_BITS2) - 1)) \ + << (PTE_FILE_BITS1)) \ + + ((((pte).pte_low >> (PTE_FILE_SHIFT3)) \ + & ((1U << PTE_FILE_BITS3) - 1)) \ + << (PTE_FILE_BITS1 + PTE_FILE_BITS2)) \ + + ((((pte).pte_low >> (PTE_FILE_SHIFT4))) \ + << (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)) + +#define pgoff_to_pte(off) \ + ((pte_t) { .pte_low = \ + ((((off)) & ((1U << PTE_FILE_BITS1) - 1)) << PTE_FILE_SHIFT1) \ + + ((((off) >> PTE_FILE_BITS1) \ + & ((1U << PTE_FILE_BITS2) - 1)) \ + << PTE_FILE_SHIFT2) \ + + ((((off) >> (PTE_FILE_BITS1 + PTE_FILE_BITS2)) \ + & ((1U << PTE_FILE_BITS3) - 1)) \ + << PTE_FILE_SHIFT3) \ + + ((((off) >> \ + (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3))) \ + << PTE_FILE_SHIFT4) \ + + _PAGE_FILE }) + +#else /* CONFIG_MEM_SOFT_DIRTY */ + /* * Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE and _PAGE_BIT_PROTNONE are taken, - * split up the 29 bits of offset into this range: + * split up the 29 bits of offset into this range. */ #define PTE_FILE_MAX_BITS 29 #define PTE_FILE_SHIFT1 (_PAGE_BIT_PRESENT + 1) @@ -88,6 +132,8 @@ static inline pmd_t native_pmdp_get_and_ << PTE_FILE_SHIFT3) \ + _PAGE_FILE }) +#endif /* CONFIG_MEM_SOFT_DIRTY */ + /* Encode and de-code a swap entry */ #if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1) Index: linux-2.6.git/arch/x86/include/asm/pgtable-3level.h =================================================================== --- linux-2.6.git.orig/arch/x86/include/asm/pgtable-3level.h +++ linux-2.6.git/arch/x86/include/asm/pgtable-3level.h @@ -179,6 +179,9 @@ static inline pmd_t native_pmdp_get_and_ /* * Bits 0, 6 and 7 are taken in the low part of the pte, * put the 32 bits of offset into the high part. + * + * For soft-dirty tracking 11 bit is taken from + * the low part of pte as well. */ #define pte_to_pgoff(pte) ((pte).pte_high) #define pgoff_to_pte(off) \ Index: linux-2.6.git/arch/x86/include/asm/pgtable.h =================================================================== --- linux-2.6.git.orig/arch/x86/include/asm/pgtable.h +++ linux-2.6.git/arch/x86/include/asm/pgtable.h @@ -329,6 +329,21 @@ static inline pte_t pte_swp_clear_soft_d return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY); } +static inline pte_t pte_file_clear_soft_dirty(pte_t pte) +{ + return pte_clear_flags(pte, _PAGE_SOFT_DIRTY); +} + +static inline pte_t pte_file_mksoft_dirty(pte_t pte) +{ + return pte_set_flags(pte, _PAGE_SOFT_DIRTY); +} + +static inline int pte_file_soft_dirty(pte_t pte) +{ + return pte_flags(pte) & _PAGE_SOFT_DIRTY; +} + /* * Mask out unsupported bits in a present pgprot. Non-present pgprots * can use those bits for other purposes, so leave them be. Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h =================================================================== --- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h +++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h @@ -61,8 +61,10 @@ * they do not conflict with each other. */ +#define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_HIDDEN + #ifdef CONFIG_MEM_SOFT_DIRTY -#define _PAGE_SOFT_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_HIDDEN) +#define _PAGE_SOFT_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_SOFT_DIRTY) #else #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0)) #endif Index: linux-2.6.git/fs/proc/task_mmu.c =================================================================== --- linux-2.6.git.orig/fs/proc/task_mmu.c +++ linux-2.6.git/fs/proc/task_mmu.c @@ -736,6 +736,8 @@ static inline void clear_soft_dirty(stru ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY); } else if (is_swap_pte(ptent)) { ptent = pte_swp_clear_soft_dirty(ptent); + } else if (pte_file(ptent)) { + ptent = pte_file_clear_soft_dirty(ptent); } set_pte_at(vma->vm_mm, addr, pte, ptent); Index: linux-2.6.git/include/asm-generic/pgtable.h =================================================================== --- linux-2.6.git.orig/include/asm-generic/pgtable.h +++ linux-2.6.git/include/asm-generic/pgtable.h @@ -432,6 +432,21 @@ static inline pte_t pte_swp_clear_soft_d { return pte; } + +static inline pte_t pte_file_clear_soft_dirty(pte_t pte) +{ + return pte; +} + +static inline pte_t pte_file_mksoft_dirty(pte_t pte) +{ + return pte; +} + +static inline int pte_file_soft_dirty(pte_t pte) +{ + return 0; +} #endif #ifndef __HAVE_PFNMAP_TRACKING Index: linux-2.6.git/mm/fremap.c =================================================================== --- linux-2.6.git.orig/mm/fremap.c +++ linux-2.6.git/mm/fremap.c @@ -57,17 +57,22 @@ static int install_file_pte(struct mm_st unsigned long addr, unsigned long pgoff, pgprot_t prot) { int err = -ENOMEM; - pte_t *pte; + pte_t *pte, ptfile; spinlock_t *ptl; pte = get_locked_pte(mm, addr, &ptl); if (!pte) goto out; - if (!pte_none(*pte)) + ptfile = pgoff_to_pte(pgoff); + + if (!pte_none(*pte)) { + if (pte_present(*pte) && pte_soft_dirty(*pte)) + pte_file_mksoft_dirty(ptfile); zap_pte(mm, vma, addr, pte); + } - set_pte_at(mm, addr, pte, pgoff_to_pte(pgoff)); + set_pte_at(mm, addr, pte, ptfile); /* * We don't need to run update_mmu_cache() here because the "file pte" * being installed by install_file_pte() is not a real pte - it's a Index: linux-2.6.git/mm/memory.c =================================================================== --- linux-2.6.git.orig/mm/memory.c +++ linux-2.6.git/mm/memory.c @@ -1141,9 +1141,12 @@ again: continue; if (unlikely(details) && details->nonlinear_vma && linear_page_index(details->nonlinear_vma, - addr) != page->index) - set_pte_at(mm, addr, pte, - pgoff_to_pte(page->index)); + addr) != page->index) { + pte_t ptfile = pgoff_to_pte(page->index); + if (pte_soft_dirty(ptent)) + pte_file_mksoft_dirty(ptfile); + set_pte_at(mm, addr, pte, ptfile); + } if (PageAnon(page)) rss[MM_ANONPAGES]--; else { @@ -3410,6 +3413,8 @@ static int __do_fault(struct mm_struct * entry = mk_pte(page, vma->vm_page_prot); if (flags & FAULT_FLAG_WRITE) entry = maybe_mkwrite(pte_mkdirty(entry), vma); + else if (pte_file(orig_pte) && pte_file_soft_dirty(orig_pte)) + pte_mksoft_dirty(entry); if (anon) { inc_mm_counter_fast(mm, MM_ANONPAGES); page_add_new_anon_rmap(page, vma, address); Index: linux-2.6.git/mm/rmap.c =================================================================== --- linux-2.6.git.orig/mm/rmap.c +++ linux-2.6.git/mm/rmap.c @@ -1405,8 +1405,12 @@ static int try_to_unmap_cluster(unsigned pteval = ptep_clear_flush(vma, address, pte); /* If nonlinear, store the file page offset in the pte. */ - if (page->index != linear_page_index(vma, address)) - set_pte_at(mm, address, pte, pgoff_to_pte(page->index)); + if (page->index != linear_page_index(vma, address)) { + pte_t ptfile = pgoff_to_pte(page->index); + if (pte_soft_dirty(pteval)) + pte_file_mksoft_dirty(ptfile); + set_pte_at(mm, address, pte, ptfile); + } /* Move the dirty bit to the physical page now the pte is gone. */ if (pte_dirty(pteval)) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages 2013-07-30 20:41 ` [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages Cyrill Gorcunov @ 2013-07-31 8:16 ` Pavel Emelyanov 2013-08-07 20:28 ` Andrew Morton 1 sibling, 0 replies; 28+ messages in thread From: Pavel Emelyanov @ 2013-07-31 8:16 UTC (permalink / raw) To: Cyrill Gorcunov, akpm Cc: linux-mm, linux-kernel, luto, gorcunov, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On 07/31/2013 12:41 AM, Cyrill Gorcunov wrote: > Andy reported that if file page get reclaimed we loose soft-dirty bit > if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address > get encoded into pte entry. Thus when #pf happens on such non-present > pte we can restore it back. > > Reported-by: Andy Lutomirski <luto@amacapital.net> > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages 2013-07-30 20:41 ` [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages Cyrill Gorcunov 2013-07-31 8:16 ` Pavel Emelyanov @ 2013-08-07 20:28 ` Andrew Morton 2013-08-07 20:31 ` Cyrill Gorcunov 2013-08-08 14:51 ` Cyrill Gorcunov 1 sibling, 2 replies; 28+ messages in thread From: Andrew Morton @ 2013-08-07 20:28 UTC (permalink / raw) To: Cyrill Gorcunov Cc: linux-mm, linux-kernel, luto, gorcunov, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On Wed, 31 Jul 2013 00:41:56 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote: > +#define pte_to_pgoff(pte) \ > + ((((pte).pte_low >> (PTE_FILE_SHIFT1)) \ > + & ((1U << PTE_FILE_BITS1) - 1))) \ > + + ((((pte).pte_low >> (PTE_FILE_SHIFT2)) \ > + & ((1U << PTE_FILE_BITS2) - 1)) \ > + << (PTE_FILE_BITS1)) \ > + + ((((pte).pte_low >> (PTE_FILE_SHIFT3)) \ > + & ((1U << PTE_FILE_BITS3) - 1)) \ > + << (PTE_FILE_BITS1 + PTE_FILE_BITS2)) \ > + + ((((pte).pte_low >> (PTE_FILE_SHIFT4))) \ > + << (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)) > + > +#define pgoff_to_pte(off) \ > + ((pte_t) { .pte_low = \ > + ((((off)) & ((1U << PTE_FILE_BITS1) - 1)) << PTE_FILE_SHIFT1) \ > + + ((((off) >> PTE_FILE_BITS1) \ > + & ((1U << PTE_FILE_BITS2) - 1)) \ > + << PTE_FILE_SHIFT2) \ > + + ((((off) >> (PTE_FILE_BITS1 + PTE_FILE_BITS2)) \ > + & ((1U << PTE_FILE_BITS3) - 1)) \ > + << PTE_FILE_SHIFT3) \ > + + ((((off) >> \ > + (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3))) \ > + << PTE_FILE_SHIFT4) \ > + + _PAGE_FILE }) Good god. I wonder if these can be turned into out-of-line functions in some form which humans can understand. or #define pte_to_pgoff(pte) frob(pte, PTE_FILE_SHIFT1, PTE_FILE_BITS1) + frob(PTE_FILE_SHIFT2, PTE_FILE_BITS2) + frob(PTE_FILE_SHIFT3, PTE_FILE_BITS3) + frob(PTE_FILE_SHIFT4, PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages 2013-08-07 20:28 ` Andrew Morton @ 2013-08-07 20:31 ` Cyrill Gorcunov 2013-08-08 14:51 ` Cyrill Gorcunov 1 sibling, 0 replies; 28+ messages in thread From: Cyrill Gorcunov @ 2013-08-07 20:31 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, luto, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar On Wed, Aug 07, 2013 at 01:28:12PM -0700, Andrew Morton wrote: > > Good god. > > I wonder if these can be turned into out-of-line functions in some form > which humans can understand. > > or > > #define pte_to_pgoff(pte) > frob(pte, PTE_FILE_SHIFT1, PTE_FILE_BITS1) + > frob(PTE_FILE_SHIFT2, PTE_FILE_BITS2) + > frob(PTE_FILE_SHIFT3, PTE_FILE_BITS3) + > frob(PTE_FILE_SHIFT4, PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3) I copied this code from existing one, not mine invention ;) I'll clean it up on top. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages 2013-08-07 20:28 ` Andrew Morton 2013-08-07 20:31 ` Cyrill Gorcunov @ 2013-08-08 14:51 ` Cyrill Gorcunov 2013-08-12 21:57 ` Andrew Morton 1 sibling, 1 reply; 28+ messages in thread From: Cyrill Gorcunov @ 2013-08-08 14:51 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, luto, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar [-- Attachment #1: Type: text/plain, Size: 576 bytes --] On Wed, Aug 07, 2013 at 01:28:12PM -0700, Andrew Morton wrote: > > Good god. > > I wonder if these can be turned into out-of-line functions in some form > which humans can understand. > > or > > #define pte_to_pgoff(pte) > frob(pte, PTE_FILE_SHIFT1, PTE_FILE_BITS1) + > frob(PTE_FILE_SHIFT2, PTE_FILE_BITS2) + > frob(PTE_FILE_SHIFT3, PTE_FILE_BITS3) + > frob(PTE_FILE_SHIFT4, PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3) Hi, here is what I ended up with. Please take a look (I decided to post patch in the thread since it's related to the context of the mails). [-- Attachment #2: pte-sft-dirty-file-cleanup-2 --] [-- Type: text/plain, Size: 5559 bytes --] From: Cyrill Gorcunov <gorcunov@gmail.com> Subject: mm: Cleanup pte_to_pgoff and pgoff_to_pte helpers Andrew asked if there a way to make pte_to_pgoff and pgoff_to_pte macro helpers somehow more readable. With this patch it should be more understandable what is happening with bits when they come to and from pte entry. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> --- Guys, is there a reason for "if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE" test present in this pgtable-2level.h file at all? I can't imagine where it can be false on x86. arch/x86/include/asm/pgtable-2level.h | 82 +++++++++++++++++----------------- 1 file changed, 41 insertions(+), 41 deletions(-) Index: linux-2.6.git/arch/x86/include/asm/pgtable-2level.h =================================================================== --- linux-2.6.git.orig/arch/x86/include/asm/pgtable-2level.h +++ linux-2.6.git/arch/x86/include/asm/pgtable-2level.h @@ -55,6 +55,9 @@ static inline pmd_t native_pmdp_get_and_ #define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp) #endif +#define _mfrob(v,r,m,l) ((((v) >> (r)) & (m)) << (l)) +#define __frob(v,r,l) (((v) >> (r)) << (l)) + #ifdef CONFIG_MEM_SOFT_DIRTY /* @@ -71,31 +74,27 @@ static inline pmd_t native_pmdp_get_and_ #define PTE_FILE_BITS2 (PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1) #define PTE_FILE_BITS3 (PTE_FILE_SHIFT4 - PTE_FILE_SHIFT3 - 1) -#define pte_to_pgoff(pte) \ - ((((pte).pte_low >> (PTE_FILE_SHIFT1)) \ - & ((1U << PTE_FILE_BITS1) - 1))) \ - + ((((pte).pte_low >> (PTE_FILE_SHIFT2)) \ - & ((1U << PTE_FILE_BITS2) - 1)) \ - << (PTE_FILE_BITS1)) \ - + ((((pte).pte_low >> (PTE_FILE_SHIFT3)) \ - & ((1U << PTE_FILE_BITS3) - 1)) \ - << (PTE_FILE_BITS1 + PTE_FILE_BITS2)) \ - + ((((pte).pte_low >> (PTE_FILE_SHIFT4))) \ - << (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)) - -#define pgoff_to_pte(off) \ - ((pte_t) { .pte_low = \ - ((((off)) & ((1U << PTE_FILE_BITS1) - 1)) << PTE_FILE_SHIFT1) \ - + ((((off) >> PTE_FILE_BITS1) \ - & ((1U << PTE_FILE_BITS2) - 1)) \ - << PTE_FILE_SHIFT2) \ - + ((((off) >> (PTE_FILE_BITS1 + PTE_FILE_BITS2)) \ - & ((1U << PTE_FILE_BITS3) - 1)) \ - << PTE_FILE_SHIFT3) \ - + ((((off) >> \ - (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3))) \ - << PTE_FILE_SHIFT4) \ - + _PAGE_FILE }) +#define PTE_FILE_MASK1 ((1U << PTE_FILE_BITS1) - 1) +#define PTE_FILE_MASK2 ((1U << PTE_FILE_BITS2) - 1) +#define PTE_FILE_MASK3 ((1U << PTE_FILE_BITS3) - 1) + +#define PTE_FILE_LSHIFT2 (PTE_FILE_BITS1) +#define PTE_FILE_LSHIFT3 (PTE_FILE_BITS1 + PTE_FILE_BITS2) +#define PTE_FILE_LSHIFT4 (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3) + +#define pte_to_pgoff(pte) \ + (_mfrob((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0) + \ + _mfrob((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \ + _mfrob((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_MASK3, PTE_FILE_LSHIFT3) + \ + __frob((pte).pte_low, PTE_FILE_SHIFT4, PTE_FILE_LSHIFT4)) + +#define pgoff_to_pte(off) \ + ((pte_t) { .pte_low = \ + _mfrob(off, 0, PTE_FILE_MASK1, PTE_FILE_SHIFT1) + \ + _mfrob(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2) + \ + _mfrob(off, PTE_FILE_LSHIFT3, PTE_FILE_MASK3, PTE_FILE_SHIFT3) + \ + __frob(off, PTE_FILE_LSHIFT4, PTE_FILE_SHIFT4) + \ + _PAGE_FILE }) #else /* CONFIG_MEM_SOFT_DIRTY */ @@ -115,22 +114,23 @@ static inline pmd_t native_pmdp_get_and_ #define PTE_FILE_BITS1 (PTE_FILE_SHIFT2 - PTE_FILE_SHIFT1 - 1) #define PTE_FILE_BITS2 (PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1) -#define pte_to_pgoff(pte) \ - ((((pte).pte_low >> PTE_FILE_SHIFT1) \ - & ((1U << PTE_FILE_BITS1) - 1)) \ - + ((((pte).pte_low >> PTE_FILE_SHIFT2) \ - & ((1U << PTE_FILE_BITS2) - 1)) << PTE_FILE_BITS1) \ - + (((pte).pte_low >> PTE_FILE_SHIFT3) \ - << (PTE_FILE_BITS1 + PTE_FILE_BITS2))) - -#define pgoff_to_pte(off) \ - ((pte_t) { .pte_low = \ - (((off) & ((1U << PTE_FILE_BITS1) - 1)) << PTE_FILE_SHIFT1) \ - + ((((off) >> PTE_FILE_BITS1) & ((1U << PTE_FILE_BITS2) - 1)) \ - << PTE_FILE_SHIFT2) \ - + (((off) >> (PTE_FILE_BITS1 + PTE_FILE_BITS2)) \ - << PTE_FILE_SHIFT3) \ - + _PAGE_FILE }) +#define PTE_FILE_MASK1 ((1U << PTE_FILE_BITS1) - 1) +#define PTE_FILE_MASK2 ((1U << PTE_FILE_BITS2) - 1) + +#define PTE_FILE_LSHIFT2 (PTE_FILE_BITS1) +#define PTE_FILE_LSHIFT3 (PTE_FILE_BITS1 + PTE_FILE_BITS2) + +#define pte_to_pgoff(pte) \ + (_mfrob((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0) + \ + _mfrob((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \ + __frob((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_LSHIFT3)) + +#define pgoff_to_pte(off) \ + ((pte_t) { .pte_low = \ + _mfrob(off, 0, PTE_FILE_MASK1, PTE_FILE_SHIFT1) + \ + _mfrob(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2) + \ + __frob(off, PTE_FILE_LSHIFT3, PTE_FILE_SHIFT3) + \ + _PAGE_FILE }) #endif /* CONFIG_MEM_SOFT_DIRTY */ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages 2013-08-08 14:51 ` Cyrill Gorcunov @ 2013-08-12 21:57 ` Andrew Morton 2013-08-12 22:28 ` Andy Lutomirski 0 siblings, 1 reply; 28+ messages in thread From: Andrew Morton @ 2013-08-12 21:57 UTC (permalink / raw) To: Cyrill Gorcunov Cc: linux-mm, linux-kernel, luto, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar, Ingo Molnar, H. Peter Anvin, Thomas Gleixner On Thu, 8 Aug 2013 18:51:20 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote: > On Wed, Aug 07, 2013 at 01:28:12PM -0700, Andrew Morton wrote: > > > > Good god. > > > > I wonder if these can be turned into out-of-line functions in some form > > which humans can understand. > > > > or > > > > #define pte_to_pgoff(pte) > > frob(pte, PTE_FILE_SHIFT1, PTE_FILE_BITS1) + > > frob(PTE_FILE_SHIFT2, PTE_FILE_BITS2) + > > frob(PTE_FILE_SHIFT3, PTE_FILE_BITS3) + > > frob(PTE_FILE_SHIFT4, PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3) > > Hi, here is what I ended up with. Please take a look (I decided to post > patch in the thread since it's related to the context of the mails). You could have #undefed _mfrob and __frob after using them, but whatever. I saved this patch to wave at the x86 guys for 3.12. I plan to merge mm-save-soft-dirty-bits-on-file-pages.patch for 3.11. > Guys, is there a reason for "if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE" > test present in this pgtable-2level.h file at all? I can't imagine > where it can be false on x86. I doubt if "Guys" read this. x86 maintainers cc'ed. From: Cyrill Gorcunov <gorcunov@gmail.com> Subject: arch/x86/include/asm/pgtable-2level.h: clean up pte_to_pgoff and pgoff_to_pte helpers Andrew asked if there a way to make pte_to_pgoff and pgoff_to_pte macro helpers somehow more readable. With this patch it should be more understandable what is happening with bits when they come to and from pte entry. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/x86/include/asm/pgtable-2level.h | 82 ++++++++++++------------ 1 file changed, 41 insertions(+), 41 deletions(-) diff -puN arch/x86/include/asm/pgtable-2level.h~arch-x86-include-asm-pgtable-2levelh-clean-up-pte_to_pgoff-and-pgoff_to_pte-helpers arch/x86/include/asm/pgtable-2level.h --- a/arch/x86/include/asm/pgtable-2level.h~arch-x86-include-asm-pgtable-2levelh-clean-up-pte_to_pgoff-and-pgoff_to_pte-helpers +++ a/arch/x86/include/asm/pgtable-2level.h @@ -55,6 +55,9 @@ static inline pmd_t native_pmdp_get_and_ #define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp) #endif +#define _mfrob(v,r,m,l) ((((v) >> (r)) & (m)) << (l)) +#define __frob(v,r,l) (((v) >> (r)) << (l)) + #ifdef CONFIG_MEM_SOFT_DIRTY /* @@ -71,31 +74,27 @@ static inline pmd_t native_pmdp_get_and_ #define PTE_FILE_BITS2 (PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1) #define PTE_FILE_BITS3 (PTE_FILE_SHIFT4 - PTE_FILE_SHIFT3 - 1) -#define pte_to_pgoff(pte) \ - ((((pte).pte_low >> (PTE_FILE_SHIFT1)) \ - & ((1U << PTE_FILE_BITS1) - 1))) \ - + ((((pte).pte_low >> (PTE_FILE_SHIFT2)) \ - & ((1U << PTE_FILE_BITS2) - 1)) \ - << (PTE_FILE_BITS1)) \ - + ((((pte).pte_low >> (PTE_FILE_SHIFT3)) \ - & ((1U << PTE_FILE_BITS3) - 1)) \ - << (PTE_FILE_BITS1 + PTE_FILE_BITS2)) \ - + ((((pte).pte_low >> (PTE_FILE_SHIFT4))) \ - << (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)) - -#define pgoff_to_pte(off) \ - ((pte_t) { .pte_low = \ - ((((off)) & ((1U << PTE_FILE_BITS1) - 1)) << PTE_FILE_SHIFT1) \ - + ((((off) >> PTE_FILE_BITS1) \ - & ((1U << PTE_FILE_BITS2) - 1)) \ - << PTE_FILE_SHIFT2) \ - + ((((off) >> (PTE_FILE_BITS1 + PTE_FILE_BITS2)) \ - & ((1U << PTE_FILE_BITS3) - 1)) \ - << PTE_FILE_SHIFT3) \ - + ((((off) >> \ - (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3))) \ - << PTE_FILE_SHIFT4) \ - + _PAGE_FILE }) +#define PTE_FILE_MASK1 ((1U << PTE_FILE_BITS1) - 1) +#define PTE_FILE_MASK2 ((1U << PTE_FILE_BITS2) - 1) +#define PTE_FILE_MASK3 ((1U << PTE_FILE_BITS3) - 1) + +#define PTE_FILE_LSHIFT2 (PTE_FILE_BITS1) +#define PTE_FILE_LSHIFT3 (PTE_FILE_BITS1 + PTE_FILE_BITS2) +#define PTE_FILE_LSHIFT4 (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3) + +#define pte_to_pgoff(pte) \ + (_mfrob((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0) + \ + _mfrob((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \ + _mfrob((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_MASK3, PTE_FILE_LSHIFT3) + \ + __frob((pte).pte_low, PTE_FILE_SHIFT4, PTE_FILE_LSHIFT4)) + +#define pgoff_to_pte(off) \ + ((pte_t) { .pte_low = \ + _mfrob(off, 0, PTE_FILE_MASK1, PTE_FILE_SHIFT1) + \ + _mfrob(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2) + \ + _mfrob(off, PTE_FILE_LSHIFT3, PTE_FILE_MASK3, PTE_FILE_SHIFT3) + \ + __frob(off, PTE_FILE_LSHIFT4, PTE_FILE_SHIFT4) + \ + _PAGE_FILE }) #else /* CONFIG_MEM_SOFT_DIRTY */ @@ -115,22 +114,23 @@ static inline pmd_t native_pmdp_get_and_ #define PTE_FILE_BITS1 (PTE_FILE_SHIFT2 - PTE_FILE_SHIFT1 - 1) #define PTE_FILE_BITS2 (PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1) -#define pte_to_pgoff(pte) \ - ((((pte).pte_low >> PTE_FILE_SHIFT1) \ - & ((1U << PTE_FILE_BITS1) - 1)) \ - + ((((pte).pte_low >> PTE_FILE_SHIFT2) \ - & ((1U << PTE_FILE_BITS2) - 1)) << PTE_FILE_BITS1) \ - + (((pte).pte_low >> PTE_FILE_SHIFT3) \ - << (PTE_FILE_BITS1 + PTE_FILE_BITS2))) - -#define pgoff_to_pte(off) \ - ((pte_t) { .pte_low = \ - (((off) & ((1U << PTE_FILE_BITS1) - 1)) << PTE_FILE_SHIFT1) \ - + ((((off) >> PTE_FILE_BITS1) & ((1U << PTE_FILE_BITS2) - 1)) \ - << PTE_FILE_SHIFT2) \ - + (((off) >> (PTE_FILE_BITS1 + PTE_FILE_BITS2)) \ - << PTE_FILE_SHIFT3) \ - + _PAGE_FILE }) +#define PTE_FILE_MASK1 ((1U << PTE_FILE_BITS1) - 1) +#define PTE_FILE_MASK2 ((1U << PTE_FILE_BITS2) - 1) + +#define PTE_FILE_LSHIFT2 (PTE_FILE_BITS1) +#define PTE_FILE_LSHIFT3 (PTE_FILE_BITS1 + PTE_FILE_BITS2) + +#define pte_to_pgoff(pte) \ + (_mfrob((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0) + \ + _mfrob((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \ + __frob((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_LSHIFT3)) + +#define pgoff_to_pte(off) \ + ((pte_t) { .pte_low = \ + _mfrob(off, 0, PTE_FILE_MASK1, PTE_FILE_SHIFT1) + \ + _mfrob(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2) + \ + __frob(off, PTE_FILE_LSHIFT3, PTE_FILE_SHIFT3) + \ + _PAGE_FILE }) #endif /* CONFIG_MEM_SOFT_DIRTY */ _ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages 2013-08-12 21:57 ` Andrew Morton @ 2013-08-12 22:28 ` Andy Lutomirski 2013-08-12 22:37 ` Andrew Morton 2013-08-13 5:02 ` Cyrill Gorcunov 0 siblings, 2 replies; 28+ messages in thread From: Andy Lutomirski @ 2013-08-12 22:28 UTC (permalink / raw) To: Andrew Morton Cc: Cyrill Gorcunov, linux-mm, linux-kernel, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar, Ingo Molnar, H. Peter Anvin, Thomas Gleixner On Mon, Aug 12, 2013 at 2:57 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Thu, 8 Aug 2013 18:51:20 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote: > >> On Wed, Aug 07, 2013 at 01:28:12PM -0700, Andrew Morton wrote: >> > >> > Good god. >> > >> > I wonder if these can be turned into out-of-line functions in some form >> > which humans can understand. >> > >> > or >> > >> > #define pte_to_pgoff(pte) >> > frob(pte, PTE_FILE_SHIFT1, PTE_FILE_BITS1) + >> > frob(PTE_FILE_SHIFT2, PTE_FILE_BITS2) + >> > frob(PTE_FILE_SHIFT3, PTE_FILE_BITS3) + >> > frob(PTE_FILE_SHIFT4, PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3) >> >> Hi, here is what I ended up with. Please take a look (I decided to post >> patch in the thread since it's related to the context of the mails). > > You could have #undefed _mfrob and __frob after using them, but whatever. > > I saved this patch to wave at the x86 guys for 3.12. I plan to merge > mm-save-soft-dirty-bits-on-file-pages.patch for 3.11. > >> Guys, is there a reason for "if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE" >> test present in this pgtable-2level.h file at all? I can't imagine >> where it can be false on x86. > > I doubt if "Guys" read this. x86 maintainers cc'ed. > > > > > > From: Cyrill Gorcunov <gorcunov@gmail.com> > Subject: arch/x86/include/asm/pgtable-2level.h: clean up pte_to_pgoff and pgoff_to_pte helpers > > Andrew asked if there a way to make pte_to_pgoff and pgoff_to_pte macro > helpers somehow more readable. > > With this patch it should be more understandable what is happening with > bits when they come to and from pte entry. > > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> > Cc: Ingo Molnar <mingo@elte.hu> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- > > arch/x86/include/asm/pgtable-2level.h | 82 ++++++++++++------------ > 1 file changed, 41 insertions(+), 41 deletions(-) > > diff -puN arch/x86/include/asm/pgtable-2level.h~arch-x86-include-asm-pgtable-2levelh-clean-up-pte_to_pgoff-and-pgoff_to_pte-helpers arch/x86/include/asm/pgtable-2level.h > --- a/arch/x86/include/asm/pgtable-2level.h~arch-x86-include-asm-pgtable-2levelh-clean-up-pte_to_pgoff-and-pgoff_to_pte-helpers > +++ a/arch/x86/include/asm/pgtable-2level.h > @@ -55,6 +55,9 @@ static inline pmd_t native_pmdp_get_and_ > #define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp) > #endif > > +#define _mfrob(v,r,m,l) ((((v) >> (r)) & (m)) << (l)) > +#define __frob(v,r,l) (((v) >> (r)) << (l)) > + > #ifdef CONFIG_MEM_SOFT_DIRTY > If I'm understanding this right, the idea is to take the bits in the range a..b of v and stick them at c..d, where a-b == c-d. Would it make sense to change this to look something like #define __frob(v, inmsb, inlsb, outlsb) ((v >> inlsb) & ((1<<(inmsb - inlsb + 1)-1) << outlsb) For extra fun, there could be an __unfrob macro that takes the same inmsg, inlsb, outlsb parameters but undoes it so that it's (more) clear that the operations that are supposed to be inverses are indeed inverses. --Andy ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages 2013-08-12 22:28 ` Andy Lutomirski @ 2013-08-12 22:37 ` Andrew Morton 2013-08-13 5:02 ` Cyrill Gorcunov 1 sibling, 0 replies; 28+ messages in thread From: Andrew Morton @ 2013-08-12 22:37 UTC (permalink / raw) To: Andy Lutomirski Cc: Cyrill Gorcunov, linux-mm, linux-kernel, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar, Ingo Molnar, H. Peter Anvin, Thomas Gleixner On Mon, 12 Aug 2013 15:28:06 -0700 Andy Lutomirski <luto@amacapital.net> wrote: > > +#define _mfrob(v,r,m,l) ((((v) >> (r)) & (m)) << (l)) > > +#define __frob(v,r,l) (((v) >> (r)) << (l)) > > + > > #ifdef CONFIG_MEM_SOFT_DIRTY > > > > If I'm understanding this right, the idea is to take the bits in the > range a..b of v and stick them at c..d, where a-b == c-d. Would it > make sense to change this to look something like > > #define __frob(v, inmsb, inlsb, outlsb) ((v >> inlsb) & ((1<<(inmsb - > inlsb + 1)-1) << outlsb) > > For extra fun, there could be an __unfrob macro that takes the same > inmsg, inlsb, outlsb parameters but undoes it so that it's (more) > clear that the operations that are supposed to be inverses are indeed > inverses. hm, I seem to remember writing drivers/net/ethernet/3com/3c59x.c:BFINS() and BFEXT() shortly after the invention of the electronic computer. I'm kinda surprised that we don't already have something like this in kernel.h or somewhere - there's surely a ton of code which does such things. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages 2013-08-12 22:28 ` Andy Lutomirski 2013-08-12 22:37 ` Andrew Morton @ 2013-08-13 5:02 ` Cyrill Gorcunov 2013-08-13 15:14 ` H. Peter Anvin 1 sibling, 1 reply; 28+ messages in thread From: Cyrill Gorcunov @ 2013-08-13 5:02 UTC (permalink / raw) To: Andy Lutomirski Cc: Andrew Morton, linux-mm, linux-kernel, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar, Ingo Molnar, H. Peter Anvin, Thomas Gleixner On Mon, Aug 12, 2013 at 03:28:06PM -0700, Andy Lutomirski wrote: > > > > You could have #undefed _mfrob and __frob after using them, but whatever. Sure, for some reason I forgot to do that. Will send update on top. > > I saved this patch to wave at the x86 guys for 3.12. I plan to merge > > mm-save-soft-dirty-bits-on-file-pages.patch for 3.11. > > > >> Guys, is there a reason for "if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE" > >> test present in this pgtable-2level.h file at all? I can't imagine > >> where it can be false on x86. > > > > I doubt if "Guys" read this. x86 maintainers cc'ed. Thanks! > > +#define _mfrob(v,r,m,l) ((((v) >> (r)) & (m)) << (l)) > > +#define __frob(v,r,l) (((v) >> (r)) << (l)) > > + > > #ifdef CONFIG_MEM_SOFT_DIRTY > > If I'm understanding this right, the idea is to take the bits in the > range a..b of v and stick them at c..d, where a-b == c-d. Would it > make sense to change this to look something like > > #define __frob(v, inmsb, inlsb, outlsb) ((v >> inlsb) & ((1<<(inmsb - > inlsb + 1)-1) << outlsb) There is a case when you don't need a mask completely. And because this pte conversion is on hot path and time critical I kept generated code as it was (even if that lead to slightly less clear source code). > For extra fun, there could be an __unfrob macro that takes the same > inmsg, inlsb, outlsb parameters but undoes it so that it's (more) > clear that the operations that are supposed to be inverses are indeed > inverses. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages 2013-08-13 5:02 ` Cyrill Gorcunov @ 2013-08-13 15:14 ` H. Peter Anvin 2013-08-13 15:37 ` Cyrill Gorcunov 0 siblings, 1 reply; 28+ messages in thread From: H. Peter Anvin @ 2013-08-13 15:14 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Andy Lutomirski, Andrew Morton, linux-mm, linux-kernel, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar, Ingo Molnar, Thomas Gleixner On 08/12/2013 10:02 PM, Cyrill Gorcunov wrote: > > There is a case when you don't need a mask completely. And because this > pte conversion is on hot path and time critical I kept generated code > as it was (even if that lead to slightly less clear source code). > Does it actually matter, generated-code-wise, or is the compiler smart enough to figure it out? The reason I'm asking is because it makes the code much harder to follow. The other thing is can we please pretty please call it something other than "frob"? -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages 2013-08-13 15:14 ` H. Peter Anvin @ 2013-08-13 15:37 ` Cyrill Gorcunov 2013-08-13 16:43 ` H. Peter Anvin 0 siblings, 1 reply; 28+ messages in thread From: Cyrill Gorcunov @ 2013-08-13 15:37 UTC (permalink / raw) To: H. Peter Anvin Cc: Andy Lutomirski, Andrew Morton, linux-mm, linux-kernel, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar, Ingo Molnar, Thomas Gleixner On Tue, Aug 13, 2013 at 08:14:39AM -0700, H. Peter Anvin wrote: > On 08/12/2013 10:02 PM, Cyrill Gorcunov wrote: > > > > There is a case when you don't need a mask completely. And because this > > pte conversion is on hot path and time critical I kept generated code > > as it was (even if that lead to slightly less clear source code). > > > > Does it actually matter, generated-code-wise, or is the compiler smart > enough to figure it out? The reason I'm asking is because it makes the gcc-4.7.2 is smart enough to suppress useless masking (ie ((1u << 31) - 1)) completely but I don't know if this can be assumed for all gcc series. > code much harder to follow. I see. OK, I'll try to prepare more readable macro helpers. > > The other thing is can we please pretty please call it something other > than "frob"? Sure. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages 2013-08-13 15:37 ` Cyrill Gorcunov @ 2013-08-13 16:43 ` H. Peter Anvin 2013-08-13 21:28 ` Cyrill Gorcunov 0 siblings, 1 reply; 28+ messages in thread From: H. Peter Anvin @ 2013-08-13 16:43 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Andy Lutomirski, Andrew Morton, linux-mm, linux-kernel, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar, Ingo Molnar, Thomas Gleixner On 08/13/2013 08:37 AM, Cyrill Gorcunov wrote: >> >> Does it actually matter, generated-code-wise, or is the compiler smart >> enough to figure it out? The reason I'm asking is because it makes the > > gcc-4.7.2 is smart enough to suppress useless masking (ie ((1u << 31) - 1)) > completely but I don't know if this can be assumed for all gcc series. > I would be highly surprised if it wasn't the case for any gcc we care about. -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages 2013-08-13 16:43 ` H. Peter Anvin @ 2013-08-13 21:28 ` Cyrill Gorcunov 0 siblings, 0 replies; 28+ messages in thread From: Cyrill Gorcunov @ 2013-08-13 21:28 UTC (permalink / raw) To: H. Peter Anvin Cc: Andy Lutomirski, Andrew Morton, linux-mm, linux-kernel, xemul, mpm, xiaoguangrong, mtosatti, kosaki.motohiro, sfr, peterz, aneesh.kumar, Ingo Molnar, Thomas Gleixner On Tue, Aug 13, 2013 at 09:43:23AM -0700, H. Peter Anvin wrote: > On 08/13/2013 08:37 AM, Cyrill Gorcunov wrote: > >> > >> Does it actually matter, generated-code-wise, or is the compiler smart > >> enough to figure it out? The reason I'm asking is because it makes the > > > > gcc-4.7.2 is smart enough to suppress useless masking (ie ((1u << 31) - 1)) > > completely but I don't know if this can be assumed for all gcc series. > > > > I would be highly surprised if it wasn't the case for any gcc we care about. Does below one looks better? (Btw, what about the snippet we have there as well #if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE #define PTE_FILE_SHIFT2 (_PAGE_BIT_FILE + 1) #define PTE_FILE_SHIFT3 (_PAGE_BIT_PROTNONE + 1) #else #define PTE_FILE_SHIFT2 (_PAGE_BIT_PROTNONE + 1) #define PTE_FILE_SHIFT3 (_PAGE_BIT_FILE + 1) #endif where #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL -> 8 #define _PAGE_BIT_FILE _PAGE_BIT_DIRTY -> 6 so I wonder where the cases on x86 when _PAGE_BIT_FILE > _PAGE_BIT_PROTNONE, what i'm missing here?) --- arch/x86/include/asm/pgtable-2level.h | 37 +++++++++++++++++++--------------- 1 file changed, 21 insertions(+), 16 deletions(-) Index: linux-2.6.git/arch/x86/include/asm/pgtable-2level.h =================================================================== --- linux-2.6.git.orig/arch/x86/include/asm/pgtable-2level.h +++ linux-2.6.git/arch/x86/include/asm/pgtable-2level.h @@ -55,8 +55,11 @@ static inline pmd_t native_pmdp_get_and_ #define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp) #endif -#define _mfrob(v,r,m,l) ((((v) >> (r)) & (m)) << (l)) -#define __frob(v,r,l) (((v) >> (r)) << (l)) +/* + * For readable bitfield manipulations. + */ +#define PTE_FILE_NOMASK (-1U) +#define __bfop(v,r,m,l) ((((v) >> (r)) & (m)) << (l)) #ifdef CONFIG_MEM_SOFT_DIRTY @@ -83,17 +86,17 @@ static inline pmd_t native_pmdp_get_and_ #define PTE_FILE_LSHIFT4 (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3) #define pte_to_pgoff(pte) \ - (_mfrob((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0) + \ - _mfrob((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \ - _mfrob((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_MASK3, PTE_FILE_LSHIFT3) + \ - __frob((pte).pte_low, PTE_FILE_SHIFT4, PTE_FILE_LSHIFT4)) + (__bfop((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0) + \ + __bfop((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \ + __bfop((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_MASK3, PTE_FILE_LSHIFT3) + \ + __bfop((pte).pte_low, PTE_FILE_SHIFT4, PTE_FILE_NOMASK, PTE_FILE_LSHIFT4)) #define pgoff_to_pte(off) \ ((pte_t) { .pte_low = \ - _mfrob(off, 0, PTE_FILE_MASK1, PTE_FILE_SHIFT1) + \ - _mfrob(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2) + \ - _mfrob(off, PTE_FILE_LSHIFT3, PTE_FILE_MASK3, PTE_FILE_SHIFT3) + \ - __frob(off, PTE_FILE_LSHIFT4, PTE_FILE_SHIFT4) + \ + __bfop(off, 0, PTE_FILE_MASK1, PTE_FILE_SHIFT1) + \ + __bfop(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2) + \ + __bfop(off, PTE_FILE_LSHIFT3, PTE_FILE_MASK3, PTE_FILE_SHIFT3) + \ + __bfop(off, PTE_FILE_LSHIFT4, PTE_FILE_NOMASK, PTE_FILE_SHIFT4) + \ _PAGE_FILE }) #else /* CONFIG_MEM_SOFT_DIRTY */ @@ -121,19 +124,21 @@ static inline pmd_t native_pmdp_get_and_ #define PTE_FILE_LSHIFT3 (PTE_FILE_BITS1 + PTE_FILE_BITS2) #define pte_to_pgoff(pte) \ - (_mfrob((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0) + \ - _mfrob((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \ - __frob((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_LSHIFT3)) + (__bfop((pte).pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0) + \ + __bfop((pte).pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) + \ + __bfop((pte).pte_low, PTE_FILE_SHIFT3, PTE_FILE_NOMASK, PTE_FILE_LSHIFT3)) #define pgoff_to_pte(off) \ ((pte_t) { .pte_low = \ - _mfrob(off, 0, PTE_FILE_MASK1, PTE_FILE_SHIFT1) + \ - _mfrob(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2) + \ - __frob(off, PTE_FILE_LSHIFT3, PTE_FILE_SHIFT3) + \ + __bfop(off, 0, PTE_FILE_MASK1, PTE_FILE_SHIFT1) + \ + __bfop(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2) + \ + __bfop(off, PTE_FILE_LSHIFT3, PTE_FILE_NOMASK, PTE_FILE_SHIFT3) + \ _PAGE_FILE }) #endif /* CONFIG_MEM_SOFT_DIRTY */ +#undef __bfop + /* Encode and de-code a swap entry */ #if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1) ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2013-08-13 21:29 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-07-30 20:41 [patch 0/2] Soft-dirty page tracker improvemens Cyrill Gorcunov
2013-07-30 20:41 ` [patch 1/2] [PATCH] mm: Save soft-dirty bits on swapped pages Cyrill Gorcunov
2013-07-31 8:16 ` Pavel Emelyanov
2013-08-01 0:51 ` Minchan Kim
2013-08-01 5:53 ` Cyrill Gorcunov
2013-08-01 6:16 ` Minchan Kim
2013-08-01 6:28 ` Cyrill Gorcunov
2013-08-01 6:37 ` Minchan Kim
2013-08-01 6:38 ` Cyrill Gorcunov
[not found] ` <51ff047d.2768310a.2fc4.340fSMTPIN_ADDED_BROKEN@mx.google.com>
2013-08-05 2:17 ` Minchan Kim
[not found] ` <51ff1053.ab47310a.5d3f.566cSMTPIN_ADDED_BROKEN@mx.google.com>
2013-08-05 2:54 ` Minchan Kim
[not found] ` <51ff14e9.87ef440a.1424.ffffe470SMTPIN_ADDED_BROKEN@mx.google.com>
2013-08-05 5:43 ` Cyrill Gorcunov
2013-08-07 20:21 ` Andrew Morton
2013-08-07 20:29 ` Cyrill Gorcunov
2013-08-10 17:48 ` James Bottomley
2013-07-30 20:41 ` [patch 2/2] [PATCH] mm: Save soft-dirty bits on file pages Cyrill Gorcunov
2013-07-31 8:16 ` Pavel Emelyanov
2013-08-07 20:28 ` Andrew Morton
2013-08-07 20:31 ` Cyrill Gorcunov
2013-08-08 14:51 ` Cyrill Gorcunov
2013-08-12 21:57 ` Andrew Morton
2013-08-12 22:28 ` Andy Lutomirski
2013-08-12 22:37 ` Andrew Morton
2013-08-13 5:02 ` Cyrill Gorcunov
2013-08-13 15:14 ` H. Peter Anvin
2013-08-13 15:37 ` Cyrill Gorcunov
2013-08-13 16:43 ` H. Peter Anvin
2013-08-13 21:28 ` Cyrill Gorcunov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).