From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dominik Dingel
Subject: [PATCH 0/4] mm: new flag to forbid zero page mappings for a vma
Date: Fri, 17 Oct 2014 16:09:46 +0200
Message-ID: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
To: Andrew Morton, linux-mm@kvack.org, Mel Gorman, Michal Hocko, Rik van Riel
Cc: Andrea Arcangeli, Andy Lutomirski, "Aneesh Kumar K.V", Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov, Heiko Carstens,
	"H. Peter Anvin", Hugh Dickins, Ingo Molnar, Jianyu Zhan,
	Johannes Weiner, "Kirill A. Shutemov", Konstantin Weitz,
	kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org, Martin Schwidefsky

s390 has the special notion of storage keys, which are a kind of page
flags associated with physical pages and live outside of directly
addressable memory. These storage keys can be queried and changed with a
special set of instructions.

The mentioned instructions behave quite nicely under virtualization, if
there is:
- an invalid pte, then the instructions will work on some memory reserved
  in the host page table
- a valid pte, then the instructions will work with the real storage key

Thanks to Martin's software reference and dirty bit tracking, the kernel
itself no longer issues any storage key instructions, as a software based
approach is taken now. On the other hand, distributions in the wild are
currently using them.

However, for virtualized guests we still have a problem with guest pages
mapped to zero pages and with kernel same page merging. With each of
them, multiple guest pages will point to the same physical page and share
the same storage key.

Let's fix this by introducing a new flag which will forbid new zero page
mappings. If the guest issues a storage key related instruction, we flag
all vmas, drop existing zero page mappings, and unmerge the guest memory.
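The sharing problem described above can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions: it is not kernel code, every `toy_*` name and constant is hypothetical, and it models nothing more than "one storage key per physical frame".

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not kernel code: one storage key per physical frame. */
#define TOY_NR_FRAMES      4
#define TOY_NR_GUEST_PAGES 4
#define TOY_ZERO_FRAME     0	/* frame standing in for the shared zero page */

struct toy_guest {
	unsigned char frame_key[TOY_NR_FRAMES];	/* key lives with the frame */
	size_t map[TOY_NR_GUEST_PAGES];		/* guest page -> backing frame */
};

/* Back every guest page with the shared zero frame and clear all keys. */
static void toy_map_all_to_zero(struct toy_guest *g)
{
	for (size_t i = 0; i < TOY_NR_GUEST_PAGES; i++)
		g->map[i] = TOY_ZERO_FRAME;
	for (size_t i = 0; i < TOY_NR_FRAMES; i++)
		g->frame_key[i] = 0;
}

/* Give one guest page a private frame of its own. */
static void toy_map_private(struct toy_guest *g, size_t page, size_t frame)
{
	g->map[page] = frame;
}

/* Setting a key through a guest page really sets the key of its frame. */
static void toy_set_key(struct toy_guest *g, size_t page, unsigned char key)
{
	g->frame_key[g->map[page]] = key;
}

static unsigned char toy_get_key(const struct toy_guest *g, size_t page)
{
	return g->frame_key[g->map[page]];
}
```

In this model, setting a key for guest page 1 and then for guest page 2 while both are backed by the zero frame leaves page 1 reporting the key written for page 2. Once each page has a private frame, the keys stay independent.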

Dominik Dingel (4):
  s390/mm: recfactor global pgste updates
  mm: introduce new VM_NOZEROPAGE flag
  s390/mm: prevent and break zero page mappings in case of storage keys
  s390/mm: disable KSM for storage key enabled pages

 arch/s390/Kconfig               |   3 +
 arch/s390/include/asm/pgalloc.h |   2 -
 arch/s390/include/asm/pgtable.h |   3 +-
 arch/s390/kvm/kvm-s390.c        |   2 +-
 arch/s390/kvm/priv.c            |  17 ++--
 arch/s390/mm/pgtable.c          | 181 ++++++++++++++++++----------------------
 include/linux/mm.h              |  13 ++-
 mm/huge_memory.c                |   2 +-
 mm/memory.c                     |   2 +-
 9 files changed, 112 insertions(+), 113 deletions(-)

-- 
1.8.5.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dominik Dingel
Subject: [PATCH 3/4] s390/mm: prevent and break zero page mappings in case of storage keys
Date: Fri, 17 Oct 2014 16:09:49 +0200
Message-ID: <1413554990-48512-4-git-send-email-dingel@linux.vnet.ibm.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
To: Andrew Morton, linux-mm@kvack.org, Mel Gorman, Michal Hocko, Rik van Riel
Cc: Andrea Arcangeli, Andy Lutomirski, "Aneesh Kumar K.V", Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov, Heiko Carstens,
	"H. Peter Anvin", Hugh Dickins, Ingo Molnar, Jianyu Zhan,
	Johannes Weiner, "Kirill A. Shutemov", Konstantin Weitz,
	kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org, Martin Schwidefsky

As soon as storage keys are enabled we need to work around zero page
mappings to prevent inconsistencies between storage keys and pgste.
Otherwise the following data corruption could happen:
1) guest enables storage key
2) guest sets storage key for not mapped page X
   -> change goes to PGSTE
3) guest reads from page X
   -> as X was not dirty before, the page will be zero page backed,
      storage key from PGSTE for X will go to storage key for zero page
4) guest sets storage key for not mapped page Y (same logic as above)
5) guest reads from page Y
   -> as Y was not dirty before, the page will be zero page backed,
      storage key from PGSTE for Y will go to storage key for zero page
      overwriting storage key for X

While holding the mmap sem, we are safe against changes on entries we
already fixed. As sske and host large pages are also mutually exclusive
we do not even need to retry the fixup_user_fault.

Signed-off-by: Dominik Dingel
Acked-by: Christian Borntraeger
Signed-off-by: Martin Schwidefsky
---
 arch/s390/Kconfig      |  3 +++
 arch/s390/mm/pgtable.c | 15 +++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 05c78bb..4e04e63 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -1,6 +1,9 @@
 config MMU
 	def_bool y
 
+config NOZEROPAGE
+	def_bool y
+
 config ZONE_DMA
 	def_bool y
 
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index ab55ba8..6321692 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -1309,6 +1309,15 @@ static int __s390_enable_skey(pte_t *pte, unsigned long addr,
 	pgste_t pgste;
 
 	pgste = pgste_get_lock(pte);
+	/*
+	 * Remove all zero page mappings,
+	 * after establishing a policy to forbid zero page mappings
+	 * following faults for that page will get fresh anonymous pages
+	 */
+	if (is_zero_pfn(pte_pfn(*pte))) {
+		ptep_flush_direct(walk->mm, addr, pte);
+		pte_val(*pte) = _PAGE_INVALID;
+	}
 	/* Clear storage key */
 	pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
 			      PGSTE_GR_BIT | PGSTE_GC_BIT);
@@ -1323,10 +1332,16 @@ void s390_enable_skey(void)
 {
 	struct mm_walk walk = { .pte_entry = __s390_enable_skey };
 	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
 
 	down_write(&mm->mmap_sem);
 	if (mm_use_skey(mm))
 		goto out_up;
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next)
+		vma->vm_flags |= VM_NOZEROPAGE;
+	mm->def_flags |= VM_NOZEROPAGE;
+
 	walk.mm = mm;
 	walk_page_range(0, TASK_SIZE, &walk);
 	mm->context.use_skey = 1;
-- 
1.8.5.5

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dominik Dingel
Subject: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
Date: Fri, 17 Oct 2014 16:09:48 +0200
Message-ID: <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
To: Andrew Morton, linux-mm@kvack.org, Mel Gorman, Michal Hocko, Rik van Riel
Cc: Andrea Arcangeli, Andy Lutomirski, "Aneesh Kumar K.V", Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov, Heiko Carstens,
	"H. Peter Anvin", Hugh Dickins, Ingo Molnar, Jianyu Zhan,
	Johannes Weiner, "Kirill A. Shutemov", Konstantin Weitz,
	kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org, Martin Schwidefsky

Add a new vma flag to allow an architecture to disable the backing of
non-present, anonymous pages with the read-only empty zero page.
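
The effect of such a flag on the anonymous fault path can be sketched in userspace as follows. This is a hedged illustration rather than the kernel implementation: the flag value is the one proposed in this patch, while `TOY_FAULT_WRITE`, `struct toy_vma`, and the `toy_*` functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* The flag value mirrors the patch; everything else is a toy stand-in. */
#define TOY_VM_NOZEROPAGE 0x00001000UL	/* value proposed in this patch */
#define TOY_FAULT_WRITE   0x01U		/* stand-in for FAULT_FLAG_WRITE */

struct toy_vma {
	unsigned long vm_flags;
};

enum toy_backing { TOY_ZERO_PAGE, TOY_FRESH_PAGE };

/* Per-vma check, analogous in spirit to vma_forbids_zeropage(). */
static bool toy_vma_forbids_zeropage(const struct toy_vma *vma)
{
	return (vma->vm_flags & TOY_VM_NOZEROPAGE) != 0;
}

/* Decide how a not-present anonymous page gets backed on a fault. */
static enum toy_backing toy_anon_fault(const struct toy_vma *vma,
				       unsigned int fault_flags)
{
	/* Only read faults on unflagged vmas may use the shared zero page. */
	if (!(fault_flags & TOY_FAULT_WRITE) && !toy_vma_forbids_zeropage(vma))
		return TOY_ZERO_PAGE;
	return TOY_FRESH_PAGE;
}
```

A read fault on an unflagged vma still gets the zero page. Write faults, and any fault on a flagged vma, get a fresh page.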

Signed-off-by: Dominik Dingel
Acked-by: Christian Borntraeger
Signed-off-by: Martin Schwidefsky
---
 include/linux/mm.h | 13 +++++++++++--
 mm/huge_memory.c   |  2 +-
 mm/memory.c        |  2 +-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cd33ae2..8f09c91 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -113,7 +113,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_GROWSDOWN	0x00000100	/* general info on the segment */
 #define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
 #define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */
-
+#define VM_NOZEROPAGE	0x00001000	/* forbid new zero page mappings */
 #define VM_LOCKED	0x00002000
 #define VM_IO		0x00004000	/* Memory mapped I/O or similar */
 
@@ -179,7 +179,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
 
 /* This mask defines which mm->def_flags a process can inherit its parent */
-#define VM_INIT_DEF_MASK	VM_NOHUGEPAGE
+#define VM_INIT_DEF_MASK	(VM_NOHUGEPAGE | VM_NOZEROPAGE)
 
 /*
  * mapping from the currently active vm_flags protection bits (the
@@ -1293,6 +1293,15 @@ static inline int stack_guard_page_end(struct vm_area_struct *vma,
 		!vma_growsup(vma->vm_next, addr);
 }
 
+static inline int vma_forbids_zeropage(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_NOZEROPAGE
+	return vma->vm_flags & VM_NOZEROPAGE;
+#else
+	return 0;
+#endif
+}
+
 extern struct task_struct *task_of_stack(struct task_struct *task,
 				struct vm_area_struct *vma, bool in_group);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index de98415..c271265 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -805,7 +805,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		return VM_FAULT_OOM;
 	if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
 		return VM_FAULT_OOM;
-	if (!(flags & FAULT_FLAG_WRITE) &&
+	if (!(flags & FAULT_FLAG_WRITE) && !vma_forbids_zeropage(vma) &&
 			transparent_hugepage_use_zero_page()) {
 		spinlock_t *ptl;
 		pgtable_t pgtable;
diff --git a/mm/memory.c b/mm/memory.c
index 64f82aa..1859b2b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2640,7 +2640,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		return VM_FAULT_SIGBUS;
 
 	/* Use the zero-page for reads */
-	if (!(flags & FAULT_FLAG_WRITE)) {
+	if (!(flags & FAULT_FLAG_WRITE) && !vma_forbids_zeropage(vma)) {
 		entry = pte_mkspecial(pfn_pte(my_zero_pfn(address),
 						vma->vm_page_prot));
 		page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-- 
1.8.5.5

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dominik Dingel
Subject: [PATCH 1/4] s390/mm: recfactor global pgste updates
Date: Fri, 17 Oct 2014 16:09:47 +0200
Message-ID: <1413554990-48512-2-git-send-email-dingel@linux.vnet.ibm.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
To: Andrew Morton, linux-mm@kvack.org, Mel Gorman, Michal Hocko, Rik van Riel
Cc: Andrea Arcangeli, Andy Lutomirski, "Aneesh Kumar K.V", Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov, Heiko Carstens,
	"H. Peter Anvin", Hugh Dickins, Ingo Molnar, Jianyu Zhan,
	Johannes Weiner, "Kirill A. Shutemov", Konstantin Weitz,
	kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org, Martin Schwidefsky, Paolo Bonzini,
	Peter Zijlstra, Sasha Levin, Dominik

Replace the s390 specific page table walker for the pgste updates with
a call to the common code walk_page_range function.
There are now two pte modification functions, one for the reset of the
CMMA state and another one for the initialization of the storage keys.

Signed-off-by: Dominik Dingel
Signed-off-by: Martin Schwidefsky
---
 arch/s390/include/asm/pgalloc.h |   2 -
 arch/s390/include/asm/pgtable.h |   1 +
 arch/s390/kvm/kvm-s390.c        |   2 +-
 arch/s390/mm/pgtable.c          | 153 ++++++++++++++--------------------------
 4 files changed, 56 insertions(+), 102 deletions(-)

diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 9e18a61..120e126 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -22,8 +22,6 @@ unsigned long *page_table_alloc(struct mm_struct *, unsigned long);
 void page_table_free(struct mm_struct *, unsigned long *);
 void page_table_free_rcu(struct mmu_gather *, unsigned long *);
 
-void page_table_reset_pgste(struct mm_struct *, unsigned long, unsigned long,
-			    bool init_skey);
 int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 			  unsigned long key, bool nq);
 
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 5efb2fe..1e991f6a 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1750,6 +1750,7 @@ extern int vmem_add_mapping(unsigned long start, unsigned long size);
 extern int vmem_remove_mapping(unsigned long start, unsigned long size);
 extern int s390_enable_sie(void);
 extern void s390_enable_skey(void);
+extern void s390_reset_cmma(struct mm_struct *mm);
 
 /*
  * No page table caches to initialise
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 81b0e11..7a33c11 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -281,7 +281,7 @@ static int kvm_s390_mem_control(struct kvm *kvm, struct kvm_device_attr *attr)
 	case KVM_S390_VM_MEM_CLR_CMMA:
 		mutex_lock(&kvm->lock);
 		idx = srcu_read_lock(&kvm->srcu);
-		page_table_reset_pgste(kvm->arch.gmap->mm, 0, TASK_SIZE, false);
+		s390_reset_cmma(kvm->arch.gmap->mm);
 		srcu_read_unlock(&kvm->srcu, idx);
 		mutex_unlock(&kvm->lock);
 		ret = 0;
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 5404a62..ab55ba8 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -885,99 +885,6 @@ static inline void page_table_free_pgste(unsigned long *table)
 	__free_page(page);
 }
 
-static inline unsigned long page_table_reset_pte(struct mm_struct *mm, pmd_t *pmd,
-			unsigned long addr, unsigned long end, bool init_skey)
-{
-	pte_t *start_pte, *pte;
-	spinlock_t *ptl;
-	pgste_t pgste;
-
-	start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
-	pte = start_pte;
-	do {
-		pgste = pgste_get_lock(pte);
-		pgste_val(pgste) &= ~_PGSTE_GPS_USAGE_MASK;
-		if (init_skey) {
-			unsigned long address;
-
-			pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
-					      PGSTE_GR_BIT | PGSTE_GC_BIT);
-
-			/* skip invalid and not writable pages */
-			if (pte_val(*pte) & _PAGE_INVALID ||
-			    !(pte_val(*pte) & _PAGE_WRITE)) {
-				pgste_set_unlock(pte, pgste);
-				continue;
-			}
-
-			address = pte_val(*pte) & PAGE_MASK;
-			page_set_storage_key(address, PAGE_DEFAULT_KEY, 1);
-		}
-		pgste_set_unlock(pte, pgste);
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-	pte_unmap_unlock(start_pte, ptl);
-
-	return addr;
-}
-
-static inline unsigned long page_table_reset_pmd(struct mm_struct *mm, pud_t *pud,
-			unsigned long addr, unsigned long end, bool init_skey)
-{
-	unsigned long next;
-	pmd_t *pmd;
-
-	pmd = pmd_offset(pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		if (pmd_none_or_clear_bad(pmd))
-			continue;
-		next = page_table_reset_pte(mm, pmd, addr, next, init_skey);
-	} while (pmd++, addr = next, addr != end);
-
-	return addr;
-}
-
-static inline unsigned long page_table_reset_pud(struct mm_struct *mm, pgd_t *pgd,
-			unsigned long addr, unsigned long end, bool init_skey)
-{
-	unsigned long next;
-	pud_t *pud;
-
-	pud = pud_offset(pgd, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		if (pud_none_or_clear_bad(pud))
-			continue;
-		next = page_table_reset_pmd(mm, pud, addr, next, init_skey);
-	} while (pud++, addr = next, addr != end);
-
-	return addr;
-}
-
-void page_table_reset_pgste(struct mm_struct *mm, unsigned long start,
-			    unsigned long end, bool init_skey)
-{
-	unsigned long addr, next;
-	pgd_t *pgd;
-
-	down_write(&mm->mmap_sem);
-	if (init_skey && mm_use_skey(mm))
-		goto out_up;
-	addr = start;
-	pgd = pgd_offset(mm, addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(pgd))
-			continue;
-		next = page_table_reset_pud(mm, pgd, addr, next, init_skey);
-	} while (pgd++, addr = next, addr != end);
-	if (init_skey)
-		current->mm->context.use_skey = 1;
-out_up:
-	up_write(&mm->mmap_sem);
-}
-EXPORT_SYMBOL(page_table_reset_pgste);
-
 int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 			  unsigned long key, bool nq)
 {
@@ -1044,11 +951,6 @@ static inline unsigned long *page_table_alloc_pgste(struct mm_struct *mm,
 	return NULL;
 }
 
-void page_table_reset_pgste(struct mm_struct *mm, unsigned long start,
-			    unsigned long end, bool init_skey)
-{
-}
-
 static inline void page_table_free_pgste(unsigned long *table)
 {
 }
@@ -1400,13 +1302,66 @@ EXPORT_SYMBOL_GPL(s390_enable_sie);
  * Enable storage key handling from now on and initialize the storage
  * keys with the default key.
  */
+static int __s390_enable_skey(pte_t *pte, unsigned long addr,
+			      unsigned long next, struct mm_walk *walk)
+{
+	unsigned long ptev;
+	pgste_t pgste;
+
+	pgste = pgste_get_lock(pte);
+	/* Clear storage key */
+	pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
+			      PGSTE_GR_BIT | PGSTE_GC_BIT);
+	ptev = pte_val(*pte);
+	if (!(ptev & _PAGE_INVALID) && (ptev & _PAGE_WRITE))
+		page_set_storage_key(ptev & PAGE_MASK, PAGE_DEFAULT_KEY, 1);
+	pgste_set_unlock(pte, pgste);
+	return 0;
+}
+
 void s390_enable_skey(void)
 {
-	page_table_reset_pgste(current->mm, 0, TASK_SIZE, true);
+	struct mm_walk walk = { .pte_entry = __s390_enable_skey };
+	struct mm_struct *mm = current->mm;
+
+	down_write(&mm->mmap_sem);
+	if (mm_use_skey(mm))
+		goto out_up;
+	walk.mm = mm;
+	walk_page_range(0, TASK_SIZE, &walk);
+	mm->context.use_skey = 1;
+
+out_up:
+	up_write(&mm->mmap_sem);
 }
 EXPORT_SYMBOL_GPL(s390_enable_skey);
 
 /*
+ * Reset CMMA state, make all pages stable again.
+ */
+static int __s390_reset_cmma(pte_t *pte, unsigned long addr,
+			     unsigned long next, struct mm_walk *walk)
+{
+	pgste_t pgste;
+
+	pgste = pgste_get_lock(pte);
+	pgste_val(pgste) &= ~_PGSTE_GPS_USAGE_MASK;
+	pgste_set_unlock(pte, pgste);
+	return 0;
+}
+
+void s390_reset_cmma(struct mm_struct *mm)
+{
+	struct mm_walk walk = { .pte_entry = __s390_reset_cmma };
+
+	down_write(&mm->mmap_sem);
+	walk.mm = mm;
+	walk_page_range(0, TASK_SIZE, &walk);
+	up_write(&mm->mmap_sem);
+}
+EXPORT_SYMBOL_GPL(s390_reset_cmma);
+
+/*
  * Test and reset if a guest page is dirty
  */
 bool gmap_test_and_clear_dirty(unsigned long address, struct gmap *gmap)
-- 
1.8.5.5

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dominik Dingel
Subject: [PATCH 4/4] s390/mm: disable KSM for storage key enabled pages
Date: Fri, 17 Oct 2014 16:09:50 +0200
Message-ID: <1413554990-48512-5-git-send-email-dingel@linux.vnet.ibm.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
To: Andrew Morton, linux-mm@kvack.org, Mel Gorman, Michal Hocko, Rik van Riel
Cc: Andrea Arcangeli, Andy Lutomirski, "Aneesh Kumar K.V", Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov, Heiko Carstens,
	"H. Peter Anvin", Hugh Dickins, Ingo Molnar, Jianyu Zhan,
	Johannes Weiner, "Kirill A. Shutemov", Konstantin Weitz,
	kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org, Martin Schwidefsky, Paolo Bonzini,
	Peter Zijlstra, Sasha Levin, Dominik

When storage keys are enabled unmerge already merged pages and prevent
new pages from being merged.

Signed-off-by: Dominik Dingel
Acked-by: Christian Borntraeger
Signed-off-by: Martin Schwidefsky
---
 arch/s390/include/asm/pgtable.h |  2 +-
 arch/s390/kvm/priv.c            | 17 ++++++++++++-----
 arch/s390/mm/pgtable.c          | 15 +++++++++++++--
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 1e991f6a..a5362e4 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1749,7 +1749,7 @@ static inline pte_t mk_swap_pte(unsigned long type, unsigned long offset)
 extern int vmem_add_mapping(unsigned long start, unsigned long size);
 extern int vmem_remove_mapping(unsigned long start, unsigned long size);
 extern int s390_enable_sie(void);
-extern void s390_enable_skey(void);
+extern int s390_enable_skey(void);
 extern void s390_reset_cmma(struct mm_struct *mm);
 
 /*
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index f89c1cd..e0967fd 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -156,21 +156,25 @@ static int handle_store_cpu_address(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static void __skey_check_enable(struct kvm_vcpu *vcpu)
+static int __skey_check_enable(struct kvm_vcpu *vcpu)
 {
+	int rc = 0;
 	if (!(vcpu->arch.sie_block->ictl & (ICTL_ISKE | ICTL_SSKE | ICTL_RRBE)))
-		return;
+		return rc;
 
-	s390_enable_skey();
+	rc = s390_enable_skey();
 	trace_kvm_s390_skey_related_inst(vcpu);
 	vcpu->arch.sie_block->ictl &= ~(ICTL_ISKE | ICTL_SSKE | ICTL_RRBE);
+	return rc;
 }
 
 static int handle_skey(struct kvm_vcpu *vcpu)
 {
-	__skey_check_enable(vcpu);
+	int rc = __skey_check_enable(vcpu);
 
+	if (rc)
+		return rc;
 	vcpu->stat.instruction_storage_key++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
@@ -692,7 +696,10 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
 		}
 
 		if (vcpu->run->s.regs.gprs[reg1] & PFMF_SK) {
-			__skey_check_enable(vcpu);
+			int rc = __skey_check_enable(vcpu);
+
+			if (rc)
+				return rc;
 			if (set_guest_storage_key(current->mm, useraddr,
 					vcpu->run->s.regs.gprs[reg1] & PFMF_KEY,
 					vcpu->run->s.regs.gprs[reg1] & PFMF_NQ))
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 6321692..b3311c1 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -18,6 +18,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include
 #include
@@ -1328,18 +1330,26 @@ static int __s390_enable_skey(pte_t *pte, unsigned long addr,
 	return 0;
 }
 
-void s390_enable_skey(void)
+int s390_enable_skey(void)
 {
 	struct mm_walk walk = { .pte_entry = __s390_enable_skey };
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
+	int rc = 0;
 
 	down_write(&mm->mmap_sem);
 	if (mm_use_skey(mm))
 		goto out_up;
 
-	for (vma = mm->mmap; vma; vma = vma->vm_next)
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
+				MADV_UNMERGEABLE, &vma->vm_flags)) {
+			rc = -ENOMEM;
+			goto out_up;
+		}
 		vma->vm_flags |= VM_NOZEROPAGE;
+	}
+	mm->def_flags &= ~VM_MERGEABLE;
 	mm->def_flags |= VM_NOZEROPAGE;
 
 	walk.mm = mm;
@@ -1348,6 +1358,7 @@ void s390_enable_skey(void)
 
 out_up:
 	up_write(&mm->mmap_sem);
+	return rc;
 }
 EXPORT_SYMBOL_GPL(s390_enable_skey);
 
-- 
1.8.5.5

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Hansen
Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
Date: Fri, 17 Oct 2014 15:04:21 -0700
Message-ID: <54419265.9000000@intel.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
	<1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
In-Reply-To: <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
To: Dominik Dingel, Andrew Morton, linux-mm@kvack.org, Mel Gorman,
	Michal Hocko, Rik van Riel
Cc: Andrea Arcangeli, Andy Lutomirski, "Aneesh Kumar K.V", Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov, Heiko Carstens,
	"H. Peter Anvin", Hugh Dickins, Ingo Molnar, Jianyu Zhan,
	Johannes Weiner, "Kirill A. Shutemov", Konstantin Weitz,
	kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org, Martin Schwidefsky, Paolo Bonzini,
	Peter Zijlstra, Sasha Levin

On 10/17/2014 07:09 AM, Dominik Dingel wrote:
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index cd33ae2..8f09c91 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -113,7 +113,7 @@ extern unsigned int kobjsize(const void *objp);
>  #define VM_GROWSDOWN	0x00000100	/* general info on the segment */
>  #define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
>  #define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */
> -
> +#define VM_NOZEROPAGE	0x00001000	/* forbid new zero page mappings */
>  #define VM_LOCKED	0x00002000
>  #define VM_IO		0x00004000	/* Memory mapped I/O or similar */

This seems like an awfully obscure use for a very constrained resource
(VM_ flags).

Is there ever a time where the VMAs under an mm have mixed VM_NOZEROPAGE
status?  Reading the patches, it _looks_ like it might be an all or
nothing thing.

Full disclosure: I've got an x86-specific feature I want to steal a flag
for.  Maybe we should just define another VM_ARCH bit.

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Hansen
Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
Date: Sat, 18 Oct 2014 09:28:17 -0700
Message-ID: <54429521.80402@intel.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
	<1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com>
	<54419265.9000000@intel.com>
	<20141018164928.2341415f@BR9TG4T3.de.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
In-Reply-To: <20141018164928.2341415f@BR9TG4T3.de.ibm.com>
Sender: owner-linux-mm@kvack.org
To: Dominik Dingel
Cc: Andrew Morton, linux-mm@kvack.org, Mel Gorman, Michal Hocko,
	Rik van Riel, Andrea Arcangeli, Andy Lutomirski, "Aneesh Kumar K.V",
	Bob Liu, Christian Borntraeger, Cornelia Huck, Gleb Natapov,
	Heiko Carstens, "H. Peter Anvin", Hugh Dickins, Ingo Molnar,
	Jianyu Zhan, Johannes Weiner, "Kirill A. Shutemov", Konstantin Weitz,
	kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org, Martin

On 10/18/2014 07:49 AM, Dominik Dingel wrote:
> On Fri, 17 Oct 2014 15:04:21 -0700
> Dave Hansen wrote:
>> Is there ever a time where the VMAs under an mm have mixed VM_NOZEROPAGE
>> status?  Reading the patches, it _looks_ like it might be an all or
>> nothing thing.
>
> Currently it is an all or nothing thing, but for a future change we might want to just
> tag the guest memory instead of the complete user address space.

I think it's a bad idea to reserve a flag for potential future use.  If
you _need_ it in the future, let's have the discussion then.  For now, I
think it should probably just be stored in the mm somewhere.

>> Full disclosure: I've got an x86-specific feature I want to steal a flag
>> for.  Maybe we should just define another VM_ARCH bit.
>>
>
> So you think of something like:
>
> #if defined(CONFIG_S390)
> # define VM_NOZEROPAGE VM_ARCH_1
> #endif
>
> #ifndef VM_NOZEROPAGE
> # define VM_NOZEROPAGE VM_NONE
> #endif
>
> right?

Yeah, something like that.

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paolo Bonzini
Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
Date: Mon, 20 Oct 2014 20:14:53 +0200
Message-ID: <5445511D.1090603@redhat.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
	<1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com>
	<54419265.9000000@intel.com>
	<20141018164928.2341415f@BR9TG4T3.de.ibm.com>
	<54429521.80402@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <54429521.80402@intel.com>
Sender: owner-linux-mm@kvack.org
To: Dave Hansen, Dominik Dingel
Cc: Andrew Morton, linux-mm@kvack.org, Mel Gorman, Michal Hocko,
	Rik van Riel, Andrea Arcangeli, Andy Lutomirski, "Aneesh Kumar K.V",
	Bob Liu, Christian Borntraeger, Cornelia Huck, Gleb Natapov,
	Heiko Carstens, "H. Peter Anvin", Hugh Dickins, Ingo Molnar,
	Jianyu Zhan, Johannes Weiner, "Kirill A. Shutemov", Konstantin Weitz,
	kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org, Martin

On 10/18/2014 06:28 PM, Dave Hansen wrote:
>> Currently it is an all or nothing thing, but for a future change we might want to just
>> tag the guest memory instead of the complete user address space.
> > I think it's a bad idea to reserve a flag for potential future use. If > you_need_ it in the future, let's have the discussion then. For now, I > think it should probably just be stored in the mm somewhere. I agree with Dave (I thought I disagreed, but I changed my mind while writing down my thoughts). Just define mm_forbids_zeropage in arch/s390/include/asm, and make it return mm->context.use_skey---with a comment explaining how this is only for processes that use KVM, and then only for guests that use storage keys. Paolo (who was just taught what storage keys really are) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dominik Dingel Subject: [PATCH 0/4] mm: new flag to forbid zero page mappings for a vma Date: Fri, 17 Oct 2014 16:09:46 +0200 Message-ID: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> Cc: Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini < To: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel Return-path: Sender: owner-linux-mm@kvack.org List-Id: kvm.vger.kernel.org s390 has the special notion of storage keys which are some sort of page flags associated with physical pages and live outside of direct addressable memory. These storage keys can be queried and changed with a special set of instructions. 
The mentioned instructions behave quite nicely under virtualization, if there is: - an invalid pte, then the instructions will work on some memory reserved in the host page table - a valid pte, then the instructions will work with the real storage key Thanks to Martin with his software reference and dirty bit tracking, the kernel does not issue any storage key instructions as now a software based approach will be taken, on the other hand distributions in the wild are currently using them. However, for virtualized guests we still have a problem with guest pages mapped to zero pages and the kernel same page merging. WIth each one multiple guest pages will point to the same physical page and share the same storage key. Let's fix this by introducing a new flag which will forbid new zero page mappings. If the guest issues a storage key related instruction we flag all vmas and drop existing zero page mappings and unmerge the guest memory. Dominik Dingel (4): s390/mm: recfactor global pgste updates mm: introduce new VM_NOZEROPAGE flag s390/mm: prevent and break zero page mappings in case of storage keys s390/mm: disable KSM for storage key enabled pages arch/s390/Kconfig | 3 + arch/s390/include/asm/pgalloc.h | 2 - arch/s390/include/asm/pgtable.h | 3 +- arch/s390/kvm/kvm-s390.c | 2 +- arch/s390/kvm/priv.c | 17 ++-- arch/s390/mm/pgtable.c | 181 ++++++++++++++++++---------------------- include/linux/mm.h | 13 ++- mm/huge_memory.c | 2 +- mm/memory.c | 2 +- 9 files changed, 112 insertions(+), 113 deletions(-) -- 1.8.5.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dominik Dingel Subject: [PATCH 3/4] s390/mm: prevent and break zero page mappings in case of storage keys Date: Fri, 17 Oct 2014 16:09:49 +0200 Message-ID: <1413554990-48512-4-git-send-email-dingel@linux.vnet.ibm.com> References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> Cc: Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini < To: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel Return-path: In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-Id: kvm.vger.kernel.org As soon as storage keys are enabled we need to work around of zero page mappings to prevent inconsistencies between storage keys and pgste. Otherwise following data corruption could happen: 1) guest enables storage key 2) guest sets storage key for not mapped page X -> change goes to PGSTE 3) guest reads from page X -> as X was not dirty before, the page will be zero page backed, storage key from PGSTE for X will go to storage key for zero page 4) guest sets storage key for not mapped page Y (same logic as above 5) guest reads from page Y -> as Y was not dirty before, the page will be zero page backed, storage key from PGSTE for Y will got to storage key for zero page overwriting storage key for X While holding the mmap sem, we are safe before changes on entries we already fixed. As sske and host large pages are also mutual exclusive we do not even need to retry the fixup_user_fault. 
Signed-off-by: Dominik Dingel
Acked-by: Christian Borntraeger
Signed-off-by: Martin Schwidefsky
---
 arch/s390/Kconfig      |  3 +++
 arch/s390/mm/pgtable.c | 15 +++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 05c78bb..4e04e63 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -1,6 +1,9 @@
 config MMU
 	def_bool y
 
+config NOZEROPAGE
+	def_bool y
+
 config ZONE_DMA
 	def_bool y
 
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index ab55ba8..6321692 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -1309,6 +1309,15 @@ static int __s390_enable_skey(pte_t *pte, unsigned long addr,
 	pgste_t pgste;
 
 	pgste = pgste_get_lock(pte);
+	/*
+	 * Remove all zero page mappings,
+	 * after establishing a policy to forbid zero page mappings
+	 * following faults for that page will get fresh anonymous pages
+	 */
+	if (is_zero_pfn(pte_pfn(*pte))) {
+		ptep_flush_direct(walk->mm, addr, pte);
+		pte_val(*pte) = _PAGE_INVALID;
+	}
 	/* Clear storage key */
 	pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
 			      PGSTE_GR_BIT | PGSTE_GC_BIT);
@@ -1323,10 +1332,16 @@ void s390_enable_skey(void)
 {
 	struct mm_walk walk = { .pte_entry = __s390_enable_skey };
 	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
 
 	down_write(&mm->mmap_sem);
 	if (mm_use_skey(mm))
 		goto out_up;
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next)
+		vma->vm_flags |= VM_NOZEROPAGE;
+	mm->def_flags |= VM_NOZEROPAGE;
+
 	walk.mm = mm;
 	walk_page_range(0, TASK_SIZE, &walk);
 	mm->context.use_skey = 1;
--
1.8.5.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dominik Dingel
Subject: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
Date: Fri, 17 Oct 2014 16:09:48 +0200
Message-ID: <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
Cc: Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini <
To: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel
Return-path:
In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-Id: kvm.vger.kernel.org

Add a new vma flag to allow an architecture to disable the backing of
non-present, anonymous pages with the read-only empty zero page.

Signed-off-by: Dominik Dingel
Acked-by: Christian Borntraeger
Signed-off-by: Martin Schwidefsky
---
 include/linux/mm.h | 13 +++++++++++--
 mm/huge_memory.c   |  2 +-
 mm/memory.c        |  2 +-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cd33ae2..8f09c91 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -113,7 +113,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_GROWSDOWN	0x00000100	/* general info on the segment */
 #define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
 #define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */
-
+#define VM_NOZEROPAGE	0x00001000	/* forbid new zero page mappings */
 #define VM_LOCKED	0x00002000
 #define VM_IO           0x00004000	/* Memory mapped I/O or similar */
 
@@ -179,7 +179,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
 
 /* This mask defines which mm->def_flags a process can inherit its parent */
-#define VM_INIT_DEF_MASK	VM_NOHUGEPAGE
+#define VM_INIT_DEF_MASK	(VM_NOHUGEPAGE | VM_NOZEROPAGE)
 
 /*
  * mapping from the currently active vm_flags protection bits (the
@@ -1293,6 +1293,15 @@ static inline int stack_guard_page_end(struct vm_area_struct *vma,
 		!vma_growsup(vma->vm_next, addr);
 }
 
+static inline int vma_forbids_zeropage(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_NOZEROPAGE
+	return vma->vm_flags & VM_NOZEROPAGE;
+#else
+	return 0;
+#endif
+}
+
 extern struct task_struct *task_of_stack(struct task_struct *task,
 				struct vm_area_struct *vma, bool in_group);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index de98415..c271265 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -805,7 +805,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		return VM_FAULT_OOM;
 	if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
 		return VM_FAULT_OOM;
-	if (!(flags & FAULT_FLAG_WRITE) &&
+	if (!(flags & FAULT_FLAG_WRITE) && !vma_forbids_zeropage(vma) &&
 			transparent_hugepage_use_zero_page()) {
 		spinlock_t *ptl;
 		pgtable_t pgtable;
diff --git a/mm/memory.c b/mm/memory.c
index 64f82aa..1859b2b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2640,7 +2640,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		return VM_FAULT_SIGBUS;
 
 	/* Use the zero-page for reads */
-	if (!(flags & FAULT_FLAG_WRITE)) {
+	if (!(flags & FAULT_FLAG_WRITE) && !vma_forbids_zeropage(vma)) {
 		entry = pte_mkspecial(pfn_pte(my_zero_pfn(address),
 						vma->vm_page_prot));
 		page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
--
1.8.5.5

--
To unsubscribe, send a message with
'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dominik Dingel
Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
Date: Sat, 18 Oct 2014 16:49:28 +0200
Message-ID: <20141018164928.2341415f@BR9TG4T3.de.ibm.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com> <54419265.9000000@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel , Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz
Return-path:
In-Reply-To: <54419265.9000000@intel.com>
Sender: owner-linux-mm@kvack.org
List-Id: kvm.vger.kernel.org

On Fri, 17 Oct 2014 15:04:21 -0700 Dave Hansen wrote:

> Is there ever a time where the VMAs under an mm have mixed VM_NOZEROPAGE
> status? Reading the patches, it _looks_ like it might be an all or
> nothing thing.

Currently it is an all or nothing thing, but for a future change we might want to just
tag the guest memory instead of the complete user address space.

> Full disclosure: I've got an x86-specific feature I want to steal a flag
> for. Maybe we should just define another VM_ARCH bit.
>

So you think of something like:

#if defined(CONFIG_S390)
# define VM_NOZEROPAGE	VM_ARCH_1
#endif

#ifndef VM_NOZEROPAGE
# define VM_NOZEROPAGE	VM_NONE
#endif

right?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Martin Schwidefsky Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag Date: Tue, 21 Oct 2014 08:11:31 +0200 Message-ID: <20141021081131.641c6104@mschwide> References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com> <54419265.9000000@intel.com> <20141018164928.2341415f@BR9TG4T3.de.ibm.com> <54429521.80402@intel.com> <5445511D.1090603@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: Dave Hansen , Dominik Dingel , Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel , Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@d To: Paolo Bonzini Return-path: In-Reply-To: <5445511D.1090603@redhat.com> Sender: linux-kernel-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On Mon, 20 Oct 2014 20:14:53 +0200 Paolo Bonzini wrote: > On 10/18/2014 06:28 PM, Dave Hansen wrote: > > > Currently it is an all or nothing thing, but for a future change we might want to just > > > tag the guest memory instead of the complete user address space. > > > > I think it's a bad idea to reserve a flag for potential future use. If > > you_need_ it in the future, let's have the discussion then. For now, I > > think it should probably just be stored in the mm somewhere. > > I agree with Dave (I thought I disagreed, but I changed my mind while > writing down my thoughts). Just define mm_forbids_zeropage in > arch/s390/include/asm, and make it return mm->context.use_skey---with a > comment explaining how this is only for processes that use KVM, and then > only for guests that use storage keys. 
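
The mm-wide helper described in the quoted paragraph above could look
roughly like the following userspace sketch. It is not the helper that was
eventually written: the toy_* types and names are hypothetical stand-ins,
and only the idea of returning mm->context.use_skey and checking it on a
read fault is taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the arch-specific part of an mm. */
struct toy_mm_context {
	bool use_skey;	/* set once the guest has used a storage key instruction */
};

struct toy_mm {
	struct toy_mm_context context;
};

enum toy_backing {
	TOY_ZERO_PAGE,	/* read fault backed by the shared zero page */
	TOY_FRESH_PAGE,	/* fault backed by a private anonymous page */
};

/*
 * Only processes that run KVM guests, and then only guests that use
 * storage keys, ever set use_skey. For everything else the zero page
 * stays allowed.
 */
static inline bool toy_mm_forbids_zeropage(const struct toy_mm *mm)
{
	assert(mm);
	return mm->context.use_skey;
}

/* Models the decision taken for a fault on a not yet present anonymous page. */
enum toy_backing toy_anonymous_fault(const struct toy_mm *mm, bool write)
{
	if (!write && !toy_mm_forbids_zeropage(mm))
		return TOY_ZERO_PAGE;
	return TOY_FRESH_PAGE;
}
```

With this shape the generic fault path needs a single extra check, and a
later move to a per-vma predicate would only change the helper it calls.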
The mm_forbids_zeropage() sure will work for now, but I think a vma flag is the better solution. This is analog to VM_MERGEABLE or VM_NOHUGEPAGE, the best solution would be to only mark those vmas that are mapped to the guest. That we have not found a way to do that yet in a sensible way does not change the fact that "no-zero-page" is a per-vma property, no? But if you insist we go with the mm_forbids_zeropage() until we find a clever way to distinguish the guest vmas from the qemu ones. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Paolo Bonzini Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag Date: Tue, 21 Oct 2014 10:11:43 +0200 Message-ID: <5446153F.6030407@redhat.com> References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com> <54419265.9000000@intel.com> <20141018164928.2341415f@BR9TG4T3.de.ibm.com> <54429521.80402@intel.com> <5445511D.1090603@redhat.com> <20141021081131.641c6104@mschwide> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Cc: Dave Hansen , Dominik Dingel , Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel , Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" Return-path: In-Reply-To: <20141021081131.641c6104@mschwide> Sender: owner-linux-mm@kvack.org List-Id: kvm.vger.kernel.org On 10/21/2014 08:11 AM, Martin Schwidefsky wrote: >> I agree with Dave (I thought I disagreed, but I changed my mind while >> writing down my thoughts). 
Just define mm_forbids_zeropage in >> arch/s390/include/asm, and make it return mm->context.use_skey---with a >> comment explaining how this is only for processes that use KVM, and then >> only for guests that use storage keys. > > The mm_forbids_zeropage() sure will work for now, but I think a vma flag > is the better solution. This is analog to VM_MERGEABLE or VM_NOHUGEPAGE, > the best solution would be to only mark those vmas that are mapped to > the guest. That we have not found a way to do that yet in a sensible way > does not change the fact that "no-zero-page" is a per-vma property, no? I agree it should be per-VMA. However, right now the code is complicated unnecessarily by making it a per-VMA flag. Also, setting the flag per VMA should probably be done in kvm_arch_prepare_memory_region together with some kind of storage key notifier. This is not very much like Dominik's patch. All in all, mm_forbids_zeropage() provides a non-intrusive and non-controversial way to fix the bug. Later on, switching to vma_forbids_zeropage() will be trivial as far as mm/ code is concerned. > But if you insist we go with the mm_forbids_zeropage() until we find a > clever way to distinguish the guest vmas from the qemu ones. Yeah, I think it is simpler for now. Paolo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dominik Dingel Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag Date: Tue, 21 Oct 2014 13:20:25 +0200 Message-ID: <20141021132025.60dd3390@BR9TG4T3.de.ibm.com> References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com> <54419265.9000000@intel.com> <20141018164928.2341415f@BR9TG4T3.de.ibm.com> <54429521.80402@intel.com> <5445511D.1090603@redhat.com> <20141021081131.641c6104@mschwide> <5446153F.6030407@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: Martin Schwidefsky , Dave Hansen , Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel , Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A To: Paolo Bonzini Return-path: In-Reply-To: <5446153F.6030407@redhat.com> Sender: owner-linux-mm@kvack.org List-Id: kvm.vger.kernel.org On Tue, 21 Oct 2014 10:11:43 +0200 Paolo Bonzini wrote: > > > On 10/21/2014 08:11 AM, Martin Schwidefsky wrote: > >> I agree with Dave (I thought I disagreed, but I changed my mind while > >> writing down my thoughts). Just define mm_forbids_zeropage in > >> arch/s390/include/asm, and make it return mm->context.use_skey---with a > >> comment explaining how this is only for processes that use KVM, and then > >> only for guests that use storage keys. > > > > The mm_forbids_zeropage() sure will work for now, but I think a vma flag > > is the better solution. This is analog to VM_MERGEABLE or VM_NOHUGEPAGE, > > the best solution would be to only mark those vmas that are mapped to > > the guest. That we have not found a way to do that yet in a sensible way > > does not change the fact that "no-zero-page" is a per-vma property, no? 
> > I agree it should be per-VMA. However, right now the code is > complicated unnecessarily by making it a per-VMA flag. Also, setting > the flag per VMA should probably be done in > kvm_arch_prepare_memory_region together with some kind of storage key > notifier. This is not very much like Dominik's patch. All in all, > mm_forbids_zeropage() provides a non-intrusive and non-controversial way > to fix the bug. Later on, switching to vma_forbids_zeropage() will be > trivial as far as mm/ code is concerned. > Thank you for all the feedback, will cook up a new version. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wg0-f46.google.com (mail-wg0-f46.google.com [74.125.82.46]) by kanga.kvack.org (Postfix) with ESMTP id 9471A6B0072 for ; Fri, 17 Oct 2014 10:10:13 -0400 (EDT) Received: by mail-wg0-f46.google.com with SMTP id l18so1000464wgh.29 for ; Fri, 17 Oct 2014 07:10:13 -0700 (PDT) Received: from e06smtp14.uk.ibm.com (e06smtp14.uk.ibm.com. [195.75.94.110]) by mx.google.com with ESMTPS id bn2si5974869wib.24.2014.10.17.07.10.10 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 17 Oct 2014 07:10:10 -0700 (PDT) Received: from /spool/local by e06smtp14.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! 
Violators will be prosecuted for from ; Fri, 17 Oct 2014 15:10:09 +0100 Received: from b06cxnps4075.portsmouth.uk.ibm.com (d06relay12.portsmouth.uk.ibm.com [9.149.109.197]) by d06dlp01.portsmouth.uk.ibm.com (Postfix) with ESMTP id AC4CB17D805F for ; Fri, 17 Oct 2014 15:12:23 +0100 (BST) Received: from d06av06.portsmouth.uk.ibm.com (d06av06.portsmouth.uk.ibm.com [9.149.37.217]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id s9HEA6Jd64946426 for ; Fri, 17 Oct 2014 14:10:06 GMT Received: from d06av06.portsmouth.uk.ibm.com (localhost [127.0.0.1]) by d06av06.portsmouth.uk.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s9H97pkh022176 for ; Fri, 17 Oct 2014 05:07:52 -0400 From: Dominik Dingel Subject: [PATCH 1/4] s390/mm: recfactor global pgste updates Date: Fri, 17 Oct 2014 16:09:47 +0200 Message-Id: <1413554990-48512-2-git-send-email-dingel@linux.vnet.ibm.com> In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel Cc: Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini , Peter Zijlstra , Sasha Levin , Dominik Dingel Replace the s390 specific page table walker for the pgste updates with a call to the common code walk_page_range function. There are now two pte modification functions, one for the reset of the CMMA state and another one for the initialization of the storage keys. 
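
The shape of this refactoring, one generic walker plus one small callback
per modification, can be sketched in userspace as follows. This is an
illustration only: the flat table, the toy_* names and the two mask values
are assumptions standing in for the real page tables, struct mm_walk and
the PGSTE bit definitions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_USAGE_MASK 0x03u	/* stands in for _PGSTE_GPS_USAGE_MASK */
#define TOY_KEY_MASK   0xf0u	/* stands in for the storage key bits */

/* Hypothetical analogue of struct mm_walk with only a pte_entry hook. */
struct toy_walk {
	int (*pte_entry)(unsigned int *pgste, size_t index,
			 struct toy_walk *walk);
};

/* Generic walker: visit every entry, stop early if a callback fails. */
int toy_walk_range(unsigned int *table, size_t start, size_t end,
		   struct toy_walk *walk)
{
	assert(walk && walk->pte_entry);
	for (size_t i = start; i < end; i++) {
		int rc = walk->pte_entry(&table[i], i, walk);

		if (rc)
			return rc;
	}
	return 0;
}

/* Callback 1: reset the CMMA usage state of one entry. */
static int toy_reset_cmma_entry(unsigned int *pgste, size_t index,
				struct toy_walk *walk)
{
	(void)index;
	(void)walk;
	*pgste &= ~TOY_USAGE_MASK;
	return 0;
}

/* Callback 2: clear the storage key bits of one entry. */
static int toy_enable_skey_entry(unsigned int *pgste, size_t index,
				 struct toy_walk *walk)
{
	(void)index;
	(void)walk;
	*pgste &= ~TOY_KEY_MASK;
	return 0;
}

int toy_reset_cmma(unsigned int *table, size_t n)
{
	struct toy_walk walk = { .pte_entry = toy_reset_cmma_entry };

	return toy_walk_range(table, 0, n, &walk);
}

int toy_enable_skey(unsigned int *table, size_t n)
{
	struct toy_walk walk = { .pte_entry = toy_enable_skey_entry };

	return toy_walk_range(table, 0, n, &walk);
}
```

Each operation is reduced to a per-entry function, so the hand-written
pgd/pud/pmd/pte loops are no longer duplicated per use case.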
Signed-off-by: Dominik Dingel Signed-off-by: Martin Schwidefsky --- arch/s390/include/asm/pgalloc.h | 2 - arch/s390/include/asm/pgtable.h | 1 + arch/s390/kvm/kvm-s390.c | 2 +- arch/s390/mm/pgtable.c | 153 ++++++++++++++-------------------------- 4 files changed, 56 insertions(+), 102 deletions(-) diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h index 9e18a61..120e126 100644 --- a/arch/s390/include/asm/pgalloc.h +++ b/arch/s390/include/asm/pgalloc.h @@ -22,8 +22,6 @@ unsigned long *page_table_alloc(struct mm_struct *, unsigned long); void page_table_free(struct mm_struct *, unsigned long *); void page_table_free_rcu(struct mmu_gather *, unsigned long *); -void page_table_reset_pgste(struct mm_struct *, unsigned long, unsigned long, - bool init_skey); int set_guest_storage_key(struct mm_struct *mm, unsigned long addr, unsigned long key, bool nq); diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 5efb2fe..1e991f6a 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1750,6 +1750,7 @@ extern int vmem_add_mapping(unsigned long start, unsigned long size); extern int vmem_remove_mapping(unsigned long start, unsigned long size); extern int s390_enable_sie(void); extern void s390_enable_skey(void); +extern void s390_reset_cmma(struct mm_struct *mm); /* * No page table caches to initialise diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 81b0e11..7a33c11 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -281,7 +281,7 @@ static int kvm_s390_mem_control(struct kvm *kvm, struct kvm_device_attr *attr) case KVM_S390_VM_MEM_CLR_CMMA: mutex_lock(&kvm->lock); idx = srcu_read_lock(&kvm->srcu); - page_table_reset_pgste(kvm->arch.gmap->mm, 0, TASK_SIZE, false); + s390_reset_cmma(kvm->arch.gmap->mm); srcu_read_unlock(&kvm->srcu, idx); mutex_unlock(&kvm->lock); ret = 0; diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index 
5404a62..ab55ba8 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -885,99 +885,6 @@ static inline void page_table_free_pgste(unsigned long *table) __free_page(page); } -static inline unsigned long page_table_reset_pte(struct mm_struct *mm, pmd_t *pmd, - unsigned long addr, unsigned long end, bool init_skey) -{ - pte_t *start_pte, *pte; - spinlock_t *ptl; - pgste_t pgste; - - start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl); - pte = start_pte; - do { - pgste = pgste_get_lock(pte); - pgste_val(pgste) &= ~_PGSTE_GPS_USAGE_MASK; - if (init_skey) { - unsigned long address; - - pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT | - PGSTE_GR_BIT | PGSTE_GC_BIT); - - /* skip invalid and not writable pages */ - if (pte_val(*pte) & _PAGE_INVALID || - !(pte_val(*pte) & _PAGE_WRITE)) { - pgste_set_unlock(pte, pgste); - continue; - } - - address = pte_val(*pte) & PAGE_MASK; - page_set_storage_key(address, PAGE_DEFAULT_KEY, 1); - } - pgste_set_unlock(pte, pgste); - } while (pte++, addr += PAGE_SIZE, addr != end); - pte_unmap_unlock(start_pte, ptl); - - return addr; -} - -static inline unsigned long page_table_reset_pmd(struct mm_struct *mm, pud_t *pud, - unsigned long addr, unsigned long end, bool init_skey) -{ - unsigned long next; - pmd_t *pmd; - - pmd = pmd_offset(pud, addr); - do { - next = pmd_addr_end(addr, end); - if (pmd_none_or_clear_bad(pmd)) - continue; - next = page_table_reset_pte(mm, pmd, addr, next, init_skey); - } while (pmd++, addr = next, addr != end); - - return addr; -} - -static inline unsigned long page_table_reset_pud(struct mm_struct *mm, pgd_t *pgd, - unsigned long addr, unsigned long end, bool init_skey) -{ - unsigned long next; - pud_t *pud; - - pud = pud_offset(pgd, addr); - do { - next = pud_addr_end(addr, end); - if (pud_none_or_clear_bad(pud)) - continue; - next = page_table_reset_pmd(mm, pud, addr, next, init_skey); - } while (pud++, addr = next, addr != end); - - return addr; -} - -void page_table_reset_pgste(struct 
mm_struct *mm, unsigned long start, - unsigned long end, bool init_skey) -{ - unsigned long addr, next; - pgd_t *pgd; - - down_write(&mm->mmap_sem); - if (init_skey && mm_use_skey(mm)) - goto out_up; - addr = start; - pgd = pgd_offset(mm, addr); - do { - next = pgd_addr_end(addr, end); - if (pgd_none_or_clear_bad(pgd)) - continue; - next = page_table_reset_pud(mm, pgd, addr, next, init_skey); - } while (pgd++, addr = next, addr != end); - if (init_skey) - current->mm->context.use_skey = 1; -out_up: - up_write(&mm->mmap_sem); -} -EXPORT_SYMBOL(page_table_reset_pgste); - int set_guest_storage_key(struct mm_struct *mm, unsigned long addr, unsigned long key, bool nq) { @@ -1044,11 +951,6 @@ static inline unsigned long *page_table_alloc_pgste(struct mm_struct *mm, return NULL; } -void page_table_reset_pgste(struct mm_struct *mm, unsigned long start, - unsigned long end, bool init_skey) -{ -} - static inline void page_table_free_pgste(unsigned long *table) { } @@ -1400,13 +1302,66 @@ EXPORT_SYMBOL_GPL(s390_enable_sie); * Enable storage key handling from now on and initialize the storage * keys with the default key. 
*/ +static int __s390_enable_skey(pte_t *pte, unsigned long addr, + unsigned long next, struct mm_walk *walk) +{ + unsigned long ptev; + pgste_t pgste; + + pgste = pgste_get_lock(pte); + /* Clear storage key */ + pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT | + PGSTE_GR_BIT | PGSTE_GC_BIT); + ptev = pte_val(*pte); + if (!(ptev & _PAGE_INVALID) && (ptev & _PAGE_WRITE)) + page_set_storage_key(ptev & PAGE_MASK, PAGE_DEFAULT_KEY, 1); + pgste_set_unlock(pte, pgste); + return 0; +} + void s390_enable_skey(void) { - page_table_reset_pgste(current->mm, 0, TASK_SIZE, true); + struct mm_walk walk = { .pte_entry = __s390_enable_skey }; + struct mm_struct *mm = current->mm; + + down_write(&mm->mmap_sem); + if (mm_use_skey(mm)) + goto out_up; + walk.mm = mm; + walk_page_range(0, TASK_SIZE, &walk); + mm->context.use_skey = 1; + +out_up: + up_write(&mm->mmap_sem); } EXPORT_SYMBOL_GPL(s390_enable_skey); /* + * Reset CMMA state, make all pages stable again. + */ +static int __s390_reset_cmma(pte_t *pte, unsigned long addr, + unsigned long next, struct mm_walk *walk) +{ + pgste_t pgste; + + pgste = pgste_get_lock(pte); + pgste_val(pgste) &= ~_PGSTE_GPS_USAGE_MASK; + pgste_set_unlock(pte, pgste); + return 0; +} + +void s390_reset_cmma(struct mm_struct *mm) +{ + struct mm_walk walk = { .pte_entry = __s390_reset_cmma }; + + down_write(&mm->mmap_sem); + walk.mm = mm; + walk_page_range(0, TASK_SIZE, &walk); + up_write(&mm->mmap_sem); +} +EXPORT_SYMBOL_GPL(s390_reset_cmma); + +/* * Test and reset if a guest page is dirty */ bool gmap_test_and_clear_dirty(unsigned long address, struct gmap *gmap) -- 1.8.5.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f180.google.com (mail-wi0-f180.google.com [209.85.212.180]) by kanga.kvack.org (Postfix) with ESMTP id 616716B0073 for ; Fri, 17 Oct 2014 10:10:14 -0400 (EDT) Received: by mail-wi0-f180.google.com with SMTP id em10so1378965wid.1 for ; Fri, 17 Oct 2014 07:10:13 -0700 (PDT) Received: from e06smtp14.uk.ibm.com (e06smtp14.uk.ibm.com. [195.75.94.110]) by mx.google.com with ESMTPS id l1si1673645wjb.38.2014.10.17.07.10.10 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 17 Oct 2014 07:10:10 -0700 (PDT) Received: from /spool/local by e06smtp14.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 17 Oct 2014 15:10:09 +0100 Received: from b06cxnps4075.portsmouth.uk.ibm.com (d06relay12.portsmouth.uk.ibm.com [9.149.109.197]) by d06dlp01.portsmouth.uk.ibm.com (Postfix) with ESMTP id C488717D8062 for ; Fri, 17 Oct 2014 15:12:23 +0100 (BST) Received: from d06av06.portsmouth.uk.ibm.com (d06av06.portsmouth.uk.ibm.com [9.149.37.217]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id s9HEA6a95570938 for ; Fri, 17 Oct 2014 14:10:06 GMT Received: from d06av06.portsmouth.uk.ibm.com (localhost [127.0.0.1]) by d06av06.portsmouth.uk.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s9H97p0p022181 for ; Fri, 17 Oct 2014 05:07:52 -0400 From: Dominik Dingel Subject: [PATCH 4/4] s390/mm: disable KSM for storage key enabled pages Date: Fri, 17 Oct 2014 16:09:50 +0200 Message-Id: <1413554990-48512-5-git-send-email-dingel@linux.vnet.ibm.com> In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel Cc: Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian 
Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini , Peter Zijlstra , Sasha Levin , Dominik Dingel When storage keys are enabled unmerge already merged pages and prevent new pages from being merged. Signed-off-by: Dominik Dingel Acked-by: Christian Borntraeger Signed-off-by: Martin Schwidefsky --- arch/s390/include/asm/pgtable.h | 2 +- arch/s390/kvm/priv.c | 17 ++++++++++++----- arch/s390/mm/pgtable.c | 15 +++++++++++++-- 3 files changed, 26 insertions(+), 8 deletions(-) diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 1e991f6a..a5362e4 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1749,7 +1749,7 @@ static inline pte_t mk_swap_pte(unsigned long type, unsigned long offset) extern int vmem_add_mapping(unsigned long start, unsigned long size); extern int vmem_remove_mapping(unsigned long start, unsigned long size); extern int s390_enable_sie(void); -extern void s390_enable_skey(void); +extern int s390_enable_skey(void); extern void s390_reset_cmma(struct mm_struct *mm); /* diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c index f89c1cd..e0967fd 100644 --- a/arch/s390/kvm/priv.c +++ b/arch/s390/kvm/priv.c @@ -156,21 +156,25 @@ static int handle_store_cpu_address(struct kvm_vcpu *vcpu) return 0; } -static void __skey_check_enable(struct kvm_vcpu *vcpu) +static int __skey_check_enable(struct kvm_vcpu *vcpu) { + int rc = 0; if (!(vcpu->arch.sie_block->ictl & (ICTL_ISKE | ICTL_SSKE | ICTL_RRBE))) - return; + return rc; - s390_enable_skey(); + rc = s390_enable_skey(); trace_kvm_s390_skey_related_inst(vcpu); vcpu->arch.sie_block->ictl &= ~(ICTL_ISKE | ICTL_SSKE | ICTL_RRBE); + return rc; } static int handle_skey(struct kvm_vcpu 
*vcpu) { - __skey_check_enable(vcpu); + int rc = __skey_check_enable(vcpu); + if (rc) + return rc; vcpu->stat.instruction_storage_key++; if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE) @@ -692,7 +696,10 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) } if (vcpu->run->s.regs.gprs[reg1] & PFMF_SK) { - __skey_check_enable(vcpu); + int rc = __skey_check_enable(vcpu); + + if (rc) + return rc; if (set_guest_storage_key(current->mm, useraddr, vcpu->run->s.regs.gprs[reg1] & PFMF_KEY, vcpu->run->s.regs.gprs[reg1] & PFMF_NQ)) diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index 6321692..b3311c1 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -18,6 +18,8 @@ #include #include #include +#include +#include #include #include @@ -1328,18 +1330,26 @@ static int __s390_enable_skey(pte_t *pte, unsigned long addr, return 0; } -void s390_enable_skey(void) +int s390_enable_skey(void) { struct mm_walk walk = { .pte_entry = __s390_enable_skey }; struct mm_struct *mm = current->mm; struct vm_area_struct *vma; + int rc = 0; down_write(&mm->mmap_sem); if (mm_use_skey(mm)) goto out_up; - for (vma = mm->mmap; vma; vma = vma->vm_next) + for (vma = mm->mmap; vma; vma = vma->vm_next) { + if (ksm_madvise(vma, vma->vm_start, vma->vm_end, + MADV_UNMERGEABLE, &vma->vm_flags)) { + rc = -ENOMEM; + goto out_up; + } vma->vm_flags |= VM_NOZEROPAGE; + } + mm->def_flags &= ~VM_MERGEABLE; mm->def_flags |= VM_NOZEROPAGE; walk.mm = mm; @@ -1348,6 +1358,7 @@ void s390_enable_skey(void) out_up: up_write(&mm->mmap_sem); + return rc; } EXPORT_SYMBOL_GPL(s390_enable_skey); -- 1.8.5.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by kanga.kvack.org (Postfix) with ESMTP id CA2F76B0070 for ; Fri, 17 Oct 2014 10:10:11 -0400 (EDT) Received: by mail-wg0-f50.google.com with SMTP id a1so1024765wgh.9 for ; Fri, 17 Oct 2014 07:10:11 -0700 (PDT) Received: from e06smtp10.uk.ibm.com (e06smtp10.uk.ibm.com. [195.75.94.106]) by mx.google.com with ESMTPS id dn5si1561346wjb.163.2014.10.17.07.10.09 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 17 Oct 2014 07:10:09 -0700 (PDT) Received: from /spool/local by e06smtp10.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only!
Violators will be prosecuted for from ; Fri, 17 Oct 2014 15:10:08 +0100 Received: from b06cxnps3074.portsmouth.uk.ibm.com (d06relay09.portsmouth.uk.ibm.com [9.149.109.194]) by d06dlp02.portsmouth.uk.ibm.com (Postfix) with ESMTP id 311462190056 for ; Fri, 17 Oct 2014 15:09:44 +0100 (BST) Received: from d06av12.portsmouth.uk.ibm.com (d06av12.portsmouth.uk.ibm.com [9.149.37.247]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id s9HEA7S814352880 for ; Fri, 17 Oct 2014 14:10:07 GMT Received: from d06av12.portsmouth.uk.ibm.com (localhost [127.0.0.1]) by d06av12.portsmouth.uk.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s9HEA4QQ026756 for ; Fri, 17 Oct 2014 08:10:07 -0600 From: Dominik Dingel Subject: [PATCH 3/4] s390/mm: prevent and break zero page mappings in case of storage keys Date: Fri, 17 Oct 2014 16:09:49 +0200 Message-Id: <1413554990-48512-4-git-send-email-dingel@linux.vnet.ibm.com> In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel Cc: Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini , Peter Zijlstra , Sasha Levin , Dominik Dingel As soon as storage keys are enabled we need to work around of zero page mappings to prevent inconsistencies between storage keys and pgste. 

Otherwise the following data corruption could happen:
1) guest enables storage key
2) guest sets storage key for not mapped page X
   -> change goes to PGSTE
3) guest reads from page X
   -> as X was not dirty before, the page will be zero page backed,
      storage key from PGSTE for X will go to storage key for zero page
4) guest sets storage key for not mapped page Y (same logic as above)
5) guest reads from page Y
   -> as Y was not dirty before, the page will be zero page backed,
      storage key from PGSTE for Y will go to storage key for zero page,
      overwriting storage key for X

While holding the mmap sem, we are safe from changes on entries we
already fixed. As sske and host large pages are also mutually exclusive
we do not even need to retry the fixup_user_fault.

Signed-off-by: Dominik Dingel
Acked-by: Christian Borntraeger
Signed-off-by: Martin Schwidefsky
---
 arch/s390/Kconfig      |  3 +++
 arch/s390/mm/pgtable.c | 15 +++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 05c78bb..4e04e63 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -1,6 +1,9 @@
 config MMU
 	def_bool y
 
+config NOZEROPAGE
+	def_bool y
+
 config ZONE_DMA
 	def_bool y
 
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index ab55ba8..6321692 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -1309,6 +1309,15 @@ static int __s390_enable_skey(pte_t *pte, unsigned long addr,
 	pgste_t pgste;
 
 	pgste = pgste_get_lock(pte);
+	/*
+	 * Remove all zero page mappings,
+	 * after establishing a policy to forbid zero page mappings
+	 * following faults for that page will get fresh anonymous pages
+	 */
+	if (is_zero_pfn(pte_pfn(*pte))) {
+		ptep_flush_direct(walk->mm, addr, pte);
+		pte_val(*pte) = _PAGE_INVALID;
+	}
 	/* Clear storage key */
 	pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
 			      PGSTE_GR_BIT | PGSTE_GC_BIT);
@@ -1323,10 +1332,16 @@ void s390_enable_skey(void)
 {
 	struct mm_walk walk = { .pte_entry = __s390_enable_skey };
 	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
 
 	down_write(&mm->mmap_sem);
 	if (mm_use_skey(mm))
 		goto out_up;
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next)
+		vma->vm_flags |= VM_NOZEROPAGE;
+	mm->def_flags |= VM_NOZEROPAGE;
+
 	walk.mm = mm;
 	walk_page_range(0, TASK_SIZE, &walk);
 	mm->context.use_skey = 1;
-- 
1.8.5.5

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dominik Dingel
Subject: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
Date: Fri, 17 Oct 2014 16:09:48 +0200
Message-Id: <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com>
In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
To: Andrew Morton, linux-mm@kvack.org, Mel Gorman, Michal Hocko, Rik van Riel
Cc: Andrea Arcangeli, Andy Lutomirski, "Aneesh Kumar K.V", Bob Liu,
 Christian Borntraeger, Cornelia Huck, Gleb Natapov, Heiko Carstens,
 "H. Peter Anvin", Hugh Dickins, Ingo Molnar, Jianyu Zhan, Johannes Weiner,
 "Kirill A. Shutemov", Konstantin Weitz, kvm@vger.kernel.org,
 linux390@de.ibm.com, linux-kernel@vger.kernel.org,
 linux-s390@vger.kernel.org, Martin Schwidefsky, Paolo Bonzini,
 Peter Zijlstra, Sasha Levin, Dominik Dingel

Add a new vma flag to allow an architecture to disable the backing
of non-present, anonymous pages with the read-only empty zero page.

Signed-off-by: Dominik Dingel
Acked-by: Christian Borntraeger
Signed-off-by: Martin Schwidefsky
---
 include/linux/mm.h | 13 +++++++++++--
 mm/huge_memory.c   |  2 +-
 mm/memory.c        |  2 +-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cd33ae2..8f09c91 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -113,7 +113,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_GROWSDOWN	0x00000100	/* general info on the segment */
 #define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
 #define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */
-
+#define VM_NOZEROPAGE	0x00001000	/* forbid new zero page mappings */
 #define VM_LOCKED	0x00002000
 #define VM_IO           0x00004000	/* Memory mapped I/O or similar */
 
@@ -179,7 +179,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
 
 /* This mask defines which mm->def_flags a process can inherit its parent */
-#define VM_INIT_DEF_MASK	VM_NOHUGEPAGE
+#define VM_INIT_DEF_MASK	(VM_NOHUGEPAGE | VM_NOZEROPAGE)
 
 /*
  * mapping from the currently active vm_flags protection bits (the
@@ -1293,6 +1293,15 @@ static inline int stack_guard_page_end(struct vm_area_struct *vma,
 		!vma_growsup(vma->vm_next, addr);
 }
 
+static inline int vma_forbids_zeropage(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_NOZEROPAGE
+	return vma->vm_flags & VM_NOZEROPAGE;
+#else
+	return 0;
+#endif
+}
+
 extern struct task_struct *task_of_stack(struct task_struct *task,
 				struct vm_area_struct *vma, bool in_group);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index de98415..c271265 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -805,7 +805,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		return VM_FAULT_OOM;
 	if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
 		return VM_FAULT_OOM;
-	if (!(flags & FAULT_FLAG_WRITE) &&
+	if (!(flags & FAULT_FLAG_WRITE) && !vma_forbids_zeropage(vma) &&
 			transparent_hugepage_use_zero_page()) {
 		spinlock_t *ptl;
 		pgtable_t pgtable;
diff --git a/mm/memory.c b/mm/memory.c
index 64f82aa..1859b2b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2640,7 +2640,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		return VM_FAULT_SIGBUS;
 
 	/* Use the zero-page for reads */
-	if (!(flags & FAULT_FLAG_WRITE)) {
+	if (!(flags & FAULT_FLAG_WRITE) && !vma_forbids_zeropage(vma)) {
 		entry = pte_mkspecial(pfn_pte(my_zero_pfn(address),
 						vma->vm_page_prot));
 		page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-- 
1.8.5.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 18 Oct 2014 16:49:28 +0200
From: Dominik Dingel
Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
Message-ID: <20141018164928.2341415f@BR9TG4T3.de.ibm.com>
In-Reply-To: <54419265.9000000@intel.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
 <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com>
 <54419265.9000000@intel.com>
To: Dave Hansen
Cc: Andrew Morton, linux-mm@kvack.org, Mel Gorman, Michal Hocko,
 Rik van Riel, Andrea Arcangeli, Andy Lutomirski, "Aneesh Kumar K.V",
 Bob Liu, Christian Borntraeger, Cornelia Huck, Gleb Natapov,
 Heiko Carstens, "H. Peter Anvin", Hugh Dickins, Ingo Molnar, Jianyu Zhan,
 Johannes Weiner, "Kirill A. Shutemov", Konstantin Weitz,
 kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org,
 linux-s390@vger.kernel.org, Martin Schwidefsky, Paolo Bonzini,
 Peter Zijlstra, Sasha Levin

On Fri, 17 Oct 2014 15:04:21 -0700
Dave Hansen wrote:

> Is there ever a time where the VMAs under an mm have mixed VM_NOZEROPAGE
> status? Reading the patches, it _looks_ like it might be an all or
> nothing thing.

Currently it is an all or nothing thing, but for a future change we might
want to just tag the guest memory instead of the complete user address
space.

> Full disclosure: I've got an x86-specific feature I want to steal a flag
> for. Maybe we should just define another VM_ARCH bit.
>

So you think of something like:

#if defined(CONFIG_S390)
# define VM_NOZEROPAGE VM_ARCH_1
#endif

#ifndef VM_NOZEROPAGE
# define VM_NOZEROPAGE VM_NONE
#endif

right?

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <54429521.80402@intel.com>
Date: Sat, 18 Oct 2014 09:28:17 -0700
From: Dave Hansen
Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
 <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com>
 <54419265.9000000@intel.com>
 <20141018164928.2341415f@BR9TG4T3.de.ibm.com>
In-Reply-To: <20141018164928.2341415f@BR9TG4T3.de.ibm.com>
To: Dominik Dingel
Cc: Andrew Morton, linux-mm@kvack.org, Mel Gorman, Michal Hocko,
 Rik van Riel, Andrea Arcangeli, Andy Lutomirski, "Aneesh Kumar K.V",
 Bob Liu, Christian Borntraeger, Cornelia Huck, Gleb Natapov,
 Heiko Carstens, "H. Peter Anvin", Hugh Dickins, Ingo Molnar, Jianyu Zhan,
 Johannes Weiner, "Kirill A. Shutemov", Konstantin Weitz,
 kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org,
 linux-s390@vger.kernel.org, Martin Schwidefsky, Paolo Bonzini,
 Peter Zijlstra, Sasha Levin

On 10/18/2014 07:49 AM, Dominik Dingel wrote:
> On Fri, 17 Oct 2014 15:04:21 -0700
> Dave Hansen wrote:
>> Is there ever a time where the VMAs under an mm have mixed VM_NOZEROPAGE
>> status? Reading the patches, it _looks_ like it might be an all or
>> nothing thing.
>
> Currently it is an all or nothing thing, but for a future change we might want to just
> tag the guest memory instead of the complete user address space.

I think it's a bad idea to reserve a flag for potential future use. If
you _need_ it in the future, let's have the discussion then. For now, I
think it should probably just be stored in the mm somewhere.

>> Full disclosure: I've got an x86-specific feature I want to steal a flag
>> for. Maybe we should just define another VM_ARCH bit.
>>
>
> So you think of something like:
>
> #if defined(CONFIG_S390)
> # define VM_NOZEROPAGE VM_ARCH_1
> #endif
>
> #ifndef VM_NOZEROPAGE
> # define VM_NOZEROPAGE VM_NONE
> #endif
>
> right?

Yeah, something like that.

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5445511D.1090603@redhat.com>
Date: Mon, 20 Oct 2014 20:14:53 +0200
From: Paolo Bonzini
Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
 <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com>
 <54419265.9000000@intel.com>
 <20141018164928.2341415f@BR9TG4T3.de.ibm.com>
 <54429521.80402@intel.com>
In-Reply-To: <54429521.80402@intel.com>
To: Dave Hansen, Dominik Dingel
Cc: Andrew Morton, linux-mm@kvack.org, Mel Gorman, Michal Hocko,
 Rik van Riel, Andrea Arcangeli, Andy Lutomirski, "Aneesh Kumar K.V",
 Bob Liu, Christian Borntraeger, Cornelia Huck, Gleb Natapov,
 Heiko Carstens, "H. Peter Anvin", Hugh Dickins, Ingo Molnar, Jianyu Zhan,
 Johannes Weiner, "Kirill A. Shutemov", Konstantin Weitz,
 kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org,
 linux-s390@vger.kernel.org, Martin Schwidefsky, Peter Zijlstra,
 Sasha Levin

On 10/18/2014 06:28 PM, Dave Hansen wrote:
> > Currently it is an all or nothing thing, but for a future change we might want to just
> > tag the guest memory instead of the complete user address space.
>
> I think it's a bad idea to reserve a flag for potential future use. If
> you _need_ it in the future, let's have the discussion then. For now, I
> think it should probably just be stored in the mm somewhere.

I agree with Dave (I thought I disagreed, but I changed my mind while
writing down my thoughts).
Just define mm_forbids_zeropage in
arch/s390/include/asm, and make it return mm->context.use_skey---with a
comment explaining how this is only for processes that use KVM, and then
only for guests that use storage keys.

Paolo (who was just taught what storage keys really are)

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 21 Oct 2014 08:11:31 +0200
From: Martin Schwidefsky
Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
Message-ID: <20141021081131.641c6104@mschwide>
In-Reply-To: <5445511D.1090603@redhat.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
 <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com>
 <54419265.9000000@intel.com>
 <20141018164928.2341415f@BR9TG4T3.de.ibm.com>
 <54429521.80402@intel.com>
 <5445511D.1090603@redhat.com>
To: Paolo Bonzini
Cc: Dave Hansen, Dominik Dingel, Andrew Morton, linux-mm@kvack.org,
 Mel Gorman, Michal Hocko, Rik van Riel, Andrea Arcangeli,
 Andy Lutomirski, "Aneesh Kumar K.V", Bob Liu, Christian Borntraeger,
 Cornelia Huck, Gleb Natapov, Heiko Carstens, "H. Peter Anvin",
 Hugh Dickins, Ingo Molnar, Jianyu Zhan, Johannes Weiner,
 "Kirill A. Shutemov", Konstantin Weitz, kvm@vger.kernel.org,
 linux390@de.ibm.com, linux-kernel@vger.kernel.org,
 linux-s390@vger.kernel.org, Peter Zijlstra, Sasha Levin

On Mon, 20 Oct 2014 20:14:53 +0200
Paolo Bonzini wrote:

> On 10/18/2014 06:28 PM, Dave Hansen wrote:
> > > Currently it is an all or nothing thing, but for a future change we might want to just
> > > tag the guest memory instead of the complete user address space.
> >
> > I think it's a bad idea to reserve a flag for potential future use. If
> > you _need_ it in the future, let's have the discussion then. For now, I
> > think it should probably just be stored in the mm somewhere.
>
> I agree with Dave (I thought I disagreed, but I changed my mind while
> writing down my thoughts). Just define mm_forbids_zeropage in
> arch/s390/include/asm, and make it return mm->context.use_skey---with a
> comment explaining how this is only for processes that use KVM, and then
> only for guests that use storage keys.

The mm_forbids_zeropage() sure will work for now, but I think a vma flag
is the better solution. This is analogous to VM_MERGEABLE or
VM_NOHUGEPAGE; the best solution would be to only mark those vmas that
are mapped to the guest. That we have not found a way to do that yet in a
sensible way does not change the fact that "no-zero-page" is a per-vma
property, no?

But if you insist we go with the mm_forbids_zeropage() until we find a
clever way to distinguish the guest vmas from the qemu ones.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
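
The two predicates discussed above can be contrasted in a small user-space model. This is only a sketch under stated assumptions: the structures below are simplified, hypothetical stand-ins for the kernel's mm_struct and vm_area_struct, the CONFIG_NOZEROPAGE guard from patch 2/4 is left out, and read_fault_may_use_zero_page() is an illustrative helper that does not exist in the kernel. Only the VM_NOZEROPAGE value, the vma_forbids_zeropage() check and the use of mm->context.use_skey are taken from the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag value as proposed in patch 2/4. */
#define VM_NOZEROPAGE 0x00001000UL

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct mm_context {
	int use_skey;			/* set once the guest uses storage keys */
};

struct vm_area_struct;

struct mm_struct {
	struct vm_area_struct *mmap;	/* list of vmas (unused in this sketch) */
	struct mm_context context;
};

struct vm_area_struct {
	unsigned long vm_flags;
	struct mm_struct *vm_mm;
	struct vm_area_struct *vm_next;
};

/* Per-vma variant, following patch 2/4 (config guard omitted). */
static inline bool vma_forbids_zeropage(const struct vm_area_struct *vma)
{
	return (vma->vm_flags & VM_NOZEROPAGE) != 0;
}

/* Per-mm variant, following the suggestion made in the review. */
static inline bool mm_forbids_zeropage(const struct mm_struct *mm)
{
	return mm->context.use_skey != 0;
}

/*
 * Illustrative helper (not a kernel function): may a read fault in this
 * vma be backed by the shared zero page under the chosen policy?
 */
static inline bool read_fault_may_use_zero_page(const struct vm_area_struct *vma,
						bool per_vma_policy)
{
	if (per_vma_policy)
		return !vma_forbids_zeropage(vma);
	return !mm_forbids_zeropage(vma->vm_mm);
}
```

With the per-vma policy only flagged vmas lose the zero page, which is the "mark only the guest vmas" property argued for above; with the per-mm policy every vma of the process is affected as soon as use_skey is set.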

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5446153F.6030407@redhat.com>
Date: Tue, 21 Oct 2014 10:11:43 +0200
From: Paolo Bonzini
Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
 <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com>
 <54419265.9000000@intel.com>
 <20141018164928.2341415f@BR9TG4T3.de.ibm.com>
 <54429521.80402@intel.com>
 <5445511D.1090603@redhat.com>
 <20141021081131.641c6104@mschwide>
In-Reply-To: <20141021081131.641c6104@mschwide>
To: Martin Schwidefsky
Cc: Dave Hansen, Dominik Dingel, Andrew Morton, linux-mm@kvack.org,
 Mel Gorman, Michal Hocko, Rik van Riel, Andrea Arcangeli,
 Andy Lutomirski, "Aneesh Kumar K.V", Bob Liu, Christian Borntraeger,
 Cornelia Huck, Gleb Natapov, Heiko Carstens, "H. Peter Anvin",
 Hugh Dickins, Ingo Molnar, Jianyu Zhan, Johannes Weiner,
 "Kirill A. Shutemov", Konstantin Weitz, kvm@vger.kernel.org,
 linux390@de.ibm.com, linux-kernel@vger.kernel.org,
 linux-s390@vger.kernel.org, Peter Zijlstra, Sasha Levin

On 10/21/2014 08:11 AM, Martin Schwidefsky wrote:
>> I agree with Dave (I thought I disagreed, but I changed my mind while
>> writing down my thoughts). Just define mm_forbids_zeropage in
>> arch/s390/include/asm, and make it return mm->context.use_skey---with a
>> comment explaining how this is only for processes that use KVM, and then
>> only for guests that use storage keys.
>
> The mm_forbids_zeropage() sure will work for now, but I think a vma flag
> is the better solution. This is analogous to VM_MERGEABLE or VM_NOHUGEPAGE,
> the best solution would be to only mark those vmas that are mapped to
> the guest. That we have not found a way to do that yet in a sensible way
> does not change the fact that "no-zero-page" is a per-vma property, no?

I agree it should be per-VMA. However, right now the code is
complicated unnecessarily by making it a per-VMA flag. Also, setting
the flag per VMA should probably be done in
kvm_arch_prepare_memory_region together with some kind of storage key
notifier. This is not very much like Dominik's patch. All in all,
mm_forbids_zeropage() provides a non-intrusive and non-controversial way
to fix the bug. Later on, switching to vma_forbids_zeropage() will be
trivial as far as mm/ code is concerned.

> But if you insist we go with the mm_forbids_zeropage() until we find a
> clever way to distinguish the guest vmas from the qemu ones.

Yeah, I think it is simpler for now.

Paolo

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 21 Oct 2014 13:20:25 +0200
From: Dominik Dingel
Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
Message-ID: <20141021132025.60dd3390@BR9TG4T3.de.ibm.com>
In-Reply-To: <5446153F.6030407@redhat.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
 <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com>
 <54419265.9000000@intel.com>
 <20141018164928.2341415f@BR9TG4T3.de.ibm.com>
 <54429521.80402@intel.com>
 <5445511D.1090603@redhat.com>
 <20141021081131.641c6104@mschwide>
 <5446153F.6030407@redhat.com>
To: Paolo Bonzini
Cc: Martin Schwidefsky, Dave Hansen, Andrew Morton, linux-mm@kvack.org,
 Mel Gorman, Michal Hocko, Rik van Riel, Andrea Arcangeli,
 Andy Lutomirski, "Aneesh Kumar K.V", Bob Liu, Christian Borntraeger,
 Cornelia Huck, Gleb Natapov, Heiko Carstens, "H. Peter Anvin",
 Hugh Dickins, Ingo Molnar, Jianyu Zhan, Johannes Weiner,
 "Kirill A. Shutemov", Konstantin Weitz, kvm@vger.kernel.org,
 linux390@de.ibm.com, linux-kernel@vger.kernel.org,
 linux-s390@vger.kernel.org, Peter Zijlstra, Sasha Levin

On Tue, 21 Oct 2014 10:11:43 +0200
Paolo Bonzini wrote:

>
>
> On 10/21/2014 08:11 AM, Martin Schwidefsky wrote:
> >> I agree with Dave (I thought I disagreed, but I changed my mind while
> >> writing down my thoughts). Just define mm_forbids_zeropage in
> >> arch/s390/include/asm, and make it return mm->context.use_skey---with a
> >> comment explaining how this is only for processes that use KVM, and then
> >> only for guests that use storage keys.
> >
> > The mm_forbids_zeropage() sure will work for now, but I think a vma flag
> > is the better solution. This is analogous to VM_MERGEABLE or VM_NOHUGEPAGE,
> > the best solution would be to only mark those vmas that are mapped to
> > the guest. That we have not found a way to do that yet in a sensible way
> > does not change the fact that "no-zero-page" is a per-vma property, no?
>
> I agree it should be per-VMA. However, right now the code is
> complicated unnecessarily by making it a per-VMA flag. Also, setting
> the flag per VMA should probably be done in
> kvm_arch_prepare_memory_region together with some kind of storage key
> notifier. This is not very much like Dominik's patch. All in all,
> mm_forbids_zeropage() provides a non-intrusive and non-controversial way
> to fix the bug. Later on, switching to vma_forbids_zeropage() will be
> trivial as far as mm/ code is concerned.
>

Thank you for all the feedback, will cook up a new version.
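
For readers following the thread, the five-step corruption scenario from the description of patch 3/4 can be replayed in a toy user-space model. This is a sketch under loud assumptions, not kernel code: struct model and all function names are hypothetical, page frame 0 plays the role of the shared zero page, and "moving the key from the PGSTE to the real storage key" is reduced to a plain assignment.

```c
#include <stdbool.h>

enum { ZERO_PFN = 0, NR_PFNS = 8, NR_GUEST_PAGES = 4, UNMAPPED = -1 };

/* Hypothetical toy model of guest pages, PGSTE keys and real storage keys. */
struct model {
	unsigned char real_key[NR_PFNS];	 /* storage key per physical page */
	unsigned char pgste_key[NR_GUEST_PAGES]; /* key parked in the PGSTE */
	int pfn[NR_GUEST_PAGES];		 /* backing frame or UNMAPPED */
	int next_free_pfn;			 /* next fresh anonymous frame */
	bool forbid_zero_page;			 /* policy under discussion */
};

static void model_init(struct model *m, bool forbid_zero_page)
{
	for (int i = 0; i < NR_PFNS; i++)
		m->real_key[i] = 0;
	for (int i = 0; i < NR_GUEST_PAGES; i++) {
		m->pgste_key[i] = 0;
		m->pfn[i] = UNMAPPED;
	}
	m->next_free_pfn = ZERO_PFN + 1;
	m->forbid_zero_page = forbid_zero_page;
}

/* Steps 2 and 4: with an invalid pte the key lands in the PGSTE. */
static void guest_set_key(struct model *m, int page, unsigned char key)
{
	if (m->pfn[page] == UNMAPPED)
		m->pgste_key[page] = key;
	else
		m->real_key[m->pfn[page]] = key;
}

/* Steps 3 and 5: a read fault maps a frame and transfers the PGSTE key. */
static void guest_read(struct model *m, int page)
{
	if (m->pfn[page] != UNMAPPED)
		return;
	m->pfn[page] = m->forbid_zero_page ? m->next_free_pfn++ : ZERO_PFN;
	m->real_key[m->pfn[page]] = m->pgste_key[page];
}

/* What the guest observes when it queries the key of a page. */
static unsigned char guest_get_key(const struct model *m, int page)
{
	if (m->pfn[page] == UNMAPPED)
		return m->pgste_key[page];
	return m->real_key[m->pfn[page]];
}
```

Running set/read for page X and then for page Y with forbid_zero_page set to false leaves both pages on frame 0, so the key read back for X is the one written for Y. With forbid_zero_page set to true each page gets its own frame and both keys survive.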
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752991AbaJQOKU (ORCPT ); Fri, 17 Oct 2014 10:10:20 -0400
Received: from e06smtp15.uk.ibm.com ([195.75.94.111]:36988 "EHLO e06smtp15.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752274AbaJQOKL (ORCPT ); Fri, 17 Oct 2014 10:10:11 -0400
From: Dominik Dingel
To: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel
Cc: Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini , Peter Zijlstra , Sasha Levin , Dominik Dingel
Subject: [PATCH 1/4] s390/mm: refactor global pgste updates
Date: Fri, 17 Oct 2014 16:09:47 +0200
Message-Id: <1413554990-48512-2-git-send-email-dingel@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.8.5.5
In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 14101714-0021-0000-0000-0000016F6EC2
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Replace the s390-specific page table walker for the pgste updates with a
call to the common code walk_page_range function. There are now two pte
modification functions, one for the reset of the CMMA state and another
one for the initialization of the storage keys.
Signed-off-by: Dominik Dingel Signed-off-by: Martin Schwidefsky --- arch/s390/include/asm/pgalloc.h | 2 - arch/s390/include/asm/pgtable.h | 1 + arch/s390/kvm/kvm-s390.c | 2 +- arch/s390/mm/pgtable.c | 153 ++++++++++++++-------------------------- 4 files changed, 56 insertions(+), 102 deletions(-) diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h index 9e18a61..120e126 100644 --- a/arch/s390/include/asm/pgalloc.h +++ b/arch/s390/include/asm/pgalloc.h @@ -22,8 +22,6 @@ unsigned long *page_table_alloc(struct mm_struct *, unsigned long); void page_table_free(struct mm_struct *, unsigned long *); void page_table_free_rcu(struct mmu_gather *, unsigned long *); -void page_table_reset_pgste(struct mm_struct *, unsigned long, unsigned long, - bool init_skey); int set_guest_storage_key(struct mm_struct *mm, unsigned long addr, unsigned long key, bool nq); diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 5efb2fe..1e991f6a 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1750,6 +1750,7 @@ extern int vmem_add_mapping(unsigned long start, unsigned long size); extern int vmem_remove_mapping(unsigned long start, unsigned long size); extern int s390_enable_sie(void); extern void s390_enable_skey(void); +extern void s390_reset_cmma(struct mm_struct *mm); /* * No page table caches to initialise diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 81b0e11..7a33c11 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -281,7 +281,7 @@ static int kvm_s390_mem_control(struct kvm *kvm, struct kvm_device_attr *attr) case KVM_S390_VM_MEM_CLR_CMMA: mutex_lock(&kvm->lock); idx = srcu_read_lock(&kvm->srcu); - page_table_reset_pgste(kvm->arch.gmap->mm, 0, TASK_SIZE, false); + s390_reset_cmma(kvm->arch.gmap->mm); srcu_read_unlock(&kvm->srcu, idx); mutex_unlock(&kvm->lock); ret = 0; diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index 
5404a62..ab55ba8 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -885,99 +885,6 @@ static inline void page_table_free_pgste(unsigned long *table) __free_page(page); } -static inline unsigned long page_table_reset_pte(struct mm_struct *mm, pmd_t *pmd, - unsigned long addr, unsigned long end, bool init_skey) -{ - pte_t *start_pte, *pte; - spinlock_t *ptl; - pgste_t pgste; - - start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl); - pte = start_pte; - do { - pgste = pgste_get_lock(pte); - pgste_val(pgste) &= ~_PGSTE_GPS_USAGE_MASK; - if (init_skey) { - unsigned long address; - - pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT | - PGSTE_GR_BIT | PGSTE_GC_BIT); - - /* skip invalid and not writable pages */ - if (pte_val(*pte) & _PAGE_INVALID || - !(pte_val(*pte) & _PAGE_WRITE)) { - pgste_set_unlock(pte, pgste); - continue; - } - - address = pte_val(*pte) & PAGE_MASK; - page_set_storage_key(address, PAGE_DEFAULT_KEY, 1); - } - pgste_set_unlock(pte, pgste); - } while (pte++, addr += PAGE_SIZE, addr != end); - pte_unmap_unlock(start_pte, ptl); - - return addr; -} - -static inline unsigned long page_table_reset_pmd(struct mm_struct *mm, pud_t *pud, - unsigned long addr, unsigned long end, bool init_skey) -{ - unsigned long next; - pmd_t *pmd; - - pmd = pmd_offset(pud, addr); - do { - next = pmd_addr_end(addr, end); - if (pmd_none_or_clear_bad(pmd)) - continue; - next = page_table_reset_pte(mm, pmd, addr, next, init_skey); - } while (pmd++, addr = next, addr != end); - - return addr; -} - -static inline unsigned long page_table_reset_pud(struct mm_struct *mm, pgd_t *pgd, - unsigned long addr, unsigned long end, bool init_skey) -{ - unsigned long next; - pud_t *pud; - - pud = pud_offset(pgd, addr); - do { - next = pud_addr_end(addr, end); - if (pud_none_or_clear_bad(pud)) - continue; - next = page_table_reset_pmd(mm, pud, addr, next, init_skey); - } while (pud++, addr = next, addr != end); - - return addr; -} - -void page_table_reset_pgste(struct 
mm_struct *mm, unsigned long start, - unsigned long end, bool init_skey) -{ - unsigned long addr, next; - pgd_t *pgd; - - down_write(&mm->mmap_sem); - if (init_skey && mm_use_skey(mm)) - goto out_up; - addr = start; - pgd = pgd_offset(mm, addr); - do { - next = pgd_addr_end(addr, end); - if (pgd_none_or_clear_bad(pgd)) - continue; - next = page_table_reset_pud(mm, pgd, addr, next, init_skey); - } while (pgd++, addr = next, addr != end); - if (init_skey) - current->mm->context.use_skey = 1; -out_up: - up_write(&mm->mmap_sem); -} -EXPORT_SYMBOL(page_table_reset_pgste); - int set_guest_storage_key(struct mm_struct *mm, unsigned long addr, unsigned long key, bool nq) { @@ -1044,11 +951,6 @@ static inline unsigned long *page_table_alloc_pgste(struct mm_struct *mm, return NULL; } -void page_table_reset_pgste(struct mm_struct *mm, unsigned long start, - unsigned long end, bool init_skey) -{ -} - static inline void page_table_free_pgste(unsigned long *table) { } @@ -1400,13 +1302,66 @@ EXPORT_SYMBOL_GPL(s390_enable_sie); * Enable storage key handling from now on and initialize the storage * keys with the default key. 
*/ +static int __s390_enable_skey(pte_t *pte, unsigned long addr, + unsigned long next, struct mm_walk *walk) +{ + unsigned long ptev; + pgste_t pgste; + + pgste = pgste_get_lock(pte); + /* Clear storage key */ + pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT | + PGSTE_GR_BIT | PGSTE_GC_BIT); + ptev = pte_val(*pte); + if (!(ptev & _PAGE_INVALID) && (ptev & _PAGE_WRITE)) + page_set_storage_key(ptev & PAGE_MASK, PAGE_DEFAULT_KEY, 1); + pgste_set_unlock(pte, pgste); + return 0; +} + void s390_enable_skey(void) { - page_table_reset_pgste(current->mm, 0, TASK_SIZE, true); + struct mm_walk walk = { .pte_entry = __s390_enable_skey }; + struct mm_struct *mm = current->mm; + + down_write(&mm->mmap_sem); + if (mm_use_skey(mm)) + goto out_up; + walk.mm = mm; + walk_page_range(0, TASK_SIZE, &walk); + mm->context.use_skey = 1; + +out_up: + up_write(&mm->mmap_sem); } EXPORT_SYMBOL_GPL(s390_enable_skey); /* + * Reset CMMA state, make all pages stable again. + */ +static int __s390_reset_cmma(pte_t *pte, unsigned long addr, + unsigned long next, struct mm_walk *walk) +{ + pgste_t pgste; + + pgste = pgste_get_lock(pte); + pgste_val(pgste) &= ~_PGSTE_GPS_USAGE_MASK; + pgste_set_unlock(pte, pgste); + return 0; +} + +void s390_reset_cmma(struct mm_struct *mm) +{ + struct mm_walk walk = { .pte_entry = __s390_reset_cmma }; + + down_write(&mm->mmap_sem); + walk.mm = mm; + walk_page_range(0, TASK_SIZE, &walk); + up_write(&mm->mmap_sem); +} +EXPORT_SYMBOL_GPL(s390_reset_cmma); + +/* * Test and reset if a guest page is dirty */ bool gmap_test_and_clear_dirty(unsigned long address, struct gmap *gmap) -- 1.8.5.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752311AbaJQOKQ (ORCPT ); Fri, 17 Oct 2014 10:10:16 -0400 Received: from e06smtp17.uk.ibm.com ([195.75.94.113]:36992 "EHLO e06smtp17.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752032AbaJQOKL (ORCPT ); Fri, 17 Oct 2014 10:10:11 
-0400 From: Dominik Dingel To: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel Cc: Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini , Peter Zijlstra , Sasha Levin , Dominik Dingel Subject: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag Date: Fri, 17 Oct 2014 16:09:48 +0200 Message-Id: <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com> X-Mailer: git-send-email 1.8.5.5 In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14101714-0029-0000-0000-0000013DF88D Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add a new vma flag to allow an architecture to disable the backing of non-present, anonymous pages with the read-only empty zero page. Signed-off-by: Dominik Dingel Acked-by: Christian Borntraeger Signed-off-by: Martin Schwidefsky --- include/linux/mm.h | 13 +++++++++++-- mm/huge_memory.c | 2 +- mm/memory.c | 2 +- 3 files changed, 13 insertions(+), 4 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index cd33ae2..8f09c91 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -113,7 +113,7 @@ extern unsigned int kobjsize(const void *objp); #define VM_GROWSDOWN 0x00000100 /* general info on the segment */ #define VM_PFNMAP 0x00000400 /* Page-ranges managed without "struct page", just pure PFN */ #define VM_DENYWRITE 0x00000800 /* ETXTBSY on write attempts.. 
*/ - +#define VM_NOZEROPAGE 0x00001000 /* forbid new zero page mappings */ #define VM_LOCKED 0x00002000 #define VM_IO 0x00004000 /* Memory mapped I/O or similar */ @@ -179,7 +179,7 @@ extern unsigned int kobjsize(const void *objp); #define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP) /* This mask defines which mm->def_flags a process can inherit its parent */ -#define VM_INIT_DEF_MASK VM_NOHUGEPAGE +#define VM_INIT_DEF_MASK (VM_NOHUGEPAGE | VM_NOZEROPAGE) /* * mapping from the currently active vm_flags protection bits (the @@ -1293,6 +1293,15 @@ static inline int stack_guard_page_end(struct vm_area_struct *vma, !vma_growsup(vma->vm_next, addr); } +static inline int vma_forbids_zeropage(struct vm_area_struct *vma) +{ +#ifdef CONFIG_NOZEROPAGE + return vma->vm_flags & VM_NOZEROPAGE; +#else + return 0; +#endif +} + extern struct task_struct *task_of_stack(struct task_struct *task, struct vm_area_struct *vma, bool in_group); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index de98415..c271265 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -805,7 +805,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma, return VM_FAULT_OOM; if (unlikely(khugepaged_enter(vma, vma->vm_flags))) return VM_FAULT_OOM; - if (!(flags & FAULT_FLAG_WRITE) && + if (!(flags & FAULT_FLAG_WRITE) && !vma_forbids_zeropage(vma) && transparent_hugepage_use_zero_page()) { spinlock_t *ptl; pgtable_t pgtable; diff --git a/mm/memory.c b/mm/memory.c index 64f82aa..1859b2b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2640,7 +2640,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma, return VM_FAULT_SIGBUS; /* Use the zero-page for reads */ - if (!(flags & FAULT_FLAG_WRITE)) { + if (!(flags & FAULT_FLAG_WRITE) && !vma_forbids_zeropage(vma)) { entry = pte_mkspecial(pfn_pte(my_zero_pfn(address), vma->vm_page_prot)); page_table = pte_offset_map_lock(mm, pmd, address, &ptl); -- 1.8.5.5 From mboxrd@z Thu Jan 1 00:00:00 1970 
Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752766AbaJQOKO (ORCPT ); Fri, 17 Oct 2014 10:10:14 -0400 Received: from e06smtp17.uk.ibm.com ([195.75.94.113]:36989 "EHLO e06smtp17.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751939AbaJQOKL (ORCPT ); Fri, 17 Oct 2014 10:10:11 -0400 From: Dominik Dingel To: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel Cc: Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini , Peter Zijlstra , Sasha Levin , Dominik Dingel Subject: [PATCH 0/4] mm: new flag to forbid zero page mappings for a vma Date: Fri, 17 Oct 2014 16:09:46 +0200 Message-Id: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> X-Mailer: git-send-email 1.8.5.5 X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14101714-0029-0000-0000-0000013DF890 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org s390 has the special notion of storage keys which are some sort of page flags associated with physical pages and live outside of direct addressable memory. These storage keys can be queried and changed with a special set of instructions. 
The mentioned instructions behave quite nicely under virtualization, if there is:
- an invalid pte, then the instructions will work on some memory reserved
  in the host page table
- a valid pte, then the instructions will work with the real storage key

Thanks to Martin's software reference and dirty bit tracking, the kernel
itself no longer issues any storage key instructions, as a software-based
approach is used instead. On the other hand, distributions in the wild
are currently using them.

However, for virtualized guests we still have a problem with guest pages
mapped to zero pages and with kernel samepage merging (KSM). With either
one, multiple guest pages will point to the same physical page and share
the same storage key.

Let's fix this by introducing a new flag which will forbid new zero page
mappings. If the guest issues a storage key related instruction we flag
all vmas, drop existing zero page mappings and unmerge the guest memory.

Dominik Dingel (4):
  s390/mm: refactor global pgste updates
  mm: introduce new VM_NOZEROPAGE flag
  s390/mm: prevent and break zero page mappings in case of storage keys
  s390/mm: disable KSM for storage key enabled pages

 arch/s390/Kconfig               |   3 +
 arch/s390/include/asm/pgalloc.h |   2 -
 arch/s390/include/asm/pgtable.h |   3 +-
 arch/s390/kvm/kvm-s390.c        |   2 +-
 arch/s390/kvm/priv.c            |  17 ++--
 arch/s390/mm/pgtable.c          | 181 ++++++++++++++++++----------------------
 include/linux/mm.h              |  13 ++-
 mm/huge_memory.c                |   2 +-
 mm/memory.c                     |   2 +-
 9 files changed, 112 insertions(+), 113 deletions(-)

--
1.8.5.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753309AbaJQOLu (ORCPT ); Fri, 17 Oct 2014 10:11:50 -0400
Received: from e06smtp17.uk.ibm.com ([195.75.94.113]:36999 "EHLO e06smtp17.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752358AbaJQOKM (ORCPT ); Fri, 17 Oct 2014 10:10:12 -0400
From: Dominik Dingel
To: Andrew Morton , linux-mm@kvack.org, Mel Gorman ,
Michal Hocko , Rik van Riel Cc: Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini , Peter Zijlstra , Sasha Levin , Dominik Dingel Subject: [PATCH 4/4] s390/mm: disable KSM for storage key enabled pages Date: Fri, 17 Oct 2014 16:09:50 +0200 Message-Id: <1413554990-48512-5-git-send-email-dingel@linux.vnet.ibm.com> X-Mailer: git-send-email 1.8.5.5 In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14101714-0029-0000-0000-0000013DF894 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When storage keys are enabled unmerge already merged pages and prevent new pages from being merged. 
Signed-off-by: Dominik Dingel Acked-by: Christian Borntraeger Signed-off-by: Martin Schwidefsky --- arch/s390/include/asm/pgtable.h | 2 +- arch/s390/kvm/priv.c | 17 ++++++++++++----- arch/s390/mm/pgtable.c | 15 +++++++++++++-- 3 files changed, 26 insertions(+), 8 deletions(-) diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 1e991f6a..a5362e4 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1749,7 +1749,7 @@ static inline pte_t mk_swap_pte(unsigned long type, unsigned long offset) extern int vmem_add_mapping(unsigned long start, unsigned long size); extern int vmem_remove_mapping(unsigned long start, unsigned long size); extern int s390_enable_sie(void); -extern void s390_enable_skey(void); +extern int s390_enable_skey(void); extern void s390_reset_cmma(struct mm_struct *mm); /* diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c index f89c1cd..e0967fd 100644 --- a/arch/s390/kvm/priv.c +++ b/arch/s390/kvm/priv.c @@ -156,21 +156,25 @@ static int handle_store_cpu_address(struct kvm_vcpu *vcpu) return 0; } -static void __skey_check_enable(struct kvm_vcpu *vcpu) +static int __skey_check_enable(struct kvm_vcpu *vcpu) { + int rc = 0; if (!(vcpu->arch.sie_block->ictl & (ICTL_ISKE | ICTL_SSKE | ICTL_RRBE))) - return; + return rc; - s390_enable_skey(); + rc = s390_enable_skey(); trace_kvm_s390_skey_related_inst(vcpu); vcpu->arch.sie_block->ictl &= ~(ICTL_ISKE | ICTL_SSKE | ICTL_RRBE); + return rc; } static int handle_skey(struct kvm_vcpu *vcpu) { - __skey_check_enable(vcpu); + int rc = __skey_check_enable(vcpu); + if (rc) + return rc; vcpu->stat.instruction_storage_key++; if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE) @@ -692,7 +696,10 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) } if (vcpu->run->s.regs.gprs[reg1] & PFMF_SK) { - __skey_check_enable(vcpu); + int rc = __skey_check_enable(vcpu); + + if (rc) + return rc; if (set_guest_storage_key(current->mm, useraddr, 
vcpu->run->s.regs.gprs[reg1] & PFMF_KEY, vcpu->run->s.regs.gprs[reg1] & PFMF_NQ)) diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index 6321692..b3311c1 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -18,6 +18,8 @@ #include #include #include +#include +#include #include #include @@ -1328,18 +1330,26 @@ static int __s390_enable_skey(pte_t *pte, unsigned long addr, return 0; } -void s390_enable_skey(void) +int s390_enable_skey(void) { struct mm_walk walk = { .pte_entry = __s390_enable_skey }; struct mm_struct *mm = current->mm; struct vm_area_struct *vma; + int rc = 0; down_write(&mm->mmap_sem); if (mm_use_skey(mm)) goto out_up; - for (vma = mm->mmap; vma; vma = vma->vm_next) + for (vma = mm->mmap; vma; vma = vma->vm_next) { + if (ksm_madvise(vma, vma->vm_start, vma->vm_end, + MADV_UNMERGEABLE, &vma->vm_flags)) { + rc = -ENOMEM; + goto out_up; + } vma->vm_flags |= VM_NOZEROPAGE; + } + mm->def_flags &= ~VM_MERGEABLE; mm->def_flags |= VM_NOZEROPAGE; walk.mm = mm; @@ -1348,6 +1358,7 @@ void s390_enable_skey(void) out_up: up_write(&mm->mmap_sem); + return rc; } EXPORT_SYMBOL_GPL(s390_enable_skey); -- 1.8.5.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752325AbaJQOLb (ORCPT ); Fri, 17 Oct 2014 10:11:31 -0400 Received: from e06smtp11.uk.ibm.com ([195.75.94.107]:57956 "EHLO e06smtp11.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752442AbaJQOKN (ORCPT ); Fri, 17 Oct 2014 10:10:13 -0400 From: Dominik Dingel To: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel Cc: Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. 
Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini , Peter Zijlstra , Sasha Levin , Dominik Dingel
Subject: [PATCH 3/4] s390/mm: prevent and break zero page mappings in case of storage keys
Date: Fri, 17 Oct 2014 16:09:49 +0200
Message-Id: <1413554990-48512-4-git-send-email-dingel@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.8.5.5
In-Reply-To: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com>
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 14101714-0005-0000-0000-000001B395B2
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

As soon as storage keys are enabled we need to work around zero page
mappings to prevent inconsistencies between storage keys and pgste.

Otherwise the following data corruption could happen:
1) guest enables storage key
2) guest sets storage key for not mapped page X
   -> change goes to PGSTE
3) guest reads from page X
   -> as X was not dirty before, the page will be zero page backed,
      storage key from PGSTE for X will go to storage key for zero page
4) guest sets storage key for not mapped page Y (same logic as above)
5) guest reads from page Y
   -> as Y was not dirty before, the page will be zero page backed,
      storage key from PGSTE for Y will go to storage key for zero page
      overwriting storage key for X

While holding the mmap sem, we are safe against changes on entries we
already fixed. As sske and host large pages are also mutually exclusive
we do not even need to retry the fixup_user_fault.
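The five steps above can be replayed in a small user-space model. This is
only a sketch under stated assumptions: it is not kernel code, every toy_*
name is hypothetical, and a storage key is reduced to one byte per host
frame plus one parked copy per guest page standing in for the PGSTE.
Frame 0 plays the role of the shared zero page.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model; none of these names exist in the kernel. */
enum { TOY_NPAGES = 4, TOY_ZERO_FRAME = 0 };

struct toy_guest {
	int mapped[TOY_NPAGES];                  /* guest page has a host frame */
	size_t frame[TOY_NPAGES];                /* host frame backing the page */
	unsigned char pgste_key[TOY_NPAGES];     /* key parked in the PGSTE */
	unsigned char frame_key[TOY_NPAGES + 1]; /* real key, one per host frame */
	int forbid_zero_page;                    /* models the proposed policy */
};

/* Guest sets a key: the real key if mapped, the PGSTE copy otherwise. */
static void toy_set_key(struct toy_guest *g, size_t page, unsigned char key)
{
	assert(page < TOY_NPAGES);
	if (g->mapped[page])
		g->frame_key[g->frame[page]] = key;
	else
		g->pgste_key[page] = key;
}

/* Read fault: back the page and move the PGSTE key to the real key. */
static void toy_read_fault(struct toy_guest *g, size_t page)
{
	assert(page < TOY_NPAGES);
	if (g->mapped[page])
		return;
	/* Frame 0 is the shared zero page; frames 1..N are private. */
	g->frame[page] = g->forbid_zero_page ? page + 1 : TOY_ZERO_FRAME;
	g->mapped[page] = 1;
	g->frame_key[g->frame[page]] = g->pgste_key[page];
}

/* Key the guest observes for a page. */
static unsigned char toy_get_key(const struct toy_guest *g, size_t page)
{
	assert(page < TOY_NPAGES);
	return g->mapped[page] ? g->frame_key[g->frame[page]]
			       : g->pgste_key[page];
}

/* Replays steps 2) to 5) and returns the key the guest then sees for X. */
unsigned char toy_key_of_x_after_scenario(int forbid_zero_page)
{
	struct toy_guest g = { 0 };
	const size_t x = 1, y = 2;

	g.forbid_zero_page = forbid_zero_page;
	toy_set_key(&g, x, 0x10);	/* step 2 */
	toy_read_fault(&g, x);		/* step 3 */
	toy_set_key(&g, y, 0x20);	/* step 4 */
	toy_read_fault(&g, y);		/* step 5 */
	return toy_get_key(&g, x);
}
```

With forbid_zero_page set to 0, X and Y share frame 0 and the function
returns 0x20, the key that was set for Y. With it set to 1, each page gets
a private frame and X keeps 0x10.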
Signed-off-by: Dominik Dingel Acked-by: Christian Borntraeger Signed-off-by: Martin Schwidefsky --- arch/s390/Kconfig | 3 +++ arch/s390/mm/pgtable.c | 15 +++++++++++++++ 2 files changed, 18 insertions(+) diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 05c78bb..4e04e63 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -1,6 +1,9 @@ config MMU def_bool y +config NOZEROPAGE + def_bool y + config ZONE_DMA def_bool y diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index ab55ba8..6321692 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -1309,6 +1309,15 @@ static int __s390_enable_skey(pte_t *pte, unsigned long addr, pgste_t pgste; pgste = pgste_get_lock(pte); + /* + * Remove all zero page mappings, + * after establishing a policy to forbid zero page mappings + * following faults for that page will get fresh anonymous pages + */ + if (is_zero_pfn(pte_pfn(*pte))) { + ptep_flush_direct(walk->mm, addr, pte); + pte_val(*pte) = _PAGE_INVALID; + } /* Clear storage key */ pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT | PGSTE_GR_BIT | PGSTE_GC_BIT); @@ -1323,10 +1332,16 @@ void s390_enable_skey(void) { struct mm_walk walk = { .pte_entry = __s390_enable_skey }; struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; down_write(&mm->mmap_sem); if (mm_use_skey(mm)) goto out_up; + + for (vma = mm->mmap; vma; vma = vma->vm_next) + vma->vm_flags |= VM_NOZEROPAGE; + mm->def_flags |= VM_NOZEROPAGE; + walk.mm = mm; walk_page_range(0, TASK_SIZE, &walk); mm->context.use_skey = 1; -- 1.8.5.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751372AbaJQWEs (ORCPT ); Fri, 17 Oct 2014 18:04:48 -0400 Received: from mga03.intel.com ([134.134.136.65]:58621 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750867AbaJQWEq (ORCPT ); Fri, 17 Oct 2014 18:04:46 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.04,741,1406617200"; 
d="scan'208";a="591217535" Message-ID: <54419265.9000000@intel.com> Date: Fri, 17 Oct 2014 15:04:21 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2 MIME-Version: 1.0 To: Dominik Dingel , Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel CC: Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini , Peter Zijlstra , Sasha Levin Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com> In-Reply-To: <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/17/2014 07:09 AM, Dominik Dingel wrote: > diff --git a/include/linux/mm.h b/include/linux/mm.h > index cd33ae2..8f09c91 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -113,7 +113,7 @@ extern unsigned int kobjsize(const void *objp); > #define VM_GROWSDOWN 0x00000100 /* general info on the segment */ > #define VM_PFNMAP 0x00000400 /* Page-ranges managed without "struct page", just pure PFN */ > #define VM_DENYWRITE 0x00000800 /* ETXTBSY on write attempts.. */ > - > +#define VM_NOZEROPAGE 0x00001000 /* forbid new zero page mappings */ > #define VM_LOCKED 0x00002000 > #define VM_IO 0x00004000 /* Memory mapped I/O or similar */ This seems like an awfully obscure use for a very constrained resource (VM_ flags). 
Is there ever a time where the VMAs under an mm have mixed VM_NOZEROPAGE status? Reading the patches, it _looks_ like it might be an all or nothing thing. Full disclosure: I've got an x86-specific feature I want to steal a flag for. Maybe we should just define another VM_ARCH bit. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751539AbaJROtm (ORCPT ); Sat, 18 Oct 2014 10:49:42 -0400 Received: from e06smtp15.uk.ibm.com ([195.75.94.111]:43939 "EHLO e06smtp15.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751306AbaJROtk (ORCPT ); Sat, 18 Oct 2014 10:49:40 -0400 Date: Sat, 18 Oct 2014 16:49:28 +0200 From: Dominik Dingel To: Dave Hansen Cc: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel , Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. 
Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini , Peter Zijlstra , Sasha Levin
Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag
Message-ID: <20141018164928.2341415f@BR9TG4T3.de.ibm.com>
In-Reply-To: <54419265.9000000@intel.com>
References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com> <54419265.9000000@intel.com>
Organization: IBM
X-Mailer: Claws Mail 3.8.0 (GTK+ 2.24.10; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 14101814-0021-0000-0000-000001712557
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 17 Oct 2014 15:04:21 -0700
Dave Hansen wrote:

> Is there ever a time where the VMAs under an mm have mixed VM_NOZEROPAGE
> status? Reading the patches, it _looks_ like it might be an all or
> nothing thing.

Currently it is an all or nothing thing, but for a future change we might want to just
tag the guest memory instead of the complete user address space.

> Full disclosure: I've got an x86-specific feature I want to steal a flag
> for. Maybe we should just define another VM_ARCH bit.
>

So you think of something like:

#if defined(CONFIG_S390)
# define VM_NOZEROPAGE VM_ARCH_1
#endif

#ifndef VM_NOZEROPAGE
# define VM_NOZEROPAGE VM_NONE
#endif

right?
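A stand-alone sketch of how such a fallback define would behave, assuming
toy stand-ins for the flag values: the TOY_* macros, TOY_CONFIG_S390 and
struct toy_vma are hypothetical, and the real definitions would live in
include/linux/mm.h. On builds where the flag aliases the "none" value the
mask is zero, so the helper needs no #ifdef of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel's flag values; treat them as assumptions. */
#define TOY_VM_NONE	0x00000000UL
#define TOY_VM_ARCH_1	0x01000000UL

/* Build with -DTOY_CONFIG_S390 to model an s390 kernel configuration. */
#if defined(TOY_CONFIG_S390)
# define TOY_VM_NOZEROPAGE TOY_VM_ARCH_1
#endif

#ifndef TOY_VM_NOZEROPAGE
# define TOY_VM_NOZEROPAGE TOY_VM_NONE
#endif

struct toy_vma {
	unsigned long vm_flags;
};

/* With TOY_VM_NONE the mask is 0, so this is always false. */
int toy_vma_forbids_zeropage(const struct toy_vma *vma)
{
	assert(vma != NULL);
	return (vma->vm_flags & TOY_VM_NOZEROPAGE) != 0;
}
```

Built without TOY_CONFIG_S390, the helper returns 0 for every vma, even
one that has the architecture bit set for some other purpose.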
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751520AbaJRQ2Z (ORCPT ); Sat, 18 Oct 2014 12:28:25 -0400 Received: from mga11.intel.com ([192.55.52.93]:3386 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751215AbaJRQ2U (ORCPT ); Sat, 18 Oct 2014 12:28:20 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.04,745,1406617200"; d="scan'208";a="616581327" Message-ID: <54429521.80402@intel.com> Date: Sat, 18 Oct 2014 09:28:17 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2 MIME-Version: 1.0 To: Dominik Dingel CC: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel , Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Paolo Bonzini , Peter Zijlstra , Sasha Levin Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com> <54419265.9000000@intel.com> <20141018164928.2341415f@BR9TG4T3.de.ibm.com> In-Reply-To: <20141018164928.2341415f@BR9TG4T3.de.ibm.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/18/2014 07:49 AM, Dominik Dingel wrote: > On Fri, 17 Oct 2014 15:04:21 -0700 > Dave Hansen wrote: >> Is there ever a time where the VMAs under an mm have mixed VM_NOZEROPAGE >> status? Reading the patches, it _looks_ like it might be an all or >> nothing thing. 
> > Currently it is an all or nothing thing, but for a future change we might want to just > tag the guest memory instead of the complete user address space. I think it's a bad idea to reserve a flag for potential future use. If you _need_ it in the future, let's have the discussion then. For now, I think it should probably just be stored in the mm somewhere. >> Full disclosure: I've got an x86-specific feature I want to steal a flag >> for. Maybe we should just define another VM_ARCH bit. >> > > So you think of something like: > > #if defined(CONFIG_S390) > # define VM_NOZEROPAGE VM_ARCH_1 > #endif > > #ifndef VM_NOZEROPAGE > # define VM_NOZEROPAGE VM_NONE > #endif > > right? Yeah, something like that. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752601AbaJTSPE (ORCPT ); Mon, 20 Oct 2014 14:15:04 -0400 Received: from mail-wg0-f43.google.com ([74.125.82.43]:34383 "EHLO mail-wg0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751102AbaJTSPB (ORCPT ); Mon, 20 Oct 2014 14:15:01 -0400 Message-ID: <5445511D.1090603@redhat.com> Date: Mon, 20 Oct 2014 20:14:53 +0200 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Dave Hansen , Dominik Dingel CC: Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel , Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. 
Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Peter Zijlstra , Sasha Levin Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com> <54419265.9000000@intel.com> <20141018164928.2341415f@BR9TG4T3.de.ibm.com> <54429521.80402@intel.com> In-Reply-To: <54429521.80402@intel.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/18/2014 06:28 PM, Dave Hansen wrote: > > Currently it is an all or nothing thing, but for a future change we might want to just > > tag the guest memory instead of the complete user address space. > > I think it's a bad idea to reserve a flag for potential future use. If > you _need_ it in the future, let's have the discussion then. For now, I > think it should probably just be stored in the mm somewhere. I agree with Dave (I thought I disagreed, but I changed my mind while writing down my thoughts). Just define mm_forbids_zeropage in arch/s390/include/asm, and make it return mm->context.use_skey---with a comment explaining how this is only for processes that use KVM, and then only for guests that use storage keys. 
Paolo (who was just taught what storage keys really are) From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753393AbaJUGLs (ORCPT ); Tue, 21 Oct 2014 02:11:48 -0400 Received: from e06smtp13.uk.ibm.com ([195.75.94.109]:43524 "EHLO e06smtp13.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750750AbaJUGLq (ORCPT ); Tue, 21 Oct 2014 02:11:46 -0400 Date: Tue, 21 Oct 2014 08:11:31 +0200 From: Martin Schwidefsky To: Paolo Bonzini Cc: Dave Hansen , Dominik Dingel , Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel , Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Peter Zijlstra , Sasha Levin Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag Message-ID: <20141021081131.641c6104@mschwide> In-Reply-To: <5445511D.1090603@redhat.com> References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com> <54419265.9000000@intel.com> <20141018164928.2341415f@BR9TG4T3.de.ibm.com> <54429521.80402@intel.com> <5445511D.1090603@redhat.com> X-Mailer: Claws Mail 3.9.3 (GTK+ 2.24.23; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14102106-0013-0000-0000-00000187ECA3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 20 Oct 2014 20:14:53 +0200 Paolo Bonzini wrote: > On 10/18/2014 06:28 PM, Dave Hansen wrote: > > > Currently it is an all or nothing thing, but for a future change we might want to 
just > > > tag the guest memory instead of the complete user address space. > > > > I think it's a bad idea to reserve a flag for potential future use. If > > you _need_ it in the future, let's have the discussion then. For now, I > > think it should probably just be stored in the mm somewhere. > > I agree with Dave (I thought I disagreed, but I changed my mind while > writing down my thoughts). Just define mm_forbids_zeropage in > arch/s390/include/asm, and make it return mm->context.use_skey---with a > comment explaining how this is only for processes that use KVM, and then > only for guests that use storage keys. The mm_forbids_zeropage() sure will work for now, but I think a vma flag is the better solution. This is analog to VM_MERGEABLE or VM_NOHUGEPAGE, the best solution would be to only mark those vmas that are mapped to the guest. That we have not found a way to do that yet in a sensible way does not change the fact that "no-zero-page" is a per-vma property, no? But if you insist we go with the mm_forbids_zeropage() until we find a clever way to distinguish the guest vmas from the qemu ones. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. 
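For readers following the per-mm versus per-vma argument, the two checks can be contrasted in a minimal userspace sketch. Everything here is an assumption made for illustration: the struct layouts are simplified stand-ins for the kernel's, the VM_NOZEROPAGE bit value is made up, and vma_forbids_zeropage() is only a name floated in this thread, not an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct mm_context {
	bool use_skey;			/* guest has started using storage keys */
};

struct mm_struct {
	struct mm_context context;
};

/* Hypothetical bit value, chosen only for this sketch. */
#define VM_NOZEROPAGE 0x1UL

struct vm_area_struct {
	struct mm_struct *vm_mm;	/* owning address space */
	unsigned long vm_flags;
};

/* Per-mm check: every mapping in the address space is affected. */
static bool mm_forbids_zeropage(const struct mm_struct *mm)
{
	assert(mm != NULL);
	return mm->context.use_skey;
}

/* Per-vma check: only mappings that carry the flag are affected. */
static bool vma_forbids_zeropage(const struct vm_area_struct *vma)
{
	assert(vma != NULL);
	return (vma->vm_flags & VM_NOZEROPAGE) != 0;
}
```

With the per-mm check, a qemu-internal mapping and a guest-memory mapping in the same process get the same answer. With the per-vma check they can differ, which is the property Martin argues for, at the cost of having to decide which vmas to tag.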
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754685AbaJUIM6 (ORCPT ); Tue, 21 Oct 2014 04:12:58 -0400 Received: from mx1.redhat.com ([209.132.183.28]:21929 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751664AbaJUIMy (ORCPT ); Tue, 21 Oct 2014 04:12:54 -0400 Message-ID: <5446153F.6030407@redhat.com> Date: Tue, 21 Oct 2014 10:11:43 +0200 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Martin Schwidefsky CC: Dave Hansen , Dominik Dingel , Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel , Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Peter Zijlstra , Sasha Levin Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com> <54419265.9000000@intel.com> <20141018164928.2341415f@BR9TG4T3.de.ibm.com> <54429521.80402@intel.com> <5445511D.1090603@redhat.com> <20141021081131.641c6104@mschwide> In-Reply-To: <20141021081131.641c6104@mschwide> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/21/2014 08:11 AM, Martin Schwidefsky wrote: >> I agree with Dave (I thought I disagreed, but I changed my mind while >> writing down my thoughts). 
Just define mm_forbids_zeropage in >> arch/s390/include/asm, and make it return mm->context.use_skey---with a >> comment explaining how this is only for processes that use KVM, and then >> only for guests that use storage keys. > > The mm_forbids_zeropage() sure will work for now, but I think a vma flag > is the better solution. This is analog to VM_MERGEABLE or VM_NOHUGEPAGE, > the best solution would be to only mark those vmas that are mapped to > the guest. That we have not found a way to do that yet in a sensible way > does not change the fact that "no-zero-page" is a per-vma property, no? I agree it should be per-VMA. However, right now the code is complicated unnecessarily by making it a per-VMA flag. Also, setting the flag per VMA should probably be done in kvm_arch_prepare_memory_region together with some kind of storage key notifier. This is not very much like Dominik's patch. All in all, mm_forbids_zeropage() provides a non-intrusive and non-controversial way to fix the bug. Later on, switching to vma_forbids_zeropage() will be trivial as far as mm/ code is concerned. > But if you insist we go with the mm_forbids_zeropage() until we find a > clever way to distinguish the guest vmas from the qemu ones. Yeah, I think it is simpler for now. 
Paolo From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755291AbaJULUo (ORCPT ); Tue, 21 Oct 2014 07:20:44 -0400 Received: from e06smtp10.uk.ibm.com ([195.75.94.106]:34269 "EHLO e06smtp10.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755039AbaJULUm (ORCPT ); Tue, 21 Oct 2014 07:20:42 -0400 Date: Tue, 21 Oct 2014 13:20:25 +0200 From: Dominik Dingel To: Paolo Bonzini Cc: Martin Schwidefsky , Dave Hansen , Andrew Morton , linux-mm@kvack.org, Mel Gorman , Michal Hocko , Rik van Riel , Andrea Arcangeli , Andy Lutomirski , "Aneesh Kumar K.V" , Bob Liu , Christian Borntraeger , Cornelia Huck , Gleb Natapov , Heiko Carstens , "H. Peter Anvin" , Hugh Dickins , Ingo Molnar , Jianyu Zhan , Johannes Weiner , "Kirill A. Shutemov" , Konstantin Weitz , kvm@vger.kernel.org, linux390@de.ibm.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Peter Zijlstra , Sasha Levin Subject: Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag Message-ID: <20141021132025.60dd3390@BR9TG4T3.de.ibm.com> In-Reply-To: <5446153F.6030407@redhat.com> References: <1413554990-48512-1-git-send-email-dingel@linux.vnet.ibm.com> <1413554990-48512-3-git-send-email-dingel@linux.vnet.ibm.com> <54419265.9000000@intel.com> <20141018164928.2341415f@BR9TG4T3.de.ibm.com> <54429521.80402@intel.com> <5445511D.1090603@redhat.com> <20141021081131.641c6104@mschwide> <5446153F.6030407@redhat.com> Organization: IBM X-Mailer: Claws Mail 3.8.0 (GTK+ 2.24.10; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14102111-0041-0000-0000-000001C0F3C2 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 21 Oct 2014 10:11:43 +0200 Paolo Bonzini wrote: > > > On 10/21/2014 08:11 AM, Martin Schwidefsky wrote: > >> I agree with Dave (I thought I 
disagreed, but I changed my mind while > >> writing down my thoughts). Just define mm_forbids_zeropage in > >> arch/s390/include/asm, and make it return mm->context.use_skey---with a > >> comment explaining how this is only for processes that use KVM, and then > >> only for guests that use storage keys. > > > > The mm_forbids_zeropage() sure will work for now, but I think a vma flag > > is the better solution. This is analog to VM_MERGEABLE or VM_NOHUGEPAGE, > > the best solution would be to only mark those vmas that are mapped to > > the guest. That we have not found a way to do that yet in a sensible way > > does not change the fact that "no-zero-page" is a per-vma property, no? > > I agree it should be per-VMA. However, right now the code is > complicated unnecessarily by making it a per-VMA flag. Also, setting > the flag per VMA should probably be done in > kvm_arch_prepare_memory_region together with some kind of storage key > notifier. This is not very much like Dominik's patch. All in all, > mm_forbids_zeropage() provides a non-intrusive and non-controversial way > to fix the bug. Later on, switching to vma_forbids_zeropage() will be > trivial as far as mm/ code is concerned. > Thank you for all the feedback, will cook up a new version.
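The direction agreed on above (an arch-provided mm_forbids_zeropage() with a generic default) might look roughly like the following userspace sketch. It is not the code of any posted patch: the structs are simplified stand-ins, and s390_mm_use_skey() and may_map_zero_page() are hypothetical helper names introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct mm_context {
	bool use_skey;			/* set once the guest uses storage keys */
};

struct mm_struct {
	struct mm_context context;
};

/*
 * Arch side (in the proposal this would live under arch/s390/include/asm).
 * Only relevant for processes that use KVM, and then only for guests that
 * use storage keys. s390_mm_use_skey() is a hypothetical name.
 */
static inline bool s390_mm_use_skey(const struct mm_struct *mm)
{
	return mm->context.use_skey;
}
#define mm_forbids_zeropage(mm) s390_mm_use_skey(mm)

/* Generic side: architectures that define nothing never forbid it. */
#ifndef mm_forbids_zeropage
#define mm_forbids_zeropage(mm) (false)
#endif

/*
 * Hypothetical fault-path helper: the shared zero page is a candidate only
 * for read faults, and only when the mm does not forbid it.
 */
static bool may_map_zero_page(const struct mm_struct *mm, bool write_fault)
{
	assert(mm != NULL);
	return !write_fault && !mm_forbids_zeropage(mm);
}
```

Because callers in mm/ would only see the predicate, moving to a per-vma variant later would mean changing the predicate's argument rather than the logic at each call site, which is the point Paolo makes about the switch being trivial for mm/ code.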