From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1031194AbdAHG7j (ORCPT ); Sun, 8 Jan 2017 01:59:39 -0500 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:57452 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755783AbdAHG7b (ORCPT ); Sun, 8 Jan 2017 01:59:31 -0500 From: "Aneesh Kumar K.V" To: Hugh Dickins , Linus Torvalds Cc: Andrew Morton , "Kirill A. Shutemov" , linux-kernel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH] mm: stop leaking PageTables In-Reply-To: References: Date: Sun, 08 Jan 2017 12:29:21 +0530 MIME-Version: 1.0 Content-Type: text/plain X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 17010806-0016-0000-0000-0000059599AF X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00006394; HX=3.00000240; KW=3.00000007; PH=3.00000004; SC=3.00000199; SDB=6.00804948; UDB=6.00391846; IPR=6.00582802; BA=6.00005033; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00013853; XFM=3.00000011; UTC=2017-01-08 06:59:28 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17010806-0017-0000-0000-000036144B39 Message-Id: <87mvf2kpfa.fsf@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-01-08_04:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=5 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1612050000 definitions=main-1701080107 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hugh Dickins writes: > 4.10-rc loadtest (even on x86, even without THPCache) fails with > "fork: Cannot allocate memory" or some such; and /proc/meminfo > shows PageTables growing. > > rc1 removed the freeing of an unused preallocated pagetable after > do_fault_around() has called map_pages(): which is usually a good > optimization, so that the followup doesn't have to reallocate one; > but it's not sufficient to shift the freeing into alloc_set_pte(), > since there are failure cases (most commonly VM_FAULT_RETRY) which > never reach finish_fault(). > > Check and free it at the outer level in do_fault(), then we don't > need to worry in alloc_set_pte(), and can restore that to how it was > (I cannot find any reason to pte_free() under lock as it was doing). > > And fix a separate pagetable leak, or crash, introduced by the same > change, that could only show up on some ppc64: why does do_set_pmd()'s > failure case attempt to withdraw a pagetable when it never deposited > one, at the same time overwriting (so leaking) the vmf->prealloc_pte? > Residue of an earlier implementation, perhaps? Delete it. That change is part of -mm tree. https://lkml.kernel.org/r/20161212163428.6780-1-aneesh.kumar@linux.vnet.ibm.com > > Fixes: 953c66c2b22a ("mm: THP page cache support for ppc64") > Signed-off-by: Hugh Dickins Reviewed-by: Aneesh Kumar K.V > --- > > mm/memory.c | 47 ++++++++++++++++++++--------------------------- > 1 file changed, 20 insertions(+), 27 deletions(-) > > --- 4.10-rc2/mm/memory.c 2016-12-25 18:40:50.830453384 -0800 > +++ linux/mm/memory.c 2017-01-07 13:34:29.373381551 -0800 > @@ -3008,13 +3008,6 @@ static int do_set_pmd(struct vm_fault *v > ret = 0; > count_vm_event(THP_FILE_MAPPED); > out: > - /* > - * If we are going to fallback to pte mapping, do a > - * withdraw with pmd lock held. > - */ > - if (arch_needs_pgtable_deposit() && ret == VM_FAULT_FALLBACK) > - vmf->prealloc_pte = pgtable_trans_huge_withdraw(vma->vm_mm, > - vmf->pmd); > spin_unlock(vmf->ptl); > return ret; > } > @@ -3055,20 +3048,18 @@ int alloc_set_pte(struct vm_fault *vmf, > > ret = do_set_pmd(vmf, page); > if (ret != VM_FAULT_FALLBACK) > - goto fault_handled; > + return ret; > } > > if (!vmf->pte) { > ret = pte_alloc_one_map(vmf); > if (ret) > - goto fault_handled; > + return ret; > } > > /* Re-check under ptl */ > - if (unlikely(!pte_none(*vmf->pte))) { > - ret = VM_FAULT_NOPAGE; > - goto fault_handled; > - } > + if (unlikely(!pte_none(*vmf->pte))) > + return VM_FAULT_NOPAGE; > > flush_icache_page(vma, page); > entry = mk_pte(page, vma->vm_page_prot); > @@ -3088,15 +3079,8 @@ int alloc_set_pte(struct vm_fault *vmf, > > /* no need to invalidate: a not-present page won't be cached */ > update_mmu_cache(vma, vmf->address, vmf->pte); > - ret = 0; > > -fault_handled: > - /* preallocated pagetable is unused: free it */ > - if (vmf->prealloc_pte) { > - pte_free(vmf->vma->vm_mm, vmf->prealloc_pte); > - vmf->prealloc_pte = 0; > - } > - return ret; > + return 0; > } > > > @@ -3360,15 +3344,24 @@ static int do_shared_fault(struct vm_fau > static int do_fault(struct vm_fault *vmf) > { > struct vm_area_struct *vma = vmf->vma; > + int ret; > > /* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */ > if (!vma->vm_ops->fault) > - return VM_FAULT_SIGBUS; > - if (!(vmf->flags & FAULT_FLAG_WRITE)) > - return do_read_fault(vmf); > - if (!(vma->vm_flags & VM_SHARED)) > - return do_cow_fault(vmf); > - return do_shared_fault(vmf); > + ret = VM_FAULT_SIGBUS; > + else if (!(vmf->flags & FAULT_FLAG_WRITE)) > + ret = do_read_fault(vmf); > + else if (!(vma->vm_flags & VM_SHARED)) > + ret = do_cow_fault(vmf); > + else > + ret = do_shared_fault(vmf); > + > + /* preallocated pagetable is unused: free it */ > + if (vmf->prealloc_pte) { > + pte_free(vma->vm_mm, vmf->prealloc_pte); > + vmf->prealloc_pte = 0; > + } > + return ret; > } > > static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,