Date: Tue, 30 Nov 2010 20:01:59 +0100
From: Andrea Arcangeli
To: Daisuke Nishimura
Cc: linux-mm@kvack.org, Linus Torvalds, Andrew Morton, linux-kernel@vger.kernel.org, Marcelo Tosatti, Adam Litke, Avi Kivity, Hugh Dickins, Rik van Riel, Mel Gorman, Dave Hansen, Benjamin Herrenschmidt, Ingo Molnar, Mike Travis, KAMEZAWA Hiroyuki, Christoph Lameter, Chris Wright, bpicco@redhat.com, KOSAKI Motohiro, Balbir Singh, "Michael S. Tsirkin", Peter Zijlstra, Johannes Weiner, Chris Mason, Borislav Petkov
Subject: Re: [PATCH 53 of 66] add numa awareness to hugepage allocations
Message-ID: <20101130190159.GJ30389@random.random>
In-Reply-To: <20101130093804.23f8c355.nishimura@mxp.nes.nec.co.jp>

On Tue, Nov 30, 2010 at 09:38:04AM +0900, Daisuke Nishimura wrote:
> I'm sorry if I miss something, "new_page" will be reused in !CONFIG_NUMA case
> as you say, but, in CONFIG_NUMA case, it is allocated in this function
> (collapse_huge_page()) by alloc_hugepage_vma(), and is not freed when memcg's
> charge failed.
> Actually, we do in collapse_huge_page():
>
> 	if (unlikely(!isolated)) {
> 		...
> #ifdef CONFIG_NUMA
> 		put_page(new_page);
> #endif
> 		goto out;
> 	}
>
> later.
> I think we need a similar logic in memcg's failure path too.

Apologies, you really found a minor memleak in case of memcg accounting
failure.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1726,7 +1726,7 @@ static void collapse_huge_page(struct mm
 	}
 #endif
 	if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL)))
-		goto out;
+		goto out_put_page;
 
 	anon_vma_lock(vma->anon_vma);
 
@@ -1755,10 +1755,7 @@ static void collapse_huge_page(struct mm
 		spin_unlock(&mm->page_table_lock);
 		anon_vma_unlock(vma->anon_vma);
 		mem_cgroup_uncharge_page(new_page);
-#ifdef CONFIG_NUMA
-		put_page(new_page);
-#endif
-		goto out;
+		goto out_put_page;
 	}
 
 	/*
@@ -1799,6 +1796,13 @@ static void collapse_huge_page(struct mm
 	khugepaged_pages_collapsed++;
 out:
 	up_write(&mm->mmap_sem);
+	return;
+
+out_put_page:
+#ifdef CONFIG_NUMA
+	put_page(new_page);
+#endif
+	goto out;
 }
 
 static int khugepaged_scan_pmd(struct mm_struct *mm,

I was too optimistic that there wasn't really a bug; I thought it was some
confusion about the hpage usage, which differs between the NUMA and
non-NUMA cases.

On a side note, the CONFIG_NUMA case will later change further to move the
allocation out of the mmap_sem write mode, to make filesystems that submit
I/O from userland and allocate memory in the I/O paths happier.