From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35])
	by kanga.kvack.org (Postfix) with SMTP id 2D8C06B0022
	for ; Mon, 9 May 2011 15:37:16 -0400 (EDT)
Date: Mon, 9 May 2011 15:36:50 -0400
From: Stephen Wilson
Subject: Re: [PATCH 2/8] mm: use walk_page_range() instead of custom page table walking code
Message-ID: <20110509193650.GA2865@wicker.gateway.2wire.net>
References: <1303947349-3620-1-git-send-email-wilsons@start.ca>
	<1303947349-3620-3-git-send-email-wilsons@start.ca>
	<20110509164034.164C.A69D9226@jp.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110509164034.164C.A69D9226@jp.fujitsu.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: KOSAKI Motohiro
Cc: Stephen Wilson, Andrew Morton, Alexander Viro, Hugh Dickins,
	David Rientjes, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Mon, May 09, 2011 at 04:38:49PM +0900, KOSAKI Motohiro wrote:
> Hello,
>
> sorry for the long delay.

Please, no apologies.  Thank you for the review!

> > In the specific case of show_numa_map(), the custom page table walking
> > logic implemented in mempolicy.c does not provide any special service
> > beyond that provided by walk_page_range().
> >
> > Also, converting show_numa_map() to use the generic routine decouples
> > the function from mempolicy.c, allowing it to be moved out of the mm
> > subsystem and into fs/proc.
> >
> > Signed-off-by: Stephen Wilson
> > ---
> >  mm/mempolicy.c |   53 ++++++++++++++++++++++++++++++++++++++++++++++-------
> >  1 files changed, 46 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 5bfb03e..dfe27e3 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -2568,6 +2568,22 @@ static void gather_stats(struct page *page, void *private, int pte_dirty)
> >  	md->node[page_to_nid(page)]++;
> >  }
> >
> > +static int gather_pte_stats(pte_t *pte, unsigned long addr,
> > +		unsigned long pte_size, struct mm_walk *walk)
> > +{
> > +	struct page *page;
> > +
> > +	if (pte_none(*pte))
> > +		return 0;
> > +
> > +	page = pte_page(*pte);
> > +	if (!page)
> > +		return 0;
>
> original check_pte_range() has following logic.
>
> 	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> 	do {
> 		struct page *page;
> 		int nid;
>
> 		if (!pte_present(*pte))
> 			continue;
> 		page = vm_normal_page(vma, addr, *pte);
> 		if (!page)
> 			continue;
> 		/*
> 		 * vm_normal_page() filters out zero pages, but there might
> 		 * still be PageReserved pages to skip, perhaps in a VDSO.
> 		 * And we cannot move PageKsm pages sensibly or safely yet.
> 		 */
> 		if (PageReserved(page) || PageKsm(page))
> 			continue;
> 		gather_stats(page, private, pte_dirty(*pte));
>
> Why did you drop a lot of check? Is it safe?

I must have been confused.  For one, walk_page_range() does not even lock
the pmd entry when iterating over the pte's.  I completely overlooked that
fact, and with that the series is totally broken.

I am currently testing a slightly reworked set based on the following
variation.  When finished I will send v2 of the series which will address
all issues raised so far.

Thanks again for the review!
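The rough shape of that variation is below -- a sketch only, not the tested
patch.  The idea is to hook in at the pmd level so the pte iteration happens
under pte_offset_map_lock(), and to restore the vm_normal_page() /
PageReserved / PageKsm checks quoted above.  The gather_stats_private struct
is just illustrative plumbing so the callback can see both the numa_maps
stats and the vma; the v2 patch may arrange this differently.

	/*
	 * Sketch only: illustrative plumbing for the pmd-level callback.
	 * THP splitting and other details are omitted here.
	 */
	struct gather_stats_private {
		struct numa_maps *md;
		struct vm_area_struct *vma;
	};

	static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
			unsigned long end, struct mm_walk *walk)
	{
		struct gather_stats_private *priv = walk->private;
		spinlock_t *ptl;
		pte_t *orig_pte;
		pte_t *pte;

		/* Iterate the ptes under the pte lock, as check_pte_range() did. */
		orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
		do {
			struct page *page;

			if (!pte_present(*pte))
				continue;

			page = vm_normal_page(priv->vma, addr, *pte);
			if (!page)
				continue;

			/*
			 * vm_normal_page() filters out zero pages, but there might
			 * still be PageReserved pages to skip, perhaps in a VDSO.
			 * And we cannot move PageKsm pages sensibly or safely yet.
			 */
			if (PageReserved(page) || PageKsm(page))
				continue;

			gather_stats(page, priv->md, pte_dirty(*pte));
		} while (pte++, addr += PAGE_SIZE, addr != end);
		pte_unmap_unlock(orig_pte, ptl);

		return 0;
	}

The caller side then registers this as the .pmd_entry callback (again, just
to illustrate the shape):

	struct gather_stats_private priv = { .md = md, .vma = vma };
	struct mm_walk walk = {
		.pmd_entry	= gather_pte_stats,
		.mm		= vma->vm_mm,
		.private	= &priv,
	};

	walk_page_range(vma->vm_start, vma->vm_end, &walk);

That way walk_page_range() hands the callback whole pmd ranges and the
locking stays in one place, mirroring what check_pte_range() did.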