From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from sd0109e.au.ibm.com (d23rh905.au.ibm.com [202.81.18.225])
	by e23smtp03.au.ibm.com (8.13.1/8.13.1) with ESMTP id l8J60kq4007619
	for ; Wed, 19 Sep 2007 16:00:46 +1000
Received: from d23av03.au.ibm.com (d23av03.au.ibm.com [9.190.234.97])
	by sd0109e.au.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l8J64J8w168270
	for ; Wed, 19 Sep 2007 16:04:20 +1000
Received: from d23av03.au.ibm.com (loopback [127.0.0.1])
	by d23av03.au.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l8J60TeC012921
	for ; Wed, 19 Sep 2007 16:00:29 +1000
Message-ID: <46F0BAF0.2020806@linux.vnet.ibm.com>
Date: Wed, 19 Sep 2007 11:30:16 +0530
From: Balbir Singh
Reply-To: balbir@linux.vnet.ibm.com
MIME-Version: 1.0
Subject: Re: [PATCH/RFC 6/14] Reclaim Scalability: "No Reclaim LRU Infrastructure"
References: <20070914205359.6536.98017.sendpatchset@localhost>
	<20070914205438.6536.49500.sendpatchset@localhost>
In-Reply-To: <20070914205438.6536.49500.sendpatchset@localhost>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Lee Schermerhorn
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, mel@csn.ul.ie,
	clameter@sgi.com, riel@redhat.com, andrea@suse.de,
	a.p.zijlstra@chello.nl, eric.whitney@hp.com, npiggin@suse.de
List-ID:

Lee Schermerhorn wrote:
> PATCH/RFC 06/14 Reclaim Scalability: "No Reclaim LRU Infrastructure"
>
> Against: 2.6.23-rc4-mm1
>
> Infrastructure to manage pages excluded from reclaim--i.e., hidden
> from vmscan. Based on a patch by Larry Woodman of Red Hat. Reworked
> to maintain "nonreclaimable" pages on a separate per-zone LRU list,
> to "hide" them from vmscan. A separate noreclaim pagevec is provided
> for shrink_active_list() to move nonreclaimable pages to the noreclaim
> list without over burdening the zone lru_lock.
>
> Pages on the noreclaim list have both PG_noreclaim and PG_lru set.
> Thus, PG_noreclaim is analogous to and mutually exclusive with
> PG_active--it specifies which LRU list the page is on.
>
> The noreclaim infrastructure is enabled by a new mm Kconfig option
> [CONFIG_]NORECLAIM.
>

Could we use a different name? CONFIG_NORECLAIM could be misread as
meaning that reclaim is disabled on the system altogether.

>
> 4. TODO: Memory Controllers maintain separate active and inactive lists.
>    Need to consider whether they should also maintain a noreclaim list.
>    Also, convert to use Christoph's array of indexed lru variables?
>
>    See //TODO note in mm/memcontrol.c re: isolating non-reclaimable
>    pages.
>

Thanks, I'll look into exploiting this in the memory controller.

> Index: Linux/mm/swap.c
> ===================================================================
> --- Linux.orig/mm/swap.c 2007-09-14 10:21:45.000000000 -0400
> +++ Linux/mm/swap.c 2007-09-14 10:21:48.000000000 -0400
> @@ -116,14 +116,14 @@ int rotate_reclaimable_page(struct page
>  		return 1;
>  	if (PageDirty(page))
>  		return 1;
> -	if (PageActive(page))
> +	if (PageActive(page) | PageNoreclaim(page))

Did you intend to make this a bitwise or?

> -	if (PageLRU(page) && !PageActive(page)) {
> +	if (PageLRU(page) && !PageActive(page) && !PageNoreclaim(page)) {

Since we use this test again below, does it make sense to wrap it in an
inline function and call it check_page_lru_inactive_reclaimable()?

>  void lru_add_drain(void)
> @@ -277,14 +312,18 @@ void release_pages(struct page **pages,
>
>  		if (PageLRU(page)) {
>  			struct zone *pagezone = page_zone(page);
> +			int is_lru_page;
> +
>  			if (pagezone != zone) {
>  				if (zone)
>  					spin_unlock_irq(&zone->lru_lock);
>  				zone = pagezone;
>  				spin_lock_irq(&zone->lru_lock);
>  			}
> -			VM_BUG_ON(!PageLRU(page));
> -			__ClearPageLRU(page);
> +			is_lru_page = PageLRU(page);
> +			VM_BUG_ON(!(is_lru_page));
> +			if (is_lru_page)

This is a little confusing: after asserting that the page is indeed on
the LRU, why add the check for is_lru_page again?
Comments will be helpful here.

> +#ifdef CONFIG_NORECLAIM
> +void __pagevec_lru_add_noreclaim(struct pagevec *pvec)
> +{
> +	int i;
> +	struct zone *zone = NULL;
> +
> +	for (i = 0; i < pagevec_count(pvec); i++) {
> +		struct page *page = pvec->pages[i];
> +		struct zone *pagezone = page_zone(page);
> +
> +		if (pagezone != zone) {
> +			if (zone)
> +				spin_unlock_irq(&zone->lru_lock);
> +			zone = pagezone;
> +			spin_lock_irq(&zone->lru_lock);
> +		}
> +		VM_BUG_ON(PageLRU(page));
> +		SetPageLRU(page);
> +		VM_BUG_ON(PageActive(page) || PageNoreclaim(page));
> +		SetPageNoreclaim(page);
> +		add_page_to_noreclaim_list(zone, page);

These two calls seem to be the only difference between __pagevec_lru_add
and this routine. Any chance we could refactor to reuse most of the code?
Something like __pagevec_lru_add_prepare(), do the stuff and then call
__pagevec_lru_add_finish().

> +/*
> + * move_to_lru() - place @page onto appropriate lru list
> + * based on preserved page flags: active, noreclaim, none
> + */
>  static inline void move_to_lru(struct page *page)
>  {
> -	if (PageActive(page)) {
> +	if (PageNoreclaim(page)) {
> +		VM_BUG_ON(PageActive(page));
> +		ClearPageNoreclaim(page);
> +		lru_cache_add_noreclaim(page);

I know that lru_cache_add_noreclaim() does the right thing by looking at
PageNoreclaim(), but the sequence is a little confusing to read.

> -int __isolate_lru_page(struct page *page, int mode)
> +int __isolate_lru_page(struct page *page, int mode, int take_nonreclaimable)
>  {
>  	int ret = -EINVAL;
>
> @@ -652,12 +660,27 @@ int __isolate_lru_page(struct page *page
>  		return ret;
>
>  	/*
> -	 * When checking the active state, we need to be sure we are
> -	 * dealing with comparible boolean values. Take the logical not
> -	 * of each.
> +	 * Non-reclaimable pages shouldn't make it onto the inactive list,
> +	 * so if we encounter one, we should be scanning either the active
> +	 * list--e.g., after splicing noreclaim list to end of active list--
> +	 * or nearby pages [lumpy reclaim]. Take it only if scanning active
> +	 * list.
>  	 */
> -	if (mode != ISOLATE_BOTH && (!PageActive(page) != !mode))
> -		return ret;
> +	if (PageNoreclaim(page)) {
> +		if (!take_nonreclaimable)
> +			return -EBUSY; /* lumpy reclaim -- skip this page */
> +		/*
> +		 * else fall thru' and try to isolate
> +		 */

I think we need to distinguish between the types of nonreclaimable
pages. Is it the heavily mapped pages that you pass on further?
A casual reader like me finds it hard to understand how lumpy reclaim
might try to reclaim a non-reclaimable page :-)

--
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body
to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
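
For illustration only, here is a minimal user-space sketch of the two
mm/swap.c points raised above: the helper proposed in place of the
repeated PageLRU/PageActive/PageNoreclaim test, and the `|` versus `||`
question. The struct page, the flag bit values, and the skip_rotate_*()
and noreclaim_sketch_selfcheck() names are toy assumptions invented for
this sketch, not the kernel's definitions;
check_page_lru_inactive_reclaimable() is just the name suggested in the
review.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel page flags; NOT the real bit layout. */
enum {
	PG_lru       = 1u << 0,
	PG_active    = 1u << 1,
	PG_noreclaim = 1u << 2,
};

/* Hypothetical, stripped-down page: only a flags word. */
struct page {
	unsigned int flags;
};

/* Flag tests return "nonzero if set", which is not necessarily 1. */
static inline int PageLRU(const struct page *page)
{
	return page->flags & PG_lru;
}

static inline int PageActive(const struct page *page)
{
	return page->flags & PG_active;
}

static inline int PageNoreclaim(const struct page *page)
{
	return page->flags & PG_noreclaim;
}

/* Helper name proposed in the review: on the LRU, inactive, reclaimable. */
static inline bool check_page_lru_inactive_reclaimable(const struct page *page)
{
	return PageLRU(page) && !PageActive(page) && !PageNoreclaim(page);
}

/* The test as written in the patch: bitwise OR of the two flag tests. */
static inline bool skip_rotate_bitwise(const struct page *page)
{
	return (PageActive(page) | PageNoreclaim(page)) != 0;
}

/* The same test spelled with logical OR, which short-circuits. */
static inline bool skip_rotate_logical(const struct page *page)
{
	return PageActive(page) || PageNoreclaim(page);
}

/* Compare the two spellings over every combination of the three flags. */
void noreclaim_sketch_selfcheck(void)
{
	for (unsigned int flags = 0; flags < 8; flags++) {
		struct page page = { .flags = flags };

		assert(skip_rotate_bitwise(&page) == skip_rotate_logical(&page));
	}
}
```

In a boolean context the bitwise form gives the same truth value as `||`,
because OR-ing a nonzero value with anything stays nonzero. The practical
differences are that `||` short-circuits and reads as the logical test
that was presumably intended.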