Subject: Re: [PATCH/RFC 6/14] Reclaim Scalability: "No Reclaim LRU Infrastructure"
From: Lee Schermerhorn
In-Reply-To:
References: <20070914205359.6536.98017.sendpatchset@localhost>
        <20070914205438.6536.49500.sendpatchset@localhost>
Content-Type: text/plain
Date: Mon, 17 Sep 2007 11:17:25 -0400
Message-Id: <1190042245.5460.81.camel@localhost>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Christoph Lameter
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, mel@csn.ul.ie,
        riel@redhat.com, balbir@linux.vnet.ibm.com, andrea@suse.de,
        a.p.zijlstra@chello.nl, eric.whitney@hp.com, npiggin@suse.de
List-ID:

On Fri, 2007-09-14 at 15:47 -0700, Christoph Lameter wrote:
> On Fri, 14 Sep 2007, Lee Schermerhorn wrote:
>
> > 1. for now, use bit 30 in page flags.  This restricts the no reclaim
> >    infrastructure to 64-bit systems.  [The mlock patch, later in this
> >    series, uses another of these 64-bit-system-only flags.]
> >
> >    Rationale:  32-bit systems have no free page flags and are less
> >    likely to have the large amounts of memory that exhibit the
> >    problems this series attempts to solve.  [I'm sure someone will
> >    disabuse me of this notion.]
> >
> >    Thus, NORECLAIM currently depends on [CONFIG_]64BIT.
>
> Hmmm.. Good, a creative solution to the page flag dilemma.

Still not sure I can get away with it tho' :-).

> > +#ifdef CONFIG_NORECLAIM
> > +static inline void
> > +add_page_to_noreclaim_list(struct zone *zone, struct page *page)
> > +{
> > +        add_page_to_lru_list(zone, page, LRU_NORECLAIM);
> > +}
> > +
> > +static inline void
> > +del_page_from_noreclaim_list(struct zone *zone, struct page *page)
> > +{
> > +        del_page_from_lru_list(zone, page, LRU_NORECLAIM);
> > +}
> > +#else
> > +static inline void
> > +add_page_to_noreclaim_list(struct zone *zone, struct page *page) { }
> > +
> > +static inline void
> > +del_page_from_noreclaim_list(struct zone *zone, struct page *page) { }
> > +#endif
> > +
>
> Do we really need to spell these out separately?

Well, you left the "{add|del}_page_{to|from}_[in]active_list()"
functions in place, so I kept these separate as well.  We could make
another cleanup pass and replace all of these with calls to the
"{add|del}_page_{to|from}_lru_list()" functions with the appropriate
list enum.  Also, nothing calls del_page_from_noreclaim_list() right
now, so we can probably lose it.

> > Index: Linux/mm/migrate.c
> > ===================================================================
> > --- Linux.orig/mm/migrate.c	2007-09-14 10:17:54.000000000 -0400
> > +++ Linux/mm/migrate.c	2007-09-14 10:21:48.000000000 -0400
> > @@ -52,13 +52,22 @@ int migrate_prep(void)
> >          return 0;
> >  }
> >
> > +/*
> > + * move_to_lru() - place @page onto appropriate lru list
> > + * based on preserved page flags:  active, noreclaim, none
> > + */
> >  static inline void move_to_lru(struct page *page)
> >  {
> > -        if (PageActive(page)) {
> > +        if (PageNoreclaim(page)) {
> > +                VM_BUG_ON(PageActive(page));
> > +                ClearPageNoreclaim(page);
> > +                lru_cache_add_noreclaim(page);
> > +        } else if (PageActive(page)) {
> >                  /*
> >                   * lru_cache_add_active checks that
> >                   * the PG_active bit is off.
> >                   */
> > +                VM_BUG_ON(PageNoreclaim(page));        /* race ? */
> >                  ClearPageActive(page);
> >                  lru_cache_add_active(page);
> >          } else {
>
> Could this be unified with the generic LRU handling in mm_inline.h?
> If you have a function that determines the LRU_xxx from the page
> flags then you can target the right list by indexing.
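
Something like the following, maybe?  [Untested sketch, not part of
this patch.  LRU_NORECLAIM and LRU_ACTIVE come from the list enum
earlier in this series; "enum lru_list" and LRU_INACTIVE are my
guesses at the other names.]

        /*
         * page_lru() - map a page's flags to the lru list it
         * belongs on.
         */
        static inline enum lru_list page_lru(struct page *page)
        {
                if (PageNoreclaim(page))
                        return LRU_NORECLAIM;   /* non-reclaimable */
                if (PageActive(page))
                        return LRU_ACTIVE;      /* active lru */
                return LRU_INACTIVE;            /* default:  inactive lru */
        }

Then the add/del sites could index the right list, e.g.:

        del_page_from_lru_list(zone, page, page_lru(page));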

>
> Maybe also create a generic lru_cache_add(page, list) function?

Possibly.  When you created the migration facility, you had these as
separate.  It's still private to migrate.c.

There are a number of different variants of this.  Here, we put the
page back onto the appropriate list based on the Active|Noreclaim
flag, preserved by isolate_lru_page().  In mm/mlock.c, I have a
similar function--putback_lru_page()--that just clears the flags and
calls lru_cache_add_active_or_noreclaim() to retest
page_reclaimable() and choose the appropriate list.  It never chooses
the inactive list, tho'.  Maybe that's a mistake?  Neither of these
is a particularly hot path, I think, so maybe we can come up with one
function that serves both purposes with a steering argument.  I'd
want to test it for performance regression, of course.

> > +         * Non-reclaimable pages shouldn't make it onto the inactive list,
> > +         * so if we encounter one, we should be scanning either the active
> > +         * list--e.g., after splicing noreclaim list to end of active list--
> > +         * or nearby pages [lumpy reclaim].  Take it only if scanning active
> > +         * list.
>
> One fleeting thought here: It may be useful to *not* allocate known
> unreclaimable pages with __GFP_MOVABLE.

Sorry, I don't understand where you're coming from here.
Non-reclaimable pages should be migratable, but maybe __GFP_MOVABLE
means something else?

> > @@ -670,6 +693,8 @@ int __isolate_lru_page(struct page *page
> >                  ret = 0;
> >          }
> >
> > +        if (TestClearPageNoreclaim(page))
> > +                SetPageActive(page);        /* will recheck in shrink_active_list */
> >          return ret;
> >  }
>
> Failing to do the isolation in vmscan.c is not an option?

1) This test doesn't fail the isolation.  It just ensures that the
noreclaim flag is cleared and, if it was set, replaces it with
Active.  I think this is OK because I only accept non-reclaimable
pages if we're scanning the active list.  This is in support of
splicing the noreclaim list onto the active list when we want to
rescan it.  As I mentioned in mail to Peter, I'm not too happy with
this approach--my current implementation, anyway.  Need to
revisit/discuss this...

2) Since the lumpy reclaim changes, __isolate_lru_page() CAN return
-EBUSY, and isolate_lru_pages() will just stick the page back on the
list being scanned.  We do this if the page state [active/inactive]
doesn't match the isolation "mode"--i.e., when lumpy reclaim is
looking for physically adjacent pages.  I also do this if the page is
non-reclaimable and isolate_lru_pages() doesn't specify that it's OK
to take them.  As mentioned above, it's only OK if we're scanning the
active list from shrink_active_list().  I think this whole thing is
fragile--thus my dissatisfaction...

> > @@ -843,6 +870,8 @@ int isolate_lru_page(struct page *page)
> >                          ClearPageLRU(page);
> >                          if (PageActive(page))
> >                                  del_page_from_active_list(zone, page);
> > +                        else if (PageNoreclaim(page))
> > +                                del_page_from_noreclaim_list(zone, page);
> >                          else
> >                                  del_page_from_inactive_list(zone, page);
> >                  }
>
> Another place where an indexing function from page flags to type of
> LRU list could simplify code.

Agreed.  Need another pass...  With an indexing helper like the
page_lru() sketch above, that whole if/else chain could collapse to
something like the snippet below.
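
[Untested; assumes the page_lru() sketch from my earlier reply:]

                        ClearPageLRU(page);
                        del_page_from_lru_list(zone, page,
                                                page_lru(page));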
> > > @@ -933,14 +962,21 @@ static unsigned long shrink_inactive_lis > > VM_BUG_ON(PageLRU(page)); > > SetPageLRU(page); > > list_del(&page->lru); > > - add_page_to_lru_list(zone, page, PageActive(page)); > > + if (PageActive(page)) { > > + VM_BUG_ON(PageNoreclaim(page)); > > + add_page_to_active_list(zone, page); > > + } else if (PageNoreclaim(page)) { > > + VM_BUG_ON(PageActive(page)); > > + add_page_to_noreclaim_list(zone, page); > > + } else > > + add_page_to_inactive_list(zone, page); > > if (!pagevec_add(&pvec, page)) { > > Ditto. > > > +void putback_all_noreclaim_pages(void) > > +{ > > + struct zone *zone; > > + > > + for_each_zone(zone) { > > + spin_lock(&zone->lru_lock); > > + > > + list_splice(&zone->list[LRU_NORECLAIM], > > + &zone->list[LRU_ACTIVE]); > > + INIT_LIST_HEAD(&zone->list[LRU_NORECLAIM]); > > + > > + zone_page_state_add(zone_page_state(zone, NR_NORECLAIM), zone, > > + NR_ACTIVE); > > + atomic_long_set(&zone->vm_stat[NR_NORECLAIM], 0); > > Racy if multiple reclaims are ongoing. Better subtract the value via > mod_zone_page_state OK. I'll make that change. But, again, I need to revisit this entire concept--splicing the noreclaim list back to the active. Thanks for the review. Lee -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org