linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: balbir@linux.vnet.ibm.com
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, mel@csn.ul.ie,
	clameter@sgi.com, riel@redhat.com, andrea@suse.de,
	a.p.zijlstra@chello.nl, eric.whitney@hp.com, npiggin@suse.de
Subject: Re: [PATCH/RFC 6/14] Reclaim Scalability: "No Reclaim LRU Infrastructure"
Date: Wed, 19 Sep 2007 10:47:29 -0400	[thread overview]
Message-ID: <1190213249.5301.25.camel@localhost> (raw)
In-Reply-To: <46F0BAF0.2020806@linux.vnet.ibm.com>

On Wed, 2007-09-19 at 11:30 +0530, Balbir Singh wrote:
> Lee Schermerhorn wrote:
> > PATCH/RFC 06/14 Reclaim Scalability: "No Reclaim LRU Infrastructure"
> > 
> > Against:  2.6.23-rc4-mm1
> > 
> > Infrastructure to manage pages excluded from reclaim--i.e., hidden
> > from vmscan.  Based on a patch by Larry Woodman of Red Hat. Reworked
> > to maintain "nonreclaimable" pages on a separate per-zone LRU list,
> > to "hide" them from vmscan.  A separate noreclaim pagevec is provided
> > for shrink_active_list() to move nonreclaimable pages to the noreclaim
> > list without over burdening the zone lru_lock.
> > 
> > Pages on the noreclaim list have both PG_noreclaim and PG_lru set.
> > Thus, PG_noreclaim is analogous to and mutually exclusive with
> > PG_active--it specifies which LRU list the page is on.  
> > 
> > The noreclaim infrastructure is enabled by a new mm Kconfig option
> > [CONFIG_]NORECLAIM.
> > 
> 
> Could we use a different name. CONFIG_NORECLAIM could be misunderstood
> to be that reclaim is disabled on the system all together.

Sure.  When this settles down, if something like it gets accepted, we
can choose a different name--if we still want it to be configurable.


<snip>
> > Index: Linux/mm/swap.c
> > ===================================================================
> > --- Linux.orig/mm/swap.c	2007-09-14 10:21:45.000000000 -0400
> > +++ Linux/mm/swap.c	2007-09-14 10:21:48.000000000 -0400
> > @@ -116,14 +116,14 @@ int rotate_reclaimable_page(struct page 
> >  		return 1;
> >  	if (PageDirty(page))
> >  		return 1;
> > -	if (PageActive(page))
> > +	if (PageActive(page) | PageNoreclaim(page))
> 
> Did you intend to make this bitwise or?

Uh, no...  Thanks.  will fix.

> 
> > -	if (PageLRU(page) && !PageActive(page)) {
> > +	if (PageLRU(page) && !PageActive(page) && !PageNoreclaim(page)) {
> 
> Since we use this even below, does it make sense to wrap it into an
> inline function and call it check_page_lru_inactive_reclaimable()?

Perhaps.  But sometimes folks complain that the kernel is programmed in
C not in cpp.   So, I tend to err on the side of open coding..

> 
> 
> >  void lru_add_drain(void)
> > @@ -277,14 +312,18 @@ void release_pages(struct page **pages, 
> > 
> >  		if (PageLRU(page)) {
> >  			struct zone *pagezone = page_zone(page);
> > +			int is_lru_page;
> > +
> >  			if (pagezone != zone) {
> >  				if (zone)
> >  					spin_unlock_irq(&zone->lru_lock);
> >  				zone = pagezone;
> >  				spin_lock_irq(&zone->lru_lock);
> >  			}
> > -			VM_BUG_ON(!PageLRU(page));
> > -			__ClearPageLRU(page);
> > +			is_lru_page = PageLRU(page);
> > +			VM_BUG_ON(!(is_lru_page));
> > +			if (is_lru_page)
> 
> This is a little confusing, after asserting that the page
> is indeed in LRU, why add the check for is_lru_page again?
> Comments will be helpful here.

Not sure why I did this.  Maybe a hold over from previous code.  I'll
check and fix or comment.

> 
> 
> > +#ifdef CONFIG_NORECLAIM
> > +void __pagevec_lru_add_noreclaim(struct pagevec *pvec)
> > +{
> > +	int i;
> > +	struct zone *zone = NULL;
> > +
> > +	for (i = 0; i < pagevec_count(pvec); i++) {
> > +		struct page *page = pvec->pages[i];
> > +		struct zone *pagezone = page_zone(page);
> > +
> > +		if (pagezone != zone) {
> > +			if (zone)
> > +				spin_unlock_irq(&zone->lru_lock);
> > +			zone = pagezone;
> > +			spin_lock_irq(&zone->lru_lock);
> > +		}
> > +		VM_BUG_ON(PageLRU(page));
> > +		SetPageLRU(page);
> 
> > +		VM_BUG_ON(PageActive(page) || PageNoreclaim(page));
> > +		SetPageNoreclaim(page);
> > +		add_page_to_noreclaim_list(zone, page);
> 
> These two calls seem to be the only difference between __pagevec_lru_add
> and this routine, any chance we could refactor to reuse most of the
> code? Something like __pagevec_lru_add_prepare(), do the stuff and
> then call __pagevec_lru_add_finish()

Yeah.   There is a lot of duplicated code in the lru pagevec management.
I assumed this was intentional because it's fast path [fault path] code.
As the number of lru lists increases [Rik has a patch to double the
number of active/inactive lists per zone] we may want to factor this
area and the loops at the end of shrink_active_list() that distribute
pages back to the appropriate lists.  Have to be careful of performance
regression, tho'.

> 
> 
> > +/*
> > + * move_to_lru() - place @page onto appropriate lru list
> > + * based on preserved page flags:  active, noreclaim, none
> > + */
> >  static inline void move_to_lru(struct page *page)
> >  {
> > -	if (PageActive(page)) {
> > +	if (PageNoreclaim(page)) {
> > +		VM_BUG_ON(PageActive(page));
> > +		ClearPageNoreclaim(page);
> > +		lru_cache_add_noreclaim(page);
> 
> I know that lru_cache_add_noreclaim() does the right thing
> by looking at PageNoReclaim(), but the sequence is a little
> confusing to read.

I don't understand your comment.  I was just following the pre-existing
pattern in move_to_lru().  This function just moves pages that have been
isolated for migration back to the appropriate lru list based on the
page flags via the pagevec.  The page flag that determines the
"appropriate list" must be cleared to avoid a VM_BUG_ON later.  The
VM_BUG_ON here is just my paranoia--to ensure that I don't have both
PG_active and PG_noreclaim set at the same time.

> 
> 
> > -int __isolate_lru_page(struct page *page, int mode)
> > +int __isolate_lru_page(struct page *page, int mode, int take_nonreclaimable)
> >  {
> >  	int ret = -EINVAL;
> > 
> > @@ -652,12 +660,27 @@ int __isolate_lru_page(struct page *page
> >  		return ret;
> > 
> >  	/*
> > -	 * When checking the active state, we need to be sure we are
> > -	 * dealing with comparible boolean values.  Take the logical not
> > -	 * of each.
> > +	 * Non-reclaimable pages shouldn't make it onto the inactive list,
> > +	 * so if we encounter one, we should be scanning either the active
> > +	 * list--e.g., after splicing noreclaim list to end of active list--
> > +	 * or nearby pages [lumpy reclaim].  Take it only if scanning active
> > +	 * list.
> >  	 */
> > -	if (mode != ISOLATE_BOTH && (!PageActive(page) != !mode))
> > -		return ret;
> > +	if (PageNoreclaim(page)) {
> > +		if (!take_nonreclaimable)
> > +			return -EBUSY;	/* lumpy reclaim -- skip this page */
> > +		/*
> > +		 * else fall thru' and try to isolate
> > +		 */
> 
> I think we need to distinguish between the types of nonreclaimable
> pages. Is it the heavily mapped pages that you pass on further?
> A casual reader like me finds it hard to understand how lumpy reclaim
> might try to reclaim a non-reclaimable page :-)

If you look at isolate_lru_pages(), after it calls __isolate_lru_page()
to isolate a page off the list that it's scanning, if order is non-zero,
it attempts to isolate other pages that are part of the same higher
order page, w/o regard to what list they're on.  If it succeeded in
taking a non-reclaimable page here, this page would have to be tested
again in shrink_active_list() where it would probably be found to still
be non-reclaimable [but maybe not--more on this below] and we would have
wasted part of this batch.

I explicitly allowed isolating non-reclaimable pages from the active
list, because currently I have a function to splice the zones' noreclaim
lists to the end of the active list for another check for
reclaimability.  This would occur when swap was added, etc.  Note that
when I do take a non-reclaimable page, I reset PG_noreclaim [via
TestClear...] and set PG_active if it was non-reclaimable.

Now, I don't have a feel for how frequent lumpy reclaim will be, but
maybe it's not so bad to just allow non-reclaimable pages to be
unconditionally isolated and rechecked for reclaimability in
shrink_[in]active_list().  Just, need to ensure that the page flags
[active or not] are set correctly depending on which list we're
scanning.

Thanks,
Lee

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-09-19 14:47 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-09-14 20:53 [PATCH/RFC 0/14] Page Reclaim Scalability Lee Schermerhorn
2007-09-14 20:54 ` [PATCH/RFC 1/14] Reclaim Scalability: Convert anon_vma lock to read/write lock Lee Schermerhorn
2007-09-17 11:02   ` Mel Gorman
2007-09-18  2:41     ` KAMEZAWA Hiroyuki
2007-09-18 11:01       ` Mel Gorman
2007-09-18 14:57         ` Rik van Riel
2007-09-18 15:37       ` Lee Schermerhorn
2007-09-18 20:17     ` Lee Schermerhorn
2007-09-20 10:19       ` Mel Gorman
2007-09-14 20:54 ` [PATCH/RFC 2/14] Reclaim Scalability: convert inode i_mmap_lock to reader/writer lock Lee Schermerhorn
2007-09-17 12:53   ` Mel Gorman
2007-09-20  1:24   ` Andrea Arcangeli
2007-09-20 14:10     ` Lee Schermerhorn
2007-09-20 14:16       ` Andrea Arcangeli
2007-09-14 20:54 ` [PATCH/RFC 3/14] Reclaim Scalability: move isolate_lru_page() to vmscan.c Lee Schermerhorn
2007-09-14 21:34   ` Peter Zijlstra
2007-09-15  1:55     ` Rik van Riel
2007-09-17 14:11     ` Lee Schermerhorn
2007-09-17  9:20   ` Balbir Singh
2007-09-17 19:19     ` Lee Schermerhorn
2007-09-14 20:54 ` [PATCH/RFC 4/14] Reclaim Scalability: Define page_anon() function Lee Schermerhorn
2007-09-15  2:00   ` Rik van Riel
2007-09-17 13:19   ` Mel Gorman
2007-09-18  1:58   ` KAMEZAWA Hiroyuki
2007-09-18  2:27     ` Rik van Riel
2007-09-18  2:40       ` KAMEZAWA Hiroyuki
2007-09-18 15:04     ` Lee Schermerhorn
2007-09-18 19:41       ` Christoph Lameter
2007-09-19  0:30       ` KAMEZAWA Hiroyuki
2007-09-19 16:58         ` Lee Schermerhorn
2007-09-20  0:56           ` KAMEZAWA Hiroyuki
2007-09-14 20:54 ` [PATCH/RFC 5/14] Reclaim Scalability: Use an indexed array for LRU variables Lee Schermerhorn
2007-09-17 13:40   ` Mel Gorman
2007-09-17 14:17     ` Lee Schermerhorn
2007-09-17 14:39       ` Lee Schermerhorn
2007-09-17 18:58   ` Balbir Singh
2007-09-17 19:12     ` Lee Schermerhorn
2007-09-17 19:36       ` Balbir Singh
2007-09-17 19:36     ` Rik van Riel
2007-09-17 20:21       ` Balbir Singh
2007-09-17 21:01         ` Rik van Riel
2007-09-14 20:54 ` [PATCH/RFC 6/14] Reclaim Scalability: "No Reclaim LRU Infrastructure" Lee Schermerhorn
2007-09-14 22:47   ` Christoph Lameter
2007-09-17 15:17     ` Lee Schermerhorn
2007-09-17 18:41       ` Christoph Lameter
2007-09-18  9:54         ` Mel Gorman
2007-09-18 19:45           ` Christoph Lameter
2007-09-19 11:11             ` Mel Gorman
2007-09-19 18:03               ` Christoph Lameter
2007-09-19  6:00   ` Balbir Singh
2007-09-19 14:47     ` Lee Schermerhorn [this message]
2007-09-14 20:54 ` [PATCH/RFC 7/14] Reclaim Scalability: Non-reclaimable page statistics Lee Schermerhorn
2007-09-17  1:56   ` Rik van Riel
2007-09-14 20:54 ` [PATCH/RFC 8/14] Reclaim Scalability: Ram Disk Pages are non-reclaimable Lee Schermerhorn
2007-09-17  1:57   ` Rik van Riel
2007-09-17 14:40     ` Lee Schermerhorn
2007-09-17 18:42       ` Christoph Lameter
2007-09-14 20:54 ` [PATCH/RFC 9/14] Reclaim Scalability: SHM_LOCKED pages are nonreclaimable Lee Schermerhorn
2007-09-17  2:18   ` Rik van Riel
2007-09-14 20:55 ` [PATCH/RFC 10/14] Reclaim Scalability: track anon_vma "related vmas" Lee Schermerhorn
2007-09-17  2:52   ` Rik van Riel
2007-09-17 15:52     ` Lee Schermerhorn
2007-09-14 20:55 ` [PATCH/RFC 11/14] Reclaim Scalability: swap backed pages are nonreclaimable when no swap space available Lee Schermerhorn
2007-09-17  2:53   ` Rik van Riel
2007-09-18 17:46     ` Lee Schermerhorn
2007-09-18 20:01       ` Rik van Riel
2007-09-19 14:55         ` Lee Schermerhorn
2007-09-18  2:59   ` KAMEZAWA Hiroyuki
2007-09-18 15:47     ` Lee Schermerhorn
2007-09-14 20:55 ` [PATCH/RFC 12/14] Reclaim Scalability: Non-reclaimable Mlock'ed pages Lee Schermerhorn
2007-09-14 20:55 ` [PATCH/RFC 13/14] Reclaim Scalability: Handle Mlock'ed pages during map/unmap and truncate Lee Schermerhorn
2007-09-14 20:55 ` [PATCH/RFC 14/14] Reclaim Scalability: cull non-reclaimable anon pages in fault path Lee Schermerhorn
2007-09-14 21:11 ` [PATCH/RFC 0/14] Page Reclaim Scalability Peter Zijlstra
2007-09-14 21:42   ` Linus Torvalds
2007-09-14 22:02     ` Peter Zijlstra
2007-09-15  0:07       ` Linus Torvalds
2007-09-17  6:44 ` Balbir Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1190213249.5301.25.camel@localhost \
    --to=lee.schermerhorn@hp.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=andrea@suse.de \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=clameter@sgi.com \
    --cc=eric.whitney@hp.com \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    --cc=npiggin@suse.de \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).