linux-mm.kvack.org archive mirror
From: Minchan Kim <minchan@kernel.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Richard Davies <richard@arachsys.com>,
	Shaohua Li <shli@kernel.org>, Rik van Riel <riel@redhat.com>,
	Avi Kivity <avi@redhat.com>, QEMU-devel <qemu-devel@nongnu.org>,
	KVM <kvm@vger.kernel.org>, Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 6/9] mm: compaction: Acquire the zone->lock as late as possible
Date: Tue, 25 Sep 2012 16:36:54 +0900	[thread overview]
Message-ID: <20120925073654.GN13234@bbox> (raw)
In-Reply-To: <20120924085238.GY11266@suse.de>

On Mon, Sep 24, 2012 at 09:52:38AM +0100, Mel Gorman wrote:
> On Fri, Sep 21, 2012 at 02:35:57PM -0700, Andrew Morton wrote:
> > On Fri, 21 Sep 2012 11:46:20 +0100
> > Mel Gorman <mgorman@suse.de> wrote:
> > 
> > > Compaction's free scanner acquires the zone->lock when checking for PageBuddy
> > > pages and isolating them. It does this even if there are no PageBuddy pages
> > > in the range.
> > > 
> > > This patch defers acquiring the zone lock for as long as possible. In the
> > > event there are no free pages in the pageblock then the lock will not be
> > > acquired at all which reduces contention on zone->lock.
> > >
> > > ...
> > >
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -93,6 +93,28 @@ static inline bool compact_trylock_irqsave(spinlock_t *lock,
> > >  	return compact_checklock_irqsave(lock, flags, false, cc);
> > >  }
> > >  
> > > +/* Returns true if the page is within a block suitable for migration to */
> > > +static bool suitable_migration_target(struct page *page)
> > > +{
> > > +
> > 
> > stray newline
> > 
> 
> Fixed.
> 
> > > +	int migratetype = get_pageblock_migratetype(page);
> > > +
> > > +	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
> > > +	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
> > > +		return false;
> > > +
> > > +	/* If the page is a large free page, then allow migration */
> > > +	if (PageBuddy(page) && page_order(page) >= pageblock_order)
> > > +		return true;
> > > +
> > > +	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
> > > +	if (migrate_async_suitable(migratetype))
> > > +		return true;
> > > +
> > > +	/* Otherwise skip the block */
> > > +	return false;
> > > +}
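For illustration, the decision logic of suitable_migration_target() above can be modeled in plain user-space C. The enum values, struct fields, and the pageblock order here are stand-ins invented for this sketch, not the kernel's definitions; only the control flow mirrors the quoted function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's migratetype constants;
 * the values and names are illustrative only. */
enum migratetype { MT_MOVABLE, MT_RECLAIMABLE, MT_ISOLATE, MT_RESERVE, MT_CMA };

struct model_page {
	enum migratetype mt;	/* pageblock migratetype */
	bool buddy;		/* stands in for PageBuddy() */
	int order;		/* buddy order, meaningful only if buddy */
};

#define MODEL_PAGEBLOCK_ORDER 9	/* a typical pageblock_order, assumed */

/* Mirrors the control flow of suitable_migration_target() above. */
static bool model_suitable_target(const struct model_page *p)
{
	/* Don't interfere with memory hot-remove or min_free_kbytes blocks */
	if (p->mt == MT_ISOLATE || p->mt == MT_RESERVE)
		return false;

	/* A free page spanning the whole pageblock is a fine target */
	if (p->buddy && p->order >= MODEL_PAGEBLOCK_ORDER)
		return true;

	/* MOVABLE and CMA blocks allow migration */
	if (p->mt == MT_MOVABLE || p->mt == MT_CMA)
		return true;

	/* Otherwise skip the block */
	return false;
}
```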
> > > +
> > >
> > > ...
> > >
> > > @@ -168,23 +193,38 @@ static unsigned long isolate_freepages_block(unsigned long blockpfn,
> > >  		int isolated, i;
> > >  		struct page *page = cursor;
> > >  
> > > -		if (!pfn_valid_within(blockpfn)) {
> > > -			if (strict)
> > > -				return 0;
> > > -			continue;
> > > -		}
> > > +		if (!pfn_valid_within(blockpfn))
> > > +			goto strict_check;
> > >  		nr_scanned++;
> > >  
> > > -		if (!PageBuddy(page)) {
> > > -			if (strict)
> > > -				return 0;
> > > -			continue;
> > > -		}
> > > +		if (!PageBuddy(page))
> > > +			goto strict_check;
> > > +
> > > +		/*
> > > +		 * The zone lock must be held to isolate freepages. This
> > > +		 * unfortunately this is a very coarse lock and can be
> > 
> > this this
> > 
> 
> Fixed.
> 
> > > +		 * heavily contended if there are parallel allocations
> > > +		 * or parallel compactions. For async compaction do not
> > > +		 * spin on the lock and we acquire the lock as late as
> > > +		 * possible.
> > > +		 */
> > > +		locked = compact_checklock_irqsave(&cc->zone->lock, &flags,
> > > +								locked, cc);
> > > +		if (!locked)
> > > +			break;
> > > +
> > > +		/* Recheck this is a suitable migration target under lock */
> > > +		if (!strict && !suitable_migration_target(page))
> > > +			break;
> > > +
> > > +		/* Recheck this is a buddy page under lock */
> > > +		if (!PageBuddy(page))
> > > +			goto strict_check;
> > >  
> > >  		/* Found a free page, break it into order-0 pages */
> > >  		isolated = split_free_page(page);
> > >  		if (!isolated && strict)
> > > -			return 0;
> > > +			goto strict_check;
> > >  		total_isolated += isolated;
> > >  		for (i = 0; i < isolated; i++) {
> > >  			list_add(&page->lru, freelist);
> > > @@ -196,9 +236,23 @@ static unsigned long isolate_freepages_block(unsigned long blockpfn,
> > >  			blockpfn += isolated - 1;
> > >  			cursor += isolated - 1;
> > >  		}
> > > +
> > > +		continue;
> > > +
> > > +strict_check:
> > > +		/* Abort isolation if the caller requested strict isolation */
> > > +		if (strict) {
> > > +			total_isolated = 0;
> > > +			goto out;
> > > +		}
> > >  	}
> > >  
> > >  	trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated);
> > > +
> > > +out:
> > > +	if (locked)
> > > +		spin_unlock_irqrestore(&cc->zone->lock, flags);
> > > +
> > >  	return total_isolated;
> > >  }
> > 
> > A few things about this function.
> > 
> > Would it be cleaner if we did
> > 
> > 	if (!strict) {
> > 		if (!suitable_migration_target(page))
> > 			break;
> > 	} else {
> > 		if (!PageBuddy(page))
> > 			goto strict_check;
> > 	}
> > 
> > and then remove the test of `strict' from strict_check (which then
> > should be renamed)?
> > 
> 
> I was not able to implement what you suggested without breakage.
> However, I can do something very similar if "strict" mode does not bail
> out ASAP and instead double checks at the end that everything was
> isolated correctly. This does mean that CMA's failure case is slightly
> more expensive but that really is a corner case and I think it's
> acceptable if the code is easier to follow as a result.
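The pattern Mel describes here can be sketched in user-space C: the scan loop just records what it saw and what it took, and a strict caller (CMA) performs a single failure check at the end instead of bailing out mid-loop. Everything below is an illustrative model, not the kernel code; the names and the bool array standing in for PageBuddy() are made up.

```c
#include <assert.h>
#include <stdbool.h>

struct scan_result {
	unsigned long scanned;
	unsigned long isolated;
};

/* Scan a "block": free_map[i] stands in for PageBuddy(page). */
static struct scan_result scan_block(const bool *free_map, unsigned long n)
{
	struct scan_result r = { 0, 0 };
	unsigned long i;

	for (i = 0; i < n; i++) {
		r.scanned++;
		if (free_map[i])
			r.isolated++;	/* would isolate the free page */
	}
	return r;
}

/*
 * Strict callers require the whole range to have been isolated and get
 * nothing otherwise; async callers take whatever was isolated. The
 * strict check happens once, at the end of the scan.
 */
static unsigned long finish(struct scan_result r, bool strict,
			    unsigned long want)
{
	if (strict && r.isolated != want)
		return 0;
	return r.isolated;
}
```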
> 
> > Which perhaps means that the whole `strict_check' block can go away:
> > 
> > 	if (!strict) {
> > 		if (!suitable_migration_target(page))
> > 			break;
> > 	} else {
> > 		if (!PageBuddy(page)) {
> > 			total_isolated = 0;
> > 			goto out;
> > 		}
> > 	}
> > 
> > Have a think about it?  The function is a little straggly.
> > 
> 
> Proposal below.
> 
> > Secondly, is it correct/desirable to skip the (now misnamed
> > `trace_mm_compaction_isolate_freepages') tracepoint generation if we
> > bailed out of the loop?  The fact that we entered
> > isolate_freepages_block() but failed to isolate anything is data which
> > people might be interested in?
> > 
> 
> You're right, it may be interesting for someone debugging CMA to know that
> nr_scanned != nr_isolated at the time of allocation failure. 
> 
> > Thirdly, that (existing) comment "This assumes the block is valid" is
> > either too vague or wrong.  If it's valid, why do we check
> > pfn_valid_within()?
> > 
> 
> Comment is stale and should be removed.
> 
> > > @@ -218,13 +272,18 @@ static unsigned long isolate_freepages_block(unsigned long blockpfn,
> > >  unsigned long
> > >  isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
> > >  {
> > > -	unsigned long isolated, pfn, block_end_pfn, flags;
> > > +	unsigned long isolated, pfn, block_end_pfn;
> > >  	struct zone *zone = NULL;
> > >  	LIST_HEAD(freelist);
> > > +	struct compact_control cc;
> > >  
> > >  	if (pfn_valid(start_pfn))
> > >  		zone = page_zone(pfn_to_page(start_pfn));
> > >  
> > > +	/* cc needed for isolate_freepages_block to acquire zone->lock */
> > > +	cc.zone = zone;
> > > +	cc.sync = true;
> > 
> > We initialise two of cc's fields, leave the other 12 fields containing
> > random garbage, then start using it.  I see no bug here, but...
> > 
> 
> I get your point. At the very least we should initialise it with zeros.
> 
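Zero-initialising an on-stack compact_control falls out naturally from a C99 designated initializer, which zeroes every member not named explicitly. The struct below is a cut-down stand-in invented for this sketch (only .zone and .sync correspond to the quoted patch; the other fields are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down model of struct compact_control; the fields beyond
 * zone/sync are placeholders for "the other 12 fields". */
struct model_cc {
	void *zone;
	bool sync;
	unsigned long nr_freepages;	/* hypothetical extra field */
	unsigned long free_pfn;		/* hypothetical extra field */
};

static int dummy_zone;	/* stands in for a struct zone */

/*
 * A designated initializer zero-fills every unnamed member, so no
 * field is left containing stack garbage.
 */
static struct model_cc make_cc(void *zone)
{
	struct model_cc cc = {
		.zone = zone,
		.sync = true,
	};
	return cc;
}
```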
> How about this?
> 
> ---8<---
> mm: compaction: Iron out isolate_freepages_block() and isolate_freepages_range()
> 
> Andrew pointed out that isolate_freepages_block() is "straggly" and
> isolate_freepages_range() is making assumptions on how compact_control is
> used which is delicate. This patch straightens isolate_freepages_block()
> and makes it fly straight and initialises compact_control to zeros in
> isolate_freepages_range(). The code should be easier to follow and
> is functionally equivalent. The CMA failure path is now a little more
> expensive but that is a marginal corner-case.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan@kernel.org>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org


Thread overview: 42+ messages
2012-09-21 10:46 [PATCH 0/9] Reduce compaction scanning and lock contention Mel Gorman
2012-09-21 10:46 ` [PATCH 1/9] Revert "mm: compaction: check lock contention first before taking lock" Mel Gorman
2012-09-21 17:46   ` Rafael Aquini
2012-09-21 10:46 ` [PATCH 2/9] Revert "mm-compaction-abort-compaction-loop-if-lock-is-contended-or-run-too-long-fix" Mel Gorman
2012-09-21 17:47   ` Rafael Aquini
2012-09-21 10:46 ` [PATCH 3/9] Revert "mm: compaction: abort compaction loop if lock is contended or run too long" Mel Gorman
2012-09-21 17:48   ` Rafael Aquini
2012-09-21 10:46 ` [PATCH 4/9] mm: compaction: Abort compaction loop if lock is contended or run too long Mel Gorman
2012-09-21 17:50   ` Rafael Aquini
2012-09-21 21:31   ` Andrew Morton
2012-09-25  7:34   ` Minchan Kim
2012-09-21 10:46 ` [PATCH 5/9] mm: compaction: Acquire the zone->lru_lock as late as possible Mel Gorman
2012-09-21 17:51   ` Rafael Aquini
2012-09-25  7:05   ` Minchan Kim
2012-09-25  7:51     ` Mel Gorman
2012-09-25  8:13       ` Minchan Kim
2012-09-25 21:39         ` Andrew Morton
2012-09-26  0:23           ` Minchan Kim
2012-09-26 10:17           ` Mel Gorman
2012-09-21 10:46 ` [PATCH 6/9] mm: compaction: Acquire the zone->lock " Mel Gorman
2012-09-21 17:52   ` Rafael Aquini
2012-09-21 21:35   ` Andrew Morton
2012-09-24  8:52     ` Mel Gorman
2012-09-25  7:36       ` Minchan Kim [this message]
2012-09-25  7:35   ` Minchan Kim
2012-09-21 10:46 ` [PATCH 7/9] Revert "mm: have order > 0 compaction start off where it left" Mel Gorman
2012-09-21 17:52   ` Rafael Aquini
2012-09-25  7:37   ` Minchan Kim
2012-09-21 10:46 ` [PATCH 8/9] mm: compaction: Cache if a pageblock was scanned and no pages were isolated Mel Gorman
2012-09-21 17:53   ` Rafael Aquini
2012-09-21 21:36   ` Andrew Morton
2012-09-24  9:39     ` Mel Gorman
2012-09-24 21:26       ` Andrew Morton
2012-09-25  9:12         ` Mel Gorman
2012-09-25 20:03           ` Andrew Morton
2012-09-27 12:06             ` [PATCH] mm: compaction: cache if a pageblock was scanned and no pages were isolated -fix2 Mel Gorman
2012-09-27 13:12             ` [PATCH 8/9] mm: compaction: Cache if a pageblock was scanned and no pages were isolated Mel Gorman
2012-09-26  0:49           ` Minchan Kim
2012-09-27 12:14             ` Mel Gorman
2012-09-21 10:46 ` [PATCH 9/9] mm: compaction: Restart compaction from near where it left off Mel Gorman
2012-09-21 17:54   ` Rafael Aquini
2012-09-21 13:51 ` [PATCH 0/9] Reduce compaction scanning and lock contention Rik van Riel
