linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Nazarewicz <mina86@mina86.com>, Mel Gorman <mel@csn.ul.ie>,
	Wen Congyang <wency@cn.fujitsu.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Marek Szyprowski <m.szyprowski@samsung.com>
Subject: Re: [RFC v2] memory-hotplug: remove MIGRATE_ISOLATE from free_area->free_list
Date: Fri, 14 Sep 2012 10:15:50 +0900	[thread overview]
Message-ID: <20120914011550.GE5085@bbox> (raw)
In-Reply-To: <201209131621.43074.b.zolnierkie@samsung.com>

Hi Bart,

On Thu, Sep 13, 2012 at 04:21:42PM +0200, Bartlomiej Zolnierkiewicz wrote:
> On Thursday 06 September 2012 18:34:35 Bartlomiej Zolnierkiewicz wrote:
> > 
> > Hi,
> > 
> > On Thursday 06 September 2012 04:53:38 Minchan Kim wrote:
> > > Normally, MIGRATE_ISOLATE type is used for memory-hotplug.
> > > But it's irony type because the pages isolated would exist
> > > as free page in free_area->free_list[MIGRATE_ISOLATE] so people
> > > can think of it as allocatable pages but it is *never* allocatable.
> > > It ends up confusing NR_FREE_PAGES vmstat so it would be
> > > totally not accurate so some of place which depend on such vmstat
> > > could reach wrong decision by the context.
> > > 
> > > There were already report about it.[1]
> > > [1] 702d1a6e, memory-hotplug: fix kswapd looping forever problem
> > > 
> > > Then, there was other report which is other problem.[2]
> > > [2] http://www.spinics.net/lists/linux-mm/msg41251.html
> > > 
> > > I believe it can make problems in future, too.
> > > So I hope removing such irony type by another design.
> > > 
> > > I hope this patch solves it and let's revert [1] and doesn't need [2].
> 
> For our needs (CMA) patch [2] is much simpler / less intrusive way
> to have correct NR_FREE_PAGES counter than this patch and currently
> I would prefer to have it merged upstream instead of this one.

I agree my patch could be somewhat big change so I will take things easy.
Of course, it shouldn't prevent your patch merge if yours make sense.
Afterward, if this patch solves all issues and better than other band-aid,
then, we can revert.
Shortly, I will review yours.

> 
> > > * Changelog v1
> > >  * Fix from Michal's many suggestion
> > > 
> > > Cc: Michal Nazarewicz <mina86@mina86.com>
> > > Cc: Mel Gorman <mel@csn.ul.ie>
> > > Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> > > Cc: Wen Congyang <wency@cn.fujitsu.com>
> > > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > > ---
> > > It's very early version which show the concept so I still marked it with RFC.
> > > I just tested it with simple test and works.
> > > This patch is needed indepth review from memory-hotplug guys from fujitsu
> > > because I saw there are lots of patches recenlty they sent to about
> > > memory-hotplug change. Please take a look at this patch.
> > 
> > [...]
> > 
> > > @@ -948,8 +954,13 @@ static int move_freepages(struct zone *zone,
> > >  		}
> > >  
> > >  		order = page_order(page);
> > > -		list_move(&page->lru,
> > > -			  &zone->free_area[order].free_list[migratetype]);
> > > +		if (migratetype != MIGRATE_ISOLATE) {
> > > +			list_move(&page->lru,
> > > +				&zone->free_area[order].free_list[migratetype]);
> > > +		} else {
> > > +			list_del(&page->lru);
> > > +			isolate_free_page(page, order);
> > > +		}
> > >  		page += 1 << order;
> > >  		pages_moved += 1 << order;
> > >  	}
> > 
> > Shouldn't NR_FREE_PAGES counter be decreased somewhere above?
> > 
> > [ I can see that it is not modified in __free_pages_ok() and
> >   free_hot_cold_page() because page is still counted as non-free one but
> >   here situation is different AFAICS. ]
> > 
> > I tested the patch locally here with CONFIG_CMA=y and it causes some
> > major problems for CMA (multiple errors from dma_alloc_from_contiguous()
> > about memory ranges being busy and allocation failures).
> > 
> > [ I'm sorry that I don't know more details yet but the issue should be
> >   easily reproducible. ]
> 
> We spent some more time on the issue and it seems that the approach
> taken in the patch (removal of MIGRATE_ISOLATE free_list) is currently
> incompatible with CMA.
> 
> In alloc_contig_range() we have:
> 
> 	order = 0;
> 	outer_start = start;
> 	while (!PageBuddy(pfn_to_page(outer_start))) {
> 		if (++order >= MAX_ORDER) {
> 			ret = -EBUSY;
> 			goto done;
> 		}
> 		outer_start &= ~0UL << order;
> 	}
> 
> for handling cases when the CMA area begins inside the higher order
> page from buddy (that got already isolated).  Unfortunately this code
> no longer works as isolated pages are no longer hold in buddy allocator
> (isolate_free_page() clears buddy bit).
> 
> The other part of code that is probably affected by your patch is:
> 
> 	/* Grab isolated pages from freelists. */
> 	outer_end = isolate_freepages_range(outer_start, end);
> 	if (!outer_end) {
> 		ret = -EBUSY;
> 		goto done;
> 	}
> 
> also in alloc_contig_range().  isolate_freepages_range() calls
> isolate_freepages_block() which assume that free pages (in isolated
> pageblock) are in buddy allocator:
> 
> 		if (!PageBuddy(page)) {
> 			if (strict)
> 				return 0;
> 			continue;
> 		}
> 
> (which is no longer true) and also calls split_free_page() that
> attempts to remove page from the free_list & buddy:
> 
> 	/* Remove page from free list */
> 	list_del(&page->lru);
> 	zone->free_area[order].nr_free--;
> 	rmv_page_order(page);
> 
> (the isolated page is on the isolated_pages list instead).

Thanks for detailed pointing out, Bart!
I will revisit this issues after I will fisnish other jobs.

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2012-09-14  1:13 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-06  2:53 [RFC v2] memory-hotplug: remove MIGRATE_ISOLATE from free_area->free_list Minchan Kim
2012-09-06  8:14 ` Lai Jiangshan
2012-09-06  8:18   ` Minchan Kim
2012-09-06  8:57     ` Lai Jiangshan
2012-09-06 12:51       ` Michal Nazarewicz
2012-09-06  9:01     ` Lai Jiangshan
2012-09-06 12:56       ` Michal Nazarewicz
2012-09-06 12:48 ` Michal Nazarewicz
2012-09-06 16:34 ` Bartlomiej Zolnierkiewicz
2012-09-11  0:49   ` Minchan Kim
2012-09-13 14:21   ` Bartlomiej Zolnierkiewicz
2012-09-14  1:15     ` Minchan Kim [this message]
2012-09-07  7:28 ` Wen Congyang
2012-09-11  0:52   ` Minchan Kim
2012-09-11  1:37     ` Wen Congyang
2012-09-11  6:16       ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120914011550.GE5085@bbox \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=b.zolnierkie@samsung.com \
    --cc=isimatu.yasuaki@jp.fujitsu.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=m.szyprowski@samsung.com \
    --cc=mel@csn.ul.ie \
    --cc=mina86@mina86.com \
    --cc=wency@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).