Re: [PATCH][RFC] mm: compaction: handle incorrect Unmovable type pageblocks
From: Mel Gorman @ 2012-04-23 14:56 UTC
To: Bartlomiej Zolnierkiewicz; +Cc: linux-mm, Kyungmin Park, Marek Szyprowski
On Mon, Apr 23, 2012 at 12:02:55PM +0200, Bartlomiej Zolnierkiewicz wrote:
> From: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
> Subject: [PATCH][RFC] mm: compaction: handle incorrect Unmovable type pageblocks
>
> When Unmovable pages are freed from an Unmovable type pageblock
> (and some Movable type pages are left in it) the type of
> the pageblock remains unchanged and therefore the pageblock
> cannot be used as a migration target during compaction.
>
It does not remain unchanged forever. It can get reset at allocation time,
although that depends on detecting that enough of the pageblock is free,
which in turn depends on high-order pages being freed, and that is exactly
what your adverse workload avoids.
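For reference, the fallback path already does roughly the following
(paraphrased from __rmqueue_fallback() in mm/page_alloc.c from memory, so
treat it as a sketch rather than the exact code); the block is only claimed
when over half of it is free:

	if (current_order >= pageblock_order / 2 ||
			start_migratetype == MIGRATE_RECLAIMABLE ||
			page_group_by_mobility_disabled) {
		int pages = move_freepages_block(zone, page, start_migratetype);

		/* Claim the whole block if over half of it is free */
		if (pages >= (1 << (pageblock_order - 1)) ||
				page_group_by_mobility_disabled)
			set_pageblock_migratetype(page, start_migratetype);
	}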
> Fix it by recording Unmovable type pages in a separate bitmap
> (which consumes 128 bytes per 4MiB of memory) and actively
> trying to fix the whole pageblock type during compaction
> (so the previously unsuitable pageblocks can now be used as
> migration targets).
>
> [ I also tried using a counter for Unmovable pages per pageblock
> but this approach turned out to be insufficient as we don't
> always have information about the type of the page that we are
> freeing. ]
>
I have not read the patch in detail yet but it seems very heavy-handed to
add a whole new bitmap for this. Based on your estimate it is 1 bit per
page, which means that every allocation or free is likely to be updating
this bitmap. On machines with many cores that is potentially a lot of
dirty cache line bouncing and may incur significant overhead. I'll know
for sure when I read through the patch but my initial feeling is that this
is a big problem.
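To put a rough number on it (assuming 4KiB pages and 64-byte cache lines):
one cache line of the bitmap holds 512 bits, so it covers 512 pages or
2MiB of memory. Every CPU allocating or freeing anywhere within the same
2MiB region would be dirtying the same cache line.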
> My particular test case (on an ARM EXYNOS4 device with 512 MiB,
> which means 131072 standard 4KiB pages in 'Normal' zone) is to:
> - allocate 120000 pages for kernel's usage
> - free every second page (60000 pages) of memory just allocated
> - allocate and use 60000 pages from user space
> - free remaining 60000 pages of kernel memory
> (now we have fragmented memory occupied mostly by user space pages)
> - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
>
Ok, that is indeed an adverse workload that the current system will not
properly deal with. I think you are right to try fixing this but it may
need a different approach that takes the cost out of the allocation/free
path and moves it to the compaction path.
> The results:
> - with compaction disabled I get 11 successful allocations
> - with compaction enabled - 14 successful allocations
> - with this patch I'm able to get all 100 successful allocations
>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> ---
> This patch replaces http://marc.info/?l=linux-mm&m=133364363709346&w=2
>
> include/linux/mmzone.h | 10 ++
> mm/compaction.c | 3
> mm/internal.h | 1
> mm/page_alloc.c | 128 +++++++++++++++++++++++++++++
> mm/sparse.c | 216 +++++++++++++++++++++++++++++++++++++++++++++++--
> 5 files changed, 353 insertions(+), 5 deletions(-)
>
> Index: b/include/linux/mmzone.h
> ===================================================================
> --- a/include/linux/mmzone.h 2012-04-20 16:35:16.894872193 +0200
> +++ b/include/linux/mmzone.h 2012-04-23 09:55:01.845549009 +0200
> @@ -379,6 +379,10 @@
> * In SPARSEMEM, this map is stored in struct mem_section
> */
> unsigned long *pageblock_flags;
> +
> +#ifdef CONFIG_COMPACTION
> + unsigned long *unmovable_map;
> +#endif
> #endif /* CONFIG_SPARSEMEM */
>
> #ifdef CONFIG_COMPACTION
> @@ -1033,6 +1037,12 @@
>
> /* See declaration of similar field in struct zone */
> unsigned long *pageblock_flags;
> +
> +#ifdef CONFIG_COMPACTION
> + unsigned long *unmovable_map;
> + unsigned long pad0; /* Why this is needed? */
> +#endif
> +
You tell us, you added the padding :)
If I had to guess you are trying to avoid sharing a cache line between
unmovable_map and adjacent fields but I doubt it is necessary.
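If the intention really is to keep unmovable_map on its own cache line
then the usual way would be to annotate the field with
____cacheline_aligned_in_smp rather than adding a manual pad, but as I
said I doubt it is needed here at all.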
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR
> /*
> * If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use
> Index: b/mm/compaction.c
> ===================================================================
> --- a/mm/compaction.c 2012-04-20 16:35:16.910872188 +0200
> +++ b/mm/compaction.c 2012-04-23 09:33:54.525527592 +0200
> @@ -376,6 +376,9 @@
> if (migrate_async_suitable(migratetype))
> return true;
>
> + if (migratetype == MIGRATE_UNMOVABLE && set_unmovable_movable(page))
> + return true;
> +
Ok, I have two suggested changes to this (rough sketches of both follow
the list):
1. compaction currently has sync and async compaction. I suggest you
make it three states called async_partial, async_full and sync.
async_partial would be the current behaviour. async_full and sync
would both scan within MIGRATE_UNMOVABLE blocks to see if they
need to be changed. This will add a new slower path but the
common path will be as it is today.
2. You maintain a bitmap of unmovable pages. Get rid of it. Instead have
set_unmovable_movable scan the pageblock and build a free count based
on finding PageBuddy pages, page_count(page) == 0 or PageLRU pages.
If all pages within the block are in one of those three sets, call
set_pageblock_migratetype(MIGRATE_MOVABLE) and call move_freepages_block().
I also suggest finding a better name than set_unmovable_movable
although I do not have a better suggestion myself right now.
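To make 1 concrete, the mode could look something like this (rough idea
only, the names are placeholders):

	enum compact_mode {
		COMPACT_ASYNC_PARTIAL,	/* current async behaviour */
		COMPACT_ASYNC_FULL,	/* async, also probe unmovable blocks */
		COMPACT_SYNC,		/* sync, also probe unmovable blocks */
	};

And for 2, the scan could look roughly like the sketch below. It is
untested, the name is a placeholder, locking and pfn_valid_within() checks
are left open, and rather than keeping an explicit count it simply bails
on the first page that does not qualify:

	static bool try_make_pageblock_movable(struct zone *zone,
						struct page *page)
	{
		unsigned long pfn = page_to_pfn(page) & ~(pageblock_nr_pages - 1);
		unsigned long end_pfn = pfn + pageblock_nr_pages;

		for (; pfn < end_pfn; pfn++) {
			struct page *cursor = pfn_to_page(pfn);

			/* Free pages in the buddy allocator are fine, skip the chunk */
			if (PageBuddy(cursor)) {
				pfn += (1UL << page_order(cursor)) - 1;
				continue;
			}

			/* Pages with no users or on the LRU can be migrated */
			if (page_count(cursor) == 0 || PageLRU(cursor))
				continue;

			/* Anything else pins the pageblock as unmovable */
			return false;
		}

		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
		move_freepages_block(zone, page, MIGRATE_MOVABLE);
		return true;
	}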
> /* Otherwise skip the block */
> return false;
> }
> Index: b/mm/internal.h
> ===================================================================
> --- a/mm/internal.h 2012-04-20 16:35:16.898872189 +0200
> +++ b/mm/internal.h 2012-04-20 16:36:45.566872179 +0200
> @@ -95,6 +95,7 @@
> * in mm/page_alloc.c
> */
> extern void __free_pages_bootmem(struct page *page, unsigned int order);
> +extern bool set_unmovable_movable(struct page *page);
> extern void prep_compound_page(struct page *page, unsigned long order);
> #ifdef CONFIG_MEMORY_FAILURE
> extern bool is_free_buddy_page(struct page *page);
> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c 2012-04-20 16:36:44.054872175 +0200
> +++ b/mm/page_alloc.c 2012-04-23 09:53:27.861547420 +0200
> @@ -257,6 +257,95 @@
> PB_migrate, PB_migrate_end);
> }
>
> +#ifdef CONFIG_COMPACTION
> +static inline unsigned long *get_unmovable_bitmap(struct zone *zone,
> + unsigned long pfn)
> +{
> +#ifdef CONFIG_SPARSEMEM
> + return __pfn_to_section(pfn)->unmovable_map;
> +#else
> + return zone->unmovable_map;
> +#endif /* CONFIG_SPARSEMEM */
> +}
> +
> +static inline int pfn_to_idx(struct zone *zone, unsigned long pfn)
> +{
> +#ifdef CONFIG_SPARSEMEM
> + pfn &= (PAGES_PER_SECTION-1);
> + return pfn;
> +#else
> + pfn = pfn - zone->zone_start_pfn;
> + return pfn;
> +#endif /* CONFIG_SPARSEMEM */
> +}
> +
> +static void set_unmovable_bitmap(struct page *page, int order)
> +{
> + struct zone *zone;
> + unsigned long *map;
> + unsigned long pfn;
> + int idx, i;
> +
> + zone = page_zone(page);
> + pfn = page_to_pfn(page);
> + map = get_unmovable_bitmap(zone, pfn);
> + idx = pfn_to_idx(zone, pfn);
> +
> + for (i = 0; i < (1 << order); i++)
> + __set_bit(idx + i, map);
> +}
> +
> +static void clear_unmovable_bitmap(struct page *page, int order)
> +{
> + struct zone *zone;
> + unsigned long *map;
> + unsigned long pfn;
> + int idx, i;
> +
> + zone = page_zone(page);
> + pfn = page_to_pfn(page);
> + map = get_unmovable_bitmap(zone, pfn);
> + idx = pfn_to_idx(zone, pfn);
> +
> + for (i = 0; i < (1 << order); i++)
> + __clear_bit(idx + i, map);
> +
> +}
> +
This stuff is called from the page allocator fast paths which means it
will cause a system-wide slowdown. This should be avoided.
> +static int move_freepages_block(struct zone *, struct page *, int);
> +
> +bool set_unmovable_movable(struct page *page)
> +{
> + struct zone *zone;
> + unsigned long *map;
> + unsigned long pfn, start_pfn, end_pfn, t = 0;
> + int idx, i, step = sizeof(unsigned long) * 8;
> +
> + zone = page_zone(page);
> + pfn = page_to_pfn(page);
> + map = get_unmovable_bitmap(zone, pfn);
> + idx = pfn_to_idx(zone, pfn);
> +
> + start_pfn = idx & ~(pageblock_nr_pages - 1);
> + end_pfn = start_pfn + pageblock_nr_pages - 1;
> +
> + /* check if pageblock is free of unmovable pages */
> + for (i = start_pfn; i <= end_pfn; i += step)
> + t |= map[i / step];
> +
> + if (!t) {
> + set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> + move_freepages_block(zone, page, MIGRATE_MOVABLE);
> + return true;
> + }
> +
> + return false;
> +}
And it can be avoided if this function scans the pageblock looking for
PageBuddy, page_count == 0 and PageLRU pages and changes the pageblock
type only if all the pages fall into one of those sets. Compaction may be
slower as a result but that's better than hurting the page allocator fast
paths and it will also remove an awful lot of complexity.
I did not read much of the rest of the patch because large parts of it
deal with this unmovable_map which I do not think is necessary at this
point.
Thanks.
--
Mel Gorman
SUSE Labs