* Re: [PATCH][RFC] mm: compaction: handle incorrect Unmovable type pageblocks
[not found] <201204231202.55739.b.zolnierkie@samsung.com>
@ 2012-04-23 14:56 ` Mel Gorman
2012-04-24 12:03 ` Bartlomiej Zolnierkiewicz
0 siblings, 1 reply; 3+ messages in thread
From: Mel Gorman @ 2012-04-23 14:56 UTC (permalink / raw)
To: Bartlomiej Zolnierkiewicz; +Cc: linux-mm, Kyungmin Park, Marek Szyprowski
On Mon, Apr 23, 2012 at 12:02:55PM +0200, Bartlomiej Zolnierkiewicz wrote:
> From: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
> Subject: [PATCH][RFC] mm: compaction: handle incorrect Unmovable type pageblocks
>
> When Unmovable pages are freed from Unmovable type pageblock
> (and some Movable type pages are left in it) the type of
> the pageblock remains unchanged and therefore the pageblock
> cannot be used as a migration target during compaction.
>
It does not remain unchanged forever. It can get reset a allocation time
although this depends on detecting that much of the pageblock is free.
This depends on high-order pages being freed which your adverse workload
avoids.
> Fix it by recording Unmovable type pages in a separate bitmap
> (which consumes 128 bytes per 4MiB of memory) and actively
> trying to fix the whole pageblock type during compaction
> (so the previously unsuitable pageblocks can now be used as
> a migration targets).
>
> [ I also tried using counter for Unmovable pages per pageblock
> but this approach turned out to be insufficient as we don't
> always have an information about type of the page that we are
> freeing. ]
>
I have not read the patch yet but it seems very heavy-handed to add a
whole new bitmap for this. Based on your estimate it is 1 bit per page in a
pageblock which means that every allocation or free is likely to be updating
this bitmap. On machines with many cores that is potentially a lot of dirty
cache line bouncing and may incur significant overhead. I'll know for sure
when I see the patch but my initial feeling is that this is a big problem.
> My particular test case (on a ARM EXYNOS4 device with 512 MiB,
> which means 131072 standard 4KiB pages in 'Normal' zone) is to:
> - allocate 120000 pages for kernel's usage
> - free every second page (60000 pages) of memory just allocated
> - allocate and use 60000 pages from user space
> - free remaining 60000 pages of kernel memory
> (now we have fragmented memory occupied mostly by user space pages)
> - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
>
Ok, that is indeed an adverse workload that the current system will not
properly deal with. I think you are right to try fixing this but may need
a different approach that takes the cost out of the allocation/free path
and moves it the compaction path.
> The results:
> - with compaction disabled I get 11 successful allocations
> - with compaction enabled - 14 successful allocations
> - with this patch I'm able to get all 100 successful allocations
>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> ---
> This patch replaces http://marc.info/?l=linux-mm&m=133364363709346&w=2
>
> include/linux/mmzone.h | 10 ++
> mm/compaction.c | 3
> mm/internal.h | 1
> mm/page_alloc.c | 128 +++++++++++++++++++++++++++++
> mm/sparse.c | 216 +++++++++++++++++++++++++++++++++++++++++++++++--
> 5 files changed, 353 insertions(+), 5 deletions(-)
>
> Index: b/include/linux/mmzone.h
> ===================================================================
> --- a/include/linux/mmzone.h 2012-04-20 16:35:16.894872193 +0200
> +++ b/include/linux/mmzone.h 2012-04-23 09:55:01.845549009 +0200
> @@ -379,6 +379,10 @@
> * In SPARSEMEM, this map is stored in struct mem_section
> */
> unsigned long *pageblock_flags;
> +
> +#ifdef CONFIG_COMPACTION
> + unsigned long *unmovable_map;
> +#endif
> #endif /* CONFIG_SPARSEMEM */
>
> #ifdef CONFIG_COMPACTION
> @@ -1033,6 +1037,12 @@
>
> /* See declaration of similar field in struct zone */
> unsigned long *pageblock_flags;
> +
> +#ifdef CONFIG_COMPACTION
> + unsigned long *unmovable_map;
> + unsigned long pad0; /* Why this is needed? */
> +#endif
> +
You tell us, you added the padding :)
If I had to guess you are trying to avoid sharing a cache line between
unmovable_map and adjacent fields but I doubt it is necessary.
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR
> /*
> * If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use
> Index: b/mm/compaction.c
> ===================================================================
> --- a/mm/compaction.c 2012-04-20 16:35:16.910872188 +0200
> +++ b/mm/compaction.c 2012-04-23 09:33:54.525527592 +0200
> @@ -376,6 +376,9 @@
> if (migrate_async_suitable(migratetype))
> return true;
>
> + if (migratetype == MIGRATE_UNMOVABLE && set_unmovable_movable(page))
> + return true;
> +
Ok, I have a two suggested changes to this
1. compaction currently has sync and async compaction. I suggest you
make it a three states called async_partial, async_full and sync.
async_partial would be the current behaviour. async_full and sync
would both scan within MIGRATE_UNMOVABLE blocks to see if they
needed to be changed. This will add a new slower path but the
common path will be as it is today.
2. You maintain a bitmap of unmovable pages. Get rid of it. Instead have
set_unmovable_movable scan the pageblock and build a free count based
on finding PageBuddy pages, page_count(page) == 0 or PageLRU pages.
If all pages within the block are in one of those three sets, call
set_pageblock_migratetype(MIGRATE_MOVABLE) and call move_freepages_block()
I also suggest finding a better name than set_unmovable_movable
although I do not have a better suggestion myself right now.
> /* Otherwise skip the block */
> return false;
> }
> Index: b/mm/internal.h
> ===================================================================
> --- a/mm/internal.h 2012-04-20 16:35:16.898872189 +0200
> +++ b/mm/internal.h 2012-04-20 16:36:45.566872179 +0200
> @@ -95,6 +95,7 @@
> * in mm/page_alloc.c
> */
> extern void __free_pages_bootmem(struct page *page, unsigned int order);
> +extern bool set_unmovable_movable(struct page *page);
> extern void prep_compound_page(struct page *page, unsigned long order);
> #ifdef CONFIG_MEMORY_FAILURE
> extern bool is_free_buddy_page(struct page *page);
> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c 2012-04-20 16:36:44.054872175 +0200
> +++ b/mm/page_alloc.c 2012-04-23 09:53:27.861547420 +0200
> @@ -257,6 +257,95 @@
> PB_migrate, PB_migrate_end);
> }
>
> +#ifdef CONFIG_COMPACTION
> +static inline unsigned long *get_unmovable_bitmap(struct zone *zone,
> + unsigned long pfn)
> +{
> +#ifdef CONFIG_SPARSEMEM
> + return __pfn_to_section(pfn)->unmovable_map;
> +#else
> + return zone->unmovable_map;
> +#endif /* CONFIG_SPARSEMEM */
> +}
> +
> +static inline int pfn_to_idx(struct zone *zone, unsigned long pfn)
> +{
> +#ifdef CONFIG_SPARSEMEM
> + pfn &= (PAGES_PER_SECTION-1);
> + return pfn;
> +#else
> + pfn = pfn - zone->zone_start_pfn;
> + return pfn;
> +#endif /* CONFIG_SPARSEMEM */
> +}
> +
> +static void set_unmovable_bitmap(struct page *page, int order)
> +{
> + struct zone *zone;
> + unsigned long *map;
> + unsigned long pfn;
> + int idx, i;
> +
> + zone = page_zone(page);
> + pfn = page_to_pfn(page);
> + map = get_unmovable_bitmap(zone, pfn);
> + idx = pfn_to_idx(zone, pfn);
> +
> + for (i = 0; i < (1 << order); i++)
> + __set_bit(idx + i, map);
> +}
> +
> +static void clear_unmovable_bitmap(struct page *page, int order)
> +{
> + struct zone *zone;
> + unsigned long *map;
> + unsigned long pfn;
> + int idx, i;
> +
> + zone = page_zone(page);
> + pfn = page_to_pfn(page);
> + map = get_unmovable_bitmap(zone, pfn);
> + idx = pfn_to_idx(zone, pfn);
> +
> + for (i = 0; i < (1 << order); i++)
> + __clear_bit(idx + i, map);
> +
> +}
> +
This stuff is called from page allocator fast paths which means it will
have a system-wide slowdown. This should be avoided.
> +static int move_freepages_block(struct zone *, struct page *, int);
> +
> +bool set_unmovable_movable(struct page *page)
> +{
> + struct zone *zone;
> + unsigned long *map;
> + unsigned long pfn, start_pfn, end_pfn, t = 0;
> + int idx, i, step = sizeof(unsigned long) * 8;
> +
> + zone = page_zone(page);
> + pfn = page_to_pfn(page);
> + map = get_unmovable_bitmap(zone, pfn);
> + idx = pfn_to_idx(zone, pfn);
> +
> + start_pfn = idx & ~(pageblock_nr_pages - 1);
> + end_pfn = start_pfn + pageblock_nr_pages - 1;
> +
> + /* check if pageblock is free of unmovable pages */
> + for (i = start_pfn; i <= end_pfn; i += step)
> + t |= map[i / step];
> +
> + if (!t) {
> + set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> + move_freepages_block(zone, page, MIGRATE_MOVABLE);
> + return true;
> + }
> +
> + return false;
> +}
And it can be avoided if this thing scans a pageblock looking for PageBuddy,
page_count==0 and PageLRU pages and changing the pageblock if all the
patches are of that type. Compaction may be slower as a result but that's
better than hurting the page allocator fast paths and it will also remove
an awful lot of complexity.
I did not read much of the rest of the patch because large parts of it
deal with this unmovable_bitmap which I do not think is necessary at
this point.
Thanks.
--
Mel Gorman
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH][RFC] mm: compaction: handle incorrect Unmovable type pageblocks
2012-04-23 14:56 ` [PATCH][RFC] mm: compaction: handle incorrect Unmovable type pageblocks Mel Gorman
@ 2012-04-24 12:03 ` Bartlomiej Zolnierkiewicz
2012-04-26 9:06 ` Mel Gorman
0 siblings, 1 reply; 3+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2012-04-24 12:03 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm, Kyungmin Park, Marek Szyprowski
Hi,
On Monday 23 April 2012 16:56:31 Mel Gorman wrote:
> On Mon, Apr 23, 2012 at 12:02:55PM +0200, Bartlomiej Zolnierkiewicz wrote:
[...]
> > include/linux/mmzone.h | 10 ++
> > mm/compaction.c | 3
> > mm/internal.h | 1
> > mm/page_alloc.c | 128 +++++++++++++++++++++++++++++
> > mm/sparse.c | 216 +++++++++++++++++++++++++++++++++++++++++++++++--
> > 5 files changed, 353 insertions(+), 5 deletions(-)
> >
> > Index: b/include/linux/mmzone.h
> > ===================================================================
> > --- a/include/linux/mmzone.h 2012-04-20 16:35:16.894872193 +0200
> > +++ b/include/linux/mmzone.h 2012-04-23 09:55:01.845549009 +0200
> > @@ -379,6 +379,10 @@
> > * In SPARSEMEM, this map is stored in struct mem_section
> > */
> > unsigned long *pageblock_flags;
> > +
> > +#ifdef CONFIG_COMPACTION
> > + unsigned long *unmovable_map;
> > +#endif
> > #endif /* CONFIG_SPARSEMEM */
> >
> > #ifdef CONFIG_COMPACTION
> > @@ -1033,6 +1037,12 @@
> >
> > /* See declaration of similar field in struct zone */
> > unsigned long *pageblock_flags;
> > +
> > +#ifdef CONFIG_COMPACTION
> > + unsigned long *unmovable_map;
> > + unsigned long pad0; /* Why this is needed? */
> > +#endif
> > +
>
> You tell us, you added the padding :)
I wish I could.. :)
> If I had to guess you are trying to avoid sharing a cache line between
> unmovable_map and adjacent fields but I doubt it is necessary.
Unfortunately the pad is needed or the kernel just freezes somewhere
early during memory zone initialization. I don't remember details but
it was somewhere on the access to page->flags of the first page..
> > #ifdef CONFIG_CGROUP_MEM_RES_CTLR
> > /*
> > * If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use
> > Index: b/mm/compaction.c
> > ===================================================================
> > --- a/mm/compaction.c 2012-04-20 16:35:16.910872188 +0200
> > +++ b/mm/compaction.c 2012-04-23 09:33:54.525527592 +0200
> > @@ -376,6 +376,9 @@
> > if (migrate_async_suitable(migratetype))
> > return true;
> >
> > + if (migratetype == MIGRATE_UNMOVABLE && set_unmovable_movable(page))
> > + return true;
> > +
>
> Ok, I have a two suggested changes to this
>
> 1. compaction currently has sync and async compaction. I suggest you
> make it a three states called async_partial, async_full and sync.
> async_partial would be the current behaviour. async_full and sync
> would both scan within MIGRATE_UNMOVABLE blocks to see if they
> needed to be changed. This will add a new slower path but the
> common path will be as it is today.
>
> 2. You maintain a bitmap of unmovable pages. Get rid of it. Instead have
> set_unmovable_movable scan the pageblock and build a free count based
> on finding PageBuddy pages, page_count(page) == 0 or PageLRU pages.
> If all pages within the block are in one of those three sets, call
> set_pageblock_migratetype(MIGRATE_MOVABLE) and call move_freepages_block()
> I also suggest finding a better name than set_unmovable_movable
> although I do not have a better suggestion myself right now.
Ok, I'll post the updated patch shortly.
Thank you for the review.
Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung Poland R&D Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH][RFC] mm: compaction: handle incorrect Unmovable type pageblocks
2012-04-24 12:03 ` Bartlomiej Zolnierkiewicz
@ 2012-04-26 9:06 ` Mel Gorman
0 siblings, 0 replies; 3+ messages in thread
From: Mel Gorman @ 2012-04-26 9:06 UTC (permalink / raw)
To: Bartlomiej Zolnierkiewicz; +Cc: linux-mm, Kyungmin Park, Marek Szyprowski
On Tue, Apr 24, 2012 at 02:03:29PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > > include/linux/mmzone.h | 10 ++
> > > mm/compaction.c | 3
> > > mm/internal.h | 1
> > > mm/page_alloc.c | 128 +++++++++++++++++++++++++++++
> > > mm/sparse.c | 216 +++++++++++++++++++++++++++++++++++++++++++++++--
> > > 5 files changed, 353 insertions(+), 5 deletions(-)
> > >
> > > Index: b/include/linux/mmzone.h
> > > ===================================================================
> > > --- a/include/linux/mmzone.h 2012-04-20 16:35:16.894872193 +0200
> > > +++ b/include/linux/mmzone.h 2012-04-23 09:55:01.845549009 +0200
> > > @@ -379,6 +379,10 @@
> > > * In SPARSEMEM, this map is stored in struct mem_section
> > > */
> > > unsigned long *pageblock_flags;
> > > +
> > > +#ifdef CONFIG_COMPACTION
> > > + unsigned long *unmovable_map;
> > > +#endif
> > > #endif /* CONFIG_SPARSEMEM */
> > >
> > > #ifdef CONFIG_COMPACTION
> > > @@ -1033,6 +1037,12 @@
> > >
> > > /* See declaration of similar field in struct zone */
> > > unsigned long *pageblock_flags;
> > > +
> > > +#ifdef CONFIG_COMPACTION
> > > + unsigned long *unmovable_map;
> > > + unsigned long pad0; /* Why this is needed? */
> > > +#endif
> > > +
> >
> > You tell us, you added the padding :)
>
> I wish I could.. :)
>
> > If I had to guess you are trying to avoid sharing a cache line between
> > unmovable_map and adjacent fields but I doubt it is necessary.
>
> Unfortunately the pad is needed or the kernel just freezes somewhere
> early during memory zone initialization. I don't remember details but
> it was somewhere on the access to page->flags of the first page..
>
Sounds like it was a bug in how the bitmask was managed. It's not worth
losing sleep over as I expect the bitmap is removed by now.
> > > #ifdef CONFIG_CGROUP_MEM_RES_CTLR
> > > /*
> > > * If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use
> > > Index: b/mm/compaction.c
> > > ===================================================================
> > > --- a/mm/compaction.c 2012-04-20 16:35:16.910872188 +0200
> > > +++ b/mm/compaction.c 2012-04-23 09:33:54.525527592 +0200
> > > @@ -376,6 +376,9 @@
> > > if (migrate_async_suitable(migratetype))
> > > return true;
> > >
> > > + if (migratetype == MIGRATE_UNMOVABLE && set_unmovable_movable(page))
> > > + return true;
> > > +
> >
> > Ok, I have a two suggested changes to this
> >
> > 1. compaction currently has sync and async compaction. I suggest you
> > make it a three states called async_partial, async_full and sync.
> > async_partial would be the current behaviour. async_full and sync
> > would both scan within MIGRATE_UNMOVABLE blocks to see if they
> > needed to be changed. This will add a new slower path but the
> > common path will be as it is today.
> >
> > 2. You maintain a bitmap of unmovable pages. Get rid of it. Instead have
> > set_unmovable_movable scan the pageblock and build a free count based
> > on finding PageBuddy pages, page_count(page) == 0 or PageLRU pages.
> > If all pages within the block are in one of those three sets, call
> > set_pageblock_migratetype(MIGRATE_MOVABLE) and call move_freepages_block()
> > I also suggest finding a better name than set_unmovable_movable
> > although I do not have a better suggestion myself right now.
>
> Ok, I'll post the updated patch shortly.
>
I'll review this ASAP.
--
Mel Gorman
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2012-04-26 9:06 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <201204231202.55739.b.zolnierkie@samsung.com>
2012-04-23 14:56 ` [PATCH][RFC] mm: compaction: handle incorrect Unmovable type pageblocks Mel Gorman
2012-04-24 12:03 ` Bartlomiej Zolnierkiewicz
2012-04-26 9:06 ` Mel Gorman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).