From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx142.postini.com [74.125.245.142]) by kanga.kvack.org (Postfix) with SMTP id BF86D6B005A for ; Tue, 14 Aug 2012 04:55:08 -0400 (EDT) From: Minchan Kim Subject: [RFC 0/2] Reduce alloc_contig_range latency Date: Tue, 14 Aug 2012 17:57:05 +0900 Message-Id: <1344934627-8473-1-git-send-email-minchan@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Marek Szyprowski Cc: Mel Gorman , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Minchan Kim Hi All, I played with CMA's core function alloc_contig_range and found it's very very slow so I suspect we can use it in real practice. I tested it with a bit tweak for working CMA in x86 on qemu. Test environment is following as. 1. x86_64 machince, 2G RAM, 4 core, movable zone 40M with try alloc_contig_range(movable_zone->middle_pfn, movable_zone->middle_pfn + 10M) per 5sec until background stress test program is terminated. 2. There is background stress program which can make lots of clean cache page. It mimics movie player. alloc_contig_range's latency unit: usec before: min 204000 max 8156000 mean 3109310.34482759 success count 58 after: min 8000 max 112000 mean 45788.2352941177 success count 85 So this patch reduces 8 sec as worst case, 3 sec as mean case. I'm off from now on until the day of tomorrow so please understand if I can't reply instantly. Minchan Kim (2): cma: remove __reclaim_pages cma: support MIGRATE_DISCARD include/linux/migrate_mode.h | 11 +++++-- include/linux/mm.h | 2 +- include/linux/mmzone.h | 9 ------ mm/compaction.c | 2 +- mm/migrate.c | 50 +++++++++++++++++++++++------ mm/page_alloc.c | 73 +++++------------------------------------- 6 files changed, 58 insertions(+), 89 deletions(-) -- 1.7.9.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx142.postini.com [74.125.245.142]) by kanga.kvack.org (Postfix) with SMTP id A45196B0068 for ; Tue, 14 Aug 2012 04:55:12 -0400 (EDT) From: Minchan Kim Subject: [RFC 1/2] cma: remove __reclaim_pages Date: Tue, 14 Aug 2012 17:57:06 +0900 Message-Id: <1344934627-8473-2-git-send-email-minchan@kernel.org> In-Reply-To: <1344934627-8473-1-git-send-email-minchan@kernel.org> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Marek Szyprowski Cc: Mel Gorman , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Minchan Kim Now cma reclaims too many pages by __reclaim_pages which says following as * Reclaim enough pages to make sure that contiguous allocation * will not starve the system. Starve? What does it starve the system? The function which allocate free page for migration target would wake up kswapd and do direct reclaim if needed during migration so system doesn't starve. Let remove __reclaim_pages and related function and fields. I modified split_free_page slightly because I removed __reclaim_pages, isolate_freepages_range can fail by split_free_page's watermark check. It's very critical in CMA because it ends up failing alloc_contig_range. I think we don't need the check in case of CMA because CMA allocates free pages by alloc_pages, not isolate_freepages_block in migrate_pages so watermark is already checked in alloc_pages. If the condition isn't meet, kswapd/direct-reclaim could reclaim pages. There is no reason to check it in split_free_page. Signed-off-by: Minchan Kim --- include/linux/mm.h | 2 +- include/linux/mmzone.h | 9 ------ mm/compaction.c | 2 +- mm/page_alloc.c | 71 +++++------------------------------------------- 4 files changed, 9 insertions(+), 75 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 0514fe9..fd042ae 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -441,7 +441,7 @@ void put_page(struct page *page); void put_pages_list(struct list_head *pages); void split_page(struct page *page, unsigned int order); -int split_free_page(struct page *page); +int split_free_page(struct page *page, bool check_wmark); /* * Compound pages have a destructor function. Provide a diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 2daa54f..ca034a1 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -63,10 +63,8 @@ enum { #ifdef CONFIG_CMA # define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA) -# define cma_wmark_pages(zone) zone->min_cma_pages #else # define is_migrate_cma(migratetype) false -# define cma_wmark_pages(zone) 0 #endif #define for_each_migratetype_order(order, type) \ @@ -376,13 +374,6 @@ struct zone { /* see spanned/present_pages for more description */ seqlock_t span_seqlock; #endif -#ifdef CONFIG_CMA - /* - * CMA needs to increase watermark levels during the allocation - * process to make sure that the system is not starved. - */ - unsigned long min_cma_pages; -#endif struct free_area free_area[MAX_ORDER]; #ifndef CONFIG_SPARSEMEM diff --git a/mm/compaction.c b/mm/compaction.c index e78cb96..8afa6dc 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -85,7 +85,7 @@ static unsigned long isolate_freepages_block(unsigned long blockpfn, } /* Found a free page, break it into order-0 pages */ - isolated = split_free_page(page); + isolated = split_free_page(page, !strict); if (!isolated && strict) return 0; total_isolated += isolated; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cefac39..d8540eb 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1388,7 +1388,7 @@ void split_page(struct page *page, unsigned int order) * Note: this is probably too low level an operation for use in drivers. * Please consult with lkml before using this in your driver. */ -int split_free_page(struct page *page) +int split_free_page(struct page *page, bool check_wmark) { unsigned int order; unsigned long watermark; @@ -1399,10 +1399,12 @@ int split_free_page(struct page *page) zone = page_zone(page); order = page_order(page); - /* Obey watermarks as if the page was being allocated */ - watermark = low_wmark_pages(zone) + (1 << order); - if (!zone_watermark_ok(zone, 0, watermark, 0, 0)) - return 0; + if (check_wmark) { + /* Obey watermarks as if the page was being allocated */ + watermark = low_wmark_pages(zone) + (1 << order); + if (!zone_watermark_ok(zone, 0, watermark, 0, 0)) + return 0; + } /* Remove page from free list */ list_del(&page->lru); @@ -5116,10 +5118,6 @@ static void __setup_per_zone_wmarks(void) zone->watermark[WMARK_LOW] = min_wmark_pages(zone) + (tmp >> 2); zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp >> 1); - zone->watermark[WMARK_MIN] += cma_wmark_pages(zone); - zone->watermark[WMARK_LOW] += cma_wmark_pages(zone); - zone->watermark[WMARK_HIGH] += cma_wmark_pages(zone); - setup_zone_migrate_reserve(zone); spin_unlock_irqrestore(&zone->lock, flags); } @@ -5671,54 +5669,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) return ret > 0 ? 0 : ret; } -/* - * Update zone's cma pages counter used for watermark level calculation. - */ -static inline void __update_cma_watermarks(struct zone *zone, int count) -{ - unsigned long flags; - spin_lock_irqsave(&zone->lock, flags); - zone->min_cma_pages += count; - spin_unlock_irqrestore(&zone->lock, flags); - setup_per_zone_wmarks(); -} - -/* - * Trigger memory pressure bump to reclaim some pages in order to be able to - * allocate 'count' pages in single page units. Does similar work as - *__alloc_pages_slowpath() function. - */ -static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count) -{ - enum zone_type high_zoneidx = gfp_zone(gfp_mask); - struct zonelist *zonelist = node_zonelist(0, gfp_mask); - int did_some_progress = 0; - int order = 1; - - /* - * Increase level of watermarks to force kswapd do his job - * to stabilise at new watermark level. - */ - __update_cma_watermarks(zone, count); - - /* Obey watermarks as if the page was being allocated */ - while (!zone_watermark_ok(zone, 0, low_wmark_pages(zone), 0, 0)) { - wake_all_kswapd(order, zonelist, high_zoneidx, zone_idx(zone)); - - did_some_progress = __perform_reclaim(gfp_mask, order, zonelist, - NULL); - if (!did_some_progress) { - /* Exhausted what can be done so it's blamo time */ - out_of_memory(zonelist, gfp_mask, order, NULL, false); - } - } - - /* Restore original watermark levels. */ - __update_cma_watermarks(zone, -count); - - return count; -} - /** * alloc_contig_range() -- tries to allocate given range of pages * @start: start PFN to allocate @@ -5742,7 +5692,6 @@ static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count) int alloc_contig_range(unsigned long start, unsigned long end, unsigned migratetype) { - struct zone *zone = page_zone(pfn_to_page(start)); unsigned long outer_start, outer_end; int ret = 0, order; @@ -5817,12 +5766,6 @@ int alloc_contig_range(unsigned long start, unsigned long end, goto done; } - /* - * Reclaim enough pages to make sure that contiguous allocation - * will not starve the system. - */ - __reclaim_pages(zone, GFP_HIGHUSER_MOVABLE, end-start); - /* Grab isolated pages from freelists. */ outer_end = isolate_freepages_range(outer_start, end); if (!outer_end) { -- 1.7.9.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx142.postini.com [74.125.245.142]) by kanga.kvack.org (Postfix) with SMTP id 772036B005D for ; Tue, 14 Aug 2012 04:55:14 -0400 (EDT) From: Minchan Kim Subject: [RFC 2/2] cma: support MIGRATE_DISCARD Date: Tue, 14 Aug 2012 17:57:07 +0900 Message-Id: <1344934627-8473-3-git-send-email-minchan@kernel.org> In-Reply-To: <1344934627-8473-1-git-send-email-minchan@kernel.org> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Marek Szyprowski Cc: Mel Gorman , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Minchan Kim This patch introudes MIGRATE_DISCARD mode in migration. It drop clean cache pages instead of migration so that migration latency could be reduced. Of course, it could evict code pages but latency of big contiguous memory is more important than some background application's slow down in mobile embedded enviroment. Signed-off-by: Minchan Kim --- include/linux/migrate_mode.h | 11 +++++++--- mm/migrate.c | 50 +++++++++++++++++++++++++++++++++--------- mm/page_alloc.c | 2 +- 3 files changed, 49 insertions(+), 14 deletions(-) diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h index ebf3d89..04ca19c 100644 --- a/include/linux/migrate_mode.h +++ b/include/linux/migrate_mode.h @@ -6,11 +6,16 @@ * on most operations but not ->writepage as the potential stall time * is too significant * MIGRATE_SYNC will block when migrating pages + * MIGRTATE_DISCARD will discard clean cache page instead of migration + * + * MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC shouldn't be used + * together as OR flag. */ enum migrate_mode { - MIGRATE_ASYNC, - MIGRATE_SYNC_LIGHT, - MIGRATE_SYNC, + MIGRATE_ASYNC = 1 << 0, + MIGRATE_SYNC_LIGHT = 1 << 1, + MIGRATE_SYNC = 1 << 2, + MIGRATE_DISCARD = 1 << 3, }; #endif /* MIGRATE_MODE_H_INCLUDED */ diff --git a/mm/migrate.c b/mm/migrate.c index 77ed2d7..8119a59 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -225,7 +225,7 @@ static bool buffer_migrate_lock_buffers(struct buffer_head *head, struct buffer_head *bh = head; /* Simple case, sync compaction */ - if (mode != MIGRATE_ASYNC) { + if (!(mode & MIGRATE_ASYNC)) { do { get_bh(bh); lock_buffer(bh); @@ -313,7 +313,7 @@ static int migrate_page_move_mapping(struct address_space *mapping, * the mapping back due to an elevated page count, we would have to * block waiting on other references to be dropped. */ - if (mode == MIGRATE_ASYNC && head && + if (mode & MIGRATE_ASYNC && head && !buffer_migrate_lock_buffers(head, mode)) { page_unfreeze_refs(page, expected_count); spin_unlock_irq(&mapping->tree_lock); @@ -521,7 +521,7 @@ int buffer_migrate_page(struct address_space *mapping, * with an IRQ-safe spinlock held. In the sync case, the buffers * need to be locked now */ - if (mode != MIGRATE_ASYNC) + if (!(mode & MIGRATE_ASYNC)) BUG_ON(!buffer_migrate_lock_buffers(head, mode)); ClearPagePrivate(page); @@ -603,7 +603,7 @@ static int fallback_migrate_page(struct address_space *mapping, { if (PageDirty(page)) { /* Only writeback pages in full synchronous migration */ - if (mode != MIGRATE_SYNC) + if (!(mode & MIGRATE_SYNC)) return -EBUSY; return writeout(mapping, page); } @@ -685,9 +685,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage, int remap_swapcache = 1; struct mem_cgroup *mem; struct anon_vma *anon_vma = NULL; + enum ttu_flags ttu_flags; + bool discard_mode = false; + bool file = false; if (!trylock_page(page)) { - if (!force || mode == MIGRATE_ASYNC) + if (!force || mode & MIGRATE_ASYNC) goto out; /* @@ -733,7 +736,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage, * the retry loop is too short and in the sync-light case, * the overhead of stalling is too much */ - if (mode != MIGRATE_SYNC) { + if (!(mode & MIGRATE_SYNC)) { rc = -EBUSY; goto uncharge; } @@ -799,12 +802,39 @@ static int __unmap_and_move(struct page *page, struct page *newpage, goto skip_unmap; } + file = page_is_file_cache(page); + ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS; + + if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page)) + ttu_flags |= TTU_MIGRATION; + else + discard_mode = true; + /* Establish migration ptes or remove ptes */ - try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS); + try_to_unmap(page, ttu_flags); skip_unmap: - if (!page_mapped(page)) - rc = move_to_new_page(newpage, page, remap_swapcache, mode); + if (!page_mapped(page)) { + if (!discard_mode) + rc = move_to_new_page(newpage, page, remap_swapcache, mode); + else { + struct address_space *mapping; + mapping = page_mapping(page); + + if (page_has_private(page)) { + if (!try_to_release_page(page, GFP_KERNEL)) { + rc = -EAGAIN; + goto uncharge; + } + } + + if (remove_mapping(mapping, page)) + rc = 0; + else + rc = -EAGAIN; + goto uncharge; + } + } if (rc && remap_swapcache) remove_migration_ptes(page, page); @@ -907,7 +937,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, rc = -EAGAIN; if (!trylock_page(hpage)) { - if (!force || mode != MIGRATE_SYNC) + if (!force || !(mode & MIGRATE_SYNC)) goto out; lock_page(hpage); } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d8540eb..58ea96d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5662,7 +5662,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) ret = migrate_pages(&cc.migratepages, __alloc_contig_migrate_alloc, - 0, false, MIGRATE_SYNC); + 0, false, MIGRATE_SYNC|MIGRATE_DISCARD); } putback_lru_pages(&cc.migratepages); -- 1.7.9.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx106.postini.com [74.125.245.106]) by kanga.kvack.org (Postfix) with SMTP id 5AFF16B0044 for ; Tue, 14 Aug 2012 10:20:05 -0400 (EDT) Received: by eaaf11 with SMTP id f11so190837eaa.14 for ; Tue, 14 Aug 2012 07:20:03 -0700 (PDT) From: Michal Nazarewicz Subject: Re: [RFC 2/2] cma: support MIGRATE_DISCARD In-Reply-To: <1344934627-8473-3-git-send-email-minchan@kernel.org> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-3-git-send-email-minchan@kernel.org> Date: Tue, 14 Aug 2012 16:19:55 +0200 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim , Marek Szyprowski Cc: Mel Gorman , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Minchan Kim writes: > This patch introudes MIGRATE_DISCARD mode in migration. > It drop clean cache pages instead of migration so that > migration latency could be reduced. Of course, it could > evict code pages but latency of big contiguous memory > is more important than some background application's slow down > in mobile embedded enviroment. > > Signed-off-by: Minchan Kim This looks good to me. > --- > include/linux/migrate_mode.h | 11 +++++++--- > mm/migrate.c | 50 +++++++++++++++++++++++++++++++++---= ------ > mm/page_alloc.c | 2 +- > 3 files changed, 49 insertions(+), 14 deletions(-) > > diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h > index ebf3d89..04ca19c 100644 > --- a/include/linux/migrate_mode.h > +++ b/include/linux/migrate_mode.h > @@ -6,11 +6,16 @@ > * on most operations but not ->writepage as the potential stall time > * is too significant > * MIGRATE_SYNC will block when migrating pages > + * MIGRTATE_DISCARD will discard clean cache page instead of migration > + * > + * MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC shouldn't be used > + * together as OR flag. > */ > enum migrate_mode { > - MIGRATE_ASYNC, > - MIGRATE_SYNC_LIGHT, > - MIGRATE_SYNC, > + MIGRATE_ASYNC =3D 1 << 0, > + MIGRATE_SYNC_LIGHT =3D 1 << 1, > + MIGRATE_SYNC =3D 1 << 2, > + MIGRATE_DISCARD =3D 1 << 3, > }; Since CMA is the only user of MIGRATE_DISCARD it may be worth it to guard it inside an #ifdef, eg: #ifdef CONFIG_CMA MIGRATE_DISCARD =3D 1 << 3, #define is_migrate_discard(mode) (((mode) & MIGRATE_DISCARD) =3D=3D MIGRATE= _DISCARD) #endif =20=20 > #endif /* MIGRATE_MODE_H_INCLUDED */ > diff --git a/mm/migrate.c b/mm/migrate.c > index 77ed2d7..8119a59 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -685,9 +685,12 @@ static int __unmap_and_move(struct page *page, struc= t page *newpage, > int remap_swapcache =3D 1; > struct mem_cgroup *mem; > struct anon_vma *anon_vma =3D NULL; > + enum ttu_flags ttu_flags; > + bool discard_mode =3D false; > + bool file =3D false; >=20=20 > if (!trylock_page(page)) { > - if (!force || mode =3D=3D MIGRATE_ASYNC) > + if (!force || mode & MIGRATE_ASYNC) + if (!force || (mode & MIGRATE_ASYNC)) > goto out; >=20=20 > /* --=20 Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=3D./ `o ..o | Computer Science, Micha=C5=82 =E2=80=9Cmina86=E2=80=9D Nazarewicz = (o o) ooo +------------------ooO--(_)--Ooo-- --=-=-= Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" --==-=-= Content-Type: text/plain --==-=-= Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAEBAgAGBQJQKl6LAAoJECBgQBJQdR/0goYP/0/hPtydiXAf13PcPR+dWLrx UhGN2//ZkV8KnA11/q/1fLJ7mIuUoR84gGi9PPkQUOdos5iOxoVj9yMB+UWlok68 qwrsOeBwJZLcYzpeLP3UAyuElxUBWCs4A/zvRXjA9Sb8Bnk0p7W/ONqfIjeYQsyN phpxS8dEZkwLff4QOaQpHNYopTxDyBfjzbL2F0vlnXfP0NuLVEYk6BDE1d1oCXWt 7t61/34i3e5fNYGCybgxcmMWG32vzg6i1a1MTt4kXFVg+H8PtC8iM2u+/o0Ma2oW 76le3gW1P3zGFreBABKYViyu3MUEdS5yNzjspvCM7d20nTLlxKN7ZnqZF2Pmldwb VU/eCmWTNbA3aCH+2zjbuV1M08CS4TmQcQa+T2Txp6PctRM6zfoM1v/kxdTgHKIW PdtXiYN1yryjHGb6RDt34W9VGhEmiJ4wOtVb1taDHODM/VzrWe8YRIqvyqDjqTD8 PoyeoP4lqA71KZelHJh5KY7YXw0owROkesoADZcxWJ01D4jRo5m6G+ZkyfiAwVgj IE3g4T5wcG4BxVPGazIni3HqoPX0HbSdbb7SnI4MHoSK1LypBGmuKsaoAqSb3o28 8qXxNNfPO2Cj6TNxyJDAkBnUXGL5b2nQYF1FR2Ha4t79ljqsMAX+54JqkOErWMtE 1MeBYF5Gv4LtAgQONM6c =v1Nm -----END PGP SIGNATURE----- --==-=-=-- --=-=-=-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx116.postini.com [74.125.245.116]) by kanga.kvack.org (Postfix) with SMTP id 23D856B006E for ; Wed, 15 Aug 2012 14:51:52 -0400 (EDT) Message-ID: <502BEF53.80200@redhat.com> Date: Wed, 15 Aug 2012 14:49:55 -0400 From: Rik van Riel MIME-Version: 1.0 Subject: Re: [RFC 1/2] cma: remove __reclaim_pages References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-2-git-send-email-minchan@kernel.org> In-Reply-To: <1344934627-8473-2-git-send-email-minchan@kernel.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Marek Szyprowski , Mel Gorman , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org On 08/14/2012 04:57 AM, Minchan Kim wrote: > Now cma reclaims too many pages by __reclaim_pages which says > following as > > * Reclaim enough pages to make sure that contiguous allocation > * will not starve the system. > > Starve? What does it starve the system? The function which allocate > free page for migration target would wake up kswapd and do direct reclaim > if needed during migration so system doesn't starve. > > Let remove __reclaim_pages and related function and fields. Fair enough. > Signed-off-by: Minchan Kim Acked-by: Rik van Riel -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx148.postini.com [74.125.245.148]) by kanga.kvack.org (Postfix) with SMTP id 656986B006C for ; Wed, 15 Aug 2012 14:59:58 -0400 (EDT) Message-ID: <502BF139.3040403@redhat.com> Date: Wed, 15 Aug 2012 14:58:01 -0400 From: Rik van Riel MIME-Version: 1.0 Subject: Re: [RFC 2/2] cma: support MIGRATE_DISCARD References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-3-git-send-email-minchan@kernel.org> In-Reply-To: <1344934627-8473-3-git-send-email-minchan@kernel.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Marek Szyprowski , Mel Gorman , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org On 08/14/2012 04:57 AM, Minchan Kim wrote: > This patch introudes MIGRATE_DISCARD mode in migration. > It drop clean cache pages instead of migration so that > migration latency could be reduced. Of course, it could > evict code pages but latency of big contiguous memory > is more important than some background application's slow down > in mobile embedded enviroment. Would it be an idea to only drop clean UNMAPPED page cache pages? > Signed-off-by: Minchan Kim > @@ -799,12 +802,39 @@ static int __unmap_and_move(struct page *page, struct page *newpage, > goto skip_unmap; > } > > + file = page_is_file_cache(page); > + ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS; > + > + if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page)) > + ttu_flags |= TTU_MIGRATION; > + else > + discard_mode = true; > + > /* Establish migration ptes or remove ptes */ > - try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS); > + try_to_unmap(page, ttu_flags); This bit looks wrong, because you end up ignoring mlock and then discarding the page. Only dropping clean page cache pages that are not mapped would avoid that problem, without introducing much complexity in the code. That would turn the test above into: if (!page_mapped(page)) discard_mode = true; > skip_unmap: > - if (!page_mapped(page)) > - rc = move_to_new_page(newpage, page, remap_swapcache, mode); > + if (!page_mapped(page)) { > + if (!discard_mode) > + rc = move_to_new_page(newpage, page, remap_swapcache, mode); > + else { > + struct address_space *mapping; > + mapping = page_mapping(page); > + > + if (page_has_private(page)) { > + if (!try_to_release_page(page, GFP_KERNEL)) { > + rc = -EAGAIN; > + goto uncharge; > + } > + } > + > + if (remove_mapping(mapping, page)) > + rc = 0; > + else > + rc = -EAGAIN; > + goto uncharge; > + } > + } This big piece of code could probably be split out into its own function. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx171.postini.com [74.125.245.171]) by kanga.kvack.org (Postfix) with SMTP id 462386B005D for ; Wed, 15 Aug 2012 19:18:17 -0400 (EDT) Date: Thu, 16 Aug 2012 08:20:23 +0900 From: Minchan Kim Subject: Re: [RFC 2/2] cma: support MIGRATE_DISCARD Message-ID: <20120815232023.GA15225@bbox> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-3-git-send-email-minchan@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Michal Nazarewicz Cc: Marek Szyprowski , Mel Gorman , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Hi Michal, On Tue, Aug 14, 2012 at 04:19:55PM +0200, Michal Nazarewicz wrote: > Minchan Kim writes: > > This patch introudes MIGRATE_DISCARD mode in migration. > > It drop clean cache pages instead of migration so that > > migration latency could be reduced. Of course, it could > > evict code pages but latency of big contiguous memory > > is more important than some background application's slow down > > in mobile embedded enviroment. > > > > Signed-off-by: Minchan Kim > > This looks good to me. > > > --- > > include/linux/migrate_mode.h | 11 +++++++--- > > mm/migrate.c | 50 +++++++++++++++++++++++++++++++++--------- > > mm/page_alloc.c | 2 +- > > 3 files changed, 49 insertions(+), 14 deletions(-) > > > > diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h > > index ebf3d89..04ca19c 100644 > > --- a/include/linux/migrate_mode.h > > +++ b/include/linux/migrate_mode.h > > @@ -6,11 +6,16 @@ > > * on most operations but not ->writepage as the potential stall time > > * is too significant > > * MIGRATE_SYNC will block when migrating pages > > + * MIGRTATE_DISCARD will discard clean cache page instead of migration > > + * > > + * MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC shouldn't be used > > + * together as OR flag. > > */ > > enum migrate_mode { > > - MIGRATE_ASYNC, > > - MIGRATE_SYNC_LIGHT, > > - MIGRATE_SYNC, > > + MIGRATE_ASYNC = 1 << 0, > > + MIGRATE_SYNC_LIGHT = 1 << 1, > > + MIGRATE_SYNC = 1 << 2, > > + MIGRATE_DISCARD = 1 << 3, > > }; > > Since CMA is the only user of MIGRATE_DISCARD it may be worth it to > guard it inside an #ifdef, eg: > > #ifdef CONFIG_CMA > MIGRATE_DISCARD = 1 << 3, > #define is_migrate_discard(mode) (((mode) & MIGRATE_DISCARD) == MIGRATE_DISCARD) The mode bit can be used with other bits like MIGRATE_SYNC|MIGRATE_DISCARD. So it is correct that (mode & MIGRATE_DISCARD). Anyway, I don't want to fold it into only CMA because I think we can have a pontential users in mm. For example, memory-hotplug case. No enough free memory in the system but lots of page cache page as a migration source, then we can remove page cache page instead of migration and it might be better than failing memory-hotremove. In summary, I want to open it for potential usecases in future if anyone doesn't oppose strongly. > #endif > > > > #endif /* MIGRATE_MODE_H_INCLUDED */ > > diff --git a/mm/migrate.c b/mm/migrate.c > > index 77ed2d7..8119a59 100644 > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -685,9 +685,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage, > > int remap_swapcache = 1; > > struct mem_cgroup *mem; > > struct anon_vma *anon_vma = NULL; > > + enum ttu_flags ttu_flags; > > + bool discard_mode = false; > > + bool file = false; > > > > if (!trylock_page(page)) { > > - if (!force || mode == MIGRATE_ASYNC) > > + if (!force || mode & MIGRATE_ASYNC) It's not wrong technically but for readability, NP. > > + if (!force || (mode & MIGRATE_ASYNC)) > > > goto out; > > > > /* > > > -- > Best regards, _ _ > .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o > ..o | Computer Science, MichaA? a??mina86a?? Nazarewicz (o o) > ooo +------------------ooO--(_)--Ooo-- -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx102.postini.com [74.125.245.102]) by kanga.kvack.org (Postfix) with SMTP id AEE626B005D for ; Wed, 15 Aug 2012 19:31:16 -0400 (EDT) Date: Thu, 16 Aug 2012 08:33:23 +0900 From: Minchan Kim Subject: Re: [RFC 2/2] cma: support MIGRATE_DISCARD Message-ID: <20120815233323.GB15225@bbox> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-3-git-send-email-minchan@kernel.org> <502BF139.3040403@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <502BF139.3040403@redhat.com> Sender: owner-linux-mm@kvack.org List-ID: To: Rik van Riel Cc: Marek Szyprowski , Mel Gorman , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Hi Rik, On Wed, Aug 15, 2012 at 02:58:01PM -0400, Rik van Riel wrote: > On 08/14/2012 04:57 AM, Minchan Kim wrote: > >This patch introudes MIGRATE_DISCARD mode in migration. > >It drop clean cache pages instead of migration so that > >migration latency could be reduced. Of course, it could > >evict code pages but latency of big contiguous memory > >is more important than some background application's slow down > >in mobile embedded enviroment. > > Would it be an idea to only drop clean UNMAPPED > page cache pages? Firstly I thougt about that but I chose more agressive thing. Namely, even drop mapped page cache. Because it can reduce latency more(ex, memcpy + remapping cost during migration) and it could not trivial if migration range is big. > > >Signed-off-by: Minchan Kim > > >@@ -799,12 +802,39 @@ static int __unmap_and_move(struct page *page, struct page *newpage, > > goto skip_unmap; > > } > > > >+ file = page_is_file_cache(page); > >+ ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS; > >+ > >+ if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page)) > >+ ttu_flags |= TTU_MIGRATION; > >+ else > >+ discard_mode = true; > >+ > > /* Establish migration ptes or remove ptes */ > >- try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS); > >+ try_to_unmap(page, ttu_flags); > > This bit looks wrong, because you end up ignoring > mlock and then discarding the page. Argh, Thanks! I will fix it in next spin. > > Only dropping clean page cache pages that are not > mapped would avoid that problem, without introducing > much complexity in the code. Hmm, I don't think it makes code much complex. How about this? diff --git a/mm/rmap.c b/mm/rmap.c index 0f3b7cd..0909d79 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1223,7 +1223,8 @@ out: * repeatedly from try_to_unmap_ksm, try_to_unmap_anon or try_to_unmap_file. */ int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, - unsigned long address, enum ttu_flags flags) + unsigned long address, enum ttu_flags flags, + unsigned long *vm_flags) { struct mm_struct *mm = vma->vm_mm; pte_t *pte; @@ -1235,6 +1236,7 @@ int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, if (!pte) goto out; + vm_flags |= vma->vm_flags; /* * If the page is mlock()d, we cannot swap it out. * If it's recently referenced (perhaps page_referenced @@ -1652,7 +1654,7 @@ out: * SWAP_FAIL - the page is unswappable * SWAP_MLOCK - page is mlocked. */ -int try_to_unmap(struct page *page, enum ttu_flags flags) +int try_to_unmap(struct page *page, enum ttu_flags flags, unsigned long *vm_flags) { int ret; + file = page_is_file_cache(page); + ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS; + + if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page) || + vm_flags & VM_LOCKED) + ttu_flags |= TTU_MIGRATION; + else + discard_mode = true; + > > That would turn the test above into: > > if (!page_mapped(page)) > discard_mode = true; > > > skip_unmap: > >- if (!page_mapped(page)) > >- rc = move_to_new_page(newpage, page, remap_swapcache, mode); > >+ if (!page_mapped(page)) { > >+ if (!discard_mode) > >+ rc = move_to_new_page(newpage, page, remap_swapcache, mode); > >+ else { > >+ struct address_space *mapping; > >+ mapping = page_mapping(page); > >+ > >+ if (page_has_private(page)) { > >+ if (!try_to_release_page(page, GFP_KERNEL)) { > >+ rc = -EAGAIN; > >+ goto uncharge; > >+ } > >+ } > >+ > >+ if (remove_mapping(mapping, page)) > >+ rc = 0; > >+ else > >+ rc = -EAGAIN; > >+ goto uncharge; > >+ } > >+ } > > This big piece of code could probably be split out > into its own function. > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx192.postini.com [74.125.245.192]) by kanga.kvack.org (Postfix) with SMTP id EC43B6B005D for ; Wed, 15 Aug 2012 20:13:14 -0400 (EDT) Date: Thu, 16 Aug 2012 09:15:17 +0900 From: Minchan Kim Subject: Re: [RFC 2/2] cma: support MIGRATE_DISCARD Message-ID: <20120816001517.GC15225@bbox> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-3-git-send-email-minchan@kernel.org> <502BF139.3040403@redhat.com> <20120815233323.GB15225@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120815233323.GB15225@bbox> Sender: owner-linux-mm@kvack.org List-ID: To: Rik van Riel Cc: Marek Szyprowski , Mel Gorman , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Thu, Aug 16, 2012 at 08:33:23AM +0900, Minchan Kim wrote: > Hi Rik, > > On Wed, Aug 15, 2012 at 02:58:01PM -0400, Rik van Riel wrote: > > On 08/14/2012 04:57 AM, Minchan Kim wrote: > > >This patch introudes MIGRATE_DISCARD mode in migration. > > >It drop clean cache pages instead of migration so that > > >migration latency could be reduced. Of course, it could > > >evict code pages but latency of big contiguous memory > > >is more important than some background application's slow down > > >in mobile embedded enviroment. > > > > Would it be an idea to only drop clean UNMAPPED > > page cache pages? > > Firstly I thougt about that but I chose more agressive thing. > Namely, even drop mapped page cache. > Because it can reduce latency more(ex, memcpy + remapping cost > during migration) and it could not trivial if migration range is big. > > > > > >Signed-off-by: Minchan Kim > > > > >@@ -799,12 +802,39 @@ static int __unmap_and_move(struct page *page, struct page *newpage, > > > goto skip_unmap; > > > } > > > > > >+ file = page_is_file_cache(page); > > >+ ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS; > > >+ > > >+ if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page)) > > >+ ttu_flags |= TTU_MIGRATION; > > >+ else > > >+ discard_mode = true; > > >+ > > > /* Establish migration ptes or remove ptes */ > > >- try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS); > > >+ try_to_unmap(page, ttu_flags); > > > > This bit looks wrong, because you end up ignoring > > mlock and then discarding the page. > > Argh, Thanks! > I will fix it in next spin. > > > > > Only dropping clean page cache pages that are not > > mapped would avoid that problem, without introducing > > much complexity in the code. > > Hmm, I don't think it makes code much complex. > How about this? > > diff --git a/mm/rmap.c b/mm/rmap.c > index 0f3b7cd..0909d79 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1223,7 +1223,8 @@ out: > * repeatedly from try_to_unmap_ksm, try_to_unmap_anon or try_to_unmap_file. > */ > int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, > - unsigned long address, enum ttu_flags flags) > + unsigned long address, enum ttu_flags flags, > + unsigned long *vm_flags) > { > struct mm_struct *mm = vma->vm_mm; > pte_t *pte; > @@ -1235,6 +1236,7 @@ int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, > if (!pte) > goto out; > > + vm_flags |= vma->vm_flags; > /* > * If the page is mlock()d, we cannot swap it out. > * If it's recently referenced (perhaps page_referenced > @@ -1652,7 +1654,7 @@ out: > * SWAP_FAIL - the page is unswappable > * SWAP_MLOCK - page is mlocked. > */ > -int try_to_unmap(struct page *page, enum ttu_flags flags) > +int try_to_unmap(struct page *page, enum ttu_flags flags, unsigned long *vm_flags) > { > int ret; > > > > + file = page_is_file_cache(page); > + ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS; > + > + if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page) || > + vm_flags & VM_LOCKED) We do try_to_unmap after this piece so we can't get the information in advance. :( I don't have better idea which doesn't have a drawback so I will accept your idea. Thanks, Rik. > + ttu_flags |= TTU_MIGRATION; > + else > + discard_mode = true; > + > > -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx121.postini.com [74.125.245.121]) by kanga.kvack.org (Postfix) with SMTP id A53176B005D for ; Thu, 16 Aug 2012 09:17:49 -0400 (EDT) Received: by eeke49 with SMTP id e49so880718eek.14 for ; Thu, 16 Aug 2012 06:17:48 -0700 (PDT) From: Michal Nazarewicz Subject: Re: [RFC 2/2] cma: support MIGRATE_DISCARD In-Reply-To: <20120815232023.GA15225@bbox> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-3-git-send-email-minchan@kernel.org> <20120815232023.GA15225@bbox> Date: Thu, 16 Aug 2012 15:17:40 +0200 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Marek Szyprowski , Mel Gorman , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable > On Tue, Aug 14, 2012 at 04:19:55PM +0200, Michal Nazarewicz wrote: >> Since CMA is the only user of MIGRATE_DISCARD it may be worth it to >> guard it inside an #ifdef, eg: Minchan Kim writes: > In summary, I want to open it for potential usecases in future if anyone > doesn't oppose strongly. Fair enough. >>> if (!trylock_page(page)) { >>> - if (!force || mode =3D=3D MIGRATE_ASYNC) >>> + if (!force || mode & MIGRATE_ASYNC) > It's not wrong technically but for readability, NP. Yep, that was my point, thanks. :) --=20 Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=3D./ `o ..o | Computer Science, Micha=C5=82 =E2=80=9Cmina86=E2=80=9D Nazarewicz = (o o) ooo +------------------ooO--(_)--Ooo-- --=-=-= Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" --==-=-= Content-Type: text/plain --==-=-= Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAEBAgAGBQJQLPL0AAoJECBgQBJQdR/0GeEP/iRcc7bd6j8GvshNvmU8amgt Hh0+A/Gb6jygt0ieRQDZcb/qC/ODYLuHQ4NMmns1mXbpswItIvmgBg1pQoblzZo0 3UOsST2EzJm5UtFaLzw4vqoXtUwvI5pQUx+qRtS3ROvZqQeCreRn5au2hWPMhTlY 0bAgFED9KX1rihMp+gn8ega96Lon/0Sdnc+yfQgMFfRl9kXGRauVUrnREO2nMj9o kg8NM/syC/I/t3D7sL8ifq5mbe2bIOZMCV0o0vbMrpEs1bQaIsOsD5F8+4XG98kH 6vy5oAwqRNnFWjcR5QdG5dp+gcr/OArewDsRX5I49iIB8IuhkJvnfWQtbVVZy98B ozNdmI5PANN6Kzs7mm0prtHsoiNvY/ODf3UatAK/w5ru+UJW7+SnC0e2pYmjlGhR Qsrhio5x5geFjSV6iszA54rl313D9mzVSmNONgtKDj06ATAwRpMRErlh6huK32Zu G5XVIoIXaljlPpOBtT59sEUpmt0FXiX5dCmInIPJkZdXuf4vnibyQlgO126zzoMU /c7Bjw7RwGgAK3euKJVd69Ro/GS1PIzmfF+Q9+Dk3B/ZyTCEkLeNWxfSf912si4o BOK4KrM5kjn7gZLYPWxBJ38sU43Oonix6gsaXG8GGCFRhtxQSjayszyH5lDdWl7f YhSR0kdZ7FICbt9E2kuI =mflS -----END PGP SIGNATURE----- --==-=-=-- --=-=-=-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx134.postini.com [74.125.245.134]) by kanga.kvack.org (Postfix) with SMTP id CD7756B005D for ; Thu, 16 Aug 2012 09:58:24 -0400 (EDT) Date: Thu, 16 Aug 2012 14:58:18 +0100 From: Mel Gorman Subject: Re: [RFC 1/2] cma: remove __reclaim_pages Message-ID: <20120816135817.GS4177@suse.de> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-2-git-send-email-minchan@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <1344934627-8473-2-git-send-email-minchan@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Marek Szyprowski , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Tue, Aug 14, 2012 at 05:57:06PM +0900, Minchan Kim wrote: > Now cma reclaims too many pages by __reclaim_pages which says > following as > > * Reclaim enough pages to make sure that contiguous allocation > * will not starve the system. > > Starve? What does it starve the system? The function which allocate > free page for migration target would wake up kswapd and do direct reclaim > if needed during migration so system doesn't starve. > I thought this patch was overkill at the time it was introduced but didn't have a concrete reason to reject it when I commented on it https://lkml.org/lkml/2012/1/30/136 . Marek did want this and followed up with "contiguous allocations should have higher priority than others" which I took to mean that he was also ok with excessive reclaim. > Let remove __reclaim_pages and related function and fields. > That should be one patch and I don't object to it being removed as such but it's Marek's call. > I modified split_free_page slightly because I removed __reclaim_pages, > isolate_freepages_range can fail by split_free_page's watermark check. > It's very critical in CMA because it ends up failing alloc_contig_range. > This is a big change and should have been in a patch on its own. split_free_page checks watermarks because if the watermarks are not obeyed a zone can become fully allocated. This can cause a system to livelock under certain circumstances if a page cannot be allocated and a free page is required before other pages can be freed. > I think we don't need the check in case of CMA because CMA allocates > free pages by alloc_pages, not isolate_freepages_block in migrate_pages > so watermark is already checked in alloc_pages. It uses alloc_pages when migrating pages out of the CMA area but note that it uses isolate_freepages_block when allocating the CMA buffer when alloc_contig_range calls isolate_freepages_range isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn) { for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) { isolated = isolate_freepages_block(pfn, block_end_pfn, &freelist, true); } map_pages(&freelist); } so the actual CMA allocation itself is not using alloc_pages. By removing the watermark check you allow the CMA to breach watermarks and puts the system at risk of livelock. I'm not keen on the split_free_page() change at all. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx196.postini.com [74.125.245.196]) by kanga.kvack.org (Postfix) with SMTP id 4B2376B005D for ; Thu, 16 Aug 2012 21:05:49 -0400 (EDT) Date: Fri, 17 Aug 2012 10:05:47 +0900 From: Minchan Kim Subject: Re: [RFC 1/2] cma: remove __reclaim_pages Message-ID: <20120817010547.GA3061@bbox> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-2-git-send-email-minchan@kernel.org> <20120816135817.GS4177@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120816135817.GS4177@suse.de> Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Marek Szyprowski , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Hi Mel, On Thu, Aug 16, 2012 at 02:58:18PM +0100, Mel Gorman wrote: > On Tue, Aug 14, 2012 at 05:57:06PM +0900, Minchan Kim wrote: > > Now cma reclaims too many pages by __reclaim_pages which says > > following as > > > > * Reclaim enough pages to make sure that contiguous allocation > > * will not starve the system. > > > > Starve? What does it starve the system? The function which allocate > > free page for migration target would wake up kswapd and do direct reclaim > > if needed during migration so system doesn't starve. > > > > I thought this patch was overkill at the time it was introduced but > didn't have a concrete reason to reject it when I commented on it > https://lkml.org/lkml/2012/1/30/136 . Marek did want this and followed > up with "contiguous allocations should have higher priority than others" > which I took to mean that he was also ok with excessive reclaim. I think OOM kill to background applications is more appropriate than big latency of foreground(ex, Camera app) application in your mobile phone. In other words, excessive reclaim is *really* bad which elapsed 8sec in my test as worst case. :( > > > Let remove __reclaim_pages and related function and fields. > > > > That should be one patch and I don't object to it being removed as such > but it's Marek's call. Marek. Any thought? > > > I modified split_free_page slightly because I removed __reclaim_pages, > > isolate_freepages_range can fail by split_free_page's watermark check. > > It's very critical in CMA because it ends up failing alloc_contig_range. > > > > This is a big change and should have been in a patch on its > own. split_free_page checks watermarks because if the watermarks are > not obeyed a zone can become fully allocated. This can cause a system to > livelock under certain circumstances if a page cannot be allocated and a > free page is required before other pages can be freed. > > > I think we don't need the check in case of CMA because CMA allocates > > free pages by alloc_pages, not isolate_freepages_block in migrate_pages > > so watermark is already checked in alloc_pages. > > It uses alloc_pages when migrating pages out of the CMA area but note > that it uses isolate_freepages_block when allocating the CMA buffer when > alloc_contig_range calls isolate_freepages_range > > isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn) > { > for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) { > isolated = isolate_freepages_block(pfn, block_end_pfn, > &freelist, true); > } > map_pages(&freelist); > } > > so the actual CMA allocation itself is not using alloc_pages. By removing > the watermark check you allow the CMA to breach watermarks and puts the > system at risk of livelock. Fair enough. I will look into that. Thanks, Mel. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx136.postini.com [74.125.245.136]) by kanga.kvack.org (Postfix) with SMTP id 303796B005D for ; Fri, 17 Aug 2012 10:48:44 -0400 (EDT) Received: from epcpsbgm2.samsung.com (mailout3.samsung.com [203.254.224.33]) by mailout3.samsung.com (Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP id <0M8W002RZMH68V70@mailout3.samsung.com> for linux-mm@kvack.org; Fri, 17 Aug 2012 23:48:42 +0900 (KST) Received: from AMDC159 ([106.116.147.30]) by mmp1.samsung.com (Oracle Communications Messaging Server 7u4-24.01 (7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTPA id <0M8W00BXDMGVVKC0@mmp1.samsung.com> for linux-mm@kvack.org; Fri, 17 Aug 2012 23:48:42 +0900 (KST) From: Marek Szyprowski References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-2-git-send-email-minchan@kernel.org> <20120816135817.GS4177@suse.de> <20120817010547.GA3061@bbox> In-reply-to: <20120817010547.GA3061@bbox> Subject: RE: [RFC 1/2] cma: remove __reclaim_pages Date: Fri, 17 Aug 2012 16:48:30 +0200 Message-id: <024801cd7c87$6125f880$2371e980$%szyprowski@samsung.com> MIME-version: 1.0 Content-type: text/plain; charset=us-ascii Content-transfer-encoding: 7bit Content-language: pl Sender: owner-linux-mm@kvack.org List-ID: To: 'Minchan Kim' , 'Mel Gorman' Cc: 'Rik van Riel' , 'Kamezawa Hiroyuki' , 'Andrew Morton' , linux-kernel@vger.kernel.org, linux-mm@kvack.org Hi Minchan, On Friday, August 17, 2012 3:06 AM Minchan Kim wrote: > On Thu, Aug 16, 2012 at 02:58:18PM +0100, Mel Gorman wrote: > > On Tue, Aug 14, 2012 at 05:57:06PM +0900, Minchan Kim wrote: > > > Now cma reclaims too many pages by __reclaim_pages which says > > > following as > > > > > > * Reclaim enough pages to make sure that contiguous allocation > > > * will not starve the system. > > > > > > Starve? What does it starve the system? The function which allocate > > > free page for migration target would wake up kswapd and do direct reclaim > > > if needed during migration so system doesn't starve. > > > > > > > I thought this patch was overkill at the time it was introduced but > > didn't have a concrete reason to reject it when I commented on it > > https://lkml.org/lkml/2012/1/30/136 . Marek did want this and followed > > up with "contiguous allocations should have higher priority than others" > > which I took to mean that he was also ok with excessive reclaim. > > I think OOM kill to background applications is more appropriate than > big latency of foreground(ex, Camera app) application in your mobile phone. > In other words, excessive reclaim is *really* bad which elapsed 8sec > in my test as worst case. :( > > > > > > Let remove __reclaim_pages and related function and fields. > > > > > > > That should be one patch and I don't object to it being removed as such > > but it's Marek's call. > > Marek. Any thought? Well, I've introduced this function as a result of solving Mel's request: http://www.spinics.net/lists/linux-mm/msg28485.html Before that I solved it almost the same as in your current patch. I'm aware of the fact that __reclaim_pages() approach might be a little overkill, but I didn't find anything better yet. > > > I modified split_free_page slightly because I removed __reclaim_pages, > > > isolate_freepages_range can fail by split_free_page's watermark check. > > > It's very critical in CMA because it ends up failing alloc_contig_range. > > > > > > > This is a big change and should have been in a patch on its > > own. split_free_page checks watermarks because if the watermarks are > > not obeyed a zone can become fully allocated. This can cause a system to > > livelock under certain circumstances if a page cannot be allocated and a > > free page is required before other pages can be freed. > > > > > I think we don't need the check in case of CMA because CMA allocates > > > free pages by alloc_pages, not isolate_freepages_block in migrate_pages > > > so watermark is already checked in alloc_pages. > > > > It uses alloc_pages when migrating pages out of the CMA area but note > > that it uses isolate_freepages_block when allocating the CMA buffer when > > alloc_contig_range calls isolate_freepages_range > > > > isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn) > > { > > for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) { > > isolated = isolate_freepages_block(pfn, block_end_pfn, > > &freelist, true); > > } > > map_pages(&freelist); > > } > > > > so the actual CMA allocation itself is not using alloc_pages. By removing > > the watermark check you allow the CMA to breach watermarks and puts the > > system at risk of livelock. > > Fair enough. I will look into that. We will also looks into this issue. Best regards -- Marek Szyprowski Samsung Poland R&D Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755370Ab2HNIzM (ORCPT ); Tue, 14 Aug 2012 04:55:12 -0400 Received: from LGEMRELSE6Q.lge.com ([156.147.1.121]:53530 "EHLO LGEMRELSE6Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754668Ab2HNIzJ (ORCPT ); Tue, 14 Aug 2012 04:55:09 -0400 X-AuditID: 9c930179-b7cc4ae00000134d-7b-502a126b3c69 From: Minchan Kim To: Marek Szyprowski Cc: Mel Gorman , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Minchan Kim Subject: [RFC 0/2] Reduce alloc_contig_range latency Date: Tue, 14 Aug 2012 17:57:05 +0900 Message-Id: <1344934627-8473-1-git-send-email-minchan@kernel.org> X-Mailer: git-send-email 1.7.9.5 X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi All, I played with CMA's core function alloc_contig_range and found it's very very slow so I suspect we can use it in real practice. I tested it with a bit tweak for working CMA in x86 on qemu. Test environment is following as. 1. x86_64 machince, 2G RAM, 4 core, movable zone 40M with try alloc_contig_range(movable_zone->middle_pfn, movable_zone->middle_pfn + 10M) per 5sec until background stress test program is terminated. 2. There is background stress program which can make lots of clean cache page. It mimics movie player. alloc_contig_range's latency unit: usec before: min 204000 max 8156000 mean 3109310.34482759 success count 58 after: min 8000 max 112000 mean 45788.2352941177 success count 85 So this patch reduces 8 sec as worst case, 3 sec as mean case. I'm off from now on until the day of tomorrow so please understand if I can't reply instantly. Minchan Kim (2): cma: remove __reclaim_pages cma: support MIGRATE_DISCARD include/linux/migrate_mode.h | 11 +++++-- include/linux/mm.h | 2 +- include/linux/mmzone.h | 9 ------ mm/compaction.c | 2 +- mm/migrate.c | 50 +++++++++++++++++++++++------ mm/page_alloc.c | 73 +++++------------------------------------- 6 files changed, 58 insertions(+), 89 deletions(-) -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755403Ab2HNIzS (ORCPT ); Tue, 14 Aug 2012 04:55:18 -0400 Received: from LGEMRELSE6Q.lge.com ([156.147.1.121]:53530 "EHLO LGEMRELSE6Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754668Ab2HNIzO (ORCPT ); Tue, 14 Aug 2012 04:55:14 -0400 X-AuditID: 9c930179-b7cc4ae00000134d-80-502a126ba1cc From: Minchan Kim To: Marek Szyprowski Cc: Mel Gorman , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Minchan Kim Subject: [RFC 2/2] cma: support MIGRATE_DISCARD Date: Tue, 14 Aug 2012 17:57:07 +0900 Message-Id: <1344934627-8473-3-git-send-email-minchan@kernel.org> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1344934627-8473-1-git-send-email-minchan@kernel.org> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch introudes MIGRATE_DISCARD mode in migration. It drop clean cache pages instead of migration so that migration latency could be reduced. Of course, it could evict code pages but latency of big contiguous memory is more important than some background application's slow down in mobile embedded enviroment. Signed-off-by: Minchan Kim --- include/linux/migrate_mode.h | 11 +++++++--- mm/migrate.c | 50 +++++++++++++++++++++++++++++++++--------- mm/page_alloc.c | 2 +- 3 files changed, 49 insertions(+), 14 deletions(-) diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h index ebf3d89..04ca19c 100644 --- a/include/linux/migrate_mode.h +++ b/include/linux/migrate_mode.h @@ -6,11 +6,16 @@ * on most operations but not ->writepage as the potential stall time * is too significant * MIGRATE_SYNC will block when migrating pages + * MIGRTATE_DISCARD will discard clean cache page instead of migration + * + * MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC shouldn't be used + * together as OR flag. */ enum migrate_mode { - MIGRATE_ASYNC, - MIGRATE_SYNC_LIGHT, - MIGRATE_SYNC, + MIGRATE_ASYNC = 1 << 0, + MIGRATE_SYNC_LIGHT = 1 << 1, + MIGRATE_SYNC = 1 << 2, + MIGRATE_DISCARD = 1 << 3, }; #endif /* MIGRATE_MODE_H_INCLUDED */ diff --git a/mm/migrate.c b/mm/migrate.c index 77ed2d7..8119a59 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -225,7 +225,7 @@ static bool buffer_migrate_lock_buffers(struct buffer_head *head, struct buffer_head *bh = head; /* Simple case, sync compaction */ - if (mode != MIGRATE_ASYNC) { + if (!(mode & MIGRATE_ASYNC)) { do { get_bh(bh); lock_buffer(bh); @@ -313,7 +313,7 @@ static int migrate_page_move_mapping(struct address_space *mapping, * the mapping back due to an elevated page count, we would have to * block waiting on other references to be dropped. */ - if (mode == MIGRATE_ASYNC && head && + if (mode & MIGRATE_ASYNC && head && !buffer_migrate_lock_buffers(head, mode)) { page_unfreeze_refs(page, expected_count); spin_unlock_irq(&mapping->tree_lock); @@ -521,7 +521,7 @@ int buffer_migrate_page(struct address_space *mapping, * with an IRQ-safe spinlock held. In the sync case, the buffers * need to be locked now */ - if (mode != MIGRATE_ASYNC) + if (!(mode & MIGRATE_ASYNC)) BUG_ON(!buffer_migrate_lock_buffers(head, mode)); ClearPagePrivate(page); @@ -603,7 +603,7 @@ static int fallback_migrate_page(struct address_space *mapping, { if (PageDirty(page)) { /* Only writeback pages in full synchronous migration */ - if (mode != MIGRATE_SYNC) + if (!(mode & MIGRATE_SYNC)) return -EBUSY; return writeout(mapping, page); } @@ -685,9 +685,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage, int remap_swapcache = 1; struct mem_cgroup *mem; struct anon_vma *anon_vma = NULL; + enum ttu_flags ttu_flags; + bool discard_mode = false; + bool file = false; if (!trylock_page(page)) { - if (!force || mode == MIGRATE_ASYNC) + if (!force || mode & MIGRATE_ASYNC) goto out; /* @@ -733,7 +736,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage, * the retry loop is too short and in the sync-light case, * the overhead of stalling is too much */ - if (mode != MIGRATE_SYNC) { + if (!(mode & MIGRATE_SYNC)) { rc = -EBUSY; goto uncharge; } @@ -799,12 +802,39 @@ static int __unmap_and_move(struct page *page, struct page *newpage, goto skip_unmap; } + file = page_is_file_cache(page); + ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS; + + if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page)) + ttu_flags |= TTU_MIGRATION; + else + discard_mode = true; + /* Establish migration ptes or remove ptes */ - try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS); + try_to_unmap(page, ttu_flags); skip_unmap: - if (!page_mapped(page)) - rc = move_to_new_page(newpage, page, remap_swapcache, mode); + if (!page_mapped(page)) { + if (!discard_mode) + rc = move_to_new_page(newpage, page, remap_swapcache, mode); + else { + struct address_space *mapping; + mapping = page_mapping(page); + + if (page_has_private(page)) { + if (!try_to_release_page(page, GFP_KERNEL)) { + rc = -EAGAIN; + goto uncharge; + } + } + + if (remove_mapping(mapping, page)) + rc = 0; + else + rc = -EAGAIN; + goto uncharge; + } + } if (rc && remap_swapcache) remove_migration_ptes(page, page); @@ -907,7 +937,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, rc = -EAGAIN; if (!trylock_page(hpage)) { - if (!force || mode != MIGRATE_SYNC) + if (!force || !(mode & MIGRATE_SYNC)) goto out; lock_page(hpage); } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d8540eb..58ea96d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5662,7 +5662,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) ret = migrate_pages(&cc.migratepages, __alloc_contig_migrate_alloc, - 0, false, MIGRATE_SYNC); + 0, false, MIGRATE_SYNC|MIGRATE_DISCARD); } putback_lru_pages(&cc.migratepages); -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755428Ab2HNIzl (ORCPT ); Tue, 14 Aug 2012 04:55:41 -0400 Received: from LGEMRELSE6Q.lge.com ([156.147.1.121]:53530 "EHLO LGEMRELSE6Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755248Ab2HNIzL (ORCPT ); Tue, 14 Aug 2012 04:55:11 -0400 X-AuditID: 9c930179-b7cc4ae00000134d-7f-502a126bf3fd From: Minchan Kim To: Marek Szyprowski Cc: Mel Gorman , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Minchan Kim Subject: [RFC 1/2] cma: remove __reclaim_pages Date: Tue, 14 Aug 2012 17:57:06 +0900 Message-Id: <1344934627-8473-2-git-send-email-minchan@kernel.org> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1344934627-8473-1-git-send-email-minchan@kernel.org> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Now cma reclaims too many pages by __reclaim_pages which says following as * Reclaim enough pages to make sure that contiguous allocation * will not starve the system. Starve? What does it starve the system? The function which allocate free page for migration target would wake up kswapd and do direct reclaim if needed during migration so system doesn't starve. Let remove __reclaim_pages and related function and fields. I modified split_free_page slightly because I removed __reclaim_pages, isolate_freepages_range can fail by split_free_page's watermark check. It's very critical in CMA because it ends up failing alloc_contig_range. I think we don't need the check in case of CMA because CMA allocates free pages by alloc_pages, not isolate_freepages_block in migrate_pages so watermark is already checked in alloc_pages. If the condition isn't meet, kswapd/direct-reclaim could reclaim pages. There is no reason to check it in split_free_page. Signed-off-by: Minchan Kim --- include/linux/mm.h | 2 +- include/linux/mmzone.h | 9 ------ mm/compaction.c | 2 +- mm/page_alloc.c | 71 +++++------------------------------------------- 4 files changed, 9 insertions(+), 75 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 0514fe9..fd042ae 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -441,7 +441,7 @@ void put_page(struct page *page); void put_pages_list(struct list_head *pages); void split_page(struct page *page, unsigned int order); -int split_free_page(struct page *page); +int split_free_page(struct page *page, bool check_wmark); /* * Compound pages have a destructor function. Provide a diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 2daa54f..ca034a1 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -63,10 +63,8 @@ enum { #ifdef CONFIG_CMA # define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA) -# define cma_wmark_pages(zone) zone->min_cma_pages #else # define is_migrate_cma(migratetype) false -# define cma_wmark_pages(zone) 0 #endif #define for_each_migratetype_order(order, type) \ @@ -376,13 +374,6 @@ struct zone { /* see spanned/present_pages for more description */ seqlock_t span_seqlock; #endif -#ifdef CONFIG_CMA - /* - * CMA needs to increase watermark levels during the allocation - * process to make sure that the system is not starved. - */ - unsigned long min_cma_pages; -#endif struct free_area free_area[MAX_ORDER]; #ifndef CONFIG_SPARSEMEM diff --git a/mm/compaction.c b/mm/compaction.c index e78cb96..8afa6dc 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -85,7 +85,7 @@ static unsigned long isolate_freepages_block(unsigned long blockpfn, } /* Found a free page, break it into order-0 pages */ - isolated = split_free_page(page); + isolated = split_free_page(page, !strict); if (!isolated && strict) return 0; total_isolated += isolated; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cefac39..d8540eb 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1388,7 +1388,7 @@ void split_page(struct page *page, unsigned int order) * Note: this is probably too low level an operation for use in drivers. * Please consult with lkml before using this in your driver. */ -int split_free_page(struct page *page) +int split_free_page(struct page *page, bool check_wmark) { unsigned int order; unsigned long watermark; @@ -1399,10 +1399,12 @@ int split_free_page(struct page *page) zone = page_zone(page); order = page_order(page); - /* Obey watermarks as if the page was being allocated */ - watermark = low_wmark_pages(zone) + (1 << order); - if (!zone_watermark_ok(zone, 0, watermark, 0, 0)) - return 0; + if (check_wmark) { + /* Obey watermarks as if the page was being allocated */ + watermark = low_wmark_pages(zone) + (1 << order); + if (!zone_watermark_ok(zone, 0, watermark, 0, 0)) + return 0; + } /* Remove page from free list */ list_del(&page->lru); @@ -5116,10 +5118,6 @@ static void __setup_per_zone_wmarks(void) zone->watermark[WMARK_LOW] = min_wmark_pages(zone) + (tmp >> 2); zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp >> 1); - zone->watermark[WMARK_MIN] += cma_wmark_pages(zone); - zone->watermark[WMARK_LOW] += cma_wmark_pages(zone); - zone->watermark[WMARK_HIGH] += cma_wmark_pages(zone); - setup_zone_migrate_reserve(zone); spin_unlock_irqrestore(&zone->lock, flags); } @@ -5671,54 +5669,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) return ret > 0 ? 0 : ret; } -/* - * Update zone's cma pages counter used for watermark level calculation. - */ -static inline void __update_cma_watermarks(struct zone *zone, int count) -{ - unsigned long flags; - spin_lock_irqsave(&zone->lock, flags); - zone->min_cma_pages += count; - spin_unlock_irqrestore(&zone->lock, flags); - setup_per_zone_wmarks(); -} - -/* - * Trigger memory pressure bump to reclaim some pages in order to be able to - * allocate 'count' pages in single page units. Does similar work as - *__alloc_pages_slowpath() function. - */ -static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count) -{ - enum zone_type high_zoneidx = gfp_zone(gfp_mask); - struct zonelist *zonelist = node_zonelist(0, gfp_mask); - int did_some_progress = 0; - int order = 1; - - /* - * Increase level of watermarks to force kswapd do his job - * to stabilise at new watermark level. - */ - __update_cma_watermarks(zone, count); - - /* Obey watermarks as if the page was being allocated */ - while (!zone_watermark_ok(zone, 0, low_wmark_pages(zone), 0, 0)) { - wake_all_kswapd(order, zonelist, high_zoneidx, zone_idx(zone)); - - did_some_progress = __perform_reclaim(gfp_mask, order, zonelist, - NULL); - if (!did_some_progress) { - /* Exhausted what can be done so it's blamo time */ - out_of_memory(zonelist, gfp_mask, order, NULL, false); - } - } - - /* Restore original watermark levels. */ - __update_cma_watermarks(zone, -count); - - return count; -} - /** * alloc_contig_range() -- tries to allocate given range of pages * @start: start PFN to allocate @@ -5742,7 +5692,6 @@ static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count) int alloc_contig_range(unsigned long start, unsigned long end, unsigned migratetype) { - struct zone *zone = page_zone(pfn_to_page(start)); unsigned long outer_start, outer_end; int ret = 0, order; @@ -5817,12 +5766,6 @@ int alloc_contig_range(unsigned long start, unsigned long end, goto done; } - /* - * Reclaim enough pages to make sure that contiguous allocation - * will not starve the system. - */ - __reclaim_pages(zone, GFP_HIGHUSER_MOVABLE, end-start); - /* Grab isolated pages from freelists. */ outer_end = isolate_freepages_range(outer_start, end); if (!outer_end) { -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756138Ab2HNOU4 (ORCPT ); Tue, 14 Aug 2012 10:20:56 -0400 Received: from mail-ee0-f46.google.com ([74.125.83.46]:53239 "EHLO mail-ee0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754768Ab2HNOUE (ORCPT ); Tue, 14 Aug 2012 10:20:04 -0400 From: Michal Nazarewicz To: Minchan Kim , Marek Szyprowski Cc: Mel Gorman , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Minchan Kim Subject: Re: [RFC 2/2] cma: support MIGRATE_DISCARD In-Reply-To: <1344934627-8473-3-git-send-email-minchan@kernel.org> Organization: http://mina86.com/ References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-3-git-send-email-minchan@kernel.org> User-Agent: Notmuch/0.13.2+121~gb26df7c (http://notmuchmail.org) Emacs/24.1.50.2 (x86_64-unknown-linux-gnu) X-Face: PbkBB1w#)bOqd`iCe"Ds{e+!C7`pkC9a|f)Qo^BMQvy\q5x3?vDQJeN(DS?|-^$uMti[3D*#^_Ts"pU$jBQLq~Ud6iNwAw_r_o_4]|JO?]}P_}Nc&"p#D(ZgUb4uCNPe7~a[DbPG0T~!&c.y$Ur,=N4RT>]dNpd;KFrfMCylc}gc??'U2j,!8%xdD Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAJFBMVEWbfGlUPDDHgE57V0jUupKjgIObY0PLrom9mH4dFRK4gmjPs41MxjOgAAACQElEQVQ4jW3TMWvbQBQHcBk1xE6WyALX1069oZBMlq+ouUwpEQQ6uRjttkWP4CmBgGM0BQLBdPFZYPsyFUo6uEtKDQ7oy/U96XR2Ux8ehH/89Z6enqxBcS7Lg81jmSuujrfCZcLI/TYYvbGj+jbgFpHJ/bqQAUISj8iLyu4LuFHJTosxsucO4jSDNE0Hq3hwK/ceQ5sx97b8LcUDsILfk+ovHkOIsMbBfg43VuQ5Ln9YAGCkUdKJoXR9EclFBhixy3EGVz1K6eEkhxCAkeMMnqoAhAKwhoUJkDrCqvbecaYINlFKSRS1i12VKH1XpUd4qxL876EkMcDvHj3s5RBajHHMlA5iK32e0C7VgG0RlzFPvoYHZLRmAC0BmNcBruhkE0KsMsbEc62ZwUJDxWUdMsMhVqovoT96i/DnX/ASvz/6hbCabELLk/6FF/8PNpPCGqcZTGFcBhhAaZZDbQPaAB3+KrWWy2XgbYDNIinkdWAFcCpraDE/knwe5DBqGmgzESl1p2E4MWAz0VUPgYYzmfWb9yS4vCvgsxJriNTHoIBz5YteBvg+VGISQWUqhMiByPIPpygeDBE6elD973xWwKkEiHZAHKjhuPsFnBuArrzxtakRcISv+XMIPl4aGBUJm8Emk7qBYU8IlgNEIpiJhk/No24jHwkKTFHDWfPniR4iw5vJaw2nzSjfq2zffcE/GDjRC2dn0J0XwPAbDL84TvaFCJEU4Oml9pRyEUhR3Cl2t01AoEjRbs0sYugp14/4X5n4pU4EHHnMAAAAAElFTkSuQmCC X-PGP: 50751FF4 X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4 Date: Tue, 14 Aug 2012 16:19:55 +0200 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Minchan Kim writes: > This patch introudes MIGRATE_DISCARD mode in migration. > It drop clean cache pages instead of migration so that > migration latency could be reduced. Of course, it could > evict code pages but latency of big contiguous memory > is more important than some background application's slow down > in mobile embedded enviroment. > > Signed-off-by: Minchan Kim This looks good to me. > --- > include/linux/migrate_mode.h | 11 +++++++--- > mm/migrate.c | 50 +++++++++++++++++++++++++++++++++---= ------ > mm/page_alloc.c | 2 +- > 3 files changed, 49 insertions(+), 14 deletions(-) > > diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h > index ebf3d89..04ca19c 100644 > --- a/include/linux/migrate_mode.h > +++ b/include/linux/migrate_mode.h > @@ -6,11 +6,16 @@ > * on most operations but not ->writepage as the potential stall time > * is too significant > * MIGRATE_SYNC will block when migrating pages > + * MIGRTATE_DISCARD will discard clean cache page instead of migration > + * > + * MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC shouldn't be used > + * together as OR flag. > */ > enum migrate_mode { > - MIGRATE_ASYNC, > - MIGRATE_SYNC_LIGHT, > - MIGRATE_SYNC, > + MIGRATE_ASYNC =3D 1 << 0, > + MIGRATE_SYNC_LIGHT =3D 1 << 1, > + MIGRATE_SYNC =3D 1 << 2, > + MIGRATE_DISCARD =3D 1 << 3, > }; Since CMA is the only user of MIGRATE_DISCARD it may be worth it to guard it inside an #ifdef, eg: #ifdef CONFIG_CMA MIGRATE_DISCARD =3D 1 << 3, #define is_migrate_discard(mode) (((mode) & MIGRATE_DISCARD) =3D=3D MIGRATE= _DISCARD) #endif =20=20 > #endif /* MIGRATE_MODE_H_INCLUDED */ > diff --git a/mm/migrate.c b/mm/migrate.c > index 77ed2d7..8119a59 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -685,9 +685,12 @@ static int __unmap_and_move(struct page *page, struc= t page *newpage, > int remap_swapcache =3D 1; > struct mem_cgroup *mem; > struct anon_vma *anon_vma =3D NULL; > + enum ttu_flags ttu_flags; > + bool discard_mode =3D false; > + bool file =3D false; >=20=20 > if (!trylock_page(page)) { > - if (!force || mode =3D=3D MIGRATE_ASYNC) > + if (!force || mode & MIGRATE_ASYNC) + if (!force || (mode & MIGRATE_ASYNC)) > goto out; >=20=20 > /* --=20 Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=3D./ `o ..o | Computer Science, Micha=C5=82 =E2=80=9Cmina86=E2=80=9D Nazarewicz = (o o) ooo +------------------ooO--(_)--Ooo-- --=-=-= Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" --==-=-= Content-Type: text/plain --==-=-= Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAEBAgAGBQJQKl6LAAoJECBgQBJQdR/0goYP/0/hPtydiXAf13PcPR+dWLrx UhGN2//ZkV8KnA11/q/1fLJ7mIuUoR84gGi9PPkQUOdos5iOxoVj9yMB+UWlok68 qwrsOeBwJZLcYzpeLP3UAyuElxUBWCs4A/zvRXjA9Sb8Bnk0p7W/ONqfIjeYQsyN phpxS8dEZkwLff4QOaQpHNYopTxDyBfjzbL2F0vlnXfP0NuLVEYk6BDE1d1oCXWt 7t61/34i3e5fNYGCybgxcmMWG32vzg6i1a1MTt4kXFVg+H8PtC8iM2u+/o0Ma2oW 76le3gW1P3zGFreBABKYViyu3MUEdS5yNzjspvCM7d20nTLlxKN7ZnqZF2Pmldwb VU/eCmWTNbA3aCH+2zjbuV1M08CS4TmQcQa+T2Txp6PctRM6zfoM1v/kxdTgHKIW PdtXiYN1yryjHGb6RDt34W9VGhEmiJ4wOtVb1taDHODM/VzrWe8YRIqvyqDjqTD8 PoyeoP4lqA71KZelHJh5KY7YXw0owROkesoADZcxWJ01D4jRo5m6G+ZkyfiAwVgj IE3g4T5wcG4BxVPGazIni3HqoPX0HbSdbb7SnI4MHoSK1LypBGmuKsaoAqSb3o28 8qXxNNfPO2Cj6TNxyJDAkBnUXGL5b2nQYF1FR2Ha4t79ljqsMAX+54JqkOErWMtE 1MeBYF5Gv4LtAgQONM6c =v1Nm -----END PGP SIGNATURE----- --==-=-=-- --=-=-=-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753486Ab2HOSwB (ORCPT ); Wed, 15 Aug 2012 14:52:01 -0400 Received: from mx1.redhat.com ([209.132.183.28]:29281 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752326Ab2HOSwA (ORCPT ); Wed, 15 Aug 2012 14:52:00 -0400 Message-ID: <502BEF53.80200@redhat.com> Date: Wed, 15 Aug 2012 14:49:55 -0400 From: Rik van Riel User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120605 Thunderbird/13.0 MIME-Version: 1.0 To: Minchan Kim CC: Marek Szyprowski , Mel Gorman , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [RFC 1/2] cma: remove __reclaim_pages References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-2-git-send-email-minchan@kernel.org> In-Reply-To: <1344934627-8473-2-git-send-email-minchan@kernel.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 08/14/2012 04:57 AM, Minchan Kim wrote: > Now cma reclaims too many pages by __reclaim_pages which says > following as > > * Reclaim enough pages to make sure that contiguous allocation > * will not starve the system. > > Starve? What does it starve the system? The function which allocate > free page for migration target would wake up kswapd and do direct reclaim > if needed during migration so system doesn't starve. > > Let remove __reclaim_pages and related function and fields. Fair enough. > Signed-off-by: Minchan Kim Acked-by: Rik van Riel From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752912Ab2HOTAN (ORCPT ); Wed, 15 Aug 2012 15:00:13 -0400 Received: from mx1.redhat.com ([209.132.183.28]:32754 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751867Ab2HOTAL (ORCPT ); Wed, 15 Aug 2012 15:00:11 -0400 Message-ID: <502BF139.3040403@redhat.com> Date: Wed, 15 Aug 2012 14:58:01 -0400 From: Rik van Riel User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120605 Thunderbird/13.0 MIME-Version: 1.0 To: Minchan Kim CC: Marek Szyprowski , Mel Gorman , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [RFC 2/2] cma: support MIGRATE_DISCARD References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-3-git-send-email-minchan@kernel.org> In-Reply-To: <1344934627-8473-3-git-send-email-minchan@kernel.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 08/14/2012 04:57 AM, Minchan Kim wrote: > This patch introudes MIGRATE_DISCARD mode in migration. > It drop clean cache pages instead of migration so that > migration latency could be reduced. Of course, it could > evict code pages but latency of big contiguous memory > is more important than some background application's slow down > in mobile embedded enviroment. Would it be an idea to only drop clean UNMAPPED page cache pages? > Signed-off-by: Minchan Kim > @@ -799,12 +802,39 @@ static int __unmap_and_move(struct page *page, struct page *newpage, > goto skip_unmap; > } > > + file = page_is_file_cache(page); > + ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS; > + > + if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page)) > + ttu_flags |= TTU_MIGRATION; > + else > + discard_mode = true; > + > /* Establish migration ptes or remove ptes */ > - try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS); > + try_to_unmap(page, ttu_flags); This bit looks wrong, because you end up ignoring mlock and then discarding the page. Only dropping clean page cache pages that are not mapped would avoid that problem, without introducing much complexity in the code. That would turn the test above into: if (!page_mapped(page)) discard_mode = true; > skip_unmap: > - if (!page_mapped(page)) > - rc = move_to_new_page(newpage, page, remap_swapcache, mode); > + if (!page_mapped(page)) { > + if (!discard_mode) > + rc = move_to_new_page(newpage, page, remap_swapcache, mode); > + else { > + struct address_space *mapping; > + mapping = page_mapping(page); > + > + if (page_has_private(page)) { > + if (!try_to_release_page(page, GFP_KERNEL)) { > + rc = -EAGAIN; > + goto uncharge; > + } > + } > + > + if (remove_mapping(mapping, page)) > + rc = 0; > + else > + rc = -EAGAIN; > + goto uncharge; > + } > + } This big piece of code could probably be split out into its own function. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754196Ab2HOXSW (ORCPT ); Wed, 15 Aug 2012 19:18:22 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:52198 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753408Ab2HOXSR (ORCPT ); Wed, 15 Aug 2012 19:18:17 -0400 X-AuditID: 9c930197-b7bb2ae0000011d9-8d-502c2e376878 Date: Thu, 16 Aug 2012 08:20:23 +0900 From: Minchan Kim To: Michal Nazarewicz Cc: Marek Szyprowski , Mel Gorman , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [RFC 2/2] cma: support MIGRATE_DISCARD Message-ID: <20120815232023.GA15225@bbox> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-3-git-send-email-minchan@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Michal, On Tue, Aug 14, 2012 at 04:19:55PM +0200, Michal Nazarewicz wrote: > Minchan Kim writes: > > This patch introudes MIGRATE_DISCARD mode in migration. > > It drop clean cache pages instead of migration so that > > migration latency could be reduced. Of course, it could > > evict code pages but latency of big contiguous memory > > is more important than some background application's slow down > > in mobile embedded enviroment. > > > > Signed-off-by: Minchan Kim > > This looks good to me. > > > --- > > include/linux/migrate_mode.h | 11 +++++++--- > > mm/migrate.c | 50 +++++++++++++++++++++++++++++++++--------- > > mm/page_alloc.c | 2 +- > > 3 files changed, 49 insertions(+), 14 deletions(-) > > > > diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h > > index ebf3d89..04ca19c 100644 > > --- a/include/linux/migrate_mode.h > > +++ b/include/linux/migrate_mode.h > > @@ -6,11 +6,16 @@ > > * on most operations but not ->writepage as the potential stall time > > * is too significant > > * MIGRATE_SYNC will block when migrating pages > > + * MIGRTATE_DISCARD will discard clean cache page instead of migration > > + * > > + * MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC shouldn't be used > > + * together as OR flag. > > */ > > enum migrate_mode { > > - MIGRATE_ASYNC, > > - MIGRATE_SYNC_LIGHT, > > - MIGRATE_SYNC, > > + MIGRATE_ASYNC = 1 << 0, > > + MIGRATE_SYNC_LIGHT = 1 << 1, > > + MIGRATE_SYNC = 1 << 2, > > + MIGRATE_DISCARD = 1 << 3, > > }; > > Since CMA is the only user of MIGRATE_DISCARD it may be worth it to > guard it inside an #ifdef, eg: > > #ifdef CONFIG_CMA > MIGRATE_DISCARD = 1 << 3, > #define is_migrate_discard(mode) (((mode) & MIGRATE_DISCARD) == MIGRATE_DISCARD) The mode bit can be used with other bits like MIGRATE_SYNC|MIGRATE_DISCARD. So it is correct that (mode & MIGRATE_DISCARD). Anyway, I don't want to fold it into only CMA because I think we can have a pontential users in mm. For example, memory-hotplug case. No enough free memory in the system but lots of page cache page as a migration source, then we can remove page cache page instead of migration and it might be better than failing memory-hotremove. In summary, I want to open it for potential usecases in future if anyone doesn't oppose strongly. > #endif > > > > #endif /* MIGRATE_MODE_H_INCLUDED */ > > diff --git a/mm/migrate.c b/mm/migrate.c > > index 77ed2d7..8119a59 100644 > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -685,9 +685,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage, > > int remap_swapcache = 1; > > struct mem_cgroup *mem; > > struct anon_vma *anon_vma = NULL; > > + enum ttu_flags ttu_flags; > > + bool discard_mode = false; > > + bool file = false; > > > > if (!trylock_page(page)) { > > - if (!force || mode == MIGRATE_ASYNC) > > + if (!force || mode & MIGRATE_ASYNC) It's not wrong technically but for readability, NP. > > + if (!force || (mode & MIGRATE_ASYNC)) > > > goto out; > > > > /* > > > -- > Best regards, _ _ > .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o > ..o | Computer Science, Michał “mina86” Nazarewicz (o o) > ooo +------------------ooO--(_)--Ooo-- -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754573Ab2HOXbS (ORCPT ); Wed, 15 Aug 2012 19:31:18 -0400 Received: from LGEMRELSE1Q.lge.com ([156.147.1.111]:42123 "EHLO LGEMRELSE1Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751918Ab2HOXbR (ORCPT ); Wed, 15 Aug 2012 19:31:17 -0400 X-AuditID: 9c93016f-b7c98ae0000013b4-bf-502c3143735e Date: Thu, 16 Aug 2012 08:33:23 +0900 From: Minchan Kim To: Rik van Riel Cc: Marek Szyprowski , Mel Gorman , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [RFC 2/2] cma: support MIGRATE_DISCARD Message-ID: <20120815233323.GB15225@bbox> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-3-git-send-email-minchan@kernel.org> <502BF139.3040403@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <502BF139.3040403@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Rik, On Wed, Aug 15, 2012 at 02:58:01PM -0400, Rik van Riel wrote: > On 08/14/2012 04:57 AM, Minchan Kim wrote: > >This patch introudes MIGRATE_DISCARD mode in migration. > >It drop clean cache pages instead of migration so that > >migration latency could be reduced. Of course, it could > >evict code pages but latency of big contiguous memory > >is more important than some background application's slow down > >in mobile embedded enviroment. > > Would it be an idea to only drop clean UNMAPPED > page cache pages? Firstly I thougt about that but I chose more agressive thing. Namely, even drop mapped page cache. Because it can reduce latency more(ex, memcpy + remapping cost during migration) and it could not trivial if migration range is big. > > >Signed-off-by: Minchan Kim > > >@@ -799,12 +802,39 @@ static int __unmap_and_move(struct page *page, struct page *newpage, > > goto skip_unmap; > > } > > > >+ file = page_is_file_cache(page); > >+ ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS; > >+ > >+ if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page)) > >+ ttu_flags |= TTU_MIGRATION; > >+ else > >+ discard_mode = true; > >+ > > /* Establish migration ptes or remove ptes */ > >- try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS); > >+ try_to_unmap(page, ttu_flags); > > This bit looks wrong, because you end up ignoring > mlock and then discarding the page. Argh, Thanks! I will fix it in next spin. > > Only dropping clean page cache pages that are not > mapped would avoid that problem, without introducing > much complexity in the code. Hmm, I don't think it makes code much complex. How about this? diff --git a/mm/rmap.c b/mm/rmap.c index 0f3b7cd..0909d79 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1223,7 +1223,8 @@ out: * repeatedly from try_to_unmap_ksm, try_to_unmap_anon or try_to_unmap_file. */ int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, - unsigned long address, enum ttu_flags flags) + unsigned long address, enum ttu_flags flags, + unsigned long *vm_flags) { struct mm_struct *mm = vma->vm_mm; pte_t *pte; @@ -1235,6 +1236,7 @@ int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, if (!pte) goto out; + vm_flags |= vma->vm_flags; /* * If the page is mlock()d, we cannot swap it out. * If it's recently referenced (perhaps page_referenced @@ -1652,7 +1654,7 @@ out: * SWAP_FAIL - the page is unswappable * SWAP_MLOCK - page is mlocked. */ -int try_to_unmap(struct page *page, enum ttu_flags flags) +int try_to_unmap(struct page *page, enum ttu_flags flags, unsigned long *vm_flags) { int ret; + file = page_is_file_cache(page); + ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS; + + if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page) || + vm_flags & VM_LOCKED) + ttu_flags |= TTU_MIGRATION; + else + discard_mode = true; + > > That would turn the test above into: > > if (!page_mapped(page)) > discard_mode = true; > > > skip_unmap: > >- if (!page_mapped(page)) > >- rc = move_to_new_page(newpage, page, remap_swapcache, mode); > >+ if (!page_mapped(page)) { > >+ if (!discard_mode) > >+ rc = move_to_new_page(newpage, page, remap_swapcache, mode); > >+ else { > >+ struct address_space *mapping; > >+ mapping = page_mapping(page); > >+ > >+ if (page_has_private(page)) { > >+ if (!try_to_release_page(page, GFP_KERNEL)) { > >+ rc = -EAGAIN; > >+ goto uncharge; > >+ } > >+ } > >+ > >+ if (remove_mapping(mapping, page)) > >+ rc = 0; > >+ else > >+ rc = -EAGAIN; > >+ goto uncharge; > >+ } > >+ } > > This big piece of code could probably be split out > into its own function. > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752867Ab2HPANO (ORCPT ); Wed, 15 Aug 2012 20:13:14 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:62169 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751811Ab2HPANN (ORCPT ); Wed, 15 Aug 2012 20:13:13 -0400 X-AuditID: 9c930197-b7bb2ae0000011d9-79-502c3b14f666 Date: Thu, 16 Aug 2012 09:15:17 +0900 From: Minchan Kim To: Rik van Riel Cc: Marek Szyprowski , Mel Gorman , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [RFC 2/2] cma: support MIGRATE_DISCARD Message-ID: <20120816001517.GC15225@bbox> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-3-git-send-email-minchan@kernel.org> <502BF139.3040403@redhat.com> <20120815233323.GB15225@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120815233323.GB15225@bbox> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 16, 2012 at 08:33:23AM +0900, Minchan Kim wrote: > Hi Rik, > > On Wed, Aug 15, 2012 at 02:58:01PM -0400, Rik van Riel wrote: > > On 08/14/2012 04:57 AM, Minchan Kim wrote: > > >This patch introudes MIGRATE_DISCARD mode in migration. > > >It drop clean cache pages instead of migration so that > > >migration latency could be reduced. Of course, it could > > >evict code pages but latency of big contiguous memory > > >is more important than some background application's slow down > > >in mobile embedded enviroment. > > > > Would it be an idea to only drop clean UNMAPPED > > page cache pages? > > Firstly I thougt about that but I chose more agressive thing. > Namely, even drop mapped page cache. > Because it can reduce latency more(ex, memcpy + remapping cost > during migration) and it could not trivial if migration range is big. > > > > > >Signed-off-by: Minchan Kim > > > > >@@ -799,12 +802,39 @@ static int __unmap_and_move(struct page *page, struct page *newpage, > > > goto skip_unmap; > > > } > > > > > >+ file = page_is_file_cache(page); > > >+ ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS; > > >+ > > >+ if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page)) > > >+ ttu_flags |= TTU_MIGRATION; > > >+ else > > >+ discard_mode = true; > > >+ > > > /* Establish migration ptes or remove ptes */ > > >- try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS); > > >+ try_to_unmap(page, ttu_flags); > > > > This bit looks wrong, because you end up ignoring > > mlock and then discarding the page. > > Argh, Thanks! > I will fix it in next spin. > > > > > Only dropping clean page cache pages that are not > > mapped would avoid that problem, without introducing > > much complexity in the code. > > Hmm, I don't think it makes code much complex. > How about this? > > diff --git a/mm/rmap.c b/mm/rmap.c > index 0f3b7cd..0909d79 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1223,7 +1223,8 @@ out: > * repeatedly from try_to_unmap_ksm, try_to_unmap_anon or try_to_unmap_file. > */ > int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, > - unsigned long address, enum ttu_flags flags) > + unsigned long address, enum ttu_flags flags, > + unsigned long *vm_flags) > { > struct mm_struct *mm = vma->vm_mm; > pte_t *pte; > @@ -1235,6 +1236,7 @@ int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, > if (!pte) > goto out; > > + vm_flags |= vma->vm_flags; > /* > * If the page is mlock()d, we cannot swap it out. > * If it's recently referenced (perhaps page_referenced > @@ -1652,7 +1654,7 @@ out: > * SWAP_FAIL - the page is unswappable > * SWAP_MLOCK - page is mlocked. > */ > -int try_to_unmap(struct page *page, enum ttu_flags flags) > +int try_to_unmap(struct page *page, enum ttu_flags flags, unsigned long *vm_flags) > { > int ret; > > > > + file = page_is_file_cache(page); > + ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS; > + > + if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page) || > + vm_flags & VM_LOCKED) We do try_to_unmap after this piece so we can't get the information in advance. :( I don't have better idea which doesn't have a drawback so I will accept your idea. Thanks, Rik. > + ttu_flags |= TTU_MIGRATION; > + else > + discard_mode = true; > + > > -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932611Ab2HPN6Z (ORCPT ); Thu, 16 Aug 2012 09:58:25 -0400 Received: from cantor2.suse.de ([195.135.220.15]:53300 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755461Ab2HPN6Y (ORCPT ); Thu, 16 Aug 2012 09:58:24 -0400 Date: Thu, 16 Aug 2012 14:58:18 +0100 From: Mel Gorman To: Minchan Kim Cc: Marek Szyprowski , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [RFC 1/2] cma: remove __reclaim_pages Message-ID: <20120816135817.GS4177@suse.de> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-2-git-send-email-minchan@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <1344934627-8473-2-git-send-email-minchan@kernel.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 14, 2012 at 05:57:06PM +0900, Minchan Kim wrote: > Now cma reclaims too many pages by __reclaim_pages which says > following as > > * Reclaim enough pages to make sure that contiguous allocation > * will not starve the system. > > Starve? What does it starve the system? The function which allocate > free page for migration target would wake up kswapd and do direct reclaim > if needed during migration so system doesn't starve. > I thought this patch was overkill at the time it was introduced but didn't have a concrete reason to reject it when I commented on it https://lkml.org/lkml/2012/1/30/136 . Marek did want this and followed up with "contiguous allocations should have higher priority than others" which I took to mean that he was also ok with excessive reclaim. > Let remove __reclaim_pages and related function and fields. > That should be one patch and I don't object to it being removed as such but it's Marek's call. > I modified split_free_page slightly because I removed __reclaim_pages, > isolate_freepages_range can fail by split_free_page's watermark check. > It's very critical in CMA because it ends up failing alloc_contig_range. > This is a big change and should have been in a patch on its own. split_free_page checks watermarks because if the watermarks are not obeyed a zone can become fully allocated. This can cause a system to livelock under certain circumstances if a page cannot be allocated and a free page is required before other pages can be freed. > I think we don't need the check in case of CMA because CMA allocates > free pages by alloc_pages, not isolate_freepages_block in migrate_pages > so watermark is already checked in alloc_pages. It uses alloc_pages when migrating pages out of the CMA area but note that it uses isolate_freepages_block when allocating the CMA buffer when alloc_contig_range calls isolate_freepages_range isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn) { for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) { isolated = isolate_freepages_block(pfn, block_end_pfn, &freelist, true); } map_pages(&freelist); } so the actual CMA allocation itself is not using alloc_pages. By removing the watermark check you allow the CMA to breach watermarks and puts the system at risk of livelock. I'm not keen on the split_free_page() change at all. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756768Ab2HQBFy (ORCPT ); Thu, 16 Aug 2012 21:05:54 -0400 Received: from LGEMRELSE6Q.lge.com ([156.147.1.121]:47068 "EHLO LGEMRELSE6Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753720Ab2HQBFt (ORCPT ); Thu, 16 Aug 2012 21:05:49 -0400 X-AuditID: 9c930179-b7cc4ae00000134d-f3-502d98eba439 Date: Fri, 17 Aug 2012 10:05:47 +0900 From: Minchan Kim To: Mel Gorman Cc: Marek Szyprowski , Rik van Riel , Kamezawa Hiroyuki , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [RFC 1/2] cma: remove __reclaim_pages Message-ID: <20120817010547.GA3061@bbox> References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-2-git-send-email-minchan@kernel.org> <20120816135817.GS4177@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120816135817.GS4177@suse.de> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Mel, On Thu, Aug 16, 2012 at 02:58:18PM +0100, Mel Gorman wrote: > On Tue, Aug 14, 2012 at 05:57:06PM +0900, Minchan Kim wrote: > > Now cma reclaims too many pages by __reclaim_pages which says > > following as > > > > * Reclaim enough pages to make sure that contiguous allocation > > * will not starve the system. > > > > Starve? What does it starve the system? The function which allocate > > free page for migration target would wake up kswapd and do direct reclaim > > if needed during migration so system doesn't starve. > > > > I thought this patch was overkill at the time it was introduced but > didn't have a concrete reason to reject it when I commented on it > https://lkml.org/lkml/2012/1/30/136 . Marek did want this and followed > up with "contiguous allocations should have higher priority than others" > which I took to mean that he was also ok with excessive reclaim. I think OOM kill to background applications is more appropriate than big latency of foreground(ex, Camera app) application in your mobile phone. In other words, excessive reclaim is *really* bad which elapsed 8sec in my test as worst case. :( > > > Let remove __reclaim_pages and related function and fields. > > > > That should be one patch and I don't object to it being removed as such > but it's Marek's call. Marek. Any thought? > > > I modified split_free_page slightly because I removed __reclaim_pages, > > isolate_freepages_range can fail by split_free_page's watermark check. > > It's very critical in CMA because it ends up failing alloc_contig_range. > > > > This is a big change and should have been in a patch on its > own. split_free_page checks watermarks because if the watermarks are > not obeyed a zone can become fully allocated. This can cause a system to > livelock under certain circumstances if a page cannot be allocated and a > free page is required before other pages can be freed. > > > I think we don't need the check in case of CMA because CMA allocates > > free pages by alloc_pages, not isolate_freepages_block in migrate_pages > > so watermark is already checked in alloc_pages. > > It uses alloc_pages when migrating pages out of the CMA area but note > that it uses isolate_freepages_block when allocating the CMA buffer when > alloc_contig_range calls isolate_freepages_range > > isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn) > { > for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) { > isolated = isolate_freepages_block(pfn, block_end_pfn, > &freelist, true); > } > map_pages(&freelist); > } > > so the actual CMA allocation itself is not using alloc_pages. By removing > the watermark check you allow the CMA to breach watermarks and puts the > system at risk of livelock. Fair enough. I will look into that. Thanks, Mel. -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758156Ab2HQOsy (ORCPT ); Fri, 17 Aug 2012 10:48:54 -0400 Received: from mailout3.samsung.com ([203.254.224.33]:56946 "EHLO mailout3.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758127Ab2HQOso (ORCPT ); Fri, 17 Aug 2012 10:48:44 -0400 X-AuditID: cbfee61b-b7faf6d00000476a-f2-502e59ca5767 From: Marek Szyprowski To: "'Minchan Kim'" , "'Mel Gorman'" Cc: "'Rik van Riel'" , "'Kamezawa Hiroyuki'" , "'Andrew Morton'" , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <1344934627-8473-1-git-send-email-minchan@kernel.org> <1344934627-8473-2-git-send-email-minchan@kernel.org> <20120816135817.GS4177@suse.de> <20120817010547.GA3061@bbox> In-reply-to: <20120817010547.GA3061@bbox> Subject: RE: [RFC 1/2] cma: remove __reclaim_pages Date: Fri, 17 Aug 2012 16:48:30 +0200 Organization: SPRC Message-id: <024801cd7c87$6125f880$2371e980$%szyprowski@samsung.com> MIME-version: 1.0 Content-type: text/plain; charset=us-ascii Content-transfer-encoding: 7bit X-Mailer: Microsoft Office Outlook 12.0 Thread-index: Ac18FHD2r2d3/WCcS12aVYKLNysqywAYoToQ Content-language: pl X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFrrBLMWRmVeSWpSXmKPExsVy+t9jAd1TkXoBBgvWq1lc3jWHzYHR4/Mm uQDGKC6blNSczLLUIn27BK6Mq0+OsRbcla7Y07KFsYFxvWgXIyeHhICJxPoZm5ggbDGJC/fW s3UxcnEICSxilJi3sQ/KmcUk8eDaHLAqNgFDia63XUAJDg4RATeJ8y+sQcLMAvsZJbav0IWo 38EosfBxC1g9p4CmxJGnZ1lAbGEBI4m3O3axgtgsAqoSZz9PAJvDLyAkMXGWAkiYV8BFYvfX U0wQtqDEj8n3WCDma0ms33mcCcKWl9i85i0zSKuEgLrEo7+6IGERkOlHH7NClIhI3G14zjqB UXgWkkmzkEyahWTSLCQtCxhZVjGKphYkFxQnpeca6RUn5haX5qXrJefnbmIEh/cz6R2Mqxos DjEKcDAq8fAeKNMNEGJNLCuuzD3EKMHBrCTCu+IjUIg3JbGyKrUoP76oNCe1+BCjNAeLkjiv 13+glEB6YklqdmpqQWoRTJaJg1OqgTHg/jt7HpEd+uKeBfOM2m/Iz7KUjGQ1+xK35M271PaA df/qXwbPfOgbOP/476O7i+r7r5/nWmZ5me1qTFzG+e70G5rul/bUbP9jM1Xo/KbIbZ5bL7NM YF+579qK3bk7T3BddfLZKTbzXurRkIiNLToceW17gv7ZaE76aqD74sY/b0mF02FMRxOUWIoz Eg21mIuKEwEiMXRqawIAAA== X-TM-AS-MML: No Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Minchan, On Friday, August 17, 2012 3:06 AM Minchan Kim wrote: > On Thu, Aug 16, 2012 at 02:58:18PM +0100, Mel Gorman wrote: > > On Tue, Aug 14, 2012 at 05:57:06PM +0900, Minchan Kim wrote: > > > Now cma reclaims too many pages by __reclaim_pages which says > > > following as > > > > > > * Reclaim enough pages to make sure that contiguous allocation > > > * will not starve the system. > > > > > > Starve? What does it starve the system? The function which allocate > > > free page for migration target would wake up kswapd and do direct reclaim > > > if needed during migration so system doesn't starve. > > > > > > > I thought this patch was overkill at the time it was introduced but > > didn't have a concrete reason to reject it when I commented on it > > https://lkml.org/lkml/2012/1/30/136 . Marek did want this and followed > > up with "contiguous allocations should have higher priority than others" > > which I took to mean that he was also ok with excessive reclaim. > > I think OOM kill to background applications is more appropriate than > big latency of foreground(ex, Camera app) application in your mobile phone. > In other words, excessive reclaim is *really* bad which elapsed 8sec > in my test as worst case. :( > > > > > > Let remove __reclaim_pages and related function and fields. > > > > > > > That should be one patch and I don't object to it being removed as such > > but it's Marek's call. > > Marek. Any thought? Well, I've introduced this function as a result of solving Mel's request: http://www.spinics.net/lists/linux-mm/msg28485.html Before that I solved it almost the same as in your current patch. I'm aware of the fact that __reclaim_pages() approach might be a little overkill, but I didn't find anything better yet. > > > I modified split_free_page slightly because I removed __reclaim_pages, > > > isolate_freepages_range can fail by split_free_page's watermark check. > > > It's very critical in CMA because it ends up failing alloc_contig_range. > > > > > > > This is a big change and should have been in a patch on its > > own. split_free_page checks watermarks because if the watermarks are > > not obeyed a zone can become fully allocated. This can cause a system to > > livelock under certain circumstances if a page cannot be allocated and a > > free page is required before other pages can be freed. > > > > > I think we don't need the check in case of CMA because CMA allocates > > > free pages by alloc_pages, not isolate_freepages_block in migrate_pages > > > so watermark is already checked in alloc_pages. > > > > It uses alloc_pages when migrating pages out of the CMA area but note > > that it uses isolate_freepages_block when allocating the CMA buffer when > > alloc_contig_range calls isolate_freepages_range > > > > isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn) > > { > > for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) { > > isolated = isolate_freepages_block(pfn, block_end_pfn, > > &freelist, true); > > } > > map_pages(&freelist); > > } > > > > so the actual CMA allocation itself is not using alloc_pages. By removing > > the watermark check you allow the CMA to breach watermarks and puts the > > system at risk of livelock. > > Fair enough. I will look into that. We will also looks into this issue. Best regards -- Marek Szyprowski Samsung Poland R&D Center