From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f41.google.com (mail-pa0-f41.google.com [209.85.220.41])
	by kanga.kvack.org (Postfix) with ESMTP id 74AD36B00A7
	for ; Wed, 7 May 2014 20:30:40 -0400 (EDT)
Received: by mail-pa0-f41.google.com with SMTP id lj1so1869500pab.28
	for ; Wed, 07 May 2014 17:30:40 -0700 (PDT)
Received: from lgeamrelo02.lge.com (lgeamrelo02.lge.com. [156.147.1.126])
	by mx.google.com with ESMTP id ko6si2681144pbc.313.2014.05.07.17.30.38
	for ; Wed, 07 May 2014 17:30:39 -0700 (PDT)
From: Joonsoo Kim
Subject: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory
Date: Thu, 8 May 2014 09:32:21 +0900
Message-Id: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton
Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim, Laura Abbott,
	Minchan Kim, Heesub Shin, Marek Szyprowski, Michal Nazarewicz,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org

Hello,

This series tries to improve CMA.

CMA was introduced to provide physically contiguous pages at runtime
without a dedicated reserved memory area. However, the current
implementation behaves much like a plain reserved-memory approach,
because allocation from the CMA reserved region only occurs as a fallback
of MIGRATE_MOVABLE allocation; that is, we allocate from it only when
there is no free movable page. In that situation kswapd is easily woken,
since unmovable and reclaimable allocations consider
(free pages - free CMA pages) as the free memory of the system, and that
value may be lower than the high watermark. Once kswapd starts to reclaim
memory, the fallback allocation rarely occurs.

In my experiment, I found that on a system with 1024 MB of memory, 512 MB
of which is reserved for CMA, kswapd is mostly invoked around the 512 MB
free memory boundary.
Once invoked, kswapd keeps reclaiming until (free pages - free CMA pages)
is higher than the high watermark, so the free memory reported in meminfo
consistently hovers around the 512 MB boundary.

To fix this problem, we should allocate the pages on CMA reserved memory
more aggressively and intelligently. Patch 2 implements the solution.
Patch 1 is a simple optimization which removes a useless retry, and patch 3
removes a now-useless alloc flag, so these two are less important.
See patch 2 for a more detailed description.

This patchset is based on v3.15-rc4.

Thanks.

Joonsoo Kim (3):
  CMA: remove redundant retrying code in __alloc_contig_migrate_range
  CMA: aggressively allocate the pages on cma reserved memory when not used
  CMA: always treat free cma pages as non-free on watermark checking

 include/linux/mmzone.h |    6 +++
 mm/compaction.c        |    4 --
 mm/internal.h          |    3 +-
 mm/page_alloc.c        |  117 +++++++++++++++++++++++++++++++++++++++---------
 4 files changed, 102 insertions(+), 28 deletions(-)

-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f42.google.com (mail-pa0-f42.google.com [209.85.220.42])
	by kanga.kvack.org (Postfix) with ESMTP id 1110D6B00A9
	for ; Wed, 7 May 2014 20:30:40 -0400 (EDT)
Received: by mail-pa0-f42.google.com with SMTP id rd3so1883562pab.1
	for ; Wed, 07 May 2014 17:30:40 -0700 (PDT)
Received: from lgeamrelo02.lge.com (lgeamrelo02.lge.com.
	[156.147.1.126])
	by mx.google.com with ESMTP id vw5si14601728pab.210.2014.05.07.17.30.38
	for ; Wed, 07 May 2014 17:30:40 -0700 (PDT)
From: Joonsoo Kim
Subject: [RFC PATCH 1/3] CMA: remove redundant retrying code in __alloc_contig_migrate_range
Date: Thu, 8 May 2014 09:32:22 +0900
Message-Id: <1399509144-8898-2-git-send-email-iamjoonsoo.kim@lge.com>
In-Reply-To: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com>
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton
Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim, Laura Abbott,
	Minchan Kim, Heesub Shin, Marek Szyprowski, Michal Nazarewicz,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org

migrate_pages() already has retry logic: it retries up to 10 times.
So if we keep the retry code in __alloc_contig_migrate_range(), which
itself retries 5 times, we may try to migrate an unmigratable page up to
50 times. There is just one small difference in the -ENOMEM case:
migrate_pages() doesn't retry in this case, whereas the current
__alloc_contig_migrate_range() does. But I think this isn't a problem,
because in this case we would likely fail again for the same reason.

Signed-off-by: Joonsoo Kim

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..674ade7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6185,7 +6185,6 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 	/* This function is based on compact_zone() from compaction.c. */
 	unsigned long nr_reclaimed;
 	unsigned long pfn = start;
-	unsigned int tries = 0;
 	int ret = 0;
 
 	migrate_prep();
@@ -6204,10 +6203,6 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 				ret = -EINTR;
 				break;
 			}
-			tries = 0;
-		} else if (++tries == 5) {
-			ret = ret < 0 ? ret : -EBUSY;
-			break;
 		}
 
 		nr_reclaimed = reclaim_clean_pages_from_list(cc->zone,
@@ -6216,6 +6211,10 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 
 		ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
 				    0, MIGRATE_SYNC, MR_CMA);
+		if (ret) {
+			ret = ret < 0 ? ret : -EBUSY;
+			break;
+		}
 	}
 	if (ret < 0) {
 		putback_movable_pages(&cc->migratepages);
-- 
1.7.9.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pd0-f173.google.com (mail-pd0-f173.google.com [209.85.192.173])
	by kanga.kvack.org (Postfix) with ESMTP id 1DBE46B00A9
	for ; Wed, 7 May 2014 20:30:42 -0400 (EDT)
Received: by mail-pd0-f173.google.com with SMTP id y10so1711952pdj.32
	for ; Wed, 07 May 2014 17:30:41 -0700 (PDT)
Received: from lgeamrelo02.lge.com (lgeamrelo02.lge.com. [156.147.1.126])
	by mx.google.com with ESMTP id hb8si2682638pbc.24.2014.05.07.17.30.40
	for ; Wed, 07 May 2014 17:30:41 -0700 (PDT)
From: Joonsoo Kim
Subject: [RFC PATCH 3/3] CMA: always treat free cma pages as non-free on watermark checking
Date: Thu, 8 May 2014 09:32:24 +0900
Message-Id: <1399509144-8898-4-git-send-email-iamjoonsoo.kim@lge.com>
In-Reply-To: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com>
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton
Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim, Laura Abbott,
	Minchan Kim, Heesub Shin, Marek Szyprowski, Michal Nazarewicz,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org

Commit d95ea5d1 ('cma: fix watermark checking') introduced the ALLOC_CMA
alloc flag and treats free CMA pages as free pages if this flag is
passed to watermark checking.
The intention of that patch was that movable page allocations could be
served from the CMA reserved region without waking kswapd. Now that the
previous patch changes the allocator so that movable allocations use the
pages on the CMA reserved region aggressively, this watermark hack isn't
needed anymore. Therefore remove it.

Signed-off-by: Joonsoo Kim

diff --git a/mm/compaction.c b/mm/compaction.c
index 627dc2e..36e2fcd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1117,10 +1117,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 
 	count_compact_event(COMPACTSTALL);
 
-#ifdef CONFIG_CMA
-	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 	/* Compact each zone in the list */
 	for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
 								nodemask) {
diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..a121762 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -384,7 +384,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HARDER		0x10 /* try to alloc harder */
 #define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
-#define ALLOC_CMA		0x80 /* allow allocations from CMA areas */
-#define ALLOC_FAIR		0x100 /* fair zone allocation */
+#define ALLOC_FAIR		0x80 /* fair zone allocation */
 
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6f2b27b..6af2fa1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1757,20 +1757,22 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark,
 	long min = mark;
 	long lowmem_reserve = z->lowmem_reserve[classzone_idx];
 	int o;
-	long free_cma = 0;
 
 	free_pages -= (1 << order) - 1;
 	if (alloc_flags & ALLOC_HIGH)
 		min -= min / 2;
 	if (alloc_flags & ALLOC_HARDER)
 		min -= min / 4;
-#ifdef CONFIG_CMA
-	/* If allocation can't use CMA areas don't use free CMA pages */
-	if (!(alloc_flags & ALLOC_CMA))
-		free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif
+	/*
+	 * We don't want to regard the pages on CMA region as free
+	 * on watermark checking, since they cannot be used for
+	 * unmovable/reclaimable allocation and they can suddenly
+	 * vanish through CMA allocation
+	 */
+	if (IS_ENABLED(CONFIG_CMA) && z->has_cma)
+		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
 
-	if (free_pages - free_cma <= min + lowmem_reserve)
+	if (free_pages <= min + lowmem_reserve)
 		return false;
 	for (o = 0; o < order; o++) {
 		/* At the next order, this order's pages become unavailable */
@@ -2538,10 +2540,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 				unlikely(test_thread_flag(TIF_MEMDIE))))
 			alloc_flags |= ALLOC_NO_WATERMARKS;
 	}
-#ifdef CONFIG_CMA
-	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 	return alloc_flags;
 }
 
@@ -2811,10 +2809,6 @@ retry_cpuset:
 	if (!preferred_zone)
 		goto out;
 
-#ifdef CONFIG_CMA
-	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 retry:
 	/* First allocation attempt */
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
-- 
1.7.9.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f46.google.com (mail-pa0-f46.google.com [209.85.220.46])
	by kanga.kvack.org (Postfix) with ESMTP id BAEDA6B00AA
	for ; Wed, 7 May 2014 20:30:47 -0400 (EDT)
Received: by mail-pa0-f46.google.com with SMTP id kx10so1898537pab.33
	for ; Wed, 07 May 2014 17:30:47 -0700 (PDT)
Received: from lgeamrelo02.lge.com (lgeamrelo02.lge.com.
	[156.147.1.126])
	by mx.google.com with ESMTP id ug9si14603785pab.212.2014.05.07.17.30.44
	for ; Wed, 07 May 2014 17:30:46 -0700 (PDT)
From: Joonsoo Kim
Subject: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
Date: Thu, 8 May 2014 09:32:23 +0900
Message-Id: <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com>
In-Reply-To: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com>
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton
Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim, Laura Abbott,
	Minchan Kim, Heesub Shin, Marek Szyprowski, Michal Nazarewicz,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org

CMA was introduced to provide physically contiguous pages at runtime.
For this purpose, it reserves memory at boot time. Although it reserves
memory, this reserved memory can be used to satisfy movable memory
allocation requests. This use case is beneficial to systems that need the
CMA reserved memory only infrequently, and it is one of the main purposes
of introducing CMA.

But there is a problem in the current implementation: it works just like
a plain reserved-memory approach. The pages on CMA reserved memory are
hardly ever used for movable memory allocation. This is caused by the
combination of allocation and reclaim policy.

The pages on CMA reserved memory are allocated only if there is no
movable memory, that is, as a fallback allocation. So by the time this
fallback allocation starts, the system is under heavy memory pressure.
Even under memory pressure, movable allocations easily succeed, since
there are many free pages on CMA reserved memory. But this is not the
case for unmovable and reclaimable allocations, because they can't use
the pages on CMA reserved memory. On watermark checking, these
allocations regard the system's free memory as
(free pages - free cma pages), that is, free unmovable pages + free
reclaimable pages + free movable pages.
Because we have already exhausted movable pages, the only free pages we
have are of the unmovable and reclaimable types, and this is a really
small amount. So the watermark check fails. This wakes up kswapd to make
enough free memory for unmovable and reclaimable allocations, and kswapd
does so. So before we fully utilize the pages on CMA reserved memory,
kswapd starts to reclaim memory and tries to bring free memory above the
high watermark. This watermark check by kswapd doesn't take free CMA
pages into account, so many movable pages are reclaimed. After that, we
have a lot of free movable pages again, so the fallback allocation
doesn't happen again. To conclude, the amount of free memory in meminfo,
which includes free CMA pages, hovers around 512 MB if I reserve 512 MB
of memory for CMA.

I found this problem in the following experiment.

4 CPUs, 1024 MB, VIRTUAL MACHINE
make -j24

CMA reserve:        0 MB         512 MB
Elapsed-time:       234.8        361.8
Average-MemFree:    283880 KB    530851 KB

To solve this problem, I can think of the following 2 possible solutions.
1. Allocate the pages on CMA reserved memory first, and if they are
   exhausted, allocate movable pages.
2. Interleaved allocation: try to allocate a specific amount of memory
   from CMA reserved memory and then allocate from free movable memory.

I tested approach #1 and found a problem. Although free memory in meminfo
can hover around the low watermark, there is large fluctuation in free
memory, because too many pages are reclaimed when kswapd is invoked.
The reason for this behaviour, I guess, is that successively allocated
CMA pages sit on the LRU list in that order and kswapd reclaims them in
the same order. These pages don't help kswapd's watermark check, so too
many pages are reclaimed.

So, I implemented approach #2.
One thing I should note is that we should not change the allocation
target (movable list or CMA) on each allocation attempt, since this
prevents allocated pages from being physically contiguous, which can
hurt the performance of some I/O devices.
To solve this, I keep the allocation target for at least
pageblock_nr_pages attempts and make this number reflect the ratio of
free pages excluding free CMA pages to free CMA pages. With this
approach, the system works very smoothly and fully utilizes the pages on
CMA reserved memory.

Following is the experimental result of this patch.

4 CPUs, 1024 MB, VIRTUAL MACHINE
make -j24

Before this patch:
CMA reserve:        0 MB         512 MB
Elapsed-time:       234.8        361.8
Average-MemFree:    283880 KB    530851 KB
pswpin:             7            110064
pswpout:            452          767502

After this patch:
CMA reserve:        0 MB         512 MB
Elapsed-time:       234.2        235.6
Average-MemFree:    281651 KB    290227 KB
pswpin:             8            8
pswpout:            430          510

There is no difference if we don't have CMA reserved memory (0 MB case).
But with CMA reserved memory (512 MB case), we fully utilize the reserved
memory through this patch and the system behaves as if it didn't reserve
any memory.

With this patch, we aggressively allocate the pages on CMA reserved
memory, so the latency of CMA allocation can increase. Below is the
experimental result for latency.

4 CPUs, 1024 MB, VIRTUAL MACHINE
CMA reserve: 512 MB
Background Workload: make -jN
Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval

N:                       1          4          8          16
Elapsed-time(Before):    4309.75    9511.09    12276.1    77103.5
Elapsed-time(After):     5391.69    16114.1    19380.3    34879.2

So generally we can see a latency increase. The ratio of this increase
is rather big, up to 70% (N = 4). But under the heavy workload (N = 16),
it shows a latency decrease of up to 55%. This may be a worst-case
scenario, but reducing it would be important for some systems, so I can
say that this patch has advantages and disadvantages in terms of latency.
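
As a rough illustration of the ratio-based recharge described above, here
is a minimal userspace sketch. It is not the kernel code: the names
PAGEBLOCK_NR_PAGES, struct try_counters and recharge() are hypothetical,
and the value 1024 is only an assumed placeholder for pageblock_nr_pages.

```c
#include <assert.h>

/* Assumed placeholder for the kernel's pageblock_nr_pages. */
#define PAGEBLOCK_NR_PAGES 1024L

/* Hypothetical container for the two per-zone attempt counters. */
struct try_counters {
	long nr_try_movable;	/* consecutive attempts on the movable list */
	long nr_try_cma;	/* consecutive attempts on the CMA list */
};

/*
 * Model of the recharge step: split upcoming allocation attempts between
 * the movable and CMA free lists in proportion to
 * (free - free_cma - high_wmark) : free_cma, never going below one
 * pageblock worth of attempts for the smaller side.
 */
static struct try_counters recharge(long free, long free_cma, long high_wmark)
{
	struct try_counters c = { 0, 0 };
	long free_wmark;

	/* Model assumption: the free page count includes free CMA pages. */
	assert(free >= free_cma);

	/* No free CMA pages: only movable attempts make sense. */
	if (free_cma <= 0) {
		c.nr_try_movable = PAGEBLOCK_NR_PAGES;
		return c;
	}

	free_wmark = free - free_cma - high_wmark;

	/* Normal pages are under pressure: use only CMA pages for now. */
	if (free_wmark <= 0) {
		c.nr_try_cma = PAGEBLOCK_NR_PAGES;
		return c;
	}

	if (free_wmark > free_cma) {
		c.nr_try_movable = free_wmark * PAGEBLOCK_NR_PAGES / free_cma;
		c.nr_try_cma = PAGEBLOCK_NR_PAGES;
	} else {
		c.nr_try_movable = PAGEBLOCK_NR_PAGES;
		c.nr_try_cma = free_cma * PAGEBLOCK_NR_PAGES / free_wmark;
	}
	return c;
}
```

For example, with free = 400000, free_cma = 100000 and a high watermark
of 0, the sketch yields 3072 movable attempts followed by 1024 CMA
attempts, that is, a 3:1 ratio matching the ratio of non-CMA free pages
to free CMA pages.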

Signed-off-by: Joonsoo Kim

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fac5509..3ff24d4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -389,6 +389,12 @@ struct zone {
 	int			compact_order_failed;
 #endif
 
+#ifdef CONFIG_CMA
+	int has_cma;
+	int nr_try_cma;
+	int nr_try_movable;
+#endif
+
 	ZONE_PADDING(_pad1_)
 
 	/* Fields commonly accessed by the page reclaim scanner */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 674ade7..6f2b27b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
 }
 
 #ifdef CONFIG_CMA
+void __init init_alloc_ratio_counter(struct zone *zone)
+{
+	if (zone->has_cma)
+		return;
+
+	zone->has_cma = 1;
+	zone->nr_try_movable = 0;
+	zone->nr_try_cma = 0;
+}
+
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
 {
@@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
 	set_pageblock_migratetype(page, MIGRATE_CMA);
 	__free_pages(page, pageblock_order);
 	adjust_managed_page_count(page, pageblock_nr_pages);
+	init_alloc_ratio_counter(page_zone(page));
 }
 #endif
 
@@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	return NULL;
 }
 
+#ifdef CONFIG_CMA
+static struct page *__rmqueue_cma(struct zone *zone, unsigned int order,
+						int migratetype)
+{
+	long free, free_cma, free_wmark;
+	struct page *page;
+
+	if (migratetype != MIGRATE_MOVABLE || !zone->has_cma)
+		return NULL;
+
+	if (zone->nr_try_movable)
+		goto alloc_movable;
+
+alloc_cma:
+	if (zone->nr_try_cma) {
+		/* Okay. Now, we can try to allocate the page from cma region */
+		zone->nr_try_cma--;
+		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
+
+		/* CMA pages can vanish through CMA allocation */
+		if (unlikely(!page && order == 0))
+			zone->nr_try_cma = 0;
+
+		return page;
+	}
+
+	/* Reset ratio counter */
+	free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES);
+
+	/* No cma free pages, so recharge only movable allocation */
+	if (free_cma <= 0) {
+		zone->nr_try_movable = pageblock_nr_pages;
+		goto alloc_movable;
+	}
+
+	free = zone_page_state(zone, NR_FREE_PAGES);
+	free_wmark = free - free_cma - high_wmark_pages(zone);
+
+	/*
+	 * free_wmark is below than 0, and it means that normal pages
+	 * are under the pressure, so we recharge only cma allocation.
+	 */
+	if (free_wmark <= 0) {
+		zone->nr_try_cma = pageblock_nr_pages;
+		goto alloc_cma;
+	}
+
+	if (free_wmark > free_cma) {
+		zone->nr_try_movable =
+			(free_wmark * pageblock_nr_pages) / free_cma;
+		zone->nr_try_cma = pageblock_nr_pages;
+	} else {
+		zone->nr_try_movable = pageblock_nr_pages;
+		zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark;
+	}
+
+	/* Reset complete, start on movable first */
+alloc_movable:
+	zone->nr_try_movable--;
+	return NULL;
+}
+#endif
+
 /*
  * Do the hard work of removing an element from the buddy allocator.
  * Call me with the zone->lock already held.
@@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 static struct page *__rmqueue(struct zone *zone, unsigned int order,
 						int migratetype)
 {
-	struct page *page;
+	struct page *page = NULL;
+
+	if (IS_ENABLED(CONFIG_CMA))
+		page = __rmqueue_cma(zone, order, migratetype);
 
 retry_reserve:
-	page = __rmqueue_smallest(zone, order, migratetype);
+	if (!page)
+		page = __rmqueue_smallest(zone, order, migratetype);
 
 	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
 		page = __rmqueue_fallback(zone, order, migratetype);
@@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		zone_seqlock_init(zone);
 		zone->zone_pgdat = pgdat;
 		zone_pcp_init(zone);
+		if (IS_ENABLED(CONFIG_CMA))
+			zone->has_cma = 0;
 
 		/* For bootup, initialized properly in watermark setup */
 		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
-- 
1.7.9.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f54.google.com (mail-pa0-f54.google.com [209.85.220.54])
	by kanga.kvack.org (Postfix) with ESMTP id 5519C6B0035
	for ; Fri, 9 May 2014 08:39:11 -0400 (EDT)
Received: by mail-pa0-f54.google.com with SMTP id bj1so3118897pad.27
	for ; Fri, 09 May 2014 05:39:11 -0700 (PDT)
Received: from mailout4.w1.samsung.com (mailout4.w1.samsung.com.
	[210.118.77.14])
	by mx.google.com with ESMTPS id gi2si1724142pac.159.2014.05.09.05.39.09
	for (version=TLSv1 cipher=RC4-MD5 bits=128/128);
	Fri, 09 May 2014 05:39:10 -0700 (PDT)
Received: from eucpsbgm2.samsung.com (unknown [203.254.199.245])
	by mailout4.w1.samsung.com
	(Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011))
	with ESMTP id <0N5B0077M4H3V970@mailout4.w1.samsung.com>
	for linux-mm@kvack.org; Fri, 09 May 2014 13:39:03 +0100 (BST)
Message-id: <536CCC78.6050806@samsung.com>
Date: Fri, 09 May 2014 14:39:20 +0200
From: Marek Szyprowski
MIME-version: 1.0
Subject: Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com>
In-reply-to: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com>
Content-type: text/plain; charset=UTF-8; format=flowed
Content-transfer-encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Joonsoo Kim, Andrew Morton
Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Laura Abbott, Minchan Kim,
	Heesub Shin, Michal Nazarewicz, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Kyungmin Park,
	Bartlomiej Zolnierkiewicz, 'Tomasz Stanislawski'

Hello,

On 2014-05-08 02:32, Joonsoo Kim wrote:
> This series tries to improve CMA.
>
> CMA is introduced to provide physically contiguous pages at runtime
> without reserving memory area. But, current implementation works like as
> reserving memory approach, because allocation on cma reserved region only
> occurs as fallback of migrate_movable allocation. We can allocate from it
> when there is no movable page. In that situation, kswapd would be invoked
> easily since unmovable and reclaimable allocation consider
> (free pages - free CMA pages) as free memory on the system and free memory
> may be lower than high watermark in that case. If kswapd start to reclaim
> memory, then fallback allocation doesn't occur much.
>
> In my experiment, I found that if system memory has 1024 MB memory and
> has 512 MB reserved memory for CMA, kswapd is mostly invoked around
> the 512MB free memory boundary. And invoked kswapd tries to make free
> memory until (free pages - free CMA pages) is higher than high watermark,
> so free memory on meminfo is moving around 512MB boundary consistently.
>
> To fix this problem, we should allocate the pages on cma reserved memory
> more aggressively and intelligenetly. Patch 2 implements the solution.
> Patch 1 is the simple optimization which remove useless re-trial and patch 3
> is for removing useless alloc flag, so these are not important.
> See patch 2 for more detailed description.
>
> This patchset is based on v3.15-rc4.

Thanks for posting those patches. This basically reminds me of the
following discussion:
http://thread.gmane.org/gmane.linux.kernel/1391989/focus=1399524

Your approach is basically the same. I hope that your patches can be
improved in such a way that they will be accepted by the mm maintainers.
I only wonder whether the third patch is really necessary. Without it,
kswapd wakeup might still be avoided in some cases.

> Thanks.
> Joonsoo Kim (3):
>    CMA: remove redundant retrying code in __alloc_contig_migrate_range
>    CMA: aggressively allocate the pages on cma reserved memory when not
>      used
>    CMA: always treat free cma pages as non-free on watermark checking
>
>   include/linux/mmzone.h |    6 +++
>   mm/compaction.c        |    4 --
>   mm/internal.h          |    3 +-
>   mm/page_alloc.c        |  117 +++++++++++++++++++++++++++++++++++++++---------
>   4 files changed, 102 insertions(+), 28 deletions(-)
>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f53.google.com (mail-pa0-f53.google.com [209.85.220.53])
	by kanga.kvack.org (Postfix) with ESMTP id 489926B0035
	for ; Fri, 9 May 2014 11:44:16 -0400 (EDT)
Received: by mail-pa0-f53.google.com with SMTP id kp14so4497434pab.26
	for ; Fri, 09 May 2014 08:44:16 -0700 (PDT)
Received: from mail-pa0-x22c.google.com (mail-pa0-x22c.google.com [2607:f8b0:400e:c03::22c])
	by mx.google.com with ESMTPS id gw6si2219969pac.208.2014.05.09.08.44.14
	for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
	Fri, 09 May 2014 08:44:15 -0700 (PDT)
Received: by mail-pa0-f44.google.com with SMTP id ld10so4513913pab.31
	for ; Fri, 09 May 2014 08:44:14 -0700 (PDT)
From: Michal Nazarewicz
Subject: Re: [RFC PATCH 1/3] CMA: remove redundant retrying code in __alloc_contig_migrate_range
In-Reply-To: <1399509144-8898-2-git-send-email-iamjoonsoo.kim@lge.com>
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-2-git-send-email-iamjoonsoo.kim@lge.com>
Date: Fri, 09 May 2014 08:44:06 -0700
Message-ID:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Sender: owner-linux-mm@kvack.org
List-ID:
To: Joonsoo Kim, Andrew Morton
Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Laura Abbott, Minchan Kim,
	Heesub Shin, Marek Szyprowski, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org

On Wed, May 07 2014, Joonsoo Kim wrote:
> We already have retry logic in migrate_pages(). It does retry 10 times.
> So if we keep this retrying code in __alloc_contig_migrate_range(), we
> would try to migrate some unmigratable page in 50 times. There is just one
> small difference in -ENOMEM case. migrate_pages() don't do retry
> in this case, however, current __alloc_contig_migrate_range() does.
But, > I think that this isn't problem, because in this case, we may fail again > with same reason. > > Signed-off-by: Joonsoo Kim I think there was a reason for the retries in __alloc_contig_migrate_range but perhaps those are no longer valid. Acked-by: Michal Nazarewicz > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 5dba293..674ade7 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -6185,7 +6185,6 @@ static int __alloc_contig_migrate_range(struct comp= act_control *cc, > /* This function is based on compact_zone() from compaction.c. */ > unsigned long nr_reclaimed; > unsigned long pfn =3D start; > - unsigned int tries =3D 0; > int ret =3D 0; >=20=20 > migrate_prep(); > @@ -6204,10 +6203,6 @@ static int __alloc_contig_migrate_range(struct com= pact_control *cc, > ret =3D -EINTR; > break; > } > - tries =3D 0; > - } else if (++tries =3D=3D 5) { > - ret =3D ret < 0 ? ret : -EBUSY; > - break; > } >=20=20 > nr_reclaimed =3D reclaim_clean_pages_from_list(cc->zone, > @@ -6216,6 +6211,10 @@ static int __alloc_contig_migrate_range(struct com= pact_control *cc, >=20=20 > ret =3D migrate_pages(&cc->migratepages, alloc_migrate_target, > 0, MIGRATE_SYNC, MR_CMA); > + if (ret) { > + ret =3D ret < 0 ? ret : -EBUSY; > + break; > + } > } > if (ret < 0) { > putback_movable_pages(&cc->migratepages); --=20 Best regards, _ _ .o. 
| Liege of Serenely Enlightened Majesty of o' \,=3D./ `o ..o | Computer Science, Micha=C5=82 =E2=80=9Cmina86=E2=80=9D Nazarewicz = (o o) ooo +------ooO--(_)--Ooo-- --=-=-= Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" --==-=-= Content-Type: text/plain --==-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAEBAgAGBQJTbPfGAAoJECBgQBJQdR/0nJQP/0htX/xZxE1ePn5SBbCWts+2 Zx9ESFvp1aZEQKrPQ/u692Imj99GtbUntwK4NPH9ponpK9ErSFoONK8h1FP2hgLG KF8IhmKSy3D2J37r/kLLmmSHJqw52uqIu1UefbUv3fDgHvULd9kKz0eRPNn4dJTv 9+Vv7AbW69v39Owwp2R84y7t5SrPGN/SlqABzii296zmGkXQrWkDwRFk17FJ/KqA RkmMSzkR+hMmAfefd2WcFeUASJDqTDMTxBKiUmEs9/WKSbkTRVa+Z+MRvpnKBTDs Ra6Ya13fbFDKAVXivZiU+fIJkxnCQmPUfbjoZQn6T9FwkC89aVZKnLyPldKcPKg5 BjtoX7/HWrK3ERrV+n3CjqwITZZ4kMWbY8O81PgmM0HFZKdunEdqZCj07O0og7G3 xW/zGGlpXRBeDQa6xAm08ZInl3PTt5yq89Sl6vrNmOsubjrNiP4HNfR5dSHk08Ly 69Cs3SpCrNp64IzISO8QjabCw7oGzZoMrl6bnWaHSmNllOZwkAPTGQjq4kS8kcH5 KAqtF0tgQZcqLRh8dnQI7/WS6r5ClcHnuKQpN+4XXXo6B00Dc0B7ypBMRlPYgKJ8 FoxIMP6HyFTxxEtrndpfC4q6jcleoBRSWXkOYFArFu6az9egW3wlIYCYUpGprDZh Gt6W2uDFcn0rHqEJoXGl =+M2V -----END PGP SIGNATURE----- --==-=-=-- --=-=-=-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vc0-f169.google.com (mail-vc0-f169.google.com [209.85.220.169]) by kanga.kvack.org (Postfix) with ESMTP id A1C9A6B0036 for ; Fri, 9 May 2014 11:45:40 -0400 (EDT) Received: by mail-vc0-f169.google.com with SMTP id ij19so5594214vcb.28 for ; Fri, 09 May 2014 08:45:40 -0700 (PDT) Received: from mail-pd0-x22a.google.com (mail-pd0-x22a.google.com [2607:f8b0:400e:c02::22a]) by mx.google.com with ESMTPS id py5si2434987pbc.443.2014.05.09.08.45.39 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 09 May 2014 08:45:40 -0700 (PDT) Received: by mail-pd0-f170.google.com with SMTP id v10so3871453pde.29 for ; Fri, 09 May 2014 08:45:39 -0700 (PDT) From: Michal Nazarewicz Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used In-Reply-To: <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> Date: Fri, 09 May 2014 08:45:32 -0700 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Andrew Morton Cc: Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan Kim , Heesub Shin , Marek Szyprowski , linux-mm@kvack.org, linux-kernel@vger.kernel.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Wed, May 07 2014, Joonsoo Kim wrote: > Signed-off-by: Joonsoo Kim The code looks good to me, but I don't feel competent on whether the approach is beneficial or not. Still: Acked-by: Michal Nazarewicz --=20 Best regards, _ _ .o. 
| Liege of Serenely Enlightened Majesty of o' \,=3D./ `o ..o | Computer Science, Micha=C5=82 =E2=80=9Cmina86=E2=80=9D Nazarewicz = (o o) ooo +------ooO--(_)--Ooo-- --=-=-= Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" --==-=-= Content-Type: text/plain --==-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIbBAEBAgAGBQJTbPgcAAoJECBgQBJQdR/0V5EP9iwYwqxiwv9ZPYqFtCjjAsHH tJomWa/nlEKJ+eVoTwFT2FjpORmux2MHNDCWL+ncpV4Gh3SODbstkiRhJNksiOsz CKrc/amtefoiCkZOLf478Mn845t4a9TitUN3fAPqG4/iPulf1alelymFqaSiTU+I wV5JaQK5KWUnUADR/5UzMCEG1pgyu9SbIHYM2pKljbtFDNrrcE+h10UFepUgiNda onZvB002cdV4KR3ZA1Dw7UcMarL/gSL1GbWiqHuQz0Za2yoPZNtWJtuBoYBfNjfq Nlq0aIrKmx0viXfC4XkdRIJ0lJkEaWz560exmeEXWrO3egd3TtbYjPdZ5nheDUBZ 21ZkTTSYggR33oIasTGiAGFrJNDdX2TebAvulC1vIYZ+7wP53iwHNBQqU6UkpPw+ 0PrLQa1a7THDpoalRkfBCC+HBHBwJvsSGHYlgSvUA/b0EdzuI9CN29Ht+lC/kDqg vCJiO0yykygOaj/JATdP/kNnmF7KhRAJhUc2HQgrGCQ6wpyQ5Tlk8vtL9OUdaH7G W8VnqdRTU39S3j/1YXpJCjOxNr7m5mC6hl9pSkBaWzQ0x/bBi21jWiHdOPNWQnxK Qb+DpilW5ZoSmULo5dwyXIbjVxdoUKJuF9JotBoSP6tDppvXv2LD0a7PoN4oYN6l FEtcIJ2A1XPixQOfVFk= =4Tff -----END PGP SIGNATURE----- --==-=-=-- --=-=-=-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pd0-f174.google.com (mail-pd0-f174.google.com [209.85.192.174]) by kanga.kvack.org (Postfix) with ESMTP id E15F16B0035 for ; Fri, 9 May 2014 11:46:57 -0400 (EDT)
Received: by mail-pd0-f174.google.com with SMTP id w10so3792685pde.19 for ; Fri, 09 May 2014 08:46:57 -0700 (PDT)
Received: from mail-pd0-x22e.google.com (mail-pd0-x22e.google.com [2607:f8b0:400e:c02::22e]) by mx.google.com with ESMTPS id se7si2470283pbb.10.2014.05.09.08.46.56 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 09 May 2014 08:46:57 -0700 (PDT)
Received: by mail-pd0-f174.google.com with SMTP id w10so3792673pde.19 for ; Fri, 09 May 2014 08:46:56 -0700 (PDT)
From: Michal Nazarewicz
Subject: Re: [RFC PATCH 3/3] CMA: always treat free cma pages as non-free on watermark checking
In-Reply-To: <1399509144-8898-4-git-send-email-iamjoonsoo.kim@lge.com>
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-4-git-send-email-iamjoonsoo.kim@lge.com>
Date: Fri, 09 May 2014 08:46:50 -0700
Message-ID:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Sender: owner-linux-mm@kvack.org
List-ID:
To: Joonsoo Kim, Andrew Morton
Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski, linux-mm@kvack.org, linux-kernel@vger.kernel.org

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On Wed, May 07 2014, Joonsoo Kim wrote:
> commit d95ea5d1('cma: fix watermark checking') introduces ALLOC_CMA flag
> for alloc flag and treats free cma pages as free pages if this flag is
> passed to watermark checking. Intention of that patch is that movable page
> allocation can be handled from cma reserved region without starting
> kswapd.
Now, previous patch changes the behaviour of allocator that
> movable allocation uses the page on cma reserved region aggressively,
> so this watermark hack isn't needed anymore. Therefore remove it.
>
> Signed-off-by: Joonsoo Kim

Acked-by: Michal Nazarewicz

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +------ooO--(_)--Ooo--
--=-=-=--
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f49.google.com (mail-pa0-f49.google.com [209.85.220.49]) by kanga.kvack.org (Postfix) with ESMTP id 875756B0035 for ; Mon, 12 May 2014 13:04:14 -0400 (EDT) Received: by mail-pa0-f49.google.com with SMTP id lj1so8941977pab.36 for ; Mon, 12 May 2014 10:04:14 -0700 (PDT) Received: from smtp.codeaurora.org (smtp.codeaurora.org. [198.145.11.231]) by mx.google.com with ESMTPS id wt1si6654310pbc.32.2014.05.12.10.04.12 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 12 May 2014 10:04:13 -0700 (PDT) Message-ID: <5370FF1D.10707@codeaurora.org> Date: Mon, 12 May 2014 10:04:29 -0700 From: Laura Abbott MIME-Version: 1.0 Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> In-Reply-To: <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Andrew Morton Cc: Rik van Riel , Johannes Weiner , Mel Gorman , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org Hi, On 5/7/2014 5:32 PM, Joonsoo Kim wrote: > CMA is introduced to provide physically contiguous pages at runtime. > For this purpose, it reserves memory at boot time. Although it reserve > memory, this reserved memory can be used for movable memory allocation > request. This usecase is beneficial to the system that needs this CMA > reserved memory infrequently and it is one of main purpose of > introducing CMA. > > But, there is a problem in current implementation. The problem is that > it works like as just reserved memory approach. The pages on cma reserved > memory are hardly used for movable memory allocation. 
This is caused by > combination of allocation and reclaim policy. > > The pages on cma reserved memory are allocated if there is no movable > memory, that is, as fallback allocation. So the time this fallback > allocation is started is under heavy memory pressure. Although it is under > memory pressure, movable allocation easily succeed, since there would be > many pages on cma reserved memory. But this is not the case for unmovable > and reclaimable allocation, because they can't use the pages on cma > reserved memory. These allocations regard system's free memory as > (free pages - free cma pages) on watermark checking, that is, free > unmovable pages + free reclaimable pages + free movable pages. Because > we already exhausted movable pages, only free pages we have are unmovable > and reclaimable types and this would be really small amount. So watermark > checking would be failed. It will wake up kswapd to make enough free > memory for unmovable and reclaimable allocation and kswapd will do. > So before we fully utilize pages on cma reserved memory, kswapd start to > reclaim memory and try to make free memory over the high watermark. This > watermark checking by kswapd doesn't take care free cma pages so many > movable pages would be reclaimed. After then, we have a lot of movable > pages again, so fallback allocation doesn't happen again. To conclude, > amount of free memory on meminfo which includes free CMA pages is moving > around 512 MB if I reserve 512 MB memory for CMA. > > I found this problem on following experiment. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > make -j24 > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.8 361.8 > Average-MemFree: 283880 KB 530851 KB > > To solve this problem, I can think following 2 possible solutions. > 1. allocate the pages on cma reserved memory first, and if they are > exhausted, allocate movable pages. > 2. 
interleaved allocation: try to allocate specific amounts of memory
> from cma reserved memory and then allocate from free movable memory.
>
> I tested #1 approach and found the problem. Although free memory on
> meminfo can move around low watermark, there is large fluctuation on free
> memory, because too many pages are reclaimed when kswapd is invoked.
> Reason for this behaviour is that successive allocated CMA pages are
> on the LRU list in that order and kswapd reclaim them in same order.
> These memory doesn't help watermark checking from kswapd, so too many
> pages are reclaimed, I guess.
>

We have an out-of-tree implementation of #1, and so far it has worked for us,
although we weren't looking at the same metrics. I don't completely
understand the issue you pointed out with #1. It sounds like the issue is
that CMA pages are already in use by other processes and on LRU lists, and
because the pages are on LRU lists they aren't counted towards the
watermark by kswapd. Is my understanding correct?

> So, I implement #2 approach.
> One thing I should note is that we should not change allocation target
> (movable list or cma) on each allocation attempt, since this prevent
> allocated pages to be in physically succession, so some I/O devices can
> be hurt their performance. To solve this, I keep allocation target
> in at least pageblock_nr_pages attempts and make this number reflect
> ratio, free pages without free cma pages to free cma pages. With this
> approach, system works very smoothly and fully utilize the pages on
> cma reserved memory.
>
> Following is the experimental result of this patch.
>
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j24
>
> Before:
> CMA reserve:        0 MB        512 MB
> Elapsed-time:       234.8       361.8
> Average-MemFree:    283880 KB   530851 KB
> pswpin:             7           110064
> pswpout:            452         767502
>
> After:
> CMA reserve:        0 MB        512 MB
> Elapsed-time:       234.2       235.6
> Average-MemFree:    281651 KB   290227 KB
> pswpin:             8           8
> pswpout:            430         510
>
> There is no difference if we don't have cma reserved memory (0 MB case).
> But, with cma reserved memory (512 MB case), we fully utilize these
> reserved memory through this patch and the system behaves like as
> it doesn't reserve any memory.

What metric are you using to determine all CMA memory was fully used?
We've been checking /proc/pagetypeinfo.

>
> With this patch, we aggressively allocate the pages on cma reserved memory
> so latency of CMA can arise. Below is the experimental result about
> latency.
>
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> CMA reserve: 512 MB
> Background Workload: make -jN
> Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
>
> N:                    1        4        8        16
> Elapsed-time(Before): 4309.75  9511.09  12276.1  77103.5
> Elapsed-time(After):  5391.69  16114.1  19380.3  34879.2
>
> So generally we can see latency increase. Ratio of this increase
> is rather big - up to 70%. But, under the heavy workload, it shows
> latency decrease - up to 55%. This may be worst-case scenario, but
> reducing it would be important for some system, so, I can say that
> this patch have advantages and disadvantages in terms of latency.
>

Do you have any statistics related to failed migration from this? Latency
and utilization are issues, but so is migration success. In the past we've
found that an increase in CMA utilization was related to an increase in CMA
migration failures because pages were unmigratable. The current workaround
for this is limiting CMA pages to be used for user processes only and not
the file cache. Both of these have their own problems.
> Signed-off-by: Joonsoo Kim > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index fac5509..3ff24d4 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -389,6 +389,12 @@ struct zone { > int compact_order_failed; > #endif > > +#ifdef CONFIG_CMA > + int has_cma; > + int nr_try_cma; > + int nr_try_movable; > +#endif > + > ZONE_PADDING(_pad1_) > > /* Fields commonly accessed by the page reclaim scanner */ > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 674ade7..6f2b27b 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order) > } > > #ifdef CONFIG_CMA > +void __init init_alloc_ratio_counter(struct zone *zone) > +{ > + if (zone->has_cma) > + return; > + > + zone->has_cma = 1; > + zone->nr_try_movable = 0; > + zone->nr_try_cma = 0; > +} > + > /* Free whole pageblock and set its migration type to MIGRATE_CMA. */ > void __init init_cma_reserved_pageblock(struct page *page) > { > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page) > set_pageblock_migratetype(page, MIGRATE_CMA); > __free_pages(page, pageblock_order); > adjust_managed_page_count(page, pageblock_nr_pages); > + init_alloc_ratio_counter(page_zone(page)); > } > #endif > > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > return NULL; > } > > +#ifdef CONFIG_CMA > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order, > + int migratetype) > +{ > + long free, free_cma, free_wmark; > + struct page *page; > + > + if (migratetype != MIGRATE_MOVABLE || !zone->has_cma) > + return NULL; > + > + if (zone->nr_try_movable) > + goto alloc_movable; > + > +alloc_cma: > + if (zone->nr_try_cma) { > + /* Okay. 
Now, we can try to allocate the page from cma region */ > + zone->nr_try_cma--; > + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); > + > + /* CMA pages can vanish through CMA allocation */ > + if (unlikely(!page && order == 0)) > + zone->nr_try_cma = 0; > + > + return page; > + } > + > + /* Reset ratio counter */ > + free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES); > + > + /* No cma free pages, so recharge only movable allocation */ > + if (free_cma <= 0) { > + zone->nr_try_movable = pageblock_nr_pages; > + goto alloc_movable; > + } > + > + free = zone_page_state(zone, NR_FREE_PAGES); > + free_wmark = free - free_cma - high_wmark_pages(zone); > + > + /* > + * free_wmark is below than 0, and it means that normal pages > + * are under the pressure, so we recharge only cma allocation. > + */ > + if (free_wmark <= 0) { > + zone->nr_try_cma = pageblock_nr_pages; > + goto alloc_cma; > + } > + > + if (free_wmark > free_cma) { > + zone->nr_try_movable = > + (free_wmark * pageblock_nr_pages) / free_cma; > + zone->nr_try_cma = pageblock_nr_pages; > + } else { > + zone->nr_try_movable = pageblock_nr_pages; > + zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark; > + } > + > + /* Reset complete, start on movable first */ > +alloc_movable: > + zone->nr_try_movable--; > + return NULL; > +} > +#endif > + > /* > * Do the hard work of removing an element from the buddy allocator. > * Call me with the zone->lock already held. 
> @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > static struct page *__rmqueue(struct zone *zone, unsigned int order, > int migratetype) > { > - struct page *page; > + struct page *page = NULL; > + > + if (IS_ENABLED(CONFIG_CMA)) > + page = __rmqueue_cma(zone, order, migratetype); > > retry_reserve: > - page = __rmqueue_smallest(zone, order, migratetype); > + if (!page) > + page = __rmqueue_smallest(zone, order, migratetype); > > if (unlikely(!page) && migratetype != MIGRATE_RESERVE) { > page = __rmqueue_fallback(zone, order, migratetype); > @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > zone_seqlock_init(zone); > zone->zone_pgdat = pgdat; > zone_pcp_init(zone); > + if (IS_ENABLED(CONFIG_CMA)) > + zone->has_cma = 0; > > /* For bootup, initialized properly in watermark setup */ > mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages); > I'm going to see about running this through tests internally for comparison. Hopefully I'll get useful results in a day or so. Thanks, Laura -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f48.google.com (mail-pa0-f48.google.com [209.85.220.48]) by kanga.kvack.org (Postfix) with ESMTP id 227B56B0037 for ; Mon, 12 May 2014 21:12:23 -0400 (EDT) Received: by mail-pa0-f48.google.com with SMTP id rd3so9531378pab.7 for ; Mon, 12 May 2014 18:12:22 -0700 (PDT) Received: from lgeamrelo02.lge.com (lgeamrelo02.lge.com. 
[156.147.1.126]) by mx.google.com with ESMTP id ud10si7133766pbc.116.2014.05.12.18.12.20 for ; Mon, 12 May 2014 18:12:22 -0700 (PDT) Date: Tue, 13 May 2014 10:14:27 +0900 From: Joonsoo Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140513011426.GB23803@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <5370FF1D.10707@codeaurora.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5370FF1D.10707@codeaurora.org> Sender: owner-linux-mm@kvack.org List-ID: To: Laura Abbott Cc: Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Mon, May 12, 2014 at 10:04:29AM -0700, Laura Abbott wrote: > Hi, > > On 5/7/2014 5:32 PM, Joonsoo Kim wrote: > > CMA is introduced to provide physically contiguous pages at runtime. > > For this purpose, it reserves memory at boot time. Although it reserve > > memory, this reserved memory can be used for movable memory allocation > > request. This usecase is beneficial to the system that needs this CMA > > reserved memory infrequently and it is one of main purpose of > > introducing CMA. > > > > But, there is a problem in current implementation. The problem is that > > it works like as just reserved memory approach. The pages on cma reserved > > memory are hardly used for movable memory allocation. This is caused by > > combination of allocation and reclaim policy. > > > > The pages on cma reserved memory are allocated if there is no movable > > memory, that is, as fallback allocation. So the time this fallback > > allocation is started is under heavy memory pressure. 
Although it is under > > memory pressure, movable allocation easily succeed, since there would be > > many pages on cma reserved memory. But this is not the case for unmovable > > and reclaimable allocation, because they can't use the pages on cma > > reserved memory. These allocations regard system's free memory as > > (free pages - free cma pages) on watermark checking, that is, free > > unmovable pages + free reclaimable pages + free movable pages. Because > > we already exhausted movable pages, only free pages we have are unmovable > > and reclaimable types and this would be really small amount. So watermark > > checking would be failed. It will wake up kswapd to make enough free > > memory for unmovable and reclaimable allocation and kswapd will do. > > So before we fully utilize pages on cma reserved memory, kswapd start to > > reclaim memory and try to make free memory over the high watermark. This > > watermark checking by kswapd doesn't take care free cma pages so many > > movable pages would be reclaimed. After then, we have a lot of movable > > pages again, so fallback allocation doesn't happen again. To conclude, > > amount of free memory on meminfo which includes free CMA pages is moving > > around 512 MB if I reserve 512 MB memory for CMA. > > > > I found this problem on following experiment. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > make -j24 > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.8 361.8 > > Average-MemFree: 283880 KB 530851 KB > > > > To solve this problem, I can think following 2 possible solutions. > > 1. allocate the pages on cma reserved memory first, and if they are > > exhausted, allocate movable pages. > > 2. interleaved allocation: try to allocate specific amounts of memory > > from cma reserved memory and then allocate from free movable memory. > > > > I tested #1 approach and found the problem. 
Although free memory on
> > meminfo can move around low watermark, there is large fluctuation on free
> > memory, because too many pages are reclaimed when kswapd is invoked.
> > Reason for this behaviour is that successive allocated CMA pages are
> > on the LRU list in that order and kswapd reclaim them in same order.
> > These memory doesn't help watermark checking from kswapd, so too many
> > pages are reclaimed, I guess.
> >
>
> We have an out of tree implementation of #1 and so far it's worked for us
> although we weren't looking at the same metrics. I don't completely
> understand the issue you pointed out with #1. It sounds like the issue is
> that CMA pages are already in use by other processes and on LRU lists and
> because the pages are on LRU lists these aren't counted towards the
> watermark by kswapd. Is my understanding correct?

Hello,

Yes, your understanding is correct.
kswapd wants to reclaim normal (not CMA) pages, but with approach #1 the LRU
lists can contain long runs of consecutive CMA pages, so the watermark isn't
restored easily.

>
> > So, I implement #2 approach.
> > One thing I should note is that we should not change allocation target
> > (movable list or cma) on each allocation attempt, since this prevent
> > allocated pages to be in physically succession, so some I/O devices can
> > be hurt their performance. To solve this, I keep allocation target
> > in at least pageblock_nr_pages attempts and make this number reflect
> > ratio, free pages without free cma pages to free cma pages. With this
> > approach, system works very smoothly and fully utilize the pages on
> > cma reserved memory.
> >
> > Following is the experimental result of this patch.
> >
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > make -j24
> >
> > Before:
> > CMA reserve:        0 MB        512 MB
> > Elapsed-time:       234.8       361.8
> > Average-MemFree:    283880 KB   530851 KB
> > pswpin:             7           110064
> > pswpout:            452         767502
> >
> > After:
> > CMA reserve:        0 MB        512 MB
> > Elapsed-time:       234.2       235.6
> > Average-MemFree:    281651 KB   290227 KB
> > pswpin:             8           8
> > pswpout:            430         510
> >
> > There is no difference if we don't have cma reserved memory (0 MB case).
> > But, with cma reserved memory (512 MB case), we fully utilize these
> > reserved memory through this patch and the system behaves like as
> > it doesn't reserve any memory.
>
> What metric are you using to determine all CMA memory was fully used?
> We've been checking /proc/pagetypeinfo

In this result, the MemFree stat shows whether more CMA memory was used.
I used /proc/zoneinfo to get further insight.

> >
> > With this patch, we aggressively allocate the pages on cma reserved memory
> > so latency of CMA can arise. Below is the experimental result about
> > latency.
> >
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > CMA reserve: 512 MB
> > Background Workload: make -jN
> > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
> >
> > N:                    1        4        8        16
> > Elapsed-time(Before): 4309.75  9511.09  12276.1  77103.5
> > Elapsed-time(After):  5391.69  16114.1  19380.3  34879.2
> >
> > So generally we can see latency increase. Ratio of this increase
> > is rather big - up to 70%. But, under the heavy workload, it shows
> > latency decrease - up to 55%. This may be worst-case scenario, but
> > reducing it would be important for some system, so, I can say that
> > this patch have advantages and disadvantages in terms of latency.
> >
>
> Do you have any statistics related to failed migration from this? Latency
> and utilization are issues but so is migration success. In the past we've
> found that an increase in CMA utilization was related to increase in CMA
> migration failures because pages were unmigratable.
The current
> workaround for this is limiting CMA pages to be used for user processes
> only and not the file cache. Both of these have their own problems.

I have the retry counts from doing an 8 MB CMA allocation 20 times.
These numbers are averages of 5 runs.

N:                 1     4     8     16
Retrying(Before):  0     0     0.6   12.2
Retrying(After):   1.4   1.8   3     3.6

If you know of any permanent failure case with file cache pages, please let
me know. The only CMA migration failure I know of involving file cache pages
is the problem related to the buffer_head LRU, which you mentioned before.

> > Signed-off-by: Joonsoo Kim
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index fac5509..3ff24d4 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -389,6 +389,12 @@ struct zone {
> >  	int compact_order_failed;
> >  #endif
> >
> > +#ifdef CONFIG_CMA
> > +	int has_cma;
> > +	int nr_try_cma;
> > +	int nr_try_movable;
> > +#endif
> > +
> >  	ZONE_PADDING(_pad1_)
> >
> >  	/* Fields commonly accessed by the page reclaim scanner */
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 674ade7..6f2b27b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
> >  }
> >
> >  #ifdef CONFIG_CMA
> > +void __init init_alloc_ratio_counter(struct zone *zone)
> > +{
> > +	if (zone->has_cma)
> > +		return;
> > +
> > +	zone->has_cma = 1;
> > +	zone->nr_try_movable = 0;
> > +	zone->nr_try_cma = 0;
> > +}
> > +
> >  /* Free whole pageblock and set its migration type to MIGRATE_CMA.
*/ > > void __init init_cma_reserved_pageblock(struct page *page) > > { > > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page) > > set_pageblock_migratetype(page, MIGRATE_CMA); > > __free_pages(page, pageblock_order); > > adjust_managed_page_count(page, pageblock_nr_pages); > > + init_alloc_ratio_counter(page_zone(page)); > > } > > #endif > > > > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > > return NULL; > > } > > > > +#ifdef CONFIG_CMA > > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order, > > + int migratetype) > > +{ > > + long free, free_cma, free_wmark; > > + struct page *page; > > + > > + if (migratetype != MIGRATE_MOVABLE || !zone->has_cma) > > + return NULL; > > + > > + if (zone->nr_try_movable) > > + goto alloc_movable; > > + > > +alloc_cma: > > + if (zone->nr_try_cma) { > > + /* Okay. Now, we can try to allocate the page from cma region */ > > + zone->nr_try_cma--; > > + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); > > + > > + /* CMA pages can vanish through CMA allocation */ > > + if (unlikely(!page && order == 0)) > > + zone->nr_try_cma = 0; > > + > > + return page; > > + } > > + > > + /* Reset ratio counter */ > > + free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES); > > + > > + /* No cma free pages, so recharge only movable allocation */ > > + if (free_cma <= 0) { > > + zone->nr_try_movable = pageblock_nr_pages; > > + goto alloc_movable; > > + } > > + > > + free = zone_page_state(zone, NR_FREE_PAGES); > > + free_wmark = free - free_cma - high_wmark_pages(zone); > > + > > + /* > > + * free_wmark is below than 0, and it means that normal pages > > + * are under the pressure, so we recharge only cma allocation. 
> > + */ > > + if (free_wmark <= 0) { > > + zone->nr_try_cma = pageblock_nr_pages; > > + goto alloc_cma; > > + } > > + > > + if (free_wmark > free_cma) { > > + zone->nr_try_movable = > > + (free_wmark * pageblock_nr_pages) / free_cma; > > + zone->nr_try_cma = pageblock_nr_pages; > > + } else { > > + zone->nr_try_movable = pageblock_nr_pages; > > + zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark; > > + } > > + > > + /* Reset complete, start on movable first */ > > +alloc_movable: > > + zone->nr_try_movable--; > > + return NULL; > > +} > > +#endif > > + > > /* > > * Do the hard work of removing an element from the buddy allocator. > > * Call me with the zone->lock already held. > > @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > > static struct page *__rmqueue(struct zone *zone, unsigned int order, > > int migratetype) > > { > > - struct page *page; > > + struct page *page = NULL; > > + > > + if (IS_ENABLED(CONFIG_CMA)) > > + page = __rmqueue_cma(zone, order, migratetype); > > > > retry_reserve: > > - page = __rmqueue_smallest(zone, order, migratetype); > > + if (!page) > > + page = __rmqueue_smallest(zone, order, migratetype); > > > > if (unlikely(!page) && migratetype != MIGRATE_RESERVE) { > > page = __rmqueue_fallback(zone, order, migratetype); > > @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > > zone_seqlock_init(zone); > > zone->zone_pgdat = pgdat; > > zone_pcp_init(zone); > > + if (IS_ENABLED(CONFIG_CMA)) > > + zone->has_cma = 0; > > > > /* For bootup, initialized properly in watermark setup */ > > mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages); > > > > I'm going to see about running this through tests internally for comparison. > Hopefully I'll get useful results in a day or so. Okay. I really hope to see your result. :) Thanks for your interest. Thanks. 
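
The recharge step of the quoted __rmqueue_cma() can be tried out in user space. The following is a minimal sketch under stated assumptions, not kernel code: `struct ratio`, `recharge()` and the `PAGEBLOCK_NR_PAGES` value of 1024 are hypothetical stand-ins for the per-zone counters and for pageblock_nr_pages, and the three arguments replace the zone_page_state() and high_wmark_pages() lookups.

```c
/* Illustrative value only; the real pageblock_nr_pages depends on the kernel config. */
#define PAGEBLOCK_NR_PAGES 1024L

/* Hypothetical stand-in for zone->nr_try_movable and zone->nr_try_cma. */
struct ratio {
	long nr_try_movable;
	long nr_try_cma;
};

/*
 * Mirrors the "reset ratio counter" arithmetic of the quoted patch.
 * free:       all free pages in the zone, free CMA pages included
 * free_cma:   free CMA pages
 * high_wmark: high watermark of the zone, in pages
 * Both counters are zero on entry, as they are when the patch reaches
 * its reset path.
 */
static struct ratio recharge(long free, long free_cma, long high_wmark)
{
	struct ratio r = { 0, 0 };
	long free_wmark;

	/* No free CMA pages: recharge only the movable counter. */
	if (free_cma <= 0) {
		r.nr_try_movable = PAGEBLOCK_NR_PAGES;
		return r;
	}

	free_wmark = free - free_cma - high_wmark;

	/* Normal pages are under pressure: recharge only the CMA counter. */
	if (free_wmark <= 0) {
		r.nr_try_cma = PAGEBLOCK_NR_PAGES;
		return r;
	}

	/* The smaller pool gets one pageblock; the larger one is scaled by the ratio. */
	if (free_wmark > free_cma) {
		r.nr_try_movable = free_wmark * PAGEBLOCK_NR_PAGES / free_cma;
		r.nr_try_cma = PAGEBLOCK_NR_PAGES;
	} else {
		r.nr_try_movable = PAGEBLOCK_NR_PAGES;
		r.nr_try_cma = free_cma * PAGEBLOCK_NR_PAGES / free_wmark;
	}
	return r;
}
```

With made-up counts free = 40000, free_cma = 10000 and high_wmark = 2000, free_wmark is 28000, so the sketch yields nr_try_movable = 2867 and nr_try_cma = 1024: roughly 2.8 movable attempts for every CMA attempt, each side served in runs of at least one pageblock.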
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f51.google.com (mail-pb0-f51.google.com [209.85.160.51]) by kanga.kvack.org (Postfix) with ESMTP id 3D6FD6B0072 for ; Mon, 12 May 2014 22:23:59 -0400 (EDT) Received: by mail-pb0-f51.google.com with SMTP id ma3so377538pbc.24 for ; Mon, 12 May 2014 19:23:58 -0700 (PDT) Received: from lgemrelse7q.lge.com (LGEMRELSE7Q.lge.com. [156.147.1.151]) by mx.google.com with ESMTP id hb8si7210216pbc.239.2014.05.12.19.23.57 for ; Mon, 12 May 2014 19:23:58 -0700 (PDT) Date: Tue, 13 May 2014 11:26:03 +0900 From: Joonsoo Kim Subject: Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Message-ID: <20140513022603.GF23803@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <536CCC78.6050806@samsung.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <536CCC78.6050806@samsung.com> Sender: owner-linux-mm@kvack.org List-ID: To: Marek Szyprowski Cc: Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan Kim , Heesub Shin , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kyungmin Park , Bartlomiej Zolnierkiewicz , 'Tomasz Stanislawski' On Fri, May 09, 2014 at 02:39:20PM +0200, Marek Szyprowski wrote: > Hello, > > On 2014-05-08 02:32, Joonsoo Kim wrote: > >This series tries to improve CMA. > > > >CMA is introduced to provide physically contiguous pages at runtime > >without reserving memory area. But, current implementation works like as > >reserving memory approach, because allocation on cma reserved region only > >occurs as fallback of migrate_movable allocation. We can allocate from it > >when there is no movable page. 
In that situation, kswapd would be invoked
> >easily since unmovable and reclaimable allocation consider
> >(free pages - free CMA pages) as free memory on the system and free memory
> >may be lower than high watermark in that case. If kswapd start to reclaim
> >memory, then fallback allocation doesn't occur much.
> >
> >In my experiment, I found that if system memory has 1024 MB memory and
> >has 512 MB reserved memory for CMA, kswapd is mostly invoked around
> >the 512MB free memory boundary. And invoked kswapd tries to make free
> >memory until (free pages - free CMA pages) is higher than high watermark,
> >so free memory on meminfo is moving around 512MB boundary consistently.
> >
> >To fix this problem, we should allocate the pages on cma reserved memory
> >more aggressively and intelligently. Patch 2 implements the solution.
> >Patch 1 is the simple optimization which remove useless re-trial and patch 3
> >is for removing useless alloc flag, so these are not important.
> >See patch 2 for more detailed description.
> >
> >This patchset is based on v3.15-rc4.
>
> Thanks for posting those patches. It basically reminds me the
> following discussion:
> http://thread.gmane.org/gmane.linux.kernel/1391989/focus=1399524
>
> Your approach is basically the same. I hope that your patches can be
> improved in such a way that they will be accepted by mm maintainers.
> I only wonder if the third patch is really necessary. Without it
> kswapd wakeup might be still avoided in some cases.

Hello,

Oh... I didn't know about that patch and discussion, because I had no
interest in CMA at that time. Your approach looks similar to my approach #1
and could have the same problem, which I described in patch 2/3.
Please refer to that patch description. :)

Also, the purposes differ. This patch is intended to make better use of CMA
pages and so get maximum performance. If the goal were only to avoid
triggering OOM, this logic could be put on the reclaim path.
But that is sub-optimal for performance, because it needs migration in some cases. If the second patch works as intended, only a few free CMA pages remain by the time we approach the watermark. So the benefit of the third patch would be marginal, and we can remove ALLOC_CMA. Thanks. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f52.google.com (mail-pa0-f52.google.com [209.85.220.52]) by kanga.kvack.org (Postfix) with ESMTP id 53BF66B0075 for ; Mon, 12 May 2014 22:58:39 -0400 (EDT) Received: by mail-pa0-f52.google.com with SMTP id fa1so5480160pad.39 for ; Mon, 12 May 2014 19:58:39 -0700 (PDT) Received: from lgeamrelo02.lge.com (lgeamrelo02.lge.com. [156.147.1.126]) by mx.google.com with ESMTP id rw8si7252743pbc.18.2014.05.12.19.58.35 for ; Mon, 12 May 2014 19:58:37 -0700 (PDT) Date: Tue, 13 May 2014 12:00:57 +0900 From: Minchan Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140513030057.GC32092@bbox> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski Hey Joonsoo, On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > CMA is introduced to provide physically contiguous pages at runtime. > For this purpose, it reserves memory at boot time.
Although it reserve > memory, this reserved memory can be used for movable memory allocation > request. This usecase is beneficial to the system that needs this CMA > reserved memory infrequently and it is one of main purpose of > introducing CMA. > > But, there is a problem in current implementation. The problem is that > it works like as just reserved memory approach. The pages on cma reserved > memory are hardly used for movable memory allocation. This is caused by > combination of allocation and reclaim policy. > > The pages on cma reserved memory are allocated if there is no movable > memory, that is, as fallback allocation. So the time this fallback > allocation is started is under heavy memory pressure. Although it is under > memory pressure, movable allocation easily succeed, since there would be > many pages on cma reserved memory. But this is not the case for unmovable > and reclaimable allocation, because they can't use the pages on cma > reserved memory. These allocations regard system's free memory as > (free pages - free cma pages) on watermark checking, that is, free > unmovable pages + free reclaimable pages + free movable pages. Because > we already exhausted movable pages, only free pages we have are unmovable > and reclaimable types and this would be really small amount. So watermark > checking would be failed. It will wake up kswapd to make enough free > memory for unmovable and reclaimable allocation and kswapd will do. > So before we fully utilize pages on cma reserved memory, kswapd start to > reclaim memory and try to make free memory over the high watermark. This > watermark checking by kswapd doesn't take care free cma pages so many > movable pages would be reclaimed. After then, we have a lot of movable > pages again, so fallback allocation doesn't happen again. To conclude, > amount of free memory on meminfo which includes free CMA pages is moving > around 512 MB if I reserve 512 MB memory for CMA. 
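The watermark arithmetic described above can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions, not kernel code: `struct zone_model`, `watermark_ok()` and the page counts used below are hypothetical, and the check is reduced to the single comparison the description relies on, (free pages - free CMA pages) against the high watermark.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a zone; not the kernel's struct zone. */
struct zone_model {
    long nr_free;     /* all free pages, including free CMA pages */
    long nr_free_cma; /* free pages inside the CMA reserved region */
    long high_wmark;  /* high watermark, in pages */
};

/*
 * Simplified watermark check (an assumption-laden sketch).
 * Requests that may use CMA pages (movable) count every free page;
 * all other requests see only (free pages - free CMA pages).
 */
static bool watermark_ok(const struct zone_model *z, bool can_use_cma)
{
    long usable = z->nr_free;

    assert(z->nr_free >= z->nr_free_cma);
    if (!can_use_cma)
        usable -= z->nr_free_cma;

    return usable > z->high_wmark;
}
```

With, say, 135000 free pages of which 131072 (512 MB of 4 KB pages) are free CMA pages and a high watermark of 4096 pages, `watermark_ok()` passes for a movable request and fails for an unmovable one. That is the state in which kswapd is woken while MemFree still reports roughly 512 MB.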
> > I found this problem on following experiment. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > make -j24 > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.8 361.8 > Average-MemFree: 283880 KB 530851 KB > > To solve this problem, I can think following 2 possible solutions. > 1. allocate the pages on cma reserved memory first, and if they are > exhausted, allocate movable pages. > 2. interleaved allocation: try to allocate specific amounts of memory > from cma reserved memory and then allocate from free movable memory. I love this idea, but when I look at the code, I don't like it. In the allocation path, just try to allocate pages round-robin; that is the allocator's role. If one of the migratetypes is full, just pass the job to the reclaimer with a hint (i.e., "Hey reclaimer, a non-movable allocation failed, so it is pointless to reclaim MIGRATE_CMA pages") so that the reclaimer can filter those pages out during page scanning. We already have a tool to achieve this (i.e., isolate_mode_t). And couldn't we do it in zone_watermark_ok by setting/resetting ALLOC_CMA? If possible, that would be better, because it is the generic function that checks free pages and triggers the reclaim/compaction logic. > > I tested #1 approach and found the problem. Although free memory on > meminfo can move around low watermark, there is large fluctuation on free > memory, because too many pages are reclaimed when kswapd is invoked. > Reason for this behaviour is that successive allocated CMA pages are > on the LRU list in that order and kswapd reclaim them in same order. > These memory doesn't help watermark checking from kwapd, so too many > pages are reclaimed, I guess. > > So, I implement #2 approach. > One thing I should note is that we should not change allocation target > (movable list or cma) on each allocation attempt, since this prevent > allocated pages to be in physically succession, so some I/O devices can > be hurt their performance.
To solve this, I keep allocation target > in at least pageblock_nr_pages attempts and make this number reflect > ratio, free pages without free cma pages to free cma pages. With this > approach, system works very smoothly and fully utilize the pages on > cma reserved memory. > > Following is the experimental result of this patch. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > make -j24 > > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.8 361.8 > Average-MemFree: 283880 KB 530851 KB > pswpin: 7 110064 > pswpout: 452 767502 > > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.2 235.6 > Average-MemFree: 281651 KB 290227 KB > pswpin: 8 8 > pswpout: 430 510 > > There is no difference if we don't have cma reserved memory (0 MB case). > But, with cma reserved memory (512 MB case), we fully utilize these > reserved memory through this patch and the system behaves like as > it doesn't reserve any memory. > > With this patch, we aggressively allocate the pages on cma reserved memory > so latency of CMA can arise. Below is the experimental result about > latency. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > CMA reserve: 512 MB > Backgound Workload: make -jN > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval > > N: 1 4 8 16 > Elapsed-time(Before): 4309.75 9511.09 12276.1 77103.5 > Elapsed-time(After): 5391.69 16114.1 19380.3 34879.2 > > So generally we can see latency increase. Ratio of this increase > is rather big - up to 70%. But, under the heavy workload, it shows > latency decrease - up to 55%. This may be worst-case scenario, but > reducing it would be important for some system, so, I can say that > this patch have advantages and disadvantages in terms of latency. 
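The interleaving policy in the commit message above (keep one allocation target for at least pageblock_nr_pages attempts, and size the two runs by the ratio of non-CMA free pages to free CMA pages) can be sketched in userspace as follows. This is a hedged model that mirrors the recharge branch of the quoted diff; `struct try_counters`, `recharge()` and the `PAGEBLOCK_NR_PAGES` value are assumptions made for illustration, not kernel definitions.

```c
#include <assert.h>

/* Assumed value for illustration; the real pageblock size depends on the configuration. */
#define PAGEBLOCK_NR_PAGES 1024L

/* Hypothetical container for the two per-zone attempt counters. */
struct try_counters {
    long nr_try_movable; /* attempts to serve from the movable free list */
    long nr_try_cma;     /* attempts to serve from the CMA free list */
};

/* Recompute both counters from the current free-page statistics (all in pages). */
static struct try_counters recharge(long free, long free_cma, long high_wmark)
{
    struct try_counters c = { 0, 0 };
    long free_wmark;

    assert(free >= 0 && free_cma >= 0 && free_cma <= free);

    /* No free CMA pages: only the movable list can be used. */
    if (free_cma == 0) {
        c.nr_try_movable = PAGEBLOCK_NR_PAGES;
        return c;
    }

    /* Free non-CMA pages above the high watermark. */
    free_wmark = free - free_cma - high_wmark;

    /* Normal pages are under pressure: use only CMA pages for now. */
    if (free_wmark <= 0) {
        c.nr_try_cma = PAGEBLOCK_NR_PAGES;
        return c;
    }

    /* The smaller pool gets one pageblock of attempts, the larger one proportionally more. */
    if (free_wmark > free_cma) {
        c.nr_try_movable = free_wmark * PAGEBLOCK_NR_PAGES / free_cma;
        c.nr_try_cma = PAGEBLOCK_NR_PAGES;
    } else {
        c.nr_try_movable = PAGEBLOCK_NR_PAGES;
        c.nr_try_cma = free_cma * PAGEBLOCK_NR_PAGES / free_wmark;
    }
    return c;
}
```

For example, with 200000 free pages, 50000 of them in CMA, and a high watermark of 10000 pages, the sketch yields 2867 movable attempts and 1024 CMA attempts, which matches the 140000:50000 ratio between the two pools.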
> > Signed-off-by: Joonsoo Kim > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index fac5509..3ff24d4 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -389,6 +389,12 @@ struct zone { > int compact_order_failed; > #endif > > +#ifdef CONFIG_CMA > + int has_cma; > + int nr_try_cma; > + int nr_try_movable; > +#endif > + > ZONE_PADDING(_pad1_) > > /* Fields commonly accessed by the page reclaim scanner */ > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 674ade7..6f2b27b 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order) > } > > #ifdef CONFIG_CMA > +void __init init_alloc_ratio_counter(struct zone *zone) > +{ > + if (zone->has_cma) > + return; > + > + zone->has_cma = 1; > + zone->nr_try_movable = 0; > + zone->nr_try_cma = 0; > +} > + > /* Free whole pageblock and set its migration type to MIGRATE_CMA. */ > void __init init_cma_reserved_pageblock(struct page *page) > { > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page) > set_pageblock_migratetype(page, MIGRATE_CMA); > __free_pages(page, pageblock_order); > adjust_managed_page_count(page, pageblock_nr_pages); > + init_alloc_ratio_counter(page_zone(page)); > } > #endif > > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > return NULL; > } > > +#ifdef CONFIG_CMA > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order, > + int migratetype) > +{ > + long free, free_cma, free_wmark; > + struct page *page; > + > + if (migratetype != MIGRATE_MOVABLE || !zone->has_cma) > + return NULL; > + > + if (zone->nr_try_movable) > + goto alloc_movable; > + > +alloc_cma: > + if (zone->nr_try_cma) { > + /* Okay. 
Now, we can try to allocate the page from cma region */ > + zone->nr_try_cma--; > + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); > + > + /* CMA pages can vanish through CMA allocation */ > + if (unlikely(!page && order == 0)) > + zone->nr_try_cma = 0; > + > + return page; > + } > + > + /* Reset ratio counter */ > + free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES); > + > + /* No cma free pages, so recharge only movable allocation */ > + if (free_cma <= 0) { > + zone->nr_try_movable = pageblock_nr_pages; > + goto alloc_movable; > + } > + > + free = zone_page_state(zone, NR_FREE_PAGES); > + free_wmark = free - free_cma - high_wmark_pages(zone); > + > + /* > + * free_wmark is below than 0, and it means that normal pages > + * are under the pressure, so we recharge only cma allocation. > + */ > + if (free_wmark <= 0) { > + zone->nr_try_cma = pageblock_nr_pages; > + goto alloc_cma; > + } > + > + if (free_wmark > free_cma) { > + zone->nr_try_movable = > + (free_wmark * pageblock_nr_pages) / free_cma; > + zone->nr_try_cma = pageblock_nr_pages; > + } else { > + zone->nr_try_movable = pageblock_nr_pages; > + zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark; > + } > + > + /* Reset complete, start on movable first */ > +alloc_movable: > + zone->nr_try_movable--; > + return NULL; > +} > +#endif > + > /* > * Do the hard work of removing an element from the buddy allocator. > * Call me with the zone->lock already held. 
> @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > static struct page *__rmqueue(struct zone *zone, unsigned int order, > int migratetype) > { > - struct page *page; > + struct page *page = NULL; > + > + if (IS_ENABLED(CONFIG_CMA)) > + page = __rmqueue_cma(zone, order, migratetype); > > retry_reserve: > - page = __rmqueue_smallest(zone, order, migratetype); > + if (!page) > + page = __rmqueue_smallest(zone, order, migratetype); > > if (unlikely(!page) && migratetype != MIGRATE_RESERVE) { > page = __rmqueue_fallback(zone, order, migratetype); > @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > zone_seqlock_init(zone); > zone->zone_pgdat = pgdat; > zone_pcp_init(zone); > + if (IS_ENABLED(CONFIG_CMA)) > + zone->has_cma = 0; > > /* For bootup, initialized properly in watermark setup */ > mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages); > -- > 1.7.9.5 -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f54.google.com (mail-pa0-f54.google.com [209.85.220.54]) by kanga.kvack.org (Postfix) with ESMTP id 0B4436B0075 for ; Mon, 12 May 2014 23:03:03 -0400 (EDT) Received: by mail-pa0-f54.google.com with SMTP id bj1so8306606pad.41 for ; Mon, 12 May 2014 20:03:03 -0700 (PDT) Received: from lgeamrelo04.lge.com (lgeamrelo04.lge.com.
[156.147.1.127]) by mx.google.com with ESMTP id lw14si11805412pab.148.2014.05.12.20.03.01 for ; Mon, 12 May 2014 20:03:02 -0700 (PDT) Date: Tue, 13 May 2014 12:05:23 +0900 From: Minchan Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140513030523.GD32092@bbox> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <5370FF1D.10707@codeaurora.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5370FF1D.10707@codeaurora.org> Sender: owner-linux-mm@kvack.org List-ID: To: Laura Abbott Cc: Joonsoo Kim , Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Mon, May 12, 2014 at 10:04:29AM -0700, Laura Abbott wrote: > Hi, > > On 5/7/2014 5:32 PM, Joonsoo Kim wrote: > > CMA is introduced to provide physically contiguous pages at runtime. > > For this purpose, it reserves memory at boot time. Although it reserve > > memory, this reserved memory can be used for movable memory allocation > > request. This usecase is beneficial to the system that needs this CMA > > reserved memory infrequently and it is one of main purpose of > > introducing CMA. > > > > But, there is a problem in current implementation. The problem is that > > it works like as just reserved memory approach. The pages on cma reserved > > memory are hardly used for movable memory allocation. This is caused by > > combination of allocation and reclaim policy. > > > > The pages on cma reserved memory are allocated if there is no movable > > memory, that is, as fallback allocation. So the time this fallback > > allocation is started is under heavy memory pressure. Although it is under > > memory pressure, movable allocation easily succeed, since there would be > > many pages on cma reserved memory. 
But this is not the case for unmovable > > and reclaimable allocation, because they can't use the pages on cma > > reserved memory. These allocations regard system's free memory as > > (free pages - free cma pages) on watermark checking, that is, free > > unmovable pages + free reclaimable pages + free movable pages. Because > > we already exhausted movable pages, only free pages we have are unmovable > > and reclaimable types and this would be really small amount. So watermark > > checking would be failed. It will wake up kswapd to make enough free > > memory for unmovable and reclaimable allocation and kswapd will do. > > So before we fully utilize pages on cma reserved memory, kswapd start to > > reclaim memory and try to make free memory over the high watermark. This > > watermark checking by kswapd doesn't take care free cma pages so many > > movable pages would be reclaimed. After then, we have a lot of movable > > pages again, so fallback allocation doesn't happen again. To conclude, > > amount of free memory on meminfo which includes free CMA pages is moving > > around 512 MB if I reserve 512 MB memory for CMA. > > > > I found this problem on following experiment. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > make -j24 > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.8 361.8 > > Average-MemFree: 283880 KB 530851 KB > > > > To solve this problem, I can think following 2 possible solutions. > > 1. allocate the pages on cma reserved memory first, and if they are > > exhausted, allocate movable pages. > > 2. interleaved allocation: try to allocate specific amounts of memory > > from cma reserved memory and then allocate from free movable memory. > > > > I tested #1 approach and found the problem. Although free memory on > > meminfo can move around low watermark, there is large fluctuation on free > > memory, because too many pages are reclaimed when kswapd is invoked. 
> > Reason for this behaviour is that successive allocated CMA pages are > > on the LRU list in that order and kswapd reclaim them in same order. > > These memory doesn't help watermark checking from kwapd, so too many > > pages are reclaimed, I guess. > > > > We have an out of tree implementation of #1 and so far it's worked for us > although we weren't looking at the same metrics. I don't completely > understand the issue you pointed out with #1. It sounds like the issue is > that CMA pages are already in use by other processes and on LRU lists and > because the pages are on LRU lists these aren't counted towards the > watermark by kswapd. Is my understanding correct? Kswapd can reclaim MIGRATE_CMA pages unconditionally even when the allocation that failed was a non-movable one. That is pointless and should be fixed. > > > So, I implement #2 approach. > > One thing I should note is that we should not change allocation target > > (movable list or cma) on each allocation attempt, since this prevent > > allocated pages to be in physically succession, so some I/O devices can > > be hurt their performance. To solve this, I keep allocation target > > in at least pageblock_nr_pages attempts and make this number reflect > > ratio, free pages without free cma pages to free cma pages. With this > > approach, system works very smoothly and fully utilize the pages on > > cma reserved memory. > > > > Following is the experimental result of this patch. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > make -j24 > > > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.8 361.8 > > Average-MemFree: 283880 KB 530851 KB > > pswpin: 7 110064 > > pswpout: 452 767502 > > > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.2 235.6 > > Average-MemFree: 281651 KB 290227 KB > > pswpin: 8 8 > > pswpout: 430 510 > > > > There is no difference if we don't have cma reserved memory (0 MB case).
> > But, with cma reserved memory (512 MB case), we fully utilize these > > reserved memory through this patch and the system behaves like as > > it doesn't reserve any memory. > > What metric are you using to determine all CMA memory was fully used? > We've been checking /proc/pagetypeinfo > > > > > With this patch, we aggressively allocate the pages on cma reserved memory > > so latency of CMA can arise. Below is the experimental result about > > latency. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > CMA reserve: 512 MB > > Backgound Workload: make -jN > > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval > > > > N: 1 4 8 16 > > Elapsed-time(Before): 4309.75 9511.09 12276.1 77103.5 > > Elapsed-time(After): 5391.69 16114.1 19380.3 34879.2 > > > > So generally we can see latency increase. Ratio of this increase > > is rather big - up to 70%. But, under the heavy workload, it shows > > latency decrease - up to 55%. This may be worst-case scenario, but > > reducing it would be important for some system, so, I can say that > > this patch have advantages and disadvantages in terms of latency. > > > > Do you have any statistics related to failed migration from this? Latency > and utilization are issues but so is migration success. In the past we've > found that an increase in CMA utilization was related to increase in CMA > migration failures because pages were unmigratable. The current > workaround for this is limiting CMA pages to be used for user processes > only and not the file cache. Both of these have their own problems. If Joonsoo's patch makes the failure ratio higher, that would be okay with me, because we would get more reports about it and have a chance to fix it. That is better than hiding CMA's problems with some hack.
> > > Signed-off-by: Joonsoo Kim > > > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > > index fac5509..3ff24d4 100644 > > --- a/include/linux/mmzone.h > > +++ b/include/linux/mmzone.h > > @@ -389,6 +389,12 @@ struct zone { > > int compact_order_failed; > > #endif > > > > +#ifdef CONFIG_CMA > > + int has_cma; > > + int nr_try_cma; > > + int nr_try_movable; > > +#endif > > + > > ZONE_PADDING(_pad1_) > > > > /* Fields commonly accessed by the page reclaim scanner */ > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 674ade7..6f2b27b 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order) > > } > > > > #ifdef CONFIG_CMA > > +void __init init_alloc_ratio_counter(struct zone *zone) > > +{ > > + if (zone->has_cma) > > + return; > > + > > + zone->has_cma = 1; > > + zone->nr_try_movable = 0; > > + zone->nr_try_cma = 0; > > +} > > + > > /* Free whole pageblock and set its migration type to MIGRATE_CMA. */ > > void __init init_cma_reserved_pageblock(struct page *page) > > { > > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page) > > set_pageblock_migratetype(page, MIGRATE_CMA); > > __free_pages(page, pageblock_order); > > adjust_managed_page_count(page, pageblock_nr_pages); > > + init_alloc_ratio_counter(page_zone(page)); > > } > > #endif > > > > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > > return NULL; > > } > > > > +#ifdef CONFIG_CMA > > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order, > > + int migratetype) > > +{ > > + long free, free_cma, free_wmark; > > + struct page *page; > > + > > + if (migratetype != MIGRATE_MOVABLE || !zone->has_cma) > > + return NULL; > > + > > + if (zone->nr_try_movable) > > + goto alloc_movable; > > + > > +alloc_cma: > > + if (zone->nr_try_cma) { > > + /* Okay. 
Now, we can try to allocate the page from cma region */ > > + zone->nr_try_cma--; > > + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); > > + > > + /* CMA pages can vanish through CMA allocation */ > > + if (unlikely(!page && order == 0)) > > + zone->nr_try_cma = 0; > > + > > + return page; > > + } > > + > > + /* Reset ratio counter */ > > + free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES); > > + > > + /* No cma free pages, so recharge only movable allocation */ > > + if (free_cma <= 0) { > > + zone->nr_try_movable = pageblock_nr_pages; > > + goto alloc_movable; > > + } > > + > > + free = zone_page_state(zone, NR_FREE_PAGES); > > + free_wmark = free - free_cma - high_wmark_pages(zone); > > + > > + /* > > + * free_wmark is below than 0, and it means that normal pages > > + * are under the pressure, so we recharge only cma allocation. > > + */ > > + if (free_wmark <= 0) { > > + zone->nr_try_cma = pageblock_nr_pages; > > + goto alloc_cma; > > + } > > + > > + if (free_wmark > free_cma) { > > + zone->nr_try_movable = > > + (free_wmark * pageblock_nr_pages) / free_cma; > > + zone->nr_try_cma = pageblock_nr_pages; > > + } else { > > + zone->nr_try_movable = pageblock_nr_pages; > > + zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark; > > + } > > + > > + /* Reset complete, start on movable first */ > > +alloc_movable: > > + zone->nr_try_movable--; > > + return NULL; > > +} > > +#endif > > + > > /* > > * Do the hard work of removing an element from the buddy allocator. > > * Call me with the zone->lock already held. 
> > @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > > static struct page *__rmqueue(struct zone *zone, unsigned int order, > > int migratetype) > > { > > - struct page *page; > > + struct page *page = NULL; > > + > > + if (IS_ENABLED(CONFIG_CMA)) > > + page = __rmqueue_cma(zone, order, migratetype); > > > > retry_reserve: > > - page = __rmqueue_smallest(zone, order, migratetype); > > + if (!page) > > + page = __rmqueue_smallest(zone, order, migratetype); > > > > if (unlikely(!page) && migratetype != MIGRATE_RESERVE) { > > page = __rmqueue_fallback(zone, order, migratetype); > > @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > > zone_seqlock_init(zone); > > zone->zone_pgdat = pgdat; > > zone_pcp_init(zone); > > + if (IS_ENABLED(CONFIG_CMA)) > > + zone->has_cma = 0; > > > > /* For bootup, initialized properly in watermark setup */ > > mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages); > > > > I'm going to see about running this through tests internally for comparison. > Hopefully I'll get useful results in a day or so. > > Thanks, > Laura > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, > hosted by The Linux Foundation > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f48.google.com (mail-pa0-f48.google.com [209.85.220.48]) by kanga.kvack.org (Postfix) with ESMTP id C5F8A6B0036 for ; Wed, 14 May 2014 04:42:36 -0400 (EDT) Received: by mail-pa0-f48.google.com with SMTP id rd3so1410437pab.21 for ; Wed, 14 May 2014 01:42:36 -0700 (PDT) Received: from e23smtp01.au.ibm.com (e23smtp01.au.ibm.com. [202.81.31.143]) by mx.google.com with ESMTPS id iv2si613990pbd.254.2014.05.14.01.42.33 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 14 May 2014 01:42:35 -0700 (PDT) Received: from /spool/local by e23smtp01.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 14 May 2014 18:42:31 +1000 Received: from d23relay03.au.ibm.com (d23relay03.au.ibm.com [9.190.235.21]) by d23dlp03.au.ibm.com (Postfix) with ESMTP id 439A43578052 for ; Wed, 14 May 2014 18:42:26 +1000 (EST) Received: from d23av03.au.ibm.com (d23av03.au.ibm.com [9.190.234.97]) by d23relay03.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id s4E8gA3665994774 for ; Wed, 14 May 2014 18:42:11 +1000 Received: from d23av03.au.ibm.com (localhost [127.0.0.1]) by d23av03.au.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s4E8gOe9020125 for ; Wed, 14 May 2014 18:42:24 +1000 From: "Aneesh Kumar K.V" Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used In-Reply-To: <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> Date: Wed, 14 May 2014 14:12:19 +0530 Message-ID: <8761l8ah04.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Andrew Morton Cc: Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , 
linux-mm@kvack.org, linux-kernel@vger.kernel.org Joonsoo Kim writes: > CMA is introduced to provide physically contiguous pages at runtime. > For this purpose, it reserves memory at boot time. Although it reserve > memory, this reserved memory can be used for movable memory allocation > request. This usecase is beneficial to the system that needs this CMA > reserved memory infrequently and it is one of main purpose of > introducing CMA. > > But, there is a problem in current implementation. The problem is that > it works like as just reserved memory approach. The pages on cma reserved > memory are hardly used for movable memory allocation. This is caused by > combination of allocation and reclaim policy. > > The pages on cma reserved memory are allocated if there is no movable > memory, that is, as fallback allocation. So the time this fallback > allocation is started is under heavy memory pressure. Although it is under > memory pressure, movable allocation easily succeed, since there would be > many pages on cma reserved memory. But this is not the case for unmovable > and reclaimable allocation, because they can't use the pages on cma > reserved memory. These allocations regard system's free memory as > (free pages - free cma pages) on watermark checking, that is, free > unmovable pages + free reclaimable pages + free movable pages. Because > we already exhausted movable pages, only free pages we have are unmovable > and reclaimable types and this would be really small amount. So watermark > checking would be failed. It will wake up kswapd to make enough free > memory for unmovable and reclaimable allocation and kswapd will do. > So before we fully utilize pages on cma reserved memory, kswapd start to > reclaim memory and try to make free memory over the high watermark. This > watermark checking by kswapd doesn't take care free cma pages so many > movable pages would be reclaimed. 
After then, we have a lot of movable > pages again, so fallback allocation doesn't happen again. To conclude, > amount of free memory on meminfo which includes free CMA pages is moving > around 512 MB if I reserve 512 MB memory for CMA. Another issue I am facing with the current code is atomic allocations failing even with a large number of CMA pages around. In my case we never reclaimed, because a large part of the memory is consumed by the page cache and for that, free memory check doesn't include at free_cma. I will test with this patchset and update here once I have the results. > > I found this problem on following experiment. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > make -j24 > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.8 361.8 > Average-MemFree: 283880 KB 530851 KB > > To solve this problem, I can think following 2 possible solutions. > 1. allocate the pages on cma reserved memory first, and if they are > exhausted, allocate movable pages. > 2. interleaved allocation: try to allocate specific amounts of memory > from cma reserved memory and then allocate from free movable memory. > > I tested #1 approach and found the problem. Although free memory on > meminfo can move around low watermark, there is large fluctuation on free > memory, because too many pages are reclaimed when kswapd is invoked. > Reason for this behaviour is that successive allocated CMA pages are > on the LRU list in that order and kswapd reclaim them in same order. > These memory doesn't help watermark checking from kwapd, so too many > pages are reclaimed, I guess. > > So, I implement #2 approach. > One thing I should note is that we should not change allocation target > (movable list or cma) on each allocation attempt, since this prevent > allocated pages to be in physically succession, so some I/O devices can > be hurt their performance.
To solve this, I keep allocation target > in at least pageblock_nr_pages attempts and make this number reflect > ratio, free pages without free cma pages to free cma pages. With this > approach, system works very smoothly and fully utilize the pages on > cma reserved memory. > > Following is the experimental result of this patch. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > make -j24 > > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.8 361.8 > Average-MemFree: 283880 KB 530851 KB > pswpin: 7 110064 > pswpout: 452 767502 > > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.2 235.6 > Average-MemFree: 281651 KB 290227 KB > pswpin: 8 8 > pswpout: 430 510 > > There is no difference if we don't have cma reserved memory (0 MB case). > But, with cma reserved memory (512 MB case), we fully utilize these > reserved memory through this patch and the system behaves like as > it doesn't reserve any memory. > > With this patch, we aggressively allocate the pages on cma reserved memory > so latency of CMA can arise. Below is the experimental result about > latency. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > CMA reserve: 512 MB > Backgound Workload: make -jN > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval > > N: 1 4 8 16 > Elapsed-time(Before): 4309.75 9511.09 12276.1 77103.5 > Elapsed-time(After): 5391.69 16114.1 19380.3 34879.2 > > So generally we can see latency increase. Ratio of this increase > is rather big - up to 70%. But, under the heavy workload, it shows > latency decrease - up to 55%. This may be worst-case scenario, but > reducing it would be important for some system, so, I can say that > this patch have advantages and disadvantages in terms of latency. 
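The "up to 70%" and "up to 55%" figures can be checked against the Elapsed-time rows quoted above. A minimal sketch, assuming a hypothetical helper `percent_change()` that is not part of the patch:

```c
#include <assert.h>

/* Relative change from 'before' to 'after', in percent (negative means a decrease). */
static double percent_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

Applied to the N=4 column (9511.09 before, 16114.1 after) this gives an increase of roughly 69%, and applied to the N=16 column (77103.5 before, 34879.2 after) a decrease of roughly 55%, which is consistent with the numbers stated in the commit message.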
> > Signed-off-by: Joonsoo Kim > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index fac5509..3ff24d4 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -389,6 +389,12 @@ struct zone { > int compact_order_failed; > #endif > > +#ifdef CONFIG_CMA > + int has_cma; > + int nr_try_cma; > + int nr_try_movable; > +#endif Can you write documentation around this ? > + > ZONE_PADDING(_pad1_) > > /* Fields commonly accessed by the page reclaim scanner */ > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 674ade7..6f2b27b 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order) > } > > #ifdef CONFIG_CMA > +void __init init_alloc_ratio_counter(struct zone *zone) > +{ > + if (zone->has_cma) > + return; > + > + zone->has_cma = 1; > + zone->nr_try_movable = 0; > + zone->nr_try_cma = 0; > +} > + > /* Free whole pageblock and set its migration type to MIGRATE_CMA. */ > void __init init_cma_reserved_pageblock(struct page *page) > { > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page) > set_pageblock_migratetype(page, MIGRATE_CMA); > __free_pages(page, pageblock_order); > adjust_managed_page_count(page, pageblock_nr_pages); > + init_alloc_ratio_counter(page_zone(page)); > } > #endif > > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > return NULL; > } > > +#ifdef CONFIG_CMA > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order, > + int migratetype) > +{ > + long free, free_cma, free_wmark; > + struct page *page; > + > + if (migratetype != MIGRATE_MOVABLE || !zone->has_cma) > + return NULL; > + > + if (zone->nr_try_movable) > + goto alloc_movable; > + > +alloc_cma: > + if (zone->nr_try_cma) { > + /* Okay. 
Now, we can try to allocate the page from cma region */ > + zone->nr_try_cma--; > + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); > + > + /* CMA pages can vanish through CMA allocation */ > + if (unlikely(!page && order == 0)) > + zone->nr_try_cma = 0; > + > + return page; > + } > + > + /* Reset ratio counter */ > + free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES); > + > + /* No cma free pages, so recharge only movable allocation */ > + if (free_cma <= 0) { > + zone->nr_try_movable = pageblock_nr_pages; > + goto alloc_movable; > + } > + > + free = zone_page_state(zone, NR_FREE_PAGES); > + free_wmark = free - free_cma - high_wmark_pages(zone); > + > + /* > + * free_wmark is below than 0, and it means that normal pages > + * are under the pressure, so we recharge only cma allocation. > + */ > + if (free_wmark <= 0) { > + zone->nr_try_cma = pageblock_nr_pages; > + goto alloc_cma; > + } > + > + if (free_wmark > free_cma) { > + zone->nr_try_movable = > + (free_wmark * pageblock_nr_pages) / free_cma; > + zone->nr_try_cma = pageblock_nr_pages; > + } else { > + zone->nr_try_movable = pageblock_nr_pages; > + zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark; > + } Can you add the commit message documentation here. > + > + /* Reset complete, start on movable first */ > +alloc_movable: > + zone->nr_try_movable--; > + return NULL; > +} > +#endif > + > /* > * Do the hard work of removing an element from the buddy allocator. > * Call me with the zone->lock already held. 
> @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > static struct page *__rmqueue(struct zone *zone, unsigned int order, > int migratetype) > { > - struct page *page; > + struct page *page = NULL; > + > + if (IS_ENABLED(CONFIG_CMA)) > + page = __rmqueue_cma(zone, order, migratetype); It would be better to move the migratetype check here, so that it becomes /* For migrate movable allocation try cma area first */ if (IS_ENABLED(CONFIG_CMA) && (migratetype == MIGRATE_MOVABLE)) > > retry_reserve: > - page = __rmqueue_smallest(zone, order, migratetype); > + if (!page) > + page = __rmqueue_smallest(zone, order, migratetype); > > if (unlikely(!page) && migratetype != MIGRATE_RESERVE) { > page = __rmqueue_fallback(zone, order, migratetype); > @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > zone_seqlock_init(zone); > zone->zone_pgdat = pgdat; > zone_pcp_init(zone); > + if (IS_ENABLED(CONFIG_CMA)) > + zone->has_cma = 0; > > /* For bootup, initialized properly in watermark setup */ > mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages); > -- > 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f50.google.com (mail-pa0-f50.google.com [209.85.220.50]) by kanga.kvack.org (Postfix) with ESMTP id F3DE86B0036 for ; Wed, 14 May 2014 05:44:39 -0400 (EDT) Received: by mail-pa0-f50.google.com with SMTP id fb1so1482584pad.9 for ; Wed, 14 May 2014 02:44:39 -0700 (PDT) Received: from e28smtp05.in.ibm.com (e28smtp05.in.ibm.com.
[122.248.162.5]) by mx.google.com with ESMTPS id ud10si696337pbc.159.2014.05.14.02.44.37 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 14 May 2014 02:44:39 -0700 (PDT) Received: from /spool/local by e28smtp05.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 14 May 2014 15:14:36 +0530 Received: from d28relay01.in.ibm.com (d28relay01.in.ibm.com [9.184.220.58]) by d28dlp02.in.ibm.com (Postfix) with ESMTP id 210FE394005E for ; Wed, 14 May 2014 15:14:33 +0530 (IST) Received: from d28av04.in.ibm.com (d28av04.in.ibm.com [9.184.220.66]) by d28relay01.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id s4E9il6i58851530 for ; Wed, 14 May 2014 15:14:47 +0530 Received: from d28av04.in.ibm.com (localhost [127.0.0.1]) by d28av04.in.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s4E9iUMA017916 for ; Wed, 14 May 2014 15:14:31 +0530 From: "Aneesh Kumar K.V" Subject: Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory In-Reply-To: <20140513022603.GF23803@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <536CCC78.6050806@samsung.com> <20140513022603.GF23803@js1304-P5Q-DELUXE> Date: Wed, 14 May 2014 15:14:30 +0530 Message-ID: <8738gcae4h.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Marek Szyprowski Cc: Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan Kim , Heesub Shin , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kyungmin Park , Bartlomiej Zolnierkiewicz , 'Tomasz Stanislawski' Joonsoo Kim writes: > On Fri, May 09, 2014 at 02:39:20PM +0200, Marek Szyprowski wrote: >> Hello, >> >> On 2014-05-08 02:32, Joonsoo Kim wrote: >> >This series tries to improve CMA. >> > >> >CMA is introduced to provide physically contiguous pages at runtime >> >without reserving memory area. 
But, current implementation works like as >> >reserving memory approach, because allocation on cma reserved region only >> >occurs as fallback of migrate_movable allocation. We can allocate from it >> >when there is no movable page. In that situation, kswapd would be invoked >> >easily since unmovable and reclaimable allocation consider >> >(free pages - free CMA pages) as free memory on the system and free memory >> >may be lower than high watermark in that case. If kswapd start to reclaim >> >memory, then fallback allocation doesn't occur much. >> > >> >In my experiment, I found that if system memory has 1024 MB memory and >> >has 512 MB reserved memory for CMA, kswapd is mostly invoked around >> >the 512MB free memory boundary. And invoked kswapd tries to make free >> >memory until (free pages - free CMA pages) is higher than high watermark, >> >so free memory on meminfo is moving around 512MB boundary consistently. >> > >> >To fix this problem, we should allocate the pages on cma reserved memory >> >more aggressively and intelligenetly. Patch 2 implements the solution. >> >Patch 1 is the simple optimization which remove useless re-trial and patch 3 >> >is for removing useless alloc flag, so these are not important. >> >See patch 2 for more detailed description. >> > >> >This patchset is based on v3.15-rc4. >> >> Thanks for posting those patches. It basically reminds me the >> following discussion: >> http://thread.gmane.org/gmane.linux.kernel/1391989/focus=1399524 >> >> Your approach is basically the same. I hope that your patches can be >> improved >> in such a way that they will be accepted by mm maintainers. I only >> wonder if the >> third patch is really necessary. Without it kswapd wakeup might be >> still avoided >> in some cases. > > Hello, > > Oh... I didn't know that patch and discussion, because I have no interest > on CMA at that time. 
Your approach looks similar to #1 > approach of mine and could have same problem of #1 approach which I mentioned > in patch 2/3. Please refer that patch description. :) IIUC that patch also interleaves, right? +#ifdef CONFIG_CMA + unsigned long nr_free = zone_page_state(zone, NR_FREE_PAGES); + unsigned long nr_cma_free = zone_page_state(zone, NR_FREE_CMA_PAGES); + + if (migratetype == MIGRATE_MOVABLE && nr_cma_free && + nr_free - nr_cma_free < 2 * low_wmark_pages(zone)) + migratetype = MIGRATE_CMA; +#endif /* CONFIG_CMA */ That doesn't always prefer the CMA region. It would be nice to understand why grouping in pageblock_nr_pages is beneficial. Also, in your patch you decrement nr_try_cma for every 'order' allocation. Why? + if (zone->nr_try_cma) { + /* Okay. Now, we can try to allocate the page from cma region */ + zone->nr_try_cma--; + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); + + /* CMA pages can vanish through CMA allocation */ + if (unlikely(!page && order == 0)) + zone->nr_try_cma = 0; + + return page; + } If the MIGRATE_CMA allocation above fails, should we return failure? Why not try a MOVABLE allocation on failure (i.e. fall through the code path)? > And, there is different purpose between this and yours. This patch is > intended to better use of CMA pages and so get maximum performance. > Just to not trigger oom, it can be possible to put this logic on reclaim path. > But that is sub-optimal to get higher performance, because it needs > migration in some cases. > > If second patch works as intended, there are just a few of cma free pages > when we are toward on the watermark. So benefit of third patch would > be marginal and we can remove ALLOC_CMA. > > Thanks. > -aneesh
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f42.google.com (mail-pa0-f42.google.com [209.85.220.42]) by kanga.kvack.org (Postfix) with ESMTP id 15D4C6B0036 for ; Wed, 14 May 2014 21:50:50 -0400 (EDT) Received: by mail-pa0-f42.google.com with SMTP id rd3so374124pab.15 for ; Wed, 14 May 2014 18:50:49 -0700 (PDT) Received: from lgemrelse6q.lge.com (LGEMRELSE6Q.lge.com. [156.147.1.121]) by mx.google.com with ESMTP id vb6si3701545pac.58.2014.05.14.18.50.47 for ; Wed, 14 May 2014 18:50:49 -0700 (PDT) Date: Thu, 15 May 2014 10:53:01 +0900 From: Joonsoo Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140515015301.GA10116@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140513030057.GC32092@bbox> Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > Hey Joonsoo, > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > CMA is introduced to provide physically contiguous pages at runtime. > > For this purpose, it reserves memory at boot time. Although it reserve > > memory, this reserved memory can be used for movable memory allocation > > request. This usecase is beneficial to the system that needs this CMA > > reserved memory infrequently and it is one of main purpose of > > introducing CMA. > > > > But, there is a problem in current implementation. The problem is that > > it works like as just reserved memory approach.
The pages on cma reserved > > memory are hardly used for movable memory allocation. This is caused by > > combination of allocation and reclaim policy. > > > > The pages on cma reserved memory are allocated if there is no movable > > memory, that is, as fallback allocation. So the time this fallback > > allocation is started is under heavy memory pressure. Although it is under > > memory pressure, movable allocation easily succeed, since there would be > > many pages on cma reserved memory. But this is not the case for unmovable > > and reclaimable allocation, because they can't use the pages on cma > > reserved memory. These allocations regard system's free memory as > > (free pages - free cma pages) on watermark checking, that is, free > > unmovable pages + free reclaimable pages + free movable pages. Because > > we already exhausted movable pages, only free pages we have are unmovable > > and reclaimable types and this would be really small amount. So watermark > > checking would be failed. It will wake up kswapd to make enough free > > memory for unmovable and reclaimable allocation and kswapd will do. > > So before we fully utilize pages on cma reserved memory, kswapd start to > > reclaim memory and try to make free memory over the high watermark. This > > watermark checking by kswapd doesn't take care free cma pages so many > > movable pages would be reclaimed. After then, we have a lot of movable > > pages again, so fallback allocation doesn't happen again. To conclude, > > amount of free memory on meminfo which includes free CMA pages is moving > > around 512 MB if I reserve 512 MB memory for CMA. > > > > I found this problem on following experiment. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > make -j24 > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.8 361.8 > > Average-MemFree: 283880 KB 530851 KB > > > > To solve this problem, I can think following 2 possible solutions. > > 1. 
allocate the pages on cma reserved memory first, and if they are > > exhausted, allocate movable pages. > > 2. interleaved allocation: try to allocate specific amounts of memory > > from cma reserved memory and then allocate from free movable memory. > > I love this idea but when I see the code, I don't like that. > In allocation path, just try to allocate pages by round-robin so it's role > of allocator. If one of migratetype is full, just pass mission to reclaimer > with hint(ie, Hey reclaimer, it's non-movable allocation fail > so there is pointless if you reclaim MIGRATE_CMA pages) so that > reclaimer can filter it out during page scanning. > We already have an tool to achieve it(ie, isolate_mode_t). Hello, I agree with keeping the fast allocation path as simple as possible. I will remove the runtime computation that determines the ratio in __rmqueue_cma() and will instead use a pre-computed value calculated on another path. I am not sure whether your second suggestion (the "Hey reclaimer" part) is good or not. My quick thought is that it could help when many free cma pages remain, but it would not help when there are neither free movable pages nor free cma pages. In general, most workloads mainly use movable pages, for page cache or anonymous mappings. Although reclaim is triggered by a non-movable allocation failure, the reclaimed pages are mostly used by movable allocations. We can handle these allocation requests even if we reclaim pages simply in LRU order. If we rotate the LRU list to find movable pages, it could cause more useful pages to be evicted. This is just my quick thought, so please correct me if I am wrong. > > And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA? > If possible, it would be better becauser it's generic function to check > free pages and cause trigger reclaim/compaction logic. I guess your *it* means the ratio computation, right? I don't like putting it in zone_watermark_ok().
Although it needs to refer to the free cma pages value, which is also referenced in zone_watermark_ok(), this computation is for determining the ratio, not for triggering reclaim/compaction. Also, zone_watermark_ok() is on a hotter path, so putting this logic into zone_watermark_ok() does not look better to me. I will think about a better place to do it. Thanks. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f45.google.com (mail-pb0-f45.google.com [209.85.160.45]) by kanga.kvack.org (Postfix) with ESMTP id 6A4C16B0036 for ; Wed, 14 May 2014 21:56:31 -0400 (EDT) Received: by mail-pb0-f45.google.com with SMTP id um1so385681pbc.4 for ; Wed, 14 May 2014 18:56:31 -0700 (PDT) Received: from lgemrelse6q.lge.com (LGEMRELSE6Q.lge.com. [156.147.1.121]) by mx.google.com with ESMTP id hf2si3672089pac.235.2014.05.14.18.56.28 for ; Wed, 14 May 2014 18:56:30 -0700 (PDT) Date: Thu, 15 May 2014 10:58:42 +0900 From: Joonsoo Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140515015842.GB10116@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <8761l8ah04.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8761l8ah04.fsf@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Wed, May 14, 2014 at 02:12:19PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > CMA is introduced to provide physically contiguous pages at
runtime. > > For this purpose, it reserves memory at boot time. Although it reserve > > memory, this reserved memory can be used for movable memory allocation > > request. This usecase is beneficial to the system that needs this CMA > > reserved memory infrequently and it is one of main purpose of > > introducing CMA. > > > > But, there is a problem in current implementation. The problem is that > > it works like as just reserved memory approach. The pages on cma reserved > > memory are hardly used for movable memory allocation. This is caused by > > combination of allocation and reclaim policy. > > > > The pages on cma reserved memory are allocated if there is no movable > > memory, that is, as fallback allocation. So the time this fallback > > allocation is started is under heavy memory pressure. Although it is under > > memory pressure, movable allocation easily succeed, since there would be > > many pages on cma reserved memory. But this is not the case for unmovable > > and reclaimable allocation, because they can't use the pages on cma > > reserved memory. These allocations regard system's free memory as > > (free pages - free cma pages) on watermark checking, that is, free > > unmovable pages + free reclaimable pages + free movable pages. Because > > we already exhausted movable pages, only free pages we have are unmovable > > and reclaimable types and this would be really small amount. So watermark > > checking would be failed. It will wake up kswapd to make enough free > > memory for unmovable and reclaimable allocation and kswapd will do. > > So before we fully utilize pages on cma reserved memory, kswapd start to > > reclaim memory and try to make free memory over the high watermark. This > > watermark checking by kswapd doesn't take care free cma pages so many > > movable pages would be reclaimed. After then, we have a lot of movable > > pages again, so fallback allocation doesn't happen again. 
To conclude, > > amount of free memory on meminfo which includes free CMA pages is moving > > around 512 MB if I reserve 512 MB memory for CMA. > > > Another issue i am facing with the current code is the atomic allocation > failing even with large number of CMA pages around. In my case we never > reclaimed because large part of the memory is consumed by the page cache and > for that, free memory check doesn't include at free_cma. I will test > with this patchset and update here once i have the results. > Hello, Could you elaborate more on your issue? I can't completely understand your problem. So your atomic allocation is movable? And although there are many free cma pages, that request is fail? > > > > I found this problem on following experiment. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > make -j24 > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.8 361.8 > > Average-MemFree: 283880 KB 530851 KB > > > > To solve this problem, I can think following 2 possible solutions. > > 1. allocate the pages on cma reserved memory first, and if they are > > exhausted, allocate movable pages. > > 2. interleaved allocation: try to allocate specific amounts of memory > > from cma reserved memory and then allocate from free movable memory. > > > > I tested #1 approach and found the problem. Although free memory on > > meminfo can move around low watermark, there is large fluctuation on free > > memory, because too many pages are reclaimed when kswapd is invoked. > > Reason for this behaviour is that successive allocated CMA pages are > > on the LRU list in that order and kswapd reclaim them in same order. > > These memory doesn't help watermark checking from kwapd, so too many > > pages are reclaimed, I guess. > > > > So, I implement #2 approach. 
> > One thing I should note is that we should not change allocation target > > (movable list or cma) on each allocation attempt, since this prevent > > allocated pages to be in physically succession, so some I/O devices can > > be hurt their performance. To solve this, I keep allocation target > > in at least pageblock_nr_pages attempts and make this number reflect > > ratio, free pages without free cma pages to free cma pages. With this > > approach, system works very smoothly and fully utilize the pages on > > cma reserved memory. > > > > Following is the experimental result of this patch. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > make -j24 > > > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.8 361.8 > > Average-MemFree: 283880 KB 530851 KB > > pswpin: 7 110064 > > pswpout: 452 767502 > > > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.2 235.6 > > Average-MemFree: 281651 KB 290227 KB > > pswpin: 8 8 > > pswpout: 430 510 > > > > There is no difference if we don't have cma reserved memory (0 MB case). > > But, with cma reserved memory (512 MB case), we fully utilize these > > reserved memory through this patch and the system behaves like as > > it doesn't reserve any memory. > > > > With this patch, we aggressively allocate the pages on cma reserved memory > > so latency of CMA can arise. Below is the experimental result about > > latency. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > CMA reserve: 512 MB > > Backgound Workload: make -jN > > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval > > > > N: 1 4 8 16 > > Elapsed-time(Before): 4309.75 9511.09 12276.1 77103.5 > > Elapsed-time(After): 5391.69 16114.1 19380.3 34879.2 > > > > So generally we can see latency increase. Ratio of this increase > > is rather big - up to 70%. But, under the heavy workload, it shows > > latency decrease - up to 55%. 
This may be worst-case scenario, but > > reducing it would be important for some system, so, I can say that > > this patch have advantages and disadvantages in terms of latency. > > > > Signed-off-by: Joonsoo Kim > > > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > > index fac5509..3ff24d4 100644 > > --- a/include/linux/mmzone.h > > +++ b/include/linux/mmzone.h > > @@ -389,6 +389,12 @@ struct zone { > > int compact_order_failed; > > #endif > > > > +#ifdef CONFIG_CMA > > + int has_cma; > > + int nr_try_cma; > > + int nr_try_movable; > > +#endif > > > Can you write documentation around this ? > Okay. > > + > > ZONE_PADDING(_pad1_) > > > > /* Fields commonly accessed by the page reclaim scanner */ > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 674ade7..6f2b27b 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order) > > } > > > > #ifdef CONFIG_CMA > > +void __init init_alloc_ratio_counter(struct zone *zone) > > +{ > > + if (zone->has_cma) > > + return; > > + > > + zone->has_cma = 1; > > + zone->nr_try_movable = 0; > > + zone->nr_try_cma = 0; > > +} > > + > > /* Free whole pageblock and set its migration type to MIGRATE_CMA. 
*/ > > void __init init_cma_reserved_pageblock(struct page *page) > > { > > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page) > > set_pageblock_migratetype(page, MIGRATE_CMA); > > __free_pages(page, pageblock_order); > > adjust_managed_page_count(page, pageblock_nr_pages); > > + init_alloc_ratio_counter(page_zone(page)); > > } > > #endif > > > > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > > return NULL; > > } > > > > +#ifdef CONFIG_CMA > > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order, > > + int migratetype) > > +{ > > + long free, free_cma, free_wmark; > > + struct page *page; > > + > > + if (migratetype != MIGRATE_MOVABLE || !zone->has_cma) > > + return NULL; > > + > > + if (zone->nr_try_movable) > > + goto alloc_movable; > > + > > +alloc_cma: > > + if (zone->nr_try_cma) { > > + /* Okay. Now, we can try to allocate the page from cma region */ > > + zone->nr_try_cma--; > > + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); > > + > > + /* CMA pages can vanish through CMA allocation */ > > + if (unlikely(!page && order == 0)) > > + zone->nr_try_cma = 0; > > + > > + return page; > > + } > > + > > + /* Reset ratio counter */ > > + free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES); > > + > > + /* No cma free pages, so recharge only movable allocation */ > > + if (free_cma <= 0) { > > + zone->nr_try_movable = pageblock_nr_pages; > > + goto alloc_movable; > > + } > > + > > + free = zone_page_state(zone, NR_FREE_PAGES); > > + free_wmark = free - free_cma - high_wmark_pages(zone); > > + > > + /* > > + * free_wmark is below than 0, and it means that normal pages > > + * are under the pressure, so we recharge only cma allocation. 
> > + */ > > + if (free_wmark <= 0) { > > + zone->nr_try_cma = pageblock_nr_pages; > > + goto alloc_cma; > > + } > > + > > + if (free_wmark > free_cma) { > > + zone->nr_try_movable = > > + (free_wmark * pageblock_nr_pages) / free_cma; > > + zone->nr_try_cma = pageblock_nr_pages; > > + } else { > > + zone->nr_try_movable = pageblock_nr_pages; > > + zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark; > > + } > > Can you add the commit message documentation here. > Okay. > > + > > + /* Reset complete, start on movable first */ > > +alloc_movable: > > + zone->nr_try_movable--; > > + return NULL; > > +} > > +#endif > > + > > /* > > * Do the hard work of removing an element from the buddy allocator. > > * Call me with the zone->lock already held. > > @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > > static struct page *__rmqueue(struct zone *zone, unsigned int order, > > int migratetype) > > { > > - struct page *page; > > + struct page *page = NULL; > > + > > + if (IS_ENABLED(CONFIG_CMA)) > > + page = __rmqueue_cma(zone, order, migratetype); > > It would be better to move the migrate check here, So that it becomes > > /* For migrate movable allocation try cma area first */ > if (IS_ENABLED(CONFIG_CMA) && (migratetype == MIGRATE_MOVABLE)) > > Okay. But it makes no difference between current code and your suggestion, because __rmqueue_cma would be inlined by compiler optimization. Thanks.
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f45.google.com (mail-pa0-f45.google.com [209.85.220.45]) by kanga.kvack.org (Postfix) with ESMTP id 5BE616B0036 for ; Wed, 14 May 2014 22:08:43 -0400 (EDT) Received: by mail-pa0-f45.google.com with SMTP id ey11so392978pad.18 for ; Wed, 14 May 2014 19:08:43 -0700 (PDT) Received: from lgemrelse7q.lge.com (LGEMRELSE7Q.lge.com. [156.147.1.151]) by mx.google.com with ESMTP id pt4si3698829pac.241.2014.05.14.19.08.41 for ; Wed, 14 May 2014 19:08:42 -0700 (PDT) Date: Thu, 15 May 2014 11:10:55 +0900 From: Joonsoo Kim Subject: Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Message-ID: <20140515021055.GC10116@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <536CCC78.6050806@samsung.com> <20140513022603.GF23803@js1304-P5Q-DELUXE> <8738gcae4h.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8738gcae4h.fsf@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: Marek Szyprowski , Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan Kim , Heesub Shin , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kyungmin Park , Bartlomiej Zolnierkiewicz , 'Tomasz Stanislawski' On Wed, May 14, 2014 at 03:14:30PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > On Fri, May 09, 2014 at 02:39:20PM +0200, Marek Szyprowski wrote: > >> Hello, > >> > >> On 2014-05-08 02:32, Joonsoo Kim wrote: > >> >This series tries to improve CMA. > >> > > >> >CMA is introduced to provide physically contiguous pages at runtime > >> >without reserving memory area. But, current implementation works like as > >> >reserving memory approach, because allocation on cma reserved region only > >> >occurs as fallback of migrate_movable allocation.
We can allocate from it > >> >when there is no movable page. In that situation, kswapd would be invoked > >> >easily since unmovable and reclaimable allocation consider > >> >(free pages - free CMA pages) as free memory on the system and free memory > >> >may be lower than high watermark in that case. If kswapd start to reclaim > >> >memory, then fallback allocation doesn't occur much. > >> > > >> >In my experiment, I found that if system memory has 1024 MB memory and > >> >has 512 MB reserved memory for CMA, kswapd is mostly invoked around > >> >the 512MB free memory boundary. And invoked kswapd tries to make free > >> >memory until (free pages - free CMA pages) is higher than high watermark, > >> >so free memory on meminfo is moving around 512MB boundary consistently. > >> > > >> >To fix this problem, we should allocate the pages on cma reserved memory > >> >more aggressively and intelligenetly. Patch 2 implements the solution. > >> >Patch 1 is the simple optimization which remove useless re-trial and patch 3 > >> >is for removing useless alloc flag, so these are not important. > >> >See patch 2 for more detailed description. > >> > > >> >This patchset is based on v3.15-rc4. > >> > >> Thanks for posting those patches. It basically reminds me the > >> following discussion: > >> http://thread.gmane.org/gmane.linux.kernel/1391989/focus=1399524 > >> > >> Your approach is basically the same. I hope that your patches can be > >> improved > >> in such a way that they will be accepted by mm maintainers. I only > >> wonder if the > >> third patch is really necessary. Without it kswapd wakeup might be > >> still avoided > >> in some cases. > > > > Hello, > > > > Oh... I didn't know that patch and discussion, because I have no interest > > on CMA at that time. Your approach looks similar to #1 > > approach of mine and could have same problem of #1 approach which I mentioned > > in patch 2/3. Please refer that patch description. :) > > IIUC that patch also interleave right ? 
>
> +#ifdef CONFIG_CMA
> +	unsigned long nr_free = zone_page_state(zone, NR_FREE_PAGES);
> +	unsigned long nr_cma_free = zone_page_state(zone, NR_FREE_CMA_PAGES);
> +
> +	if (migratetype == MIGRATE_MOVABLE && nr_cma_free &&
> +	    nr_free - nr_cma_free < 2 * low_wmark_pages(zone))
> +		migratetype = MIGRATE_CMA;
> +#endif /* CONFIG_CMA */

Hello,

This is not interleaving, from my point of view. This logic allocates
free movable pages until (free pages - free CMA pages) drops below
2 * low_wmark, and only then allocates free CMA pages. The interleaving
I mean is something like a round-robin policy with no constraint like
the one above.

>
> That doesn't always prefer CMA region. It would be nice to
> understand why grouping in pageblock_nr_pages is beneficial. Also in
> your patch you decrement nr_try_cma for every 'order' allocation. Why ?

pageblock_nr_pages is just a magic value with no rationale. :)
But we need grouping, because without it we can't get physically
contiguous pages. When we allocate pages for the page cache, the
readahead logic tries to allocate 32 pages. If we don't use grouping,
disk I/O for these pages can't be handled by a single I/O request on
some devices. I'm not familiar with I/O devices, so please correct me
if I'm wrong.

And yes, I will take 'order' into account when incrementing and
decrementing nr_try_cma.

>
> +	if (zone->nr_try_cma) {
> +		/* Okay. Now, we can try to allocate the page from cma region */
> +		zone->nr_try_cma--;
> +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> +
> +		/* CMA pages can vanish through CMA allocation */
> +		if (unlikely(!page && order == 0))
> +			zone->nr_try_cma = 0;
> +
> +		return page;
> +	}
>
>
> If we fail above MIGRATE_CMA alloc should we return failure ? Why
> not try MOVABLE allocation on failure (ie fallthrough the code path) ?

This patch uses fallthrough logic. If __rmqueue_cma() fails, the
allocation falls through to __rmqueue() as usual.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f51.google.com (mail-pb0-f51.google.com [209.85.160.51]) by kanga.kvack.org (Postfix) with ESMTP id D93D56B0036 for ; Wed, 14 May 2014 22:41:26 -0400 (EDT) Received: by mail-pb0-f51.google.com with SMTP id ma3so429999pbc.38 for ; Wed, 14 May 2014 19:41:26 -0700 (PDT) Received: from lgeamrelo04.lge.com (lgeamrelo04.lge.com. [156.147.1.127]) by mx.google.com with ESMTP id qb5si723953pbb.157.2014.05.14.19.41.24 for ; Wed, 14 May 2014 19:41:25 -0700 (PDT) Date: Thu, 15 May 2014 11:43:53 +0900 From: Minchan Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140515024353.GA27599@bbox> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140515015301.GA10116@js1304-P5Q-DELUXE> Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote: > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > > Hey Joonsoo, > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > > CMA is introduced to provide physically contiguous pages at runtime. > > > For this purpose, it reserves memory at boot time. Although it reserve > > > memory, this reserved memory can be used for movable memory allocation > > > request. This usecase is beneficial to the system that needs this CMA > > > reserved memory infrequently and it is one of main purpose of > > > introducing CMA. 
> > > > > > But, there is a problem in current implementation. The problem is that > > > it works like as just reserved memory approach. The pages on cma reserved > > > memory are hardly used for movable memory allocation. This is caused by > > > combination of allocation and reclaim policy. > > > > > > The pages on cma reserved memory are allocated if there is no movable > > > memory, that is, as fallback allocation. So the time this fallback > > > allocation is started is under heavy memory pressure. Although it is under > > > memory pressure, movable allocation easily succeed, since there would be > > > many pages on cma reserved memory. But this is not the case for unmovable > > > and reclaimable allocation, because they can't use the pages on cma > > > reserved memory. These allocations regard system's free memory as > > > (free pages - free cma pages) on watermark checking, that is, free > > > unmovable pages + free reclaimable pages + free movable pages. Because > > > we already exhausted movable pages, only free pages we have are unmovable > > > and reclaimable types and this would be really small amount. So watermark > > > checking would be failed. It will wake up kswapd to make enough free > > > memory for unmovable and reclaimable allocation and kswapd will do. > > > So before we fully utilize pages on cma reserved memory, kswapd start to > > > reclaim memory and try to make free memory over the high watermark. This > > > watermark checking by kswapd doesn't take care free cma pages so many > > > movable pages would be reclaimed. After then, we have a lot of movable > > > pages again, so fallback allocation doesn't happen again. To conclude, > > > amount of free memory on meminfo which includes free CMA pages is moving > > > around 512 MB if I reserve 512 MB memory for CMA. > > > > > > I found this problem on following experiment. 
> > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > > make -j24 > > > > > > CMA reserve: 0 MB 512 MB > > > Elapsed-time: 234.8 361.8 > > > Average-MemFree: 283880 KB 530851 KB > > > > > > To solve this problem, I can think following 2 possible solutions. > > > 1. allocate the pages on cma reserved memory first, and if they are > > > exhausted, allocate movable pages. > > > 2. interleaved allocation: try to allocate specific amounts of memory > > > from cma reserved memory and then allocate from free movable memory. > > > > I love this idea but when I see the code, I don't like that. > > In allocation path, just try to allocate pages by round-robin so it's role > > of allocator. If one of migratetype is full, just pass mission to reclaimer > > with hint(ie, Hey reclaimer, it's non-movable allocation fail > > so there is pointless if you reclaim MIGRATE_CMA pages) so that > > reclaimer can filter it out during page scanning. > > We already have an tool to achieve it(ie, isolate_mode_t). > > Hello, > > I agree with leaving fast allocation path as simple as possible. > I will remove runtime computation for determining ratio in > __rmqueue_cma() and, instead, will use pre-computed value calculated > on the other path. Sounds good. > > I am not sure that whether your second suggestion(Hey relaimer part) > is good or not. In my quick thought, that could be helpful in the > situation that many free cma pages remained. But, it would be not helpful > when there are neither free movable and cma pages. In generally, most > workloads mainly uses movable pages for page cache or anonymous mapping. > Although reclaim is triggered by non-movable allocation failure, reclaimed > pages are used mostly by movable allocation. We can handle these allocation > request even if we reclaim the pages just in lru order. If we rotate > the lru list for finding movable pages, it could cause more useful > pages to be evicted. > > This is just my quick thought, so please let me correct if I am wrong. 

Why should the reclaimer reclaim unnecessary pages?
So, your answer is that it would be better because upcoming allocations
would then succeed easily without interruption. But it could reclaim
too many pages before the watermark for unmovable allocation is
satisfied. Sometimes you might even see OOM.

Moreover, how would you handle the current trouble?
For example, there is an atomic allocation and the only thing that can
save the world is kswapd, because that is one of kswapd's roles. But
kswapd spends a lot of time reclaiming CMA pages, which is pointless,
so the allocation would easily fail.

>
> >
> > And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
> > If possible, it would be better becauser it's generic function to check
> > free pages and cause trigger reclaim/compaction logic.
>
> I guess, your *it* means ratio computation. Right?

I just meant get_page_from_freelist(), like the fair zone allocation,
for consistency. But as we discussed offline, I'm not against you if
it's not the right place.

> I don't like putting it on zone_watermark_ok(). Although it need to
> refer to free cma pages value which are also referred in zone_watermark_ok(),
> this computation is for determining ratio, not for triggering
> reclaim/compaction. And this zone_watermark_ok() is on more hot-path, so
> putting this logic into zone_watermark_ok() looks not better to me.
>
> I will think better place to do it.

Yep, thanks!

>
> Thanks.

--
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f43.google.com (mail-pb0-f43.google.com [209.85.160.43]) by kanga.kvack.org (Postfix) with ESMTP id 1DD8E6B0036 for ; Wed, 14 May 2014 22:45:18 -0400 (EDT) Received: by mail-pb0-f43.google.com with SMTP id up15so439523pbc.2 for ; Wed, 14 May 2014 19:45:17 -0700 (PDT) Received: from mailout4.samsung.com (mailout4.samsung.com. [203.254.224.34]) by mx.google.com with ESMTPS id qh4si338278pbb.223.2014.05.14.19.45.16 for (version=TLSv1 cipher=RC4-MD5 bits=128/128); Wed, 14 May 2014 19:45:17 -0700 (PDT) Received: from epcpsbgr1.samsung.com (u141.gpu120.samsung.co.kr [203.254.230.141]) by mailout4.samsung.com (Oracle Communications Messaging Server 7u4-24.01 (7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP id <0N5L006AMGZEK050@mailout4.samsung.com> for linux-mm@kvack.org; Thu, 15 May 2014 11:45:14 +0900 (KST) Message-id: <53742A4B.4090901@samsung.com> Date: Thu, 15 May 2014 11:45:31 +0900 From: Heesub Shin MIME-version: 1.0 Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> In-reply-to: <20140515015301.GA10116@js1304-P5Q-DELUXE> Content-type: text/plain; charset=ISO-8859-1; format=flowed Content-transfer-encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Minchan Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Mel Gorman , Johannes Weiner , Marek Szyprowski Hello, On 05/15/2014 10:53 AM, Joonsoo Kim wrote: > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: >> Hey Joonsoo, >> >> On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: >>> CMA is introduced to provide physically contiguous pages at runtime. 
>>> For this purpose, it reserves memory at boot time. Although it reserve >>> memory, this reserved memory can be used for movable memory allocation >>> request. This usecase is beneficial to the system that needs this CMA >>> reserved memory infrequently and it is one of main purpose of >>> introducing CMA. >>> >>> But, there is a problem in current implementation. The problem is that >>> it works like as just reserved memory approach. The pages on cma reserved >>> memory are hardly used for movable memory allocation. This is caused by >>> combination of allocation and reclaim policy. >>> >>> The pages on cma reserved memory are allocated if there is no movable >>> memory, that is, as fallback allocation. So the time this fallback >>> allocation is started is under heavy memory pressure. Although it is under >>> memory pressure, movable allocation easily succeed, since there would be >>> many pages on cma reserved memory. But this is not the case for unmovable >>> and reclaimable allocation, because they can't use the pages on cma >>> reserved memory. These allocations regard system's free memory as >>> (free pages - free cma pages) on watermark checking, that is, free >>> unmovable pages + free reclaimable pages + free movable pages. Because >>> we already exhausted movable pages, only free pages we have are unmovable >>> and reclaimable types and this would be really small amount. So watermark >>> checking would be failed. It will wake up kswapd to make enough free >>> memory for unmovable and reclaimable allocation and kswapd will do. >>> So before we fully utilize pages on cma reserved memory, kswapd start to >>> reclaim memory and try to make free memory over the high watermark. This >>> watermark checking by kswapd doesn't take care free cma pages so many >>> movable pages would be reclaimed. After then, we have a lot of movable >>> pages again, so fallback allocation doesn't happen again. 
To conclude, >>> amount of free memory on meminfo which includes free CMA pages is moving >>> around 512 MB if I reserve 512 MB memory for CMA. >>> >>> I found this problem on following experiment. >>> >>> 4 CPUs, 1024 MB, VIRTUAL MACHINE >>> make -j24 >>> >>> CMA reserve: 0 MB 512 MB >>> Elapsed-time: 234.8 361.8 >>> Average-MemFree: 283880 KB 530851 KB >>> >>> To solve this problem, I can think following 2 possible solutions. >>> 1. allocate the pages on cma reserved memory first, and if they are >>> exhausted, allocate movable pages. >>> 2. interleaved allocation: try to allocate specific amounts of memory >>> from cma reserved memory and then allocate from free movable memory. >> >> I love this idea but when I see the code, I don't like that. >> In allocation path, just try to allocate pages by round-robin so it's role >> of allocator. If one of migratetype is full, just pass mission to reclaimer >> with hint(ie, Hey reclaimer, it's non-movable allocation fail >> so there is pointless if you reclaim MIGRATE_CMA pages) so that >> reclaimer can filter it out during page scanning. >> We already have an tool to achieve it(ie, isolate_mode_t). > > Hello, > > I agree with leaving fast allocation path as simple as possible. > I will remove runtime computation for determining ratio in > __rmqueue_cma() and, instead, will use pre-computed value calculated > on the other path. > > I am not sure that whether your second suggestion(Hey relaimer part) > is good or not. In my quick thought, that could be helpful in the > situation that many free cma pages remained. But, it would be not helpful > when there are neither free movable and cma pages. In generally, most > workloads mainly uses movable pages for page cache or anonymous mapping. > Although reclaim is triggered by non-movable allocation failure, reclaimed > pages are used mostly by movable allocation. We can handle these allocation > request even if we reclaim the pages just in lru order. 
If we rotate
> the lru list for finding movable pages, it could cause more useful
> pages to be evicted.
>
> This is just my quick thought, so please let me correct if I am wrong.

We have an out-of-tree implementation that is exactly the same as the
approach Minchan described, and it works, but it definitely has some
side effects, as you pointed out: it distorts the LRU and evicts hot
pages. I am not attaching code fragments in this thread for some
reasons, but it should be easy for you to implement. I am wondering if
it could also help in your case.

Thanks,
Heesub

>
>>
>> And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
>> If possible, it would be better becauser it's generic function to check
>> free pages and cause trigger reclaim/compaction logic.
>
> I guess, your *it* means ratio computation. Right?
> I don't like putting it on zone_watermark_ok(). Although it need to
> refer to free cma pages value which are also referred in zone_watermark_ok(),
> this computation is for determining ratio, not for triggering
> reclaim/compaction. And this zone_watermark_ok() is on more hot-path, so
> putting this logic into zone_watermark_ok() looks not better to me.
>
> I will think better place to do it.
>
> Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f45.google.com (mail-pb0-f45.google.com [209.85.160.45]) by kanga.kvack.org (Postfix) with ESMTP id 57C0A6B0036 for ; Thu, 15 May 2014 01:04:00 -0400 (EDT) Received: by mail-pb0-f45.google.com with SMTP id um1so583871pbc.18 for ; Wed, 14 May 2014 22:04:00 -0700 (PDT) Received: from lgemrelse7q.lge.com (LGEMRELSE7Q.lge.com. [156.147.1.151]) by mx.google.com with ESMTP id hb8si2060855pbc.239.2014.05.14.22.03.58 for ; Wed, 14 May 2014 22:03:59 -0700 (PDT) Date: Thu, 15 May 2014 14:06:27 +0900 From: Minchan Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140515050627.GB27599@bbox> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <53742A4B.4090901@samsung.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <53742A4B.4090901@samsung.com> Sender: owner-linux-mm@kvack.org List-ID: To: Heesub Shin Cc: Joonsoo Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Mel Gorman , Johannes Weiner , Marek Szyprowski Hello Heesub, On Thu, May 15, 2014 at 11:45:31AM +0900, Heesub Shin wrote: > Hello, > > On 05/15/2014 10:53 AM, Joonsoo Kim wrote: > >On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > >>Hey Joonsoo, > >> > >>On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > >>>CMA is introduced to provide physically contiguous pages at runtime. > >>>For this purpose, it reserves memory at boot time. Although it reserve > >>>memory, this reserved memory can be used for movable memory allocation > >>>request. 
This usecase is beneficial to the system that needs this CMA > >>>reserved memory infrequently and it is one of main purpose of > >>>introducing CMA. > >>> > >>>But, there is a problem in current implementation. The problem is that > >>>it works like as just reserved memory approach. The pages on cma reserved > >>>memory are hardly used for movable memory allocation. This is caused by > >>>combination of allocation and reclaim policy. > >>> > >>>The pages on cma reserved memory are allocated if there is no movable > >>>memory, that is, as fallback allocation. So the time this fallback > >>>allocation is started is under heavy memory pressure. Although it is under > >>>memory pressure, movable allocation easily succeed, since there would be > >>>many pages on cma reserved memory. But this is not the case for unmovable > >>>and reclaimable allocation, because they can't use the pages on cma > >>>reserved memory. These allocations regard system's free memory as > >>>(free pages - free cma pages) on watermark checking, that is, free > >>>unmovable pages + free reclaimable pages + free movable pages. Because > >>>we already exhausted movable pages, only free pages we have are unmovable > >>>and reclaimable types and this would be really small amount. So watermark > >>>checking would be failed. It will wake up kswapd to make enough free > >>>memory for unmovable and reclaimable allocation and kswapd will do. > >>>So before we fully utilize pages on cma reserved memory, kswapd start to > >>>reclaim memory and try to make free memory over the high watermark. This > >>>watermark checking by kswapd doesn't take care free cma pages so many > >>>movable pages would be reclaimed. After then, we have a lot of movable > >>>pages again, so fallback allocation doesn't happen again. To conclude, > >>>amount of free memory on meminfo which includes free CMA pages is moving > >>>around 512 MB if I reserve 512 MB memory for CMA. > >>> > >>>I found this problem on following experiment. 
> >>> > >>>4 CPUs, 1024 MB, VIRTUAL MACHINE > >>>make -j24 > >>> > >>>CMA reserve: 0 MB 512 MB > >>>Elapsed-time: 234.8 361.8 > >>>Average-MemFree: 283880 KB 530851 KB > >>> > >>>To solve this problem, I can think following 2 possible solutions. > >>>1. allocate the pages on cma reserved memory first, and if they are > >>> exhausted, allocate movable pages. > >>>2. interleaved allocation: try to allocate specific amounts of memory > >>> from cma reserved memory and then allocate from free movable memory. > >> > >>I love this idea but when I see the code, I don't like that. > >>In allocation path, just try to allocate pages by round-robin so it's role > >>of allocator. If one of migratetype is full, just pass mission to reclaimer > >>with hint(ie, Hey reclaimer, it's non-movable allocation fail > >>so there is pointless if you reclaim MIGRATE_CMA pages) so that > >>reclaimer can filter it out during page scanning. > >>We already have an tool to achieve it(ie, isolate_mode_t). > > > >Hello, > > > >I agree with leaving fast allocation path as simple as possible. > >I will remove runtime computation for determining ratio in > >__rmqueue_cma() and, instead, will use pre-computed value calculated > >on the other path. > > > >I am not sure that whether your second suggestion(Hey relaimer part) > >is good or not. In my quick thought, that could be helpful in the > >situation that many free cma pages remained. But, it would be not helpful > >when there are neither free movable and cma pages. In generally, most > >workloads mainly uses movable pages for page cache or anonymous mapping. > >Although reclaim is triggered by non-movable allocation failure, reclaimed > >pages are used mostly by movable allocation. We can handle these allocation > >request even if we reclaim the pages just in lru order. If we rotate > >the lru list for finding movable pages, it could cause more useful > >pages to be evicted. 
> >
> >This is just my quick thought, so please let me correct if I am wrong.
>
> We have an out of tree implementation that is completely the same
> with the approach Minchan said and it works, but it has definitely
> some side-effects as you pointed, distorting the LRU and evicting
> hot pages. I do not attach code fragments in this thread for some

Actually, I discussed with Joonsoo that we would solve such a corner
case in the future if someone reported it, and now you have. Thanks!

LRU churning is a general problem, not CMA-specific, although CMA makes
it worse, so I'd like to handle it as a separate topic (i.e., a
separate patchset).

The reason we rotated them back to the LRU head was just to avoid the
overhead of rescanning them within one reclaim cycle. So one idea I can
think of is to put a reclaim cursor at the LRU tail right before a
reclaim cycle, start scanning from the cursor, and update the cursor
position on every scan. Of course, we should rotate filtered-out pages
back to the LRU's tail, not the head, but with the cursor we can skip
pointless pages that were already scanned in this reclaim cycle.

The cursor should be removed when the reclaim cycle is done, so when
the next reclaim happens the cursor starts from the beginning again.
That could cause unnecessary scanning until it reaches a proper victim
page, so CPU usage would be higher, but that is better than evicting
the working set.

Any other ideas?

> reasons, but it must be easy for yourself. I am wondering if it
> could help also in your case.
>
> Thanks,
> Heesub
>
> >
> >>
> >>And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
> >>If possible, it would be better becauser it's generic function to check
> >>free pages and cause trigger reclaim/compaction logic.
> >
> >I guess, your *it* means ratio computation. Right?
> >I don't like putting it on zone_watermark_ok().
Although it need to > >refer to free cma pages value which are also referred in zone_watermark_ok(), > >this computation is for determining ratio, not for triggering > >reclaim/compaction. And this zone_watermark_ok() is on more hot-path, so > >putting this logic into zone_watermark_ok() looks not better to me. > > > >I will think better place to do it. > > > >Thanks. > > > >-- > >To unsubscribe, send a message with 'unsubscribe linux-mm' in > >the body to majordomo@kvack.org. For more info on Linux MM, > >see: http://www.linux-mm.org/ . > >Don't email: email@kvack.org > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ee0-f45.google.com (mail-ee0-f45.google.com [74.125.83.45]) by kanga.kvack.org (Postfix) with ESMTP id 780FE6B0036 for ; Thu, 15 May 2014 05:47:28 -0400 (EDT) Received: by mail-ee0-f45.google.com with SMTP id d49so449874eek.18 for ; Thu, 15 May 2014 02:47:27 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de. 
[195.135.220.15]) by mx.google.com with ESMTPS id v47si2316651een.237.2014.05.15.02.47.26 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 15 May 2014 02:47:26 -0700 (PDT)
Date: Thu, 15 May 2014 10:47:18 +0100
From: Mel Gorman
Subject: Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory
Message-ID: <20140515094718.GE23991@suse.de>
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <536CCC78.6050806@samsung.com> <20140513022603.GF23803@js1304-P5Q-DELUXE> <8738gcae4h.fsf@linux.vnet.ibm.com> <20140515021055.GC10116@js1304-P5Q-DELUXE>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <20140515021055.GC10116@js1304-P5Q-DELUXE>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Joonsoo Kim
Cc: "Aneesh Kumar K.V", Marek Szyprowski, Andrew Morton, Rik van Riel, Johannes Weiner, Laura Abbott, Minchan Kim, Heesub Shin, Michal Nazarewicz, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kyungmin Park, Bartlomiej Zolnierkiewicz, 'Tomasz Stanislawski'

On Thu, May 15, 2014 at 11:10:55AM +0900, Joonsoo Kim wrote:
> > That doesn't always prefer CMA region. It would be nice to
> > understand why grouping in pageblock_nr_pages is beneficial. Also in
> > your patch you decrement nr_try_cma for every 'order' allocation. Why ?
>
> pageblock_nr_pages is just magic value with no rationale. :)

I'm not following this discussion closely, but there is a rationale for
that value -- it's the size of a huge page on that architecture. At the
time fragmentation avoidance was implemented, this was the largest
allocation size of interest.

--
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pb0-f41.google.com (mail-pb0-f41.google.com [209.85.160.41]) by kanga.kvack.org (Postfix) with ESMTP id 453E16B003D for ; Fri, 16 May 2014 04:02:39 -0400 (EDT)
Received: by mail-pb0-f41.google.com with SMTP id uo5so2295328pbc.0 for ; Fri, 16 May 2014 01:02:38 -0700 (PDT)
Received: from lgeamrelo04.lge.com (lgeamrelo04.lge.com. [156.147.1.127]) by mx.google.com with ESMTP id st6si8071728pab.46.2014.05.16.01.02.35 for ; Fri, 16 May 2014 01:02:38 -0700 (PDT)
Message-ID: <5375C619.8010501@lge.com>
Date: Fri, 16 May 2014 17:02:33 +0900
From: Gioh Kim
MIME-Version: 1.0
Subject: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE>
In-Reply-To: <20140515015301.GA10116@js1304-P5Q-DELUXE>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Joonsoo Kim, Minchan Kim
Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz, Heesub Shin, Mel Gorman, Johannes Weiner, Marek Szyprowski, =?UTF-8?B?7J206rG07Zi4?=, gurugio@gmail.com

Hi,

I've been trying to apply CMA to my platform.
The USB host driver generated a kernel panic like the one below when a
USB mouse was connected, because I turned on CMA and set the
CMA_SIZE_MBYTES value to zero by mistake.

I think the panic is caused by atomic_pool in arch/arm/mm/dma-mapping.c.
A zero CMA_SIZE_MBYTES value skips CMA initialization, and then
atomic_pool is not initialized either, because __alloc_from_contiguous
fails in atomic_pool_init().

If CMA_SIZE_MBYTES_MAX is allowed to be zero, there should be defensive
code to check that CMA is initialized correctly.
And atomic_pool initialization should be done by __alloc_remap_buffer instead of __alloc_from_contiguous if __alloc_from_contiguous is failed. IMPO, it is more simple and powerful to restrict CMA_SIZE_MBYTES_MAX configuration to be larger than zero. [ 1.474523] ------------[ cut here ]------------ [ 1.479150] WARNING: at arch/arm/mm/dma-mapping.c:496 __dma_alloc.isra.19+0x1b8/0x1e0() [ 1.487160] coherent pool not initialised! [ 1.491249] Modules linked in: [ 1.494317] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.10.19+ #55 [ 1.500521] [<80013e20>] (unwind_backtrace+0x0/0xf8) from [<80011c60>] (show_stack+0x10/0x14) [ 1.509064] [<80011c60>] (show_stack+0x10/0x14) from [<8001eedc>] (warn_slowpath_common+0x4c/0x6c) [ 1.518038] [<8001eedc>] (warn_slowpath_common+0x4c/0x6c) from [<8001ef90>] (warn_slowpath_fmt+0x30/0x40) [ 1.527616] [<8001ef90>] (warn_slowpath_fmt+0x30/0x40) from [<80017c28>] (__dma_alloc.isra.19+0x1b8/0x1e0) [ 1.537282] [<80017c28>] (__dma_alloc.isra.19+0x1b8/0x1e0) from [<80017d7c>] (arm_dma_alloc+0x90/0x98) [ 1.546608] [<80017d7c>] (arm_dma_alloc+0x90/0x98) from [<8034a860>] (ohci_init+0x1b0/0x278) [ 1.555062] [<8034a860>] (ohci_init+0x1b0/0x278) from [<80332b0c>] (usb_add_hcd+0x184/0x5b8) [ 1.563500] [<80332b0c>] (usb_add_hcd+0x184/0x5b8) from [<8034b5e0>] (ohci_platform_probe+0xd0/0x174) [ 1.572729] [<8034b5e0>] (ohci_platform_probe+0xd0/0x174) from [<802f196c>] (platform_drv_probe+0x14/0x18) [ 1.582401] [<802f196c>] (platform_drv_probe+0x14/0x18) from [<802f0714>] (driver_probe_device+0x6c/0x1f8) [ 1.592064] [<802f0714>] (driver_probe_device+0x6c/0x1f8) from [<802f092c>] (__driver_attach+0x8c/0x90) [ 1.601465] [<802f092c>] (__driver_attach+0x8c/0x90) from [<802eeec8>] (bus_for_each_dev+0x54/0x88) [ 1.610518] [<802eeec8>] (bus_for_each_dev+0x54/0x88) from [<802efef0>] (bus_add_driver+0xd8/0x230) [ 1.619572] [<802efef0>] (bus_add_driver+0xd8/0x230) from [<802f0de4>] (driver_register+0x78/0x14c) [ 1.628632] [<802f0de4>] (driver_register+0x78/0x14c) 
from [<806ff018>] (ohci_hcd_mod_init+0x34/0x64) [ 1.637859] [<806ff018>] (ohci_hcd_mod_init+0x34/0x64) from [<8000879c>] (do_one_initcall+0xec/0x14c) [ 1.647088] [<8000879c>] (do_one_initcall+0xec/0x14c) from [<806dab30>] (kernel_init_freeable+0x150/0x220) [ 1.656754] [<806dab30>] (kernel_init_freeable+0x150/0x220) from [<80509f54>] (kernel_init+0x8/0xf8) [ 1.665895] [<80509f54>] (kernel_init+0x8/0xf8) from [<8000e398>] (ret_from_fork+0x14/0x3c) [ 1.674264] ---[ end trace 6f1857db5ef45cb9 ]--- [ 1.678880] ohci-platform ohci-platform.0: can't setup [ 1.684027] ohci-platform ohci-platform.0: USB bus 1 deregistered [ 1.690362] ohci-platform: probe of ohci-platform.0 failed with error -12 [ 1.697188] ohci-platform ohci-platform.1: Generic Platform OHCI Controller [ 1.704365] ohci-platform ohci-platform.1: new USB bus registered, assigned bus number 1 [ 1.712457] ------------[ cut here ]------------ [ 1.717096] WARNING: at arch/arm/mm/dma-mapping.c:496 __dma_alloc.isra.19+0x1b8/0x1e0() [ 1.725105] coherent pool not initialised! 
[ 1.729194] Modules linked in: [ 1.732247] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 3.10.19+ #55 [ 1.739404] [<80013e20>] (unwind_backtrace+0x0/0xf8) from [<80011c60>] (show_stack+0x10/0x14) [ 1.747949] [<80011c60>] (show_stack+0x10/0x14) from [<8001eedc>] (warn_slowpath_common+0x4c/0x6c) [ 1.756923] [<8001eedc>] (warn_slowpath_common+0x4c/0x6c) from [<8001ef90>] (warn_slowpath_fmt+0x30/0x40) [ 1.766502] [<8001ef90>] (warn_slowpath_fmt+0x30/0x40) from [<80017c28>] (__dma_alloc.isra.19+0x1b8/0x1e0) [ 1.776168] [<80017c28>] (__dma_alloc.isra.19+0x1b8/0x1e0) from [<80017d7c>] (arm_dma_alloc+0x90/0x98) [ 1.785484] [<80017d7c>] (arm_dma_alloc+0x90/0x98) from [<8034a860>] (ohci_init+0x1b0/0x278) [ 1.793933] [<8034a860>] (ohci_init+0x1b0/0x278) from [<80332b0c>] (usb_add_hcd+0x184/0x5b8) [ 1.802370] [<80332b0c>] (usb_add_hcd+0x184/0x5b8) from [<8034b5e0>] (ohci_platform_probe+0xd0/0x174) [ 1.811597] [<8034b5e0>] (ohci_platform_probe+0xd0/0x174) from [<802f196c>] (platform_drv_probe+0x14/0x18) [ 1.821263] [<802f196c>] (platform_drv_probe+0x14/0x18) from [<802f0714>] (driver_probe_device+0x6c/0x1f8) [ 1.830926] [<802f0714>] (driver_probe_device+0x6c/0x1f8) from [<802f092c>] (__driver_attach+0x8c/0x90) [ 1.840326] [<802f092c>] (__driver_attach+0x8c/0x90) from [<802eeec8>] (bus_for_each_dev+0x54/0x88) [ 1.849379] [<802eeec8>] (bus_for_each_dev+0x54/0x88) from [<802efef0>] (bus_add_driver+0xd8/0x230) [ 1.858432] [<802efef0>] (bus_add_driver+0xd8/0x230) from [<802f0de4>] (driver_register+0x78/0x14c) [ 1.867488] [<802f0de4>] (driver_register+0x78/0x14c) from [<806ff018>] (ohci_hcd_mod_init+0x34/0x64) [ 1.876714] [<806ff018>] (ohci_hcd_mod_init+0x34/0x64) from [<8000879c>] (do_one_initcall+0xec/0x14c) [ 1.885940] [<8000879c>] (do_one_initcall+0xec/0x14c) from [<806dab30>] (kernel_init_freeable+0x150/0x220) [ 1.895601] [<806dab30>] (kernel_init_freeable+0x150/0x220) from [<80509f54>] (kernel_init+0x8/0xf8) [ 1.904741] [<80509f54>] (kernel_init+0x8/0xf8) from [<8000e398>] 
(ret_from_fork+0x14/0x3c)
[ 1.913085] ---[ end trace 6f1857db5ef45cba ]---

I'm adding my patch to restrict CMA_SIZE_MBYTES.
This patch is based on 3.15.0-rc5.

-------------------------------- 8< --------------------------------------
From 9f8e6d3c1f4bdeeeb7af3df7898b773a612c62e8 Mon Sep 17 00:00:00 2001
From: Gioh Kim
Date: Fri, 16 May 2014 16:15:43 +0900
Subject: [PATCH] drivers/base/Kconfig: restrict CMA size to non-zero value

The size of the CMA area must be larger than zero.
If the size is zero, CMA cannot be initialized.

Signed-off-by: Gioh Kim
---
 drivers/base/Kconfig | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 4b7b452..19b3578 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -222,13 +222,18 @@ config DMA_CMA
 if DMA_CMA
 comment "Default contiguous memory area size:"
 
+config CMA_SIZE_MBYTES_MAX
+	int
+	default 1024
+
 config CMA_SIZE_MBYTES
 	int "Size in Mega Bytes"
 	depends on !CMA_SIZE_SEL_PERCENTAGE
+	range 1 CMA_SIZE_MBYTES_MAX
 	default 16
 	help
 	  Defines the size (in MiB) of the default memory area for Contiguous
-	  Memory Allocator.
+	  Memory Allocator. This value must be larger than zero.
 
 config CMA_SIZE_PERCENTAGE
 	int "Percentage of total memory"
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f171.google.com (mail-ig0-f171.google.com [209.85.213.171]) by kanga.kvack.org (Postfix) with ESMTP id 2F4D26B0036 for ; Fri, 16 May 2014 13:45:25 -0400 (EDT) Received: by mail-ig0-f171.google.com with SMTP id c1so1079808igq.16 for ; Fri, 16 May 2014 10:45:24 -0700 (PDT) Received: from mail-ig0-x233.google.com (mail-ig0-x233.google.com [2607:f8b0:4001:c05::233]) by mx.google.com with ESMTPS id k17si3318720icg.22.2014.05.16.10.45.24 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 16 May 2014 10:45:24 -0700 (PDT) Received: by mail-ig0-f179.google.com with SMTP id hn18so1104672igb.0 for ; Fri, 16 May 2014 10:45:24 -0700 (PDT) From: Michal Nazarewicz Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value In-Reply-To: <5375C619.8010501@lge.com> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> Date: Fri, 16 May 2014 10:45:12 -0700 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: owner-linux-mm@kvack.org List-ID: To: Gioh Kim , Joonsoo Kim , Minchan Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?utf-8?B?7J206rG07Zi4?= , gurugio@gmail.com --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Fri, May 16 2014, Gioh Kim wrote: > If CMA_SIZE_MBYTES is allowed to be zero, there should be defense code > to check CMA is initlaized correctly. And atomic_pool initialization > should be done by __alloc_remap_buffer instead of > __alloc_from_contiguous if __alloc_from_contiguous is failed. Agreed, and this is the correct fix. 
> IMPO, it is more simple and powerful to restrict CMA_SIZE_MBYTES_MAX
> configuration to be larger than zero.

No, because it makes it impossible to have CMA disabled by default and
only enabled if command line argument is given.

Furthermore, your patch does *not* guarantee CMA region to always be
allocated. If CMA_SIZE_SEL_PERCENTAGE is selected for instance. Or if
user explicitly passes 0 on command line.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +------ooO--(_)--Ooo--

--=-=-=--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
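
A minimal userspace sketch of the objection above, assuming a much simplified model of the size selection in dma_contiguous_reserve(). The helper names select_cma_size() and cma_area_available() and the SIZE_UNSET macro are hypothetical; the real function also deals with base, limit and the percentage-based options. The sketch only illustrates that a Kconfig range starting at 1 cannot rule out a zero-sized area, because an explicit cma=0 on the command line overrides the build-time default, so callers still need a defensive check.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: -1 means "no cma= option was given on the command line". */
#define SIZE_UNSET (-1L)

/*
 * Hypothetical helper: choose the CMA size in bytes.
 * A value given on the command line wins, even when it is 0.
 */
static long select_cma_size(long size_cmdline, long kconfig_default_bytes)
{
	/* The build-time default is assumed to honour "range 1 MAX". */
	assert(kconfig_default_bytes > 0);

	if (size_cmdline != SIZE_UNSET)
		return size_cmdline;
	return kconfig_default_bytes;
}

/*
 * Hypothetical defensive check: code that allocates from the area
 * has to cope with the area not existing at all.
 */
static int cma_area_available(long selected_size)
{
	return selected_size > 0;
}
```

With a 16 MiB default, select_cma_size(SIZE_UNSET, 16L << 20) yields the default, while select_cma_size(0, 16L << 20) yields 0 and cma_area_available() reports that no area exists, regardless of the Kconfig range.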
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f49.google.com (mail-pa0-f49.google.com [209.85.220.49]) by kanga.kvack.org (Postfix) with ESMTP id 8F2746B0036 for ; Sun, 18 May 2014 13:36:47 -0400 (EDT) Received: by mail-pa0-f49.google.com with SMTP id lj1so4689408pab.22 for ; Sun, 18 May 2014 10:36:47 -0700 (PDT) Received: from e28smtp03.in.ibm.com (e28smtp03.in.ibm.com. [122.248.162.3]) by mx.google.com with ESMTPS id ym9si16461455pab.72.2014.05.18.10.36.45 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Sun, 18 May 2014 10:36:46 -0700 (PDT) Received: from /spool/local by e28smtp03.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Sun, 18 May 2014 23:06:43 +0530 Received: from d28relay01.in.ibm.com (d28relay01.in.ibm.com [9.184.220.58]) by d28dlp02.in.ibm.com (Postfix) with ESMTP id 751363940048 for ; Sun, 18 May 2014 23:06:10 +0530 (IST) Received: from d28av03.in.ibm.com (d28av03.in.ibm.com [9.184.220.65]) by d28relay01.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id s4IHaVov54067408 for ; Sun, 18 May 2014 23:06:31 +0530 Received: from d28av03.in.ibm.com (localhost [127.0.0.1]) by d28av03.in.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s4IHa9Mg015280 for ; Sun, 18 May 2014 23:06:09 +0530 From: "Aneesh Kumar K.V" Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used In-Reply-To: <20140515015842.GB10116@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <8761l8ah04.fsf@linux.vnet.ibm.com> <20140515015842.GB10116@js1304-P5Q-DELUXE> Date: Sun, 18 May 2014 23:06:08 +0530 Message-ID: <87lhtzng53.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim Cc: Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan 
Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org Joonsoo Kim writes: > On Wed, May 14, 2014 at 02:12:19PM +0530, Aneesh Kumar K.V wrote: >> Joonsoo Kim writes: >> >> >> >> Another issue i am facing with the current code is the atomic allocation >> failing even with large number of CMA pages around. In my case we never >> reclaimed because large part of the memory is consumed by the page cache and >> for that, free memory check doesn't include at free_cma. I will test >> with this patchset and update here once i have the results. >> > > Hello, > > Could you elaborate more on your issue? > I can't completely understand your problem. > So your atomic allocation is movable? And although there are many free > cma pages, that request is fail? > non movable atomic allocations are failing because we don't have anything other than CMA pages left and kswapd is yet to catchup ? swapper/0: page allocation failure: order:0, mode:0x20 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.23-1500.pkvm2_1.5.ppc64 #1 Call Trace: [c000000ffffcb610] [c000000000017330] .show_stack+0x130/0x200 (unreliable) [c000000ffffcb6e0] [c00000000087a8c8] .dump_stack+0x28/0x3c [c000000ffffcb750] [c0000000001e06f0] .warn_alloc_failed+0x110/0x160 [c000000ffffcb800] [c0000000001e5984] .__alloc_pages_nodemask+0x9d4/0xbf0 [c000000ffffcb9e0] [c00000000023775c] .alloc_pages_current+0xcc/0x1b0 [c000000ffffcba80] [c0000000007098d4] .__netdev_alloc_frag+0x1a4/0x1d0 [c000000ffffcbb20] [c00000000070d750] .__netdev_alloc_skb+0xc0/0x130 [c000000ffffcbbb0] [d000000009639b40] .tg3_poll_work+0x900/0x1110 [tg3] [c000000ffffcbd10] [d00000000963a3a4] .tg3_poll_msix+0x54/0x200 [tg3] [c000000ffffcbdb0] [c00000000071fcec] .net_rx_action+0x1dc/0x310 [c000000ffffcbe90] [c0000000000c1b08] .__do_softirq+0x158/0x330 [c000000ffffcbf90] [c000000000025744] .call_do_softirq+0x14/0x24 [c000000ffffc7e00] [c000000000011684] .do_softirq+0xf4/0x130 [c000000ffffc7e90] 
[c0000000000c1f18] .irq_exit+0xc8/0x110 [c000000ffffc7f10] [c000000000011258] .__do_irq+0xc8/0x1f0 [c000000ffffc7f90] [c000000000025768] .call_do_irq+0x14/0x24 [c00000000137b750] [c00000000001142c] .do_IRQ+0xac/0x130 [c00000000137b800] [c000000000002a64] hardware_interrupt_common+0x164/0x180 .... Node 0 DMA: 408*64kB (C) 408*128kB (C) 408*256kB (C) 408*512kB (C) 408*1024kB (C) 406*2048kB (C) 199*4096kB (C) 97*8192kB (C) 6*16384kB (C) = 3348992kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16384kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB meminfo details: MemTotal: 65875584 kB MemFree: 8001856 kB Buffers: 49330368 kB Cached: 178752 kB SwapCached: 0 kB Active: 28550464 kB Inactive: 25476416 kB Active(anon): 3771008 kB Inactive(anon): 767360 kB Active(file): 24779456 kB Inactive(file): 24709056 kB Unevictable: 15104 kB Mlocked: 15104 kB SwapTotal: 8384448 kB SwapFree: 8384448 kB Dirty: 0 kB -aneesh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f50.google.com (mail-pb0-f50.google.com [209.85.160.50]) by kanga.kvack.org (Postfix) with ESMTP id E1A526B0036 for ; Sun, 18 May 2014 21:47:15 -0400 (EDT) Received: by mail-pb0-f50.google.com with SMTP id ma3so5105426pbc.37 for ; Sun, 18 May 2014 18:47:15 -0700 (PDT) Received: from lgeamrelo02.lge.com (lgeamrelo02.lge.com. 
[156.147.1.126]) by mx.google.com with ESMTP id ol8si2347030pbb.307.2014.05.18.18.47.13 for ; Sun, 18 May 2014 18:47:15 -0700 (PDT)
Message-ID: <537962A0.4090600@lge.com>
Date: Mon, 19 May 2014 10:47:12 +0900
From: Gioh Kim
MIME-Version: 1.0
Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Michal Nazarewicz , Joonsoo Kim , Minchan Kim
Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?UTF-8?B?7J206rG07Zi4?= , gurugio@gmail.com

Thank you for your advice. I had not noticed that.

I'm adding the following according to your advice:

- Range restriction for CMA_SIZE_MBYTES and *CMA_SIZE_PERCENTAGE*.
  I think this can prevent a wrong kernel configuration.

- Replace size_cmdline with the default value SZ_16M when it is zero.
  I am not sure this is sufficient if the cma=0 command line option is
  given together with the base and limit options.

I don't know how to send a second patch properly.
Please pardon me for simply pasting the patch here.

--------------------------------- 8< -------------------------------------
From c283eaac41b044a2abb11cfd32a60fff034633c3 Mon Sep 17 00:00:00 2001
From: Gioh Kim
Date: Fri, 16 May 2014 16:15:43 +0900
Subject: [PATCH] drivers/base/Kconfig: restrict CMA size to non-zero value

The size of the CMA area must be larger than zero.
If the size is zero, all physically contiguous allocations can fail.

Signed-off-by: Gioh Kim
---
 drivers/base/Kconfig          | 14 ++++++++++++--
 drivers/base/dma-contiguous.c |  3 ++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 4b7b452..a7292ac 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -222,17 +222,27 @@ config DMA_CMA
 if DMA_CMA
 comment "Default contiguous memory area size:"
 
+config CMA_SIZE_MBYTES_DEFAULT
+	int
+	default 16
+
+config CMA_SIZE_MBYTES_MAX
+	int
+	default 1024
+
 config CMA_SIZE_MBYTES
 	int "Size in Mega Bytes"
 	depends on !CMA_SIZE_SEL_PERCENTAGE
-	default 16
+	range 1 CMA_SIZE_MBYTES_MAX
+	default CMA_SIZE_MBYTES_DEFAULT
 	help
 	  Defines the size (in MiB) of the default memory area for Contiguous
-	  Memory Allocator.
+	  Memory Allocator. This value must be larger than zero.
 
 config CMA_SIZE_PERCENTAGE
 	int "Percentage of total memory"
 	depends on !CMA_SIZE_SEL_MBYTES
+	range 1 100
 	default 10
 	help
 	  Defines the size of the default memory area for Contiguous Memory
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index b056661..5b70442 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -125,7 +125,8 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
 	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
 
 	if (size_cmdline != -1) {
-		selected_size = size_cmdline;
+		selected_size = ((size_cmdline == 0) ?
+				CONFIG_CMA_SIZE_MBYTES_DEFAULT : size_cmdline);
 		selected_base = base_cmdline;
 		selected_limit = min_not_zero(limit_cmdline, limit);
 		if (base_cmdline + size_cmdline == limit_cmdline)
-- 
1.7.9.5

On 2014-05-17 at 2:45 AM, Michal Nazarewicz wrote:
> On Fri, May 16 2014, Gioh Kim wrote:
>> If CMA_SIZE_MBYTES is allowed to be zero, there should be defense code
>> to check CMA is initialized correctly. And atomic_pool initialization
>> should be done by __alloc_remap_buffer instead of
>> __alloc_from_contiguous if __alloc_from_contiguous is failed.
>
> Agreed, and this is the correct fix.
> >> IMPO, it is more simple and powerful to restrict CMA_SIZE_MBYTES_MAX >> configuration to be larger than zero. > > No, because it makes it impossible to have CMA disabled by default and > only enabled if command line argument is given. > > Furthermore, your patch does *not* guarantee CMA region to always be > allocated. If CMA_SIZE_SEL_PERCENTAGE is selected for instance. Or if > user explicitly passes 0 on command line. > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f47.google.com (mail-pa0-f47.google.com [209.85.220.47]) by kanga.kvack.org (Postfix) with ESMTP id 0A1A56B0036 for ; Sun, 18 May 2014 22:08:54 -0400 (EDT) Received: by mail-pa0-f47.google.com with SMTP id lf10so5008581pab.6 for ; Sun, 18 May 2014 19:08:54 -0700 (PDT) Received: from lgemrelse7q.lge.com (LGEMRELSE7Q.lge.com. 
[156.147.1.151]) by mx.google.com with ESMTP id vu2si8709454pbc.106.2014.05.18.19.08.53 for ; Sun, 18 May 2014 19:08:54 -0700 (PDT) Date: Mon, 19 May 2014 11:11:21 +0900 From: Joonsoo Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140519021121.GA19615@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <20140515024353.GA27599@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140515024353.GA27599@bbox> Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote: > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote: > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > > > Hey Joonsoo, > > > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > > > CMA is introduced to provide physically contiguous pages at runtime. > > > > For this purpose, it reserves memory at boot time. Although it reserve > > > > memory, this reserved memory can be used for movable memory allocation > > > > request. This usecase is beneficial to the system that needs this CMA > > > > reserved memory infrequently and it is one of main purpose of > > > > introducing CMA. > > > > > > > > But, there is a problem in current implementation. The problem is that > > > > it works like as just reserved memory approach. The pages on cma reserved > > > > memory are hardly used for movable memory allocation. This is caused by > > > > combination of allocation and reclaim policy. 
> > > > > > > > The pages on cma reserved memory are allocated if there is no movable > > > > memory, that is, as fallback allocation. So the time this fallback > > > > allocation is started is under heavy memory pressure. Although it is under > > > > memory pressure, movable allocation easily succeed, since there would be > > > > many pages on cma reserved memory. But this is not the case for unmovable > > > > and reclaimable allocation, because they can't use the pages on cma > > > > reserved memory. These allocations regard system's free memory as > > > > (free pages - free cma pages) on watermark checking, that is, free > > > > unmovable pages + free reclaimable pages + free movable pages. Because > > > > we already exhausted movable pages, only free pages we have are unmovable > > > > and reclaimable types and this would be really small amount. So watermark > > > > checking would be failed. It will wake up kswapd to make enough free > > > > memory for unmovable and reclaimable allocation and kswapd will do. > > > > So before we fully utilize pages on cma reserved memory, kswapd start to > > > > reclaim memory and try to make free memory over the high watermark. This > > > > watermark checking by kswapd doesn't take care free cma pages so many > > > > movable pages would be reclaimed. After then, we have a lot of movable > > > > pages again, so fallback allocation doesn't happen again. To conclude, > > > > amount of free memory on meminfo which includes free CMA pages is moving > > > > around 512 MB if I reserve 512 MB memory for CMA. > > > > > > > > I found this problem on following experiment. > > > > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > > > make -j24 > > > > > > > > CMA reserve: 0 MB 512 MB > > > > Elapsed-time: 234.8 361.8 > > > > Average-MemFree: 283880 KB 530851 KB > > > > > > > > To solve this problem, I can think following 2 possible solutions. > > > > 1. 
allocate the pages on cma reserved memory first, and if they are > > > > exhausted, allocate movable pages. > > > > 2. interleaved allocation: try to allocate specific amounts of memory > > > > from cma reserved memory and then allocate from free movable memory. > > > > > > I love this idea but when I see the code, I don't like that. > > > In allocation path, just try to allocate pages by round-robin so it's role > > > of allocator. If one of migratetype is full, just pass mission to reclaimer > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that > > > reclaimer can filter it out during page scanning. > > > We already have an tool to achieve it(ie, isolate_mode_t). > > > > Hello, > > > > I agree with leaving fast allocation path as simple as possible. > > I will remove runtime computation for determining ratio in > > __rmqueue_cma() and, instead, will use pre-computed value calculated > > on the other path. > > Sounds good. > > > > > I am not sure that whether your second suggestion(Hey relaimer part) > > is good or not. In my quick thought, that could be helpful in the > > situation that many free cma pages remained. But, it would be not helpful > > when there are neither free movable and cma pages. In generally, most > > workloads mainly uses movable pages for page cache or anonymous mapping. > > Although reclaim is triggered by non-movable allocation failure, reclaimed > > pages are used mostly by movable allocation. We can handle these allocation > > request even if we reclaim the pages just in lru order. If we rotate > > the lru list for finding movable pages, it could cause more useful > > pages to be evicted. > > > > This is just my quick thought, so please let me correct if I am wrong. > > Why should reclaimer reclaim unnecessary pages? > So, your answer is that it would be better because upcoming newly allocated > pages would be allocated easily without interrupt. 
But it could reclaim
> too many pages until the watermark for unmovable allocation is okay.
> Sometimes you might even see OOM.
>
> Moreover, how could you handle the current trouble?
> For example, there is an atomic allocation and the only thing to save the world
> is kswapd, because that is one of kswapd's roles, but kswapd is spending much time
> reclaiming CMA pages, which is pointless, so the allocation would easily fail.

Hello,

I guess that it isn't a problem. In the lru, movable pages and cma pages
would be interleaved, so it doesn't take too long to get a page
for non-movable allocation.

IMHO, in general, memory shortage is caused by movable allocation, so
distinguishing allocation types and handling them differently has only a
marginal effect.

Anyway, I will think about it more deeply.

>
> >
> > >
> > > And couldn't we do it in zone_watermark_ok with set/reset ALLOC_CMA?
> > > If possible, it would be better because it's a generic function to check
> > > free pages and trigger reclaim/compaction logic.
> >
> > I guess, your *it* means ratio computation. Right?
>
> I meant just get_page_from_freelist like fair zone allocation for consistency
> but as we discussed offline, I'm not against you if it's not the right place.

Okay :)

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pb0-f49.google.com (mail-pb0-f49.google.com [209.85.160.49]) by kanga.kvack.org (Postfix) with ESMTP id EAD3A6B0036 for ; Sun, 18 May 2014 22:10:07 -0400 (EDT)
Received: by mail-pb0-f49.google.com with SMTP id jt11so5116106pbb.36 for ; Sun, 18 May 2014 19:10:07 -0700 (PDT)
Received: from lgeamrelo04.lge.com (lgeamrelo04.lge.com.
[156.147.1.127]) by mx.google.com with ESMTP id dh1si8692743pbc.198.2014.05.18.19.10.06 for ; Sun, 18 May 2014 19:10:07 -0700 (PDT)
Date: Mon, 19 May 2014 11:12:34 +0900
From: Joonsoo Kim
Subject: Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory
Message-ID: <20140519021234.GB19615@js1304-P5Q-DELUXE>
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <536CCC78.6050806@samsung.com> <20140513022603.GF23803@js1304-P5Q-DELUXE> <8738gcae4h.fsf@linux.vnet.ibm.com> <20140515021055.GC10116@js1304-P5Q-DELUXE> <20140515094718.GE23991@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20140515094718.GE23991@suse.de>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: "Aneesh Kumar K.V" , Marek Szyprowski , Andrew Morton , Rik van Riel , Johannes Weiner , Laura Abbott , Minchan Kim , Heesub Shin , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kyungmin Park , Bartlomiej Zolnierkiewicz , 'Tomasz Stanislawski'

On Thu, May 15, 2014 at 10:47:18AM +0100, Mel Gorman wrote:
> On Thu, May 15, 2014 at 11:10:55AM +0900, Joonsoo Kim wrote:
> > > That doesn't always prefer CMA region. It would be nice to
> > > understand why grouping in pageblock_nr_pages is beneficial. Also in
> > > your patch you decrement nr_try_cma for every 'order' allocation. Why ?
> >
> > pageblock_nr_pages is just magic value with no rationale. :)
>
> I'm not following this discussion closely but there is a rationale for that
> value -- it's the size of a huge page for that architecture. At the time
> the fragmentation avoidance was implemented this was the largest allocation
> size of interest.

Hello,

Indeed, there is a good rationale.
Thank you very much for pointing it out.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
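
To make the rationale above concrete, here is a small sketch of how a pageblock size follows from a huge page size. The shift values (4 KiB base pages, 2 MiB huge pages) and every identifier ending in _EXAMPLE or _example are assumptions chosen for illustration; they are not taken from this thread and differ between architectures and configurations.

```c
/* Assumed example values: 4 KiB base pages and 2 MiB huge pages. */
#define PAGE_SHIFT_EXAMPLE   12
#define HPAGE_SHIFT_EXAMPLE  21

/* Order of one pageblock: how many times a base page doubles to reach a huge page. */
static unsigned int pageblock_order_example(void)
{
	return HPAGE_SHIFT_EXAMPLE - PAGE_SHIFT_EXAMPLE;
}

/* Number of base pages grouped into one pageblock. */
static unsigned long pageblock_nr_pages_example(void)
{
	return 1UL << pageblock_order_example();
}
```

Under these assumed values one pageblock covers 512 base pages, that is, exactly one huge page.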
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f49.google.com (mail-pb0-f49.google.com [209.85.160.49]) by kanga.kvack.org (Postfix) with ESMTP id E5A4E6B0036 for ; Sun, 18 May 2014 22:26:55 -0400 (EDT) Received: by mail-pb0-f49.google.com with SMTP id jt11so5133058pbb.36 for ; Sun, 18 May 2014 19:26:55 -0700 (PDT) Received: from lgemrelse7q.lge.com (LGEMRELSE7Q.lge.com. [156.147.1.151]) by mx.google.com with ESMTP id xq7si697530pab.27.2014.05.18.19.26.53 for ; Sun, 18 May 2014 19:26:55 -0700 (PDT) Date: Mon, 19 May 2014 11:29:23 +0900 From: Joonsoo Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140519022922.GC19615@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <8761l8ah04.fsf@linux.vnet.ibm.com> <20140515015842.GB10116@js1304-P5Q-DELUXE> <87lhtzng53.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87lhtzng53.fsf@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Sun, May 18, 2014 at 11:06:08PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > On Wed, May 14, 2014 at 02:12:19PM +0530, Aneesh Kumar K.V wrote: > >> Joonsoo Kim writes: > >> > >> > >> > >> Another issue i am facing with the current code is the atomic allocation > >> failing even with large number of CMA pages around. In my case we never > >> reclaimed because large part of the memory is consumed by the page cache and > >> for that, free memory check doesn't include at free_cma. I will test > >> with this patchset and update here once i have the results. 
> >> > > > > Hello, > > > > Could you elaborate more on your issue? > > I can't completely understand your problem. > > So your atomic allocation is movable? And although there are many free > > cma pages, that request is fail? > > > > non movable atomic allocations are failing because we don't have > anything other than CMA pages left and kswapd is yet to catchup ? > > > swapper/0: page allocation failure: order:0, mode:0x20 > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.23-1500.pkvm2_1.5.ppc64 #1 > Call Trace: > [c000000ffffcb610] [c000000000017330] .show_stack+0x130/0x200 (unreliable) > [c000000ffffcb6e0] [c00000000087a8c8] .dump_stack+0x28/0x3c > [c000000ffffcb750] [c0000000001e06f0] .warn_alloc_failed+0x110/0x160 > [c000000ffffcb800] [c0000000001e5984] .__alloc_pages_nodemask+0x9d4/0xbf0 > [c000000ffffcb9e0] [c00000000023775c] .alloc_pages_current+0xcc/0x1b0 > [c000000ffffcba80] [c0000000007098d4] .__netdev_alloc_frag+0x1a4/0x1d0 > [c000000ffffcbb20] [c00000000070d750] .__netdev_alloc_skb+0xc0/0x130 > [c000000ffffcbbb0] [d000000009639b40] .tg3_poll_work+0x900/0x1110 [tg3] > [c000000ffffcbd10] [d00000000963a3a4] .tg3_poll_msix+0x54/0x200 [tg3] > [c000000ffffcbdb0] [c00000000071fcec] .net_rx_action+0x1dc/0x310 > [c000000ffffcbe90] [c0000000000c1b08] .__do_softirq+0x158/0x330 > [c000000ffffcbf90] [c000000000025744] .call_do_softirq+0x14/0x24 > [c000000ffffc7e00] [c000000000011684] .do_softirq+0xf4/0x130 > [c000000ffffc7e90] [c0000000000c1f18] .irq_exit+0xc8/0x110 > [c000000ffffc7f10] [c000000000011258] .__do_irq+0xc8/0x1f0 > [c000000ffffc7f90] [c000000000025768] .call_do_irq+0x14/0x24 > [c00000000137b750] [c00000000001142c] .do_IRQ+0xac/0x130 > [c00000000137b800] [c000000000002a64] > hardware_interrupt_common+0x164/0x180 > > .... 
>
>
> Node 0 DMA: 408*64kB (C) 408*128kB (C) 408*256kB (C) 408*512kB (C) 408*1024kB (C) 406*2048kB (C) 199*4096kB (C) 97*8192kB (C) 6*16384kB (C) =
> 3348992kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16384kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB
>
> meminfo details:
>
> MemTotal:       65875584 kB
> MemFree:         8001856 kB
> Buffers:        49330368 kB
> Cached:           178752 kB
> SwapCached:            0 kB
> Active:         28550464 kB
> Inactive:       25476416 kB
> Active(anon):    3771008 kB
> Inactive(anon):   767360 kB
> Active(file):   24779456 kB
> Inactive(file): 24709056 kB
> Unevictable:       15104 kB
> Mlocked:           15104 kB
> SwapTotal:       8384448 kB
> SwapFree:        8384448 kB
> Dirty:                 0 kB
>
> -aneesh
>

Hello,

I think that the third patch in this patchset would solve this problem.
Your problem may occur in the following scenario.

1. Free unmovable and reclaimable pages are nearly exhausted.
2. There are some free movable pages, so watermark checking is ok.
3. A lot of movable allocations are requested.
4. Most of the movable pages are allocated.
5. But watermark checking is still ok, because we have a lot of
   free cma pages and this allocation is for the movable type.
   kswapd is not woken up.
6. A non-movable atomic allocation is requested => fail

So, the problem is in step #5. Although we have enough pages for the
movable type, we should prepare for allocation requests of the other types.
With my third patch, kswapd could be woken by movable allocations, so
your problem would disappear.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
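
The scenario in steps 1 to 6 can be modelled in a few lines of userspace C. This is a simplified sketch under stated assumptions, not the kernel's watermark code: struct zone_model, watermark_ok() and the page counts used below are hypothetical and exist only to show why the check passes for a movable request (step 5) while a non-movable request finds almost nothing usable (step 6).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified zone state; all values are page counts. */
struct zone_model {
	long free_pages;     /* all free pages, including free CMA pages */
	long free_cma_pages; /* free pages inside the CMA reserved region */
	long min_watermark;  /* pages that must stay free */
};

/*
 * Simplified watermark check. A request that cannot be served from
 * CMA memory must not count free CMA pages as free memory.
 */
static bool watermark_ok(const struct zone_model *z, bool can_use_cma)
{
	long usable;

	assert(z != NULL);
	usable = z->free_pages;
	if (!can_use_cma)
		usable -= z->free_cma_pages;
	return usable > z->min_watermark;
}
```

With free_pages = 1050, free_cma_pages = 1000 and min_watermark = 100, watermark_ok() returns true for a request that may use CMA pages, so nothing wakes kswapd, and false for a request that may not, which only sees 50 usable pages. Treating free CMA pages as non-free for every request, as the third patch's title describes, would make the first check fail as well and trigger reclaim earlier.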
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f44.google.com (mail-pa0-f44.google.com [209.85.220.44]) by kanga.kvack.org (Postfix) with ESMTP id B3DB66B0036 for ; Sun, 18 May 2014 22:50:26 -0400 (EDT) Received: by mail-pa0-f44.google.com with SMTP id ld10so5115987pab.3 for ; Sun, 18 May 2014 19:50:26 -0700 (PDT) Received: from lgeamrelo01.lge.com (lgeamrelo01.lge.com. [156.147.1.125]) by mx.google.com with ESMTP id ek4si8738165pbc.210.2014.05.18.19.50.23 for ; Sun, 18 May 2014 19:50:25 -0700 (PDT) Date: Mon, 19 May 2014 11:53:05 +0900 From: Minchan Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140519025305.GA13248@bbox> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <20140515024353.GA27599@bbox> <20140519021121.GA19615@js1304-P5Q-DELUXE> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140519021121.GA19615@js1304-P5Q-DELUXE> Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski On Mon, May 19, 2014 at 11:11:21AM +0900, Joonsoo Kim wrote: > On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote: > > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote: > > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > > > > Hey Joonsoo, > > > > > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > > > > CMA is introduced to provide physically contiguous pages at runtime. > > > > > For this purpose, it reserves memory at boot time. 
Although it reserve > > > > > memory, this reserved memory can be used for movable memory allocation > > > > > request. This usecase is beneficial to the system that needs this CMA > > > > > reserved memory infrequently and it is one of main purpose of > > > > > introducing CMA. > > > > > > > > > > But, there is a problem in current implementation. The problem is that > > > > > it works like as just reserved memory approach. The pages on cma reserved > > > > > memory are hardly used for movable memory allocation. This is caused by > > > > > combination of allocation and reclaim policy. > > > > > > > > > > The pages on cma reserved memory are allocated if there is no movable > > > > > memory, that is, as fallback allocation. So the time this fallback > > > > > allocation is started is under heavy memory pressure. Although it is under > > > > > memory pressure, movable allocation easily succeed, since there would be > > > > > many pages on cma reserved memory. But this is not the case for unmovable > > > > > and reclaimable allocation, because they can't use the pages on cma > > > > > reserved memory. These allocations regard system's free memory as > > > > > (free pages - free cma pages) on watermark checking, that is, free > > > > > unmovable pages + free reclaimable pages + free movable pages. Because > > > > > we already exhausted movable pages, only free pages we have are unmovable > > > > > and reclaimable types and this would be really small amount. So watermark > > > > > checking would be failed. It will wake up kswapd to make enough free > > > > > memory for unmovable and reclaimable allocation and kswapd will do. > > > > > So before we fully utilize pages on cma reserved memory, kswapd start to > > > > > reclaim memory and try to make free memory over the high watermark. This > > > > > watermark checking by kswapd doesn't take care free cma pages so many > > > > > movable pages would be reclaimed. 
After then, we have a lot of movable > > > > > pages again, so fallback allocation doesn't happen again. To conclude, > > > > > amount of free memory on meminfo which includes free CMA pages is moving > > > > > around 512 MB if I reserve 512 MB memory for CMA. > > > > > > > > > > I found this problem on following experiment. > > > > > > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > > > > make -j24 > > > > > > > > > > CMA reserve: 0 MB 512 MB > > > > > Elapsed-time: 234.8 361.8 > > > > > Average-MemFree: 283880 KB 530851 KB > > > > > > > > > > To solve this problem, I can think following 2 possible solutions. > > > > > 1. allocate the pages on cma reserved memory first, and if they are > > > > > exhausted, allocate movable pages. > > > > > 2. interleaved allocation: try to allocate specific amounts of memory > > > > > from cma reserved memory and then allocate from free movable memory. > > > > > > > > I love this idea but when I see the code, I don't like that. > > > > In allocation path, just try to allocate pages by round-robin so it's role > > > > of allocator. If one of migratetype is full, just pass mission to reclaimer > > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail > > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that > > > > reclaimer can filter it out during page scanning. > > > > We already have an tool to achieve it(ie, isolate_mode_t). > > > > > > Hello, > > > > > > I agree with leaving fast allocation path as simple as possible. > > > I will remove runtime computation for determining ratio in > > > __rmqueue_cma() and, instead, will use pre-computed value calculated > > > on the other path. > > > > Sounds good. > > > > > > > > I am not sure that whether your second suggestion(Hey relaimer part) > > > is good or not. In my quick thought, that could be helpful in the > > > situation that many free cma pages remained. But, it would be not helpful > > > when there are neither free movable and cma pages. 
In generally, most > > > workloads mainly uses movable pages for page cache or anonymous mapping. > > > Although reclaim is triggered by non-movable allocation failure, reclaimed > > > pages are used mostly by movable allocation. We can handle these allocation > > > request even if we reclaim the pages just in lru order. If we rotate > > > the lru list for finding movable pages, it could cause more useful > > > pages to be evicted. > > > > > > This is just my quick thought, so please let me correct if I am wrong. > > > > Why should reclaimer reclaim unnecessary pages? > > So, your answer is that it would be better because upcoming newly allocated > > pages would be allocated easily without interrupt. But it could reclaim > > too much pages until watermark for unmovable allocation is okay. > > Even, sometime, you might see OOM. > > > > Moreover, how could you handle current trobule? > > For example, there is atomic allocation and the only thing to save the world > > is kswapd because it's one of kswapd role but kswapd is spending many time to > > reclaim CMA pages, which is pointless so the allocation would be easily failed. > > Hello, > > I guess that it isn't the problem. In lru, movable pages and cma pages > would be interleaved. So it doesn't takes too long time to get the > page for non-movable allocation. Please, don't assume there are ideal LRU ordering. Newly allocated page by fairness allocation is located by head of LRU while old pages are approaching the tail so there is huge time gab. During the time, old pages could be dropped/promoting so one of side could be filled with one type rather than interleaving both types pages you expected. Additionally, if you uses syncable backed device like ramdisk/zram or something, pageout can be synchronized with page I/O. In this case, reclaim time wouldn't be trivial than async I/O. For exmaple, zram-swap case, it needs page copy + comperssion and the speed depends on your CPU speed. 
>
> IMHO, in generally, memory shortage is made by movable allocation, so
> to distinguish allocation type and to handle them differently has
> marginal effect.

Again, please don't consider only the workloads you know. Keep the
design open to various possibilities, as long as such consideration
doesn't make the code ugly.

>
> Anyway, I will think more deeply.

Yes, please.

>
> >
> > >
> > > >
> > > > And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
> > > > If possible, it would be better because it's generic function to check
> > > > free pages and cause trigger reclaim/compaction logic.
> > >
> > > I guess, your *it* means ratio computation. Right?
> >
> > I meant just get_page_from_freelist like fair zone allocation for consistency
> > but as we discussed offline, i'm not against with you if it's not right place.
>
> Okay :)
>
> Thanks.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org

--
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f52.google.com (mail-pa0-f52.google.com [209.85.220.52]) by kanga.kvack.org (Postfix) with ESMTP id 1AC026B0036 for ; Mon, 19 May 2014 00:47:34 -0400 (EDT) Received: by mail-pa0-f52.google.com with SMTP id fa1so5248456pad.11 for ; Sun, 18 May 2014 21:47:34 -0700 (PDT) Received: from lgemrelse7q.lge.com (LGEMRELSE7Q.lge.com.
[156.147.1.151]) by mx.google.com with ESMTP id be6si17963103pac.1.2014.05.18.21.47.32 for ; Sun, 18 May 2014 21:47:33 -0700 (PDT) Date: Mon, 19 May 2014 13:50:01 +0900 From: Joonsoo Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140519045001.GA23916@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <20140515024353.GA27599@bbox> <20140519021121.GA19615@js1304-P5Q-DELUXE> <20140519025305.GA13248@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140519025305.GA13248@bbox> Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski On Mon, May 19, 2014 at 11:53:05AM +0900, Minchan Kim wrote: > On Mon, May 19, 2014 at 11:11:21AM +0900, Joonsoo Kim wrote: > > On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote: > > > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote: > > > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > > > > > Hey Joonsoo, > > > > > > > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > > > > > CMA is introduced to provide physically contiguous pages at runtime. > > > > > > For this purpose, it reserves memory at boot time. Although it reserve > > > > > > memory, this reserved memory can be used for movable memory allocation > > > > > > request. This usecase is beneficial to the system that needs this CMA > > > > > > reserved memory infrequently and it is one of main purpose of > > > > > > introducing CMA. > > > > > > > > > > > > But, there is a problem in current implementation. 
The problem is that > > > > > > it works like as just reserved memory approach. The pages on cma reserved > > > > > > memory are hardly used for movable memory allocation. This is caused by > > > > > > combination of allocation and reclaim policy. > > > > > > > > > > > > The pages on cma reserved memory are allocated if there is no movable > > > > > > memory, that is, as fallback allocation. So the time this fallback > > > > > > allocation is started is under heavy memory pressure. Although it is under > > > > > > memory pressure, movable allocation easily succeed, since there would be > > > > > > many pages on cma reserved memory. But this is not the case for unmovable > > > > > > and reclaimable allocation, because they can't use the pages on cma > > > > > > reserved memory. These allocations regard system's free memory as > > > > > > (free pages - free cma pages) on watermark checking, that is, free > > > > > > unmovable pages + free reclaimable pages + free movable pages. Because > > > > > > we already exhausted movable pages, only free pages we have are unmovable > > > > > > and reclaimable types and this would be really small amount. So watermark > > > > > > checking would be failed. It will wake up kswapd to make enough free > > > > > > memory for unmovable and reclaimable allocation and kswapd will do. > > > > > > So before we fully utilize pages on cma reserved memory, kswapd start to > > > > > > reclaim memory and try to make free memory over the high watermark. This > > > > > > watermark checking by kswapd doesn't take care free cma pages so many > > > > > > movable pages would be reclaimed. After then, we have a lot of movable > > > > > > pages again, so fallback allocation doesn't happen again. To conclude, > > > > > > amount of free memory on meminfo which includes free CMA pages is moving > > > > > > around 512 MB if I reserve 512 MB memory for CMA. > > > > > > > > > > > > I found this problem on following experiment. 
> > > > > > > > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > > > > > make -j24 > > > > > > > > > > > > CMA reserve: 0 MB 512 MB > > > > > > Elapsed-time: 234.8 361.8 > > > > > > Average-MemFree: 283880 KB 530851 KB > > > > > > > > > > > > To solve this problem, I can think following 2 possible solutions. > > > > > > 1. allocate the pages on cma reserved memory first, and if they are > > > > > > exhausted, allocate movable pages. > > > > > > 2. interleaved allocation: try to allocate specific amounts of memory > > > > > > from cma reserved memory and then allocate from free movable memory. > > > > > > > > > > I love this idea but when I see the code, I don't like that. > > > > > In allocation path, just try to allocate pages by round-robin so it's role > > > > > of allocator. If one of migratetype is full, just pass mission to reclaimer > > > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail > > > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that > > > > > reclaimer can filter it out during page scanning. > > > > > We already have an tool to achieve it(ie, isolate_mode_t). > > > > > > > > Hello, > > > > > > > > I agree with leaving fast allocation path as simple as possible. > > > > I will remove runtime computation for determining ratio in > > > > __rmqueue_cma() and, instead, will use pre-computed value calculated > > > > on the other path. > > > > > > Sounds good. > > > > > > > > > > > I am not sure that whether your second suggestion(Hey relaimer part) > > > > is good or not. In my quick thought, that could be helpful in the > > > > situation that many free cma pages remained. But, it would be not helpful > > > > when there are neither free movable and cma pages. In generally, most > > > > workloads mainly uses movable pages for page cache or anonymous mapping. > > > > Although reclaim is triggered by non-movable allocation failure, reclaimed > > > > pages are used mostly by movable allocation. 
We can handle these allocation
> > > > request even if we reclaim the pages just in lru order. If we rotate
> > > > the lru list for finding movable pages, it could cause more useful
> > > > pages to be evicted.
> > > >
> > > > This is just my quick thought, so please let me correct if I am wrong.
> > >
> > > Why should reclaimer reclaim unnecessary pages?
> > > So, your answer is that it would be better because upcoming newly allocated
> > > pages would be allocated easily without interrupt. But it could reclaim
> > > too much pages until watermark for unmovable allocation is okay.
> > > Even, sometime, you might see OOM.
> > >
> > > Moreover, how could you handle current trouble?
> > > For example, there is atomic allocation and the only thing to save the world
> > > is kswapd because it's one of kswapd role but kswapd is spending many time to
> > > reclaim CMA pages, which is pointless so the allocation would be easily failed.
> >
> > Hello,
> >
> > I guess that it isn't the problem. In lru, movable pages and cma pages
> > would be interleaved. So it doesn't takes too long time to get the
> > page for non-movable allocation.
>
> Please, don't assume there are ideal LRU ordering.
> Newly allocated page by fairness allocation is located by head of LRU
> while old pages are approaching the tail so there is huge time gap.
> During the time, old pages could be dropped/promoting so one of side
> could be filled with one type rather than interleaving both types pages
> you expected.

I assumed the general case, not an ideal case.
Your example is possible, but it would be a corner case.

>
> Additionally, if you uses syncable backed device like ramdisk/zram
> or something, pageout can be synchronized with page I/O.
> In this case, reclaim time wouldn't be trivial than async I/O.
> For example, zram-swap case, it needs page copy + compression and
> the speed depends on your CPU speed.

This is a general problem that zram-swap has, although reclaiming cma
pages worsens the situation.
Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f45.google.com (mail-pa0-f45.google.com [209.85.220.45]) by kanga.kvack.org (Postfix) with ESMTP id 03AE06B0036 for ; Mon, 19 May 2014 01:52:59 -0400 (EDT) Received: by mail-pa0-f45.google.com with SMTP id ey11so5289740pad.32 for ; Sun, 18 May 2014 22:52:59 -0700 (PDT) Received: from lgeamrelo02.lge.com (lgeamrelo02.lge.com. [156.147.1.126]) by mx.google.com with ESMTP id wp2si18116305pab.65.2014.05.18.22.52.57 for ; Sun, 18 May 2014 22:52:59 -0700 (PDT) Date: Mon, 19 May 2014 14:55:27 +0900 From: Joonsoo Kim Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value Message-ID: <20140519055527.GA24099@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <537962A0.4090600@lge.com> Sender: owner-linux-mm@kvack.org List-ID: To: Gioh Kim Cc: Michal Nazarewicz , Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?utf-8?B?7J206rG07Zi4?= , gurugio@gmail.com On Mon, May 19, 2014 at 10:47:12AM +0900, Gioh Kim wrote: > Thank you for your advice. I didn't notice it. > > I'm adding followings according to your advice: > > - range restrict for CMA_SIZE_MBYTES and *CMA_SIZE_PERCENTAGE* > I think this can prevent the wrong kernel option. 
>
> - change size_cmdline into default value SZ_16M
> I am not sure this can prevent if cma=0 cmdline option is also with base and limit options.

Hello,

I think that this problem originates from atomic_pool_init().
If the configured coherent_pool size is larger than the default cma size,
it can fail even if this patch is applied.

How about the patch below?
It falls back to the other allocator if the CMA allocation fails.

Thanks.

-----------------8<---------------------
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6b00be1..2909ab9 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -379,7 +379,7 @@ static int __init atomic_pool_init(void)
 	unsigned long *bitmap;
 	struct page *page;
 	struct page **pages;
-	void *ptr;
+	void *ptr = NULL;
 	int bitmap_size = BITS_TO_LONGS(nr_pages) * sizeof(long);
 
 	bitmap = kzalloc(bitmap_size, GFP_KERNEL);
@@ -393,7 +393,7 @@ static int __init atomic_pool_init(void)
 	if (IS_ENABLED(CONFIG_DMA_CMA))
 		ptr = __alloc_from_contiguous(NULL, pool->size, prot, &page,
 					      atomic_pool_init);
-	else
+	if (!ptr)
 		ptr = __alloc_remap_buffer(NULL, pool->size, gfp, prot, &page,
 					   atomic_pool_init);
 	if (ptr) {

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f43.google.com (mail-pb0-f43.google.com [209.85.160.43]) by kanga.kvack.org (Postfix) with ESMTP id 7D5256B0036 for ; Mon, 19 May 2014 05:14:22 -0400 (EDT) Received: by mail-pb0-f43.google.com with SMTP id up15so5620715pbc.2 for ; Mon, 19 May 2014 02:14:22 -0700 (PDT) Received: from lgemrelse6q.lge.com (LGEMRELSE6Q.lge.com.
[156.147.1.121]) by mx.google.com with ESMTP id hu10si9268017pbc.358.2014.05.19.02.14.20 for ; Mon, 19 May 2014 02:14:21 -0700 (PDT) Message-ID: <5379CB66.7090607@lge.com> Date: Mon, 19 May 2014 18:14:14 +0900 From: Gioh Kim MIME-Version: 1.0 Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE> In-Reply-To: <20140519055527.GA24099@js1304-P5Q-DELUXE> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim Cc: Michal Nazarewicz , Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?UTF-8?B?7J206rG07Zi4?= , gurugio@gmail.com

In the __dma_alloc function, your patch makes __alloc_from_pool work,
but __alloc_from_contiguous still doesn't work.
Therefore __dma_alloc sometimes works and sometimes doesn't, depending
on the gfp (__GFP_WAIT) flag. Do I understand correctly?

I think __dma_alloc should work consistently.
Either both __alloc_from_contiguous and __alloc_from_pool should work,
or neither of them should.

On 2014-05-19 2:55 PM, Joonsoo Kim wrote:
> On Mon, May 19, 2014 at 10:47:12AM +0900, Gioh Kim wrote:
>> Thank you for your advice. I didn't notice it.
>>
>> I'm adding followings according to your advice:
>>
>> - range restrict for CMA_SIZE_MBYTES and *CMA_SIZE_PERCENTAGE*
>> I think this can prevent the wrong kernel option.
>>
>> - change size_cmdline into default value SZ_16M
>> I am not sure this can prevent if cma=0 cmdline option is also with base and limit options.
> > Hello, > > I think that this problem is originated from atomic_pool_init(). > If configured coherent_pool size is larger than default cma size, > it can be failed even if this patch is applied. > > How about below patch? > It uses fallback allocation if CMA is failed. > > Thanks. > > -----------------8<--------------------- > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c > index 6b00be1..2909ab9 100644 > --- a/arch/arm/mm/dma-mapping.c > +++ b/arch/arm/mm/dma-mapping.c > @@ -379,7 +379,7 @@ static int __init atomic_pool_init(void) > unsigned long *bitmap; > struct page *page; > struct page **pages; > - void *ptr; > + void *ptr = NULL; > int bitmap_size = BITS_TO_LONGS(nr_pages) * sizeof(long); > > bitmap = kzalloc(bitmap_size, GFP_KERNEL); > @@ -393,7 +393,7 @@ static int __init atomic_pool_init(void) > if (IS_ENABLED(CONFIG_DMA_CMA)) > ptr = __alloc_from_contiguous(NULL, pool->size, prot, &page, > atomic_pool_init); > - else > + if (!ptr) > ptr = __alloc_remap_buffer(NULL, pool->size, gfp, prot, &page, > atomic_pool_init); > if (ptr) { > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f41.google.com (mail-pa0-f41.google.com [209.85.220.41]) by kanga.kvack.org (Postfix) with ESMTP id 04E676B0038 for ; Mon, 19 May 2014 15:59:33 -0400 (EDT) Received: by mail-pa0-f41.google.com with SMTP id lj1so6250102pab.28 for ; Mon, 19 May 2014 12:59:33 -0700 (PDT) Received: from mail-pa0-x236.google.com (mail-pa0-x236.google.com [2607:f8b0:400e:c03::236]) by mx.google.com with ESMTPS id yv2si20936517pac.23.2014.05.19.12.59.31 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 19 May 2014 12:59:31 -0700 (PDT) Received: by mail-pa0-f54.google.com with SMTP id bj1so6188997pad.41 for ; Mon, 19 May 2014 12:59:31 -0700 (PDT) From: Michal Nazarewicz Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value In-Reply-To: <20140519055527.GA24099@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE> Date: Mon, 19 May 2014 09:59:22 -1000 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Gioh Kim Cc: Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?utf-8?B?7J206rG07Zi4?= , gurugio@gmail.com --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Sun, May 18 2014, Joonsoo Kim wrote: > I think that this problem is originated from atomic_pool_init(). > If configured coherent_pool size is larger than default cma size, > it can be failed even if this patch is applied. > > How about below patch? 
> It uses fallback allocation if CMA is failed.

Yes, I thought about it, but __dma_alloc uses similar code:

	else if (!IS_ENABLED(CONFIG_DMA_CMA))
		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller);
	else
		addr = __alloc_from_contiguous(dev, size, prot, &page, caller);

so it probably needs to be changed as well.

> -----------------8<---------------------
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 6b00be1..2909ab9 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -379,7 +379,7 @@ static int __init atomic_pool_init(void)
>  	unsigned long *bitmap;
>  	struct page *page;
>  	struct page **pages;
> -	void *ptr;
> +	void *ptr = NULL;
>  	int bitmap_size = BITS_TO_LONGS(nr_pages) * sizeof(long);
>  
>  	bitmap = kzalloc(bitmap_size, GFP_KERNEL);
> @@ -393,7 +393,7 @@ static int __init atomic_pool_init(void)
>  	if (IS_ENABLED(CONFIG_DMA_CMA))
>  		ptr = __alloc_from_contiguous(NULL, pool->size, prot, &page,
>  					      atomic_pool_init);
> -	else
> +	if (!ptr)
>  		ptr = __alloc_remap_buffer(NULL, pool->size, gfp, prot, &page,
>  					   atomic_pool_init);
>  	if (ptr) {
>

--
Best regards, _ _
.o.
| Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +------ooO--(_)--Ooo--
--=-=-=--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f53.google.com (mail-pa0-f53.google.com [209.85.220.53]) by kanga.kvack.org (Postfix) with ESMTP id 23FBA6B0037 for ; Mon, 19 May 2014 19:16:15 -0400 (EDT) Received: by mail-pa0-f53.google.com with SMTP id kp14so6392198pab.26 for ; Mon, 19 May 2014 16:16:14 -0700 (PDT) Received: from lgeamrelo04.lge.com (lgeamrelo04.lge.com.
[156.147.1.127]) by mx.google.com with ESMTP id ab2si21441510pad.96.2014.05.19.16.16.13 for ; Mon, 19 May 2014 16:16:14 -0700 (PDT) Date: Tue, 20 May 2014 08:18:59 +0900 From: Minchan Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140519231859.GA21636@bbox> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <20140515024353.GA27599@bbox> <20140519021121.GA19615@js1304-P5Q-DELUXE> <20140519025305.GA13248@bbox> <20140519045001.GA23916@js1304-P5Q-DELUXE> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140519045001.GA23916@js1304-P5Q-DELUXE> Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski On Mon, May 19, 2014 at 01:50:01PM +0900, Joonsoo Kim wrote: > On Mon, May 19, 2014 at 11:53:05AM +0900, Minchan Kim wrote: > > On Mon, May 19, 2014 at 11:11:21AM +0900, Joonsoo Kim wrote: > > > On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote: > > > > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote: > > > > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > > > > > > Hey Joonsoo, > > > > > > > > > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > > > > > > CMA is introduced to provide physically contiguous pages at runtime. > > > > > > > For this purpose, it reserves memory at boot time. Although it reserve > > > > > > > memory, this reserved memory can be used for movable memory allocation > > > > > > > request. 
This usecase is beneficial to the system that needs this CMA > > > > > > > reserved memory infrequently and it is one of main purpose of > > > > > > > introducing CMA. > > > > > > > > > > > > > > But, there is a problem in current implementation. The problem is that > > > > > > > it works like as just reserved memory approach. The pages on cma reserved > > > > > > > memory are hardly used for movable memory allocation. This is caused by > > > > > > > combination of allocation and reclaim policy. > > > > > > > > > > > > > > The pages on cma reserved memory are allocated if there is no movable > > > > > > > memory, that is, as fallback allocation. So the time this fallback > > > > > > > allocation is started is under heavy memory pressure. Although it is under > > > > > > > memory pressure, movable allocation easily succeed, since there would be > > > > > > > many pages on cma reserved memory. But this is not the case for unmovable > > > > > > > and reclaimable allocation, because they can't use the pages on cma > > > > > > > reserved memory. These allocations regard system's free memory as > > > > > > > (free pages - free cma pages) on watermark checking, that is, free > > > > > > > unmovable pages + free reclaimable pages + free movable pages. Because > > > > > > > we already exhausted movable pages, only free pages we have are unmovable > > > > > > > and reclaimable types and this would be really small amount. So watermark > > > > > > > checking would be failed. It will wake up kswapd to make enough free > > > > > > > memory for unmovable and reclaimable allocation and kswapd will do. > > > > > > > So before we fully utilize pages on cma reserved memory, kswapd start to > > > > > > > reclaim memory and try to make free memory over the high watermark. This > > > > > > > watermark checking by kswapd doesn't take care free cma pages so many > > > > > > > movable pages would be reclaimed. 
After then, we have a lot of movable > > > > > > > pages again, so fallback allocation doesn't happen again. To conclude, > > > > > > > amount of free memory on meminfo which includes free CMA pages is moving > > > > > > > around 512 MB if I reserve 512 MB memory for CMA. > > > > > > > > > > > > > > I found this problem on following experiment. > > > > > > > > > > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > > > > > > make -j24 > > > > > > > > > > > > > > CMA reserve: 0 MB 512 MB > > > > > > > Elapsed-time: 234.8 361.8 > > > > > > > Average-MemFree: 283880 KB 530851 KB > > > > > > > > > > > > > > To solve this problem, I can think following 2 possible solutions. > > > > > > > 1. allocate the pages on cma reserved memory first, and if they are > > > > > > > exhausted, allocate movable pages. > > > > > > > 2. interleaved allocation: try to allocate specific amounts of memory > > > > > > > from cma reserved memory and then allocate from free movable memory. > > > > > > > > > > > > I love this idea but when I see the code, I don't like that. > > > > > > In allocation path, just try to allocate pages by round-robin so it's role > > > > > > of allocator. If one of migratetype is full, just pass mission to reclaimer > > > > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail > > > > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that > > > > > > reclaimer can filter it out during page scanning. > > > > > > We already have an tool to achieve it(ie, isolate_mode_t). > > > > > > > > > > Hello, > > > > > > > > > > I agree with leaving fast allocation path as simple as possible. > > > > > I will remove runtime computation for determining ratio in > > > > > __rmqueue_cma() and, instead, will use pre-computed value calculated > > > > > on the other path. > > > > > > > > Sounds good. > > > > > > > > > > > > > > I am not sure that whether your second suggestion(Hey relaimer part) > > > > > is good or not. 
In my quick thought, that could be helpful in the > > > > > situation that many free cma pages remained. But, it would be not helpful > > > > > when there are neither free movable and cma pages. In generally, most > > > > > workloads mainly uses movable pages for page cache or anonymous mapping. > > > > > Although reclaim is triggered by non-movable allocation failure, reclaimed > > > > > pages are used mostly by movable allocation. We can handle these allocation > > > > > request even if we reclaim the pages just in lru order. If we rotate > > > > > the lru list for finding movable pages, it could cause more useful > > > > > pages to be evicted. > > > > > > > > > > This is just my quick thought, so please let me correct if I am wrong. > > > > > > > > Why should reclaimer reclaim unnecessary pages? > > > > So, your answer is that it would be better because upcoming newly allocated > > > > pages would be allocated easily without interrupt. But it could reclaim > > > > too much pages until watermark for unmovable allocation is okay. > > > > Even, sometime, you might see OOM. > > > > > > > > Moreover, how could you handle current trobule? > > > > For example, there is atomic allocation and the only thing to save the world > > > > is kswapd because it's one of kswapd role but kswapd is spending many time to > > > > reclaim CMA pages, which is pointless so the allocation would be easily failed. > > > > > > Hello, > > > > > > I guess that it isn't the problem. In lru, movable pages and cma pages > > > would be interleaved. So it doesn't takes too long time to get the > > > page for non-movable allocation. > > > > Please, don't assume there are ideal LRU ordering. > > Newly allocated page by fairness allocation is located by head of LRU > > while old pages are approaching the tail so there is huge time gab. > > During the time, old pages could be dropped/promoting so one of side > > could be filled with one type rather than interleaving both types pages > > you expected. 
> > I assumed general case, not ideal case. > Your example can be possible, but would be corner case. I talked with Joonsoo yesterday and should post our conclusion for other reviewers/maintainers. It's not a corner case; it can happen depending on the zone and CMA configuration. For example, suppose there is a 330M high zone in which CMA consumes 300M while the normal movable area consumes just 30M. In that case, unmovable allocations could cause far too much unnecessary reclaim in the zone, so the conclusion we reached is that we need targeted reclaim (e.g., isolate_mode_t). But I'm not sure it should be part of this patchset. This patchset is surely an enhancement (i.e., before, it was hard to allocate pages from the CMA area, and this patchset makes that work), but it could cause the mentioned problem as a side effect, so I think we could solve that issue (i.e., too much reclaim in an unbalanced zone) in another patchset. Joonsoo, please mention this problem in the description when you respin so other MM guys can notice it and give ideas, which would help a lot. > > > > > Additionally, if you uses syncable backed device like ramdisk/zram > > or something, pageout can be synchronized with page I/O. > > In this case, reclaim time wouldn't be trivial than async I/O. > > For example, zram-swap case, it needs page copy + compression and > > the speed depends on your CPU speed. > > This is a general problem what zram-swap have, > although reclaiming cma pages worse the situation. > > Thanks. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f54.google.com (mail-pa0-f54.google.com [209.85.220.54]) by kanga.kvack.org (Postfix) with ESMTP id A5DEF6B0037 for ; Mon, 19 May 2014 19:19:30 -0400 (EDT) Received: by mail-pa0-f54.google.com with SMTP id bj1so6440395pad.41 for ; Mon, 19 May 2014 16:19:30 -0700 (PDT) Received: from lgeamrelo02.lge.com (lgeamrelo02.lge.com. [156.147.1.126]) by mx.google.com with ESMTP id gp6si21428605pac.215.2014.05.19.16.19.28 for ; Mon, 19 May 2014 16:19:29 -0700 (PDT) Date: Tue, 20 May 2014 08:22:15 +0900 From: Minchan Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140519232215.GB21636@bbox> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <53742A4B.4090901@samsung.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <53742A4B.4090901@samsung.com> Sender: owner-linux-mm@kvack.org List-ID: To: Heesub Shin Cc: Joonsoo Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Mel Gorman , Johannes Weiner , Marek Szyprowski On Thu, May 15, 2014 at 11:45:31AM +0900, Heesub Shin wrote: > Hello, > > On 05/15/2014 10:53 AM, Joonsoo Kim wrote: > >On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > >>Hey Joonsoo, > >> > >>On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > >>>CMA is introduced to provide physically contiguous pages at runtime. > >>>For this purpose, it reserves memory at boot time. Although it reserve > >>>memory, this reserved memory can be used for movable memory allocation > >>>request. 
This usecase is beneficial to the system that needs this CMA > >>>reserved memory infrequently and it is one of main purpose of > >>>introducing CMA. > >>> > >>>But, there is a problem in current implementation. The problem is that > >>>it works like as just reserved memory approach. The pages on cma reserved > >>>memory are hardly used for movable memory allocation. This is caused by > >>>combination of allocation and reclaim policy. > >>> > >>>The pages on cma reserved memory are allocated if there is no movable > >>>memory, that is, as fallback allocation. So the time this fallback > >>>allocation is started is under heavy memory pressure. Although it is under > >>>memory pressure, movable allocation easily succeed, since there would be > >>>many pages on cma reserved memory. But this is not the case for unmovable > >>>and reclaimable allocation, because they can't use the pages on cma > >>>reserved memory. These allocations regard system's free memory as > >>>(free pages - free cma pages) on watermark checking, that is, free > >>>unmovable pages + free reclaimable pages + free movable pages. Because > >>>we already exhausted movable pages, only free pages we have are unmovable > >>>and reclaimable types and this would be really small amount. So watermark > >>>checking would be failed. It will wake up kswapd to make enough free > >>>memory for unmovable and reclaimable allocation and kswapd will do. > >>>So before we fully utilize pages on cma reserved memory, kswapd start to > >>>reclaim memory and try to make free memory over the high watermark. This > >>>watermark checking by kswapd doesn't take care free cma pages so many > >>>movable pages would be reclaimed. After then, we have a lot of movable > >>>pages again, so fallback allocation doesn't happen again. To conclude, > >>>amount of free memory on meminfo which includes free CMA pages is moving > >>>around 512 MB if I reserve 512 MB memory for CMA. > >>> > >>>I found this problem on following experiment. 
> >>> > >>>4 CPUs, 1024 MB, VIRTUAL MACHINE > >>>make -j24 > >>> > >>>CMA reserve: 0 MB 512 MB > >>>Elapsed-time: 234.8 361.8 > >>>Average-MemFree: 283880 KB 530851 KB > >>> > >>>To solve this problem, I can think following 2 possible solutions. > >>>1. allocate the pages on cma reserved memory first, and if they are > >>> exhausted, allocate movable pages. > >>>2. interleaved allocation: try to allocate specific amounts of memory > >>> from cma reserved memory and then allocate from free movable memory. > >> > >>I love this idea but when I see the code, I don't like that. > >>In allocation path, just try to allocate pages by round-robin so it's role > >>of allocator. If one of migratetype is full, just pass mission to reclaimer > >>with hint(ie, Hey reclaimer, it's non-movable allocation fail > >>so there is pointless if you reclaim MIGRATE_CMA pages) so that > >>reclaimer can filter it out during page scanning. > >>We already have an tool to achieve it(ie, isolate_mode_t). > > > >Hello, > > > >I agree with leaving fast allocation path as simple as possible. > >I will remove runtime computation for determining ratio in > >__rmqueue_cma() and, instead, will use pre-computed value calculated > >on the other path. > > > >I am not sure that whether your second suggestion(Hey relaimer part) > >is good or not. In my quick thought, that could be helpful in the > >situation that many free cma pages remained. But, it would be not helpful > >when there are neither free movable and cma pages. In generally, most > >workloads mainly uses movable pages for page cache or anonymous mapping. > >Although reclaim is triggered by non-movable allocation failure, reclaimed > >pages are used mostly by movable allocation. We can handle these allocation > >request even if we reclaim the pages just in lru order. If we rotate > >the lru list for finding movable pages, it could cause more useful > >pages to be evicted. 
> > > >This is just my quick thought, so please let me correct if I am wrong. > > We have an out of tree implementation that is completely the same > with the approach Minchan said and it works, but it has definitely > some side-effects as you pointed, distorting the LRU and evicting > hot pages. I do not attach code fragments in this thread for some > reasons, but it must be easy for yourself. I am wondering if it > could help also in your case. > > Thanks, > Heesub Heesub, To be sure, did you try round-robin allocation like Joonsoo's approach, and did such an LRU churning problem happen? -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f41.google.com (mail-pa0-f41.google.com [209.85.220.41]) by kanga.kvack.org (Postfix) with ESMTP id 6F9AB6B0036 for ; Mon, 19 May 2014 20:50:18 -0400 (EDT) Received: by mail-pa0-f41.google.com with SMTP id lj1so6517186pab.14 for ; Mon, 19 May 2014 17:50:18 -0700 (PDT) Received: from lgemrelse7q.lge.com (LGEMRELSE7Q.lge.com. 
[156.147.1.151]) by mx.google.com with ESMTP id ai10si21762652pad.94.2014.05.19.17.50.16 for ; Mon, 19 May 2014 17:50:17 -0700 (PDT) Message-ID: <537AA6C7.1040506@lge.com> Date: Tue, 20 May 2014 09:50:15 +0900 From: Gioh Kim MIME-Version: 1.0 Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Michal Nazarewicz , Joonsoo Kim Cc: Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?UTF-8?B?7J206rG07Zi4?= , gurugio@gmail.com On 2014-05-20 at 4:59 AM, Michal Nazarewicz wrote: > On Sun, May 18 2014, Joonsoo Kim wrote: >> I think that this problem is originated from atomic_pool_init(). >> If configured coherent_pool size is larger than default cma size, >> it can be failed even if this patch is applied. The coherent_pool size (atomic_pool.size) should be restricted to be smaller than the CMA size. This is a separate issue, but I think the default atomic pool size is too small. A single USB host port can need up to 256 KB of coherent memory (according to the USB host spec). If a platform has several ports, it needs more than 1 MB. Therefore the default atomic pool size should be at least 1 MB. >> >> How about below patch? >> It uses fallback allocation if CMA is failed. 
> > Yes, I thought about it, but __dma_alloc uses similar code: > > else if (!IS_ENABLED(CONFIG_DMA_CMA)) > addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller); > else > addr = __alloc_from_contiguous(dev, size, prot, &page, caller); > > so it probably needs to be changed as well. If the CMA option is not selected, __alloc_from_contiguous would not be called, so we don't need the fallback allocation. And if the CMA option is selected and initialized correctly, the CMA allocation can still fail when no CMA memory is available. I think we don't need the fallback allocation in that case either, because that is a normal case. Therefore I think restricting the CMA size option and making CMA work can cover every case. I think the patch below is also a good choice. If both of you, Michal and Joonsoo, do not agree with me, please let me know. I will make a patch including both the option restriction and the fallback allocation. > >> -----------------8<--------------------- >> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c >> index 6b00be1..2909ab9 100644 >> --- a/arch/arm/mm/dma-mapping.c >> +++ b/arch/arm/mm/dma-mapping.c >> @@ -379,7 +379,7 @@ static int __init atomic_pool_init(void) >> unsigned long *bitmap; >> struct page *page; >> struct page **pages; >> - void *ptr; >> + void *ptr = NULL; >> int bitmap_size = BITS_TO_LONGS(nr_pages) * sizeof(long); >> >> bitmap = kzalloc(bitmap_size, GFP_KERNEL); >> @@ -393,7 +393,7 @@ static int __init atomic_pool_init(void) >> if (IS_ENABLED(CONFIG_DMA_CMA)) >> ptr = __alloc_from_contiguous(NULL, pool->size, prot, &page, >> atomic_pool_init); >> - else >> + if (!ptr) >> ptr = __alloc_remap_buffer(NULL, pool->size, gfp, prot, &page, >> atomic_pool_init); >> if (ptr) { >> > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f41.google.com (mail-pb0-f41.google.com [209.85.160.41]) by kanga.kvack.org (Postfix) with ESMTP id ACF586B0036 for ; Mon, 19 May 2014 21:28:37 -0400 (EDT) Received: by mail-pb0-f41.google.com with SMTP id uo5so6597665pbc.0 for ; Mon, 19 May 2014 18:28:37 -0700 (PDT) Received: from mail-pd0-x22e.google.com (mail-pd0-x22e.google.com [2607:f8b0:400e:c02::22e]) by mx.google.com with ESMTPS id ny3si9719364pab.230.2014.05.19.18.28.36 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 19 May 2014 18:28:36 -0700 (PDT) Received: by mail-pd0-f174.google.com with SMTP id r10so56758pdi.5 for ; Mon, 19 May 2014 18:28:36 -0700 (PDT) From: Michal Nazarewicz Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value In-Reply-To: <537AA6C7.1040506@lge.com> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE> <537AA6C7.1040506@lge.com> Date: Mon, 19 May 2014 15:28:24 -1000 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: owner-linux-mm@kvack.org List-ID: To: Gioh Kim , Joonsoo Kim Cc: Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?utf-8?B?7J206rG07Zi4?= , gurugio@gmail.com --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Mon, May 19 2014, Gioh Kim wrote: > If CMA option is not selected, __alloc_from_contiguous would not be > called. We don't need to the fallback allocation. 
> > And if CMA option is selected and initialized correctly, > the cma allocation can fail in case of no-CMA-memory situation. > I thinks in that case we don't need to the fallback allocation also, > because it is normal case. > > Therefore I think the restriction of CMA size option and make CMA work > can cover every cases. Wait, you just wrote that if CMA is not initialised correctly, it's fine for atomic pool initialisation to fail, but if CMA size is initialised correctly but too small, this is somehow a worse situation? I'm a bit confused, to be honest. IMO, the cma=0 command line argument should be supported, as should having a default CMA size of zero. If the CMA size is set to zero, the kernel should behave as if CMA was not enabled at compile time. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michał “mina86” Nazarewicz (o o) ooo +------ooO--(_)--Ooo-- --=-=-= Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" --==-=-= Content-Type: text/plain --==-=-= Content-Type: application/pgp-signature; name="signature.asc" 
--==-=-=-- --=-=-=-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f50.google.com (mail-pa0-f50.google.com [209.85.220.50]) by kanga.kvack.org (Postfix) with ESMTP id A060C6B0036 for ; Mon, 19 May 2014 22:26:59 -0400 (EDT) Received: by mail-pa0-f50.google.com with SMTP id fb1so6629576pad.9 for ; Mon, 19 May 2014 19:26:59 -0700 (PDT) Received: from lgemrelse6q.lge.com (LGEMRELSE6Q.lge.com. [156.147.1.121]) by mx.google.com with ESMTP id os9si22260119pac.155.2014.05.19.19.26.57 for ; Mon, 19 May 2014 19:26:58 -0700 (PDT) Message-ID: <537ABD6F.9090608@lge.com> Date: Tue, 20 May 2014 11:26:55 +0900 From: Gioh Kim MIME-Version: 1.0 Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE> <537AA6C7.1040506@lge.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Michal Nazarewicz , Joonsoo Kim Cc: Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?UTF-8?B?7J206rG07Zi4?= , gurugio@gmail.com On 2014-05-20 at 10:28 AM, Michal Nazarewicz wrote: > On Mon, May 19 2014, Gioh Kim wrote: >> If CMA option is not selected, __alloc_from_contiguous would not be >> called. We don't need to the fallback allocation. 
>> >> And if CMA option is selected and initialized correctly, >> the cma allocation can fail in case of no-CMA-memory situation. >> I thinks in that case we don't need to the fallback allocation also, >> because it is normal case. >> >> Therefore I think the restriction of CMA size option and make CMA work >> can cover every cases. > > Wait, you just wrote that if CMA is not initialised correctly, it's fine > for atomic pool initialisation to fail, but if CMA size is initialised > correctly but too small, this is somehow worse situation? I'm a bit > confused to be honest. I'm sorry for confusing you. Please forgive my poor English. My point is that atomic_pool should be able to work with or without CMA. > > IMO, cma=0 command line argument should be supported, as should having > the default CMA size zero. If CMA size is set to zero, kernel should > behave as if CMA was not enabled at compile time. It would also be good if atomic_pool worked well with a zero CMA size. I can drop my patch, but Joonsoo's patch should be applied. Joonsoo, could you please send the full patch to the maintainers? > > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f178.google.com (mail-pd0-f178.google.com [209.85.192.178]) by kanga.kvack.org (Postfix) with ESMTP id 893D96B0036 for ; Tue, 20 May 2014 02:31:11 -0400 (EDT) Received: by mail-pd0-f178.google.com with SMTP id v10so18697pde.23 for ; Mon, 19 May 2014 23:31:11 -0700 (PDT) Received: from lgeamrelo04.lge.com (lgeamrelo04.lge.com. 
[156.147.1.127]) by mx.google.com with ESMTP id iw8si370193pbc.137.2014.05.19.23.31.09 for ; Mon, 19 May 2014 23:31:10 -0700 (PDT) Date: Tue, 20 May 2014 15:33:42 +0900 From: Joonsoo Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140520063342.GA8315@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <20140515024353.GA27599@bbox> <20140519021121.GA19615@js1304-P5Q-DELUXE> <20140519025305.GA13248@bbox> <20140519045001.GA23916@js1304-P5Q-DELUXE> <20140519231859.GA21636@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140519231859.GA21636@bbox> Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski On Tue, May 20, 2014 at 08:18:59AM +0900, Minchan Kim wrote: > On Mon, May 19, 2014 at 01:50:01PM +0900, Joonsoo Kim wrote: > > On Mon, May 19, 2014 at 11:53:05AM +0900, Minchan Kim wrote: > > > On Mon, May 19, 2014 at 11:11:21AM +0900, Joonsoo Kim wrote: > > > > On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote: > > > > > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote: > > > > > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > > > > > > > Hey Joonsoo, > > > > > > > > > > > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > > > > > > > CMA is introduced to provide physically contiguous pages at runtime. > > > > > > > > For this purpose, it reserves memory at boot time. Although it reserve > > > > > > > > memory, this reserved memory can be used for movable memory allocation > > > > > > > > request. 
This usecase is beneficial to the system that needs this CMA > > > > > > > > reserved memory infrequently and it is one of main purpose of > > > > > > > > introducing CMA. > > > > > > > > > > > > > > > > But, there is a problem in current implementation. The problem is that > > > > > > > > it works like as just reserved memory approach. The pages on cma reserved > > > > > > > > memory are hardly used for movable memory allocation. This is caused by > > > > > > > > combination of allocation and reclaim policy. > > > > > > > > > > > > > > > > The pages on cma reserved memory are allocated if there is no movable > > > > > > > > memory, that is, as fallback allocation. So the time this fallback > > > > > > > > allocation is started is under heavy memory pressure. Although it is under > > > > > > > > memory pressure, movable allocation easily succeed, since there would be > > > > > > > > many pages on cma reserved memory. But this is not the case for unmovable > > > > > > > > and reclaimable allocation, because they can't use the pages on cma > > > > > > > > reserved memory. These allocations regard system's free memory as > > > > > > > > (free pages - free cma pages) on watermark checking, that is, free > > > > > > > > unmovable pages + free reclaimable pages + free movable pages. Because > > > > > > > > we already exhausted movable pages, only free pages we have are unmovable > > > > > > > > and reclaimable types and this would be really small amount. So watermark > > > > > > > > checking would be failed. It will wake up kswapd to make enough free > > > > > > > > memory for unmovable and reclaimable allocation and kswapd will do. > > > > > > > > So before we fully utilize pages on cma reserved memory, kswapd start to > > > > > > > > reclaim memory and try to make free memory over the high watermark. This > > > > > > > > watermark checking by kswapd doesn't take care free cma pages so many > > > > > > > > movable pages would be reclaimed. 
After then, we have a lot of movable > > > > > > > > pages again, so fallback allocation doesn't happen again. To conclude, > > > > > > > > amount of free memory on meminfo which includes free CMA pages is moving > > > > > > > > around 512 MB if I reserve 512 MB memory for CMA. > > > > > > > > > > > > > > > > I found this problem on following experiment. > > > > > > > > > > > > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > > > > > > > make -j24 > > > > > > > > > > > > > > > > CMA reserve: 0 MB 512 MB > > > > > > > > Elapsed-time: 234.8 361.8 > > > > > > > > Average-MemFree: 283880 KB 530851 KB > > > > > > > > > > > > > > > > To solve this problem, I can think following 2 possible solutions. > > > > > > > > 1. allocate the pages on cma reserved memory first, and if they are > > > > > > > > exhausted, allocate movable pages. > > > > > > > > 2. interleaved allocation: try to allocate specific amounts of memory > > > > > > > > from cma reserved memory and then allocate from free movable memory. > > > > > > > > > > > > > > I love this idea but when I see the code, I don't like that. > > > > > > > In allocation path, just try to allocate pages by round-robin so it's role > > > > > > > of allocator. If one of migratetype is full, just pass mission to reclaimer > > > > > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail > > > > > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that > > > > > > > reclaimer can filter it out during page scanning. > > > > > > > We already have an tool to achieve it(ie, isolate_mode_t). > > > > > > > > > > > > Hello, > > > > > > > > > > > > I agree with leaving fast allocation path as simple as possible. > > > > > > I will remove runtime computation for determining ratio in > > > > > > __rmqueue_cma() and, instead, will use pre-computed value calculated > > > > > > on the other path. > > > > > > > > > > Sounds good. 
> > > > > > > > > > > > > > > > > I am not sure that whether your second suggestion(Hey relaimer part) > > > > > > is good or not. In my quick thought, that could be helpful in the > > > > > > situation that many free cma pages remained. But, it would be not helpful > > > > > > when there are neither free movable and cma pages. In generally, most > > > > > > workloads mainly uses movable pages for page cache or anonymous mapping. > > > > > > Although reclaim is triggered by non-movable allocation failure, reclaimed > > > > > > pages are used mostly by movable allocation. We can handle these allocation > > > > > > request even if we reclaim the pages just in lru order. If we rotate > > > > > > the lru list for finding movable pages, it could cause more useful > > > > > > pages to be evicted. > > > > > > > > > > > > This is just my quick thought, so please let me correct if I am wrong. > > > > > > > > > > Why should reclaimer reclaim unnecessary pages? > > > > > So, your answer is that it would be better because upcoming newly allocated > > > > > pages would be allocated easily without interrupt. But it could reclaim > > > > > too much pages until watermark for unmovable allocation is okay. > > > > > Even, sometime, you might see OOM. > > > > > > > > > > Moreover, how could you handle current trobule? > > > > > For example, there is atomic allocation and the only thing to save the world > > > > > is kswapd because it's one of kswapd role but kswapd is spending many time to > > > > > reclaim CMA pages, which is pointless so the allocation would be easily failed. > > > > > > > > Hello, > > > > > > > > I guess that it isn't the problem. In lru, movable pages and cma pages > > > > would be interleaved. So it doesn't takes too long time to get the > > > > page for non-movable allocation. > > > > > > Please, don't assume there are ideal LRU ordering. 
> > > Newly allocated page by fairness allocation is located by head of LRU > > > while old pages are approaching the tail so there is huge time gab. > > > During the time, old pages could be dropped/promoting so one of side > > > could be filled with one type rather than interleaving both types pages > > > you expected. > > > > I assumed general case, not ideal case. > > Your example can be possible, but would be corner case. > > I talked with Joonsoo yesterday and should post our conclusion > for other reviewers/maintainers. > > It's not a corner case and it could happen depending on zone and CMA > configuration. For example, there is 330M high zone and CMA consumes > 300M in the space while normal movable area consumes just 30M. > In the case, unmovable allocation could make too many unnecessary > reclaiming of the zone so the conclusion we reached is to need target > reclaiming(ex, isolate_mode_t). > > But not sure it should be part of this patchset because this patchset > is surely enhance(ie, before, it was hard to allocate page from CMA area > but this patchset makes it works) but this patchset could make mentioned > problem as side-effect so I think we could solve the issue(ie, too many > reclaiming in unbalanced zone) in another patchset. > > Joonsoo, please mention this problem in the description when you respin > so other MM guys can notice that and give ideas, which would be helpful > a lot. Okay. Will do :) Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f169.google.com (mail-pd0-f169.google.com [209.85.192.169]) by kanga.kvack.org (Postfix) with ESMTP id 6E02E6B0038 for ; Tue, 20 May 2014 07:38:17 -0400 (EDT) Received: by mail-pd0-f169.google.com with SMTP id w10so245402pde.14 for ; Tue, 20 May 2014 04:38:17 -0700 (PDT) Received: from mailout2.w1.samsung.com (mailout2.w1.samsung.com. [210.118.77.12]) by mx.google.com with ESMTPS id io2si1395713pbc.125.2014.05.20.04.38.16 for (version=TLSv1 cipher=RC4-MD5 bits=128/128); Tue, 20 May 2014 04:38:16 -0700 (PDT) MIME-version: 1.0 Content-type: text/plain; charset=UTF-8; format=flowed Received: from eucpsbgm1.samsung.com (unknown [203.254.199.244]) by mailout2.w1.samsung.com (Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP id <0N5V006KHEZEJTA0@mailout2.w1.samsung.com> for linux-mm@kvack.org; Tue, 20 May 2014 12:38:02 +0100 (BST) Content-transfer-encoding: 8BIT Message-id: <537B3EA5.2040302@samsung.com> Date: Tue, 20 May 2014 13:38:13 +0200 From: Marek Szyprowski Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE> <537AA6C7.1040506@lge.com> In-reply-to: <537AA6C7.1040506@lge.com> Sender: owner-linux-mm@kvack.org List-ID: To: Gioh Kim , Michal Nazarewicz , Joonsoo Kim Cc: Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , =?UTF-8?B?7J206rG07Zi4?= , gurugio@gmail.com Hello, On 2014-05-20 02:50, Gioh Kim wrote: > > > On 2014-05-20 at 4:59 AM, Michal Nazarewicz wrote: >> On Sun, May 18 2014, Joonsoo Kim wrote: >>> I think that this problem is originated from atomic_pool_init(). >>> If configured coherent_pool size is larger than default cma size, >>> it can be failed even if this patch is applied. > > The coherent_pool size (atomic_pool.size) should be restricted smaller > than cma size. > > This is another issue, however I think the default atomic pool size is > too small. > Only one port of USB host needs at most 256Kbytes coherent memory > (according to the USB host spec). This pool is used only for allocation done in atomic context (allocations done with GFP_ATOMIC flag), otherwise the standard allocation path is used. Are you sure that each usb host port really needs so much memory allocated in atomic context? > If a platform has several ports, it needs more than 1MB. > Therefore the default atomic pool size should be at least 1MB. > >>> >>> How about below patch? >>> It uses fallback allocation if CMA is failed. >> >> Yes, I thought about it, but __dma_alloc uses similar code: >> >> else if (!IS_ENABLED(CONFIG_DMA_CMA)) >> addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, >> caller); >> else >> addr = __alloc_from_contiguous(dev, size, prot, &page, caller); >> >> so it probably needs to be changed as well. > > If CMA option is not selected, __alloc_from_contiguous would not be > called. > We don't need to the fallback allocation. > > And if CMA option is selected and initialized correctly, > the cma allocation can fail in case of no-CMA-memory situation. > I thinks in that case we don't need to the fallback allocation also, > because it is normal case. > > Therefore I think the restriction of CMA size option and make CMA work > can cover every cases. > > I think below patch is also good choice. > If both of you, Michal and Joonsoo, do not agree with me, please > inform me. > I will make a patch including option restriction and fallback allocation. I'm not sure if we need a fallback for failed CMA allocation.
The only issue that have been mentioned here and needs to be resolved is support for disabling cma by kernel command line. Right now it will fails completely. Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f42.google.com (mail-la0-f42.google.com [209.85.215.42]) by kanga.kvack.org (Postfix) with ESMTP id 111876B0036 for ; Tue, 20 May 2014 08:24:00 -0400 (EDT) Received: by mail-la0-f42.google.com with SMTP id el20so329138lab.15 for ; Tue, 20 May 2014 05:24:00 -0700 (PDT) Received: from mail-lb0-x236.google.com (mail-lb0-x236.google.com [2a00:1450:4010:c04::236]) by mx.google.com with ESMTPS id oc9si2830537lbb.57.2014.05.20.05.23.58 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 20 May 2014 05:23:59 -0700 (PDT) Received: by mail-lb0-f182.google.com with SMTP id z11so322691lbi.27 for ; Tue, 20 May 2014 05:23:58 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <537B3EA5.2040302@samsung.com> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE> <537AA6C7.1040506@lge.com> <537B3EA5.2040302@samsung.com> Date: Tue, 20 May 2014 21:23:58 +0900 Message-ID: Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value From: Gi-Oh Kim Content-Type: multipart/alternative; boundary=001a113367ac44bf8604f9d3f4a5 Sender: owner-linux-mm@kvack.org List-ID: To: Marek Szyprowski Cc: Gioh Kim , Michal Nazarewicz , Joonsoo Kim , Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, 
Heesub Shin , Mel Gorman , Johannes Weiner , =?UTF-8?B?7J206rG07Zi4?= --001a113367ac44bf8604f9d3f4a5 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable 2014-05-20 20:38 GMT+09:00 Marek Szyprowski : > Hello, > > > On 2014-05-20 02:50, Gioh Kim wrote: > >> >> >> On 2014-05-20 at 4:59 AM, Michal Nazarewicz wrote: >> >>> On Sun, May 18 2014, Joonsoo Kim wrote: >>> >>>> I think that this problem is originated from atomic_pool_init(). >>>> If configured coherent_pool size is larger than default cma size, >>>> it can be failed even if this patch is applied. >>>> >>> >> The coherent_pool size (atomic_pool.size) should be restricted smaller >> than cma size. >> >> This is another issue, however I think the default atomic pool size is >> too small. >> Only one port of USB host needs at most 256Kbytes coherent memory >> (according to the USB host spec). >> > > This pool is used only for allocation done in atomic context (allocations > done with GFP_ATOMIC flag), otherwise the standard allocation path is used. > Are you sure that each usb host port really needs so much memory allocated > in atomic context? http://lxr.free-electrons.com/source/drivers/usb/host/ehci-mem.c#L210 dma_alloc_coherent is called with gfp set to zero, without the GFP_ATOMIC flag. If CMA is turned on and its size is zero, the ehci driver panics. > > > If a platform has several ports, it needs more than 1MB. >> Therefore the default atomic pool size should be at least 1MB. >> >> >>>> How about below patch? >>>> It uses fallback allocation if CMA is failed. >>>> >>> >>> Yes, I thought about it, but __dma_alloc uses similar code: >>> >>> else if (!IS_ENABLED(CONFIG_DMA_CMA)) >>> addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller); >>> else >>> addr = __alloc_from_contiguous(dev, size, prot, &page, caller); >>> >>> so it probably needs to be changed as well. >>> >> >> If CMA option is not selected, __alloc_from_contiguous would not be >> called. >> We don't need to the fallback allocation. >> >> And if CMA option is selected and initialized correctly, >> the cma allocation can fail in case of no-CMA-memory situation. >> I thinks in that case we don't need to the fallback allocation also, >> because it is normal case. >> >> Therefore I think the restriction of CMA size option and make CMA work >> can cover every cases. >> >> I think below patch is also good choice. >> If both of you, Michal and Joonsoo, do not agree with me, please inform >> me. >> I will make a patch including option restriction and fallback allocation. >> > > I'm not sure if we need a fallback for failed CMA allocation. The only > issue that > have been mentioned here and needs to be resolved is support for disabling > cma by > kernel command line. Right now it will fails completely. > > Best regards > -- > Marek Szyprowski, PhD > Samsung R&D Institute Poland > > -- ---- Love and Serve make me happy blog - http://gurugio.blogspot.com/ homepage - CalciumOS http://code.google.com/p/caoskernel/ --001a113367ac44bf8604f9d3f4a5 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable



--001a113367ac44bf8604f9d3f4a5-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f177.google.com (mail-pd0-f177.google.com [209.85.192.177]) by kanga.kvack.org (Postfix) with ESMTP id 87FF86B0035 for ; Tue, 20 May 2014 14:15:30 -0400 (EDT) Received: by mail-pd0-f177.google.com with SMTP id g10so547755pdj.22 for ; Tue, 20 May 2014 11:15:30 -0700 (PDT) Received: from mail-pa0-x230.google.com (mail-pa0-x230.google.com [2607:f8b0:400e:c03::230]) by mx.google.com with ESMTPS id xu2si2850120pbb.129.2014.05.20.11.15.29 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 20 May 2014 11:15:29 -0700 (PDT) Received: by mail-pa0-f48.google.com with SMTP id rd3so553130pab.7 for ; Tue, 20 May 2014 11:15:29 -0700 (PDT) From: Michal Nazarewicz Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value In-Reply-To: <537ABD6F.9090608@lge.com> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE> <537AA6C7.1040506@lge.com> <537ABD6F.9090608@lge.com> Date: Tue, 20 May 2014 08:15:18 -1000 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: owner-linux-mm@kvack.org List-ID: To: Gioh Kim , Joonsoo Kim Cc: Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?utf-8?B?7J206rG07Zi4?= , gurugio@gmail.com --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Mon, May 19 2014, Gioh Kim wrote: > My point is 
atomic_pool should be able to work with/without CMA. Agreed. >> IMO, cma=3D0 command line argument should be supported, as should having >> the default CMA size zero. If CMA size is set to zero, kernel should >> behave as if CMA was not enabled at compile time. > It's also good if atomic_pool can work well with zero CMA size. Exactly. --=20 Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=3D./ `o ..o | Computer Science, Micha=C5=82 =E2=80=9Cmina86=E2=80=9D Nazarewicz = (o o) ooo +------ooO--(_)--Ooo-- --=-=-= Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" --==-=-= Content-Type: text/plain --==-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAEBAgAGBQJTe5u2AAoJECBgQBJQdR/06jEP/iYmW8mK7Wuu6CYGXh/lzz5t UFw2wQHHnAzZqAxYVKbgUK+XNFAvZ52jWztebNP7XGFqd8Ryn2btWY1/yJEVm63V OsIlZ/oiam4J0y6vSUS0gA3MzazQ1dvAIs96MEkKBqrcWE8WQlVxIkbr/EAT1vIH Zn2Jckhupp4LxRtwtYXXX3P1MikYzIzmsAC2uv4IU4N4btw8e+zSRB06G8PhFZmw YtkHrI9KVYlK5GkoThILXMYjOmO7dErrGXJc6CSSew/TUgsoHnBme96hG+Ahp5Zq nakbo2m1On7pCZzwv+OcHKdim6QZWUzDqk1OBruubuCY60u8778FLfVwMIMBCE9w gFda98xgDUCc40e3UmsZR9kbOiZa8IxRvOdPH3WghS2Vz1BI4IGphH9aBBd6kXbY osUizgtYA646bpuYsq2++YnxgYiIHvE7sjsdu+GyD7JkbSXBYglejXshuO8QKx4D eE1Y5Ak643kb//UqZgRCe4VcihYzuHnYjT7rcdfFRzT30+2cMoKhPJULJhIP956I BOBaM/WQiizDc6vSbpDWjoeC8MoZia2ARqF1vhYFcib1rNaQGpK/MI+tOhx4b0Rc TqljUnVDUg105PcnSaezDSGN3p4yli8F3BYgECuBr/oHrRObjOL/zFPCKV9XXC5z HcKJdseWFVuWOU0La7xz =WbUu -----END PGP SIGNATURE----- --==-=-=-- --=-=-=-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f181.google.com (mail-pd0-f181.google.com [209.85.192.181]) by kanga.kvack.org (Postfix) with ESMTP id E8B4F6B0035 for ; Tue, 20 May 2014 20:15:13 -0400 (EDT) Received: by mail-pd0-f181.google.com with SMTP id z10so790491pdj.26 for ; Tue, 20 May 2014 17:15:13 -0700 (PDT) Received: from lgemrelse7q.lge.com (LGEMRELSE7Q.lge.com. [156.147.1.151]) by mx.google.com with ESMTP id fy1si3884725pbb.65.2014.05.20.17.15.11 for ; Tue, 20 May 2014 17:15:13 -0700 (PDT) Message-ID: <537BF00E.3030409@lge.com> Date: Wed, 21 May 2014 09:15:10 +0900 From: Gioh Kim MIME-Version: 1.0 Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE> <537AA6C7.1040506@lge.com> <537B3EA5.2040302@samsung.com> In-Reply-To: <537B3EA5.2040302@samsung.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Marek Szyprowski , Michal Nazarewicz , Joonsoo Kim Cc: Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , =?UTF-8?B?7J206rG07Zi4?= , gurugio@gmail.com On 2014-05-20 at 8:38 PM, Marek Szyprowski wrote: > Hello, > > On 2014-05-20 02:50, Gioh Kim wrote: >> >> >> On 2014-05-20 at 4:59 AM, Michal Nazarewicz wrote: >>> On Sun, May 18 2014, Joonsoo Kim wrote: >>>> I think that this problem is originated from atomic_pool_init(). >>>> If configured coherent_pool size is larger than default cma size, >>>> it can be failed even if this patch is applied.
>> >> The coherent_pool size (atomic_pool.size) should be restricted smaller than cma size. >> >> This is another issue, however I think the default atomic pool size is too small. >> Only one port of USB host needs at most 256Kbytes coherent memory (according to the USB host spec). > > This pool is used only for allocation done in atomic context (allocations > done with GFP_ATOMIC flag), otherwise the standard allocation path is used. > Are you sure that each usb host port really needs so much memory allocated > in atomic context?

I don't know why, but drivers/usb/host/ohci-hcd.c:ohci_init() calls dma_alloc_coherent with a zero gfp. Therefore it panics if CMA is turned on and CONFIG_CMA_SIZE_MBYTES is zero. The pointer pool->vaddr is NULL in __alloc_from_pool. Below is my kernel message.

[ 24.339858] -----------[ cut here ]-----------
[ 24.344535] WARNING: at arch/arm/mm/dma-mapping.c:492 __dma_alloc.isra.19+0x25c/0x2a4()
[ 24.352554] coherent pool not initialised!
[ 24.356644] Modules linked in:
[ 24.359701] CPU: 1 PID: 711 Comm: sh Not tainted 3.10.19+ #42
[ 24.365488] [<800140e0>] (unwind_backtrace+0x0/0xf8) from [<80011f20>] (show_stack+0x10/0x14)
[ 24.374045] [<80011f20>] (show_stack+0x10/0x14) from [<8001f21c>] (warn_slowpath_common+0x4c/0x6c)
[ 24.383022] [<8001f21c>] (warn_slowpath_common+0x4c/0x6c) from [<8001f2d0>] (warn_slowpath_fmt+0x30/0x40)
[ 24.392602] [<8001f2d0>] (warn_slowpath_fmt+0x30/0x40) from [<80017f5c>] (__dma_alloc.isra.19+0x25c/0x2a4)
[ 24.402270] [<80017f5c>] (__dma_alloc.isra.19+0x25c/0x2a4) from [<800180d0>] (arm_dma_alloc+0x90/0x98)
[ 24.411580] [<800180d0>] (arm_dma_alloc+0x90/0x98) from [<8034ab54>] (ohci_init+0x1b0/0x278)
[ 24.420035] [<8034ab54>] (ohci_init+0x1b0/0x278) from [<80332e00>] (usb_add_hcd+0x184/0x5b8)
[ 24.428484] [<80332e00>] (usb_add_hcd+0x184/0x5b8) from [<8034b8d4>] (ohci_platform_probe+0xd0/0x174)
[ 24.437713] [<8034b8d4>] (ohci_platform_probe+0xd0/0x174) from [<802f1cac>] (platform_drv_probe+0x14/0x18)
[ 24.447385] [<802f1cac>] (platform_drv_probe+0x14/0x18) from [<802f0a54>] (driver_probe_device+0x6c/0x1f8)
[ 24.457049] [<802f0a54>] (driver_probe_device+0x6c/0x1f8) from [<802ef16c>] (bus_for_each_drv+0x44/0x8c)
[ 24.466537] [<802ef16c>] (bus_for_each_drv+0x44/0x8c) from [<802f09bc>] (device_attach+0x74/0x80)
[ 24.475416] [<802f09bc>] (device_attach+0x74/0x80) from [<802f0050>] (bus_probe_device+0x84/0xa8)
[ 24.484295] [<802f0050>] (bus_probe_device+0x84/0xa8) from [<802ee89c>] (device_add+0x4c0/0x58c)
[ 24.493088] [<802ee89c>] (device_add+0x4c0/0x58c) from [<802f21b8>] (platform_device_add+0xac/0x214)
[ 24.502227] [<802f21b8>] (platform_device_add+0xac/0x214) from [<8001bf3c>] (lg115x_init_usb+0xbc/0xe4)
[ 24.511618] [<8001bf3c>] (lg115x_init_usb+0xbc/0xe4) from [<80008734>] (do_user_initcalls+0x98/0x128)
[ 24.520843] [<80008734>] (do_user_initcalls+0x98/0x128) from [<80008870>] (proc_write_usercalls+0xac/0xd0)
[ 24.530512] [<80008870>] (proc_write_usercalls+0xac/0xd0) from [<80138f48>] (proc_reg_write+0x58/0x80)
[ 24.539830] [<80138f48>] (proc_reg_write+0x58/0x80) from [<800f0084>] (vfs_write+0xb0/0x1bc)
[ 24.548275] [<800f0084>] (vfs_write+0xb0/0x1bc) from [<800f04d0>] (SyS_write+0x3c/0x70)
[ 24.556287] [<800f04d0>] (SyS_write+0x3c/0x70) from [<8000e5c0>] (ret_fast_syscall+0x0/0x30)
[ 24.564726] --[ end trace c092568e2a263d21 ]--
[ 24.569345] ohci-platform ohci-platform.0: can't setup
[ 24.574498] ohci-platform ohci-platform.0: USB bus 1 deregistered
[ 24.582241] ohci-platform: probe of ohci-platform.0 failed with error -12
[ 24.590496] ohci-platform ohci-platform.1: Generic Platform OHCI Controller
[ 24.598984] ohci-platform ohci-platform.1: new USB bus registered, assigned bus number 1

> >> If a platform has several ports, it needs more than 1MB. >> Therefore the default atomic pool size should be at least 1MB. >> >>>> >>>> How about below patch? >>>> It uses fallback allocation if CMA is failed.
>>> >>> Yes, I thought about it, but __dma_alloc uses similar code: >>> >>> else if (!IS_ENABLED(CONFIG_DMA_CMA)) >>> addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller); >>> else >>> addr = __alloc_from_contiguous(dev, size, prot, &page, caller); >>> >>> so it probably needs to be changed as well. >> >> If CMA option is not selected, __alloc_from_contiguous would not be called. >> We don't need to the fallback allocation. >> >> And if CMA option is selected and initialized correctly, >> the cma allocation can fail in case of no-CMA-memory situation. >> I thinks in that case we don't need to the fallback allocation also, >> because it is normal case. >> >> Therefore I think the restriction of CMA size option and make CMA work can cover every cases. >> >> I think below patch is also good choice. >> If both of you, Michal and Joonsoo, do not agree with me, please inform me. >> I will make a patch including option restriction and fallback allocation. > > I'm not sure if we need a fallback for failed CMA allocation. The only issue that > have been mentioned here and needs to be resolved is support for disabling cma by > kernel command line. Right now it will fails completely. Both cma=0 on the kernel command line and CONFIG_CMA_SIZE_MBYTES=0 set selected_size to zero in dma_contiguous_reserve. As a result, dma_contiguous_reserve_area is not called and atomic_pool is not initialized. After that, dma_alloc_coherent tries to allocate via atomic_pool (__alloc_from_pool) or CMA (__alloc_from_contiguous). Allocation via atomic_pool fails because atomic_pool->vaddr is NULL. And CMA allocation shouldn't be attempted, because cma=0 or CONFIG_CMA_SIZE_MBYTES=0 is the same as disabling CMA. If cma=0 or CONFIG_CMA_SIZE_MBYTES is 0, __alloc_remap_buffer should be called instead of __alloc_from_contiguous even if CMA is turned on.
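The path described above can be sketched as a small user-space model in C. This is only an illustration of the decision being discussed, not kernel code: the struct, the enum and both function names are hypothetical, and the kernel identifiers from this thread appear only in comments. The pool_request flag is an assumption that stands for "this request is routed to the atomic pool", which the thread reports is what happens for a zero gfp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state discussed in the thread; not kernel code. */
struct cma_model {
    bool   cma_compiled_in; /* stands in for IS_ENABLED(CONFIG_DMA_CMA) */
    size_t selected_size;   /* bytes chosen via cma= or CONFIG_CMA_SIZE_MBYTES */
    bool   pool_ready;      /* stands in for atomic_pool->vaddr != NULL */
};

enum alloc_path {
    PATH_POOL,  /* models __alloc_from_pool */
    PATH_REMAP, /* models __alloc_remap_buffer */
    PATH_CMA,   /* models __alloc_from_contiguous */
    PATH_FAIL   /* models the "coherent pool not initialised!" failure */
};

/* Reported behaviour: with CMA compiled in, the CMA path is chosen
 * even when the configured CMA size is zero. */
enum alloc_path pick_allocator_reported(const struct cma_model *m, bool pool_request)
{
    assert(m != NULL);
    if (pool_request)
        return m->pool_ready ? PATH_POOL : PATH_FAIL;
    if (!m->cma_compiled_in)
        return PATH_REMAP;
    return PATH_CMA;
}

/* Behaviour proposed in the message: a zero CMA size is treated the
 * same as CMA being disabled, so the remap path is used instead. */
enum alloc_path pick_allocator_proposed(const struct cma_model *m, bool pool_request)
{
    assert(m != NULL);
    if (pool_request)
        return m->pool_ready ? PATH_POOL : PATH_FAIL;
    if (!m->cma_compiled_in || m->selected_size == 0)
        return PATH_REMAP;
    return PATH_CMA;
}
```

With cma_compiled_in set, selected_size equal to zero and pool_ready false, the first function returns PATH_CMA for a non-pool request while the second returns PATH_REMAP. The pool path is left failing in both on purpose, since making atomic_pool work without CMA is the separate point raised elsewhere in the thread.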
I'm not good at English, so I describe the problem in pseudo code:

    if (CMA is turned on) and ((cma=0 in command line) or (CONFIG_CMA_SIZE_MBYTES=0))
        try to allocate from CMA but CMA is not initialized

> > Best regards -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f52.google.com (mail-pb0-f52.google.com [209.85.160.52]) by kanga.kvack.org (Postfix) with ESMTP id CA53A6B0035 for ; Fri, 23 May 2014 20:58:00 -0400 (EDT) Received: by mail-pb0-f52.google.com with SMTP id rr13so4900144pbb.11 for ; Fri, 23 May 2014 17:58:00 -0700 (PDT) Received: from smtp.codeaurora.org (smtp.codeaurora.org. [198.145.11.231]) by mx.google.com with ESMTPS id zv2si5984775pbb.131.2014.05.23.17.57.59 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 23 May 2014 17:57:59 -0700 (PDT) Message-ID: <537FEE96.8000704@codeaurora.org> Date: Fri, 23 May 2014 17:57:58 -0700 From: Laura Abbott MIME-Version: 1.0 Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <5370FF1D.10707@codeaurora.org> In-Reply-To: <5370FF1D.10707@codeaurora.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Andrew Morton Cc: Rik van Riel , Johannes Weiner , Mel Gorman , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, lmark@codeaurora.org On 5/12/2014 10:04 AM, Laura Abbott wrote: > > I'm going to see about running this through tests internally for comparison. > Hopefully I'll get useful results in a day or so.
> > Thanks, > Laura >

We ran some tests internally and found that for our purposes these patches made the benchmarks worse vs. the existing implementation of using CMA first for some pages. These are mostly androidisms but androidisms that we care about for having a device be useful.

The foreground memory headroom on the device was on average about 40 MB smaller when using these patches vs our existing implementation of something like solution #1. By foreground memory headroom we simply mean the amount of memory that the foreground application can allocate before it is killed by the Android Low Memory killer.

We also found that when running a sequence of app launches these patches had more high priority app kills by the LMK and more alloc stalls. The test did a total of 500 hundred app launches (using 9 separate applications). The CMA memory in our system is rarely used by its client and is therefore available to the system most of the time.

Test device
- 4 CPUs
- Android 4.4.2
- 512MB of RAM
- 68 MB of CMA

Results:

Existing solution:
Foreground headroom: 200MB
Number of higher priority LMK kills (oom_score_adj < 529): 332
Number of alloc stalls: 607

Test patches:
Foreground headroom: 160MB
Number of higher priority LMK kills (oom_score_adj < 529): 459
Number of alloc stalls: 29538

We believe that the issues seen with these patches are the result of the LMK being more aggressive. The LMK will be more aggressive because it will ignore free CMA pages for unmovable allocations, and since most calls to the LMK are made by kswapd (which uses GFP_KERNEL) the LMK will mostly ignore free CMA pages. Because the LMK thresholds are higher than the zone watermarks, there will often be a lot of free CMA pages in the system when the LMK is called, which the LMK will usually ignore.

Thanks,
Laura
-- Qualcomm Innovation Center, Inc.
is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f54.google.com (mail-pa0-f54.google.com [209.85.220.54]) by kanga.kvack.org (Postfix) with ESMTP id 5B1A16B0035 for ; Sun, 25 May 2014 22:41:26 -0400 (EDT) Received: by mail-pa0-f54.google.com with SMTP id bj1so6894146pad.13 for ; Sun, 25 May 2014 19:41:26 -0700 (PDT) Received: from lgeamrelo02.lge.com (lgeamrelo02.lge.com. [156.147.1.126]) by mx.google.com with ESMTP id dx7si11185299pab.190.2014.05.25.19.41.23 for ; Sun, 25 May 2014 19:41:25 -0700 (PDT) Date: Mon, 26 May 2014 11:44:17 +0900 From: Joonsoo Kim Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140526024417.GA26935@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <5370FF1D.10707@codeaurora.org> <537FEE96.8000704@codeaurora.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <537FEE96.8000704@codeaurora.org> Sender: owner-linux-mm@kvack.org List-ID: To: Laura Abbott Cc: Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, lmark@codeaurora.org On Fri, May 23, 2014 at 05:57:58PM -0700, Laura Abbott wrote: > On 5/12/2014 10:04 AM, Laura Abbott wrote: > > > > I'm going to see about running this through tests internally for comparison. > > Hopefully I'll get useful results in a day or so. > > > > Thanks, > > Laura > > > > We ran some tests internally and found that for our purposes these patches made > the benchmarks worse vs. 
the existing implementation of using CMA first for some > pages. These are mostly androidisms but androidisms that we care about for > having a device be useful. > > The foreground memory headroom on the device was on average about 40 MB smaller > when using these patches vs our existing implementation of something like > solution #1. By foreground memory headroom we simply mean the amount of memory > that the foreground application can allocate before it is killed by the Android > Low Memory killer. > > We also found that when running a sequence of app launches these patches had > more high priority app kills by the LMK and more alloc stalls. The test did a > total of 500 hundred app launches (using 9 separate applications) The CMA > memory in our system is rarely used by its client and is therefore available > to the system most of the time. > > Test device > - 4 CPUs > - Android 4.4.2 > - 512MB of RAM > - 68 MB of CMA > > > Results: > > Existing solution: > Foreground headroom: 200MB > Number of higher priority LMK kills (oom_score_adj < 529): 332 > Number of alloc stalls: 607 > > > Test patches: > Foreground headroom: 160MB > Number of higher priority LMK kills (oom_score_adj < 529): > 459 Number of alloc stalls: 29538 > > We believe that the issues seen with these patches are the result of the LMK > being more aggressive. The LMK will be more aggressive because it will ignore > free CMA pages for unmovable allocations, and since most calls to the LMK are > made by kswapd (which uses GFP_KERNEL) the LMK will mostly ignore free CMA > pages. Because the LMK thresholds are higher than the zone watermarks, there > will often be a lot of free CMA pages in the system when the LMK is called, > which the LMK will usually ignore. Hello, Really thanks for testing!!! If possible, please let me know nr_free_cma of these patches/your in-house implementation before testing. I can guess following scenario about your test. 
On boot-up, CMA memory is mostly used by native processes, because your implementation uses CMA first for some pages. kswapd is woken up later, since there is more non-CMA free memory than with my implementation. And, on reclaim, when the LMK reclaims memory by killing an app process, it would reclaim movable memory with high probability, since CMA memory is mostly used by native processes and app processes have just movable memory. This is just my guess. But, if it is true, this is not a fair test for this patchset. If possible, could you make nr_free_cma the same on both implementations before testing? Moreover, in the mainline implementation, the LMK doesn't consider whether the memory type is CMA or not. Maybe your overall system is highly optimized for your implementation, so I'm not sure whether your testing is appropriate for this patchset. Anyway, I would like to optimize this for android. :) Please let me know more about your system. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752292AbaEHAan (ORCPT ); Wed, 7 May 2014 20:30:43 -0400 Received: from lgeamrelo02.lge.com ([156.147.1.126]:48417 "EHLO lgeamrelo02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751469AbaEHAak (ORCPT ); Wed, 7 May 2014 20:30:40 -0400 X-Original-SENDERIP: 10.177.220.145 X-Original-MAILFROM: iamjoonsoo.kim@lge.com From: Joonsoo Kim To: Andrew Morton Cc: Rik van Riel , Johannes Weiner , Mel Gorman , Joonsoo Kim , Laura Abbott , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Date: Thu, 8 May 2014 09:32:23 +0900 Message-Id: <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> 
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

CMA was introduced to provide physically contiguous pages at runtime. For this purpose, it reserves memory at boot time. Although it reserves memory, this reserved memory can be used for movable memory allocation requests. This use case is beneficial to systems that need this CMA reserved memory infrequently, and it is one of the main purposes of introducing CMA.

But there is a problem in the current implementation: it works just like a plain reserved memory approach. The pages on CMA reserved memory are hardly used for movable memory allocation. This is caused by the combination of allocation and reclaim policy.

The pages on CMA reserved memory are allocated only if there is no movable memory, that is, as a fallback allocation. So by the time this fallback allocation starts, the system is under heavy memory pressure. Even under memory pressure, movable allocations easily succeed, since there would be many pages on CMA reserved memory. But this is not the case for unmovable and reclaimable allocations, because they can't use the pages on CMA reserved memory. These allocations regard the system's free memory as (free pages - free CMA pages) on watermark checking, that is, free unmovable pages + free reclaimable pages + free movable pages. Because we have already exhausted movable pages, the only free pages we have are of the unmovable and reclaimable types, and this would be a really small amount. So watermark checking would fail. It will wake up kswapd to make enough free memory for unmovable and reclaimable allocations, and kswapd will do so. So before we fully utilize the pages on CMA reserved memory, kswapd starts to reclaim memory and tries to bring free memory over the high watermark. This watermark checking by kswapd doesn't take free CMA pages into account, so many movable pages would be reclaimed.
After that, we have a lot of movable pages again, so fallback allocation doesn't happen again. To conclude, the amount of free memory on meminfo, which includes free CMA pages, hovers around 512 MB if I reserve 512 MB of memory for CMA.

I found this problem in the following experiment.

4 CPUs, 1024 MB, VIRTUAL MACHINE
make -j24

CMA reserve:        0 MB         512 MB
Elapsed-time:       234.8        361.8
Average-MemFree:    283880 KB    530851 KB

To solve this problem, I can think of the following 2 possible solutions.

1. Allocate the pages on CMA reserved memory first, and if they are exhausted, allocate movable pages.
2. Interleaved allocation: try to allocate a specific amount of memory from CMA reserved memory and then allocate from free movable memory.

I tested approach #1 and found a problem. Although free memory on meminfo can move around the low watermark, there is large fluctuation in free memory, because too many pages are reclaimed when kswapd is invoked. The reason for this behaviour is that successively allocated CMA pages are on the LRU list in that order and kswapd reclaims them in the same order. This memory doesn't help kswapd's watermark checking, so too many pages are reclaimed, I guess.

So I implemented approach #2. One thing I should note is that we should not change the allocation target (movable list or CMA) on each allocation attempt, since this prevents allocated pages from being physically contiguous, which can hurt the performance of some I/O devices. To solve this, I keep the allocation target for at least pageblock_nr_pages attempts and make this number reflect the ratio of free pages (excluding free CMA pages) to free CMA pages. With this approach, the system works very smoothly and fully utilizes the pages on CMA reserved memory.

Following is the experimental result of this patch.

4 CPUs, 1024 MB, VIRTUAL MACHINE
make -j24

Before:
CMA reserve:        0 MB         512 MB
Elapsed-time:       234.8        361.8
Average-MemFree:    283880 KB    530851 KB
pswpin:             7            110064
pswpout:            452          767502

After:
CMA reserve:        0 MB         512 MB
Elapsed-time:       234.2        235.6
Average-MemFree:    281651 KB    290227 KB
pswpin:             8            8
pswpout:            430          510

There is no difference if we don't have CMA reserved memory (0 MB case). But with CMA reserved memory (512 MB case), we fully utilize this reserved memory through this patch, and the system behaves as if it doesn't reserve any memory.

With this patch, we aggressively allocate the pages on CMA reserved memory, so the latency of CMA allocation can rise. Below is the experimental result for latency.

4 CPUs, 1024 MB, VIRTUAL MACHINE
CMA reserve: 512 MB
Background Workload: make -jN
Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval

N:                       1          4          8          16
Elapsed-time(Before):    4309.75    9511.09    12276.1    77103.5
Elapsed-time(After):     5391.69    16114.1    19380.3    34879.2

So generally we can see a latency increase. The ratio of this increase is rather big: up to 70%. But under the heavy workload, it shows a latency decrease of up to 55%. This may be a worst-case scenario, but reducing it would be important for some systems, so I can say that this patch has advantages and disadvantages in terms of latency.
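
To make the recharge step of the interleaving concrete, here is a minimal userspace sketch of how the two counters could be refilled from the free page counts. It follows the "Reset ratio counter" part of __rmqueue_cma() in this patch, but struct try_counters, recharge() and the block parameter (standing in for pageblock_nr_pages) are names made up for this illustration and are not kernel API.

```c
#include <assert.h>

/* Hypothetical stand-in for the two counters the patch adds to struct zone. */
struct try_counters {
	long nr_try_movable;	/* attempts to serve from the movable free list */
	long nr_try_cma;	/* attempts to serve from the CMA free list */
};

/*
 * Refill both counters once they have run out.
 * All page counts are in pages; block stands in for pageblock_nr_pages.
 */
struct try_counters recharge(long free, long free_cma, long high_wmark,
			     long block)
{
	struct try_counters c = { 0, 0 };
	long free_wmark;

	assert(block > 0);

	/* No free CMA pages: only the movable list gets a budget. */
	if (free_cma <= 0) {
		c.nr_try_movable = block;
		return c;
	}

	/* Non-CMA free pages above the high watermark. */
	free_wmark = free - free_cma - high_wmark;

	/* Normal pages are under pressure: only CMA gets a budget. */
	if (free_wmark <= 0) {
		c.nr_try_cma = block;
		return c;
	}

	/* The scarcer pool gets one block; the other is scaled by the ratio. */
	if (free_wmark > free_cma) {
		c.nr_try_movable = free_wmark * block / free_cma;
		c.nr_try_cma = block;
	} else {
		c.nr_try_movable = block;
		c.nr_try_cma = free_cma * block / free_wmark;
	}
	return c;
}
```

For example, with 200000 free pages, 131072 of them free CMA pages, a high watermark of 2048 and block = 512, the sketch budgets 512 movable attempts and 1003 CMA attempts, so the scarcer pool is tried less often while each target is still kept for at least one block of attempts.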

Signed-off-by: Joonsoo Kim

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fac5509..3ff24d4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -389,6 +389,12 @@ struct zone {
 	int			compact_order_failed;
 #endif
 
+#ifdef CONFIG_CMA
+	int has_cma;
+	int nr_try_cma;
+	int nr_try_movable;
+#endif
+
 	ZONE_PADDING(_pad1_)
 
 	/* Fields commonly accessed by the page reclaim scanner */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 674ade7..6f2b27b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
 }
 
 #ifdef CONFIG_CMA
+void __init init_alloc_ratio_counter(struct zone *zone)
+{
+	if (zone->has_cma)
+		return;
+
+	zone->has_cma = 1;
+	zone->nr_try_movable = 0;
+	zone->nr_try_cma = 0;
+}
+
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
 {
@@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
 	set_pageblock_migratetype(page, MIGRATE_CMA);
 	__free_pages(page, pageblock_order);
 	adjust_managed_page_count(page, pageblock_nr_pages);
+	init_alloc_ratio_counter(page_zone(page));
 }
 #endif
 
@@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	return NULL;
 }
 
+#ifdef CONFIG_CMA
+static struct page *__rmqueue_cma(struct zone *zone, unsigned int order,
+						int migratetype)
+{
+	long free, free_cma, free_wmark;
+	struct page *page;
+
+	if (migratetype != MIGRATE_MOVABLE || !zone->has_cma)
+		return NULL;
+
+	if (zone->nr_try_movable)
+		goto alloc_movable;
+
+alloc_cma:
+	if (zone->nr_try_cma) {
+		/* Okay. Now, we can try to allocate the page from cma region */
+		zone->nr_try_cma--;
+		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
+
+		/* CMA pages can vanish through CMA allocation */
+		if (unlikely(!page && order == 0))
+			zone->nr_try_cma = 0;
+
+		return page;
+	}
+
+	/* Reset ratio counter */
+	free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES);
+
+	/* No cma free pages, so recharge only movable allocation */
+	if (free_cma <= 0) {
+		zone->nr_try_movable = pageblock_nr_pages;
+		goto alloc_movable;
+	}
+
+	free = zone_page_state(zone, NR_FREE_PAGES);
+	free_wmark = free - free_cma - high_wmark_pages(zone);
+
+	/*
+	 * free_wmark is below than 0, and it means that normal pages
+	 * are under the pressure, so we recharge only cma allocation.
+	 */
+	if (free_wmark <= 0) {
+		zone->nr_try_cma = pageblock_nr_pages;
+		goto alloc_cma;
+	}
+
+	if (free_wmark > free_cma) {
+		zone->nr_try_movable =
+			(free_wmark * pageblock_nr_pages) / free_cma;
+		zone->nr_try_cma = pageblock_nr_pages;
+	} else {
+		zone->nr_try_movable = pageblock_nr_pages;
+		zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark;
+	}
+
+	/* Reset complete, start on movable first */
+alloc_movable:
+	zone->nr_try_movable--;
+	return NULL;
+}
+#endif
+
 /*
  * Do the hard work of removing an element from the buddy allocator.
  * Call me with the zone->lock already held.
@@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 static struct page *__rmqueue(struct zone *zone, unsigned int order,
 						int migratetype)
 {
-	struct page *page;
+	struct page *page = NULL;
+
+	if (IS_ENABLED(CONFIG_CMA))
+		page = __rmqueue_cma(zone, order, migratetype);
 
 retry_reserve:
-	page = __rmqueue_smallest(zone, order, migratetype);
+	if (!page)
+		page = __rmqueue_smallest(zone, order, migratetype);
 
 	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
 		page = __rmqueue_fallback(zone, order, migratetype);
@@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		zone_seqlock_init(zone);
 		zone->zone_pgdat = pgdat;
 		zone_pcp_init(zone);
+		if (IS_ENABLED(CONFIG_CMA))
+			zone->has_cma = 0;
 
 		/* For bootup, initialized properly in watermark setup */
 		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
-- 
1.7.9.5

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752555AbaEHAas (ORCPT ); Wed, 7 May 2014 20:30:48 -0400 Received: from lgeamrelo02.lge.com ([156.147.1.126]:48409 "EHLO lgeamrelo02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752032AbaEHAan (ORCPT ); Wed, 7 May 2014 20:30:43 -0400 X-Original-SENDERIP: 10.177.220.145 X-Original-MAILFROM: iamjoonsoo.kim@lge.com From: Joonsoo Kim To: Andrew Morton Cc: Rik van Riel , Johannes Weiner , Mel Gorman , Joonsoo Kim , Laura Abbott , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH 1/3] CMA: remove redundant retrying code in __alloc_contig_migrate_range Date: Thu, 8 May 2014 09:32:22 +0900 Message-Id: <1399509144-8898-2-git-send-email-iamjoonsoo.kim@lge.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> Sender: 
linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

We already have retry logic in migrate_pages(). It retries 10 times. So if we keep this retrying code in __alloc_contig_migrate_range(), we would try to migrate an unmigratable page up to 50 times. There is just one small difference, in the -ENOMEM case: migrate_pages() doesn't retry in this case, whereas the current __alloc_contig_migrate_range() does. But I think this isn't a problem, because in this case we may fail again for the same reason.

Signed-off-by: Joonsoo Kim

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..674ade7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6185,7 +6185,6 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 	/* This function is based on compact_zone() from compaction.c. */
 	unsigned long nr_reclaimed;
 	unsigned long pfn = start;
-	unsigned int tries = 0;
 	int ret = 0;
 
 	migrate_prep();
@@ -6204,10 +6203,6 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 				ret = -EINTR;
 				break;
 			}
-			tries = 0;
-		} else if (++tries == 5) {
-			ret = ret < 0 ? ret : -EBUSY;
-			break;
 		}
 
 		nr_reclaimed = reclaim_clean_pages_from_list(cc->zone,
@@ -6216,6 +6211,10 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 
 		ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
 				    0, MIGRATE_SYNC, MR_CMA);
+		if (ret) {
+			ret = ret < 0 ? ret : -EBUSY;
+			break;
+		}
 	}
 	if (ret < 0) {
 		putback_movable_pages(&cc->migratepages);
-- 
1.7.9.5

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752688AbaEHAbM (ORCPT ); Wed, 7 May 2014 20:31:12 -0400 Received: from lgeamrelo02.lge.com ([156.147.1.126]:48431 "EHLO lgeamrelo02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751600AbaEHAak (ORCPT ); Wed, 7 May 2014 20:30:40 -0400 X-Original-SENDERIP: 10.177.220.145 X-Original-MAILFROM: iamjoonsoo.kim@lge.com From: Joonsoo Kim To: Andrew Morton Cc: Rik van Riel , Johannes Weiner , Mel Gorman , Joonsoo Kim , Laura Abbott , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH 3/3] CMA: always treat free cma pages as non-free on watermark checking Date: Thu, 8 May 2014 09:32:24 +0900 Message-Id: <1399509144-8898-4-git-send-email-iamjoonsoo.kim@lge.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

commit d95ea5d1 ('cma: fix watermark checking') introduced the ALLOC_CMA alloc flag and treats free CMA pages as free pages if this flag is passed to watermark checking. The intention of that patch is that movable page allocations can be handled from the CMA reserved region without starting kswapd.

Now, the previous patch changes the behaviour of the allocator so that movable allocations use the pages on the CMA reserved region aggressively, so this watermark hack isn't needed anymore. Therefore remove it.
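
As a rough illustration of what removing the flag changes, the following userspace sketch models only the order-0 part of the watermark check before and after this patch. It is a simplification under stated assumptions: struct zone_model and both function names are hypothetical, and the ALLOC_HIGH/ALLOC_HARDER adjustments, the per-order loop and the has_cma test are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define ALLOC_CMA 0x80	/* value as quoted from mm/internal.h in this patch */

/* Hypothetical, simplified snapshot of a zone; all counts are in pages. */
struct zone_model {
	long free_pages;	/* like NR_FREE_PAGES, includes free CMA pages */
	long free_cma_pages;	/* like NR_FREE_CMA_PAGES */
	long lowmem_reserve;
};

/* Before: free CMA pages count as free only if ALLOC_CMA is passed. */
bool watermark_ok_before(const struct zone_model *z, long mark, int alloc_flags)
{
	long free_cma = 0;

	assert(z);
	if (!(alloc_flags & ALLOC_CMA))
		free_cma = z->free_cma_pages;

	return z->free_pages - free_cma > mark + z->lowmem_reserve;
}

/* After: free CMA pages are never counted as free. */
bool watermark_ok_after(const struct zone_model *z, long mark)
{
	assert(z);
	return z->free_pages - z->free_cma_pages > mark + z->lowmem_reserve;
}
```

With 140000 free pages of which 131072 are free CMA pages and a mark of 10000, the old check passes for a movable allocation (ALLOC_CMA set) but fails for others, while the new check fails for every caller, which is the behaviour Marek questions in his reply.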

Signed-off-by: Joonsoo Kim

diff --git a/mm/compaction.c b/mm/compaction.c
index 627dc2e..36e2fcd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1117,10 +1117,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 
 	count_compact_event(COMPACTSTALL);
 
-#ifdef CONFIG_CMA
-	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 	/* Compact each zone in the list */
 	for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
 								nodemask) {
diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..a121762 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -384,7 +384,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HARDER		0x10 /* try to alloc harder */
 #define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
-#define ALLOC_CMA		0x80 /* allow allocations from CMA areas */
-#define ALLOC_FAIR		0x100 /* fair zone allocation */
+#define ALLOC_FAIR		0x80 /* fair zone allocation */
 
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6f2b27b..6af2fa1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1757,20 +1757,22 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark,
 	long min = mark;
 	long lowmem_reserve = z->lowmem_reserve[classzone_idx];
 	int o;
-	long free_cma = 0;
 
 	free_pages -= (1 << order) - 1;
 	if (alloc_flags & ALLOC_HIGH)
 		min -= min / 2;
 	if (alloc_flags & ALLOC_HARDER)
 		min -= min / 4;
-#ifdef CONFIG_CMA
-	/* If allocation can't use CMA areas don't use free CMA pages */
-	if (!(alloc_flags & ALLOC_CMA))
-		free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif
+	/*
+	 * We don't want to regard the pages on CMA region as free
+	 * on watermark checking, since they cannot be used for
+	 * unmovable/reclaimable allocation and they can suddenly
+	 * vanish through CMA allocation
+	 */
+	if (IS_ENABLED(CONFIG_CMA) && z->has_cma)
+		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
 
-	if (free_pages - free_cma <= min + lowmem_reserve)
+	if (free_pages <= min + lowmem_reserve)
 		return false;
 	for (o = 0; o < order; o++) {
 		/* At the next order, this order's pages become unavailable */
@@ -2538,10 +2540,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 				 unlikely(test_thread_flag(TIF_MEMDIE))))
 			alloc_flags |= ALLOC_NO_WATERMARKS;
 	}
-#ifdef CONFIG_CMA
-	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 	return alloc_flags;
 }
 
@@ -2811,10 +2809,6 @@ retry_cpuset:
 	if (!preferred_zone)
 		goto out;
 
-#ifdef CONFIG_CMA
-	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 retry:
 	/* First allocation attempt */
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
-- 
1.7.9.5

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752733AbaEIMjM (ORCPT ); Fri, 9 May 2014 08:39:12 -0400 Received: from mailout4.w1.samsung.com ([210.118.77.14]:61660 "EHLO mailout4.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750712AbaEIMjK (ORCPT ); Fri, 9 May 2014 08:39:10 -0400 X-AuditID: cbfec7f5-b7fae6d000004d6d-d9-536ccc6b5e2b Message-id: <536CCC78.6050806@samsung.com> Date: Fri, 09 May 2014 14:39:20 +0200 From: Marek Szyprowski User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-version: 1.0 To: Joonsoo Kim , Andrew Morton Cc: Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan Kim , Heesub Shin , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kyungmin Park , Bartlomiej Zolnierkiewicz , "'Tomasz Stanislawski'" Subject: Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> In-reply-to: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> Content-type: text/plain; charset=UTF-8; format=flowed 
Content-transfer-encoding: 7bit X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFvrFLMWRmVeSWpSXmKPExsVy+t/xa7rZZ3KCDebOFrWYs34Nm8XGGetZ LVZv8rU4OHsJk8XK7mY2i7NNb9gttnfOYLe4vGsOm8W9Nf9ZLSa/e8ZoseB4C6vFsq/v2S3+ XlnPYjGv/SWrA5/H4TfvmT0u9/UyeWxa1cnmsenTJHaPrrdXmDxOzPjN4rHuzysmj/f7rrJ5 9G1Zxeix+XS1x+dNcgHcUVw2Kak5mWWpRfp2CVwZX36fZSp4Ilxx+6FNA+MN/i5GDg4JAROJ lw3uXYycQKaYxIV769lAbCGBpYwSvY2lXYxcQPYnRombbb3sIAleAS2J/e8ngRWxCKhKXFq1 iQnEZhMwlOh62wUWFxWIkdj9eSEjRL2gxI/J91hAbBGBUIm5HavAbGaBs8wS1z4mgtjCQPEd q3ZCLXaVWN2xC6yXU8BNYnbrJ6h6M4lHLeuYIWx5ic1r3jJPYBSYhWTFLCRls5CULWBkXsUo mlqaXFCclJ5rpFecmFtcmpeul5yfu4kREl9fdzAuPWZ1iFGAg1GJh3eBTEawEGtiWXFl7iFG CQ5mJRFezcM5wUK8KYmVValF+fFFpTmpxYcYmTg4pRoYdXuFeN7YxO8pPhId+f1B/VrlCYwh /LEfTqQkZe0y+mlyeEVt5U3W72E8vsaFCd8LG45uUNbu1PefPbv14T+N1pUhFrMWlBma34u7 FjhZuDX/2c+GSV6OVzZFNfidfckVmT+Dr6A+Ou+O07ZLCw5v2Cx6wufCme8PLz2SOdp95D7H Os+60goTJZbijERDLeai4kQABO/zJ40CAAA= Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, On 2014-05-08 02:32, Joonsoo Kim wrote: > This series tries to improve CMA. > > CMA is introduced to provide physically contiguous pages at runtime > without reserving memory area. But, current implementation works like as > reserving memory approach, because allocation on cma reserved region only > occurs as fallback of migrate_movable allocation. We can allocate from it > when there is no movable page. In that situation, kswapd would be invoked > easily since unmovable and reclaimable allocation consider > (free pages - free CMA pages) as free memory on the system and free memory > may be lower than high watermark in that case. If kswapd start to reclaim > memory, then fallback allocation doesn't occur much. > > In my experiment, I found that if system memory has 1024 MB memory and > has 512 MB reserved memory for CMA, kswapd is mostly invoked around > the 512MB free memory boundary. 
And invoked kswapd tries to make free
> memory until (free pages - free CMA pages) is higher than high watermark,
> so free memory on meminfo is moving around 512MB boundary consistently.
>
> To fix this problem, we should allocate the pages on cma reserved memory
> more aggressively and intelligenetly. Patch 2 implements the solution.
> Patch 1 is the simple optimization which remove useless re-trial and patch 3
> is for removing useless alloc flag, so these are not important.
> See patch 2 for more detailed description.
>
> This patchset is based on v3.15-rc4.

Thanks for posting those patches. It basically reminds me of the following discussion: http://thread.gmane.org/gmane.linux.kernel/1391989/focus=1399524

Your approach is basically the same. I hope that your patches can be improved in such a way that they will be accepted by the mm maintainers. I only wonder if the third patch is really necessary. Without it, kswapd wakeup might still be avoided in some cases.

> Thanks.
> Joonsoo Kim (3):
> CMA: remove redundant retrying code in __alloc_contig_migrate_range
> CMA: aggressively allocate the pages on cma reserved memory when not
> used
> CMA: always treat free cma pages as non-free on watermark checking
>
> include/linux/mmzone.h | 6 +++
> mm/compaction.c | 4 --
> mm/internal.h | 3 +-
> mm/page_alloc.c | 117 +++++++++++++++++++++++++++++++++++++++---------
> 4 files changed, 102 insertions(+), 28 deletions(-)
>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756393AbaEIPoR (ORCPT ); Fri, 9 May 2014 11:44:17 -0400 Received: from mail-pa0-f47.google.com ([209.85.220.47]:53397 "EHLO mail-pa0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751748AbaEIPoP (ORCPT ); Fri, 9 May 2014 11:44:15 -0400 From: Michal Nazarewicz To: Joonsoo Kim , Andrew Morton Cc: Rik van Riel , Johannes Weiner , Mel Gorman , 
Joonsoo Kim , Laura Abbott , Minchan Kim , Heesub Shin , Marek Szyprowski , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 1/3] CMA: remove redundant retrying code in __alloc_contig_migrate_range In-Reply-To: <1399509144-8898-2-git-send-email-iamjoonsoo.kim@lge.com> Organization: http://mina86.com/ References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-2-git-send-email-iamjoonsoo.kim@lge.com> User-Agent: Notmuch/0.17+15~gb65ca8e (http://notmuchmail.org) Emacs/24.3.50.1 (x86_64-unknown-linux-gnu) X-Face: PbkBB1w#)bOqd`iCe"Ds{e+!C7`pkC9a|f)Qo^BMQvy\q5x3?vDQJeN(DS?|-^$uMti[3D*#^_Ts"pU$jBQLq~Ud6iNwAw_r_o_4]|JO?]}P_}Nc&"p#D(ZgUb4uCNPe7~a[DbPG0T~!&c.y$Ur,=N4RT>]dNpd;KFrfMCylc}gc??'U2j,!8%xdD Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAJFBMVEWbfGlUPDDHgE57V0jUupKjgIObY0PLrom9mH4dFRK4gmjPs41MxjOgAAACQElEQVQ4jW3TMWvbQBQHcBk1xE6WyALX1069oZBMlq+ouUwpEQQ6uRjttkWP4CmBgGM0BQLBdPFZYPsyFUo6uEtKDQ7oy/U96XR2Ux8ehH/89Z6enqxBcS7Lg81jmSuujrfCZcLI/TYYvbGj+jbgFpHJ/bqQAUISj8iLyu4LuFHJTosxsucO4jSDNE0Hq3hwK/ceQ5sx97b8LcUDsILfk+ovHkOIsMbBfg43VuQ5Ln9YAGCkUdKJoXR9EclFBhixy3EGVz1K6eEkhxCAkeMMnqoAhAKwhoUJkDrCqvbecaYINlFKSRS1i12VKH1XpUd4qxL876EkMcDvHj3s5RBajHHMlA5iK32e0C7VgG0RlzFPvoYHZLRmAC0BmNcBruhkE0KsMsbEc62ZwUJDxWUdMsMhVqovoT96i/DnX/ASvz/6hbCabELLk/6FF/8PNpPCGqcZTGFcBhhAaZZDbQPaAB3+KrWWy2XgbYDNIinkdWAFcCpraDE/knwe5DBqGmgzESl1p2E4MWAz0VUPgYYzmfWb9yS4vCvgsxJriNTHoIBz5YteBvg+VGISQWUqhMiByPIPpygeDBE6elD973xWwKkEiHZAHKjhuPsFnBuArrzxtakRcISv+XMIPl4aGBUJm8Emk7qBYU8IlgNEIpiJhk/No24jHwkKTFHDWfPniR4iw5vJaw2nzSjfq2zffcE/GDjRC2dn0J0XwPAbDL84TvaFCJEU4Oml9pRyEUhR3Cl2t01AoEjRbs0sYugp14/4X5n4pU4EHHnMAAAAAElFTkSuQmCC X-PGP: 50751FF4 X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4 X-Hashcash: 1:20:140509:iamjoonsoo.kim@lge.com::TESFHnDQVc6ItF7n:0000000000000000000000000000000000000000dZy X-Hashcash: 1:20:140509:hannes@cmpxchg.org::gvLdQCyuEuYycmaz:00000000000000000000000000000000000000000000gQa X-Hashcash: 
1:20:140509:iamjoonsoo.kim@lge.com::xLrSo7AUpGIExFcb:0000000000000000000000000000000000000000S7H X-Hashcash: 1:20:140509:m.szyprowski@samsung.com::BOOx9wCDJ98d7kky:000000000000000000000000000000000000015gF X-Hashcash: 1:20:140509:riel@redhat.com::/jx7R2FDMgRaVIlv:002Hgm X-Hashcash: 1:20:140509:linux-kernel@vger.kernel.org::EKY3qKinWEKGWGd4:0000000000000000000000000000000001/Vu X-Hashcash: 1:20:140509:akpm@linux-foundation.org::ObeCDMIbIv3XQox6:00000000000000000000000000000000000025mA X-Hashcash: 1:20:140509:mgorman@suse.de::H7HpknEYyVRBs4s2:002X9/ X-Hashcash: 1:20:140509:minchan@kernel.org::QpasDFc4iNPaqoer:00000000000000000000000000000000000000000002JPd X-Hashcash: 1:20:140509:lauraa@codeaurora.org::eElI9U0wf/zVdS9w:00000000000000000000000000000000000000003S6g X-Hashcash: 1:20:140509:heesub.shin@samsung.com::UOJ6jEmDigYNx2O4:0000000000000000000000000000000000000035Hd X-Hashcash: 1:20:140509:linux-mm@kvack.org::sPXPBazAcS3zu/Mk:00000000000000000000000000000000000000000002wyc Date: Fri, 09 May 2014 08:44:06 -0700 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Wed, May 07 2014, Joonsoo Kim wrote: > We already have retry logic in migrate_pages(). It does retry 10 times. > So if we keep this retrying code in __alloc_contig_migrate_range(), we > would try to migrate some unmigratable page in 50 times. There is just one > small difference in -ENOMEM case. migrate_pages() don't do retry > in this case, however, current __alloc_contig_migrate_range() does. But, > I think that this isn't problem, because in this case, we may fail again > with same reason. > > Signed-off-by: Joonsoo Kim I think there was a reason for the retries in __alloc_contig_migrate_range but perhaps those are no longer valid. 
Acked-by: Michal Nazarewicz

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5dba293..674ade7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6185,7 +6185,6 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>  	/* This function is based on compact_zone() from compaction.c. */
>  	unsigned long nr_reclaimed;
>  	unsigned long pfn = start;
> -	unsigned int tries = 0;
>  	int ret = 0;
>  
>  	migrate_prep();
> @@ -6204,10 +6203,6 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>  				ret = -EINTR;
>  				break;
>  			}
> -			tries = 0;
> -		} else if (++tries == 5) {
> -			ret = ret < 0 ? ret : -EBUSY;
> -			break;
>  		}
>  
>  		nr_reclaimed = reclaim_clean_pages_from_list(cc->zone,
> @@ -6216,6 +6211,10 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>  
>  		ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
>  				    0, MIGRATE_SYNC, MR_CMA);
> +		if (ret) {
> +			ret = ret < 0 ? ret : -EBUSY;
> +			break;
> +		}
>  	}
>  	if (ret < 0) {
>  		putback_movable_pages(&cc->migratepages);

-- 
Best regards, _ _
.o. 
| Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +------ooO--(_)--Ooo--
--=-=-= Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" --==-=-= Content-Type: text/plain --==-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAEBAgAGBQJTbPfGAAoJECBgQBJQdR/0nJQP/0htX/xZxE1ePn5SBbCWts+2 Zx9ESFvp1aZEQKrPQ/u692Imj99GtbUntwK4NPH9ponpK9ErSFoONK8h1FP2hgLG KF8IhmKSy3D2J37r/kLLmmSHJqw52uqIu1UefbUv3fDgHvULd9kKz0eRPNn4dJTv 9+Vv7AbW69v39Owwp2R84y7t5SrPGN/SlqABzii296zmGkXQrWkDwRFk17FJ/KqA RkmMSzkR+hMmAfefd2WcFeUASJDqTDMTxBKiUmEs9/WKSbkTRVa+Z+MRvpnKBTDs Ra6Ya13fbFDKAVXivZiU+fIJkxnCQmPUfbjoZQn6T9FwkC89aVZKnLyPldKcPKg5 BjtoX7/HWrK3ERrV+n3CjqwITZZ4kMWbY8O81PgmM0HFZKdunEdqZCj07O0og7G3 xW/zGGlpXRBeDQa6xAm08ZInl3PTt5yq89Sl6vrNmOsubjrNiP4HNfR5dSHk08Ly 69Cs3SpCrNp64IzISO8QjabCw7oGzZoMrl6bnWaHSmNllOZwkAPTGQjq4kS8kcH5 KAqtF0tgQZcqLRh8dnQI7/WS6r5ClcHnuKQpN+4XXXo6B00Dc0B7ypBMRlPYgKJ8 FoxIMP6HyFTxxEtrndpfC4q6jcleoBRSWXkOYFArFu6az9egW3wlIYCYUpGprDZh Gt6W2uDFcn0rHqEJoXGl =+M2V -----END PGP SIGNATURE----- --==-=-=-- --=-=-=-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756852AbaEIPpm (ORCPT ); Fri, 9 May 2014 11:45:42 -0400 Received: from mail-pd0-f175.google.com ([209.85.192.175]:54497 "EHLO mail-pd0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754063AbaEIPpk (ORCPT ); Fri, 9 May 2014 11:45:40 -0400 From: Michal Nazarewicz To: Joonsoo Kim , Andrew Morton Cc: Rik van Riel , Johannes Weiner , Mel Gorman , Joonsoo Kim , Laura Abbott , Minchan Kim , Heesub Shin , Marek Szyprowski , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used In-Reply-To: 
<1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> Organization: http://mina86.com/ References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> User-Agent: Notmuch/0.17+15~gb65ca8e (http://notmuchmail.org) Emacs/24.3.50.1 (x86_64-unknown-linux-gnu) X-Face: PbkBB1w#)bOqd`iCe"Ds{e+!C7`pkC9a|f)Qo^BMQvy\q5x3?vDQJeN(DS?|-^$uMti[3D*#^_Ts"pU$jBQLq~Ud6iNwAw_r_o_4]|JO?]}P_}Nc&"p#D(ZgUb4uCNPe7~a[DbPG0T~!&c.y$Ur,=N4RT>]dNpd;KFrfMCylc}gc??'U2j,!8%xdD Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAJFBMVEWbfGlUPDDHgE57V0jUupKjgIObY0PLrom9mH4dFRK4gmjPs41MxjOgAAACQElEQVQ4jW3TMWvbQBQHcBk1xE6WyALX1069oZBMlq+ouUwpEQQ6uRjttkWP4CmBgGM0BQLBdPFZYPsyFUo6uEtKDQ7oy/U96XR2Ux8ehH/89Z6enqxBcS7Lg81jmSuujrfCZcLI/TYYvbGj+jbgFpHJ/bqQAUISj8iLyu4LuFHJTosxsucO4jSDNE0Hq3hwK/ceQ5sx97b8LcUDsILfk+ovHkOIsMbBfg43VuQ5Ln9YAGCkUdKJoXR9EclFBhixy3EGVz1K6eEkhxCAkeMMnqoAhAKwhoUJkDrCqvbecaYINlFKSRS1i12VKH1XpUd4qxL876EkMcDvHj3s5RBajHHMlA5iK32e0C7VgG0RlzFPvoYHZLRmAC0BmNcBruhkE0KsMsbEc62ZwUJDxWUdMsMhVqovoT96i/DnX/ASvz/6hbCabELLk/6FF/8PNpPCGqcZTGFcBhhAaZZDbQPaAB3+KrWWy2XgbYDNIinkdWAFcCpraDE/knwe5DBqGmgzESl1p2E4MWAz0VUPgYYzmfWb9yS4vCvgsxJriNTHoIBz5YteBvg+VGISQWUqhMiByPIPpygeDBE6elD973xWwKkEiHZAHKjhuPsFnBuArrzxtakRcISv+XMIPl4aGBUJm8Emk7qBYU8IlgNEIpiJhk/No24jHwkKTFHDWfPniR4iw5vJaw2nzSjfq2zffcE/GDjRC2dn0J0XwPAbDL84TvaFCJEU4Oml9pRyEUhR3Cl2t01AoEjRbs0sYugp14/4X5n4pU4EHHnMAAAAAElFTkSuQmCC X-PGP: 50751FF4 X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4 X-Hashcash: 1:20:140509:m.szyprowski@samsung.com::jjklja5rOYzhLe7F:00000000000000000000000000000000000000akA X-Hashcash: 1:20:140509:hannes@cmpxchg.org::sAN8SCTs+YspnYzz:00000000000000000000000000000000000000000001JlD X-Hashcash: 1:20:140509:linux-kernel@vger.kernel.org::FH9rDb1i4aLnWAcD:0000000000000000000000000000000000rAy X-Hashcash: 1:20:140509:iamjoonsoo.kim@lge.com::QWg29gn20YSyv7lG:0000000000000000000000000000000000000002KnL X-Hashcash: 
1:20:140509:minchan@kernel.org::fKury2Afn+M7+a9n:00000000000000000000000000000000000000000005mDz X-Hashcash: 1:20:140509:riel@redhat.com::YOdTrduOZTQbvAwK:004Kt6 X-Hashcash: 1:20:140509:akpm@linux-foundation.org::nOjqAUKWDiYQdkwu:0000000000000000000000000000000000005CLZ X-Hashcash: 1:20:140509:lauraa@codeaurora.org::scImUw64u2TsiWeM:00000000000000000000000000000000000000007tsf X-Hashcash: 1:20:140509:heesub.shin@samsung.com::sJ3jcBC5fDjbeXvg:00000000000000000000000000000000000000AyrZ X-Hashcash: 1:20:140509:linux-mm@kvack.org::MDI4DM/vrTPdQ53N:0000000000000000000000000000000000000000000DD99 X-Hashcash: 1:20:140509:mgorman@suse.de::hZbAJekaHk+bQ7CR:00C+MR X-Hashcash: 1:20:140509:iamjoonsoo.kim@lge.com::4mtXAhVviBoVERHh:000000000000000000000000000000000000000FPDj Date: Fri, 09 May 2014 08:45:32 -0700 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Wed, May 07 2014, Joonsoo Kim wrote: > Signed-off-by: Joonsoo Kim The code looks good to me, but I don't feel competent on whether the approach is beneficial or not. Still: Acked-by: Michal Nazarewicz --=20 Best regards, _ _ .o. 
| Liege of Serenely Enlightened Majesty of o' \,=3D./ `o ..o | Computer Science, Micha=C5=82 =E2=80=9Cmina86=E2=80=9D Nazarewicz = (o o) ooo +------ooO--(_)--Ooo-- --=-=-= Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" --==-=-= Content-Type: text/plain --==-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIbBAEBAgAGBQJTbPgcAAoJECBgQBJQdR/0V5EP9iwYwqxiwv9ZPYqFtCjjAsHH tJomWa/nlEKJ+eVoTwFT2FjpORmux2MHNDCWL+ncpV4Gh3SODbstkiRhJNksiOsz CKrc/amtefoiCkZOLf478Mn845t4a9TitUN3fAPqG4/iPulf1alelymFqaSiTU+I wV5JaQK5KWUnUADR/5UzMCEG1pgyu9SbIHYM2pKljbtFDNrrcE+h10UFepUgiNda onZvB002cdV4KR3ZA1Dw7UcMarL/gSL1GbWiqHuQz0Za2yoPZNtWJtuBoYBfNjfq Nlq0aIrKmx0viXfC4XkdRIJ0lJkEaWz560exmeEXWrO3egd3TtbYjPdZ5nheDUBZ 21ZkTTSYggR33oIasTGiAGFrJNDdX2TebAvulC1vIYZ+7wP53iwHNBQqU6UkpPw+ 0PrLQa1a7THDpoalRkfBCC+HBHBwJvsSGHYlgSvUA/b0EdzuI9CN29Ht+lC/kDqg vCJiO0yykygOaj/JATdP/kNnmF7KhRAJhUc2HQgrGCQ6wpyQ5Tlk8vtL9OUdaH7G W8VnqdRTU39S3j/1YXpJCjOxNr7m5mC6hl9pSkBaWzQ0x/bBi21jWiHdOPNWQnxK Qb+DpilW5ZoSmULo5dwyXIbjVxdoUKJuF9JotBoSP6tDppvXv2LD0a7PoN4oYN6l FEtcIJ2A1XPixQOfVFk= =4Tff -----END PGP SIGNATURE----- --==-=-=-- --=-=-=-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756871AbaEIPq6 (ORCPT ); Fri, 9 May 2014 11:46:58 -0400 Received: from mail-pa0-f53.google.com ([209.85.220.53]:36439 "EHLO mail-pa0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754063AbaEIPq5 (ORCPT ); Fri, 9 May 2014 11:46:57 -0400 From: Michal Nazarewicz To: Joonsoo Kim , Andrew Morton Cc: Rik van Riel , Johannes Weiner , Mel Gorman , Joonsoo Kim , Laura Abbott , Minchan Kim , Heesub Shin , Marek Szyprowski , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 3/3] CMA: always treat free cma pages as non-free on watermark checking In-Reply-To: 
<1399509144-8898-4-git-send-email-iamjoonsoo.kim@lge.com> Organization: http://mina86.com/ References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-4-git-send-email-iamjoonsoo.kim@lge.com> User-Agent: Notmuch/0.17+15~gb65ca8e (http://notmuchmail.org) Emacs/24.3.50.1 (x86_64-unknown-linux-gnu) X-Face: PbkBB1w#)bOqd`iCe"Ds{e+!C7`pkC9a|f)Qo^BMQvy\q5x3?vDQJeN(DS?|-^$uMti[3D*#^_Ts"pU$jBQLq~Ud6iNwAw_r_o_4]|JO?]}P_}Nc&"p#D(ZgUb4uCNPe7~a[DbPG0T~!&c.y$Ur,=N4RT>]dNpd;KFrfMCylc}gc??'U2j,!8%xdD Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAJFBMVEWbfGlUPDDHgE57V0jUupKjgIObY0PLrom9mH4dFRK4gmjPs41MxjOgAAACQElEQVQ4jW3TMWvbQBQHcBk1xE6WyALX1069oZBMlq+ouUwpEQQ6uRjttkWP4CmBgGM0BQLBdPFZYPsyFUo6uEtKDQ7oy/U96XR2Ux8ehH/89Z6enqxBcS7Lg81jmSuujrfCZcLI/TYYvbGj+jbgFpHJ/bqQAUISj8iLyu4LuFHJTosxsucO4jSDNE0Hq3hwK/ceQ5sx97b8LcUDsILfk+ovHkOIsMbBfg43VuQ5Ln9YAGCkUdKJoXR9EclFBhixy3EGVz1K6eEkhxCAkeMMnqoAhAKwhoUJkDrCqvbecaYINlFKSRS1i12VKH1XpUd4qxL876EkMcDvHj3s5RBajHHMlA5iK32e0C7VgG0RlzFPvoYHZLRmAC0BmNcBruhkE0KsMsbEc62ZwUJDxWUdMsMhVqovoT96i/DnX/ASvz/6hbCabELLk/6FF/8PNpPCGqcZTGFcBhhAaZZDbQPaAB3+KrWWy2XgbYDNIinkdWAFcCpraDE/knwe5DBqGmgzESl1p2E4MWAz0VUPgYYzmfWb9yS4vCvgsxJriNTHoIBz5YteBvg+VGISQWUqhMiByPIPpygeDBE6elD973xWwKkEiHZAHKjhuPsFnBuArrzxtakRcISv+XMIPl4aGBUJm8Emk7qBYU8IlgNEIpiJhk/No24jHwkKTFHDWfPniR4iw5vJaw2nzSjfq2zffcE/GDjRC2dn0J0XwPAbDL84TvaFCJEU4Oml9pRyEUhR3Cl2t01AoEjRbs0sYugp14/4X5n4pU4EHHnMAAAAAElFTkSuQmCC X-PGP: 50751FF4 X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4 X-Hashcash: 1:20:140509:linux-mm@kvack.org::47ReBGKVbf2qhLLX:000000000000000000000000000000000000000000008X+ X-Hashcash: 1:20:140509:linux-kernel@vger.kernel.org::K2IGrY22q+I5pYUa:00000000000000000000000000000000004JC X-Hashcash: 1:20:140509:mgorman@suse.de::u0VqgUPMQIOg0VSk:000ElT X-Hashcash: 1:20:140509:iamjoonsoo.kim@lge.com::jMODyFp0GXXMaewK:0000000000000000000000000000000000000000L/5 X-Hashcash: 
1:20:140509:akpm@linux-foundation.org::Sldmi56S23zG6Bh8:0000000000000000000000000000000000001E6R X-Hashcash: 1:20:140509:heesub.shin@samsung.com::/UY1cjyEuguVO4vq:000000000000000000000000000000000000001kFu X-Hashcash: 1:20:140509:m.szyprowski@samsung.com::1Y4XhlBXPXMnW1C4:00000000000000000000000000000000000002tFP X-Hashcash: 1:20:140509:lauraa@codeaurora.org::qPgBE4aT23FxY7iB:00000000000000000000000000000000000000004Hzj X-Hashcash: 1:20:140509:riel@redhat.com::20UieBUv2/DNE/0P:003jR7 X-Hashcash: 1:20:140509:hannes@cmpxchg.org::55kNHvp0qz7Mbwfp:00000000000000000000000000000000000000000005pAj X-Hashcash: 1:20:140509:iamjoonsoo.kim@lge.com::EFibjMK+v1QRpwbO:0000000000000000000000000000000000000006rHj X-Hashcash: 1:20:140509:minchan@kernel.org::U/rdXRvAgGCoy05p:00000000000000000000000000000000000000000007rUa Date: Fri, 09 May 2014 08:46:50 -0700 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Wed, May 07 2014, Joonsoo Kim wrote: > commit d95ea5d1('cma: fix watermark checking') introduces ALLOC_CMA flag > for alloc flag and treats free cma pages as free pages if this flag is > passed to watermark checking. Intention of that patch is that movable page > allocation can be be handled from cma reserved region without starting > kswapd. Now, previous patch changes the behaviour of allocator that > movable allocation uses the page on cma reserved region aggressively, > so this watermark hack isn't needed anymore. Therefore remove it. > > Signed-off-by: Joonsoo Kim Acked-by: Michal Nazarewicz --=20 Best regards, _ _ .o. 
| Liege of Serenely Enlightened Majesty of o' \,=3D./ `o ..o | Computer Science, Micha=C5=82 =E2=80=9Cmina86=E2=80=9D Nazarewicz = (o o) ooo +------ooO--(_)--Ooo-- --=-=-= Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" --==-=-= Content-Type: text/plain --==-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAEBAgAGBQJTbPhqAAoJECBgQBJQdR/0W00P/2lA4due77ZgrKd3+b1G+hZW 54DbdgQdTTJZQJZGeCtCKx/v9O4nY6suKmeGQXpleMog4BGEUa/+UMVM8ZGZSwYv DZjqTM4l/lwuK4fU0jEdSKwBmpYL9PnvtLhduY6iEuqW4zxqqZFo3Hkp5fdi++eh XSUl2TTD/p97HqIJrRCjNsBwk67iQ06uH1Xn3BPdPFem4sXiyyuUbWwv2+kwcfJk OICFmLXgMw4SDybGcADT7KTHp94BpDmqIOK4fu+hOGoGYzEQ0ECPZDnVgILRAbc/ mzecpMZWKYdsr/QXboAO7BU9V23x1DedJsJs87/Vq6MjB0PRUIAhUA4q52aI4Q9p i03xO9ulah32J38Xium37xXmTj1unKd2V92q+nyJWd8tMTyAwiTwFZycU7WoeT+7 oSUzVXfqW/Lq9idLFyALyRjs7iq0ofaeW1xaQs+qeVNK/Pq6X0NtEsB8n2AEjZuh Upy2h873IHhpT/YM4ZmxkL0VihqZOd6ofojgGXAj3Z+M9z8iQMEYeV9SwxM0URy0 d3IFE1fR0zWWZGJeWikKuv+iQk1lqIpD7fyEcqHJER2F8SBirtKFtIxbNec1tXEU vbPOogilTy8lzdRq9dlft/iF93ogOcGAzGSrgtJshYlMdsH7yWWN0d/KCYjjuF7T Q0ACZTmO0v/QIsNfEU4y =BUaR -----END PGP SIGNATURE----- --==-=-=-- --=-=-=-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932530AbaELREP (ORCPT ); Mon, 12 May 2014 13:04:15 -0400 Received: from smtp.codeaurora.org ([198.145.11.231]:40574 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753827AbaELREM (ORCPT ); Mon, 12 May 2014 13:04:12 -0400 Message-ID: <5370FF1D.10707@codeaurora.org> Date: Mon, 12 May 2014 10:04:29 -0700 From: Laura Abbott User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Joonsoo Kim , Andrew Morton CC: Rik van Riel , Johannes Weiner , Mel Gorman , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , 
linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> In-Reply-To: <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On 5/7/2014 5:32 PM, Joonsoo Kim wrote: > CMA is introduced to provide physically contiguous pages at runtime. > For this purpose, it reserves memory at boot time. Although it reserve > memory, this reserved memory can be used for movable memory allocation > request. This usecase is beneficial to the system that needs this CMA > reserved memory infrequently and it is one of main purpose of > introducing CMA. > > But, there is a problem in current implementation. The problem is that > it works like as just reserved memory approach. The pages on cma reserved > memory are hardly used for movable memory allocation. This is caused by > combination of allocation and reclaim policy. > > The pages on cma reserved memory are allocated if there is no movable > memory, that is, as fallback allocation. So the time this fallback > allocation is started is under heavy memory pressure. Although it is under > memory pressure, movable allocation easily succeed, since there would be > many pages on cma reserved memory. But this is not the case for unmovable > and reclaimable allocation, because they can't use the pages on cma > reserved memory. These allocations regard system's free memory as > (free pages - free cma pages) on watermark checking, that is, free > unmovable pages + free reclaimable pages + free movable pages. 
Because > we already exhausted movable pages, only free pages we have are unmovable > and reclaimable types and this would be really small amount. So watermark > checking would be failed. It will wake up kswapd to make enough free > memory for unmovable and reclaimable allocation and kswapd will do. > So before we fully utilize pages on cma reserved memory, kswapd start to > reclaim memory and try to make free memory over the high watermark. This > watermark checking by kswapd doesn't take care free cma pages so many > movable pages would be reclaimed. After then, we have a lot of movable > pages again, so fallback allocation doesn't happen again. To conclude, > amount of free memory on meminfo which includes free CMA pages is moving > around 512 MB if I reserve 512 MB memory for CMA. > > I found this problem on following experiment. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > make -j24 > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.8 361.8 > Average-MemFree: 283880 KB 530851 KB > > To solve this problem, I can think following 2 possible solutions. > 1. allocate the pages on cma reserved memory first, and if they are > exhausted, allocate movable pages. > 2. interleaved allocation: try to allocate specific amounts of memory > from cma reserved memory and then allocate from free movable memory. > > I tested #1 approach and found the problem. Although free memory on > meminfo can move around low watermark, there is large fluctuation on free > memory, because too many pages are reclaimed when kswapd is invoked. > Reason for this behaviour is that successive allocated CMA pages are > on the LRU list in that order and kswapd reclaim them in same order. > These memory doesn't help watermark checking from kwapd, so too many > pages are reclaimed, I guess. > We have an out of tree implementation of #1 and so far it's worked for us although we weren't looking at the same metrics. I don't completely understand the issue you pointed out with #1. 
It sounds like the issue is that CMA pages are already in use by other processes and on LRU lists and because the pages are on LRU lists these aren't counted towards the watermark by kswapd. Is my understanding correct? > So, I implement #2 approach. > One thing I should note is that we should not change allocation target > (movable list or cma) on each allocation attempt, since this prevent > allocated pages to be in physically succession, so some I/O devices can > be hurt their performance. To solve this, I keep allocation target > in at least pageblock_nr_pages attempts and make this number reflect > ratio, free pages without free cma pages to free cma pages. With this > approach, system works very smoothly and fully utilize the pages on > cma reserved memory. > > Following is the experimental result of this patch. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > make -j24 > > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.8 361.8 > Average-MemFree: 283880 KB 530851 KB > pswpin: 7 110064 > pswpout: 452 767502 > > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.2 235.6 > Average-MemFree: 281651 KB 290227 KB > pswpin: 8 8 > pswpout: 430 510 > > There is no difference if we don't have cma reserved memory (0 MB case). > But, with cma reserved memory (512 MB case), we fully utilize these > reserved memory through this patch and the system behaves like as > it doesn't reserve any memory. What metric are you using to determine all CMA memory was fully used? We've been checking /proc/pagetypeinfo > > With this patch, we aggressively allocate the pages on cma reserved memory > so latency of CMA can arise. Below is the experimental result about > latency. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > CMA reserve: 512 MB > Backgound Workload: make -jN > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval > > N: 1 4 8 16 > Elapsed-time(Before): 4309.75 9511.09 12276.1 77103.5 > Elapsed-time(After): 5391.69 16114.1 19380.3 34879.2 > > So generally we can see latency increase. 
Ratio of this increase > is rather big - up to 70%. But, under the heavy workload, it shows > latency decrease - up to 55%. This may be worst-case scenario, but > reducing it would be important for some system, so, I can say that > this patch have advantages and disadvantages in terms of latency. > Do you have any statistics related to failed migration from this? Latency and utilization are issues but so is migration success. In the past we've found that an increase in CMA utilization was related to increase in CMA migration failures because pages were unmigratable. The current workaround for this is limiting CMA pages to be used for user processes only and not the file cache. Both of these have their own problems. > Signed-off-by: Joonsoo Kim > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index fac5509..3ff24d4 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -389,6 +389,12 @@ struct zone { > int compact_order_failed; > #endif > > +#ifdef CONFIG_CMA > + int has_cma; > + int nr_try_cma; > + int nr_try_movable; > +#endif > + > ZONE_PADDING(_pad1_) > > /* Fields commonly accessed by the page reclaim scanner */ > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 674ade7..6f2b27b 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order) > } > > #ifdef CONFIG_CMA > +void __init init_alloc_ratio_counter(struct zone *zone) > +{ > + if (zone->has_cma) > + return; > + > + zone->has_cma = 1; > + zone->nr_try_movable = 0; > + zone->nr_try_cma = 0; > +} > + > /* Free whole pageblock and set its migration type to MIGRATE_CMA. 
*/ > void __init init_cma_reserved_pageblock(struct page *page) > { > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page) > set_pageblock_migratetype(page, MIGRATE_CMA); > __free_pages(page, pageblock_order); > adjust_managed_page_count(page, pageblock_nr_pages); > + init_alloc_ratio_counter(page_zone(page)); > } > #endif > > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > return NULL; > } > > +#ifdef CONFIG_CMA > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order, > + int migratetype) > +{ > + long free, free_cma, free_wmark; > + struct page *page; > + > + if (migratetype != MIGRATE_MOVABLE || !zone->has_cma) > + return NULL; > + > + if (zone->nr_try_movable) > + goto alloc_movable; > + > +alloc_cma: > + if (zone->nr_try_cma) { > + /* Okay. Now, we can try to allocate the page from cma region */ > + zone->nr_try_cma--; > + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); > + > + /* CMA pages can vanish through CMA allocation */ > + if (unlikely(!page && order == 0)) > + zone->nr_try_cma = 0; > + > + return page; > + } > + > + /* Reset ratio counter */ > + free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES); > + > + /* No cma free pages, so recharge only movable allocation */ > + if (free_cma <= 0) { > + zone->nr_try_movable = pageblock_nr_pages; > + goto alloc_movable; > + } > + > + free = zone_page_state(zone, NR_FREE_PAGES); > + free_wmark = free - free_cma - high_wmark_pages(zone); > + > + /* > + * free_wmark is below than 0, and it means that normal pages > + * are under the pressure, so we recharge only cma allocation. 
> + */ > + if (free_wmark <= 0) { > + zone->nr_try_cma = pageblock_nr_pages; > + goto alloc_cma; > + } > + > + if (free_wmark > free_cma) { > + zone->nr_try_movable = > + (free_wmark * pageblock_nr_pages) / free_cma; > + zone->nr_try_cma = pageblock_nr_pages; > + } else { > + zone->nr_try_movable = pageblock_nr_pages; > + zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark; > + } > + > + /* Reset complete, start on movable first */ > +alloc_movable: > + zone->nr_try_movable--; > + return NULL; > +} > +#endif > + > /* > * Do the hard work of removing an element from the buddy allocator. > * Call me with the zone->lock already held. > @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > static struct page *__rmqueue(struct zone *zone, unsigned int order, > int migratetype) > { > - struct page *page; > + struct page *page = NULL; > + > + if (IS_ENABLED(CONFIG_CMA)) > + page = __rmqueue_cma(zone, order, migratetype); > > retry_reserve: > - page = __rmqueue_smallest(zone, order, migratetype); > + if (!page) > + page = __rmqueue_smallest(zone, order, migratetype); > > if (unlikely(!page) && migratetype != MIGRATE_RESERVE) { > page = __rmqueue_fallback(zone, order, migratetype); > @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > zone_seqlock_init(zone); > zone->zone_pgdat = pgdat; > zone_pcp_init(zone); > + if (IS_ENABLED(CONFIG_CMA)) > + zone->has_cma = 0; > > /* For bootup, initialized properly in watermark setup */ > mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages); > I'm going to see about running this through tests internally for comparison. Hopefully I'll get useful results in a day or so. Thanks, Laura -- Qualcomm Innovation Center, Inc. 
is a member of Code Aurora Forum, hosted by The Linux Foundation From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757994AbaEMBM0 (ORCPT ); Mon, 12 May 2014 21:12:26 -0400 Received: from lgeamrelo02.lge.com ([156.147.1.126]:50397 "EHLO lgeamrelo02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756212AbaEMBMW (ORCPT ); Mon, 12 May 2014 21:12:22 -0400 X-Original-SENDERIP: 10.177.220.145 X-Original-MAILFROM: iamjoonsoo.kim@lge.com Date: Tue, 13 May 2014 10:14:27 +0900 From: Joonsoo Kim To: Laura Abbott Cc: Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140513011426.GB23803@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <5370FF1D.10707@codeaurora.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5370FF1D.10707@codeaurora.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 12, 2014 at 10:04:29AM -0700, Laura Abbott wrote: > Hi, > > On 5/7/2014 5:32 PM, Joonsoo Kim wrote: > > CMA is introduced to provide physically contiguous pages at runtime. > > For this purpose, it reserves memory at boot time. Although it reserve > > memory, this reserved memory can be used for movable memory allocation > > request. This usecase is beneficial to the system that needs this CMA > > reserved memory infrequently and it is one of main purpose of > > introducing CMA. > > > > But, there is a problem in current implementation. The problem is that > > it works like as just reserved memory approach. 
The pages on cma reserved > > memory are hardly used for movable memory allocation. This is caused by > > combination of allocation and reclaim policy. > > > > The pages on cma reserved memory are allocated if there is no movable > > memory, that is, as fallback allocation. So the time this fallback > > allocation is started is under heavy memory pressure. Although it is under > > memory pressure, movable allocation easily succeed, since there would be > > many pages on cma reserved memory. But this is not the case for unmovable > > and reclaimable allocation, because they can't use the pages on cma > > reserved memory. These allocations regard system's free memory as > > (free pages - free cma pages) on watermark checking, that is, free > > unmovable pages + free reclaimable pages + free movable pages. Because > > we already exhausted movable pages, only free pages we have are unmovable > > and reclaimable types and this would be really small amount. So watermark > > checking would be failed. It will wake up kswapd to make enough free > > memory for unmovable and reclaimable allocation and kswapd will do. > > So before we fully utilize pages on cma reserved memory, kswapd start to > > reclaim memory and try to make free memory over the high watermark. This > > watermark checking by kswapd doesn't take care free cma pages so many > > movable pages would be reclaimed. After then, we have a lot of movable > > pages again, so fallback allocation doesn't happen again. To conclude, > > amount of free memory on meminfo which includes free CMA pages is moving > > around 512 MB if I reserve 512 MB memory for CMA. > > > > I found this problem on following experiment. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > make -j24 > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.8 361.8 > > Average-MemFree: 283880 KB 530851 KB > > > > To solve this problem, I can think following 2 possible solutions. > > 1. 
allocate the pages on cma reserved memory first, and if they are > > exhausted, allocate movable pages. > > 2. interleaved allocation: try to allocate specific amounts of memory > > from cma reserved memory and then allocate from free movable memory. > > > > I tested #1 approach and found the problem. Although free memory on > > meminfo can move around low watermark, there is large fluctuation on free > > memory, because too many pages are reclaimed when kswapd is invoked. > > Reason for this behaviour is that successive allocated CMA pages are > > on the LRU list in that order and kswapd reclaim them in same order. > > These memory doesn't help watermark checking from kwapd, so too many > > pages are reclaimed, I guess. > > > > We have an out of tree implementation of #1 and so far it's worked for us > although we weren't looking at the same metrics. I don't completely > understand the issue you pointed out with #1. It sounds like the issue is > that CMA pages are already in use by other processes and on LRU lists and > because the pages are on LRU lists these aren't counted towards the > watermark by kswapd. Is my understanding correct?

Hello, Yes, your understanding is correct. kswapd wants to reclaim normal (non-CMA) pages, but with approach #1 the LRU lists can hold long consecutive runs of CMA pages, so reclaiming them does not restore the watermark easily.

> > > So, I implement #2 approach. > > One thing I should note is that we should not change allocation target > > (movable list or cma) on each allocation attempt, since this prevent > > allocated pages to be in physically succession, so some I/O devices can > > be hurt their performance. To solve this, I keep allocation target > > in at least pageblock_nr_pages attempts and make this number reflect > > ratio, free pages without free cma pages to free cma pages. With this > > approach, system works very smoothly and fully utilize the pages on > > cma reserved memory. > > > > Following is the experimental result of this patch.
> > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > make -j24 > > > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.8 361.8 > > Average-MemFree: 283880 KB 530851 KB > > pswpin: 7 110064 > > pswpout: 452 767502 > > > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.2 235.6 > > Average-MemFree: 281651 KB 290227 KB > > pswpin: 8 8 > > pswpout: 430 510 > > > > There is no difference if we don't have cma reserved memory (0 MB case). > > But, with cma reserved memory (512 MB case), we fully utilize these > > reserved memory through this patch and the system behaves like as > > it doesn't reserve any memory. > > What metric are you using to determine all CMA memory was fully used? > We've been checking /proc/pagetypeinfo

In these results, the MemFree figure shows whether more CMA memory was being used. I also looked at /proc/zoneinfo to get more insight.

> > > > With this patch, we aggressively allocate the pages on cma reserved memory > > so latency of CMA can arise. Below is the experimental result about > > latency. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > CMA reserve: 512 MB > > Backgound Workload: make -jN > > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval > > > > N: 1 4 8 16 > > Elapsed-time(Before): 4309.75 9511.09 12276.1 77103.5 > > Elapsed-time(After): 5391.69 16114.1 19380.3 34879.2 > > > > So generally we can see latency increase. Ratio of this increase > > is rather big - up to 70%. But, under the heavy workload, it shows > > latency decrease - up to 55%. This may be worst-case scenario, but > > reducing it would be important for some system, so, I can say that > > this patch have advantages and disadvantages in terms of latency. > > > > Do you have any statistics related to failed migration from this? Latency > and utilization are issues but so is migration success. In the past we've > found that an increase in CMA utilization was related to increase in CMA > migration failures because pages were unmigratable.
The current > workaround for this is limiting CMA pages to be used for user processes > only and not the file cache. Both of these have their own problems.

Here are the retry counts observed while doing the 8 MB CMA allocation 20 times. These numbers are averages of 5 runs.

N:                 1     4     8     16
Retrying(Before):  0     0     0.6   12.2
Retrying(After):   1.4   1.8   3     3.6

If you know of any permanent failure case with file cache pages, please let me know. The only CMA migration failure with file cache pages that I already know of is the problem related to the buffer_head LRU, which you mentioned before.

> > Signed-off-by: Joonsoo Kim > > > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > > index fac5509..3ff24d4 100644 > > --- a/include/linux/mmzone.h > > +++ b/include/linux/mmzone.h > > @@ -389,6 +389,12 @@ struct zone { > > int compact_order_failed; > > #endif > > > > +#ifdef CONFIG_CMA > > + int has_cma; > > + int nr_try_cma; > > + int nr_try_movable; > > +#endif > > + > > ZONE_PADDING(_pad1_) > > > > /* Fields commonly accessed by the page reclaim scanner */ > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 674ade7..6f2b27b 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order) > > } > > > > #ifdef CONFIG_CMA > > +void __init init_alloc_ratio_counter(struct zone *zone) > > +{ > > + if (zone->has_cma) > > + return; > > + > > + zone->has_cma = 1; > > + zone->nr_try_movable = 0; > > + zone->nr_try_cma = 0; > > +} > > + > > /* Free whole pageblock and set its migration type to MIGRATE_CMA.
*/ > > void __init init_cma_reserved_pageblock(struct page *page) > > { > > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page) > > set_pageblock_migratetype(page, MIGRATE_CMA); > > __free_pages(page, pageblock_order); > > adjust_managed_page_count(page, pageblock_nr_pages); > > + init_alloc_ratio_counter(page_zone(page)); > > } > > #endif > > > > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > > return NULL; > > } > > > > +#ifdef CONFIG_CMA > > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order, > > + int migratetype) > > +{ > > + long free, free_cma, free_wmark; > > + struct page *page; > > + > > + if (migratetype != MIGRATE_MOVABLE || !zone->has_cma) > > + return NULL; > > + > > + if (zone->nr_try_movable) > > + goto alloc_movable; > > + > > +alloc_cma: > > + if (zone->nr_try_cma) { > > + /* Okay. Now, we can try to allocate the page from cma region */ > > + zone->nr_try_cma--; > > + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); > > + > > + /* CMA pages can vanish through CMA allocation */ > > + if (unlikely(!page && order == 0)) > > + zone->nr_try_cma = 0; > > + > > + return page; > > + } > > + > > + /* Reset ratio counter */ > > + free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES); > > + > > + /* No cma free pages, so recharge only movable allocation */ > > + if (free_cma <= 0) { > > + zone->nr_try_movable = pageblock_nr_pages; > > + goto alloc_movable; > > + } > > + > > + free = zone_page_state(zone, NR_FREE_PAGES); > > + free_wmark = free - free_cma - high_wmark_pages(zone); > > + > > + /* > > + * free_wmark is below than 0, and it means that normal pages > > + * are under the pressure, so we recharge only cma allocation. 
> > + */ > > + if (free_wmark <= 0) { > > + zone->nr_try_cma = pageblock_nr_pages; > > + goto alloc_cma; > > + } > > + > > + if (free_wmark > free_cma) { > > + zone->nr_try_movable = > > + (free_wmark * pageblock_nr_pages) / free_cma; > > + zone->nr_try_cma = pageblock_nr_pages; > > + } else { > > + zone->nr_try_movable = pageblock_nr_pages; > > + zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark; > > + } > > + > > + /* Reset complete, start on movable first */ > > +alloc_movable: > > + zone->nr_try_movable--; > > + return NULL; > > +} > > +#endif > > + > > /* > > * Do the hard work of removing an element from the buddy allocator. > > * Call me with the zone->lock already held. > > @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > > static struct page *__rmqueue(struct zone *zone, unsigned int order, > > int migratetype) > > { > > - struct page *page; > > + struct page *page = NULL; > > + > > + if (IS_ENABLED(CONFIG_CMA)) > > + page = __rmqueue_cma(zone, order, migratetype); > > > > retry_reserve: > > - page = __rmqueue_smallest(zone, order, migratetype); > > + if (!page) > > + page = __rmqueue_smallest(zone, order, migratetype); > > > > if (unlikely(!page) && migratetype != MIGRATE_RESERVE) { > > page = __rmqueue_fallback(zone, order, migratetype); > > @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > > zone_seqlock_init(zone); > > zone->zone_pgdat = pgdat; > > zone_pcp_init(zone); > > + if (IS_ENABLED(CONFIG_CMA)) > > + zone->has_cma = 0; > > > > /* For bootup, initialized properly in watermark setup */ > > mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages); > > > > I'm going to see about running this through tests internally for comparison. > Hopefully I'll get useful results in a day or so. Okay. I really hope to see your result. :) Thanks for your interest. Thanks. 
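
For readers following the thread, the counter recharge in __rmqueue_cma() from the quoted patch can be sketched in user space roughly as below. This is only an illustration under stated assumptions, not the patch itself: struct zone_sim, recharge(), pick_target() and PAGEBLOCK_NR_PAGES are hypothetical names made up for this sketch, the value 1024 for the pageblock size is assumed, and the free-page figures are passed in as plain numbers instead of being read with zone_page_state(). Actual page allocation, and the reset of nr_try_cma when an order-0 CMA allocation fails, are left out.

```c
/* Hypothetical user-space stand-in for the per-zone counters in the patch. */
struct zone_sim {
	long nr_try_cma;	/* remaining attempts to serve from the CMA area */
	long nr_try_movable;	/* remaining attempts to serve from movable lists */
};

enum alloc_target { TARGET_MOVABLE, TARGET_CMA };

/* Assumed pageblock size in pages; the real value depends on the config. */
#define PAGEBLOCK_NR_PAGES 1024L

/*
 * Recharge both counters so that the number of attempts per target
 * reflects the ratio of (free - free_cma - high_wmark) to free_cma,
 * with each target kept for at least one pageblock worth of attempts.
 */
static void recharge(struct zone_sim *z, long free, long free_cma,
		     long high_wmark)
{
	long free_wmark = free - free_cma - high_wmark;

	if (free_cma <= 0) {
		/* No free CMA pages: recharge only movable allocation. */
		z->nr_try_movable = PAGEBLOCK_NR_PAGES;
		z->nr_try_cma = 0;
	} else if (free_wmark <= 0) {
		/* Normal pages under pressure: recharge only CMA allocation. */
		z->nr_try_movable = 0;
		z->nr_try_cma = PAGEBLOCK_NR_PAGES;
	} else if (free_wmark > free_cma) {
		z->nr_try_movable = free_wmark * PAGEBLOCK_NR_PAGES / free_cma;
		z->nr_try_cma = PAGEBLOCK_NR_PAGES;
	} else {
		z->nr_try_movable = PAGEBLOCK_NR_PAGES;
		z->nr_try_cma = free_cma * PAGEBLOCK_NR_PAGES / free_wmark;
	}
}

/*
 * Decide where the next movable allocation should be tried first.
 * Movable attempts are consumed before CMA attempts, and the counters
 * are recharged once both have run out.
 */
static enum alloc_target pick_target(struct zone_sim *z, long free,
				     long free_cma, long high_wmark)
{
	if (z->nr_try_movable == 0 && z->nr_try_cma == 0)
		recharge(z, free, free_cma, high_wmark);

	if (z->nr_try_movable > 0) {
		z->nr_try_movable--;
		return TARGET_MOVABLE;
	}

	z->nr_try_cma--;
	return TARGET_CMA;
}
```

As a worked case, with free = 3100, free_cma = 1000 and high_wmark = 100 (made-up numbers), free_wmark is 2000, so the sketch charges 2048 movable attempts for every 1024 CMA attempts, which follows the 2:1 ratio of normal free pages above the watermark to free CMA pages.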
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752869AbaEMCX6 (ORCPT ); Mon, 12 May 2014 22:23:58 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:52355 "EHLO lgemrelse7q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752099AbaEMCX5 (ORCPT ); Mon, 12 May 2014 22:23:57 -0400 X-Original-SENDERIP: 10.177.220.145 X-Original-MAILFROM: iamjoonsoo.kim@lge.com Date: Tue, 13 May 2014 11:26:03 +0900 From: Joonsoo Kim To: Marek Szyprowski Cc: Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan Kim , Heesub Shin , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kyungmin Park , Bartlomiej Zolnierkiewicz , "'Tomasz Stanislawski'" Subject: Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Message-ID: <20140513022603.GF23803@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <536CCC78.6050806@samsung.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <536CCC78.6050806@samsung.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 09, 2014 at 02:39:20PM +0200, Marek Szyprowski wrote: > Hello, > > On 2014-05-08 02:32, Joonsoo Kim wrote: > >This series tries to improve CMA. > > > >CMA is introduced to provide physically contiguous pages at runtime > >without reserving memory area. But, current implementation works like as > >reserving memory approach, because allocation on cma reserved region only > >occurs as fallback of migrate_movable allocation. We can allocate from it > >when there is no movable page. 
> >In that situation, kswapd would be invoked
> >easily since unmovable and reclaimable allocation consider
> >(free pages - free CMA pages) as free memory on the system and free memory
> >may be lower than high watermark in that case. If kswapd start to reclaim
> >memory, then fallback allocation doesn't occur much.
> >
> >In my experiment, I found that if system memory has 1024 MB memory and
> >has 512 MB reserved memory for CMA, kswapd is mostly invoked around
> >the 512MB free memory boundary. And invoked kswapd tries to make free
> >memory until (free pages - free CMA pages) is higher than high watermark,
> >so free memory on meminfo is moving around 512MB boundary consistently.
> >
> >To fix this problem, we should allocate the pages on cma reserved memory
> >more aggressively and intelligenetly. Patch 2 implements the solution.
> >Patch 1 is the simple optimization which remove useless re-trial and patch 3
> >is for removing useless alloc flag, so these are not important.
> >See patch 2 for more detailed description.
> >
> >This patchset is based on v3.15-rc4.
>
> Thanks for posting those patches. It basically reminds me the
> following discussion:
> http://thread.gmane.org/gmane.linux.kernel/1391989/focus=1399524
>
> Your approach is basically the same. I hope that your patches can be
> improved in such a way that they will be accepted by mm maintainers.
> I only wonder if the third patch is really necessary. Without it kswapd
> wakeup might be still avoided in some cases.

Hello,

Oh... I didn't know about that patch and discussion, because I had no
interest in CMA at that time. Your approach looks similar to my #1
approach and could have the same problem as #1, which I mentioned in
patch 2/3. Please refer to that patch description. :)

Also, the purpose of this patch differs from yours. This patch is intended
to make better use of CMA pages and so get maximum performance. If the
goal were only to avoid triggering OOM, it would be possible to put this
logic on the reclaim path.
But that is sub-optimal for performance, because it needs migration in
some cases.

If the second patch works as intended, there are only a few free CMA pages
left by the time we get close to the watermark. So the benefit of the third
patch would be marginal and we can remove ALLOC_CMA.

Thanks.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752597AbaEMC6i (ORCPT ); Mon, 12 May 2014 22:58:38 -0400
Received: from lgeamrelo02.lge.com ([156.147.1.126]:42828 "EHLO
	lgeamrelo02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751829AbaEMC6h (ORCPT ); Mon, 12 May 2014 22:58:37 -0400
X-Original-SENDERIP: 10.177.220.169
X-Original-MAILFROM: minchan.kim@lge.com
Date: Tue, 13 May 2014 12:00:57 +0900
From: Minchan Kim
To: Joonsoo Kim
Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Michal Nazarewicz, Heesub Shin,
	Mel Gorman, Johannes Weiner, Marek Szyprowski
Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma
	reserved memory when not used
Message-ID: <20140513030057.GC32092@bbox>
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com>
	<1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hey Joonsoo,

On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote:
> CMA is introduced to provide physically contiguous pages at runtime.
> For this purpose, it reserves memory at boot time. Although it reserve
> memory, this reserved memory can be used for movable memory allocation
> request.
This usecase is beneficial to the system that needs this CMA > reserved memory infrequently and it is one of main purpose of > introducing CMA. > > But, there is a problem in current implementation. The problem is that > it works like as just reserved memory approach. The pages on cma reserved > memory are hardly used for movable memory allocation. This is caused by > combination of allocation and reclaim policy. > > The pages on cma reserved memory are allocated if there is no movable > memory, that is, as fallback allocation. So the time this fallback > allocation is started is under heavy memory pressure. Although it is under > memory pressure, movable allocation easily succeed, since there would be > many pages on cma reserved memory. But this is not the case for unmovable > and reclaimable allocation, because they can't use the pages on cma > reserved memory. These allocations regard system's free memory as > (free pages - free cma pages) on watermark checking, that is, free > unmovable pages + free reclaimable pages + free movable pages. Because > we already exhausted movable pages, only free pages we have are unmovable > and reclaimable types and this would be really small amount. So watermark > checking would be failed. It will wake up kswapd to make enough free > memory for unmovable and reclaimable allocation and kswapd will do. > So before we fully utilize pages on cma reserved memory, kswapd start to > reclaim memory and try to make free memory over the high watermark. This > watermark checking by kswapd doesn't take care free cma pages so many > movable pages would be reclaimed. After then, we have a lot of movable > pages again, so fallback allocation doesn't happen again. To conclude, > amount of free memory on meminfo which includes free CMA pages is moving > around 512 MB if I reserve 512 MB memory for CMA. > > I found this problem on following experiment. 
>
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j24
>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           234.8           361.8
> Average-MemFree:        283880 KB       530851 KB
>
> To solve this problem, I can think following 2 possible solutions.
> 1. allocate the pages on cma reserved memory first, and if they are
>    exhausted, allocate movable pages.
> 2. interleaved allocation: try to allocate specific amounts of memory
>    from cma reserved memory and then allocate from free movable memory.

I love this idea, but when I look at the code, I don't like it.
In the allocation path, just try to allocate pages by round-robin; that is
the allocator's role. If one of the migratetypes is full, just pass the
mission to the reclaimer with a hint (i.e., "Hey reclaimer, a non-movable
allocation failed, so it is pointless to reclaim MIGRATE_CMA pages") so
that the reclaimer can filter them out during page scanning.
We already have a tool to achieve that (i.e., isolate_mode_t).

And couldn't we do it in zone_watermark_ok by setting/resetting ALLOC_CMA?
If possible, that would be better, because it is the generic function that
checks free pages and triggers the reclaim/compaction logic.

>
> I tested #1 approach and found the problem. Although free memory on
> meminfo can move around low watermark, there is large fluctuation on free
> memory, because too many pages are reclaimed when kswapd is invoked.
> Reason for this behaviour is that successive allocated CMA pages are
> on the LRU list in that order and kswapd reclaim them in same order.
> These memory doesn't help watermark checking from kwapd, so too many
> pages are reclaimed, I guess.
>
> So, I implement #2 approach.
> One thing I should note is that we should not change allocation target
> (movable list or cma) on each allocation attempt, since this prevent
> allocated pages to be in physically succession, so some I/O devices can
> be hurt their performance.
To solve this, I keep allocation target > in at least pageblock_nr_pages attempts and make this number reflect > ratio, free pages without free cma pages to free cma pages. With this > approach, system works very smoothly and fully utilize the pages on > cma reserved memory. > > Following is the experimental result of this patch. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > make -j24 > > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.8 361.8 > Average-MemFree: 283880 KB 530851 KB > pswpin: 7 110064 > pswpout: 452 767502 > > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.2 235.6 > Average-MemFree: 281651 KB 290227 KB > pswpin: 8 8 > pswpout: 430 510 > > There is no difference if we don't have cma reserved memory (0 MB case). > But, with cma reserved memory (512 MB case), we fully utilize these > reserved memory through this patch and the system behaves like as > it doesn't reserve any memory. > > With this patch, we aggressively allocate the pages on cma reserved memory > so latency of CMA can arise. Below is the experimental result about > latency. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > CMA reserve: 512 MB > Backgound Workload: make -jN > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval > > N: 1 4 8 16 > Elapsed-time(Before): 4309.75 9511.09 12276.1 77103.5 > Elapsed-time(After): 5391.69 16114.1 19380.3 34879.2 > > So generally we can see latency increase. Ratio of this increase > is rather big - up to 70%. But, under the heavy workload, it shows > latency decrease - up to 55%. This may be worst-case scenario, but > reducing it would be important for some system, so, I can say that > this patch have advantages and disadvantages in terms of latency. 
> > Signed-off-by: Joonsoo Kim > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index fac5509..3ff24d4 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -389,6 +389,12 @@ struct zone { > int compact_order_failed; > #endif > > +#ifdef CONFIG_CMA > + int has_cma; > + int nr_try_cma; > + int nr_try_movable; > +#endif > + > ZONE_PADDING(_pad1_) > > /* Fields commonly accessed by the page reclaim scanner */ > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 674ade7..6f2b27b 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order) > } > > #ifdef CONFIG_CMA > +void __init init_alloc_ratio_counter(struct zone *zone) > +{ > + if (zone->has_cma) > + return; > + > + zone->has_cma = 1; > + zone->nr_try_movable = 0; > + zone->nr_try_cma = 0; > +} > + > /* Free whole pageblock and set its migration type to MIGRATE_CMA. */ > void __init init_cma_reserved_pageblock(struct page *page) > { > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page) > set_pageblock_migratetype(page, MIGRATE_CMA); > __free_pages(page, pageblock_order); > adjust_managed_page_count(page, pageblock_nr_pages); > + init_alloc_ratio_counter(page_zone(page)); > } > #endif > > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > return NULL; > } > > +#ifdef CONFIG_CMA > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order, > + int migratetype) > +{ > + long free, free_cma, free_wmark; > + struct page *page; > + > + if (migratetype != MIGRATE_MOVABLE || !zone->has_cma) > + return NULL; > + > + if (zone->nr_try_movable) > + goto alloc_movable; > + > +alloc_cma: > + if (zone->nr_try_cma) { > + /* Okay. 
Now, we can try to allocate the page from cma region */ > + zone->nr_try_cma--; > + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); > + > + /* CMA pages can vanish through CMA allocation */ > + if (unlikely(!page && order == 0)) > + zone->nr_try_cma = 0; > + > + return page; > + } > + > + /* Reset ratio counter */ > + free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES); > + > + /* No cma free pages, so recharge only movable allocation */ > + if (free_cma <= 0) { > + zone->nr_try_movable = pageblock_nr_pages; > + goto alloc_movable; > + } > + > + free = zone_page_state(zone, NR_FREE_PAGES); > + free_wmark = free - free_cma - high_wmark_pages(zone); > + > + /* > + * free_wmark is below than 0, and it means that normal pages > + * are under the pressure, so we recharge only cma allocation. > + */ > + if (free_wmark <= 0) { > + zone->nr_try_cma = pageblock_nr_pages; > + goto alloc_cma; > + } > + > + if (free_wmark > free_cma) { > + zone->nr_try_movable = > + (free_wmark * pageblock_nr_pages) / free_cma; > + zone->nr_try_cma = pageblock_nr_pages; > + } else { > + zone->nr_try_movable = pageblock_nr_pages; > + zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark; > + } > + > + /* Reset complete, start on movable first */ > +alloc_movable: > + zone->nr_try_movable--; > + return NULL; > +} > +#endif > + > /* > * Do the hard work of removing an element from the buddy allocator. > * Call me with the zone->lock already held. 
> @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > static struct page *__rmqueue(struct zone *zone, unsigned int order, > int migratetype) > { > - struct page *page; > + struct page *page = NULL; > + > + if (IS_ENABLED(CONFIG_CMA)) > + page = __rmqueue_cma(zone, order, migratetype); > > retry_reserve: > - page = __rmqueue_smallest(zone, order, migratetype); > + if (!page) > + page = __rmqueue_smallest(zone, order, migratetype); > > if (unlikely(!page) && migratetype != MIGRATE_RESERVE) { > page = __rmqueue_fallback(zone, order, migratetype); > @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > zone_seqlock_init(zone); > zone->zone_pgdat = pgdat; > zone_pcp_init(zone); > + if (IS_ENABLED(CONFIG_CMA)) > + zone->has_cma = 0; > > /* For bootup, initialized properly in watermark setup */ > mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages); > -- > 1.7.9.5 > > _______________________________________________ > OTC mailing list > OTC@blackduck.lge.com > http://blackduck.lge.com/mailman/listinfo/otc -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753260AbaEMDDF (ORCPT ); Mon, 12 May 2014 23:03:05 -0400 Received: from lgeamrelo04.lge.com ([156.147.1.127]:62661 "EHLO lgeamrelo04.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751085AbaEMDDD (ORCPT ); Mon, 12 May 2014 23:03:03 -0400 X-Original-SENDERIP: 10.177.220.169 X-Original-MAILFROM: minchan@kernel.org Date: Tue, 13 May 2014 12:05:23 +0900 From: Minchan Kim To: Laura Abbott Cc: Joonsoo Kim , Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: 
<20140513030523.GD32092@bbox> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <5370FF1D.10707@codeaurora.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5370FF1D.10707@codeaurora.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 12, 2014 at 10:04:29AM -0700, Laura Abbott wrote: > Hi, > > On 5/7/2014 5:32 PM, Joonsoo Kim wrote: > > CMA is introduced to provide physically contiguous pages at runtime. > > For this purpose, it reserves memory at boot time. Although it reserve > > memory, this reserved memory can be used for movable memory allocation > > request. This usecase is beneficial to the system that needs this CMA > > reserved memory infrequently and it is one of main purpose of > > introducing CMA. > > > > But, there is a problem in current implementation. The problem is that > > it works like as just reserved memory approach. The pages on cma reserved > > memory are hardly used for movable memory allocation. This is caused by > > combination of allocation and reclaim policy. > > > > The pages on cma reserved memory are allocated if there is no movable > > memory, that is, as fallback allocation. So the time this fallback > > allocation is started is under heavy memory pressure. Although it is under > > memory pressure, movable allocation easily succeed, since there would be > > many pages on cma reserved memory. But this is not the case for unmovable > > and reclaimable allocation, because they can't use the pages on cma > > reserved memory. These allocations regard system's free memory as > > (free pages - free cma pages) on watermark checking, that is, free > > unmovable pages + free reclaimable pages + free movable pages. 
Because > > we already exhausted movable pages, only free pages we have are unmovable > > and reclaimable types and this would be really small amount. So watermark > > checking would be failed. It will wake up kswapd to make enough free > > memory for unmovable and reclaimable allocation and kswapd will do. > > So before we fully utilize pages on cma reserved memory, kswapd start to > > reclaim memory and try to make free memory over the high watermark. This > > watermark checking by kswapd doesn't take care free cma pages so many > > movable pages would be reclaimed. After then, we have a lot of movable > > pages again, so fallback allocation doesn't happen again. To conclude, > > amount of free memory on meminfo which includes free CMA pages is moving > > around 512 MB if I reserve 512 MB memory for CMA. > > > > I found this problem on following experiment. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > make -j24 > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.8 361.8 > > Average-MemFree: 283880 KB 530851 KB > > > > To solve this problem, I can think following 2 possible solutions. > > 1. allocate the pages on cma reserved memory first, and if they are > > exhausted, allocate movable pages. > > 2. interleaved allocation: try to allocate specific amounts of memory > > from cma reserved memory and then allocate from free movable memory. > > > > I tested #1 approach and found the problem. Although free memory on > > meminfo can move around low watermark, there is large fluctuation on free > > memory, because too many pages are reclaimed when kswapd is invoked. > > Reason for this behaviour is that successive allocated CMA pages are > > on the LRU list in that order and kswapd reclaim them in same order. > > These memory doesn't help watermark checking from kwapd, so too many > > pages are reclaimed, I guess. > > > > We have an out of tree implementation of #1 and so far it's worked for us > although we weren't looking at the same metrics. 
> I don't completely understand the issue you pointed out with #1. It
> sounds like the issue is that CMA pages are already in use by other
> processes and on LRU lists and because the pages are on LRU lists these
> aren't counted towards the watermark by kswapd. Is my understanding
> correct?

kswapd could reclaim MIGRATE_CMA pages unconditionally even though the
allocator path failed on a non-movable allocation. That is pointless and
should be fixed.

>
> > So, I implement #2 approach.
> > One thing I should note is that we should not change allocation target
> > (movable list or cma) on each allocation attempt, since this prevent
> > allocated pages to be in physically succession, so some I/O devices can
> > be hurt their performance. To solve this, I keep allocation target
> > in at least pageblock_nr_pages attempts and make this number reflect
> > ratio, free pages without free cma pages to free cma pages. With this
> > approach, system works very smoothly and fully utilize the pages on
> > cma reserved memory.
> >
> > Following is the experimental result of this patch.
> >
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > make -j24
> >
> >
> > CMA reserve:            0 MB            512 MB
> > Elapsed-time:           234.8           361.8
> > Average-MemFree:        283880 KB       530851 KB
> > pswpin:                 7               110064
> > pswpout:                452             767502
> >
> >
> > CMA reserve:            0 MB            512 MB
> > Elapsed-time:           234.2           235.6
> > Average-MemFree:        281651 KB       290227 KB
> > pswpin:                 8               8
> > pswpout:                430             510
> >
> > There is no difference if we don't have cma reserved memory (0 MB case).
> > But, with cma reserved memory (512 MB case), we fully utilize these
> > reserved memory through this patch and the system behaves like as
> > it doesn't reserve any memory.
>
> What metric are you using to determine all CMA memory was fully used?
> We've been checking /proc/pagetypeinfo
> >
> > With this patch, we aggressively allocate the pages on cma reserved memory
> > so latency of CMA can arise. Below is the experimental result about
> > latency.
> >
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > CMA reserve: 512 MB
> > Backgound Workload: make -jN
> > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
> >
> > N:                        1          4          8          16
> > Elapsed-time(Before):     4309.75    9511.09    12276.1    77103.5
> > Elapsed-time(After):      5391.69    16114.1    19380.3    34879.2
> >
> > So generally we can see latency increase. Ratio of this increase
> > is rather big - up to 70%. But, under the heavy workload, it shows
> > latency decrease - up to 55%. This may be worst-case scenario, but
> > reducing it would be important for some system, so, I can say that
> > this patch have advantages and disadvantages in terms of latency.
> >
>
> Do you have any statistics related to failed migration from this? Latency
> and utilization are issues but so is migration success. In the past we've
> found that an increase in CMA utilization was related to increase in CMA
> migration failures because pages were unmigratable. The current
> workaround for this is limiting CMA pages to be used for user processes
> only and not the file cache. Both of these have their own problems.

If Joonsoo's patch makes the failure ratio higher, that would be okay with
me, because we would get more reports of such failures and have a chance to
fix them. That is better than hiding the problems of CMA with some hack.
> > > Signed-off-by: Joonsoo Kim > > > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > > index fac5509..3ff24d4 100644 > > --- a/include/linux/mmzone.h > > +++ b/include/linux/mmzone.h > > @@ -389,6 +389,12 @@ struct zone { > > int compact_order_failed; > > #endif > > > > +#ifdef CONFIG_CMA > > + int has_cma; > > + int nr_try_cma; > > + int nr_try_movable; > > +#endif > > + > > ZONE_PADDING(_pad1_) > > > > /* Fields commonly accessed by the page reclaim scanner */ > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 674ade7..6f2b27b 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order) > > } > > > > #ifdef CONFIG_CMA > > +void __init init_alloc_ratio_counter(struct zone *zone) > > +{ > > + if (zone->has_cma) > > + return; > > + > > + zone->has_cma = 1; > > + zone->nr_try_movable = 0; > > + zone->nr_try_cma = 0; > > +} > > + > > /* Free whole pageblock and set its migration type to MIGRATE_CMA. */ > > void __init init_cma_reserved_pageblock(struct page *page) > > { > > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page) > > set_pageblock_migratetype(page, MIGRATE_CMA); > > __free_pages(page, pageblock_order); > > adjust_managed_page_count(page, pageblock_nr_pages); > > + init_alloc_ratio_counter(page_zone(page)); > > } > > #endif > > > > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > > return NULL; > > } > > > > +#ifdef CONFIG_CMA > > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order, > > + int migratetype) > > +{ > > + long free, free_cma, free_wmark; > > + struct page *page; > > + > > + if (migratetype != MIGRATE_MOVABLE || !zone->has_cma) > > + return NULL; > > + > > + if (zone->nr_try_movable) > > + goto alloc_movable; > > + > > +alloc_cma: > > + if (zone->nr_try_cma) { > > + /* Okay. 
Now, we can try to allocate the page from cma region */ > > + zone->nr_try_cma--; > > + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); > > + > > + /* CMA pages can vanish through CMA allocation */ > > + if (unlikely(!page && order == 0)) > > + zone->nr_try_cma = 0; > > + > > + return page; > > + } > > + > > + /* Reset ratio counter */ > > + free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES); > > + > > + /* No cma free pages, so recharge only movable allocation */ > > + if (free_cma <= 0) { > > + zone->nr_try_movable = pageblock_nr_pages; > > + goto alloc_movable; > > + } > > + > > + free = zone_page_state(zone, NR_FREE_PAGES); > > + free_wmark = free - free_cma - high_wmark_pages(zone); > > + > > + /* > > + * free_wmark is below than 0, and it means that normal pages > > + * are under the pressure, so we recharge only cma allocation. > > + */ > > + if (free_wmark <= 0) { > > + zone->nr_try_cma = pageblock_nr_pages; > > + goto alloc_cma; > > + } > > + > > + if (free_wmark > free_cma) { > > + zone->nr_try_movable = > > + (free_wmark * pageblock_nr_pages) / free_cma; > > + zone->nr_try_cma = pageblock_nr_pages; > > + } else { > > + zone->nr_try_movable = pageblock_nr_pages; > > + zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark; > > + } > > + > > + /* Reset complete, start on movable first */ > > +alloc_movable: > > + zone->nr_try_movable--; > > + return NULL; > > +} > > +#endif > > + > > /* > > * Do the hard work of removing an element from the buddy allocator. > > * Call me with the zone->lock already held. 
> > @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > > static struct page *__rmqueue(struct zone *zone, unsigned int order, > > int migratetype) > > { > > - struct page *page; > > + struct page *page = NULL; > > + > > + if (IS_ENABLED(CONFIG_CMA)) > > + page = __rmqueue_cma(zone, order, migratetype); > > > > retry_reserve: > > - page = __rmqueue_smallest(zone, order, migratetype); > > + if (!page) > > + page = __rmqueue_smallest(zone, order, migratetype); > > > > if (unlikely(!page) && migratetype != MIGRATE_RESERVE) { > > page = __rmqueue_fallback(zone, order, migratetype); > > @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > > zone_seqlock_init(zone); > > zone->zone_pgdat = pgdat; > > zone_pcp_init(zone); > > + if (IS_ENABLED(CONFIG_CMA)) > > + zone->has_cma = 0; > > > > /* For bootup, initialized properly in watermark setup */ > > mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages); > > > > I'm going to see about running this through tests internally for comparison. > Hopefully I'll get useful results in a day or so. > > Thanks, > Laura > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, > hosted by The Linux Foundation > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . 
> Don't email: email@kvack.org -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753514AbaENImk (ORCPT ); Wed, 14 May 2014 04:42:40 -0400 Received: from e23smtp01.au.ibm.com ([202.81.31.143]:54530 "EHLO e23smtp01.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751253AbaENIme (ORCPT ); Wed, 14 May 2014 04:42:34 -0400 From: "Aneesh Kumar K.V" To: Joonsoo Kim , Andrew Morton Cc: Rik van Riel , Johannes Weiner , Mel Gorman , Joonsoo Kim , Laura Abbott , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used In-Reply-To: <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> User-Agent: Notmuch/0.18~rc0+2~gbc64cdc (http://notmuchmail.org) Emacs/24.3.1 (x86_64-pc-linux-gnu) Date: Wed, 14 May 2014 14:12:19 +0530 Message-ID: <8761l8ah04.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14051408-1618-0000-0000-0000003854A4 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Joonsoo Kim writes: > CMA is introduced to provide physically contiguous pages at runtime. > For this purpose, it reserves memory at boot time. Although it reserve > memory, this reserved memory can be used for movable memory allocation > request. This usecase is beneficial to the system that needs this CMA > reserved memory infrequently and it is one of main purpose of > introducing CMA. > > But, there is a problem in current implementation. The problem is that > it works like as just reserved memory approach. 
> The pages on cma reserved memory are hardly used for movable memory
> allocation. This is caused by combination of allocation and reclaim
> policy.
>
> The pages on cma reserved memory are allocated if there is no movable
> memory, that is, as fallback allocation. So the time this fallback
> allocation is started is under heavy memory pressure. Although it is under
> memory pressure, movable allocation easily succeed, since there would be
> many pages on cma reserved memory. But this is not the case for unmovable
> and reclaimable allocation, because they can't use the pages on cma
> reserved memory. These allocations regard system's free memory as
> (free pages - free cma pages) on watermark checking, that is, free
> unmovable pages + free reclaimable pages + free movable pages. Because
> we already exhausted movable pages, only free pages we have are unmovable
> and reclaimable types and this would be really small amount. So watermark
> checking would be failed. It will wake up kswapd to make enough free
> memory for unmovable and reclaimable allocation and kswapd will do.
> So before we fully utilize pages on cma reserved memory, kswapd start to
> reclaim memory and try to make free memory over the high watermark. This
> watermark checking by kswapd doesn't take care free cma pages so many
> movable pages would be reclaimed. After then, we have a lot of movable
> pages again, so fallback allocation doesn't happen again. To conclude,
> amount of free memory on meminfo which includes free CMA pages is moving
> around 512 MB if I reserve 512 MB memory for CMA.

Another issue I am facing with the current code is atomic allocations
failing even with a large number of CMA pages around. In my case we never
reclaimed, because a large part of the memory is consumed by the page
cache and, for that, the free memory check doesn't take free_cma into
account. I will test with this patchset and update here once I have the
results.

>
> I found this problem on following experiment.
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE > make -j24 > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.8 361.8 > Average-MemFree: 283880 KB 530851 KB > > To solve this problem, I can think following 2 possible solutions. > 1. allocate the pages on cma reserved memory first, and if they are > exhausted, allocate movable pages. > 2. interleaved allocation: try to allocate specific amounts of memory > from cma reserved memory and then allocate from free movable memory. > > I tested #1 approach and found the problem. Although free memory on > meminfo can move around low watermark, there is large fluctuation on free > memory, because too many pages are reclaimed when kswapd is invoked. > Reason for this behaviour is that successive allocated CMA pages are > on the LRU list in that order and kswapd reclaim them in same order. > These memory doesn't help watermark checking from kwapd, so too many > pages are reclaimed, I guess. > > So, I implement #2 approach. > One thing I should note is that we should not change allocation target > (movable list or cma) on each allocation attempt, since this prevent > allocated pages to be in physically succession, so some I/O devices can > be hurt their performance. To solve this, I keep allocation target > in at least pageblock_nr_pages attempts and make this number reflect > ratio, free pages without free cma pages to free cma pages. With this > approach, system works very smoothly and fully utilize the pages on > cma reserved memory. > > Following is the experimental result of this patch. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > make -j24 > > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.8 361.8 > Average-MemFree: 283880 KB 530851 KB > pswpin: 7 110064 > pswpout: 452 767502 > > > CMA reserve: 0 MB 512 MB > Elapsed-time: 234.2 235.6 > Average-MemFree: 281651 KB 290227 KB > pswpin: 8 8 > pswpout: 430 510 > > There is no difference if we don't have cma reserved memory (0 MB case). 
> But, with cma reserved memory (512 MB case), we fully utilize these > reserved memory through this patch and the system behaves like as > it doesn't reserve any memory. > > With this patch, we aggressively allocate the pages on cma reserved memory > so latency of CMA can arise. Below is the experimental result about > latency. > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > CMA reserve: 512 MB > Backgound Workload: make -jN > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval > > N: 1 4 8 16 > Elapsed-time(Before): 4309.75 9511.09 12276.1 77103.5 > Elapsed-time(After): 5391.69 16114.1 19380.3 34879.2 > > So generally we can see latency increase. Ratio of this increase > is rather big - up to 70%. But, under the heavy workload, it shows > latency decrease - up to 55%. This may be worst-case scenario, but > reducing it would be important for some system, so, I can say that > this patch have advantages and disadvantages in terms of latency. > > Signed-off-by: Joonsoo Kim > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index fac5509..3ff24d4 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -389,6 +389,12 @@ struct zone { > int compact_order_failed; > #endif > > +#ifdef CONFIG_CMA > + int has_cma; > + int nr_try_cma; > + int nr_try_movable; > +#endif Can you write documentation around this ? > + > ZONE_PADDING(_pad1_) > > /* Fields commonly accessed by the page reclaim scanner */ > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 674ade7..6f2b27b 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order) > } > > #ifdef CONFIG_CMA > +void __init init_alloc_ratio_counter(struct zone *zone) > +{ > + if (zone->has_cma) > + return; > + > + zone->has_cma = 1; > + zone->nr_try_movable = 0; > + zone->nr_try_cma = 0; > +} > + > /* Free whole pageblock and set its migration type to MIGRATE_CMA. 
*/ > void __init init_cma_reserved_pageblock(struct page *page) > { > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page) > set_pageblock_migratetype(page, MIGRATE_CMA); > __free_pages(page, pageblock_order); > adjust_managed_page_count(page, pageblock_nr_pages); > + init_alloc_ratio_counter(page_zone(page)); > } > #endif > > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > return NULL; > } > > +#ifdef CONFIG_CMA > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order, > + int migratetype) > +{ > + long free, free_cma, free_wmark; > + struct page *page; > + > + if (migratetype != MIGRATE_MOVABLE || !zone->has_cma) > + return NULL; > + > + if (zone->nr_try_movable) > + goto alloc_movable; > + > +alloc_cma: > + if (zone->nr_try_cma) { > + /* Okay. Now, we can try to allocate the page from cma region */ > + zone->nr_try_cma--; > + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); > + > + /* CMA pages can vanish through CMA allocation */ > + if (unlikely(!page && order == 0)) > + zone->nr_try_cma = 0; > + > + return page; > + } > + > + /* Reset ratio counter */ > + free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES); > + > + /* No cma free pages, so recharge only movable allocation */ > + if (free_cma <= 0) { > + zone->nr_try_movable = pageblock_nr_pages; > + goto alloc_movable; > + } > + > + free = zone_page_state(zone, NR_FREE_PAGES); > + free_wmark = free - free_cma - high_wmark_pages(zone); > + > + /* > + * free_wmark is below than 0, and it means that normal pages > + * are under the pressure, so we recharge only cma allocation. 
> + */ > + if (free_wmark <= 0) { > + zone->nr_try_cma = pageblock_nr_pages; > + goto alloc_cma; > + } > + > + if (free_wmark > free_cma) { > + zone->nr_try_movable = > + (free_wmark * pageblock_nr_pages) / free_cma; > + zone->nr_try_cma = pageblock_nr_pages; > + } else { > + zone->nr_try_movable = pageblock_nr_pages; > + zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark; > + } Can you add the commit message documentation here. > + > + /* Reset complete, start on movable first */ > +alloc_movable: > + zone->nr_try_movable--; > + return NULL; > +} > +#endif > + > /* > * Do the hard work of removing an element from the buddy allocator. > * Call me with the zone->lock already held. > @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > static struct page *__rmqueue(struct zone *zone, unsigned int order, > int migratetype) > { > - struct page *page; > + struct page *page = NULL; > + > + if (IS_ENABLED(CONFIG_CMA)) > + page = __rmqueue_cma(zone, order, migratetype); It would be better to move the migrate check here, So that it becomes /* For migrate movable allocation try cma area first */ if (IS_ENABLED(CONFIG_CMA) && (migratetype == MIGRATE_MOVABLE)) > > retry_reserve: > - page = __rmqueue_smallest(zone, order, migratetype); > + if (!page) > + page = __rmqueue_smallest(zone, order, migratetype); > > if (unlikely(!page) && migratetype != MIGRATE_RESERVE) { > page = __rmqueue_fallback(zone, order, migratetype); > @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > zone_seqlock_init(zone); > zone->zone_pgdat = pgdat; > zone_pcp_init(zone); > + if (IS_ENABLED(CONFIG_CMA)) > + zone->has_cma = 0; > > /* For bootup, initialized properly in watermark setup */ > mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages); > -- > 1.7.9.5 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. 
For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754077AbaENJop (ORCPT ); Wed, 14 May 2014 05:44:45 -0400 Received: from e28smtp06.in.ibm.com ([122.248.162.6]:53536 "EHLO e28smtp06.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752330AbaENJoi (ORCPT ); Wed, 14 May 2014 05:44:38 -0400 From: "Aneesh Kumar K.V" To: Joonsoo Kim , Marek Szyprowski Cc: Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan Kim , Heesub Shin , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kyungmin Park , Bartlomiej Zolnierkiewicz , "'Tomasz Stanislawski'" Subject: Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory In-Reply-To: <20140513022603.GF23803@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <536CCC78.6050806@samsung.com> <20140513022603.GF23803@js1304-P5Q-DELUXE> User-Agent: Notmuch/0.18~rc0+2~gbc64cdc (http://notmuchmail.org) Emacs/24.3.1 (x86_64-pc-linux-gnu) Date: Wed, 14 May 2014 15:14:30 +0530 Message-ID: <8738gcae4h.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14051409-9574-0000-0000-00000D93926A Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Joonsoo Kim writes: > On Fri, May 09, 2014 at 02:39:20PM +0200, Marek Szyprowski wrote: >> Hello, >> >> On 2014-05-08 02:32, Joonsoo Kim wrote: >> >This series tries to improve CMA. >> > >> >CMA is introduced to provide physically contiguous pages at runtime >> >without reserving memory area. But, current implementation works like as >> >reserving memory approach, because allocation on cma reserved region only >> >occurs as fallback of migrate_movable allocation. 
We can allocate from it >> >when there is no movable page. In that situation, kswapd would be invoked >> >easily since unmovable and reclaimable allocation consider >> >(free pages - free CMA pages) as free memory on the system and free memory >> >may be lower than high watermark in that case. If kswapd start to reclaim >> >memory, then fallback allocation doesn't occur much. >> > >> >In my experiment, I found that if system memory has 1024 MB memory and >> >has 512 MB reserved memory for CMA, kswapd is mostly invoked around >> >the 512MB free memory boundary. And invoked kswapd tries to make free >> >memory until (free pages - free CMA pages) is higher than high watermark, >> >so free memory on meminfo is moving around 512MB boundary consistently. >> > >> >To fix this problem, we should allocate the pages on cma reserved memory >> >more aggressively and intelligenetly. Patch 2 implements the solution. >> >Patch 1 is the simple optimization which remove useless re-trial and patch 3 >> >is for removing useless alloc flag, so these are not important. >> >See patch 2 for more detailed description. >> > >> >This patchset is based on v3.15-rc4. >> >> Thanks for posting those patches. It basically reminds me the >> following discussion: >> http://thread.gmane.org/gmane.linux.kernel/1391989/focus=1399524 >> >> Your approach is basically the same. I hope that your patches can be >> improved >> in such a way that they will be accepted by mm maintainers. I only >> wonder if the >> third patch is really necessary. Without it kswapd wakeup might be >> still avoided >> in some cases. > > Hello, > > Oh... I didn't know that patch and discussion, because I have no interest > on CMA at that time. Your approach looks similar to #1 > approach of mine and could have same problem of #1 approach which I mentioned > in patch 2/3. Please refer that patch description. :) IIUC that patch also interleave right ? 
+#ifdef CONFIG_CMA + unsigned long nr_free = zone_page_state(zone, NR_FREE_PAGES); + unsigned long nr_cma_free = zone_page_state(zone, NR_FREE_CMA_PAGES); + + if (migratetype == MIGRATE_MOVABLE && nr_cma_free && + nr_free - nr_cma_free < 2 * low_wmark_pages(zone)) + migratetype = MIGRATE_CMA; +#endif /* CONFIG_CMA */ That doesn't always prefer CMA region. It would be nice to understand why grouping in pageblock_nr_pages is beneficial. Also in your patch you decrement nr_try_cma for every 'order' allocation. Why ? + if (zone->nr_try_cma) { + /* Okay. Now, we can try to allocate the page from cma region */ + zone->nr_try_cma--; + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); + + /* CMA pages can vanish through CMA allocation */ + if (unlikely(!page && order == 0)) + zone->nr_try_cma = 0; + + return page; + } If we fail above MIGRATE_CMA alloc should we return failure ? Why not try MOVABLE allocation on failure (ie fallthrough the code path) ? > And, there is different purpose between this and yours. This patch is > intended to better use of CMA pages and so get maximum performance. > Just to not trigger oom, it can be possible to put this logic on reclaim path. > But that is sub-optimal to get higher performance, because it needs > migration in some cases. > > If second patch works as intended, there are just a few of cma free pages > when we are toward on the watermark. So benefit of third patch would > be marginal and we can remove ALLOC_CMA. > > Thanks. 
> -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751660AbaEOBuu (ORCPT ); Wed, 14 May 2014 21:50:50 -0400 Received: from LGEMRELSE6Q.lge.com ([156.147.1.121]:50498 "EHLO lgemrelse6q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750962AbaEOBut (ORCPT ); Wed, 14 May 2014 21:50:49 -0400 X-Original-SENDERIP: 10.177.220.145 X-Original-MAILFROM: iamjoonsoo.kim@lge.com Date: Thu, 15 May 2014 10:53:01 +0900 From: Joonsoo Kim To: Minchan Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140515015301.GA10116@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140513030057.GC32092@bbox> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > Hey Joonsoo, > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > CMA is introduced to provide physically contiguous pages at runtime. > > For this purpose, it reserves memory at boot time. Although it reserve > > memory, this reserved memory can be used for movable memory allocation > > request. This usecase is beneficial to the system that needs this CMA > > reserved memory infrequently and it is one of main purpose of > > introducing CMA. > > > > But, there is a problem in current implementation. The problem is that > > it works like as just reserved memory approach. 
The pages on cma reserved > > memory are hardly used for movable memory allocation. This is caused by > > combination of allocation and reclaim policy. > > > > The pages on cma reserved memory are allocated if there is no movable > > memory, that is, as fallback allocation. So the time this fallback > > allocation is started is under heavy memory pressure. Although it is under > > memory pressure, movable allocation easily succeed, since there would be > > many pages on cma reserved memory. But this is not the case for unmovable > > and reclaimable allocation, because they can't use the pages on cma > > reserved memory. These allocations regard system's free memory as > > (free pages - free cma pages) on watermark checking, that is, free > > unmovable pages + free reclaimable pages + free movable pages. Because > > we already exhausted movable pages, only free pages we have are unmovable > > and reclaimable types and this would be really small amount. So watermark > > checking would be failed. It will wake up kswapd to make enough free > > memory for unmovable and reclaimable allocation and kswapd will do. > > So before we fully utilize pages on cma reserved memory, kswapd start to > > reclaim memory and try to make free memory over the high watermark. This > > watermark checking by kswapd doesn't take care free cma pages so many > > movable pages would be reclaimed. After then, we have a lot of movable > > pages again, so fallback allocation doesn't happen again. To conclude, > > amount of free memory on meminfo which includes free CMA pages is moving > > around 512 MB if I reserve 512 MB memory for CMA. > > > > I found this problem on following experiment. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > make -j24 > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.8 361.8 > > Average-MemFree: 283880 KB 530851 KB > > > > To solve this problem, I can think following 2 possible solutions. > > 1. 
allocate the pages on cma reserved memory first, and if they are > > exhausted, allocate movable pages. > > 2. interleaved allocation: try to allocate specific amounts of memory > > from cma reserved memory and then allocate from free movable memory. > > I love this idea, but when I see the code, I don't like it. > In the allocation path, just try to allocate pages by round-robin; that is the role > of the allocator. If one of the migratetypes is full, just pass the mission to the reclaimer > with a hint (i.e., Hey reclaimer, this is a non-movable allocation failure, > so it is pointless to reclaim MIGRATE_CMA pages) so that the > reclaimer can filter them out during page scanning. > We already have a tool to achieve it (i.e., isolate_mode_t). Hello, I agree with leaving the fast allocation path as simple as possible. I will remove the runtime computation for determining the ratio in __rmqueue_cma() and, instead, will use a pre-computed value calculated on another path. I am not sure whether your second suggestion (the "Hey reclaimer" part) is good or not. My quick thought is that it could be helpful in the situation where many free cma pages remain. But it would not be helpful when there are neither free movable pages nor free cma pages. In general, most workloads mainly use movable pages for page cache or anonymous mapping. Although reclaim is triggered by a non-movable allocation failure, reclaimed pages are used mostly by movable allocations. We can handle these allocation requests even if we reclaim the pages just in lru order. If we rotate the lru list to find movable pages, it could cause more useful pages to be evicted. This is just my quick thought, so please correct me if I am wrong. > > And couldn't we do it in zone_watermark_ok with set/reset ALLOC_CMA? > If possible, it would be better because it's a generic function to check > free pages and trigger the reclaim/compaction logic. I guess your *it* means the ratio computation. Right? I don't like putting it in zone_watermark_ok().
Although it needs to refer to the free cma pages value, which is also referenced in zone_watermark_ok(), this computation is for determining the ratio, not for triggering reclaim/compaction. Also, zone_watermark_ok() is on a hotter path, so putting this logic into zone_watermark_ok() does not look better to me. I will think about a better place to do it. Thanks. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751730AbaEOB4b (ORCPT ); Wed, 14 May 2014 21:56:31 -0400 Received: from LGEMRELSE6Q.lge.com ([156.147.1.121]:53136 "EHLO lgemrelse6q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751004AbaEOB4a (ORCPT ); Wed, 14 May 2014 21:56:30 -0400 X-Original-SENDERIP: 10.177.220.145 X-Original-MAILFROM: iamjoonsoo.kim@lge.com Date: Thu, 15 May 2014 10:58:42 +0900 From: Joonsoo Kim To: "Aneesh Kumar K.V" Cc: Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140515015842.GB10116@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <8761l8ah04.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8761l8ah04.fsf@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 14, 2014 at 02:12:19PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > CMA is introduced to provide physically contiguous pages at runtime. > > For this purpose, it reserves memory at boot time.
Although it reserve > > memory, this reserved memory can be used for movable memory allocation > > request. This usecase is beneficial to the system that needs this CMA > > reserved memory infrequently and it is one of main purpose of > > introducing CMA. > > > > But, there is a problem in current implementation. The problem is that > > it works like as just reserved memory approach. The pages on cma reserved > > memory are hardly used for movable memory allocation. This is caused by > > combination of allocation and reclaim policy. > > > > The pages on cma reserved memory are allocated if there is no movable > > memory, that is, as fallback allocation. So the time this fallback > > allocation is started is under heavy memory pressure. Although it is under > > memory pressure, movable allocation easily succeed, since there would be > > many pages on cma reserved memory. But this is not the case for unmovable > > and reclaimable allocation, because they can't use the pages on cma > > reserved memory. These allocations regard system's free memory as > > (free pages - free cma pages) on watermark checking, that is, free > > unmovable pages + free reclaimable pages + free movable pages. Because > > we already exhausted movable pages, only free pages we have are unmovable > > and reclaimable types and this would be really small amount. So watermark > > checking would be failed. It will wake up kswapd to make enough free > > memory for unmovable and reclaimable allocation and kswapd will do. > > So before we fully utilize pages on cma reserved memory, kswapd start to > > reclaim memory and try to make free memory over the high watermark. This > > watermark checking by kswapd doesn't take care free cma pages so many > > movable pages would be reclaimed. After then, we have a lot of movable > > pages again, so fallback allocation doesn't happen again. 
To conclude, > > amount of free memory on meminfo which includes free CMA pages is moving > > around 512 MB if I reserve 512 MB memory for CMA. > > > Another issue i am facing with the current code is the atomic allocation > failing even with large number of CMA pages around. In my case we never > reclaimed because large part of the memory is consumed by the page cache and > for that, free memory check doesn't include at free_cma. I will test > with this patchset and update here once i have the results. > Hello, Could you elaborate more on your issue? I don't completely understand your problem. Is your atomic allocation movable? And does that request fail even though there are many free cma pages? > > > > I found this problem on following experiment. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > make -j24 > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.8 361.8 > > Average-MemFree: 283880 KB 530851 KB > > > > To solve this problem, I can think following 2 possible solutions. > > 1. allocate the pages on cma reserved memory first, and if they are > > exhausted, allocate movable pages. > > 2. interleaved allocation: try to allocate specific amounts of memory > > from cma reserved memory and then allocate from free movable memory. > > > > I tested #1 approach and found the problem. Although free memory on > > meminfo can move around low watermark, there is large fluctuation on free > > memory, because too many pages are reclaimed when kswapd is invoked. > > Reason for this behaviour is that successive allocated CMA pages are > > on the LRU list in that order and kswapd reclaim them in same order. > > These memory doesn't help watermark checking from kwapd, so too many > > pages are reclaimed, I guess. > > > > So, I implement #2 approach.
> > One thing I should note is that we should not change allocation target > > (movable list or cma) on each allocation attempt, since this prevent > > allocated pages to be in physically succession, so some I/O devices can > > be hurt their performance. To solve this, I keep allocation target > > in at least pageblock_nr_pages attempts and make this number reflect > > ratio, free pages without free cma pages to free cma pages. With this > > approach, system works very smoothly and fully utilize the pages on > > cma reserved memory. > > > > Following is the experimental result of this patch. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > make -j24 > > > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.8 361.8 > > Average-MemFree: 283880 KB 530851 KB > > pswpin: 7 110064 > > pswpout: 452 767502 > > > > > > CMA reserve: 0 MB 512 MB > > Elapsed-time: 234.2 235.6 > > Average-MemFree: 281651 KB 290227 KB > > pswpin: 8 8 > > pswpout: 430 510 > > > > There is no difference if we don't have cma reserved memory (0 MB case). > > But, with cma reserved memory (512 MB case), we fully utilize these > > reserved memory through this patch and the system behaves like as > > it doesn't reserve any memory. > > > > With this patch, we aggressively allocate the pages on cma reserved memory > > so latency of CMA can arise. Below is the experimental result about > > latency. > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > CMA reserve: 512 MB > > Backgound Workload: make -jN > > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval > > > > N: 1 4 8 16 > > Elapsed-time(Before): 4309.75 9511.09 12276.1 77103.5 > > Elapsed-time(After): 5391.69 16114.1 19380.3 34879.2 > > > > So generally we can see latency increase. Ratio of this increase > > is rather big - up to 70%. But, under the heavy workload, it shows > > latency decrease - up to 55%. 
This may be worst-case scenario, but > > reducing it would be important for some system, so, I can say that > > this patch have advantages and disadvantages in terms of latency. > > > > Signed-off-by: Joonsoo Kim > > > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > > index fac5509..3ff24d4 100644 > > --- a/include/linux/mmzone.h > > +++ b/include/linux/mmzone.h > > @@ -389,6 +389,12 @@ struct zone { > > int compact_order_failed; > > #endif > > > > +#ifdef CONFIG_CMA > > + int has_cma; > > + int nr_try_cma; > > + int nr_try_movable; > > +#endif > > > Can you write documentation around this ? > Okay. > > + > > ZONE_PADDING(_pad1_) > > > > /* Fields commonly accessed by the page reclaim scanner */ > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 674ade7..6f2b27b 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order) > > } > > > > #ifdef CONFIG_CMA > > +void __init init_alloc_ratio_counter(struct zone *zone) > > +{ > > + if (zone->has_cma) > > + return; > > + > > + zone->has_cma = 1; > > + zone->nr_try_movable = 0; > > + zone->nr_try_cma = 0; > > +} > > + > > /* Free whole pageblock and set its migration type to MIGRATE_CMA. 
*/ > > void __init init_cma_reserved_pageblock(struct page *page) > > { > > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page) > > set_pageblock_migratetype(page, MIGRATE_CMA); > > __free_pages(page, pageblock_order); > > adjust_managed_page_count(page, pageblock_nr_pages); > > + init_alloc_ratio_counter(page_zone(page)); > > } > > #endif > > > > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > > return NULL; > > } > > > > +#ifdef CONFIG_CMA > > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order, > > + int migratetype) > > +{ > > + long free, free_cma, free_wmark; > > + struct page *page; > > + > > + if (migratetype != MIGRATE_MOVABLE || !zone->has_cma) > > + return NULL; > > + > > + if (zone->nr_try_movable) > > + goto alloc_movable; > > + > > +alloc_cma: > > + if (zone->nr_try_cma) { > > + /* Okay. Now, we can try to allocate the page from cma region */ > > + zone->nr_try_cma--; > > + page = __rmqueue_smallest(zone, order, MIGRATE_CMA); > > + > > + /* CMA pages can vanish through CMA allocation */ > > + if (unlikely(!page && order == 0)) > > + zone->nr_try_cma = 0; > > + > > + return page; > > + } > > + > > + /* Reset ratio counter */ > > + free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES); > > + > > + /* No cma free pages, so recharge only movable allocation */ > > + if (free_cma <= 0) { > > + zone->nr_try_movable = pageblock_nr_pages; > > + goto alloc_movable; > > + } > > + > > + free = zone_page_state(zone, NR_FREE_PAGES); > > + free_wmark = free - free_cma - high_wmark_pages(zone); > > + > > + /* > > + * free_wmark is below than 0, and it means that normal pages > > + * are under the pressure, so we recharge only cma allocation. 
> > + */ > > + if (free_wmark <= 0) { > > + zone->nr_try_cma = pageblock_nr_pages; > > + goto alloc_cma; > > + } > > + > > + if (free_wmark > free_cma) { > > + zone->nr_try_movable = > > + (free_wmark * pageblock_nr_pages) / free_cma; > > + zone->nr_try_cma = pageblock_nr_pages; > > + } else { > > + zone->nr_try_movable = pageblock_nr_pages; > > + zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark; > > + } > > Can you add the commit message documentation here. > Okay. > > + > > + /* Reset complete, start on movable first */ > > +alloc_movable: > > + zone->nr_try_movable--; > > + return NULL; > > +} > > +#endif > > + > > /* > > * Do the hard work of removing an element from the buddy allocator. > > * Call me with the zone->lock already held. > > @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > > static struct page *__rmqueue(struct zone *zone, unsigned int order, > > int migratetype) > > { > > - struct page *page; > > + struct page *page = NULL; > > + > > + if (IS_ENABLED(CONFIG_CMA)) > > + page = __rmqueue_cma(zone, order, migratetype); > > It would be better to move the migrate check here, So that it becomes > > /* For migrate movable allocation try cma area first */ > if (IS_ENABLED(CONFIG_CMA) && (migratetype == MIGRATE_MOVABLE)) > > Okay. But it makes no difference between current code and your suggestion, because __rmqueue_cma would be inlined by compiler optimization. Thanks. 
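[Illustration for readers of this thread, not part of the posted patch.] The counter recharge step quoted above from __rmqueue_cma() can be sketched as a small user-space model. This is only a sketch under stated assumptions: struct zone_sim, recharge() and the PAGEBLOCK_NR_PAGES value of 512 are hypothetical stand-ins for the kernel's struct zone fields, the in-function reset code, and pageblock_nr_pages. The arithmetic follows the quoted diff: the scarcer pool gets one pageblock worth of attempts and the other pool gets proportionally more.

```c
#include <assert.h>

/* Assumed stand-in for the kernel's pageblock_nr_pages; the real value
 * depends on the architecture and configuration. */
#define PAGEBLOCK_NR_PAGES 512L

/* Hypothetical user-space model of the zone fields used by the patch. */
struct zone_sim {
    long free;           /* all free pages, including free CMA pages */
    long free_cma;       /* free pages inside CMA pageblocks */
    long high_wmark;     /* high watermark, in pages */
    long nr_try_movable; /* attempts left for the movable free list */
    long nr_try_cma;     /* attempts left for the CMA free list */
};

/* Recharge both counters once they have run out, mirroring the
 * "Reset ratio counter" part of the quoted __rmqueue_cma(). */
void recharge(struct zone_sim *z)
{
    long free_wmark;

    /* Free CMA pages are counted as part of the free pages. */
    assert(z->free >= z->free_cma);

    /* No free CMA pages: recharge only the movable side. */
    if (z->free_cma <= 0) {
        z->nr_try_movable = PAGEBLOCK_NR_PAGES;
        z->nr_try_cma = 0;
        return;
    }

    /* Non-CMA free pages above the high watermark. */
    free_wmark = z->free - z->free_cma - z->high_wmark;

    /* Normal pages are under pressure: recharge only the CMA side. */
    if (free_wmark <= 0) {
        z->nr_try_movable = 0;
        z->nr_try_cma = PAGEBLOCK_NR_PAGES;
        return;
    }

    /* Keep at least one pageblock of attempts per target and scale the
     * larger pool by the ratio free_wmark : free_cma. */
    if (free_wmark > z->free_cma) {
        z->nr_try_movable = free_wmark * PAGEBLOCK_NR_PAGES / z->free_cma;
        z->nr_try_cma = PAGEBLOCK_NR_PAGES;
    } else {
        z->nr_try_movable = PAGEBLOCK_NR_PAGES;
        z->nr_try_cma = z->free_cma * PAGEBLOCK_NR_PAGES / free_wmark;
    }
}
```

In this model, with free = 3000, free_cma = 1000 and high_wmark = 0, recharge() sets nr_try_movable to 1024 and nr_try_cma to 512, so the allocator would make two pageblocks' worth of movable attempts for each pageblock's worth of CMA attempts.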
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751681AbaEOCIn (ORCPT ); Wed, 14 May 2014 22:08:43 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:42148 "EHLO lgemrelse7q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751003AbaEOCIm (ORCPT ); Wed, 14 May 2014 22:08:42 -0400 X-Original-SENDERIP: 10.177.220.145 X-Original-MAILFROM: iamjoonsoo.kim@lge.com Date: Thu, 15 May 2014 11:10:55 +0900 From: Joonsoo Kim To: "Aneesh Kumar K.V" Cc: Marek Szyprowski , Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan Kim , Heesub Shin , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kyungmin Park , Bartlomiej Zolnierkiewicz , "'Tomasz Stanislawski'" Subject: Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Message-ID: <20140515021055.GC10116@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <536CCC78.6050806@samsung.com> <20140513022603.GF23803@js1304-P5Q-DELUXE> <8738gcae4h.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8738gcae4h.fsf@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 14, 2014 at 03:14:30PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > On Fri, May 09, 2014 at 02:39:20PM +0200, Marek Szyprowski wrote: > >> Hello, > >> > >> On 2014-05-08 02:32, Joonsoo Kim wrote: > >> >This series tries to improve CMA. > >> > > >> >CMA is introduced to provide physically contiguous pages at runtime > >> >without reserving memory area. But, current implementation works like as > >> >reserving memory approach, because allocation on cma reserved region only > >> >occurs as fallback of migrate_movable allocation. 
We can allocate from it > >> >when there is no movable page. In that situation, kswapd would be invoked > >> >easily since unmovable and reclaimable allocation consider > >> >(free pages - free CMA pages) as free memory on the system and free memory > >> >may be lower than high watermark in that case. If kswapd start to reclaim > >> >memory, then fallback allocation doesn't occur much. > >> > > >> >In my experiment, I found that if system memory has 1024 MB memory and > >> >has 512 MB reserved memory for CMA, kswapd is mostly invoked around > >> >the 512MB free memory boundary. And invoked kswapd tries to make free > >> >memory until (free pages - free CMA pages) is higher than high watermark, > >> >so free memory on meminfo is moving around 512MB boundary consistently. > >> > > >> >To fix this problem, we should allocate the pages on cma reserved memory > >> >more aggressively and intelligenetly. Patch 2 implements the solution. > >> >Patch 1 is the simple optimization which remove useless re-trial and patch 3 > >> >is for removing useless alloc flag, so these are not important. > >> >See patch 2 for more detailed description. > >> > > >> >This patchset is based on v3.15-rc4. > >> > >> Thanks for posting those patches. It basically reminds me the > >> following discussion: > >> http://thread.gmane.org/gmane.linux.kernel/1391989/focus=1399524 > >> > >> Your approach is basically the same. I hope that your patches can be > >> improved > >> in such a way that they will be accepted by mm maintainers. I only > >> wonder if the > >> third patch is really necessary. Without it kswapd wakeup might be > >> still avoided > >> in some cases. > > > > Hello, > > > > Oh... I didn't know that patch and discussion, because I have no interest > > on CMA at that time. Your approach looks similar to #1 > > approach of mine and could have same problem of #1 approach which I mentioned > > in patch 2/3. Please refer that patch description. :) > > IIUC that patch also interleave right ? 
>
> +#ifdef CONFIG_CMA
> +   unsigned long nr_free = zone_page_state(zone, NR_FREE_PAGES);
> +   unsigned long nr_cma_free = zone_page_state(zone, NR_FREE_CMA_PAGES);
> +
> +   if (migratetype == MIGRATE_MOVABLE && nr_cma_free &&
> +       nr_free - nr_cma_free < 2 * low_wmark_pages(zone))
> +           migratetype = MIGRATE_CMA;
> +#endif /* CONFIG_CMA */

Hello,

This is not interleaving from my point of view. This logic allocates
free movable pages until free non-CMA memory drops to 2 * low_wmark, and
only then allocates free CMA pages. By interleaving I mean something like
a round-robin policy with no constraint like the one above.

>
> That doesn't always prefer CMA region. It would be nice to
> understand why grouping in pageblock_nr_pages is beneficial. Also in
> your patch you decrement nr_try_cma for every 'order' allocation. Why ?

pageblock_nr_pages is just a magic value with no rationale. :)
But we need grouping, because without it we can't get physically
contiguous pages. When we allocate pages for the page cache, the
readahead logic tries to allocate 32 pages. If we don't use grouping,
disk I/O for these pages can't be handled by one I/O request on some
devices. I'm not familiar with I/O devices, so please correct me if I am
wrong.

And, yes, I will take 'order' into account when incrementing or
decrementing nr_try_cma.

>
> +   if (zone->nr_try_cma) {
> +           /* Okay. Now, we can try to allocate the page from cma region */
> +           zone->nr_try_cma--;
> +           page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> +
> +           /* CMA pages can vanish through CMA allocation */
> +           if (unlikely(!page && order == 0))
> +                   zone->nr_try_cma = 0;
> +
> +           return page;
> +   }
>
>
> If we fail above MIGRATE_CMA alloc should we return failure ? Why
> not try MOVABLE allocation on failure (ie fallthrough the code path) ?

This patch uses fallthrough logic. If __rmqueue_cma() fails, allocation
continues in __rmqueue() as usual.

Thanks.
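
To make the difference between the threshold check and round-robin
interleaving concrete, here is a small user-space model of the interleave
with fallthrough. It is a sketch under stated assumptions, not the patch:
free lists are reduced to page counts, only single-page allocations are
modelled, and a fixed batch size replaces the ratio computed in the patch.
struct zone_model, enum source, try_cma() and alloc_movable_page() are
hypothetical names.

```c
/* Hypothetical user-space model: free lists reduced to page counts. */
struct zone_model {
    long free_movable;   /* pages on the MOVABLE free list */
    long free_cma;       /* pages on the CMA free list */
    long nr_try_movable; /* movable allocations left in this round */
    long nr_try_cma;     /* CMA allocations left in this round */
};

enum source { SRC_NONE, SRC_MOVABLE, SRC_CMA };

/* CMA side of the interleave; SRC_NONE means "fall through to the caller". */
static enum source try_cma(struct zone_model *z, long batch)
{
    if (z->nr_try_movable > 0) {   /* still movable's turn */
        z->nr_try_movable--;
        return SRC_NONE;
    }
    if (z->nr_try_cma == 0) {      /* round finished: reset, movable first */
        z->nr_try_movable = batch;
        z->nr_try_cma = batch;
        z->nr_try_movable--;
        return SRC_NONE;
    }
    z->nr_try_cma--;
    if (z->free_cma == 0) {        /* CMA pages can vanish: end CMA's turn */
        z->nr_try_cma = 0;
        return SRC_NONE;
    }
    z->free_cma--;
    return SRC_CMA;
}

/* Model of the caller: CMA attempt first, then the usual movable path. */
static enum source alloc_movable_page(struct zone_model *z, long batch)
{
    enum source s = try_cma(z, batch);

    if (s != SRC_NONE)
        return s;
    if (z->free_movable > 0) {
        z->free_movable--;
        return SRC_MOVABLE;
    }
    return SRC_NONE;
}
```

With a batch of 2 and both lists populated, successive calls return
movable, movable, CMA, CMA, movable, and so on, regardless of how close
free memory is to any watermark. When the CMA list is empty, the call
falls through and is served from the movable list instead of failing.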
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752770AbaEOCl0 (ORCPT ); Wed, 14 May 2014 22:41:26 -0400 Received: from lgeamrelo04.lge.com ([156.147.1.127]:61974 "EHLO lgeamrelo04.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751077AbaEOClZ (ORCPT ); Wed, 14 May 2014 22:41:25 -0400 X-Original-SENDERIP: 10.177.220.169 X-Original-MAILFROM: minchan@kernel.org Date: Thu, 15 May 2014 11:43:53 +0900 From: Minchan Kim To: Joonsoo Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140515024353.GA27599@bbox> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140515015301.GA10116@js1304-P5Q-DELUXE> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote: > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > > Hey Joonsoo, > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > > CMA is introduced to provide physically contiguous pages at runtime. > > > For this purpose, it reserves memory at boot time. Although it reserve > > > memory, this reserved memory can be used for movable memory allocation > > > request. This usecase is beneficial to the system that needs this CMA > > > reserved memory infrequently and it is one of main purpose of > > > introducing CMA. 
> > > > > > But, there is a problem in current implementation. The problem is that > > > it works like as just reserved memory approach. The pages on cma reserved > > > memory are hardly used for movable memory allocation. This is caused by > > > combination of allocation and reclaim policy. > > > > > > The pages on cma reserved memory are allocated if there is no movable > > > memory, that is, as fallback allocation. So the time this fallback > > > allocation is started is under heavy memory pressure. Although it is under > > > memory pressure, movable allocation easily succeed, since there would be > > > many pages on cma reserved memory. But this is not the case for unmovable > > > and reclaimable allocation, because they can't use the pages on cma > > > reserved memory. These allocations regard system's free memory as > > > (free pages - free cma pages) on watermark checking, that is, free > > > unmovable pages + free reclaimable pages + free movable pages. Because > > > we already exhausted movable pages, only free pages we have are unmovable > > > and reclaimable types and this would be really small amount. So watermark > > > checking would be failed. It will wake up kswapd to make enough free > > > memory for unmovable and reclaimable allocation and kswapd will do. > > > So before we fully utilize pages on cma reserved memory, kswapd start to > > > reclaim memory and try to make free memory over the high watermark. This > > > watermark checking by kswapd doesn't take care free cma pages so many > > > movable pages would be reclaimed. After then, we have a lot of movable > > > pages again, so fallback allocation doesn't happen again. To conclude, > > > amount of free memory on meminfo which includes free CMA pages is moving > > > around 512 MB if I reserve 512 MB memory for CMA. > > > > > > I found this problem on following experiment. 
> > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > > make -j24 > > > > > > CMA reserve: 0 MB 512 MB > > > Elapsed-time: 234.8 361.8 > > > Average-MemFree: 283880 KB 530851 KB > > > > > > To solve this problem, I can think following 2 possible solutions. > > > 1. allocate the pages on cma reserved memory first, and if they are > > > exhausted, allocate movable pages. > > > 2. interleaved allocation: try to allocate specific amounts of memory > > > from cma reserved memory and then allocate from free movable memory. > > > > I love this idea but when I see the code, I don't like that. > > In allocation path, just try to allocate pages by round-robin so it's role > > of allocator. If one of migratetype is full, just pass mission to reclaimer > > with hint(ie, Hey reclaimer, it's non-movable allocation fail > > so there is pointless if you reclaim MIGRATE_CMA pages) so that > > reclaimer can filter it out during page scanning. > > We already have an tool to achieve it(ie, isolate_mode_t). > > Hello, > > I agree with leaving fast allocation path as simple as possible. > I will remove runtime computation for determining ratio in > __rmqueue_cma() and, instead, will use pre-computed value calculated > on the other path. Sounds good. > > I am not sure that whether your second suggestion(Hey relaimer part) > is good or not. In my quick thought, that could be helpful in the > situation that many free cma pages remained. But, it would be not helpful > when there are neither free movable and cma pages. In generally, most > workloads mainly uses movable pages for page cache or anonymous mapping. > Although reclaim is triggered by non-movable allocation failure, reclaimed > pages are used mostly by movable allocation. We can handle these allocation > request even if we reclaim the pages just in lru order. If we rotate > the lru list for finding movable pages, it could cause more useful > pages to be evicted. > > This is just my quick thought, so please let me correct if I am wrong. 
Why should the reclaimer reclaim unnecessary pages?
So, your answer is that it would be better because upcoming newly
allocated pages could then be allocated easily, without interruption.
But it could reclaim too many pages before the watermark for unmovable
allocation is satisfied. Sometimes you might even see OOM.

Moreover, how would you handle the current trouble?
For example, consider an atomic allocation where the only thing that can
save the world is kswapd, because that is one of kswapd's roles. If
kswapd spends a lot of time reclaiming CMA pages, which is pointless,
the allocation will easily fail.

>
> >
> > And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
> > If possible, it would be better because it's generic function to check
> > free pages and cause trigger reclaim/compaction logic.
>
> I guess, your *it* means ratio computation. Right?

I meant just get_page_from_freelist, like fair zone allocation, for
consistency. But as we discussed offline, I'm not against your approach
if that's not the right place.

> I don't like putting it on zone_watermark_ok(). Although it need to
> refer to free cma pages value which are also referred in zone_watermark_ok(),
> this computation is for determining ratio, not for triggering
> reclaim/compaction. And this zone_watermark_ok() is on more hot-path, so
> putting this logic into zone_watermark_ok() looks not better to me.
>
> I will think better place to do it.

Yeb, Thanks!

>
> Thanks.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752500AbaEOCpT (ORCPT ); Wed, 14 May 2014 22:45:19 -0400 Received: from mailout4.samsung.com ([203.254.224.34]:42066 "EHLO mailout4.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751279AbaEOCpR (ORCPT ); Wed, 14 May 2014 22:45:17 -0400 X-AuditID: cbfee68d-b7f4e6d000004845-d3-53742a3addd1 Message-id: <53742A4B.4090901@samsung.com> Date: Thu, 15 May 2014 11:45:31 +0900 From: Heesub Shin User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-version: 1.0 To: Joonsoo Kim , Minchan Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Mel Gorman , Johannes Weiner , Marek Szyprowski Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> In-reply-to: <20140515015301.GA10116@js1304-P5Q-DELUXE> Content-type: text/plain; charset=ISO-8859-1; format=flowed Content-transfer-encoding: 7bit
DLP-Filter: Pass X-MTR: 20000000000000000@CPGS X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, On 05/15/2014 10:53 AM, Joonsoo Kim wrote: > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: >> Hey Joonsoo, >> >> On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: >>> CMA is introduced to provide physically contiguous pages at runtime. >>> For this purpose, it reserves memory at boot time. Although it reserve >>> memory, this reserved memory can be used for movable memory allocation >>> request.
This usecase is beneficial to the system that needs this CMA >>> reserved memory infrequently and it is one of main purpose of >>> introducing CMA. >>> >>> But, there is a problem in current implementation. The problem is that >>> it works like as just reserved memory approach. The pages on cma reserved >>> memory are hardly used for movable memory allocation. This is caused by >>> combination of allocation and reclaim policy. >>> >>> The pages on cma reserved memory are allocated if there is no movable >>> memory, that is, as fallback allocation. So the time this fallback >>> allocation is started is under heavy memory pressure. Although it is under >>> memory pressure, movable allocation easily succeed, since there would be >>> many pages on cma reserved memory. But this is not the case for unmovable >>> and reclaimable allocation, because they can't use the pages on cma >>> reserved memory. These allocations regard system's free memory as >>> (free pages - free cma pages) on watermark checking, that is, free >>> unmovable pages + free reclaimable pages + free movable pages. Because >>> we already exhausted movable pages, only free pages we have are unmovable >>> and reclaimable types and this would be really small amount. So watermark >>> checking would be failed. It will wake up kswapd to make enough free >>> memory for unmovable and reclaimable allocation and kswapd will do. >>> So before we fully utilize pages on cma reserved memory, kswapd start to >>> reclaim memory and try to make free memory over the high watermark. This >>> watermark checking by kswapd doesn't take care free cma pages so many >>> movable pages would be reclaimed. After then, we have a lot of movable >>> pages again, so fallback allocation doesn't happen again. To conclude, >>> amount of free memory on meminfo which includes free CMA pages is moving >>> around 512 MB if I reserve 512 MB memory for CMA. >>> >>> I found this problem on following experiment. 
>>> >>> 4 CPUs, 1024 MB, VIRTUAL MACHINE >>> make -j24 >>> >>> CMA reserve: 0 MB 512 MB >>> Elapsed-time: 234.8 361.8 >>> Average-MemFree: 283880 KB 530851 KB >>> >>> To solve this problem, I can think following 2 possible solutions. >>> 1. allocate the pages on cma reserved memory first, and if they are >>> exhausted, allocate movable pages. >>> 2. interleaved allocation: try to allocate specific amounts of memory >>> from cma reserved memory and then allocate from free movable memory. >> >> I love this idea but when I see the code, I don't like that. >> In allocation path, just try to allocate pages by round-robin so it's role >> of allocator. If one of migratetype is full, just pass mission to reclaimer >> with hint(ie, Hey reclaimer, it's non-movable allocation fail >> so there is pointless if you reclaim MIGRATE_CMA pages) so that >> reclaimer can filter it out during page scanning. >> We already have an tool to achieve it(ie, isolate_mode_t). > > Hello, > > I agree with leaving fast allocation path as simple as possible. > I will remove runtime computation for determining ratio in > __rmqueue_cma() and, instead, will use pre-computed value calculated > on the other path. > > I am not sure that whether your second suggestion(Hey relaimer part) > is good or not. In my quick thought, that could be helpful in the > situation that many free cma pages remained. But, it would be not helpful > when there are neither free movable and cma pages. In generally, most > workloads mainly uses movable pages for page cache or anonymous mapping. > Although reclaim is triggered by non-movable allocation failure, reclaimed > pages are used mostly by movable allocation. We can handle these allocation > request even if we reclaim the pages just in lru order. If we rotate > the lru list for finding movable pages, it could cause more useful > pages to be evicted. > > This is just my quick thought, so please let me correct if I am wrong. 
We have an out-of-tree implementation that is essentially the same as
the approach Minchan described, and it works, but it definitely has some
side effects, as you pointed out: it distorts the LRU and evicts hot
pages. I am not attaching code fragments to this thread for various
reasons, but it should be easy for you to implement yourself. I am
wondering whether it could also help in your case.

Thanks,
Heesub

>
>>
>> And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
>> If possible, it would be better because it's generic function to check
>> free pages and cause trigger reclaim/compaction logic.
>
> I guess, your *it* means ratio computation. Right?
> I don't like putting it on zone_watermark_ok(). Although it need to
> refer to free cma pages value which are also referred in zone_watermark_ok(),
> this computation is for determining ratio, not for triggering
> reclaim/compaction. And this zone_watermark_ok() is on more hot-path, so
> putting this logic into zone_watermark_ok() looks not better to me.
>
> I will think better place to do it.
>
> Thanks.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752173AbaEOFEA (ORCPT ); Thu, 15 May 2014 01:04:00 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:33197 "EHLO lgemrelse7q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751162AbaEOFD7 (ORCPT ); Thu, 15 May 2014 01:03:59 -0400 X-Original-SENDERIP: 10.177.220.169 X-Original-MAILFROM: minchan@kernel.org Date: Thu, 15 May 2014 14:06:27 +0900 From: Minchan Kim To: Heesub Shin Cc: Joonsoo Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Mel Gorman , Johannes Weiner , Marek Szyprowski Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140515050627.GB27599@bbox> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <53742A4B.4090901@samsung.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <53742A4B.4090901@samsung.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Heesub, On Thu, May 15, 2014 at 11:45:31AM +0900, Heesub Shin wrote: > Hello, > > On 05/15/2014 10:53 AM, Joonsoo Kim wrote: > >On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > >>Hey Joonsoo, > >> > >>On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > >>>CMA is introduced to provide physically contiguous pages at runtime. > >>>For this purpose, it reserves memory at boot time. Although it reserve > >>>memory, this reserved memory can be used for movable memory allocation > >>>request. 
This usecase is beneficial to the system that needs this CMA > >>>reserved memory infrequently and it is one of main purpose of > >>>introducing CMA. > >>> > >>>But, there is a problem in current implementation. The problem is that > >>>it works like as just reserved memory approach. The pages on cma reserved > >>>memory are hardly used for movable memory allocation. This is caused by > >>>combination of allocation and reclaim policy. > >>> > >>>The pages on cma reserved memory are allocated if there is no movable > >>>memory, that is, as fallback allocation. So the time this fallback > >>>allocation is started is under heavy memory pressure. Although it is under > >>>memory pressure, movable allocation easily succeed, since there would be > >>>many pages on cma reserved memory. But this is not the case for unmovable > >>>and reclaimable allocation, because they can't use the pages on cma > >>>reserved memory. These allocations regard system's free memory as > >>>(free pages - free cma pages) on watermark checking, that is, free > >>>unmovable pages + free reclaimable pages + free movable pages. Because > >>>we already exhausted movable pages, only free pages we have are unmovable > >>>and reclaimable types and this would be really small amount. So watermark > >>>checking would be failed. It will wake up kswapd to make enough free > >>>memory for unmovable and reclaimable allocation and kswapd will do. > >>>So before we fully utilize pages on cma reserved memory, kswapd start to > >>>reclaim memory and try to make free memory over the high watermark. This > >>>watermark checking by kswapd doesn't take care free cma pages so many > >>>movable pages would be reclaimed. After then, we have a lot of movable > >>>pages again, so fallback allocation doesn't happen again. To conclude, > >>>amount of free memory on meminfo which includes free CMA pages is moving > >>>around 512 MB if I reserve 512 MB memory for CMA. > >>> > >>>I found this problem on following experiment. 
> >>> > >>>4 CPUs, 1024 MB, VIRTUAL MACHINE > >>>make -j24 > >>> > >>>CMA reserve: 0 MB 512 MB > >>>Elapsed-time: 234.8 361.8 > >>>Average-MemFree: 283880 KB 530851 KB > >>> > >>>To solve this problem, I can think following 2 possible solutions. > >>>1. allocate the pages on cma reserved memory first, and if they are > >>> exhausted, allocate movable pages. > >>>2. interleaved allocation: try to allocate specific amounts of memory > >>> from cma reserved memory and then allocate from free movable memory. > >> > >>I love this idea but when I see the code, I don't like that. > >>In allocation path, just try to allocate pages by round-robin so it's role > >>of allocator. If one of migratetype is full, just pass mission to reclaimer > >>with hint(ie, Hey reclaimer, it's non-movable allocation fail > >>so there is pointless if you reclaim MIGRATE_CMA pages) so that > >>reclaimer can filter it out during page scanning. > >>We already have an tool to achieve it(ie, isolate_mode_t). > > > >Hello, > > > >I agree with leaving fast allocation path as simple as possible. > >I will remove runtime computation for determining ratio in > >__rmqueue_cma() and, instead, will use pre-computed value calculated > >on the other path. > > > >I am not sure that whether your second suggestion(Hey relaimer part) > >is good or not. In my quick thought, that could be helpful in the > >situation that many free cma pages remained. But, it would be not helpful > >when there are neither free movable and cma pages. In generally, most > >workloads mainly uses movable pages for page cache or anonymous mapping. > >Although reclaim is triggered by non-movable allocation failure, reclaimed > >pages are used mostly by movable allocation. We can handle these allocation > >request even if we reclaim the pages just in lru order. If we rotate > >the lru list for finding movable pages, it could cause more useful > >pages to be evicted. 
> >
> >This is just my quick thought, so please let me correct if I am wrong.
>
> We have an out of tree implementation that is completely the same
> with the approach Minchan said and it works, but it has definitely
> some side-effects as you pointed, distorting the LRU and evicting
> hot pages. I do not attach code fragments in this thread for some

Actually, I discussed with Joonsoo solving such a corner case in the
future if someone reported it, and now you have. Thanks!

LRU churning is a general problem, not CMA-specific, although CMA would
aggravate it, so I'd like to handle it as a separate topic (i.e., a
separate patchset).

The reason we rotated them back to the LRU head was just to avoid the
overhead of rescanning them within one reclaim cycle. One idea I can
think of is to put a reclaim cursor at the LRU tail right before a
reclaim cycle, start scanning from the cursor, and update the cursor
position on every scan. Of course, we should then rotate filtered-out
pages back to the LRU's tail, not its head; with the cursor, we can skip
the pointless pages that this reclaim cycle has already scanned.

The cursor should be removed when the reclaim cycle is done, so the next
reclaim would start from the beginning again. That could cause
unnecessary scanning until it reaches a proper victim page, so CPU usage
would be higher, but that is better than evicting the working set.

Another idea?

> reasons, but it must be easy for yourself. I am wondering if it
> could help also in your case.
>
> Thanks,
> Heesub
>
> >
> >>
> >>And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
> >>If possible, it would be better because it's generic function to check
> >>free pages and cause trigger reclaim/compaction logic.
> >
> >I guess, your *it* means ratio computation. Right?
> >I don't like putting it on zone_watermark_ok().
Although it need to > >refer to free cma pages value which are also referred in zone_watermark_ok(), > >this computation is for determining ratio, not for triggering > >reclaim/compaction. And this zone_watermark_ok() is on more hot-path, so > >putting this logic into zone_watermark_ok() looks not better to me. > > > >I will think better place to do it. > > > >Thanks. > > > >-- > >To unsubscribe, send a message with 'unsubscribe linux-mm' in > >the body to majordomo@kvack.org. For more info on Linux MM, > >see: http://www.linux-mm.org/ . > >Don't email: email@kvack.org > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753814AbaEOJr3 (ORCPT ); Thu, 15 May 2014 05:47:29 -0400 Received: from cantor2.suse.de ([195.135.220.15]:43598 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751905AbaEOJr1 (ORCPT ); Thu, 15 May 2014 05:47:27 -0400 Date: Thu, 15 May 2014 10:47:18 +0100 From: Mel Gorman To: Joonsoo Kim Cc: "Aneesh Kumar K.V" , Marek Szyprowski , Andrew Morton , Rik van Riel , Johannes Weiner , Laura Abbott , Minchan Kim , Heesub Shin , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kyungmin Park , Bartlomiej Zolnierkiewicz , "'Tomasz Stanislawski'" Subject: Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Message-ID: <20140515094718.GE23991@suse.de> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <536CCC78.6050806@samsung.com> <20140513022603.GF23803@js1304-P5Q-DELUXE> <8738gcae4h.fsf@linux.vnet.ibm.com> <20140515021055.GC10116@js1304-P5Q-DELUXE> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: 
<20140515021055.GC10116@js1304-P5Q-DELUXE>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 15, 2014 at 11:10:55AM +0900, Joonsoo Kim wrote:
> > That doesn't always prefer CMA region. It would be nice to
> > understand why grouping in pageblock_nr_pages is beneficial. Also in
> > your patch you decrement nr_try_cma for every 'order' allocation. Why ?
>
> pageblock_nr_pages is just magic value with no rationale. :)

I'm not following this discussion closely, but there is a rationale for
that value: it's the size of a huge page for that architecture. At the
time fragmentation avoidance was implemented, this was the largest
allocation size of interest.

--
Mel Gorman
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755920AbaEPICl (ORCPT ); Fri, 16 May 2014 04:02:41 -0400
Received: from lgeamrelo04.lge.com ([156.147.1.127]:59313 "EHLO lgeamrelo04.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754353AbaEPICi (ORCPT ); Fri, 16 May 2014 04:02:38 -0400
X-Original-SENDERIP: 10.178.33.69
X-Original-MAILFROM: gioh.kim@lge.com
Message-ID: <5375C619.8010501@lge.com>
Date: Fri, 16 May 2014 17:02:33 +0900
From: Gioh Kim
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: Joonsoo Kim , Minchan Kim
CC: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?UTF-8?B?7J206rG07Zi4?= , gurugio@gmail.com
Subject: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE>
In-Reply-To:
<20140515015301.GA10116@js1304-P5Q-DELUXE>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

I've been trying to apply CMA to my platform.
The USB host driver generated a kernel panic like the one below when a USB
mouse was connected, because I turned on CMA and set the CMA_SIZE_MBYTES
value to zero by mistake.

I think the panic is caused by atomic_pool in arch/arm/mm/dma-mapping.c.
A zero CMA_SIZE_MBYTES value skips CMA initialization, and then atomic_pool
is not initialized either, because __alloc_from_contiguous fails in
atomic_pool_init().

If CMA_SIZE_MBYTES is allowed to be zero, there should be defensive code
to check that CMA is initialized correctly. And atomic_pool initialization
should be done by __alloc_remap_buffer instead of __alloc_from_contiguous
if __alloc_from_contiguous fails.

In my opinion, it is simpler and more powerful to restrict the
CMA_SIZE_MBYTES_MAX configuration to be larger than zero.

[ 1.474523] ------------[ cut here ]------------
[ 1.479150] WARNING: at arch/arm/mm/dma-mapping.c:496 __dma_alloc.isra.19+0x1b8/0x1e0()
[ 1.487160] coherent pool not initialised!
[ 1.491249] Modules linked in: [ 1.494317] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.10.19+ #55 [ 1.500521] [<80013e20>] (unwind_backtrace+0x0/0xf8) from [<80011c60>] (show_stack+0x10/0x14) [ 1.509064] [<80011c60>] (show_stack+0x10/0x14) from [<8001eedc>] (warn_slowpath_common+0x4c/0x6c) [ 1.518038] [<8001eedc>] (warn_slowpath_common+0x4c/0x6c) from [<8001ef90>] (warn_slowpath_fmt+0x30/0x40) [ 1.527616] [<8001ef90>] (warn_slowpath_fmt+0x30/0x40) from [<80017c28>] (__dma_alloc.isra.19+0x1b8/0x1e0) [ 1.537282] [<80017c28>] (__dma_alloc.isra.19+0x1b8/0x1e0) from [<80017d7c>] (arm_dma_alloc+0x90/0x98) [ 1.546608] [<80017d7c>] (arm_dma_alloc+0x90/0x98) from [<8034a860>] (ohci_init+0x1b0/0x278) [ 1.555062] [<8034a860>] (ohci_init+0x1b0/0x278) from [<80332b0c>] (usb_add_hcd+0x184/0x5b8) [ 1.563500] [<80332b0c>] (usb_add_hcd+0x184/0x5b8) from [<8034b5e0>] (ohci_platform_probe+0xd0/0x174) [ 1.572729] [<8034b5e0>] (ohci_platform_probe+0xd0/0x174) from [<802f196c>] (platform_drv_probe+0x14/0x18) [ 1.582401] [<802f196c>] (platform_drv_probe+0x14/0x18) from [<802f0714>] (driver_probe_device+0x6c/0x1f8) [ 1.592064] [<802f0714>] (driver_probe_device+0x6c/0x1f8) from [<802f092c>] (__driver_attach+0x8c/0x90) [ 1.601465] [<802f092c>] (__driver_attach+0x8c/0x90) from [<802eeec8>] (bus_for_each_dev+0x54/0x88) [ 1.610518] [<802eeec8>] (bus_for_each_dev+0x54/0x88) from [<802efef0>] (bus_add_driver+0xd8/0x230) [ 1.619572] [<802efef0>] (bus_add_driver+0xd8/0x230) from [<802f0de4>] (driver_register+0x78/0x14c) [ 1.628632] [<802f0de4>] (driver_register+0x78/0x14c) from [<806ff018>] (ohci_hcd_mod_init+0x34/0x64) [ 1.637859] [<806ff018>] (ohci_hcd_mod_init+0x34/0x64) from [<8000879c>] (do_one_initcall+0xec/0x14c) [ 1.647088] [<8000879c>] (do_one_initcall+0xec/0x14c) from [<806dab30>] (kernel_init_freeable+0x150/0x220) [ 1.656754] [<806dab30>] (kernel_init_freeable+0x150/0x220) from [<80509f54>] (kernel_init+0x8/0xf8) [ 1.665895] [<80509f54>] (kernel_init+0x8/0xf8) from [<8000e398>] 
(ret_from_fork+0x14/0x3c) [ 1.674264] ---[ end trace 6f1857db5ef45cb9 ]--- [ 1.678880] ohci-platform ohci-platform.0: can't setup [ 1.684027] ohci-platform ohci-platform.0: USB bus 1 deregistered [ 1.690362] ohci-platform: probe of ohci-platform.0 failed with error -12 [ 1.697188] ohci-platform ohci-platform.1: Generic Platform OHCI Controller [ 1.704365] ohci-platform ohci-platform.1: new USB bus registered, assigned bus number 1 [ 1.712457] ------------[ cut here ]------------ [ 1.717096] WARNING: at arch/arm/mm/dma-mapping.c:496 __dma_alloc.isra.19+0x1b8/0x1e0() [ 1.725105] coherent pool not initialised! [ 1.729194] Modules linked in: [ 1.732247] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 3.10.19+ #55 [ 1.739404] [<80013e20>] (unwind_backtrace+0x0/0xf8) from [<80011c60>] (show_stack+0x10/0x14) [ 1.747949] [<80011c60>] (show_stack+0x10/0x14) from [<8001eedc>] (warn_slowpath_common+0x4c/0x6c) [ 1.756923] [<8001eedc>] (warn_slowpath_common+0x4c/0x6c) from [<8001ef90>] (warn_slowpath_fmt+0x30/0x40) [ 1.766502] [<8001ef90>] (warn_slowpath_fmt+0x30/0x40) from [<80017c28>] (__dma_alloc.isra.19+0x1b8/0x1e0) [ 1.776168] [<80017c28>] (__dma_alloc.isra.19+0x1b8/0x1e0) from [<80017d7c>] (arm_dma_alloc+0x90/0x98) [ 1.785484] [<80017d7c>] (arm_dma_alloc+0x90/0x98) from [<8034a860>] (ohci_init+0x1b0/0x278) [ 1.793933] [<8034a860>] (ohci_init+0x1b0/0x278) from [<80332b0c>] (usb_add_hcd+0x184/0x5b8) [ 1.802370] [<80332b0c>] (usb_add_hcd+0x184/0x5b8) from [<8034b5e0>] (ohci_platform_probe+0xd0/0x174) [ 1.811597] [<8034b5e0>] (ohci_platform_probe+0xd0/0x174) from [<802f196c>] (platform_drv_probe+0x14/0x18) [ 1.821263] [<802f196c>] (platform_drv_probe+0x14/0x18) from [<802f0714>] (driver_probe_device+0x6c/0x1f8) [ 1.830926] [<802f0714>] (driver_probe_device+0x6c/0x1f8) from [<802f092c>] (__driver_attach+0x8c/0x90) [ 1.840326] [<802f092c>] (__driver_attach+0x8c/0x90) from [<802eeec8>] (bus_for_each_dev+0x54/0x88) [ 1.849379] [<802eeec8>] (bus_for_each_dev+0x54/0x88) from 
[<802efef0>] (bus_add_driver+0xd8/0x230)
[ 1.858432] [<802efef0>] (bus_add_driver+0xd8/0x230) from [<802f0de4>] (driver_register+0x78/0x14c)
[ 1.867488] [<802f0de4>] (driver_register+0x78/0x14c) from [<806ff018>] (ohci_hcd_mod_init+0x34/0x64)
[ 1.876714] [<806ff018>] (ohci_hcd_mod_init+0x34/0x64) from [<8000879c>] (do_one_initcall+0xec/0x14c)
[ 1.885940] [<8000879c>] (do_one_initcall+0xec/0x14c) from [<806dab30>] (kernel_init_freeable+0x150/0x220)
[ 1.895601] [<806dab30>] (kernel_init_freeable+0x150/0x220) from [<80509f54>] (kernel_init+0x8/0xf8)
[ 1.904741] [<80509f54>] (kernel_init+0x8/0xf8) from [<8000e398>] (ret_from_fork+0x14/0x3c)
[ 1.913085] ---[ end trace 6f1857db5ef45cba ]---

I'm adding my patch to restrict CMA_SIZE_MBYTES.
This patch is based on 3.15.0-rc5.

-------------------------------- 8< --------------------------------------
From 9f8e6d3c1f4bdeeeb7af3df7898b773a612c62e8 Mon Sep 17 00:00:00 2001
From: Gioh Kim
Date: Fri, 16 May 2014 16:15:43 +0900
Subject: [PATCH] drivers/base/Kconfig: restrict CMA size to non-zero value

The size of CMA area must be larger than zero.
If the size is zero, CMA cannot be initialized.

Signed-off-by: Gioh Kim
---
 drivers/base/Kconfig | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 4b7b452..19b3578 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -222,13 +222,18 @@ config DMA_CMA
 if DMA_CMA
 comment "Default contiguous memory area size:"
 
+config CMA_SIZE_MBYTES_MAX
+	int
+	default 1024
+
 config CMA_SIZE_MBYTES
 	int "Size in Mega Bytes"
 	depends on !CMA_SIZE_SEL_PERCENTAGE
+	range 1 CMA_SIZE_MBYTES_MAX
 	default 16
 	help
 	  Defines the size (in MiB) of the default memory area for Contiguous
-	  Memory Allocator.
+	  Memory Allocator. This value must be larger than zero.
 
config CMA_SIZE_PERCENTAGE int "Percentage of total memory" -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751894AbaERRgT (ORCPT ); Sun, 18 May 2014 13:36:19 -0400 Received: from e28smtp03.in.ibm.com ([122.248.162.3]:49502 "EHLO e28smtp03.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751570AbaERRgR (ORCPT ); Sun, 18 May 2014 13:36:17 -0400 From: "Aneesh Kumar K.V" To: Joonsoo Kim Cc: Andrew Morton , Rik van Riel , Johannes Weiner , Mel Gorman , Laura Abbott , Minchan Kim , Heesub Shin , Marek Szyprowski , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used In-Reply-To: <20140515015842.GB10116@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <8761l8ah04.fsf@linux.vnet.ibm.com> <20140515015842.GB10116@js1304-P5Q-DELUXE> User-Agent: Notmuch/0.18~rc0+2~gbc64cdc (http://notmuchmail.org) Emacs/24.3.1 (x86_64-pc-linux-gnu) Date: Sun, 18 May 2014 23:06:08 +0530 Message-ID: <87lhtzng53.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14051817-3864-0000-0000-00000E457A06 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Joonsoo Kim writes: > On Wed, May 14, 2014 at 02:12:19PM +0530, Aneesh Kumar K.V wrote: >> Joonsoo Kim writes: >> >> >> >> Another issue i am facing with the current code is the atomic allocation >> failing even with large number of CMA pages around. In my case we never >> reclaimed because large part of the memory is consumed by the page cache and >> for that, free memory check doesn't include at free_cma. I will test >> with this patchset and update here once i have the results. 
>> > > Hello, > > Could you elaborate more on your issue? > I can't completely understand your problem. > So your atomic allocation is movable? And although there are many free > cma pages, that request is fail? > non movable atomic allocations are failing because we don't have anything other than CMA pages left and kswapd is yet to catchup ? swapper/0: page allocation failure: order:0, mode:0x20 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.23-1500.pkvm2_1.5.ppc64 #1 Call Trace: [c000000ffffcb610] [c000000000017330] .show_stack+0x130/0x200 (unreliable) [c000000ffffcb6e0] [c00000000087a8c8] .dump_stack+0x28/0x3c [c000000ffffcb750] [c0000000001e06f0] .warn_alloc_failed+0x110/0x160 [c000000ffffcb800] [c0000000001e5984] .__alloc_pages_nodemask+0x9d4/0xbf0 [c000000ffffcb9e0] [c00000000023775c] .alloc_pages_current+0xcc/0x1b0 [c000000ffffcba80] [c0000000007098d4] .__netdev_alloc_frag+0x1a4/0x1d0 [c000000ffffcbb20] [c00000000070d750] .__netdev_alloc_skb+0xc0/0x130 [c000000ffffcbbb0] [d000000009639b40] .tg3_poll_work+0x900/0x1110 [tg3] [c000000ffffcbd10] [d00000000963a3a4] .tg3_poll_msix+0x54/0x200 [tg3] [c000000ffffcbdb0] [c00000000071fcec] .net_rx_action+0x1dc/0x310 [c000000ffffcbe90] [c0000000000c1b08] .__do_softirq+0x158/0x330 [c000000ffffcbf90] [c000000000025744] .call_do_softirq+0x14/0x24 [c000000ffffc7e00] [c000000000011684] .do_softirq+0xf4/0x130 [c000000ffffc7e90] [c0000000000c1f18] .irq_exit+0xc8/0x110 [c000000ffffc7f10] [c000000000011258] .__do_irq+0xc8/0x1f0 [c000000ffffc7f90] [c000000000025768] .call_do_irq+0x14/0x24 [c00000000137b750] [c00000000001142c] .do_IRQ+0xac/0x130 [c00000000137b800] [c000000000002a64] hardware_interrupt_common+0x164/0x180 .... 
Node 0 DMA: 408*64kB (C) 408*128kB (C) 408*256kB (C) 408*512kB (C) 408*1024kB (C) 406*2048kB (C) 199*4096kB (C) 97*8192kB (C) 6*16384kB (C) = 3348992kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16384kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB meminfo details: MemTotal: 65875584 kB MemFree: 8001856 kB Buffers: 49330368 kB Cached: 178752 kB SwapCached: 0 kB Active: 28550464 kB Inactive: 25476416 kB Active(anon): 3771008 kB Inactive(anon): 767360 kB Active(file): 24779456 kB Inactive(file): 24709056 kB Unevictable: 15104 kB Mlocked: 15104 kB SwapTotal: 8384448 kB SwapFree: 8384448 kB Dirty: 0 kB -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752656AbaESBrQ (ORCPT ); Sun, 18 May 2014 21:47:16 -0400 Received: from lgeamrelo02.lge.com ([156.147.1.126]:53994 "EHLO lgeamrelo02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752447AbaESBrP (ORCPT ); Sun, 18 May 2014 21:47:15 -0400 X-Original-SENDERIP: 10.178.33.69 X-Original-MAILFROM: gioh.kim@lge.com Message-ID: <537962A0.4090600@lge.com> Date: Mon, 19 May 2014 10:47:12 +0900 From: Gioh Kim User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Michal Nazarewicz , Joonsoo Kim , Minchan Kim CC: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?UTF-8?B?7J206rG07Zi4?= , gurugio@gmail.com Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed 
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Thank you for your advice. I didn't notice it.

I'm adding the following according to your advice:

- Range restrictions for CMA_SIZE_MBYTES and *CMA_SIZE_PERCENTAGE*.
  I think this can prevent a wrong kernel option.

- Change size_cmdline to the default value SZ_16M when it is zero.
  I am not sure this helps if the cma=0 cmdline option is given together
  with the base and limit options.

I don't know how to send the second patch.
Please pardon me for just copying the patch here.

--------------------------------- 8< -------------------------------------
From c283eaac41b044a2abb11cfd32a60fff034633c3 Mon Sep 17 00:00:00 2001
From: Gioh Kim
Date: Fri, 16 May 2014 16:15:43 +0900
Subject: [PATCH] drivers/base/Kconfig: restrict CMA size to non-zero value

The size of CMA area must be larger than zero.
If the size is zero, all physically contiguous allocations can fail.

Signed-off-by: Gioh Kim
---
 drivers/base/Kconfig          | 14 ++++++++++++--
 drivers/base/dma-contiguous.c |  3 ++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 4b7b452..a7292ac 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -222,17 +222,27 @@ config DMA_CMA
 if DMA_CMA
 comment "Default contiguous memory area size:"
 
+config CMA_SIZE_MBYTES_DEFAULT
+	int
+	default 16
+
+config CMA_SIZE_MBYTES_MAX
+	int
+	default 1024
+
 config CMA_SIZE_MBYTES
 	int "Size in Mega Bytes"
 	depends on !CMA_SIZE_SEL_PERCENTAGE
-	default 16
+	range 1 CMA_SIZE_MBYTES_MAX
+	default CMA_SIZE_MBYTES_DEFAULT
 	help
 	  Defines the size (in MiB) of the default memory area for Contiguous
-	  Memory Allocator.
+	  Memory Allocator. This value must be larger than zero.
 
 config CMA_SIZE_PERCENTAGE
 	int "Percentage of total memory"
 	depends on !CMA_SIZE_SEL_MBYTES
+	range 1 100
 	default 10
 	help
 	  Defines the size of the default memory area for Contiguous Memory
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index b056661..5b70442 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -125,7 +125,8 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
 	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
 
 	if (size_cmdline != -1) {
-		selected_size = size_cmdline;
+		selected_size = ((size_cmdline == 0) ?
+			CONFIG_CMA_SIZE_MBYTES_DEFAULT : size_cmdline);
 		selected_base = base_cmdline;
 		selected_limit = min_not_zero(limit_cmdline, limit);
 		if (base_cmdline + size_cmdline == limit_cmdline)
--
1.7.9.5

On 2014-05-17 at 2:45 AM, Michal Nazarewicz wrote:
> On Fri, May 16 2014, Gioh Kim wrote:
>> If CMA_SIZE_MBYTES is allowed to be zero, there should be defense code
>> to check CMA is initlaized correctly. And atomic_pool initialization
>> should be done by __alloc_remap_buffer instead of
>> __alloc_from_contiguous if __alloc_from_contiguous is failed.
>
> Agreed, and this is the correct fix.
>
>> IMPO, it is more simple and powerful to restrict CMA_SIZE_MBYTES_MAX
>> configuration to be larger than zero.
>
> No, because it makes it impossible to have CMA disabled by default and
> only enabled if command line argument is given.
>
> Furthermore, your patch does *not* guarantee CMA region to always be
> allocated. If CMA_SIZE_SEL_PERCENTAGE is selected for instance. Or if
> user explicitly passes 0 on command line.
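
The "defense code" that both sides agree on above (fall back to the remap
allocator when the contiguous allocator fails, instead of forbidding a zero
CMA size) might look roughly like the user-space model below. This is only a
sketch of the control flow, not the kernel change itself: the types and the
model_* helpers are hypothetical stand-ins for atomic_pool_init(),
__alloc_from_contiguous() and __alloc_remap_buffer(), and malloc() stands in
for the real page allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Which allocator ended up backing the pool (hypothetical enum). */
enum pool_source { POOL_NONE, POOL_CONTIGUOUS, POOL_REMAP };

/* Hypothetical model of the atomic pool state. */
struct atomic_pool_model {
	void *vaddr;
	size_t size;
	enum pool_source source;
};

/* Stand-in for the CMA-backed path: fails when no CMA area was reserved. */
static void *model_alloc_from_contiguous(size_t size, size_t cma_area_size)
{
	if (cma_area_size == 0 || size > cma_area_size)
		return NULL;
	return malloc(size);
}

/* Stand-in for the non-CMA path. */
static void *model_alloc_remap_buffer(size_t size)
{
	return malloc(size);
}

/*
 * Try the contiguous allocator first and fall back to the remap
 * allocator, so a zero-sized CMA area does not leave the pool
 * uninitialised. Returns 0 on success, -1 if both paths fail.
 */
static int atomic_pool_init_model(struct atomic_pool_model *pool,
				  size_t size, size_t cma_area_size)
{
	assert(pool != NULL && size > 0);

	pool->vaddr = model_alloc_from_contiguous(size, cma_area_size);
	pool->source = POOL_CONTIGUOUS;
	if (!pool->vaddr) {
		pool->vaddr = model_alloc_remap_buffer(size);
		pool->source = POOL_REMAP;
	}
	if (!pool->vaddr) {
		pool->source = POOL_NONE;
		pool->size = 0;
		return -1;
	}
	pool->size = size;
	return 0;
}
```

With a cma_area_size of 0 the model pool is still initialised, only from the
remap path, which keeps "CMA disabled by default" as a valid configuration.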
> > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752406AbaESCIz (ORCPT ); Sun, 18 May 2014 22:08:55 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:38628 "EHLO lgemrelse7q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751046AbaESCIy (ORCPT ); Sun, 18 May 2014 22:08:54 -0400 X-Original-SENDERIP: 10.177.220.145 X-Original-MAILFROM: iamjoonsoo.kim@lge.com Date: Mon, 19 May 2014 11:11:21 +0900 From: Joonsoo Kim To: Minchan Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140519021121.GA19615@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <20140515024353.GA27599@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140515024353.GA27599@bbox> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote: > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote: > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > > > Hey Joonsoo, > > > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > > > CMA is introduced to provide physically contiguous pages at runtime. > > > > For this purpose, it reserves memory at boot time. Although it reserve > > > > memory, this reserved memory can be used for movable memory allocation > > > > request. 
This usecase is beneficial to the system that needs this CMA > > > > reserved memory infrequently and it is one of main purpose of > > > > introducing CMA. > > > > > > > > But, there is a problem in current implementation. The problem is that > > > > it works like as just reserved memory approach. The pages on cma reserved > > > > memory are hardly used for movable memory allocation. This is caused by > > > > combination of allocation and reclaim policy. > > > > > > > > The pages on cma reserved memory are allocated if there is no movable > > > > memory, that is, as fallback allocation. So the time this fallback > > > > allocation is started is under heavy memory pressure. Although it is under > > > > memory pressure, movable allocation easily succeed, since there would be > > > > many pages on cma reserved memory. But this is not the case for unmovable > > > > and reclaimable allocation, because they can't use the pages on cma > > > > reserved memory. These allocations regard system's free memory as > > > > (free pages - free cma pages) on watermark checking, that is, free > > > > unmovable pages + free reclaimable pages + free movable pages. Because > > > > we already exhausted movable pages, only free pages we have are unmovable > > > > and reclaimable types and this would be really small amount. So watermark > > > > checking would be failed. It will wake up kswapd to make enough free > > > > memory for unmovable and reclaimable allocation and kswapd will do. > > > > So before we fully utilize pages on cma reserved memory, kswapd start to > > > > reclaim memory and try to make free memory over the high watermark. This > > > > watermark checking by kswapd doesn't take care free cma pages so many > > > > movable pages would be reclaimed. After then, we have a lot of movable > > > > pages again, so fallback allocation doesn't happen again. 
To conclude, > > > > amount of free memory on meminfo which includes free CMA pages is moving > > > > around 512 MB if I reserve 512 MB memory for CMA. > > > > > > > > I found this problem on following experiment. > > > > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > > > make -j24 > > > > > > > > CMA reserve: 0 MB 512 MB > > > > Elapsed-time: 234.8 361.8 > > > > Average-MemFree: 283880 KB 530851 KB > > > > > > > > To solve this problem, I can think following 2 possible solutions. > > > > 1. allocate the pages on cma reserved memory first, and if they are > > > > exhausted, allocate movable pages. > > > > 2. interleaved allocation: try to allocate specific amounts of memory > > > > from cma reserved memory and then allocate from free movable memory. > > > > > > I love this idea but when I see the code, I don't like that. > > > In allocation path, just try to allocate pages by round-robin so it's role > > > of allocator. If one of migratetype is full, just pass mission to reclaimer > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that > > > reclaimer can filter it out during page scanning. > > > We already have an tool to achieve it(ie, isolate_mode_t). > > > > Hello, > > > > I agree with leaving fast allocation path as simple as possible. > > I will remove runtime computation for determining ratio in > > __rmqueue_cma() and, instead, will use pre-computed value calculated > > on the other path. > > Sounds good. > > > > > I am not sure that whether your second suggestion(Hey relaimer part) > > is good or not. In my quick thought, that could be helpful in the > > situation that many free cma pages remained. But, it would be not helpful > > when there are neither free movable and cma pages. In generally, most > > workloads mainly uses movable pages for page cache or anonymous mapping. 
> > Although reclaim is triggered by non-movable allocation failure, reclaimed
> > pages are used mostly by movable allocation. We can handle these allocation
> > request even if we reclaim the pages just in lru order. If we rotate
> > the lru list for finding movable pages, it could cause more useful
> > pages to be evicted.
> >
> > This is just my quick thought, so please let me correct if I am wrong.
>
> Why should reclaimer reclaim unnecessary pages?
> So, your answer is that it would be better because upcoming newly allocated
> pages would be allocated easily without interrupt. But it could reclaim
> too much pages until watermark for unmovable allocation is okay.
> Even, sometime, you might see OOM.
>
> Moreover, how could you handle current trobule?
> For example, there is atomic allocation and the only thing to save the world
> is kswapd because it's one of kswapd role but kswapd is spending many time to
> reclaim CMA pages, which is pointless so the allocation would be easily failed.

Hello,

I guess that it isn't a problem. In the LRU, movable pages and CMA pages
would be interleaved, so it doesn't take too long to get a page for a
non-movable allocation.

IMHO, in general, memory shortage is caused by movable allocations, so
distinguishing the allocation type and handling each type differently has
only a marginal effect.

Anyway, I will think about it more deeply.

>
> >
> > >
> > > And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
> > > If possible, it would be better becauser it's generic function to check
> > > free pages and cause trigger reclaim/compaction logic.
> >
> > I guess, your *it* means ratio computation. Right?
>
> I meant just get_page_from_freelist like fair zone allocation for consistency
> but as we discussed offline, i'm not against with you if it's not right place.

Okay :)

Thanks.
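
The interleaved allocation with a pre-computed ratio that is discussed in
this exchange can be pictured with the small user-space model below. It is a
sketch under stated assumptions, not the code from patch 2: nr_try_cma is a
name mentioned in the thread, while struct interleave_state, the max_try_*
fields, interleave_init() and pick_source() are hypothetical. The model only
decides which free list to try next; it does not model falling back when one
of the two lists is empty.

```c
#include <assert.h>

/* Which free list the next movable allocation should try first. */
enum source { FROM_MOVABLE, FROM_CMA };

/* Hypothetical per-zone interleaving state. */
struct interleave_state {
	int max_try_movable;	/* quota per round, computed off the fast path */
	int max_try_cma;
	int nr_try_movable;	/* what is left of the current round */
	int nr_try_cma;
};

/* Set the per-round quotas; this is the "pre-computed value" part. */
static void interleave_init(struct interleave_state *s,
			    int max_movable, int max_cma)
{
	assert(max_movable > 0 && max_cma > 0);
	s->max_try_movable = max_movable;
	s->max_try_cma = max_cma;
	s->nr_try_movable = max_movable;
	s->nr_try_cma = max_cma;
}

/*
 * Fast path: only decrement counters. Movable pages are used until the
 * movable quota is spent, then CMA pages until the CMA quota is spent,
 * then a new round starts.
 */
static enum source pick_source(struct interleave_state *s)
{
	if (s->nr_try_movable > 0) {
		s->nr_try_movable--;
		return FROM_MOVABLE;
	}
	if (s->nr_try_cma > 0) {
		s->nr_try_cma--;
		return FROM_CMA;
	}
	/* Both quotas are used up: refill and serve this request. */
	s->nr_try_movable = s->max_try_movable - 1;
	s->nr_try_cma = s->max_try_cma;
	return FROM_MOVABLE;
}
```

With quotas of 2 movable to 1 CMA, successive calls yield movable, movable,
CMA, and then the pattern repeats, so CMA pages are consumed steadily rather
than only as a last-resort fallback.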
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752524AbaESCKI (ORCPT ); Sun, 18 May 2014 22:10:08 -0400 Received: from lgeamrelo04.lge.com ([156.147.1.127]:54392 "EHLO lgeamrelo04.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752324AbaESCKH (ORCPT ); Sun, 18 May 2014 22:10:07 -0400 X-Original-SENDERIP: 10.177.220.145 X-Original-MAILFROM: iamjoonsoo.kim@lge.com Date: Mon, 19 May 2014 11:12:34 +0900 From: Joonsoo Kim To: Mel Gorman Cc: "Aneesh Kumar K.V" , Marek Szyprowski , Andrew Morton , Rik van Riel , Johannes Weiner , Laura Abbott , Minchan Kim , Heesub Shin , Michal Nazarewicz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kyungmin Park , Bartlomiej Zolnierkiewicz , "'Tomasz Stanislawski'" Subject: Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Message-ID: <20140519021234.GB19615@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <536CCC78.6050806@samsung.com> <20140513022603.GF23803@js1304-P5Q-DELUXE> <8738gcae4h.fsf@linux.vnet.ibm.com> <20140515021055.GC10116@js1304-P5Q-DELUXE> <20140515094718.GE23991@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140515094718.GE23991@suse.de> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 15, 2014 at 10:47:18AM +0100, Mel Gorman wrote: > On Thu, May 15, 2014 at 11:10:55AM +0900, Joonsoo Kim wrote: > > > That doesn't always prefer CMA region. It would be nice to > > > understand why grouping in pageblock_nr_pages is beneficial. Also in > > > your patch you decrement nr_try_cma for every 'order' allocation. Why ? > > > > pageblock_nr_pages is just magic value with no rationale. 
:)
>
> I'm not following this discussions closely but there is rational to that
> value -- it's the size of a huge page for that architecture. At the time
> the fragmentation avoidance was implemented this was the largest allocation
> size of interest.

Hello,

Indeed, there is a good rationale for it.
Thank you very much for pointing it out.

Thanks.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752536AbaESC05 (ORCPT ); Sun, 18 May 2014 22:26:57 -0400
Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:47850 "EHLO lgemrelse7q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752467AbaESC0z (ORCPT ); Sun, 18 May 2014 22:26:55 -0400
X-Original-SENDERIP: 10.177.220.145
X-Original-MAILFROM: iamjoonsoo.kim@lge.com
Date: Mon, 19 May 2014 11:29:23 +0900
From: Joonsoo Kim
To: "Aneesh Kumar K.V"
Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman, Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski, Michal Nazarewicz, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
Message-ID: <20140519022922.GC19615@js1304-P5Q-DELUXE>
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <8761l8ah04.fsf@linux.vnet.ibm.com> <20140515015842.GB10116@js1304-P5Q-DELUXE> <87lhtzng53.fsf@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87lhtzng53.fsf@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, May 18, 2014 at 11:06:08PM +0530, Aneesh Kumar K.V wrote:
> Joonsoo Kim writes:
>
> > On Wed, May 14, 2014 at 02:12:19PM +0530, Aneesh Kumar K.V wrote:
> >> Joonsoo Kim writes:
> >>
> >>
> >>
> >> Another issue i am facing with the
current code is the atomic allocation > >> failing even with large number of CMA pages around. In my case we never > >> reclaimed because large part of the memory is consumed by the page cache and > >> for that, free memory check doesn't include at free_cma. I will test > >> with this patchset and update here once i have the results. > >> > > > > Hello, > > > > Could you elaborate more on your issue? > > I can't completely understand your problem. > > So your atomic allocation is movable? And although there are many free > > cma pages, that request is fail? > > > > non movable atomic allocations are failing because we don't have > anything other than CMA pages left and kswapd is yet to catchup ? > > > swapper/0: page allocation failure: order:0, mode:0x20 > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.23-1500.pkvm2_1.5.ppc64 #1 > Call Trace: > [c000000ffffcb610] [c000000000017330] .show_stack+0x130/0x200 (unreliable) > [c000000ffffcb6e0] [c00000000087a8c8] .dump_stack+0x28/0x3c > [c000000ffffcb750] [c0000000001e06f0] .warn_alloc_failed+0x110/0x160 > [c000000ffffcb800] [c0000000001e5984] .__alloc_pages_nodemask+0x9d4/0xbf0 > [c000000ffffcb9e0] [c00000000023775c] .alloc_pages_current+0xcc/0x1b0 > [c000000ffffcba80] [c0000000007098d4] .__netdev_alloc_frag+0x1a4/0x1d0 > [c000000ffffcbb20] [c00000000070d750] .__netdev_alloc_skb+0xc0/0x130 > [c000000ffffcbbb0] [d000000009639b40] .tg3_poll_work+0x900/0x1110 [tg3] > [c000000ffffcbd10] [d00000000963a3a4] .tg3_poll_msix+0x54/0x200 [tg3] > [c000000ffffcbdb0] [c00000000071fcec] .net_rx_action+0x1dc/0x310 > [c000000ffffcbe90] [c0000000000c1b08] .__do_softirq+0x158/0x330 > [c000000ffffcbf90] [c000000000025744] .call_do_softirq+0x14/0x24 > [c000000ffffc7e00] [c000000000011684] .do_softirq+0xf4/0x130 > [c000000ffffc7e90] [c0000000000c1f18] .irq_exit+0xc8/0x110 > [c000000ffffc7f10] [c000000000011258] .__do_irq+0xc8/0x1f0 > [c000000ffffc7f90] [c000000000025768] .call_do_irq+0x14/0x24 > [c00000000137b750] [c00000000001142c] 
.do_IRQ+0xac/0x130
> [c00000000137b800] [c000000000002a64]
> hardware_interrupt_common+0x164/0x180
>
> ....
>
>
> Node 0 DMA: 408*64kB (C) 408*128kB (C) 408*256kB (C) 408*512kB (C) 408*1024kB (C) 406*2048kB (C) 199*4096kB (C) 97*8192kB (C) 6*16384kB (C) =
> 3348992kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16384kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB
>
> meminfo details:
>
> MemTotal: 65875584 kB
> MemFree: 8001856 kB
> Buffers: 49330368 kB
> Cached: 178752 kB
> SwapCached: 0 kB
> Active: 28550464 kB
> Inactive: 25476416 kB
> Active(anon): 3771008 kB
> Inactive(anon): 767360 kB
> Active(file): 24779456 kB
> Inactive(file): 24709056 kB
> Unevictable: 15104 kB
> Mlocked: 15104 kB
> SwapTotal: 8384448 kB
> SwapFree: 8384448 kB
> Dirty: 0 kB
>
> -aneesh
>

Hello,

I think that the third patch in this patchset would solve this problem.
Your problem may occur in the following scenario.

1. Unmovable and reclaimable free pages are nearly exhausted.
2. There are some free movable pages, so the watermark check is OK.
3. A lot of movable allocations are requested.
4. Most of the movable pages are allocated.
5. But the watermark check is still OK, because we have a lot of
   free CMA pages and this allocation is of movable type.
   kswapd is not woken up.
6. A non-movable atomic allocation request arrives => fail.

So the problem is in step #5. Although we have enough pages for the
movable type, we should prepare for allocation requests of the other
types. With my third patch, kswapd could be woken up by a movable
allocation, so your problem would disappear.

Thanks.
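
Steps #5 and #6 of the scenario above can be made concrete with a small
user-space model of the watermark check. This is a simplified sketch, not the
kernel's zone_watermark_ok(): it handles order-0 requests only, ignores
lowmem_reserve and the atomic-allocation discounts, and struct zone_model,
watermark_ok() and the numbers used below are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model of a zone's free-page counters. */
struct zone_model {
	long free_pages;	/* all free pages, including free CMA pages */
	long free_cma_pages;	/* free pages that sit in CMA pageblocks */
	long low_watermark;	/* at or below this, kswapd should be woken */
};

/*
 * Order-0 watermark check. When count_cma is false (the caller cannot
 * use CMA pages), free CMA pages are treated as non-free before the
 * comparison. When it is true, they count as ordinary free memory.
 */
static bool watermark_ok(const struct zone_model *z, bool count_cma)
{
	long usable = z->free_pages;

	assert(z->free_cma_pages >= 0 && z->free_cma_pages <= z->free_pages);
	if (!count_cma)
		usable -= z->free_cma_pages;
	return usable > z->low_watermark;
}
```

Take a zone with 131172 free pages, 131072 of them in CMA, and a low
watermark of 1024 pages. A movable request that counts CMA pages passes the
check, so nothing wakes kswapd (step #5), while a non-movable request sees
only 100 usable pages and fails (step #6). If movable requests also run the
check with count_cma set to false, which is the direction patch 3's title
("always treat free cma pages as non-free on watermark checking") points at,
the shortage shows up on the movable path and kswapd can be woken earlier.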
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752650AbaESCu2 (ORCPT ); Sun, 18 May 2014 22:50:28 -0400 Received: from lgeamrelo01.lge.com ([156.147.1.125]:57287 "EHLO lgeamrelo01.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752370AbaESCu1 (ORCPT ); Sun, 18 May 2014 22:50:27 -0400 X-Original-SENDERIP: 10.177.220.169 X-Original-MAILFROM: minchan@kernel.org Date: Mon, 19 May 2014 11:53:05 +0900 From: Minchan Kim To: Joonsoo Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140519025305.GA13248@bbox> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <20140515024353.GA27599@bbox> <20140519021121.GA19615@js1304-P5Q-DELUXE> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140519021121.GA19615@js1304-P5Q-DELUXE> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 19, 2014 at 11:11:21AM +0900, Joonsoo Kim wrote: > On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote: > > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote: > > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > > > > Hey Joonsoo, > > > > > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > > > > CMA is introduced to provide physically contiguous pages at runtime. > > > > > For this purpose, it reserves memory at boot time. 
Although it reserve > > > > > memory, this reserved memory can be used for movable memory allocation > > > > > request. This usecase is beneficial to the system that needs this CMA > > > > > reserved memory infrequently and it is one of main purpose of > > > > > introducing CMA. > > > > > > > > > > But, there is a problem in current implementation. The problem is that > > > > > it works like as just reserved memory approach. The pages on cma reserved > > > > > memory are hardly used for movable memory allocation. This is caused by > > > > > combination of allocation and reclaim policy. > > > > > > > > > > The pages on cma reserved memory are allocated if there is no movable > > > > > memory, that is, as fallback allocation. So the time this fallback > > > > > allocation is started is under heavy memory pressure. Although it is under > > > > > memory pressure, movable allocation easily succeed, since there would be > > > > > many pages on cma reserved memory. But this is not the case for unmovable > > > > > and reclaimable allocation, because they can't use the pages on cma > > > > > reserved memory. These allocations regard system's free memory as > > > > > (free pages - free cma pages) on watermark checking, that is, free > > > > > unmovable pages + free reclaimable pages + free movable pages. Because > > > > > we already exhausted movable pages, only free pages we have are unmovable > > > > > and reclaimable types and this would be really small amount. So watermark > > > > > checking would be failed. It will wake up kswapd to make enough free > > > > > memory for unmovable and reclaimable allocation and kswapd will do. > > > > > So before we fully utilize pages on cma reserved memory, kswapd start to > > > > > reclaim memory and try to make free memory over the high watermark. This > > > > > watermark checking by kswapd doesn't take care free cma pages so many > > > > > movable pages would be reclaimed. 
After then, we have a lot of movable > > > > > pages again, so fallback allocation doesn't happen again. To conclude, > > > > > amount of free memory on meminfo which includes free CMA pages is moving > > > > > around 512 MB if I reserve 512 MB memory for CMA. > > > > > > > > > > I found this problem on following experiment. > > > > > > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > > > > make -j24 > > > > > > > > > > CMA reserve: 0 MB 512 MB > > > > > Elapsed-time: 234.8 361.8 > > > > > Average-MemFree: 283880 KB 530851 KB > > > > > > > > > > To solve this problem, I can think following 2 possible solutions. > > > > > 1. allocate the pages on cma reserved memory first, and if they are > > > > > exhausted, allocate movable pages. > > > > > 2. interleaved allocation: try to allocate specific amounts of memory > > > > > from cma reserved memory and then allocate from free movable memory. > > > > > > > > I love this idea but when I see the code, I don't like that. > > > > In allocation path, just try to allocate pages by round-robin so it's role > > > > of allocator. If one of migratetype is full, just pass mission to reclaimer > > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail > > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that > > > > reclaimer can filter it out during page scanning. > > > > We already have an tool to achieve it(ie, isolate_mode_t). > > > > > > Hello, > > > > > > I agree with leaving fast allocation path as simple as possible. > > > I will remove runtime computation for determining ratio in > > > __rmqueue_cma() and, instead, will use pre-computed value calculated > > > on the other path. > > > > Sounds good. > > > > > > > > I am not sure that whether your second suggestion(Hey relaimer part) > > > is good or not. In my quick thought, that could be helpful in the > > > situation that many free cma pages remained. But, it would be not helpful > > > when there are neither free movable and cma pages. 
In generally, most
> > > workloads mainly uses movable pages for page cache or anonymous mapping.
> > > Although reclaim is triggered by non-movable allocation failure, reclaimed
> > > pages are used mostly by movable allocation. We can handle these allocation
> > > request even if we reclaim the pages just in lru order. If we rotate
> > > the lru list for finding movable pages, it could cause more useful
> > > pages to be evicted.
> > >
> > > This is just my quick thought, so please let me correct if I am wrong.
> >
> > Why should reclaimer reclaim unnecessary pages?
> > So, your answer is that it would be better because upcoming newly allocated
> > pages would be allocated easily without interrupt. But it could reclaim
> > too much pages until watermark for unmovable allocation is okay.
> > Even, sometime, you might see OOM.
> >
> > Moreover, how could you handle current trobule?
> > For example, there is atomic allocation and the only thing to save the world
> > is kswapd because it's one of kswapd role but kswapd is spending many time to
> > reclaim CMA pages, which is pointless so the allocation would be easily failed.
>
> Hello,
>
> I guess that it isn't the problem. In lru, movable pages and cma pages
> would be interleaved. So it doesn't takes too long time to get the
> page for non-movable allocation.

Please don't assume an ideal LRU ordering.
A page newly allocated by the fairness allocation sits at the head of the LRU
while old pages are approaching the tail, so there is a huge time gap between them.
During that time, old pages could be dropped or promoted, so one side
could end up filled with a single type rather than the interleaving of both
page types that you expect.

Additionally, if you use a synchronous backing device like ramdisk/zram
or something similar, pageout is synchronous with the page I/O.
In that case, reclaim time is not as trivial as it is with async I/O.
For example, in the zram-swap case it needs a page copy plus compression, and
the speed depends on your CPU speed.
>
> IMHO, in generally, memory shortage is made by movable allocation, so
> to distinguish allocation type and to handle them differently has
> marginal effect.

Again, please don't consider only the workloads you know. Keep the design
open to various possibilities, as long as such consideration doesn't make
the code ugly.

>
> Anyway, I will think more deeply.

Yes, please.

>
> >
> > >
> > > >
> > > > And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
> > > > If possible, it would be better becauser it's generic function to check
> > > > free pages and cause trigger reclaim/compaction logic.
> > >
> > > I guess, your *it* means ratio computation. Right?
> >
> > I meant just get_page_from_freelist like fair zone allocation for consistency
> > but as we discussed offline, i'm not against with you if it's not right place.
>
> Okay :)
>
> Thanks.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750808AbaESEre (ORCPT ); Mon, 19 May 2014 00:47:34 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:42072 "EHLO lgemrelse7q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750727AbaESErd (ORCPT ); Mon, 19 May 2014 00:47:33 -0400 X-Original-SENDERIP: 10.177.220.145 X-Original-MAILFROM: iamjoonsoo.kim@lge.com Date: Mon, 19 May 2014 13:50:01 +0900 From: Joonsoo Kim To: Minchan Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140519045001.GA23916@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <20140515024353.GA27599@bbox> <20140519021121.GA19615@js1304-P5Q-DELUXE> <20140519025305.GA13248@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140519025305.GA13248@bbox> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 19, 2014 at 11:53:05AM +0900, Minchan Kim wrote: > On Mon, May 19, 2014 at 11:11:21AM +0900, Joonsoo Kim wrote: > > On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote: > > > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote: > > > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > > > > > Hey Joonsoo, > > > > > > > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > > > > > CMA is introduced to provide physically 
contiguous pages at runtime. > > > > > > For this purpose, it reserves memory at boot time. Although it reserve > > > > > > memory, this reserved memory can be used for movable memory allocation > > > > > > request. This usecase is beneficial to the system that needs this CMA > > > > > > reserved memory infrequently and it is one of main purpose of > > > > > > introducing CMA. > > > > > > > > > > > > But, there is a problem in current implementation. The problem is that > > > > > > it works like as just reserved memory approach. The pages on cma reserved > > > > > > memory are hardly used for movable memory allocation. This is caused by > > > > > > combination of allocation and reclaim policy. > > > > > > > > > > > > The pages on cma reserved memory are allocated if there is no movable > > > > > > memory, that is, as fallback allocation. So the time this fallback > > > > > > allocation is started is under heavy memory pressure. Although it is under > > > > > > memory pressure, movable allocation easily succeed, since there would be > > > > > > many pages on cma reserved memory. But this is not the case for unmovable > > > > > > and reclaimable allocation, because they can't use the pages on cma > > > > > > reserved memory. These allocations regard system's free memory as > > > > > > (free pages - free cma pages) on watermark checking, that is, free > > > > > > unmovable pages + free reclaimable pages + free movable pages. Because > > > > > > we already exhausted movable pages, only free pages we have are unmovable > > > > > > and reclaimable types and this would be really small amount. So watermark > > > > > > checking would be failed. It will wake up kswapd to make enough free > > > > > > memory for unmovable and reclaimable allocation and kswapd will do. > > > > > > So before we fully utilize pages on cma reserved memory, kswapd start to > > > > > > reclaim memory and try to make free memory over the high watermark. 
This > > > > > > watermark checking by kswapd doesn't take care free cma pages so many > > > > > > movable pages would be reclaimed. After then, we have a lot of movable > > > > > > pages again, so fallback allocation doesn't happen again. To conclude, > > > > > > amount of free memory on meminfo which includes free CMA pages is moving > > > > > > around 512 MB if I reserve 512 MB memory for CMA. > > > > > > > > > > > > I found this problem on following experiment. > > > > > > > > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > > > > > make -j24 > > > > > > > > > > > > CMA reserve: 0 MB 512 MB > > > > > > Elapsed-time: 234.8 361.8 > > > > > > Average-MemFree: 283880 KB 530851 KB > > > > > > > > > > > > To solve this problem, I can think following 2 possible solutions. > > > > > > 1. allocate the pages on cma reserved memory first, and if they are > > > > > > exhausted, allocate movable pages. > > > > > > 2. interleaved allocation: try to allocate specific amounts of memory > > > > > > from cma reserved memory and then allocate from free movable memory. > > > > > > > > > > I love this idea but when I see the code, I don't like that. > > > > > In allocation path, just try to allocate pages by round-robin so it's role > > > > > of allocator. If one of migratetype is full, just pass mission to reclaimer > > > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail > > > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that > > > > > reclaimer can filter it out during page scanning. > > > > > We already have an tool to achieve it(ie, isolate_mode_t). > > > > > > > > Hello, > > > > > > > > I agree with leaving fast allocation path as simple as possible. > > > > I will remove runtime computation for determining ratio in > > > > __rmqueue_cma() and, instead, will use pre-computed value calculated > > > > on the other path. > > > > > > Sounds good. 
> > > > > > > > > > > I am not sure that whether your second suggestion(Hey relaimer part) > > > > is good or not. In my quick thought, that could be helpful in the > > > > situation that many free cma pages remained. But, it would be not helpful > > > > when there are neither free movable and cma pages. In generally, most > > > > workloads mainly uses movable pages for page cache or anonymous mapping. > > > > Although reclaim is triggered by non-movable allocation failure, reclaimed > > > > pages are used mostly by movable allocation. We can handle these allocation > > > > request even if we reclaim the pages just in lru order. If we rotate > > > > the lru list for finding movable pages, it could cause more useful > > > > pages to be evicted. > > > > > > > > This is just my quick thought, so please let me correct if I am wrong. > > > > > > Why should reclaimer reclaim unnecessary pages? > > > So, your answer is that it would be better because upcoming newly allocated > > > pages would be allocated easily without interrupt. But it could reclaim > > > too much pages until watermark for unmovable allocation is okay. > > > Even, sometime, you might see OOM. > > > > > > Moreover, how could you handle current trobule? > > > For example, there is atomic allocation and the only thing to save the world > > > is kswapd because it's one of kswapd role but kswapd is spending many time to > > > reclaim CMA pages, which is pointless so the allocation would be easily failed. > > > > Hello, > > > > I guess that it isn't the problem. In lru, movable pages and cma pages > > would be interleaved. So it doesn't takes too long time to get the > > page for non-movable allocation. > > Please, don't assume there are ideal LRU ordering. > Newly allocated page by fairness allocation is located by head of LRU > while old pages are approaching the tail so there is huge time gab. 
> During the time, old pages could be dropped/promoting so one of side
> could be filled with one type rather than interleaving both types pages
> you expected.

I assumed the general case, not an ideal case.
Your example is possible, but it would be a corner case.

>
> Additionally, if you uses syncable backed device like ramdisk/zram
> or something, pageout can be synchronized with page I/O.
> In this case, reclaim time wouldn't be trivial than async I/O.
> For exmaple, zram-swap case, it needs page copy + comperssion and
> the speed depends on your CPU speed.

This is a general problem that zram-swap has,
although reclaiming cma pages makes the situation worse.

Thanks.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751897AbaESFxA (ORCPT ); Mon, 19 May 2014 01:53:00 -0400
Received: from lgeamrelo02.lge.com ([156.147.1.126]:43654 "EHLO lgeamrelo02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750738AbaESFw7 (ORCPT ); Mon, 19 May 2014 01:52:59 -0400
X-Original-SENDERIP: 10.177.220.145
X-Original-MAILFROM: iamjoonsoo.kim@lge.com
Date: Mon, 19 May 2014 14:55:27 +0900
From: Joonsoo Kim
To: Gioh Kim
Cc: Michal Nazarewicz , Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?utf-8?B?7J206rG07Zi4?= , gurugio@gmail.com
Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
Message-ID: <20140519055527.GA24099@js1304-P5Q-DELUXE>
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <537962A0.4090600@lge.com>
User-Agent: Mutt/1.5.21
(2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 19, 2014 at 10:47:12AM +0900, Gioh Kim wrote:
> Thank you for your advice. I didn't notice it.
>
> I'm adding followings according to your advice:
>
> - range restrict for CMA_SIZE_MBYTES and *CMA_SIZE_PERCENTAGE*
> I think this can prevent the wrong kernel option.
>
> - change size_cmdline into default value SZ_16M
> I am not sure this can prevent if cma=0 cmdline option is also with base and limit options.

Hello,

I think that this problem originates from atomic_pool_init().
If the configured coherent_pool size is larger than the default cma size,
atomic_pool_init() can still fail even if this patch is applied.

How about the patch below?
It falls back to __alloc_remap_buffer() if the CMA allocation fails.

Thanks.

-----------------8<---------------------
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6b00be1..2909ab9 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -379,7 +379,7 @@ static int __init atomic_pool_init(void)
 	unsigned long *bitmap;
 	struct page *page;
 	struct page **pages;
-	void *ptr;
+	void *ptr = NULL;
 	int bitmap_size = BITS_TO_LONGS(nr_pages) * sizeof(long);
 
 	bitmap = kzalloc(bitmap_size, GFP_KERNEL);
@@ -393,7 +393,7 @@ static int __init atomic_pool_init(void)
 	if (IS_ENABLED(CONFIG_DMA_CMA))
 		ptr = __alloc_from_contiguous(NULL, pool->size, prot, &page,
 					      atomic_pool_init);
-	else
+	if (!ptr)
 		ptr = __alloc_remap_buffer(NULL, pool->size, gfp, prot, &page,
 					   atomic_pool_init);
 	if (ptr) {

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753947AbaESJOZ (ORCPT ); Mon, 19 May 2014 05:14:25 -0400
Received: from LGEMRELSE6Q.lge.com ([156.147.1.121]:58277 "EHLO lgemrelse6q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753846AbaESJOV (ORCPT ); Mon, 19 May 2014 05:14:21 -0400
X-Original-SENDERIP: 10.178.33.69
X-Original-MAILFROM:
gioh.kim@lge.com
Message-ID: <5379CB66.7090607@lge.com>
Date: Mon, 19 May 2014 18:14:14 +0900
From: Gioh Kim
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: Joonsoo Kim
CC: Michal Nazarewicz , Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?UTF-8?B?7J206rG07Zi4?= , gurugio@gmail.com
Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE>
In-Reply-To: <20140519055527.GA24099@js1304-P5Q-DELUXE>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

In the __dma_alloc function, your patch makes __alloc_from_pool work, but __alloc_from_contiguous still doesn't work.
So __dma_alloc sometimes works and sometimes doesn't, depending on the gfp (__GFP_WAIT) flag.
Do I understand correctly?

I think __dma_alloc should behave consistently.
Either both __alloc_from_contiguous and __alloc_from_pool should work, or neither of them should.

On 2014-05-19 at 2:55 PM, Joonsoo Kim wrote:
> On Mon, May 19, 2014 at 10:47:12AM +0900, Gioh Kim wrote:
>> Thank you for your advice. I didn't notice it.
>>
>> I'm adding followings according to your advice:
>>
>> - range restrict for CMA_SIZE_MBYTES and *CMA_SIZE_PERCENTAGE*
>> I think this can prevent the wrong kernel option.
>>
>> - change size_cmdline into default value SZ_16M
>> I am not sure this can prevent if cma=0 cmdline option is also with base and limit options.
> > Hello, > > I think that this problem is originated from atomic_pool_init(). > If configured coherent_pool size is larger than default cma size, > it can be failed even if this patch is applied. > > How about below patch? > It uses fallback allocation if CMA is failed. > > Thanks. > > -----------------8<--------------------- > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c > index 6b00be1..2909ab9 100644 > --- a/arch/arm/mm/dma-mapping.c > +++ b/arch/arm/mm/dma-mapping.c > @@ -379,7 +379,7 @@ static int __init atomic_pool_init(void) > unsigned long *bitmap; > struct page *page; > struct page **pages; > - void *ptr; > + void *ptr = NULL; > int bitmap_size = BITS_TO_LONGS(nr_pages) * sizeof(long); > > bitmap = kzalloc(bitmap_size, GFP_KERNEL); > @@ -393,7 +393,7 @@ static int __init atomic_pool_init(void) > if (IS_ENABLED(CONFIG_DMA_CMA)) > ptr = __alloc_from_contiguous(NULL, pool->size, prot, &page, > atomic_pool_init); > - else > + if (!ptr) > ptr = __alloc_remap_buffer(NULL, pool->size, gfp, prot, &page, > atomic_pool_init); > if (ptr) { > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752163AbaESXQQ (ORCPT ); Mon, 19 May 2014 19:16:16 -0400 Received: from lgeamrelo04.lge.com ([156.147.1.127]:52264 "EHLO lgeamrelo04.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751117AbaESXQO (ORCPT ); Mon, 19 May 2014 19:16:14 -0400 X-Original-SENDERIP: 10.177.220.169 X-Original-MAILFROM: minchan@kernel.org Date: Tue, 20 May 2014 08:18:59 +0900 From: Minchan Kim To: Joonsoo Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140519231859.GA21636@bbox> References: 
<1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <20140515024353.GA27599@bbox> <20140519021121.GA19615@js1304-P5Q-DELUXE> <20140519025305.GA13248@bbox> <20140519045001.GA23916@js1304-P5Q-DELUXE> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140519045001.GA23916@js1304-P5Q-DELUXE> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 19, 2014 at 01:50:01PM +0900, Joonsoo Kim wrote: > On Mon, May 19, 2014 at 11:53:05AM +0900, Minchan Kim wrote: > > On Mon, May 19, 2014 at 11:11:21AM +0900, Joonsoo Kim wrote: > > > On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote: > > > > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote: > > > > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > > > > > > Hey Joonsoo, > > > > > > > > > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > > > > > > CMA is introduced to provide physically contiguous pages at runtime. > > > > > > > For this purpose, it reserves memory at boot time. Although it reserve > > > > > > > memory, this reserved memory can be used for movable memory allocation > > > > > > > request. This usecase is beneficial to the system that needs this CMA > > > > > > > reserved memory infrequently and it is one of main purpose of > > > > > > > introducing CMA. > > > > > > > > > > > > > > But, there is a problem in current implementation. The problem is that > > > > > > > it works like as just reserved memory approach. The pages on cma reserved > > > > > > > memory are hardly used for movable memory allocation. This is caused by > > > > > > > combination of allocation and reclaim policy. 
> > > > > > > > > > > > > > The pages on cma reserved memory are allocated if there is no movable > > > > > > > memory, that is, as fallback allocation. So the time this fallback > > > > > > > allocation is started is under heavy memory pressure. Although it is under > > > > > > > memory pressure, movable allocation easily succeed, since there would be > > > > > > > many pages on cma reserved memory. But this is not the case for unmovable > > > > > > > and reclaimable allocation, because they can't use the pages on cma > > > > > > > reserved memory. These allocations regard system's free memory as > > > > > > > (free pages - free cma pages) on watermark checking, that is, free > > > > > > > unmovable pages + free reclaimable pages + free movable pages. Because > > > > > > > we already exhausted movable pages, only free pages we have are unmovable > > > > > > > and reclaimable types and this would be really small amount. So watermark > > > > > > > checking would be failed. It will wake up kswapd to make enough free > > > > > > > memory for unmovable and reclaimable allocation and kswapd will do. > > > > > > > So before we fully utilize pages on cma reserved memory, kswapd start to > > > > > > > reclaim memory and try to make free memory over the high watermark. This > > > > > > > watermark checking by kswapd doesn't take care free cma pages so many > > > > > > > movable pages would be reclaimed. After then, we have a lot of movable > > > > > > > pages again, so fallback allocation doesn't happen again. To conclude, > > > > > > > amount of free memory on meminfo which includes free CMA pages is moving > > > > > > > around 512 MB if I reserve 512 MB memory for CMA. > > > > > > > > > > > > > > I found this problem on following experiment. 
> > > > > > > > > > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > > > > > > make -j24 > > > > > > > > > > > > > > CMA reserve: 0 MB 512 MB > > > > > > > Elapsed-time: 234.8 361.8 > > > > > > > Average-MemFree: 283880 KB 530851 KB > > > > > > > > > > > > > > To solve this problem, I can think following 2 possible solutions. > > > > > > > 1. allocate the pages on cma reserved memory first, and if they are > > > > > > > exhausted, allocate movable pages. > > > > > > > 2. interleaved allocation: try to allocate specific amounts of memory > > > > > > > from cma reserved memory and then allocate from free movable memory. > > > > > > > > > > > > I love this idea but when I see the code, I don't like that. > > > > > > In allocation path, just try to allocate pages by round-robin so it's role > > > > > > of allocator. If one of migratetype is full, just pass mission to reclaimer > > > > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail > > > > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that > > > > > > reclaimer can filter it out during page scanning. > > > > > > We already have an tool to achieve it(ie, isolate_mode_t). > > > > > > > > > > Hello, > > > > > > > > > > I agree with leaving fast allocation path as simple as possible. > > > > > I will remove runtime computation for determining ratio in > > > > > __rmqueue_cma() and, instead, will use pre-computed value calculated > > > > > on the other path. > > > > > > > > Sounds good. > > > > > > > > > > > > > > I am not sure that whether your second suggestion(Hey relaimer part) > > > > > is good or not. In my quick thought, that could be helpful in the > > > > > situation that many free cma pages remained. But, it would be not helpful > > > > > when there are neither free movable and cma pages. In generally, most > > > > > workloads mainly uses movable pages for page cache or anonymous mapping. 
> > > > > Although reclaim is triggered by non-movable allocation failure, reclaimed > > > > > pages are used mostly by movable allocation. We can handle these allocation > > > > > request even if we reclaim the pages just in lru order. If we rotate > > > > > the lru list for finding movable pages, it could cause more useful > > > > > pages to be evicted. > > > > > > > > > > This is just my quick thought, so please let me correct if I am wrong. > > > > > > > > Why should reclaimer reclaim unnecessary pages? > > > > So, your answer is that it would be better because upcoming newly allocated > > > > pages would be allocated easily without interrupt. But it could reclaim > > > > too much pages until watermark for unmovable allocation is okay. > > > > Even, sometime, you might see OOM. > > > > > > > > Moreover, how could you handle current trobule? > > > > For example, there is atomic allocation and the only thing to save the world > > > > is kswapd because it's one of kswapd role but kswapd is spending many time to > > > > reclaim CMA pages, which is pointless so the allocation would be easily failed. > > > > > > Hello, > > > > > > I guess that it isn't the problem. In lru, movable pages and cma pages > > > would be interleaved. So it doesn't takes too long time to get the > > > page for non-movable allocation. > > > > Please, don't assume there are ideal LRU ordering. > > Newly allocated page by fairness allocation is located by head of LRU > > while old pages are approaching the tail so there is huge time gab. > > During the time, old pages could be dropped/promoting so one of side > > could be filled with one type rather than interleaving both types pages > > you expected. > > I assumed general case, not ideal case. > Your example can be possible, but would be corner case. I talked with Joonsoo yesterday and should post our conclusion for other reviewers/maintainers. It's not a corner case and it could happen depending on zone and CMA configuration. 
For example, suppose there is a 330M high zone in which CMA consumes 300M
while the normal movable area consumes just 30M. In that case, an unmovable
allocation could cause far too much unnecessary reclaim in the zone, so the
conclusion we reached is that we need targeted reclaim (e.g., isolate_mode_t).

But I'm not sure it should be part of this patchset. This patchset is clearly
an enhancement (before, it was hard to allocate a page from the CMA area, and
this patchset makes it work), but it could cause the problem above as a
side effect. So I think we could solve that issue (too much reclaim in an
unbalanced zone) in another patchset.

Joonsoo, please mention this problem in the description when you respin so
other MM guys can notice it and give ideas, which would help a lot.

> >
> > Additionally, if you uses syncable backed device like ramdisk/zram
> > or something, pageout can be synchronized with page I/O.
> > In this case, reclaim time wouldn't be trivial than async I/O.
> > For exmaple, zram-swap case, it needs page copy + comperssion and
> > the speed depends on your CPU speed.
>
> This is a general problem what zram-swap have,
> although reclaiming cma pages worse the situation.
>
> Thanks.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751837AbaESXTb (ORCPT ); Mon, 19 May 2014 19:19:31 -0400 Received: from lgeamrelo02.lge.com ([156.147.1.126]:59242 "EHLO lgeamrelo02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750758AbaESXTa (ORCPT ); Mon, 19 May 2014 19:19:30 -0400 X-Original-SENDERIP: 10.177.220.169 X-Original-MAILFROM: minchan@kernel.org Date: Tue, 20 May 2014 08:22:15 +0900 From: Minchan Kim To: Heesub Shin Cc: Joonsoo Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Mel Gorman , Johannes Weiner , Marek Szyprowski Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140519232215.GB21636@bbox> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <53742A4B.4090901@samsung.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <53742A4B.4090901@samsung.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 15, 2014 at 11:45:31AM +0900, Heesub Shin wrote: > Hello, > > On 05/15/2014 10:53 AM, Joonsoo Kim wrote: > >On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > >>Hey Joonsoo, > >> > >>On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > >>>CMA is introduced to provide physically contiguous pages at runtime. > >>>For this purpose, it reserves memory at boot time. Although it reserve > >>>memory, this reserved memory can be used for movable memory allocation > >>>request. 
This usecase is beneficial to the system that needs this CMA > >>>reserved memory infrequently and it is one of main purpose of > >>>introducing CMA. > >>> > >>>But, there is a problem in current implementation. The problem is that > >>>it works like as just reserved memory approach. The pages on cma reserved > >>>memory are hardly used for movable memory allocation. This is caused by > >>>combination of allocation and reclaim policy. > >>> > >>>The pages on cma reserved memory are allocated if there is no movable > >>>memory, that is, as fallback allocation. So the time this fallback > >>>allocation is started is under heavy memory pressure. Although it is under > >>>memory pressure, movable allocation easily succeed, since there would be > >>>many pages on cma reserved memory. But this is not the case for unmovable > >>>and reclaimable allocation, because they can't use the pages on cma > >>>reserved memory. These allocations regard system's free memory as > >>>(free pages - free cma pages) on watermark checking, that is, free > >>>unmovable pages + free reclaimable pages + free movable pages. Because > >>>we already exhausted movable pages, only free pages we have are unmovable > >>>and reclaimable types and this would be really small amount. So watermark > >>>checking would be failed. It will wake up kswapd to make enough free > >>>memory for unmovable and reclaimable allocation and kswapd will do. > >>>So before we fully utilize pages on cma reserved memory, kswapd start to > >>>reclaim memory and try to make free memory over the high watermark. This > >>>watermark checking by kswapd doesn't take care free cma pages so many > >>>movable pages would be reclaimed. After then, we have a lot of movable > >>>pages again, so fallback allocation doesn't happen again. To conclude, > >>>amount of free memory on meminfo which includes free CMA pages is moving > >>>around 512 MB if I reserve 512 MB memory for CMA. > >>> > >>>I found this problem on following experiment. 
> >>> > >>>4 CPUs, 1024 MB, VIRTUAL MACHINE > >>>make -j24 > >>> > >>>CMA reserve: 0 MB 512 MB > >>>Elapsed-time: 234.8 361.8 > >>>Average-MemFree: 283880 KB 530851 KB > >>> > >>>To solve this problem, I can think following 2 possible solutions. > >>>1. allocate the pages on cma reserved memory first, and if they are > >>> exhausted, allocate movable pages. > >>>2. interleaved allocation: try to allocate specific amounts of memory > >>> from cma reserved memory and then allocate from free movable memory. > >> > >>I love this idea but when I see the code, I don't like that. > >>In allocation path, just try to allocate pages by round-robin so it's role > >>of allocator. If one of migratetype is full, just pass mission to reclaimer > >>with hint(ie, Hey reclaimer, it's non-movable allocation fail > >>so there is pointless if you reclaim MIGRATE_CMA pages) so that > >>reclaimer can filter it out during page scanning. > >>We already have an tool to achieve it(ie, isolate_mode_t). > > > >Hello, > > > >I agree with leaving fast allocation path as simple as possible. > >I will remove runtime computation for determining ratio in > >__rmqueue_cma() and, instead, will use pre-computed value calculated > >on the other path. > > > >I am not sure that whether your second suggestion(Hey relaimer part) > >is good or not. In my quick thought, that could be helpful in the > >situation that many free cma pages remained. But, it would be not helpful > >when there are neither free movable and cma pages. In generally, most > >workloads mainly uses movable pages for page cache or anonymous mapping. > >Although reclaim is triggered by non-movable allocation failure, reclaimed > >pages are used mostly by movable allocation. We can handle these allocation > >request even if we reclaim the pages just in lru order. If we rotate > >the lru list for finding movable pages, it could cause more useful > >pages to be evicted. 
> >
> >This is just my quick thought, so please let me correct if I am wrong.
>
> We have an out of tree implementation that is completely the same
> with the approach Minchan said and it works, but it has definitely
> some side-effects as you pointed, distorting the LRU and evicting
> hot pages. I do not attach code fragments in this thread for some
> reasons, but it must be easy for yourself. I am wondering if it
> could help also in your case.
>
> Thanks,
> Heesub

Heesub, just to be sure: did you try round-robin allocation like Joonsoo's
approach, and did that LRU churning problem happen with it?

-- 
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752582AbaETAuU (ORCPT ); Mon, 19 May 2014 20:50:20 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:59718 "EHLO lgemrelse7q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752397AbaETAuS (ORCPT ); Mon, 19 May 2014 20:50:18 -0400 X-Original-SENDERIP: 10.178.33.69 X-Original-MAILFROM: gioh.kim@lge.com Message-ID: <537AA6C7.1040506@lge.com> Date: Tue, 20 May 2014 09:50:15 +0900 From: Gioh Kim User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Michal Nazarewicz , Joonsoo Kim CC: Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?UTF-8?B?7J206rG07Zi4?= , gurugio@gmail.com Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

2014-05-20 4:59 AM, Michal Nazarewicz wrote:
> On Sun, May 18 2014, Joonsoo Kim wrote:
>> I think that this problem is originated from atomic_pool_init().
>> If configured coherent_pool size is larger than default cma size,
>> it can be failed even if this patch is applied.

The coherent_pool size (atomic_pool.size) should be restricted to be smaller than the CMA size.

This is a separate issue, but I think the default atomic pool size is too small.
A single USB host port alone needs at most 256 Kbytes of coherent memory (according to the USB host spec).
If a platform has several ports, it needs more than 1MB.
Therefore the default atomic pool size should be at least 1MB.

>>
>> How about below patch?
>> It uses fallback allocation if CMA is failed.
>
> Yes, I thought about it, but __dma_alloc uses similar code:
>
> 	else if (!IS_ENABLED(CONFIG_DMA_CMA))
> 		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller);
> 	else
> 		addr = __alloc_from_contiguous(dev, size, prot, &page, caller);
>
> so it probably needs to be changed as well.

If the CMA option is not selected, __alloc_from_contiguous is never called,
so we don't need the fallback allocation.

And if the CMA option is selected and CMA is initialized correctly,
the CMA allocation can still fail when no CMA memory is left.
I think we don't need the fallback allocation in that case either,
because that is a normal case.

Therefore I think restricting the CMA size option, so that CMA always works, covers every case.

I think the patch below is also a good choice.
If both of you, Michal and Joonsoo, disagree with me, please let me know,
and I will make a patch that includes both the option restriction and the fallback allocation.
> >> -----------------8<--------------------- >> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c >> index 6b00be1..2909ab9 100644 >> --- a/arch/arm/mm/dma-mapping.c >> +++ b/arch/arm/mm/dma-mapping.c >> @@ -379,7 +379,7 @@ static int __init atomic_pool_init(void) >> unsigned long *bitmap; >> struct page *page; >> struct page **pages; >> - void *ptr; >> + void *ptr = NULL; >> int bitmap_size = BITS_TO_LONGS(nr_pages) * sizeof(long); >> >> bitmap = kzalloc(bitmap_size, GFP_KERNEL); >> @@ -393,7 +393,7 @@ static int __init atomic_pool_init(void) >> if (IS_ENABLED(CONFIG_DMA_CMA)) >> ptr = __alloc_from_contiguous(NULL, pool->size, prot, &page, >> atomic_pool_init); >> - else >> + if (!ptr) >> ptr = __alloc_remap_buffer(NULL, pool->size, gfp, prot, &page, >> atomic_pool_init); >> if (ptr) { >> > > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753203AbaETC1A (ORCPT ); Mon, 19 May 2014 22:27:00 -0400 Received: from LGEMRELSE6Q.lge.com ([156.147.1.121]:35545 "EHLO lgemrelse6q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751331AbaETC07 (ORCPT ); Mon, 19 May 2014 22:26:59 -0400 X-Original-SENDERIP: 10.178.33.69 X-Original-MAILFROM: gioh.kim@lge.com Message-ID: <537ABD6F.9090608@lge.com> Date: Tue, 20 May 2014 11:26:55 +0900 From: Gioh Kim User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Michal Nazarewicz , Joonsoo Kim CC: Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski , =?UTF-8?B?7J206rG07Zi4?= , gurugio@gmail.com Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> 
<20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE> <537AA6C7.1040506@lge.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

2014-05-20 10:28 AM, Michal Nazarewicz wrote:
> On Mon, May 19 2014, Gioh Kim wrote:
>> If CMA option is not selected, __alloc_from_contiguous would not be
>> called. We don't need to the fallback allocation.
>>
>> And if CMA option is selected and initialized correctly,
>> the cma allocation can fail in case of no-CMA-memory situation.
>> I thinks in that case we don't need to the fallback allocation also,
>> because it is normal case.
>>
>> Therefore I think the restriction of CMA size option and make CMA work
>> can cover every cases.
>
> Wait, you just wrote that if CMA is not initialised correctly, it's fine
> for atomic pool initialisation to fail, but if CMA size is initialised
> correctly but too small, this is somehow worse situation? I'm a bit
> confused to be honest.

I'm sorry for confusing you. Please forgive my poor English.
My point is that atomic_pool should be able to work both with and without CMA.

>
> IMO, cma=0 command line argument should be supported, as should having
> the default CMA size zero. If CMA size is set to zero, kernel should
> behave as if CMA was not enabled at compile time.

It would also be good if atomic_pool worked well with a zero CMA size.
I can drop my patch, but Joonsoo's patch should be applied.

Joonsoo, can you please send the full patch to the maintainers?
> > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751825AbaETGbM (ORCPT ); Tue, 20 May 2014 02:31:12 -0400 Received: from lgeamrelo04.lge.com ([156.147.1.127]:61762 "EHLO lgeamrelo04.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750703AbaETGbL (ORCPT ); Tue, 20 May 2014 02:31:11 -0400 X-Original-SENDERIP: 10.177.220.145 X-Original-MAILFROM: iamjoonsoo.kim@lge.com Date: Tue, 20 May 2014 15:33:42 +0900 From: Joonsoo Kim To: Minchan Kim Cc: Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Nazarewicz , Heesub Shin , Mel Gorman , Johannes Weiner , Marek Szyprowski Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Message-ID: <20140520063342.GA8315@js1304-P5Q-DELUXE> References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <20140515024353.GA27599@bbox> <20140519021121.GA19615@js1304-P5Q-DELUXE> <20140519025305.GA13248@bbox> <20140519045001.GA23916@js1304-P5Q-DELUXE> <20140519231859.GA21636@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140519231859.GA21636@bbox> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, May 20, 2014 at 08:18:59AM +0900, Minchan Kim wrote: > On Mon, May 19, 2014 at 01:50:01PM +0900, Joonsoo Kim wrote: > > On Mon, May 19, 2014 at 11:53:05AM +0900, Minchan Kim wrote: > > > On Mon, May 19, 2014 at 11:11:21AM +0900, Joonsoo Kim wrote: > > > > On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote: > > > > > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote: > > > > > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote: > > 
> > > > > Hey Joonsoo, > > > > > > > > > > > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote: > > > > > > > > CMA is introduced to provide physically contiguous pages at runtime. > > > > > > > > For this purpose, it reserves memory at boot time. Although it reserve > > > > > > > > memory, this reserved memory can be used for movable memory allocation > > > > > > > > request. This usecase is beneficial to the system that needs this CMA > > > > > > > > reserved memory infrequently and it is one of main purpose of > > > > > > > > introducing CMA. > > > > > > > > > > > > > > > > But, there is a problem in current implementation. The problem is that > > > > > > > > it works like as just reserved memory approach. The pages on cma reserved > > > > > > > > memory are hardly used for movable memory allocation. This is caused by > > > > > > > > combination of allocation and reclaim policy. > > > > > > > > > > > > > > > > The pages on cma reserved memory are allocated if there is no movable > > > > > > > > memory, that is, as fallback allocation. So the time this fallback > > > > > > > > allocation is started is under heavy memory pressure. Although it is under > > > > > > > > memory pressure, movable allocation easily succeed, since there would be > > > > > > > > many pages on cma reserved memory. But this is not the case for unmovable > > > > > > > > and reclaimable allocation, because they can't use the pages on cma > > > > > > > > reserved memory. These allocations regard system's free memory as > > > > > > > > (free pages - free cma pages) on watermark checking, that is, free > > > > > > > > unmovable pages + free reclaimable pages + free movable pages. Because > > > > > > > > we already exhausted movable pages, only free pages we have are unmovable > > > > > > > > and reclaimable types and this would be really small amount. So watermark > > > > > > > > checking would be failed. 
It will wake up kswapd to make enough free > > > > > > > > memory for unmovable and reclaimable allocation and kswapd will do. > > > > > > > > So before we fully utilize pages on cma reserved memory, kswapd start to > > > > > > > > reclaim memory and try to make free memory over the high watermark. This > > > > > > > > watermark checking by kswapd doesn't take care free cma pages so many > > > > > > > > movable pages would be reclaimed. After then, we have a lot of movable > > > > > > > > pages again, so fallback allocation doesn't happen again. To conclude, > > > > > > > > amount of free memory on meminfo which includes free CMA pages is moving > > > > > > > > around 512 MB if I reserve 512 MB memory for CMA. > > > > > > > > > > > > > > > > I found this problem on following experiment. > > > > > > > > > > > > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE > > > > > > > > make -j24 > > > > > > > > > > > > > > > > CMA reserve: 0 MB 512 MB > > > > > > > > Elapsed-time: 234.8 361.8 > > > > > > > > Average-MemFree: 283880 KB 530851 KB > > > > > > > > > > > > > > > > To solve this problem, I can think following 2 possible solutions. > > > > > > > > 1. allocate the pages on cma reserved memory first, and if they are > > > > > > > > exhausted, allocate movable pages. > > > > > > > > 2. interleaved allocation: try to allocate specific amounts of memory > > > > > > > > from cma reserved memory and then allocate from free movable memory. > > > > > > > > > > > > > > I love this idea but when I see the code, I don't like that. > > > > > > > In allocation path, just try to allocate pages by round-robin so it's role > > > > > > > of allocator. If one of migratetype is full, just pass mission to reclaimer > > > > > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail > > > > > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that > > > > > > > reclaimer can filter it out during page scanning. 
> > > > > > > We already have an tool to achieve it(ie, isolate_mode_t). > > > > > > > > > > > > Hello, > > > > > > > > > > > > I agree with leaving fast allocation path as simple as possible. > > > > > > I will remove runtime computation for determining ratio in > > > > > > __rmqueue_cma() and, instead, will use pre-computed value calculated > > > > > > on the other path. > > > > > > > > > > Sounds good. > > > > > > > > > > > > > > > > > I am not sure that whether your second suggestion(Hey relaimer part) > > > > > > is good or not. In my quick thought, that could be helpful in the > > > > > > situation that many free cma pages remained. But, it would be not helpful > > > > > > when there are neither free movable and cma pages. In generally, most > > > > > > workloads mainly uses movable pages for page cache or anonymous mapping. > > > > > > Although reclaim is triggered by non-movable allocation failure, reclaimed > > > > > > pages are used mostly by movable allocation. We can handle these allocation > > > > > > request even if we reclaim the pages just in lru order. If we rotate > > > > > > the lru list for finding movable pages, it could cause more useful > > > > > > pages to be evicted. > > > > > > > > > > > > This is just my quick thought, so please let me correct if I am wrong. > > > > > > > > > > Why should reclaimer reclaim unnecessary pages? > > > > > So, your answer is that it would be better because upcoming newly allocated > > > > > pages would be allocated easily without interrupt. But it could reclaim > > > > > too much pages until watermark for unmovable allocation is okay. > > > > > Even, sometime, you might see OOM. > > > > > > > > > > Moreover, how could you handle current trobule? > > > > > For example, there is atomic allocation and the only thing to save the world > > > > > is kswapd because it's one of kswapd role but kswapd is spending many time to > > > > > reclaim CMA pages, which is pointless so the allocation would be easily failed. 
> > > > > > > > Hello, > > > > > > > > I guess that it isn't the problem. In lru, movable pages and cma pages > > > > would be interleaved. So it doesn't takes too long time to get the > > > > page for non-movable allocation. > > > > > > Please, don't assume there are ideal LRU ordering. > > > Newly allocated page by fairness allocation is located by head of LRU > > > while old pages are approaching the tail so there is huge time gab. > > > During the time, old pages could be dropped/promoting so one of side > > > could be filled with one type rather than interleaving both types pages > > > you expected. > > > > I assumed general case, not ideal case. > > Your example can be possible, but would be corner case. > > I talked with Joonsoo yesterday and should post our conclusion > for other reviewers/maintainers. > > It's not a corner case and it could happen depending on zone and CMA > configuration. For example, there is 330M high zone and CMA consumes > 300M in the space while normal movable area consumes just 30M. > In the case, unmovable allocation could make too many unnecessary > reclaiming of the zone so the conclusion we reached is to need target > reclaiming(ex, isolate_mode_t). > > But not sure it should be part of this patchset because this patchset > is surely enhance(ie, before, it was hard to allocate page from CMA area > but this patchset makes it works) but this patchset could make mentioned > problem as side-effect so I think we could solve the issue(ie, too many > reclaiming in unbalanced zone) in another patchset. > > Joonsoo, please mention this problem in the description when you respin > so other MM guys can notice that and give ideas, which would be helpful > a lot. Okay. Will do :) Thanks. 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753035AbaETLiS (ORCPT ); Tue, 20 May 2014 07:38:18 -0400 Received: from mailout2.w1.samsung.com ([210.118.77.12]:17351 "EHLO mailout2.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752017AbaETLiQ (ORCPT ); Tue, 20 May 2014 07:38:16 -0400 MIME-version: 1.0 Content-type: text/plain; charset=UTF-8; format=flowed X-AuditID: cbfec7f4-b7fac6d000006cfe-d1-537b3ea6544b Content-transfer-encoding: 8BIT Message-id: <537B3EA5.2040302@samsung.com> Date: Tue, 20 May 2014 13:38:13 +0200 From: Marek Szyprowski User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 To: Gioh Kim , Michal Nazarewicz , Joonsoo Kim Cc: Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , =?UTF-8?B?7J206rG07Zi4?= , gurugio@gmail.com Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox> <20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE> <537AA6C7.1040506@lge.com> In-reply-to: <537AA6C7.1040506@lge.com> X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFvrNLMWRmVeSWpSXmKPExsVy+t/xy7rL7KqDDe7cELGYs34Nm8Wzpi+M FhemLWS2mHz4I5PF6k2+FgdnL2GyWNndzGaxvXMGu8XlXXPYLO6t+c9qMfndM0aLBcdbWC3u 73/AZvH3ynoWBz6Pw2/eM3tc7utl8tg56y67x6ZPk9g9ut5eYfI4MeM3i8e6P6+YPN7vu8rm 0bdlFaPH5tPVHp83yQVwR3HZpKTmZJalFunbJXBlLPlsXvBHqKLj4G/mBsYvfF2MnBwSAiYS s/t+MEPYYhIX7q1n62Lk4hASWMoo8XTOURaQBK+AoMSPyfeAbA4OZgF5iSOXskHCzAJmEo9a 1jFD1H9ilLj8rosNpIZXQEti/7lMkBoWAVWJjX0TWUFsNgFDia63ICWcHKICMRK7Py9kBLFF BPIktszYADaHWeAfk8StuSvAGoQFoiSebW9ihFjQxSyxbNJ5sASngLrEjPMbmScwCsxCct8s 
hPtmIblvASPzKkbR1NLkguKk9FxDveLE3OLSvHS95PzcTYyQCPuyg3HxMatDjAIcjEo8vA62 VcFCrIllxZW5hxglOJiVRHi1bauDhXhTEiurUovy44tKc1KLDzEycXBKNTD2GkYor+ybtELm VJ/Thp5nBqp6D8NZVMuTri2a3THPduexC1sWu9xtCIvKaXZ/pv/v8WPpR9LTWOuyvv89fncR n9Hh7x/9ws2+ytpUXd3hbqDwPUnk+lmnZ8U/d+iIf29+cWzyTed3XgufqKcd681pVVwYOUuS 076tbftMjq19U1/WLbptLRujxFKckWioxVxUnAgAdkhZio4CAAA= Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, On 2014-05-20 02:50, Gioh Kim wrote: > > > 2014-05-20 오전 4:59, Michal Nazarewicz 쓴 글: >> On Sun, May 18 2014, Joonsoo Kim wrote: >>> I think that this problem is originated from atomic_pool_init(). >>> If configured coherent_pool size is larger than default cma size, >>> it can be failed even if this patch is applied. > > The coherent_pool size (atomic_pool.size) should be restricted smaller > than cma size. > > This is another issue, however I think the default atomic pool size is > too small. > Only one port of USB host needs at most 256Kbytes coherent memory > (according to the USB host spec). This pool is used only for allocation done in atomic context (allocations done with GFP_ATOMIC flag), otherwise the standard allocation path is used. Are you sure that each usb host port really needs so much memory allocated in atomic context? > If a platform has several ports, it needs more than 1MB. > Therefore the default atomic pool size should be at least 1MB. > >>> >>> How about below patch? >>> It uses fallback allocation if CMA is failed. >> >> Yes, I thought about it, but __dma_alloc uses similar code: >> >> else if (!IS_ENABLED(CONFIG_DMA_CMA)) >> addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, >> caller); >> else >> addr = __alloc_from_contiguous(dev, size, prot, &page, caller); >> >> so it probably needs to be changed as well. > > If CMA option is not selected, __alloc_from_contiguous would not be > called. > We don't need to the fallback allocation. 
>
> And if CMA option is selected and initialized correctly,
> the cma allocation can fail in case of no-CMA-memory situation.
> I thinks in that case we don't need to the fallback allocation also,
> because it is normal case.
>
> Therefore I think the restriction of CMA size option and make CMA work
> can cover every cases.
>
> I think below patch is also good choice.
> If both of you, Michal and Joonsoo, do not agree with me, please
> inform me.
> I will make a patch including option restriction and fallback allocation.

I'm not sure if we need a fallback for failed CMA allocation. The only issue that
has been mentioned here and needs to be resolved is support for disabling cma by
kernel command line. Right now it fails completely.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751163AbaEUAPP (ORCPT ); Tue, 20 May 2014 20:15:15 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:47418 "EHLO lgemrelse7q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751016AbaEUAPO (ORCPT ); Tue, 20 May 2014 20:15:14 -0400 X-Original-SENDERIP: 10.178.33.69 X-Original-MAILFROM: gioh.kim@lge.com Message-ID: <537BF00E.3030409@lge.com> Date: Wed, 21 May 2014 09:15:10 +0900 From: Gioh Kim User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Marek Szyprowski , Michal Nazarewicz , Joonsoo Kim CC: Minchan Kim , Andrew Morton , Rik van Riel , Laura Abbott , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Heesub Shin , Mel Gorman , Johannes Weiner , =?UTF-8?B?7J206rG07Zi4?= , gurugio@gmail.com Subject: Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <20140513030057.GC32092@bbox>
<20140515015301.GA10116@js1304-P5Q-DELUXE> <5375C619.8010501@lge.com> <537962A0.4090600@lge.com> <20140519055527.GA24099@js1304-P5Q-DELUXE> <537AA6C7.1040506@lge.com> <537B3EA5.2040302@samsung.com> In-Reply-To: <537B3EA5.2040302@samsung.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

2014-05-20 8:38 PM, Marek Szyprowski wrote:
> Hello,
>
> On 2014-05-20 02:50, Gioh Kim wrote:
>>
>>
>> 2014-05-20 4:59 AM, Michal Nazarewicz wrote:
>>> On Sun, May 18 2014, Joonsoo Kim wrote:
>>>> I think that this problem is originated from atomic_pool_init().
>>>> If configured coherent_pool size is larger than default cma size,
>>>> it can be failed even if this patch is applied.
>>
>> The coherent_pool size (atomic_pool.size) should be restricted smaller than cma size.
>>
>> This is another issue, however I think the default atomic pool size is too small.
>> Only one port of USB host needs at most 256Kbytes coherent memory (according to the USB host spec).
>
> This pool is used only for allocation done in atomic context (allocations
> done with GFP_ATOMIC flag), otherwise the standard allocation path is used.
> Are you sure that each usb host port really needs so much memory allocated
> in atomic context?

I don't know why, but drivers/usb/host/ohci-hcd.c:ohci_init() calls dma_alloc_coherent with a zero gfp.
Therefore a panic occurs if CMA is turned on and CONFIG_CMA_SIZE_MBYTES is zero.
The pool->vaddr pointer is NULL in __alloc_from_pool.

Below is my kernel message.

[ 24.339858] -----------[ cut here ]-----------
[ 24.344535] WARNING: at arch/arm/mm/dma-mapping.c:492 __dma_alloc.isra.19+0x25c/0x2a4()
[ 24.352554] coherent pool not initialised!
[ 24.356644] Modules linked in:
[ 24.359701] CPU: 1 PID: 711 Comm: sh Not tainted 3.10.19+ #42
[ 24.365488] [<800140e0>] (unwind_backtrace+0x0/0xf8) from [<80011f20>] (show_stack+0x10/0x14)
[ 24.374045] [<80011f20>] (show_stack+0x10/0x14) from [<8001f21c>] (warn_slowpath_common+0x4c/0x6c)
[ 24.383022] [<8001f21c>] (warn_slowpath_common+0x4c/0x6c) from [<8001f2d0>] (warn_slowpath_fmt+0x30/0x40)
[ 24.392602] [<8001f2d0>] (warn_slowpath_fmt+0x30/0x40) from [<80017f5c>] (__dma_alloc.isra.19+0x25c/0x2a4)
[ 24.402270] [<80017f5c>] (__dma_alloc.isra.19+0x25c/0x2a4) from [<800180d0>] (arm_dma_alloc+0x90/0x98)
[ 24.411580] [<800180d0>] (arm_dma_alloc+0x90/0x98) from [<8034ab54>] (ohci_init+0x1b0/0x278)
[ 24.420035] [<8034ab54>] (ohci_init+0x1b0/0x278) from [<80332e00>] (usb_add_hcd+0x184/0x5b8)
[ 24.428484] [<80332e00>] (usb_add_hcd+0x184/0x5b8) from [<8034b8d4>] (ohci_platform_probe+0xd0/0x174)
[ 24.437713] [<8034b8d4>] (ohci_platform_probe+0xd0/0x174) from [<802f1cac>] (platform_drv_probe+0x14/0x18)
[ 24.447385] [<802f1cac>] (platform_drv_probe+0x14/0x18) from [<802f0a54>] (driver_probe_device+0x6c/0x1f8)
[ 24.457049] [<802f0a54>] (driver_probe_device+0x6c/0x1f8) from [<802ef16c>] (bus_for_each_drv+0x44/0x8c)
[ 24.466537] [<802ef16c>] (bus_for_each_drv+0x44/0x8c) from [<802f09bc>] (device_attach+0x74/0x80)
[ 24.475416] [<802f09bc>] (device_attach+0x74/0x80) from [<802f0050>] (bus_probe_device+0x84/0xa8)
[ 24.484295] [<802f0050>] (bus_probe_device+0x84/0xa8) from [<802ee89c>] (device_add+0x4c0/0x58c)
[ 24.493088] [<802ee89c>] (device_add+0x4c0/0x58c) from [<802f21b8>] (platform_device_add+0xac/0x214)
[ 24.502227] [<802f21b8>] (platform_device_add+0xac/0x214) from [<8001bf3c>] (lg115x_init_usb+0xbc/0xe4)
[ 24.511618] [<8001bf3c>] (lg115x_init_usb+0xbc/0xe4) from [<80008734>] (do_user_initcalls+0x98/0x128)
[ 24.520843] [<80008734>] (do_user_initcalls+0x98/0x128) from [<80008870>] (proc_write_usercalls+0xac/0xd0)
[ 24.530512] [<80008870>] (proc_write_usercalls+0xac/0xd0) from [<80138f48>] (proc_reg_write+0x58/0x80)
[ 24.539830] [<80138f48>] (proc_reg_write+0x58/0x80) from [<800f0084>] (vfs_write+0xb0/0x1bc)
[ 24.548275] [<800f0084>] (vfs_write+0xb0/0x1bc) from [<800f04d0>] (SyS_write+0x3c/0x70)
[ 24.556287] [<800f04d0>] (SyS_write+0x3c/0x70) from [<8000e5c0>] (ret_fast_syscall+0x0/0x30)
[ 24.564726] --[ end trace c092568e2a263d21 ]--
[ 24.569345] ohci-platform ohci-platform.0: can't setup
[ 24.574498] ohci-platform ohci-platform.0: USB bus 1 deregistered
[ 24.582241] ohci-platform: probe of ohci-platform.0 failed with error -12
[ 24.590496] ohci-platform ohci-platform.1: Generic Platform OHCI Controller
[ 24.598984] ohci-platform ohci-platform.1: new USB bus registered, assigned bus number 1

>
>> If a platform has several ports, it needs more than 1MB.
>> Therefore the default atomic pool size should be at least 1MB.
>>
>>>>
>>>> How about below patch?
>>>> It uses fallback allocation if CMA is failed.
>>>
>>> Yes, I thought about it, but __dma_alloc uses similar code:
>>>
>>> 	else if (!IS_ENABLED(CONFIG_DMA_CMA))
>>> 		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller);
>>> 	else
>>> 		addr = __alloc_from_contiguous(dev, size, prot, &page, caller);
>>>
>>> so it probably needs to be changed as well.
>>
>> If CMA option is not selected, __alloc_from_contiguous would not be called.
>> We don't need to the fallback allocation.
>>
>> And if CMA option is selected and initialized correctly,
>> the cma allocation can fail in case of no-CMA-memory situation.
>> I thinks in that case we don't need to the fallback allocation also,
>> because it is normal case.
>>
>> Therefore I think the restriction of CMA size option and make CMA work can cover every cases.
>>
>> I think below patch is also good choice.
>> If both of you, Michal and Joonsoo, do not agree with me, please inform me.
>> I will make a patch including option restriction and fallback allocation.
>
> I'm not sure if we need a fallback for failed CMA allocation.
> The only issue that have been mentioned here and needs to be resolved is support for disabling cma by
> kernel command line. Right now it will fails completely.

Both cma=0 on the kernel command line and CONFIG_CMA_SIZE_MBYTES=0 set
selected_size to zero in dma_contiguous_reserve. As a result,
dma_contiguous_reserve_area is never called and atomic_pool is not
initialized.

After that, dma_alloc_coherent tries to allocate via atomic_pool
(__alloc_from_pool) or CMA (__alloc_from_contiguous). Allocation via
atomic_pool fails because atomic_pool->vaddr is NULL. And the CMA
allocation shouldn't be attempted, because cma=0 or
CONFIG_CMA_SIZE_MBYTES=0 is the same as disabling CMA.

If cma=0 or CONFIG_CMA_SIZE_MBYTES is 0, __alloc_remap_buffer should be
called instead of __alloc_from_contiguous even if CMA is turned on.

I'm poor at English, so I describe the problem in pseudocode:

if (CMA is turned on) and ((cma=0 in command line) or (CONFIG_CMA_SIZE_MBYTES=0))
	try to allocate from CMA but CMA is not initialized

>
> Best regards

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751435AbaEXA6B (ORCPT ); Fri, 23 May 2014 20:58:01 -0400
Received: from smtp.codeaurora.org ([198.145.11.231]:54156 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751017AbaEXA57 (ORCPT ); Fri, 23 May 2014 20:57:59 -0400
Message-ID: <537FEE96.8000704@codeaurora.org>
Date: Fri, 23 May 2014 17:57:58 -0700
From: Laura Abbott
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: Joonsoo Kim, Andrew Morton
CC: Rik van Riel, Johannes Weiner, Mel Gorman, Minchan Kim, Heesub Shin, Marek Szyprowski, Michal Nazarewicz, linux-mm@kvack.org, linux-kernel@vger.kernel.org, lmark@codeaurora.org
Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <5370FF1D.10707@codeaurora.org>
In-Reply-To: <5370FF1D.10707@codeaurora.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/12/2014 10:04 AM, Laura Abbott wrote:
>
> I'm going to see about running this through tests internally for comparison.
> Hopefully I'll get useful results in a day or so.
>
> Thanks,
> Laura
>

We ran some tests internally and found that for our purposes these patches made
the benchmarks worse vs. the existing implementation of using CMA first for some
pages. These are mostly androidisms but androidisms that we care about for
having a device be useful.

The foreground memory headroom on the device was on average about 40 MB smaller
when using these patches vs our existing implementation of something like
solution #1. By foreground memory headroom we simply mean the amount of memory
that the foreground application can allocate before it is killed by the Android
Low Memory killer.

We also found that when running a sequence of app launches these patches had
more high priority app kills by the LMK and more alloc stalls. The test did a
total of 500 hundred app launches (using 9 separate applications) The CMA
memory in our system is rarely used by its client and is therefore available
to the system most of the time.

Test device
- 4 CPUs
- Android 4.4.2
- 512MB of RAM
- 68 MB of CMA


Results:

Existing solution:
Foreground headroom: 200MB
Number of higher priority LMK kills (oom_score_adj < 529): 332
Number of alloc stalls: 607


Test patches:
Foreground headroom: 160MB
Number of higher priority LMK kills (oom_score_adj < 529): 459
Number of alloc stalls: 29538

We believe that the issues seen with these patches are the result of the LMK
being more aggressive.
The LMK will be more aggressive because it will ignore
free CMA pages for unmovable allocations, and since most calls to the LMK are
made by kswapd (which uses GFP_KERNEL) the LMK will mostly ignore free CMA
pages. Because the LMK thresholds are higher than the zone watermarks, there
will often be a lot of free CMA pages in the system when the LMK is called,
which the LMK will usually ignore.

Thanks,
Laura

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751501AbaEZCl0 (ORCPT ); Sun, 25 May 2014 22:41:26 -0400
Received: from lgeamrelo02.lge.com ([156.147.1.126]:44966 "EHLO lgeamrelo02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751458AbaEZClZ (ORCPT ); Sun, 25 May 2014 22:41:25 -0400
X-Original-SENDERIP: 10.177.220.145
X-Original-MAILFROM: iamjoonsoo.kim@lge.com
Date: Mon, 26 May 2014 11:44:17 +0900
From: Joonsoo Kim
To: Laura Abbott
Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman, Minchan Kim, Heesub Shin, Marek Szyprowski, Michal Nazarewicz, linux-mm@kvack.org, linux-kernel@vger.kernel.org, lmark@codeaurora.org
Subject: Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
Message-ID: <20140526024417.GA26935@js1304-P5Q-DELUXE>
References: <1399509144-8898-1-git-send-email-iamjoonsoo.kim@lge.com> <1399509144-8898-3-git-send-email-iamjoonsoo.kim@lge.com> <5370FF1D.10707@codeaurora.org> <537FEE96.8000704@codeaurora.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <537FEE96.8000704@codeaurora.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 23, 2014 at 05:57:58PM -0700, Laura Abbott wrote:
> On 5/12/2014 10:04 AM, Laura Abbott wrote:
> >
> > I'm
> > going to see about running this through tests internally for comparison.
> > Hopefully I'll get useful results in a day or so.
> >
> > Thanks,
> > Laura
> >
>
> We ran some tests internally and found that for our purposes these patches made
> the benchmarks worse vs. the existing implementation of using CMA first for some
> pages. These are mostly androidisms but androidisms that we care about for
> having a device be useful.
>
> The foreground memory headroom on the device was on average about 40 MB smaller
> when using these patches vs our existing implementation of something like
> solution #1. By foreground memory headroom we simply mean the amount of memory
> that the foreground application can allocate before it is killed by the Android
> Low Memory killer.
>
> We also found that when running a sequence of app launches these patches had
> more high priority app kills by the LMK and more alloc stalls. The test did a
> total of 500 hundred app launches (using 9 separate applications) The CMA
> memory in our system is rarely used by its client and is therefore available
> to the system most of the time.
>
> Test device
> - 4 CPUs
> - Android 4.4.2
> - 512MB of RAM
> - 68 MB of CMA
>
>
> Results:
>
> Existing solution:
> Foreground headroom: 200MB
> Number of higher priority LMK kills (oom_score_adj < 529): 332
> Number of alloc stalls: 607
>
>
> Test patches:
> Foreground headroom: 160MB
> Number of higher priority LMK kills (oom_score_adj < 529): 459
> Number of alloc stalls: 29538
>
> We believe that the issues seen with these patches are the result of the LMK
> being more aggressive. The LMK will be more aggressive because it will ignore
> free CMA pages for unmovable allocations, and since most calls to the LMK are
> made by kswapd (which uses GFP_KERNEL) the LMK will mostly ignore free CMA
> pages.
> Because the LMK thresholds are higher than the zone watermarks, there
> will often be a lot of free CMA pages in the system when the LMK is called,
> which the LMK will usually ignore.

Hello,

Thank you very much for testing!

If possible, please let me know nr_free_cma for both these patches and your
in-house implementation before testing.

I can guess the following scenario for your test.

On boot-up, CMA memory is mostly used by native processes, because your
implementation uses CMA first for some pages. kswapd is woken up late,
since non-CMA free memory is larger than with my implementation. And, on
reclaim, the LMK, which reclaims memory by killing app processes, would
reclaim movable memory with high probability, since CMA memory is mostly
used by native processes and app processes have just movable memory.

This is just my guess. But, if it is true, this is not a fair test for
this patchset. If possible, could you make nr_free_cma the same in both
implementations before testing?

Moreover, in the mainline implementation, the LMK doesn't consider whether
the memory type is CMA or not. Maybe your overall system is highly
optimized for your implementation, so I'm not sure whether your testing is
appropriate for this patchset.

Anyway, I would like to optimize this for android. :)
Please let me know more about your system.

Thanks.
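
For readers following the accounting argument in this thread, here is a
minimal sketch in C of why kswapd and the LMK can see "low memory" while
many CMA pages are still free. It is not the kernel code: struct zone_model,
mb_to_pages(), countable_free_pages() and above_high_wmark() are
hypothetical names, and the 4 KiB page size is an assumption. It only models
the rule discussed above, that a request which cannot be placed in the CMA
area does not count free CMA pages as free.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages, so 256 pages per MB. */
#define PAGES_PER_MB 256L

/* Hypothetical, simplified model of a zone's free-page counters. */
struct zone_model {
	long free_pages;     /* all free pages, CMA pages included */
	long free_cma_pages; /* free pages inside the CMA reserved area */
	long high_wmark;     /* high watermark, in pages */
};

static long mb_to_pages(long mb)
{
	return mb * PAGES_PER_MB;
}

/*
 * Free pages that a request may count toward the watermark.
 * A request that cannot use the CMA area (for example an unmovable
 * one) gets no credit for free CMA pages.
 */
static long countable_free_pages(const struct zone_model *z, bool can_use_cma)
{
	long free = z->free_pages;

	/* Free CMA pages are a subset of all free pages. */
	assert(z->free_cma_pages <= z->free_pages);

	if (!can_use_cma)
		free -= z->free_cma_pages;
	return free;
}

/* True if the request sees more free memory than the high watermark. */
static bool above_high_wmark(const struct zone_model *z, bool can_use_cma)
{
	return countable_free_pages(z, can_use_cma) > z->high_wmark;
}
```

As an illustration with made-up numbers: if 520 MB is free, 512 MB of that
is free CMA memory, and the high watermark is 16 MB, a movable request
counts 520 MB against the watermark while an unmovable request counts only
8 MB, so the latter is already below the watermark.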