From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx147.postini.com [74.125.245.147]) by kanga.kvack.org (Postfix) with SMTP id BC6006B0062 for ; Tue, 19 Jun 2012 01:49:07 -0400 (EDT)
Message-ID: <4FE012CD.6010605@kernel.org>
Date: Tue, 19 Jun 2012 14:49:01 +0900
From: Minchan Kim
MIME-Version: 1.0
Subject: RFC: Easy-Reclaimable LRU list
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: "linux-mm@kvack.org", LKML
Cc: Rik van Riel, Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli, Andrew Morton, Anton Vorontsov, John Stultz, Pekka Enberg, Wu Fengguang, Hugh Dickins

Hi everybody!

Recently there have been several efforts to handle system memory pressure:

1) low memory notification - [1]
2) fallocate(VOLATILE) - [2]
3) fadvise(NOREUSE) - [3]

For them, I would like to add a new LRU list, "Ereclaimable", which is the opposite of "unevictable". The Ereclaimable LRU list holds _easily_ reclaimable pages, for example:

1. invalidated pages that remain on the LRU list
2. pages under pageout for reclaim (PG_reclaim pages)
3. fadvise(NOREUSE) pages
4. fallocate(VOLATILE) pages

These pages shouldn't stir the normal LRU lists, and compaction might not even migrate them. The reclaimer can reclaim Ereclaimable pages before the normal LRU lists and, for anon pages on the Ereclaimable LRU list, will avoid unnecessary swapout. It also lets the admin measure how many pages can be made available at the moment without latency. This is very important on recent mobile systems, because page reclaim and writeback are critical to application latency. Of course, it could matter on normal desktops, too. With it, we can calculate fast-available pages more exactly, for example as NR_FREE_PAGES + NR_ERECLAIMABLE_PAGES. If that falls below a threshold we define, we could trigger a first-level notification, if we really need to prototype low memory notification.
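The fast-available calculation above can be illustrated with a minimal user-space sketch in C. NR_FREE_PAGES and NR_ERECLAIMABLE_PAGES are the counters named in the proposal; the struct, the function names and the threshold parameter are hypothetical, and nothing here is existing kernel code.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the two counters named in the proposal. */
struct mem_counters {
    unsigned long nr_free_pages;         /* NR_FREE_PAGES */
    unsigned long nr_ereclaimable_pages; /* NR_ERECLAIMABLE_PAGES (proposed) */
};

/* Pages that could be handed out without writeback or swapout. */
static unsigned long fast_available_pages(const struct mem_counters *c)
{
    return c->nr_free_pages + c->nr_ereclaimable_pages;
}

/* True when a first-level low memory notification should fire. */
static bool should_notify_low_memory(const struct mem_counters *c,
                                     unsigned long threshold_pages)
{
    return fast_available_pages(c) < threshold_pages;
}
```

With 100 free pages and 50 Ereclaimable pages, the sum is 150, so a threshold of 200 pages would trigger the notification while a threshold of 150 would not. How the threshold is chosen is left open, as it is in the proposal.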
We could also change the madvise(DONTNEED) implementation so that it keeps the pages on the Ereclaimable list instead of zapping them immediately. If memory pressure never happens, the pages stay in memory, so we avoid many minor faults. If memory pressure does happen, we can simply discard them instead of swapping them out. We might implement this as madvise(VOLATILE) rather than DONTNEED, but that is off-topic for this thread.

As another example, we can implement CFLRU (Clean-First LRU), which reclaims unmapped clean page cache pages first. The rationale is that on non-rotational devices, read and write costs are very asymmetric: reads are very fast while writes are very slow, so avoiding writeback of dirty pages where possible is a gain even if it costs several extra reads. This, too, can be implemented easily with Ereclaimable pages.

Anyway, this is just at the brainstorming stage and nothing is implemented yet, but I decided to post before it's too late. I would like to hear others' opinions before getting into the code. Any comments are welcome.

Thanks.

[1] http://lkml.org/lkml/2012/5/1/97
[2] https://lkml.org/lkml/2012/6/1/322
[3] https://lkml.org/lkml/2011/6/24/136

--
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx145.postini.com [74.125.245.145]) by kanga.kvack.org (Postfix) with SMTP id 00B8A6B00F3 for ; Thu, 21 Jun 2012 15:53:49 -0400 (EDT)
Received: from /spool/local by e39.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only!
Violators will be prosecuted for from ; Thu, 21 Jun 2012 13:53:49 -0600 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by d01dlp03.pok.ibm.com (Postfix) with ESMTP id A8995C93F61 for ; Thu, 21 Jun 2012 15:21:29 -0400 (EDT) Received: from d03av06.boulder.ibm.com (d03av06.boulder.ibm.com [9.17.195.245]) by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q5LJLU3n204898 for ; Thu, 21 Jun 2012 15:21:30 -0400 Received: from d03av06.boulder.ibm.com (loopback [127.0.0.1]) by d03av06.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q5LJMPDU013246 for ; Thu, 21 Jun 2012 13:22:26 -0600 Message-ID: <4FE37434.808@linaro.org> Date: Thu, 21 Jun 2012 12:21:24 -0700 From: John Stultz MIME-Version: 1.0 Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> In-Reply-To: <4FE012CD.6010605@kernel.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: "linux-mm@kvack.org" , LKML , Rik van Riel , Mel Gorman , KAMEZAWA Hiroyuki , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , Pekka Enberg , Wu Fengguang , Hugh Dickins On 06/18/2012 10:49 PM, Minchan Kim wrote: > Hi everybody! > > Recently, there are some efforts to handle system memory pressure. > > 1) low memory notification - [1] > 2) fallocate(VOLATILE) - [2] > 3) fadvise(NOREUSE) - [3] > > For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable". > Reclaimable LRU list includes _easy_ reclaimable pages. > For example, easy reclaimable pages are following as. > > 1. invalidated but remained LRU list. > 2. pageout pages for reclaim(PG_reclaim pages) > 3. fadvise(NOREUSE) > 4. fallocate(VOLATILE) > > Their pages shouldn't stir normal LRU list and compaction might not migrate them, even. 
> Reclaimer can reclaim Ereclaimable pages before normal lru list and will avoid unnecessary
> swapout in anon pages in easy-reclaimable LRU list.

I was hoping there would be further comment on this by more core VM devs, but so far things have been quiet (is everyone on vacation?).

Overall this seems reasonable for the volatile ranges functionality. The one downside is that dealing with the ranges on a per-page basis can make marking and unmarking larger ranges as volatile fairly expensive. In my tests with my last patchset, marking and unmarking a 1 MB range was over 75x slower (~1.5ms) when we deactivate and activate all of the pages, compared with just inserting the volatile range into an interval tree and purging via the shrinker (~20us). Granted, my initial approach is somewhat naive, and some pagevec batching has improved things three-fold (down to ~500us), but I'm still ~25x slower when iterating over all the pages.

There are surely further improvements to be made, but this added cost worries me, as users are unlikely to generously volunteer up memory to the kernel as volatile if doing so frequently adds significant overhead.

This makes me wonder if having something like an early-shrinker which gets called prior to shrinking the LRUs might be a better approach for volatile ranges. It would still be NUMA-unaware, but would keep the overhead very light for both volatile users and non-users.

Even so, I'd be interested in seeing more about your approach, in the hopes that it might not be as costly as my initial attempt. Do you have any plans to start prototyping this?

thanks
-john

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
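The timings in the message above can be cross-checked with a little arithmetic. The sketch below is only illustrative: the 4 KiB page size is an assumption (the message does not state one), and both helper names are hypothetical. Under that assumption a 1 MB range covers 256 pages, so the per-page path touches 256 pages where the interval-tree path inserts a single range.

```c
/* Number of pages needed to cover len bytes, rounding up.
 * page_size must be non-zero. */
static unsigned long pages_in_range(unsigned long len, unsigned long page_size)
{
    return (len + page_size - 1) / page_size;
}

/* Integer slowdown of the per-page path relative to the range path.
 * Both times are in microseconds; range_us must be non-zero. */
static unsigned long slowdown_factor(unsigned long per_page_us,
                                     unsigned long range_us)
{
    return per_page_us / range_us;
}
```

Plugging in the quoted figures, `slowdown_factor(1500, 20)` gives 75 for the naive version and `slowdown_factor(500, 20)` gives 25 after pagevec batching, which matches the 75x and 25x ratios reported.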
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx193.postini.com [74.125.245.193]) by kanga.kvack.org (Postfix) with SMTP id 983446B0145 for ; Fri, 22 Jun 2012 02:57:08 -0400 (EDT) Message-ID: <4FE41752.8050305@kernel.org> Date: Fri, 22 Jun 2012 15:57:22 +0900 From: Minchan Kim MIME-Version: 1.0 Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> <4FE37434.808@linaro.org> In-Reply-To: <4FE37434.808@linaro.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: John Stultz Cc: "linux-mm@kvack.org" , LKML , Rik van Riel , Mel Gorman , KAMEZAWA Hiroyuki , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , Pekka Enberg , Wu Fengguang , Hugh Dickins Hi John, On 06/22/2012 04:21 AM, John Stultz wrote: > On 06/18/2012 10:49 PM, Minchan Kim wrote: >> Hi everybody! >> >> Recently, there are some efforts to handle system memory pressure. >> >> 1) low memory notification - [1] >> 2) fallocate(VOLATILE) - [2] >> 3) fadvise(NOREUSE) - [3] >> >> For them, I would like to add new LRU list, aka "Ereclaimable" which >> is opposite of "unevictable". >> Reclaimable LRU list includes _easy_ reclaimable pages. >> For example, easy reclaimable pages are following as. >> >> 1. invalidated but remained LRU list. >> 2. pageout pages for reclaim(PG_reclaim pages) >> 3. fadvise(NOREUSE) >> 4. fallocate(VOLATILE) >> >> Their pages shouldn't stir normal LRU list and compaction might not >> migrate them, even. >> Reclaimer can reclaim Ereclaimable pages before normal lru list and >> will avoid unnecessary >> swapout in anon pages in easy-reclaimable LRU list. > > I was hoping there would be further comment on this by more core VM > devs, but so far things have been quiet (is everyone on vacation?). At least, there are no dissent comment until now. Let be a positive. 
:)

>
> Overall this seems reasonable for the volatile ranges functionality.
> The one down-side being that dealing with the ranges on a per-page basis
> can make marking and unmarking larger ranges as volatile fairly
> expensive. In my tests with my last patchset, it was over 75x slower
> (~1.5ms) marking and umarking a 1meg range when we deactivate and
> activate all of the pages, instead of just inserting the volatile range
> into an interval tree and purge via the shrinker (~20us). Granted, my
> initial approach is somewhat naive, and some pagevec batching has
> improved things three-fold (down to ~500us) , but I'm still ~25x slower
> when iterating over all the pages.
>
> There's surely further improvements to be made, but this added cost
> worries me, as users are unlikely to generously volunteer up memory to
> the kernel as volatile if doing so frequently adds significant overhead.
>
> This makes me wonder if having something like an early-shrinker which
> gets called prior to shrinking the lrus might be a better approach for
> volatile ranges. It would still be numa-unaware, but would keep the
> overhead very light to both volatile users and non users.

How about doing it in the background? From your process context you can queue the work on a workqueue, and when the work executes it can move the pages onto the LRU list you want. Just an idea.

>
> Even so, I'd be interested in seeing more about your approach, in the
> hopes that it might not be as costly as my initial attempt. Do you have
> any plans to start prototyping this?

I will wait a few days for responses and, if nobody raises critical problems, I will start. But please keep in mind that I expect it to be far from trivial, so you shouldn't depend on my schedule.

Thanks.

>
> thanks
> -john
>

--
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx121.postini.com [74.125.245.121]) by kanga.kvack.org (Postfix) with SMTP id 0D0B06B0291 for ; Sat, 23 Jun 2012 00:47:46 -0400 (EDT) Received: from m4.gw.fujitsu.co.jp (unknown [10.0.50.74]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id 5BB613EE0BB for ; Sat, 23 Jun 2012 13:47:45 +0900 (JST) Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 435CA45DE53 for ; Sat, 23 Jun 2012 13:47:45 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 14A5345DE4F for ; Sat, 23 Jun 2012 13:47:45 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 02FE61DB803F for ; Sat, 23 Jun 2012 13:47:45 +0900 (JST) Received: from m1000.s.css.fujitsu.com (m1000.s.css.fujitsu.com [10.240.81.136]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id B19E41DB8037 for ; Sat, 23 Jun 2012 13:47:44 +0900 (JST) Message-ID: <4FE549E8.2050905@jp.fujitsu.com> Date: Sat, 23 Jun 2012 13:45:28 +0900 From: Kamezawa Hiroyuki MIME-Version: 1.0 Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> <4FE37434.808@linaro.org> <4FE41752.8050305@kernel.org> In-Reply-To: <4FE41752.8050305@kernel.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: John Stultz , "linux-mm@kvack.org" , LKML , Rik van Riel , Mel Gorman , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , Pekka Enberg , Wu Fengguang , Hugh Dickins (2012/06/22 15:57), Minchan Kim wrote: > Hi John, > > On 06/22/2012 04:21 AM, John Stultz wrote: > >> On 06/18/2012 10:49 PM, Minchan Kim wrote: >>> Hi everybody! 
>>> >>> Recently, there are some efforts to handle system memory pressure. >>> >>> 1) low memory notification - [1] >>> 2) fallocate(VOLATILE) - [2] >>> 3) fadvise(NOREUSE) - [3] >>> >>> For them, I would like to add new LRU list, aka "Ereclaimable" which >>> is opposite of "unevictable". >>> Reclaimable LRU list includes _easy_ reclaimable pages. >>> For example, easy reclaimable pages are following as. >>> >>> 1. invalidated but remained LRU list. >>> 2. pageout pages for reclaim(PG_reclaim pages) >>> 3. fadvise(NOREUSE) >>> 4. fallocate(VOLATILE) >>> >>> Their pages shouldn't stir normal LRU list and compaction might not >>> migrate them, even. >>> Reclaimer can reclaim Ereclaimable pages before normal lru list and >>> will avoid unnecessary >>> swapout in anon pages in easy-reclaimable LRU list. >> >> I was hoping there would be further comment on this by more core VM >> devs, but so far things have been quiet (is everyone on vacation?). > > > At least, there are no dissent comment until now. > Let be a positive. :) I think this is interesting approach. Major concern is how to guarantee EReclaimable pages are really EReclaimable...Do you have any idea ? madviced pages are really EReclaimable ? A (very) small concern is will you use one more page-flags for this ? ;) Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx171.postini.com [74.125.245.171]) by kanga.kvack.org (Postfix) with SMTP id 700886B02C2 for ; Sat, 23 Jun 2012 11:53:31 -0400 (EDT) Message-ID: <4FE5E66C.6080309@redhat.com> Date: Sat, 23 Jun 2012 11:53:16 -0400 From: Rik van Riel MIME-Version: 1.0 Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> <4FE37434.808@linaro.org> <4FE41752.8050305@kernel.org> <4FE549E8.2050905@jp.fujitsu.com> In-Reply-To: <4FE549E8.2050905@jp.fujitsu.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Kamezawa Hiroyuki Cc: Minchan Kim , John Stultz , "linux-mm@kvack.org" , LKML , Mel Gorman , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , Pekka Enberg , Wu Fengguang , Hugh Dickins On 06/23/2012 12:45 AM, Kamezawa Hiroyuki wrote: > I think this is interesting approach. Major concern is how to guarantee > EReclaimable > pages are really EReclaimable...Do you have any idea ? madviced pages > are really EReclaimable ? I suspect the EReclaimable pages can only be clean page cache pages that are not mapped by any processes. Once somebody tries to use the page, mark_page_accessed will move it to another list. > A (very) small concern is will you use one more page-flags for this ? ;) This could be an issue on a 32 bit system, true. -- All rights reversed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx177.postini.com [74.125.245.177]) by kanga.kvack.org (Postfix) with SMTP id ECB116B02DD for ; Sun, 24 Jun 2012 07:09:51 -0400 (EDT) Received: by yenr5 with SMTP id r5so2896880yen.14 for ; Sun, 24 Jun 2012 04:09:51 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <4FE5E66C.6080309@redhat.com> References: <4FE012CD.6010605@kernel.org> <4FE37434.808@linaro.org> <4FE41752.8050305@kernel.org> <4FE549E8.2050905@jp.fujitsu.com> <4FE5E66C.6080309@redhat.com> From: KOSAKI Motohiro Date: Sun, 24 Jun 2012 07:09:30 -0400 Message-ID: Subject: Re: RFC: Easy-Reclaimable LRU list Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Rik van Riel Cc: Kamezawa Hiroyuki , Minchan Kim , John Stultz , "linux-mm@kvack.org" , LKML , Mel Gorman , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , Pekka Enberg , Wu Fengguang , Hugh Dickins On Sat, Jun 23, 2012 at 11:53 AM, Rik van Riel wrote: > On 06/23/2012 12:45 AM, Kamezawa Hiroyuki wrote: > >> I think this is interesting approach. Major concern is how to guarantee >> EReclaimable >> pages are really EReclaimable...Do you have any idea ? madviced pages >> are really EReclaimable ? > > I suspect the EReclaimable pages can only be clean page > cache pages that are not mapped by any processes. > > Once somebody tries to use the page, mark_page_accessed > will move it to another list. 100% agree. >> A (very) small concern is will you use one more page-flags for this ? ;) > > This could be an issue on a 32 bit system, true. Do we really need SwapBacked bit? Actually swap-backed is per-superblock attribute and don't change dynamically (i.e. no race happen). thus this bit might be able to move into page->mapping or page->mapping->host. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx113.postini.com [74.125.245.113]) by kanga.kvack.org (Postfix) with SMTP id 07BD26B007D for ; Sun, 24 Jun 2012 20:14:42 -0400 (EDT) Message-ID: <4FE7AD8A.2080508@kernel.org> Date: Mon, 25 Jun 2012 09:15:06 +0900 From: Minchan Kim MIME-Version: 1.0 Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> <4FE37434.808@linaro.org> <4FE41752.8050305@kernel.org> <4FE549E8.2050905@jp.fujitsu.com> In-Reply-To: <4FE549E8.2050905@jp.fujitsu.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Kamezawa Hiroyuki Cc: John Stultz , "linux-mm@kvack.org" , LKML , Rik van Riel , Mel Gorman , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , Pekka Enberg , Wu Fengguang , Hugh Dickins Hi Kame, On 06/23/2012 01:45 PM, Kamezawa Hiroyuki wrote: > (2012/06/22 15:57), Minchan Kim wrote: >> Hi John, >> >> On 06/22/2012 04:21 AM, John Stultz wrote: >> >>> On 06/18/2012 10:49 PM, Minchan Kim wrote: >>>> Hi everybody! >>>> >>>> Recently, there are some efforts to handle system memory pressure. >>>> >>>> 1) low memory notification - [1] >>>> 2) fallocate(VOLATILE) - [2] >>>> 3) fadvise(NOREUSE) - [3] >>>> >>>> For them, I would like to add new LRU list, aka "Ereclaimable" which >>>> is opposite of "unevictable". >>>> Reclaimable LRU list includes _easy_ reclaimable pages. >>>> For example, easy reclaimable pages are following as. >>>> >>>> 1. invalidated but remained LRU list. >>>> 2. pageout pages for reclaim(PG_reclaim pages) >>>> 3. fadvise(NOREUSE) >>>> 4. fallocate(VOLATILE) >>>> >>>> Their pages shouldn't stir normal LRU list and compaction might not >>>> migrate them, even. 
>>>> Reclaimer can reclaim Ereclaimable pages before normal lru list and
>>>> will avoid unnecessary
>>>> swapout in anon pages in easy-reclaimable LRU list.
>>>
>>> I was hoping there would be further comment on this by more core VM
>>> devs, but so far things have been quiet (is everyone on vacation?).
>>
>>
>> At least, there are no dissent comment until now.
>> Let be a positive. :)
>
> I think this is interesting approach. Major concern is how to guarantee
> EReclaimable
> pages are really EReclaimable...Do you have any idea ? madviced pages
> are really
> EReclaimable ?

I would like to select only discardable pages:

1. unmapped file pages
2. PG_reclaim pages (such pages are unmapped and are candidates for reclaim ASAP)
3. fallocate(VOLATILE) pages (we can just discard them without swapout)
4. madvise(MADV_DONTNEED)/fadvise(NOREUSE) pages (these could be more difficult than 1-3, but they are still very likely to be easier to reclaim than other pages)

>
> A (very) small concern is will you use one more page-flags for this ? ;)

Maybe, and it could be a serious problem on 32-bit machines. I haven't dived into that yet, but I guess we can reuse the PG_reclaim bit. PG_reclaim is always used together with !PageActive, and the Ereclaimable LRU has no active list, so we could change it as follows:

- #define PG_reclaim
+ #define PG_Ereclaim

SetPageReclaim(page)
{
	page->flags |= (PG_Ereclaim|PG_active);
}

TestPageReclaim(page)
{
	if ((page->flags & (PG_Ereclaim|PG_active)) == (PG_Ereclaim|PG_active))
		return true;
	return false;
}

SetPageEreclaim(page)
{
	page->flags |= PG_Ereclaim;
}

Thanks for the comment, Kame.

--
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
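The bit-reuse idea in the message above can be tried out in user space. The following is a minimal sketch, not kernel code: the toy_ names, the bit positions and the toy_test_page_ereclaim() helper are all hypothetical, and it assumes, as the message does, that a page on the Ereclaimable list is never active. The mask test uses a bitwise AND so that both bits are compared.

```c
#include <stdbool.h>

/* Bit positions are illustrative only; the kernel's real values differ. */
#define TOY_PG_ACTIVE   (1UL << 0)
#define TOY_PG_ERECLAIM (1UL << 1) /* the bit currently used as PG_reclaim */
#define TOY_PG_MASK     (TOY_PG_ACTIVE | TOY_PG_ERECLAIM)

struct toy_page {
    unsigned long flags;
};

/* "Reclaim" is encoded as the Ereclaim and active bits both set. */
static void toy_set_page_reclaim(struct toy_page *page)
{
    page->flags |= TOY_PG_MASK;
}

static bool toy_test_page_reclaim(const struct toy_page *page)
{
    return (page->flags & TOY_PG_MASK) == TOY_PG_MASK;
}

/* "Ereclaim" is the Ereclaim bit alone; the caller must make sure
 * the page is not active, otherwise it would read as "Reclaim". */
static void toy_set_page_ereclaim(struct toy_page *page)
{
    page->flags |= TOY_PG_ERECLAIM;
}

/* Hypothetical helper, not in the original message. */
static bool toy_test_page_ereclaim(const struct toy_page *page)
{
    return (page->flags & TOY_PG_MASK) == TOY_PG_ERECLAIM;
}
```

A page with only the active bit set tests false for both states, so ordinary active pages are unaffected by the encoding. The sketch says nothing about how real code would distinguish a PG_reclaim page from an active page elsewhere in the kernel; that question is left open here.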
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx135.postini.com [74.125.245.135]) by kanga.kvack.org (Postfix) with SMTP id 7FE246B031F for ; Mon, 25 Jun 2012 04:48:57 -0400 (EDT) Message-ID: <4FE82555.2010704@parallels.com> Date: Mon, 25 Jun 2012 12:46:13 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> In-Reply-To: <4FE012CD.6010605@kernel.org> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: "linux-mm@kvack.org" , LKML , Rik van Riel , Mel Gorman , KAMEZAWA Hiroyuki , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , John Stultz , Pekka Enberg , Wu Fengguang , Hugh Dickins On 06/19/2012 09:49 AM, Minchan Kim wrote: > Hi everybody! > > Recently, there are some efforts to handle system memory pressure. > > 1) low memory notification - [1] > 2) fallocate(VOLATILE) - [2] > 3) fadvise(NOREUSE) - [3] > > For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable". > Reclaimable LRU list includes_easy_ reclaimable pages. > For example, easy reclaimable pages are following as. > > 1. invalidated but remained LRU list. > 2. pageout pages for reclaim(PG_reclaim pages) > 3. fadvise(NOREUSE) > 4. fallocate(VOLATILE) > > Their pages shouldn't stir normal LRU list and compaction might not migrate them, even. What about other things moving memory like CMA ? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx186.postini.com [74.125.245.186]) by kanga.kvack.org (Postfix) with SMTP id 925B86B032D for ; Mon, 25 Jun 2012 06:24:41 -0400 (EDT) Date: Mon, 25 Jun 2012 11:24:35 +0100 From: Mel Gorman Subject: Re: RFC: Easy-Reclaimable LRU list Message-ID: <20120625102435.GD8271@suse.de> References: <4FE012CD.6010605@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <4FE012CD.6010605@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: "linux-mm@kvack.org" , LKML , Rik van Riel , KAMEZAWA Hiroyuki , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , John Stultz , Pekka Enberg , Wu Fengguang , Hugh Dickins On Tue, Jun 19, 2012 at 02:49:01PM +0900, Minchan Kim wrote: > Hi everybody! > > Recently, there are some efforts to handle system memory pressure. > > 1) low memory notification - [1] > 2) fallocate(VOLATILE) - [2] > 3) fadvise(NOREUSE) - [3] > > For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable". > Reclaimable LRU list includes _easy_ reclaimable pages. > For example, easy reclaimable pages are following as. > > 1. invalidated but remained LRU list. > 2. pageout pages for reclaim(PG_reclaim pages) > 3. fadvise(NOREUSE) > 4. fallocate(VOLATILE) > > Their pages shouldn't stir normal LRU list and compaction might not migrate them, even. Why would compaction not migrate them? We might still want to migrate NORESUSE or VOLATILE pages. > Reclaimer can reclaim Ereclaimable pages before normal lru list and will avoid unnecessary > swapout in anon pages in easy-reclaimable LRU list. > It also can make admin measure how many we have available pages at the moment without latency. That's not true for PG_reclaim pages as those pages cannot be discarded until writeback completes. 
One reason why I tried moving PG_reclaim pages to a separate list was to avoid excessive scanning when writing back to slow devices. If those pages were moved to an "easy-reclaimable" LRU list then the value would be reduced as scanning would still occur. It might make it worse because the whole Ereclaimable list would be scanned for pages that cannot be reclaimed at all before moving to another LRU list. This separate list does not exist today because it required a page bit to implement and I did not want it to be a 64-bit only feature. You will probably hit the same problem. The setting of the page bit is also going to be a problem but you may be able to lazily move pages to the EReclaimable list in the same way unevictable pages are handled. > It's very important in recent mobile systems because page reclaim/writeback is very critical > of application latency. Of course, it could affect normal desktop, too. > With it, we can calculate fast-available pages more exactly with NR_FREE_PAGES + NR_ERECLAIMABLE_PAGES, > for example. If it's below threshold we defined, we could trigger 1st level notification > if we really need prototying low memory notification. > If PG_reclaim pages are on this list, then that calculation will not be helpful. > We may change madvise(DONTNEED) implementation instead of zapping page immediately. > If memory pressure doesn't happen, pages are in memory so we can avoid so many minor fault. > Of course, we can discard instead of swap out if system memory pressure happens. > We might implement it madvise(VOLATILE) instead of DONTNEED, but anyway it's off-topic in this thread. > > As a another example, we can implement CFLRU(Clean-First LRU) which reclaims unmapped-clean cache page firstly. That alters ageing of pages significantly. It means that workloads that are using read heavily will have their pages discarded first. > The rationale is that in non-rotation device, read/write cost is much asynchronous. 
While this is true that does not justify throwing away unmapped clean page cache first every time. > Read is very fast while write is very slow so it would be a gain while we can avoid writeback of dirty pages > if possible although we need several reads. It can be implemented easily with Ereclaimable pages, too. > > Anyway, it's just a brain-storming phase and never implemented yet but decide posting before it's too late. > I hope listen others opinion before get into the code. > Care is needed. I think you'll only be able to use this list for NORESUSE, VOLATILE and invalidated pages. If you add PG_reclaim it not be "easily-reclaimable" and if you add clean unmapped pages then there will be regressions in workloads that are read-intensive. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx160.postini.com [74.125.245.160]) by kanga.kvack.org (Postfix) with SMTP id C6F406B0152 for ; Mon, 25 Jun 2012 20:12:30 -0400 (EDT) Message-ID: <4FE8FE70.6050107@kernel.org> Date: Tue, 26 Jun 2012 09:12:32 +0900 From: Minchan Kim MIME-Version: 1.0 Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> <4FE82555.2010704@parallels.com> In-Reply-To: <4FE82555.2010704@parallels.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Glauber Costa Cc: "linux-mm@kvack.org" , LKML , Rik van Riel , Mel Gorman , KAMEZAWA Hiroyuki , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , John Stultz , Pekka Enberg , Wu Fengguang , Hugh Dickins On 06/25/2012 05:46 PM, Glauber Costa wrote: > On 06/19/2012 09:49 AM, Minchan Kim wrote: >> Hi everybody! 
>> >> Recently, there are some efforts to handle system memory pressure. >> >> 1) low memory notification - [1] >> 2) fallocate(VOLATILE) - [2] >> 3) fadvise(NOREUSE) - [3] >> >> For them, I would like to add new LRU list, aka "Ereclaimable" which >> is opposite of "unevictable". >> Reclaimable LRU list includes_easy_ reclaimable pages. >> For example, easy reclaimable pages are following as. >> >> 1. invalidated but remained LRU list. >> 2. pageout pages for reclaim(PG_reclaim pages) >> 3. fadvise(NOREUSE) >> 4. fallocate(VOLATILE) >> >> Their pages shouldn't stir normal LRU list and compaction might not >> migrate them, even. > What about other things moving memory like CMA ? Sorry for not being able to understand your point. Can you elaborate a bit more? -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx126.postini.com [74.125.245.126]) by kanga.kvack.org (Postfix) with SMTP id 920F56B0254 for ; Mon, 25 Jun 2012 20:26:57 -0400 (EDT) Message-ID: <4FE901D1.9090400@kernel.org> Date: Tue, 26 Jun 2012 09:26:57 +0900 From: Minchan Kim MIME-Version: 1.0 Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> <20120625102435.GD8271@suse.de> In-Reply-To: <20120625102435.GD8271@suse.de> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: "linux-mm@kvack.org" , LKML , Rik van Riel , KAMEZAWA Hiroyuki , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , John Stultz , Pekka Enberg , Wu Fengguang , Hugh Dickins On 06/25/2012 07:24 PM, Mel Gorman wrote: > On Tue, Jun 19, 2012 at 02:49:01PM +0900, Minchan Kim wrote: >> Hi everybody! 
>>
>> Recently, there are some efforts to handle system memory pressure.
>>
>> 1) low memory notification - [1]
>> 2) fallocate(VOLATILE) - [2]
>> 3) fadvise(NOREUSE) - [3]
>>
>> For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable".
>> Reclaimable LRU list includes _easy_ reclaimable pages.
>> For example, easy reclaimable pages are following as.
>>
>> 1. invalidated but remained LRU list.
>> 2. pageout pages for reclaim(PG_reclaim pages)
>> 3. fadvise(NOREUSE)
>> 4. fallocate(VOLATILE)
>>
>> Their pages shouldn't stir normal LRU list and compaction might not migrate them, even.
>
> Why would compaction not migrate them? We might still want to migrate
> NORESUSE or VOLATILE pages.

Right, compaction might still migrate them.

>
>> Reclaimer can reclaim Ereclaimable pages before normal lru list and will avoid unnecessary
>> swapout in anon pages in easy-reclaimable LRU list.
>> It also can make admin measure how many we have available pages at the moment without latency.
>
> That's not true for PG_reclaim pages as those pages cannot be discarded
> until writeback completes.
>
> One reason why I tried moving PG_reclaim pages to a separate list was
> to avoid excessive scanning when writing back to slow devices. If those
> pages were moved to an "easy-reclaimable" LRU list then the value would
> be reduced as scanning would still occur. It might make it worse because
> the whole Ereclaimable list would be scanned for pages that cannot be
> reclaimed at all before moving to another LRU list.

I should have written that more clearly. I mean something like the following:

end_page_writeback(struct page *page)
{
	if (PageReclaim(page))
		move_ereclaim_lru_list(page);
}

So the Ereclaimable LRU list would contain only discardable pages.

>
> This separate list does not exist today because it required a page bit to
> implement and I did not want it to be a 64-bit only feature. You will
> probably hit the same problem.

True. Others have already pointed it out in this thread, and I posted an idea.
Copy/Paste:
"
Maybe, and it could be a serious problem on 32-bit machines.
I didn't dive into that, but I guess we can reuse the PG_reclaim bit.
PG_reclaim is always used together with !PageActive, and the Ereclaimable LRU list doesn't have an active list,
so we can change it as follows:

- #define PG_reclaim
+ #define PG_Ereclaim

SetPageReclaim(page)
{
	page->flags |= (PG_Ereclaim|PG_active);
}

TestPageReclaim(page)
{
	/* bitwise AND: true only when both bits are set */
	if ((page->flags & (PG_Ereclaim|PG_active)) == (PG_Ereclaim|PG_active))
		return true;
	return false;
}

SetPageEreclaim(page)
{
	page->flags |= PG_Ereclaim;
}
"

>
> The setting of the page bit is also going to be a problem but you may be
> able to lazily move pages to the EReclaimable list in the same way
> unevictable pages are handled.

First of all, I am not considering lazy moving as is done for unevictable pages.
We can move VOLATILE/NOREUSE pages onto the Ereclaimable LRU list in the background using a workqueue.
Please tell me the scenario in which we would need lazy moving.

>
>> It's very important in recent mobile systems because page reclaim/writeback is very critical
>> of application latency. Of course, it could affect normal desktop, too.
>> With it, we can calculate fast-available pages more exactly with NR_FREE_PAGES + NR_ERECLAIMABLE_PAGES,
>> for example. If it's below threshold we defined, we could trigger 1st level notification
>> if we really need prototying low memory notification.
>>
>
> If PG_reclaim pages are on this list, then that calculation will not be
> helpful.

PG_reclaim pages would not be on the Ereclaimable LRU list, as I mentioned above.

>
>> We may change madvise(DONTNEED) implementation instead of zapping page immediately.
>> If memory pressure doesn't happen, pages are in memory so we can avoid so many minor fault.
>> Of course, we can discard instead of swap out if system memory pressure happens.
>> We might implement it madvise(VOLATILE) instead of DONTNEED, but anyway it's off-topic in this thread.
>>
>> As a another example, we can implement CFLRU(Clean-First LRU) which reclaims unmapped-clean cache page firstly.
>
> That alters ageing of pages significantly. It means that workloads that
> are using read heavily will have their pages discarded first.
>
>> The rationale is that in non-rotation device, read/write cost is much asynchronous.
>
> While this is true that does not justify throwing away unmapped clean
> page cache first every time.

That's true. That is the workload I am concerned about.
We need to balance unmapped and mapped pages, so sometimes some mapped pages would be
moved onto the unevictable LRU list after unmapping all of their ptes.
I believe that could mitigate the problem, though not perfectly, I admit.
Maybe we need a knob for the admin to tune it.
Anyway, it's a big concern for me and something that needs careful regression testing.

>
>> Read is very fast while write is very slow so it would be a gain while we can avoid writeback of dirty pages
>> if possible although we need several reads. It can be implemented easily with Ereclaimable pages, too.
>>
>> Anyway, it's just a brain-storming phase and never implemented yet but decide posting before it's too late.
>> I hope listen others opinion before get into the code.
>>
>
> Care is needed. I think you'll only be able to use this list for
> NOREUSE, VOLATILE and invalidated pages. If you add PG_reclaim it will not be
> "easily-reclaimable" and if you add clean unmapped pages then there will
> be regressions in workloads that are read-intensive.
>

Thanks for the feedback, Mel.

--
Kind regards,
Minchan Kim
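The PG_reclaim reuse idea from the reply above can be tried out in user space. What follows is a minimal sketch, not kernel code: it assumes the flags are plain bit masks (the kernel uses bit numbers with set_bit()/test_bit()), and `struct page_model`, the mask values, `clear_page_reclaim()` and the rule in `test_page_active()` are all hypothetical names and choices made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical masks; the kernel uses bit numbers, not masks. */
#define PG_ACTIVE    (1UL << 0)
#define PG_ERECLAIM  (1UL << 1)	/* occupies the old PG_reclaim slot */
#define PG_PAIR      (PG_ERECLAIM | PG_ACTIVE)

struct page_model {
	unsigned long flags;
};

/* Old PG_reclaim meaning: both bits set. The combination is free
 * because PG_reclaim was never set together with PG_active. */
static void set_page_reclaim(struct page_model *page)
{
	page->flags |= PG_PAIR;
}

static bool test_page_reclaim(const struct page_model *page)
{
	return (page->flags & PG_PAIR) == PG_PAIR;
}

/* Writeback finished: drop the encoding, leaving a plain inactive page. */
static void clear_page_reclaim(struct page_model *page)
{
	assert(test_page_reclaim(page));
	page->flags &= ~PG_PAIR;
}

/* Ereclaimable: PG_ERECLAIM alone, since that list has no active half. */
static void set_page_ereclaim(struct page_model *page)
{
	assert(!(page->flags & PG_ACTIVE));
	page->flags |= PG_ERECLAIM;
}

static bool test_page_ereclaim(const struct page_model *page)
{
	return (page->flags & PG_PAIR) == PG_ERECLAIM;
}

/* Assumption: an "active" test now has to exclude the reclaim encoding. */
static bool test_page_active(const struct page_model *page)
{
	return (page->flags & PG_PAIR) == PG_ACTIVE;
}
```

The comparisons have to use a bitwise `&`. A logical `&&` would reduce the left-hand side to 0 or 1, which can never equal the two-bit mask used here. The sketch also makes the cost of the scheme visible: every existing "is this page active" check would need the extra exclusion shown in `test_page_active()`.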
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx127.postini.com [74.125.245.127]) by kanga.kvack.org (Postfix) with SMTP id 425A06B0163 for ; Tue, 26 Jun 2012 04:09:57 -0400 (EDT) Message-ID: <4FE96DAF.3050208@parallels.com> Date: Tue, 26 Jun 2012 12:07:11 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> <4FE82555.2010704@parallels.com> <4FE8FE70.6050107@kernel.org> In-Reply-To: <4FE8FE70.6050107@kernel.org> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: "linux-mm@kvack.org" , LKML , Rik van Riel , Mel Gorman , KAMEZAWA Hiroyuki , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , John Stultz , Pekka Enberg , Wu Fengguang , Hugh Dickins On 06/26/2012 04:12 AM, Minchan Kim wrote: > On 06/25/2012 05:46 PM, Glauber Costa wrote: > >> On 06/19/2012 09:49 AM, Minchan Kim wrote: >>> Hi everybody! >>> >>> Recently, there are some efforts to handle system memory pressure. >>> >>> 1) low memory notification - [1] >>> 2) fallocate(VOLATILE) - [2] >>> 3) fadvise(NOREUSE) - [3] >>> >>> For them, I would like to add new LRU list, aka "Ereclaimable" which >>> is opposite of "unevictable". >>> Reclaimable LRU list includes_easy_ reclaimable pages. >>> For example, easy reclaimable pages are following as. >>> >>> 1. invalidated but remained LRU list. >>> 2. pageout pages for reclaim(PG_reclaim pages) >>> 3. fadvise(NOREUSE) >>> 4. fallocate(VOLATILE) >>> >>> Their pages shouldn't stir normal LRU list and compaction might not >>> migrate them, even. >> What about other things moving memory like CMA ? > > > Sorry for not being able to understand your point. > Can you elaborate a bit more? 
> Well, maybe I didn't =) I was just wondering why exactly it is that troubles your scheme with compaction, and if such restriction would also apply to memory movement schemes like CMA. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx205.postini.com [74.125.245.205]) by kanga.kvack.org (Postfix) with SMTP id 4C2686B005A for ; Tue, 17 Jul 2012 11:55:25 -0400 (EDT) Received: by pbbrp2 with SMTP id rp2so1246293pbb.14 for ; Tue, 17 Jul 2012 08:55:24 -0700 (PDT) Date: Wed, 18 Jul 2012 00:03:48 +0800 From: Zheng Liu Subject: Re: RFC: Easy-Reclaimable LRU list Message-ID: <20120717160348.GA5441@gmail.com> References: <4FE012CD.6010605@kernel.org> <20120625102435.GD8271@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120625102435.GD8271@suse.de> Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Minchan Kim , "linux-mm@kvack.org" , LKML , Rik van Riel , KAMEZAWA Hiroyuki , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , John Stultz , Pekka Enberg , Wu Fengguang , Hugh Dickins On Mon, Jun 25, 2012 at 11:24:35AM +0100, Mel Gorman wrote: > On Tue, Jun 19, 2012 at 02:49:01PM +0900, Minchan Kim wrote: > > Hi everybody! > > > > Recently, there are some efforts to handle system memory pressure. > > > > 1) low memory notification - [1] > > 2) fallocate(VOLATILE) - [2] > > 3) fadvise(NOREUSE) - [3] > > > > For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable". > > Reclaimable LRU list includes _easy_ reclaimable pages. > > For example, easy reclaimable pages are following as. > > > > 1. invalidated but remained LRU list. > > 2. pageout pages for reclaim(PG_reclaim pages) > > 3. fadvise(NOREUSE) > > 4. 
fallocate(VOLATILE) > > > > Their pages shouldn't stir normal LRU list and compaction might not migrate them, even. > > Why would compaction not migrate them? We might still want to migrate > NORESUSE or VOLATILE pages. > > > Reclaimer can reclaim Ereclaimable pages before normal lru list and will avoid unnecessary > > swapout in anon pages in easy-reclaimable LRU list. > > It also can make admin measure how many we have available pages at the moment without latency. > > That's not true for PG_reclaim pages as those pages cannot be discarded > until writeback completes. > > One reason why I tried moving PG_reclaim pages to a separate list was > to avoid excessive scanning when writing back to slow devices. If those > pages were moved to an "easy-reclaimable" LRU list then the value would > be reduced as scanning would still occur. It might make it worse because > the whole Ereclaimable list would be scanned for pages that cannot be > reclaimed at all before moving to another LRU list. > > This separate list does not exist today because it required a page bit to > implement and I did not want it to be a 64-bit only feature. You will > probably hit the same problem. > > The setting of the page bit is also going to be a problem but you may be > able to lazily move pages to the EReclaimable list in the same way > unevictable pages are handled. > > > It's very important in recent mobile systems because page reclaim/writeback is very critical > > of application latency. Of course, it could affect normal desktop, too. > > With it, we can calculate fast-available pages more exactly with NR_FREE_PAGES + NR_ERECLAIMABLE_PAGES, > > for example. If it's below threshold we defined, we could trigger 1st level notification > > if we really need prototying low memory notification. > > > > If PG_reclaim pages are on this list, then that calculation will not be > helpful. > > > We may change madvise(DONTNEED) implementation instead of zapping page immediately. 
> > If memory pressure doesn't happen, pages are in memory so we can avoid so many minor fault.
> > Of course, we can discard instead of swap out if system memory pressure happens.
> > We might implement it madvise(VOLATILE) instead of DONTNEED, but anyway it's off-topic in this thread.
> >
> > As a another example, we can implement CFLRU(Clean-First LRU) which reclaims unmapped-clean cache page firstly.
>
> That alters ageing of pages significantly. It means that workloads that
> are using read heavily will have their pages discarded first.

Hi Mel,

Sorry, I only noticed this thread today. The key issue is that we need to
balance page cache against mapped file pages. AFAIK, in the latest
kernel, page cache gets a higher priority than mapped file pages because it
is easily activated and promoted to the active list. For example, when
an application reads some data twice at the same offset, mark_page_accessed
is called twice and the page is activated. However, when an
application accesses a mapped file page twice, the page stays on the inactive
list and only its accessed bit is set. Only when we try to free pages is
this page given a chance to stay on the inactive list. That is unfair to
mapped file pages. In old kernels, such as 2.6.18, a mapped file page is
treated like an anonymous page, which has a higher priority. Meanwhile,
most developers think that there is no difference between page
cache and mapped file pages. So IMHO we need to reduce the priority of
page cache, or at least we need to count accesses to mapped file
pages correctly. As discussed in this thread [1], we hit this problem in
our production system.

1. http://www.spinics.net/lists/linux-mm/msg34642.html

Regards,
Zheng
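The asymmetry described in the message above can be put into a small user-space model. This is a simplified sketch that follows the description in the message, not the exact logic in mm/swap.c or mm/vmscan.c; `struct lru_page` and the three helper functions are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state for a model of file-page ageing. */
struct lru_page {
	bool active;      /* page sits on the active list */
	bool referenced;  /* software referenced bit */
	bool pte_young;   /* hardware accessed bit in a mapping's pte */
};

/* read()/write() path: the second touch promotes the page, which is
 * the two-step behavior attributed to mark_page_accessed(). */
static void touch_via_read(struct lru_page *page)
{
	assert(page != NULL);
	if (!page->active && page->referenced) {
		page->active = true;
		page->referenced = false;
	} else {
		page->referenced = true;
	}
}

/* mmap path: the CPU only sets the accessed bit, no list movement. */
static void touch_via_mmap(struct lru_page *page)
{
	assert(page != NULL);
	page->pte_young = true;
}

/* Reclaim scanning an inactive page: returns true if the page is kept. */
static bool inactive_scan_keeps(struct lru_page *page)
{
	assert(page != NULL);
	if (page->pte_young) {
		page->pte_young = false; /* one more round on the inactive list */
		return true;
	}
	return false;
}
```

In this model two read() calls are enough to activate a page, while any number of accesses through a mapping collapse into a single accessed bit that only buys the page one extra pass over the inactive list.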
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760148Ab2FUT1d (ORCPT ); Thu, 21 Jun 2012 15:27:33 -0400 Received: from e38.co.us.ibm.com ([32.97.110.159]:52566 "EHLO e38.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760044Ab2FUT1c (ORCPT ); Thu, 21 Jun 2012 15:27:32 -0400 Message-ID: <4FE37434.808@linaro.org> Date: Thu, 21 Jun 2012 12:21:24 -0700 From: John Stultz User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1 MIME-Version: 1.0 To: Minchan Kim CC: "linux-mm@kvack.org" , LKML , Rik van Riel , Mel Gorman , KAMEZAWA Hiroyuki , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , Pekka Enberg , Wu Fengguang , Hugh Dickins Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> In-Reply-To: <4FE012CD.6010605@kernel.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Scanned: Fidelis XPS MAILER x-cbid: 12062119-5518-0000-0000-00000565D551 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2012 10:49 PM, Minchan Kim wrote: > Hi everybody! > > Recently, there are some efforts to handle system memory pressure. > > 1) low memory notification - [1] > 2) fallocate(VOLATILE) - [2] > 3) fadvise(NOREUSE) - [3] > > For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable". > Reclaimable LRU list includes _easy_ reclaimable pages. > For example, easy reclaimable pages are following as. > > 1. invalidated but remained LRU list. > 2. pageout pages for reclaim(PG_reclaim pages) > 3. fadvise(NOREUSE) > 4.
fallocate(VOLATILE)
>
> Their pages shouldn't stir normal LRU list and compaction might not migrate them, even.
> Reclaimer can reclaim Ereclaimable pages before normal lru list and will avoid unnecessary
> swapout in anon pages in easy-reclaimable LRU list.

I was hoping there would be further comment on this by more core VM
devs, but so far things have been quiet (is everyone on vacation?).

Overall this seems reasonable for the volatile ranges functionality.
The one downside is that dealing with the ranges on a per-page basis
can make marking and unmarking larger ranges as volatile fairly
expensive. In my tests with my last patchset, marking and unmarking a
1MB range was over 75x slower (~1.5ms) when we deactivate and activate
all of the pages, compared with just inserting the volatile range into
an interval tree and purging via the shrinker (~20us). Granted, my
initial approach is somewhat naive, and some pagevec batching has
improved things three-fold (down to ~500us), but I'm still ~25x slower
when iterating over all the pages.

There are surely further improvements to be made, but this added cost
worries me, as users are unlikely to generously volunteer up memory to
the kernel as volatile if doing so frequently adds significant overhead.

This makes me wonder if having something like an early-shrinker which
gets called prior to shrinking the lrus might be a better approach for
volatile ranges. It would still be numa-unaware, but would keep the
overhead very light for both volatile users and non-users.

Even so, I'd be interested in seeing more about your approach, in the
hopes that it might not be as costly as my initial attempt. Do you have
any plans to start prototyping this?
thanks -john From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1761345Ab2FVG5K (ORCPT ); Fri, 22 Jun 2012 02:57:10 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:61574 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760678Ab2FVG5J (ORCPT ); Fri, 22 Jun 2012 02:57:09 -0400 X-AuditID: 9c930197-b7b87ae000000e4d-cc-4fe41740ebdc Message-ID: <4FE41752.8050305@kernel.org> Date: Fri, 22 Jun 2012 15:57:22 +0900 From: Minchan Kim User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1 MIME-Version: 1.0 Newsgroups: gmane.linux.kernel,gmane.linux.kernel.mm To: John Stultz CC: "linux-mm@kvack.org" , LKML , Rik van Riel , Mel Gorman , KAMEZAWA Hiroyuki , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , Pekka Enberg , Wu Fengguang , Hugh Dickins Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> <4FE37434.808@linaro.org> In-Reply-To: <4FE37434.808@linaro.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi John, On 06/22/2012 04:21 AM, John Stultz wrote: > On 06/18/2012 10:49 PM, Minchan Kim wrote: >> Hi everybody! >> >> Recently, there are some efforts to handle system memory pressure. >> >> 1) low memory notification - [1] >> 2) fallocate(VOLATILE) - [2] >> 3) fadvise(NOREUSE) - [3] >> >> For them, I would like to add new LRU list, aka "Ereclaimable" which >> is opposite of "unevictable". >> Reclaimable LRU list includes _easy_ reclaimable pages. >> For example, easy reclaimable pages are following as. >> >> 1. invalidated but remained LRU list. >> 2. pageout pages for reclaim(PG_reclaim pages) >> 3. fadvise(NOREUSE) >> 4. 
fallocate(VOLATILE) >> >> Their pages shouldn't stir normal LRU list and compaction might not >> migrate them, even. >> Reclaimer can reclaim Ereclaimable pages before normal lru list and >> will avoid unnecessary >> swapout in anon pages in easy-reclaimable LRU list. > > I was hoping there would be further comment on this by more core VM > devs, but so far things have been quiet (is everyone on vacation?). At least, there are no dissent comment until now. Let be a positive. :) > > Overall this seems reasonable for the volatile ranges functionality. > The one down-side being that dealing with the ranges on a per-page basis > can make marking and unmarking larger ranges as volatile fairly > expensive. In my tests with my last patchset, it was over 75x slower > (~1.5ms) marking and umarking a 1meg range when we deactivate and > activate all of the pages, instead of just inserting the volatile range > into an interval tree and purge via the shrinker (~20us). Granted, my > initial approach is somewhat naive, and some pagevec batching has > improved things three-fold (down to ~500us) , but I'm still ~25x slower > when iterating over all the pages. > > There's surely further improvements to be made, but this added cost > worries me, as users are unlikely to generously volunteer up memory to > the kernel as volatile if doing so frequently adds significant overhead. > > This makes me wonder if having something like an early-shrinker which > gets called prior to shrinking the lrus might be a better approach for > volatile ranges. It would still be numa-unaware, but would keep the > overhead very light to both volatile users and non users. How about doing it in background? In your process context, you can schedule your work to workqueue and when work is executed, you can move the pages into lru list you want. Just an idea. > > Even so, I'd be interested in seeing more about your approach, in the > hopes that it might not be as costly as my initial attempt. 
Do you have > any plans to start prototyping this? I will wait response a few day and if anyone doesn't raise critical problems, will start. But please keep in mind.I guess it's never trivial so you shouldn't depend on my schedule. Thanks. > > thanks > -john > -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757226Ab2FWErr (ORCPT ); Sat, 23 Jun 2012 00:47:47 -0400 Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:60568 "EHLO fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755455Ab2FWErq (ORCPT ); Sat, 23 Jun 2012 00:47:46 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4FE549E8.2050905@jp.fujitsu.com> Date: Sat, 23 Jun 2012 13:45:28 +0900 From: Kamezawa Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:13.0) Gecko/20120614 Thunderbird/13.0.1 MIME-Version: 1.0 To: Minchan Kim CC: John Stultz , "linux-mm@kvack.org" , LKML , Rik van Riel , Mel Gorman , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , Pekka Enberg , Wu Fengguang , Hugh Dickins Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> <4FE37434.808@linaro.org> <4FE41752.8050305@kernel.org> In-Reply-To: <4FE41752.8050305@kernel.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/06/22 15:57), Minchan Kim wrote: > Hi John, > > On 06/22/2012 04:21 AM, John Stultz wrote: > >> On 06/18/2012 10:49 PM, Minchan Kim wrote: >>> Hi everybody! >>> >>> Recently, there are some efforts to handle system memory pressure. >>> >>> 1) low memory notification - [1] >>> 2) fallocate(VOLATILE) - [2] >>> 3) fadvise(NOREUSE) - [3] >>> >>> For them, I would like to add new LRU list, aka "Ereclaimable" which >>> is opposite of "unevictable". 
>>> Reclaimable LRU list includes _easy_ reclaimable pages. >>> For example, easy reclaimable pages are following as. >>> >>> 1. invalidated but remained LRU list. >>> 2. pageout pages for reclaim(PG_reclaim pages) >>> 3. fadvise(NOREUSE) >>> 4. fallocate(VOLATILE) >>> >>> Their pages shouldn't stir normal LRU list and compaction might not >>> migrate them, even. >>> Reclaimer can reclaim Ereclaimable pages before normal lru list and >>> will avoid unnecessary >>> swapout in anon pages in easy-reclaimable LRU list. >> >> I was hoping there would be further comment on this by more core VM >> devs, but so far things have been quiet (is everyone on vacation?). > > > At least, there are no dissent comment until now. > Let be a positive. :) I think this is interesting approach. Major concern is how to guarantee EReclaimable pages are really EReclaimable...Do you have any idea ? madviced pages are really EReclaimable ? A (very) small concern is will you use one more page-flags for this ? ;) Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755294Ab2FWPxi (ORCPT ); Sat, 23 Jun 2012 11:53:38 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53544 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754157Ab2FWPxh (ORCPT ); Sat, 23 Jun 2012 11:53:37 -0400 Message-ID: <4FE5E66C.6080309@redhat.com> Date: Sat, 23 Jun 2012 11:53:16 -0400 From: Rik van Riel User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1 MIME-Version: 1.0 To: Kamezawa Hiroyuki CC: Minchan Kim , John Stultz , "linux-mm@kvack.org" , LKML , Mel Gorman , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , Pekka Enberg , Wu Fengguang , Hugh Dickins Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> <4FE37434.808@linaro.org> <4FE41752.8050305@kernel.org> 
<4FE549E8.2050905@jp.fujitsu.com> In-Reply-To: <4FE549E8.2050905@jp.fujitsu.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/23/2012 12:45 AM, Kamezawa Hiroyuki wrote: > I think this is interesting approach. Major concern is how to guarantee > EReclaimable > pages are really EReclaimable...Do you have any idea ? madviced pages > are really EReclaimable ? I suspect the EReclaimable pages can only be clean page cache pages that are not mapped by any processes. Once somebody tries to use the page, mark_page_accessed will move it to another list. > A (very) small concern is will you use one more page-flags for this ? ;) This could be an issue on a 32 bit system, true. -- All rights reversed From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755710Ab2FXLJx (ORCPT ); Sun, 24 Jun 2012 07:09:53 -0400 Received: from mail-gg0-f174.google.com ([209.85.161.174]:36597 "EHLO mail-gg0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753819Ab2FXLJv (ORCPT ); Sun, 24 Jun 2012 07:09:51 -0400 MIME-Version: 1.0 In-Reply-To: <4FE5E66C.6080309@redhat.com> References: <4FE012CD.6010605@kernel.org> <4FE37434.808@linaro.org> <4FE41752.8050305@kernel.org> <4FE549E8.2050905@jp.fujitsu.com> <4FE5E66C.6080309@redhat.com> From: KOSAKI Motohiro Date: Sun, 24 Jun 2012 07:09:30 -0400 Message-ID: Subject: Re: RFC: Easy-Reclaimable LRU list To: Rik van Riel Cc: Kamezawa Hiroyuki , Minchan Kim , John Stultz , "linux-mm@kvack.org" , LKML , Mel Gorman , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , Pekka Enberg , Wu Fengguang , Hugh Dickins Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Jun 23, 2012 at 11:53 AM, Rik van Riel 
wrote: > On 06/23/2012 12:45 AM, Kamezawa Hiroyuki wrote: > >> I think this is interesting approach. Major concern is how to guarantee >> EReclaimable >> pages are really EReclaimable...Do you have any idea ? madviced pages >> are really EReclaimable ? > > I suspect the EReclaimable pages can only be clean page > cache pages that are not mapped by any processes. > > Once somebody tries to use the page, mark_page_accessed > will move it to another list. 100% agree. >> A (very) small concern is will you use one more page-flags for this ? ;) > > This could be an issue on a 32 bit system, true. Do we really need SwapBacked bit? Actually swap-backed is per-superblock attribute and don't change dynamically (i.e. no race happen). thus this bit might be able to move into page->mapping or page->mapping->host. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752215Ab2FYAOo (ORCPT ); Sun, 24 Jun 2012 20:14:44 -0400 Received: from LGEMRELSE1Q.lge.com ([156.147.1.111]:63988 "EHLO LGEMRELSE1Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751005Ab2FYAOn (ORCPT ); Sun, 24 Jun 2012 20:14:43 -0400 X-AuditID: 9c93016f-b7cbdae0000024ac-17-4fe7ad6ffbfb Message-ID: <4FE7AD8A.2080508@kernel.org> Date: Mon, 25 Jun 2012 09:15:06 +0900 From: Minchan Kim User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1 MIME-Version: 1.0 Newsgroups: gmane.linux.kernel.mm,gmane.linux.kernel To: Kamezawa Hiroyuki CC: John Stultz , "linux-mm@kvack.org" , LKML , Rik van Riel , Mel Gorman , KOSAKI Motohiro , Johannes Weiner , Andrea Arcangeli , Andrew Morton , Anton Vorontsov , Pekka Enberg , Wu Fengguang , Hugh Dickins Subject: Re: RFC: Easy-Reclaimable LRU list References: <4FE012CD.6010605@kernel.org> <4FE37434.808@linaro.org> <4FE41752.8050305@kernel.org> <4FE549E8.2050905@jp.fujitsu.com> In-Reply-To: <4FE549E8.2050905@jp.fujitsu.com> Content-Type: text/plain; 
charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Kame, On 06/23/2012 01:45 PM, Kamezawa Hiroyuki wrote: > (2012/06/22 15:57), Minchan Kim wrote: >> Hi John, >> >> On 06/22/2012 04:21 AM, John Stultz wrote: >> >>> On 06/18/2012 10:49 PM, Minchan Kim wrote: >>>> Hi everybody! >>>> >>>> Recently, there are some efforts to handle system memory pressure. >>>> >>>> 1) low memory notification - [1] >>>> 2) fallocate(VOLATILE) - [2] >>>> 3) fadvise(NOREUSE) - [3] >>>> >>>> For them, I would like to add new LRU list, aka "Ereclaimable" which >>>> is opposite of "unevictable". >>>> Reclaimable LRU list includes _easy_ reclaimable pages. >>>> For example, easy reclaimable pages are following as. >>>> >>>> 1. invalidated but remained LRU list. >>>> 2. pageout pages for reclaim(PG_reclaim pages) >>>> 3. fadvise(NOREUSE) >>>> 4. fallocate(VOLATILE) >>>> >>>> Their pages shouldn't stir normal LRU list and compaction might not >>>> migrate them, even. >>>> Reclaimer can reclaim Ereclaimable pages before normal lru list and >>>> will avoid unnecessary >>>> swapout in anon pages in easy-reclaimable LRU list. >>> >>> I was hoping there would be further comment on this by more core VM >>> devs, but so far things have been quiet (is everyone on vacation?). >> >> >> At least, there are no dissent comment until now. >> Let be a positive. :) > > I think this is interesting approach. Major concern is how to guarantee > EReclaimable > pages are really EReclaimable...Do you have any idea ? madviced pages > are really > EReclaimable ? I would like to select just discardable pages. 1. unmapped file page 2. PG_reclaimed page - (that pages would have no mapped and a candidate for reclaim ASAP) 3. fallocate(VOLATILE) - (We can just discard them without swapout) 4. 
madvise(MADV_DONTNEED)/fadvise(NOREUSE) - (These could be more difficult than (1,2,3), but they are still very likely to be easier to reclaim than others.)

>
> A (very) small concern is will you use one more page-flags for this ? ;)

Maybe, and it could be a serious problem on 32 bit machines.
I haven't dived into that yet, but I guess we can reuse the PG_reclaim bit.
PG_reclaim is always used together with !PageActive, and the Ereclaimable LRU list
doesn't have an active list, so we can change it as follows:

- #define PG_reclaim
+ #define PG_Ereclaim

SetPageReclaim(page)
{
	/* old PG_reclaim meaning: both bits set */
	page->flags |= (PG_Ereclaim|PG_active);
}

TestPageReclaim(page)
{
	/* bitwise AND against the full mask, not logical && */
	if ((page->flags & (PG_Ereclaim|PG_active)) == (PG_Ereclaim|PG_active))
		return true;
	return false;
}

SetPageEreclaim(page)
{
	page->flags |= PG_Ereclaim;
}

Thanks for the comment, Kame.

-- 
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754678Ab2FYIs5 (ORCPT ); Mon, 25 Jun 2012 04:48:57 -0400
Received: from mx2.parallels.com ([64.131.90.16]:60571 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753178Ab2FYIs4 (ORCPT ); Mon, 25 Jun 2012 04:48:56 -0400
Message-ID: <4FE82555.2010704@parallels.com>
Date: Mon, 25 Jun 2012 12:46:13 +0400
From: Glauber Costa
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120605 Thunderbird/13.0
MIME-Version: 1.0
To: Minchan Kim
CC: "linux-mm@kvack.org", LKML, Rik van Riel, Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli, Andrew Morton, Anton Vorontsov, John Stultz, Pekka Enberg, Wu Fengguang, Hugh Dickins
Subject: Re: RFC: Easy-Reclaimable LRU list
References: <4FE012CD.6010605@kernel.org>
In-Reply-To: <4FE012CD.6010605@kernel.org>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/19/2012 09:49 AM, Minchan Kim
wrote:
> Hi everybody!
>
> Recently, there are some efforts to handle system memory pressure.
>
> 1) low memory notification - [1]
> 2) fallocate(VOLATILE) - [2]
> 3) fadvise(NOREUSE) - [3]
>
> For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable".
> Reclaimable LRU list includes _easy_ reclaimable pages.
> For example, easy reclaimable pages are following as.
>
> 1. invalidated but remained LRU list.
> 2. pageout pages for reclaim(PG_reclaim pages)
> 3. fadvise(NOREUSE)
> 4. fallocate(VOLATILE)
>
> Their pages shouldn't stir normal LRU list and compaction might not migrate them, even.

What about other things that move memory, like CMA?

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755558Ab2FYKYm (ORCPT ); Mon, 25 Jun 2012 06:24:42 -0400
Received: from cantor2.suse.de ([195.135.220.15]:51665 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753368Ab2FYKYl (ORCPT ); Mon, 25 Jun 2012 06:24:41 -0400
Date: Mon, 25 Jun 2012 11:24:35 +0100
From: Mel Gorman
To: Minchan Kim
Cc: "linux-mm@kvack.org", LKML, Rik van Riel, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli, Andrew Morton, Anton Vorontsov, John Stultz, Pekka Enberg, Wu Fengguang, Hugh Dickins
Subject: Re: RFC: Easy-Reclaimable LRU list
Message-ID: <20120625102435.GD8271@suse.de>
References: <4FE012CD.6010605@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <4FE012CD.6010605@kernel.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 19, 2012 at 02:49:01PM +0900, Minchan Kim wrote:
> Hi everybody!
>
> Recently, there are some efforts to handle system memory pressure.
>
> 1) low memory notification - [1]
> 2) fallocate(VOLATILE) - [2]
> 3) fadvise(NOREUSE) - [3]
>
> For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable".
> Reclaimable LRU list includes _easy_ reclaimable pages.
> For example, easy reclaimable pages are following as.
>
> 1. invalidated but remained LRU list.
> 2. pageout pages for reclaim(PG_reclaim pages)
> 3. fadvise(NOREUSE)
> 4. fallocate(VOLATILE)
>
> Their pages shouldn't stir normal LRU list and compaction might not migrate them, even.

Why would compaction not migrate them? We might still want to migrate
NOREUSE or VOLATILE pages.

> Reclaimer can reclaim Ereclaimable pages before normal lru list and will avoid unnecessary
> swapout in anon pages in easy-reclaimable LRU list.
> It also can make admin measure how many we have available pages at the moment without latency.

That's not true for PG_reclaim pages as those pages cannot be discarded
until writeback completes.

One reason why I tried moving PG_reclaim pages to a separate list was
to avoid excessive scanning when writing back to slow devices. If those
pages were moved to an "easy-reclaimable" LRU list then the value would
be reduced as scanning would still occur. It might make it worse because
the whole Ereclaimable list would be scanned for pages that cannot be
reclaimed at all before moving to another LRU list.

This separate list does not exist today because it required a page bit to
implement and I did not want it to be a 64-bit only feature. You will
probably hit the same problem.

The setting of the page bit is also going to be a problem but you may be
able to lazily move pages to the EReclaimable list in the same way
unevictable pages are handled.

> It's very important in recent mobile systems because page reclaim/writeback is very critical
> of application latency. Of course, it could affect normal desktop, too.
> With it, we can calculate fast-available pages more exactly with NR_FREE_PAGES + NR_ERECLAIMABLE_PAGES,
> for example. If it's below threshold we defined, we could trigger 1st level notification
> if we really need prototyping low memory notification.
>

If PG_reclaim pages are on this list, then that calculation will not be
helpful.

> We may change madvise(DONTNEED) implementation instead of zapping page immediately.
> If memory pressure doesn't happen, pages are in memory so we can avoid so many minor fault.
> Of course, we can discard instead of swap out if system memory pressure happens.
> We might implement it madvise(VOLATILE) instead of DONTNEED, but anyway it's off-topic in this thread.
>
> As another example, we can implement CFLRU(Clean-First LRU) which reclaims unmapped-clean cache page firstly.

That alters ageing of pages significantly. It means that workloads that
are using read heavily will have their pages discarded first.

> The rationale is that in non-rotation device, read/write cost is much asymmetric.

While this is true, that does not justify throwing away unmapped clean
page cache first every time.

> Read is very fast while write is very slow so it would be a gain while we can avoid writeback of dirty pages
> if possible although we need several reads. It can be implemented easily with Ereclaimable pages, too.
>
> Anyway, it's just a brain-storming phase and never implemented yet but decide posting before it's too late.
> I hope listen others opinion before get into the code.
>

Care is needed. I think you'll only be able to use this list for
NOREUSE, VOLATILE and invalidated pages. If you add PG_reclaim it will not be
"easily-reclaimable" and if you add clean unmapped pages then there will
be regressions in workloads that are read-intensive.
-- 
Mel Gorman
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758060Ab2FZAMc (ORCPT ); Mon, 25 Jun 2012 20:12:32 -0400
Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:48570 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757958Ab2FZAMa (ORCPT ); Mon, 25 Jun 2012 20:12:30 -0400
X-AuditID: 9c930197-b7c94ae0000037ff-e9-4fe8fe6d34d3
Message-ID: <4FE8FE70.6050107@kernel.org>
Date: Tue, 26 Jun 2012 09:12:32 +0900
From: Minchan Kim
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1
MIME-Version: 1.0
Newsgroups: gmane.linux.kernel.mm,gmane.linux.kernel
To: Glauber Costa
CC: "linux-mm@kvack.org", LKML, Rik van Riel, Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli, Andrew Morton, Anton Vorontsov, John Stultz, Pekka Enberg, Wu Fengguang, Hugh Dickins
Subject: Re: RFC: Easy-Reclaimable LRU list
References: <4FE012CD.6010605@kernel.org> <4FE82555.2010704@parallels.com>
In-Reply-To: <4FE82555.2010704@parallels.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Brightmail-Tracker: AAAAAA==
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/25/2012 05:46 PM, Glauber Costa wrote:
> On 06/19/2012 09:49 AM, Minchan Kim wrote:
>> Hi everybody!
>>
>> Recently, there are some efforts to handle system memory pressure.
>>
>> 1) low memory notification - [1]
>> 2) fallocate(VOLATILE) - [2]
>> 3) fadvise(NOREUSE) - [3]
>>
>> For them, I would like to add new LRU list, aka "Ereclaimable" which
>> is opposite of "unevictable".
>> Reclaimable LRU list includes _easy_ reclaimable pages.
>> For example, easy reclaimable pages are following as.
>>
>> 1. invalidated but remained LRU list.
>> 2. pageout pages for reclaim(PG_reclaim pages)
>> 3. fadvise(NOREUSE)
>> 4. 
>> fallocate(VOLATILE)
>>
>> Their pages shouldn't stir normal LRU list and compaction might not
>> migrate them, even.
> What about other things moving memory like CMA ?

Sorry, I don't understand your point.
Can you elaborate a bit more?

-- 
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758126Ab2FZA07 (ORCPT ); Mon, 25 Jun 2012 20:26:59 -0400
Received: from LGEMRELSE1Q.lge.com ([156.147.1.111]:63966 "EHLO LGEMRELSE1Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756821Ab2FZA06 (ORCPT ); Mon, 25 Jun 2012 20:26:58 -0400
X-AuditID: 9c93016f-b7cbdae0000024ac-b7-4fe901cff931
Message-ID: <4FE901D1.9090400@kernel.org>
Date: Tue, 26 Jun 2012 09:26:57 +0900
From: Minchan Kim
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1
MIME-Version: 1.0
Newsgroups: gmane.linux.kernel.mm,gmane.linux.kernel
To: Mel Gorman
CC: "linux-mm@kvack.org", LKML, Rik van Riel, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli, Andrew Morton, Anton Vorontsov, John Stultz, Pekka Enberg, Wu Fengguang, Hugh Dickins
Subject: Re: RFC: Easy-Reclaimable LRU list
References: <4FE012CD.6010605@kernel.org> <20120625102435.GD8271@suse.de>
In-Reply-To: <20120625102435.GD8271@suse.de>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
X-Brightmail-Tracker: AAAAAA==
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/25/2012 07:24 PM, Mel Gorman wrote:
> On Tue, Jun 19, 2012 at 02:49:01PM +0900, Minchan Kim wrote:
>> Hi everybody!
>>
>> Recently, there are some efforts to handle system memory pressure.
>>
>> 1) low memory notification - [1]
>> 2) fallocate(VOLATILE) - [2]
>> 3) fadvise(NOREUSE) - [3]
>>
>> For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable".
>> Reclaimable LRU list includes _easy_ reclaimable pages.
>> For example, easy reclaimable pages are following as.
>>
>> 1. invalidated but remained LRU list.
>> 2. pageout pages for reclaim(PG_reclaim pages)
>> 3. fadvise(NOREUSE)
>> 4. fallocate(VOLATILE)
>>
>> Their pages shouldn't stir normal LRU list and compaction might not migrate them, even.
>
> Why would compaction not migrate them? We might still want to migrate
> NOREUSE or VOLATILE pages.

It might.

>
>> Reclaimer can reclaim Ereclaimable pages before normal lru list and will avoid unnecessary
>> swapout in anon pages in easy-reclaimable LRU list.
>> It also can make admin measure how many we have available pages at the moment without latency.
>
> That's not true for PG_reclaim pages as those pages cannot be discarded
> until writeback completes.
>
> One reason why I tried moving PG_reclaim pages to a separate list was
> to avoid excessive scanning when writing back to slow devices. If those
> pages were moved to an "easy-reclaimable" LRU list then the value would
> be reduced as scanning would still occur. It might make it worse because
> the whole Ereclaimable list would be scanned for pages that cannot be
> reclaimed at all before moving to another LRU list.

I should have written more clearly. I meant something like the following:

end_page_writeback(struct page *page)
{
	if (PageReclaim(page))
		move_ereclaim_lru_list(page);
}

So the Ereclaimable LRU list would only contain discardable pages.

>
> This separate list does not exist today because it required a page bit to
> implement and I did not want it to be a 64-bit only feature. You will
> probably hit the same problem.

True. Others have already pointed it out in this thread, and I posted an idea.
Copy/paste:

"
Maybe, and it could be a serious problem on 32 bit machines.
I haven't dived into that yet, but I guess we can reuse the PG_reclaim bit.
PG_reclaim is always used together with !PageActive, and the Ereclaimable LRU list
doesn't have an active list,
so we can change it as follows:

- #define PG_reclaim
+ #define PG_Ereclaim

SetPageReclaim(page)
{
	/* old PG_reclaim meaning: both bits set */
	page->flags |= (PG_Ereclaim|PG_active);
}

TestPageReclaim(page)
{
	/* bitwise AND against the full mask, not logical && */
	if ((page->flags & (PG_Ereclaim|PG_active)) == (PG_Ereclaim|PG_active))
		return true;
	return false;
}

SetPageEreclaim(page)
{
	page->flags |= PG_Ereclaim;
}
"

>
> The setting of the page bit is also going to be a problem but you may be
> able to lazily move pages to the EReclaimable list in the same way
> unevictable pages are handled.

First of all, I hadn't considered lazy moving like unevictable pages.
We can move VOLATILE/NOREUSE pages into the EReclaimable LRU list in the background
by using a workqueue. Please tell me the scenario where we would need lazy moving.

>
>> It's very important in recent mobile systems because page reclaim/writeback is very critical
>> of application latency. Of course, it could affect normal desktop, too.
>> With it, we can calculate fast-available pages more exactly with NR_FREE_PAGES + NR_ERECLAIMABLE_PAGES,
>> for example. If it's below threshold we defined, we could trigger 1st level notification
>> if we really need prototyping low memory notification.
>>
>
> If PG_reclaim pages are on this list, then that calculation will not be
> helpful.

PG_reclaim pages would not be on the Ereclaimable LRU list until their writeback ends, as I mentioned above.

>
>> We may change madvise(DONTNEED) implementation instead of zapping page immediately.
>> If memory pressure doesn't happen, pages are in memory so we can avoid so many minor fault.
>> Of course, we can discard instead of swap out if system memory pressure happens.
>> We might implement it madvise(VOLATILE) instead of DONTNEED, but anyway it's off-topic in this thread.
>>
>> As another example, we can implement CFLRU(Clean-First LRU) which reclaims unmapped-clean cache page firstly.
>
> That alters ageing of pages significantly.
> It means that workloads that
> are using read heavily will have their pages discarded first.
>
>> The rationale is that in non-rotation device, read/write cost is much asymmetric.
>
> While this is true that does not justify throwing away unmapped clean
> page cache first every time.

That's true. That is the workload I am concerned about.
We need to balance unmapped/mapped pages, so sometimes some mapped pages would be
moved into unevictable LRU list with unmapping all of their ptes.
I believe it could mitigate the problem, though not perfectly, I admit.
Maybe we need some knob for the admin to tune it.
Anyway, it's a big concern for me and something to test carefully for regressions.

>
>> Read is very fast while write is very slow so it would be a gain while we can avoid writeback of dirty pages
>> if possible although we need several reads. It can be implemented easily with Ereclaimable pages, too.
>>
>> Anyway, it's just a brain-storming phase and never implemented yet but decide posting before it's too late.
>> I hope listen others opinion before get into the code.
>>
>
> Care is needed. I think you'll only be able to use this list for
> NOREUSE, VOLATILE and invalidated pages. If you add PG_reclaim it will not be
> "easily-reclaimable" and if you add clean unmapped pages then there will
> be regressions in workloads that are read-intensive.
>

Thanks for the feedback, Mel.
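
The flag-reuse idea pasted earlier in this mail can be modelled in plain userspace C. What follows is only a minimal sketch under stated assumptions: the masks PGF_ACTIVE and PGF_ERECLAIM, struct toy_page and every toy_* helper are hypothetical names made up for illustration, and real kernel page flags are bit numbers handled with atomic bitops rather than masks OR-ed into a word.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical masks for this sketch only; they are not the kernel's
 * PG_active / PG_reclaim definitions.
 */
#define PGF_ACTIVE   (1UL << 0)
#define PGF_ERECLAIM (1UL << 1)
#define PGF_BOTH     (PGF_ACTIVE | PGF_ERECLAIM)

struct toy_page {
	unsigned long flags;
};

/* Old PG_reclaim meaning, encoded as "both bits set". */
static inline void toy_set_page_reclaim(struct toy_page *page)
{
	page->flags |= PGF_BOTH;
}

static inline bool toy_page_reclaim(const struct toy_page *page)
{
	/* Bitwise AND, then compare against the full two-bit mask. */
	return (page->flags & PGF_BOTH) == PGF_BOTH;
}

/* New meaning: Ereclaim bit set, active bit clear. */
static inline void toy_set_page_ereclaim(struct toy_page *page)
{
	page->flags = (page->flags & ~PGF_BOTH) | PGF_ERECLAIM;
}

static inline bool toy_page_ereclaim(const struct toy_page *page)
{
	return (page->flags & PGF_BOTH) == PGF_ERECLAIM;
}

/* An ordinary active page must now also have the Ereclaim bit clear. */
static inline bool toy_page_active(const struct toy_page *page)
{
	return (page->flags & PGF_BOTH) == PGF_ACTIVE;
}
```

One consequence the model makes visible: once "both bits set" carries the old reclaim meaning, any test for a plain active page has to look at both bits as well, so existing PageActive() users would probably need auditing.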
-- 
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758539Ab2FZIKA (ORCPT ); Tue, 26 Jun 2012 04:10:00 -0400
Received: from mx2.parallels.com ([64.131.90.16]:40538 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758513Ab2FZIJz (ORCPT ); Tue, 26 Jun 2012 04:09:55 -0400
Message-ID: <4FE96DAF.3050208@parallels.com>
Date: Tue, 26 Jun 2012 12:07:11 +0400
From: Glauber Costa
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120605 Thunderbird/13.0
MIME-Version: 1.0
To: Minchan Kim
CC: "linux-mm@kvack.org", LKML, Rik van Riel, Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli, Andrew Morton, Anton Vorontsov, John Stultz, Pekka Enberg, Wu Fengguang, Hugh Dickins
Subject: Re: RFC: Easy-Reclaimable LRU list
References: <4FE012CD.6010605@kernel.org> <4FE82555.2010704@parallels.com> <4FE8FE70.6050107@kernel.org>
In-Reply-To: <4FE8FE70.6050107@kernel.org>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
X-Originating-IP: [109.173.9.3]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/26/2012 04:12 AM, Minchan Kim wrote:
> On 06/25/2012 05:46 PM, Glauber Costa wrote:
>
>> On 06/19/2012 09:49 AM, Minchan Kim wrote:
>>> Hi everybody!
>>>
>>> Recently, there are some efforts to handle system memory pressure.
>>>
>>> 1) low memory notification - [1]
>>> 2) fallocate(VOLATILE) - [2]
>>> 3) fadvise(NOREUSE) - [3]
>>>
>>> For them, I would like to add new LRU list, aka "Ereclaimable" which
>>> is opposite of "unevictable".
>>> Reclaimable LRU list includes _easy_ reclaimable pages.
>>> For example, easy reclaimable pages are following as.
>>>
>>> 1. invalidated but remained LRU list.
>>> 2. pageout pages for reclaim(PG_reclaim pages)
>>> 3. fadvise(NOREUSE)
>>> 4. 
>>> fallocate(VOLATILE)
>>>
>>> Their pages shouldn't stir normal LRU list and compaction might not
>>> migrate them, even.
>> What about other things moving memory like CMA ?
>
>
> Sorry for not being able to understand your point.
> Can you elaborate a bit more?
>

Well, maybe I didn't =)

I was just wondering what exactly it is about compaction that troubles your
scheme, and whether such a restriction would also apply to memory movement
schemes like CMA.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755678Ab2GQPz1 (ORCPT ); Tue, 17 Jul 2012 11:55:27 -0400
Received: from mail-pb0-f46.google.com ([209.85.160.46]:44723 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751791Ab2GQPzZ (ORCPT ); Tue, 17 Jul 2012 11:55:25 -0400
Date: Wed, 18 Jul 2012 00:03:48 +0800
From: Zheng Liu
To: Mel Gorman
Cc: Minchan Kim, "linux-mm@kvack.org", LKML, Rik van Riel, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli, Andrew Morton, Anton Vorontsov, John Stultz, Pekka Enberg, Wu Fengguang, Hugh Dickins
Subject: Re: RFC: Easy-Reclaimable LRU list
Message-ID: <20120717160348.GA5441@gmail.com>
Mail-Followup-To: Mel Gorman, Minchan Kim, "linux-mm@kvack.org", LKML, Rik van Riel, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli, Andrew Morton, Anton Vorontsov, John Stultz, Pekka Enberg, Wu Fengguang, Hugh Dickins
References: <4FE012CD.6010605@kernel.org> <20120625102435.GD8271@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120625102435.GD8271@suse.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 25, 2012 at 11:24:35AM +0100, Mel Gorman wrote:
> On Tue, Jun 19, 2012 at 02:49:01PM +0900, Minchan Kim wrote:
> > Hi everybody!
> >
> > Recently, there are some efforts to handle system memory pressure.
> >
> > 1) low memory notification - [1]
> > 2) fallocate(VOLATILE) - [2]
> > 3) fadvise(NOREUSE) - [3]
> >
> > For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable".
> > Reclaimable LRU list includes _easy_ reclaimable pages.
> > For example, easy reclaimable pages are following as.
> >
> > 1. invalidated but remained LRU list.
> > 2. pageout pages for reclaim(PG_reclaim pages)
> > 3. fadvise(NOREUSE)
> > 4. fallocate(VOLATILE)
> >
> > Their pages shouldn't stir normal LRU list and compaction might not migrate them, even.
>
> Why would compaction not migrate them? We might still want to migrate
> NOREUSE or VOLATILE pages.
>
> > Reclaimer can reclaim Ereclaimable pages before normal lru list and will avoid unnecessary
> > swapout in anon pages in easy-reclaimable LRU list.
> > It also can make admin measure how many we have available pages at the moment without latency.
>
> That's not true for PG_reclaim pages as those pages cannot be discarded
> until writeback completes.
>
> One reason why I tried moving PG_reclaim pages to a separate list was
> to avoid excessive scanning when writing back to slow devices. If those
> pages were moved to an "easy-reclaimable" LRU list then the value would
> be reduced as scanning would still occur. It might make it worse because
> the whole Ereclaimable list would be scanned for pages that cannot be
> reclaimed at all before moving to another LRU list.
>
> This separate list does not exist today because it required a page bit to
> implement and I did not want it to be a 64-bit only feature. You will
> probably hit the same problem.
>
> The setting of the page bit is also going to be a problem but you may be
> able to lazily move pages to the EReclaimable list in the same way
> unevictable pages are handled.
>
> > It's very important in recent mobile systems because page reclaim/writeback is very critical
> > of application latency. Of course, it could affect normal desktop, too.
> > With it, we can calculate fast-available pages more exactly with NR_FREE_PAGES + NR_ERECLAIMABLE_PAGES,
> > for example. If it's below threshold we defined, we could trigger 1st level notification
> > if we really need prototyping low memory notification.
> >
>
> If PG_reclaim pages are on this list, then that calculation will not be
> helpful.
>
> > We may change madvise(DONTNEED) implementation instead of zapping page immediately.
> > If memory pressure doesn't happen, pages are in memory so we can avoid so many minor fault.
> > Of course, we can discard instead of swap out if system memory pressure happens.
> > We might implement it madvise(VOLATILE) instead of DONTNEED, but anyway it's off-topic in this thread.
> >
> > As another example, we can implement CFLRU(Clean-First LRU) which reclaims unmapped-clean cache page firstly.
>
> That alters ageing of pages significantly. It means that workloads that
> are using read heavily will have their pages discarded first.

Hi Mel,

Sorry, I only noticed this thread today. The key issue is that we need to
balance between page cache and mapped file pages. AFAIK, in the latest
kernel, page cache gets a higher priority than mapped file pages because it
is easier for it to be activated and promoted into the active list. For
example, when an application reads some data twice at the same offset,
mark_page_accessed will be called twice, and the page will be activated.
However, when the application accesses a mapped file page twice, the page
stays in the inactive list and only the accessed bit is set. Only when we
try to free pages will this page be given a chance to stay in the inactive
list. This is unfair to mapped file pages. In old kernels, such as 2.6.18,
a mapped file page is treated as an anonymous page, which
has a higher priority.
Meanwhile, most developers think that there is no difference between
page cache and mapped file pages. So IMHO we need to reduce the priority of
page cache, or at least we need to measure the access count of mapped file
pages correctly. As discussed in the thread at [1], we hit this problem in
our production system.

1. http://www.spinics.net/lists/linux-mm/msg34642.html

Regards,
Zheng
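
The asymmetry described in this message can be illustrated with a toy model. It is a sketch under assumptions, not kernel code: struct toy_page, toy_mark_page_accessed() and toy_mmap_touch() are hypothetical names, the two-touch rule follows the behaviour of mark_page_accessed as described above, and what reclaim later does with the accessed bit is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_list { TOY_INACTIVE, TOY_ACTIVE };

struct toy_page {
	enum toy_list list;   /* which LRU list the page sits on */
	bool referenced;      /* stands in for the page's referenced flag */
	bool pte_accessed;    /* stands in for the accessed bit in a PTE */
};

/*
 * read()-style access: the first touch marks the page referenced, the
 * second touch on an inactive, referenced page promotes it.
 */
static inline void toy_mark_page_accessed(struct toy_page *page)
{
	if (page->list == TOY_INACTIVE && page->referenced) {
		page->list = TOY_ACTIVE;
		page->referenced = false;
	} else if (!page->referenced) {
		page->referenced = true;
	}
}

/*
 * Access through a mapping: only the accessed bit is recorded; the page
 * does not move between lists at access time.
 */
static inline void toy_mmap_touch(struct toy_page *page)
{
	page->pte_accessed = true;
}
```

In this model two reads of the same page leave it on the active list, while two accesses through a mapping leave it on the inactive list with just the accessed bit set, which is the imbalance being reported.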