From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 6 Jan 2014 18:31:29 -0800
Subject: swap, compress, discard: what's in the future?
From: Luigi Semenzato
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org

I would like to know (and I apologize if there is an obvious answer)
if folks on this list have pointers to documents or discussions
regarding the long-term evolution of the Linux memory manager. I
realize there is plenty of shorter-term stuff to worry about, but a
long-term vision would be helpful---even more so if there is some
agreement.

My super-simple view is that when memory reclaim is possible there is
a cost attached to it, and the goal is to minimize the cost. The cost
for reclaiming a unit of memory of some kind is a function of various
parameters: the CPU cycles, the I/O bandwidth, and the latency, to
name the main components. This function can change a lot depending on
the load and in practice it may have to be grossly approximated, but
the concept is valid IMO.

For instance, the cost of compressing and decompressing RAM is mainly
CPU cycles. A user program (a browser, for instance :) may be caching
decompressed JPEGs into transcendent (discardable) memory, for quick
display.
In this case, almost certainly the decompressed JPEGs should
be discarded before memory is compressed, under the realistic
assumption that one JPEG decompression is cheaper than one LZO
compression/decompression. But there may be situations in which a lot
more work has gone into creating the application cache, and then it
makes sense to compress/decompress it rather than discard it. It may
be hard for the kernel to figure out how expensive it is to recreate
the application cache, so the application should tell it.

Of course, for a cache the cost needs to be multiplied by the
probability that the memory will be used again in the future. A good
part of the Linux VM is dedicated to estimating that probability, for
some kinds of memory. But I don't see simple hooks for describing
various costs such as the one I mentioned, and I wonder if this
paradigm makes sense in general, or if it is peculiar to Chrome OS.

Thanks!
... and Happy New Year

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 7 Jan 2014 12:01:48 +0900
From: Minchan Kim
Subject: Re: swap, compress, discard: what's in the future?
Message-ID: <20140107030148.GA24188@bbox>
To: Luigi Semenzato
Cc: linux-mm@kvack.org

Hello Luigi,

On Mon, Jan 06, 2014 at 06:31:29PM -0800, Luigi Semenzato wrote:
> I would like to know (and I apologize if there is an obvious answer)
> if folks on this list have pointers to documents or discussions
> regarding the long-term evolution of the Linux memory manager. I
> realize there is plenty of shorter-term stuff to worry about, but a
> long-term vision would be helpful---even more so if there is some
> agreement.
>
> My super-simple view is that when memory reclaim is possible there is
> a cost attached to it, and the goal is to minimize the cost. The cost
> for reclaiming a unit of memory of some kind is a function of various
> parameters: the CPU cycles, the I/O bandwidth, and the latency, to
> name the main components. This function can change a lot depending on
> the load and in practice it may have to be grossly approximated, but
> the concept is valid IMO.
>
> For instance, the cost of compressing and decompressing RAM is mainly
> CPU cycles. A user program (a browser, for instance :) may be caching
> decompressed JPEGs into transcendent (discardable) memory, for quick
> display. In this case, almost certainly the decompressed JPEGs should
> be discarded before memory is compressed, under the realistic
> assumption that one JPEG decompression is cheaper than one LZO
> compression/decompression. But there may be situations in which a lot
> more work has gone into creating the application cache, and then it
> makes sense to compress/decompress it rather than discard it. It may
> be hard for the kernel to figure out how expensive it is to recreate
> the application cache, so the application should tell it.

Agreed. It's very hard for the kernel to figure it out, so the VM should
depend on the user's hint.
And what you describe is exactly the use case for the volatile range
system call that I am proposing.

http://lwn.net/Articles/578761/

>
> Of course, for a cache the cost needs to be multiplied by the
> probability that the memory will be used again in the future. A good
> part of the Linux VM is dedicated to estimating that probability, for
> some kinds of memory. But I don't see simple hooks for describing
> various costs such as the one I mentioned, and I wonder if this
> paradigm makes sense in general, or if it is peculiar to Chrome OS.

Your statement makes sense to me, but unfortunately the current VM
doesn't consider everything you mentioned.
It is just based on page access recency via approximate LRU logic plus
some heuristics (e.g., mapped pages and VM_EXEC pages are more precious).

What makes it hard is just the complexity and overhead of implementation.
If someone has a nice idea for defining the parameters and implementing
this with small overhead, it would be very nice!

>
> Thanks!
> ... and Happy New Year

--
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <20140107030148.GA24188@bbox>
References: <20140107030148.GA24188@bbox>
Date: Tue, 7 Jan 2014 14:33:11 +0800
Subject: Re: swap, compress, discard: what's in the future?
From: Bob Liu
To: Minchan Kim
Cc: Luigi Semenzato, Linux-MM, Rik van Riel

On Tue, Jan 7, 2014 at 11:01 AM, Minchan Kim wrote:
> Hello Luigi,
>
> On Mon, Jan 06, 2014 at 06:31:29PM -0800, Luigi Semenzato wrote:
>> I would like to know (and I apologize if there is an obvious answer)
>> if folks on this list have pointers to documents or discussions
>> regarding the long-term evolution of the Linux memory manager. I
>> realize there is plenty of shorter-term stuff to worry about, but a
>> long-term vision would be helpful---even more so if there is some
>> agreement.
>>
>> My super-simple view is that when memory reclaim is possible there is
>> a cost attached to it, and the goal is to minimize the cost. The cost
>> for reclaiming a unit of memory of some kind is a function of various
>> parameters: the CPU cycles, the I/O bandwidth, and the latency, to
>> name the main components.
>> This function can change a lot depending on
>> the load and in practice it may have to be grossly approximated, but
>> the concept is valid IMO.
>>
>> For instance, the cost of compressing and decompressing RAM is mainly
>> CPU cycles. A user program (a browser, for instance :) may be caching
>> decompressed JPEGs into transcendent (discardable) memory, for quick
>> display. In this case, almost certainly the decompressed JPEGs should
>> be discarded before memory is compressed, under the realistic
>> assumption that one JPEG decompression is cheaper than one LZO
>> compression/decompression. But there may be situations in which a lot
>> more work has gone into creating the application cache, and then it
>> makes sense to compress/decompress it rather than discard it. It may
>> be hard for the kernel to figure out how expensive it is to recreate
>> the application cache, so the application should tell it.
>
> Agreed. It's very hard for the kernel to figure it out, so the VM should
> depend on the user's hint. And what you describe is exactly the use case
> for the volatile range system call that I am proposing.
>
> http://lwn.net/Articles/578761/
>
>>
>> Of course, for a cache the cost needs to be multiplied by the
>> probability that the memory will be used again in the future. A good
>> part of the Linux VM is dedicated to estimating that probability, for
>> some kinds of memory. But I don't see simple hooks for describing
>> various costs such as the one I mentioned, and I wonder if this
>> paradigm makes sense in general, or if it is peculiar to Chrome OS.
>
> Your statement makes sense to me, but unfortunately the current VM
> doesn't consider everything you mentioned.
> It is just based on page access recency via approximate LRU logic plus
> some heuristics (e.g., mapped pages and VM_EXEC pages are more precious).

It seems that the ARC page replacement algorithm in ZFS performs well
and is more intelligent.
http://en.wikipedia.org/wiki/Adaptive_replacement_cache
Is there any historical reason why Linux didn't implement something like
ARC as the page cache replacement algorithm?

> What makes it hard is just the complexity and overhead of implementation.
> If someone has a nice idea for defining the parameters and implementing
> this with small overhead, it would be very nice!
>

--
Regards,
--Bob

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 7 Jan 2014 16:13:03 +0900
From: Minchan Kim
Subject: Re: swap, compress, discard: what's in the future?
Message-ID: <20140107071303.GC24188@bbox>
References: <20140107030148.GA24188@bbox>
To: Bob Liu
Cc: Luigi Semenzato, Linux-MM, Rik van Riel

Hello Bob,

On Tue, Jan 07, 2014 at 02:33:11PM +0800, Bob Liu wrote:
> On Tue, Jan 7, 2014 at 11:01 AM, Minchan Kim wrote:
> > Hello Luigi,
> >
> > On Mon, Jan 06, 2014 at 06:31:29PM -0800, Luigi Semenzato wrote:
> >> I would like to know (and I apologize if there is an obvious answer)
> >> if folks on this list have pointers to documents or discussions
> >> regarding the long-term evolution of the Linux memory manager.
> >> I realize there is plenty of shorter-term stuff to worry about, but a
> >> long-term vision would be helpful---even more so if there is some
> >> agreement.
> >>
> >> My super-simple view is that when memory reclaim is possible there is
> >> a cost attached to it, and the goal is to minimize the cost. The cost
> >> for reclaiming a unit of memory of some kind is a function of various
> >> parameters: the CPU cycles, the I/O bandwidth, and the latency, to
> >> name the main components. This function can change a lot depending on
> >> the load and in practice it may have to be grossly approximated, but
> >> the concept is valid IMO.
> >>
> >> For instance, the cost of compressing and decompressing RAM is mainly
> >> CPU cycles. A user program (a browser, for instance :) may be caching
> >> decompressed JPEGs into transcendent (discardable) memory, for quick
> >> display. In this case, almost certainly the decompressed JPEGs should
> >> be discarded before memory is compressed, under the realistic
> >> assumption that one JPEG decompression is cheaper than one LZO
> >> compression/decompression. But there may be situations in which a lot
> >> more work has gone into creating the application cache, and then it
> >> makes sense to compress/decompress it rather than discard it. It may
> >> be hard for the kernel to figure out how expensive it is to recreate
> >> the application cache, so the application should tell it.
> >
> > Agreed. It's very hard for the kernel to figure it out, so the VM should
> > depend on the user's hint. And what you describe is exactly the use case
> > for the volatile range system call that I am proposing.
> >
> > http://lwn.net/Articles/578761/
> >
> >>
> >> Of course, for a cache the cost needs to be multiplied by the
> >> probability that the memory will be used again in the future. A good
> >> part of the Linux VM is dedicated to estimating that probability, for
> >> some kinds of memory.
> >> But I don't see simple hooks for describing
> >> various costs such as the one I mentioned, and I wonder if this
> >> paradigm makes sense in general, or if it is peculiar to Chrome OS.
> >
> > Your statement makes sense to me, but unfortunately the current VM
> > doesn't consider everything you mentioned.
> > It is just based on page access recency via approximate LRU logic plus
> > some heuristics (e.g., mapped pages and VM_EXEC pages are more precious).
>
> It seems that the ARC page replacement algorithm in ZFS performs well
> and is more intelligent.
> http://en.wikipedia.org/wiki/Adaptive_replacement_cache
> Is there any historical reason why Linux didn't implement something like
> ARC as the page cache replacement algorithm?

I guess the biggest reason was the patent?
Anyway, I think Rik and Peter looked at it at that time.

>
> > What makes it hard is just the complexity and overhead of implementation.
> > If someone has a nice idea for defining the parameters and implementing
> > this with small overhead, it would be very nice!
> >
>
> --
> Regards,
> --Bob

--
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <52CC04DD.3020603@redhat.com>
Date: Tue, 07 Jan 2014 08:45:01 -0500
From: Rik van Riel
Subject: Re: swap, compress, discard: what's in the future?
References: <20140107030148.GA24188@bbox>
To: Bob Liu, Minchan Kim
Cc: Luigi Semenzato, Linux-MM, hnaz@cmpxchg.org

On 01/07/2014 01:33 AM, Bob Liu wrote:
> On Tue, Jan 7, 2014 at 11:01 AM, Minchan Kim wrote:

>> Your statement makes sense to me, but unfortunately the current VM
>> doesn't consider everything you mentioned.
>> It is just based on page access recency via approximate LRU logic plus
>> some heuristics (e.g., mapped pages and VM_EXEC pages are more precious).
>
> It seems that the ARC page replacement algorithm in ZFS performs well
> and is more intelligent.
> http://en.wikipedia.org/wiki/Adaptive_replacement_cache
> Is there any historical reason why Linux didn't implement something like
> ARC as the page cache replacement algorithm?

ARC by itself was quickly superseded by CLOCK-Pro, which
looks like it would be even better.

Johannes introduces an algorithm with similar properties
in his "thrash based page cache replacement" patch series.

However, algorithms like ARC and CLOCK-Pro are best for
a cache that caches a large data set (much larger than
the cache size), and has to deal with large inter-reference
distances.

For anonymous memory, we are dealing with the opposite:
the total amount of anonymous memory is on the same
order of magnitude as the amount of RAM, and the
inter-reference distance will be smaller as a result.

--
All rights reversed

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <52CE5B58.8080203@oracle.com>
Date: Thu, 09 Jan 2014 16:18:32 +0800
From: Bob Liu
Subject: Re: swap, compress, discard: what's in the future?
References: <20140107030148.GA24188@bbox> <52CC04DD.3020603@redhat.com>
In-Reply-To: <52CC04DD.3020603@redhat.com>
To: Rik van Riel
Cc: Bob Liu, Minchan Kim, Luigi Semenzato, Linux-MM, hnaz@cmpxchg.org

On 01/07/2014 09:45 PM, Rik van Riel wrote:
> On 01/07/2014 01:33 AM, Bob Liu wrote:
>> On Tue, Jan 7, 2014 at 11:01 AM, Minchan Kim wrote:
>
>>> Your statement makes sense to me, but unfortunately the current VM
>>> doesn't consider everything you mentioned.
>>> It is just based on page access recency via approximate LRU logic plus
>>> some heuristics (e.g., mapped pages and VM_EXEC pages are more precious).
>>
>> It seems that the ARC page replacement algorithm in ZFS performs well
>> and is more intelligent.
>> http://en.wikipedia.org/wiki/Adaptive_replacement_cache
>> Is there any historical reason why Linux didn't implement something like
>> ARC as the page cache replacement algorithm?
>
> ARC by itself was quickly superseded by CLOCK-Pro, which
> looks like it would be even better.
>
> Johannes introduces an algorithm with similar properties
> in his "thrash based page cache replacement" patch series.
>

But it seems you and Peter already implemented CLOCK-Pro and CART
page cache replacement many years ago. Why were they not merged at
that time?

I found some information at http://linux-mm.org/AdvancedPageReplacement

Linux implementations:
    Rahul Iyer's implementation of CART, RahulIyerCART
    Rik van Riel's ClockProApproximation.
    Rik van Riel's proposal for the tracking of NonResidentPages,
        which is used by both his ClockProApproximation and by
        Peter Zijlstra's CART and Clock-pro implementations.
    Peter Zijlstra's CART PeterZCart
    Peter Zijlstra's Clock-Pro PeterZClockPro2

Thanks,
-Bob

> However, algorithms like ARC and CLOCK-Pro are best for
> a cache that caches a large data set (much larger than
> the cache size), and has to deal with large inter-reference
> distances.
>
> For anonymous memory, we are dealing with the opposite:
> the total amount of anonymous memory is on the same
> order of magnitude as the amount of RAM, and the
> inter-reference distance will be smaller as a result.
>

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <52CED13E.50700@redhat.com>
Date: Thu, 09 Jan 2014 11:41:34 -0500
From: Rik van Riel
Subject: Re: swap, compress, discard: what's in the future?
References: <20140107030148.GA24188@bbox> <52CC04DD.3020603@redhat.com> <52CE5B58.8080203@oracle.com>
In-Reply-To: <52CE5B58.8080203@oracle.com>
To: Bob Liu
Cc: Bob Liu, Minchan Kim, Luigi Semenzato, Linux-MM, hnaz@cmpxchg.org

On 01/09/2014 03:18 AM, Bob Liu wrote:
>
> On 01/07/2014 09:45 PM, Rik van Riel wrote:
>> On 01/07/2014 01:33 AM, Bob Liu wrote:
>>> On Tue, Jan 7, 2014 at 11:01 AM, Minchan Kim wrote:
>>
>>>> Your statement makes sense to me, but unfortunately the current VM
>>>> doesn't consider everything you mentioned.
>>>> It is just based on page access recency via approximate LRU logic plus
>>>> some heuristics (e.g., mapped pages and VM_EXEC pages are more precious).
>>>
>>> It seems that the ARC page replacement algorithm in ZFS performs well
>>> and is more intelligent.
>>> http://en.wikipedia.org/wiki/Adaptive_replacement_cache
>>> Is there any historical reason why Linux didn't implement something like
>>> ARC as the page cache replacement algorithm?
>>
>> ARC by itself was quickly superseded by CLOCK-Pro, which
>> looks like it would be even better.
>>
>> Johannes introduces an algorithm with similar properties
>> in his "thrash based page cache replacement" patch series.
>>
>
> But it seems you and Peter already implemented CLOCK-Pro and CART
> page cache replacement many years ago. Why were they not merged at
> that time?

Scalability concerns, lack of time, and the VM not being ready
to take the code.
The split LRU code makes it much more logical to merge a
replacement scheme that is suitable for second level caches,
because the anonymous memory is in an LRU scheme that is
more suitable to its kind of usage.
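
Editor's sketch (not part of the original thread): the cost model from
the first message can be written down in a few lines of Python. This is
only an illustration under stated assumptions. The names ReclaimOption,
expected_cost and cheapest are hypothetical, the cost numbers are made
up, and nothing here corresponds to an existing kernel interface. It
just shows "cost now, plus cost on reuse multiplied by the probability
of reuse" and picks the cheaper way to reclaim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReclaimOption:
    """One hypothetical way of reclaiming a unit of memory.

    Costs are in one arbitrary unit (for example CPU time); the values
    used below are invented for illustration only.
    """
    name: str
    cost_now: float       # paid at reclaim time (e.g. compressing the page)
    cost_on_reuse: float  # paid only if the memory is needed again


def expected_cost(option: ReclaimOption, p_reuse: float) -> float:
    """Immediate cost plus reuse cost weighted by the reuse probability."""
    if not 0.0 <= p_reuse <= 1.0:
        raise ValueError("p_reuse must be within [0, 1]")
    return option.cost_now + p_reuse * option.cost_on_reuse


def cheapest(options, p_reuse: float) -> ReclaimOption:
    """Return the option with the lowest expected cost (first one on ties)."""
    return min(options, key=lambda o: expected_cost(o, p_reuse))


# Assumption: re-decoding a JPEG costs about as much as one compression.
JPEG_CACHE = [
    ReclaimOption("discard", cost_now=0.0, cost_on_reuse=1.0),
    ReclaimOption("compress", cost_now=1.0, cost_on_reuse=0.5),
]

# Assumption: a cache that took far more work to build than to compress.
EXPENSIVE_CACHE = [
    ReclaimOption("discard", cost_now=0.0, cost_on_reuse=50.0),
    ReclaimOption("compress", cost_now=1.0, cost_on_reuse=0.5),
]

for label, options in (("jpeg", JPEG_CACHE), ("expensive", EXPENSIVE_CACHE)):
    print(label, "->", cheapest(options, p_reuse=0.5).name)
```

With these invented numbers the cheap-to-recreate JPEG cache is
discarded while the expensive cache is compressed, which is the
distinction the first message draws. The recreation cost would have to
come from the application as a hint, while the reuse probability is the
part the VM already tries to estimate.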