From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4F7E9854.1020904@gmail.com>
Date: Fri, 06 Apr 2012 15:16:36 +0800
From: "gnehzuil.lzheng@gmail.com"
Subject: Re: mapped pagecache pages vs unmapped pages
References: <37371333672160@webcorp7.yandex-team.ru>
In-Reply-To: <37371333672160@webcorp7.yandex-team.ru>
To: Alexey Ivanov
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org

On 04/06/2012 08:29 AM, Alexey Ivanov wrote:
> We are in the process of migrating from FreeBSD to Linux and have found some strange behavior: periodically running tasks (like rsync/p2p deployment) evict mapped pages from memory.
>
> From my little research I've found following lkml thread:
> https://lkml.org/lkml/2008/6/11/278
> And more precisely this commit: https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4f98a2fee8acdb4ac84545df98cccecfd130f8db
> which along with splitting LRU into "anon" and "file" removed support of reclaim_mapped.
>
> Is there a knob to prioritize mapped memory over unmapped (without modifying all apps to use O_DIRECT/fadvise/madvise or mlocking our data in memory) or at least some way to change proportion of Active(file)/Inactive(file)?

Hi Alexey,

Cc'ing the linux-mm mailing list.

I have run into a similar problem and sent a mail to discuss it.
Maybe it can help you
(http://marc.info/?l=linux-mm&m=132947026019538&w=2).

Konstantin has now sent a patch set that tries to expand vm_flags from
32 bits to 64 bits. Then we can add a new flag to vm_flags and
prioritize mmapped pages via madvise(2).

Regards,
Zheng

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexey Ivanov
In-Reply-To: <4F7E9854.1020904@gmail.com>
References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com>
Subject: Re: mapped pagecache pages vs unmapped pages
Message-Id: <12701333991475@webcorp7.yandex-team.ru>
Date: Mon, 09 Apr 2012 21:11:14 +0400
To: "gnehzuil.lzheng@gmail.com"
Cc: "linux-kernel@vger.kernel.org", "linux-mm@kvack.org"

Thanks for the hint!

Can anyone clarify the reason for not using zone->inactive_ratio in inactive_file_is_low_global()?

--
Alexey Ivanov
Yandex Search Admin Team
*************
tel.: +7 (985) 120-35-83 (int. 7176)
http://staff.yandex-team.ru/rbtz
*************

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <12701333991475@webcorp7.yandex-team.ru>
References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com> <12701333991475@webcorp7.yandex-team.ru>
Date: Mon, 9 Apr 2012 11:17:35 -0700
Subject: Re: mapped pagecache pages vs unmapped pages
From: Ying Han
To: Alexey Ivanov
Cc: "gnehzuil.lzheng@gmail.com", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", Rik van Riel

On Mon, Apr 9, 2012 at 10:11 AM, Alexey Ivanov wrote:
> Thanks for the hint!
>
> Can anyone clarify the reason for not using zone->inactive_ratio in inactive_file_is_low_global()?

Anonymous pages start out referenced on the active list, and scanning
the whole active list will only rotate those pages. So we would like
to limit the size of inactive anon to save scanning.

--Ying

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4F8326FD.8020507@redhat.com>
Date: Mon, 09 Apr 2012 14:14:21 -0400
From: Rik van Riel
Subject: Re: mapped pagecache pages vs unmapped pages
References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com> <12701333991475@webcorp7.yandex-team.ru>
In-Reply-To: <12701333991475@webcorp7.yandex-team.ru>
To: Alexey Ivanov
Cc: "gnehzuil.lzheng@gmail.com", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org"

On 04/09/2012 01:11 PM, Alexey Ivanov wrote:
> Thanks for the hint!
>
> Can anyone clarify the reason for not using zone->inactive_ratio in inactive_file_is_low_global()?

New anonymous pages start out on the active anon list, and
are always referenced. If memory fills up, they may end
up getting moved to the inactive anon list; being referenced
while on the inactive anon list is enough to get them promoted
back to the active list.

New file pages start out on the INACTIVE file list, and
start their lives not referenced at all. Due to readahead
extra reads, many file pages may never be referenced.

Only file pages that are referenced twice make it onto
the active list.

This means the inactive file list has to be large enough
for all the readahead buffers, and give pages enough time
on the list that frequently accessed ones can get accessed
twice and promoted.

http://linux-mm.org/PageReplacementDesign

--
All rights reversed

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexey Ivanov
In-Reply-To: <4F8326FD.8020507@redhat.com>
References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com> <12701333991475@webcorp7.yandex-team.ru> <4F8326FD.8020507@redhat.com>
Subject: Re: mapped pagecache pages vs unmapped pages
Message-Id: <8041334015453@webcorp4.yandex-team.ru>
Date: Tue, 10 Apr 2012 03:50:53 +0400
To: Rik van Riel
Cc: "gnehzuil.lzheng@gmail.com", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", yinghan@google.com

Did you consider making this ratio tunable, at least manually (i.e. via sysctl)?
I suppose we are not the only ones with an almost-whole-RAM-mmapped workload.

--
Alexey Ivanov
Yandex Search Admin Team
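
Rik's description of the file lists (new pages enter the inactive list, a second reference promotes them) can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name FileLRU, the fixed page capacity, and the eviction order are invented for illustration and are not kernel code. In particular, the model leaves out the shrinking of the active list that the inactive_file_is_low_global() question in this thread is about.

```python
from collections import OrderedDict


class FileLRU:
    """Toy model of the inactive/active file lists (hypothetical, not kernel code)."""

    def __init__(self, capacity):
        self.capacity = capacity        # total pages the model may hold
        self.inactive = OrderedDict()   # page -> None, oldest entry first
        self.active = OrderedDict()     # page -> None, oldest entry first

    def access(self, page):
        if page in self.active:
            # Already in the working set: just refresh its position.
            self.active.move_to_end(page)
        elif page in self.inactive:
            # Second reference: promote to the active list.
            del self.inactive[page]
            self.active[page] = None
        else:
            # First reference: new file pages start on the inactive list.
            self.inactive[page] = None
            self._reclaim()

    def _reclaim(self):
        # Assumption of this sketch: evict the oldest inactive page first and
        # touch the active list only when the inactive list is empty.
        while len(self.inactive) + len(self.active) > self.capacity:
            victim_list = self.inactive if self.inactive else self.active
            victim_list.popitem(last=False)


def demo():
    lru = FileLRU(capacity=4)
    for page in ("a", "b"):
        lru.access(page)   # first reference: inactive list
        lru.access(page)   # second reference: promoted
    for i in range(10):    # one-pass streaming read, e.g. a copy job
        lru.access("s%d" % i)
    return list(lru.active), list(lru.inactive)
```

In this simplified model a one-pass scan only churns the inactive list, so the two promoted pages survive. The behaviour reported at the start of the thread depends on how the real kernel balances the two lists, which this sketch does not attempt to reproduce.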
From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4F837F6E.3010508@kernel.org>
Date: Tue, 10 Apr 2012 09:31:42 +0900
From: Minchan Kim
Subject: Re: mapped pagecache pages vs unmapped pages
References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com> <12701333991475@webcorp7.yandex-team.ru> <4F8326FD.8020507@redhat.com> <8041334015453@webcorp4.yandex-team.ru>
In-Reply-To: <8041334015453@webcorp4.yandex-team.ru>
To: Alexey Ivanov
Cc: Rik van Riel, "gnehzuil.lzheng@gmail.com", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", yinghan@google.com

2012-04-10 8:50 AM, Alexey Ivanov wrote:
> Did you consider making this ratio tunable, at least manually (i.e. via sysctl)?
> I suppose we are not the only ones with an almost-whole-RAM-mmapped workload.

Personally, I don't think that is a good approach.
It depends on the kernel's internal implementation, which may change
in the future, as we changed it in 2.6.28.

In my opinion, the kernel should just make a best effort to keep the
active working set, except for some critical pages, which are code pages.
If pages are not part of the active working set but the user wants to
keep them, we have to add a new feature like fadvise/madvise(WORKING_SET)
to give that hint to the kernel. Although it requires changing legacy
programs, it isn't coupled to the kernel's reclaim algorithm, and I think
it's the way to go.

--
Kind regards,
Minchan Kim
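
The madvise(WORKING_SET) hint mentioned above is a proposal in this thread, not an existing interface. A hint that does exist works from the other side: the streaming task tells the kernel it will not reuse the pages it touched. The following is a minimal sketch, assuming Python 3.3+ on a Unix system; the function name copy_without_caching and the chunk size are made up for illustration, and the kernel is free to ignore the advice.

```python
import os

CHUNK = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def copy_without_caching(src_path, dst_path):
    """Copy a file, then advise the kernel that its cached pages are not needed.

    Returns the number of bytes copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # dirty pages cannot be dropped, so write them out first
        if hasattr(os, "posix_fadvise"):
            for f in (src, dst):
                try:
                    # offset 0 with length 0 means "from the start to the end of the file"
                    os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
                except OSError:
                    pass  # purely advisory, so a failure here is not fatal
    return copied
```

This is exactly the kind of per-application change the original question hoped to avoid, so it only helps when the streaming tool itself can be modified or wrapped.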
From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4F838390.1080909@redhat.com>
Date: Mon, 09 Apr 2012 20:49:20 -0400
From: Rik van Riel
Subject: Re: mapped pagecache pages vs unmapped pages
References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com> <12701333991475@webcorp7.yandex-team.ru> <4F8326FD.8020507@redhat.com> <8041334015453@webcorp4.yandex-team.ru> <4F837F6E.3010508@kernel.org>
In-Reply-To: <4F837F6E.3010508@kernel.org>
To: Minchan Kim
Cc: Alexey Ivanov, "gnehzuil.lzheng@gmail.com", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", yinghan@google.com

On 04/09/2012 08:31 PM, Minchan Kim wrote:
> 2012-04-10 8:50 AM, Alexey Ivanov wrote:
>
>> Did you consider making this ratio tunable, at least manually (i.e. via sysctl)?
>> I suppose we are not the only ones with an almost-whole-RAM-mmapped workload.
>
> Personally, I don't think that is a good approach.
> It depends on the kernel's internal implementation, which may change
> in the future, as we changed it in 2.6.28.

I also believe that a tunable for this is not going to be
a very workable approach, for the simple reason that changing
the value does not make a predictable change in the effectiveness
of working set detection or protection.

> In my opinion, the kernel should just make a best effort to keep the
> active working set, except for some critical pages, which are code pages.

Johannes has some experimental code to measure refaults, and
calculate their distance in a multi-zone, multi-cgroup environment.

That would allow us to predictably place things in the working set
as required.

--
All rights reversed

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4F838BF4.7020104@kernel.org>
Date: Tue, 10 Apr 2012 10:25:08 +0900
From: Minchan Kim
Subject: Re: mapped pagecache pages vs unmapped pages
References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com> <12701333991475@webcorp7.yandex-team.ru> <4F8326FD.8020507@redhat.com> <8041334015453@webcorp4.yandex-team.ru> <4F837F6E.3010508@kernel.org> <4F838390.1080909@redhat.com>
In-Reply-To: <4F838390.1080909@redhat.com>
To: Rik van Riel, hannes@cmpxchg.org
Cc: Alexey Ivanov, "gnehzuil.lzheng@gmail.com", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", yinghan@google.com

2012-04-10 9:49 AM, Rik van Riel wrote:
> Johannes has some experimental code to measure refaults, and
> calculate their distance in a multi-zone, multi-cgroup environment.
>
> That would allow us to predictably place things in the working set
> as required.

Hannes, it would help many people if you posted your code. ;)

--
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexey Ivanov
To: linux-kernel@vger.kernel.org
Subject: mapped pagecache pages vs unmapped pages
Message-Id: <37371333672160@webcorp7.yandex-team.ru>
Date: Fri, 06 Apr 2012 04:29:20 +0400

We are in the process of migrating from FreeBSD to Linux and have found some strange behavior: periodically running tasks (like rsync/p2p deployment) evict mapped
pages from memory. >>From my little research I've found following lkml thread: https://lkml.org/lkml/2008/6/11/278 And more precisely this commit: https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4f98a2fee8acdb4ac84545df98cccecfd130f8db which along with splitting LRU into "anon" and "file" removed support of reclaim_mapped. Is there a knob to prioritize mapped memory over unmapped (without modifying all apps to use O_DIRECT/fadvise/madvise or mlocking our data in memory) or at least some way to change proportion of Active(file)/Inactive(file)? -- Sincerely, Alexey Ivanov From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755357Ab2DFHQo (ORCPT ); Fri, 6 Apr 2012 03:16:44 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:53281 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755050Ab2DFHQm (ORCPT ); Fri, 6 Apr 2012 03:16:42 -0400 Message-ID: <4F7E9854.1020904@gmail.com> Date: Fri, 06 Apr 2012 15:16:36 +0800 From: "gnehzuil.lzheng@gmail.com" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20111124 Thunderbird/8.0 MIME-Version: 1.0 To: Alexey Ivanov CC: linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: mapped pagecache pages vs unmapped pages References: <37371333672160@webcorp7.yandex-team.ru> In-Reply-To: <37371333672160@webcorp7.yandex-team.ru> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/06/2012 08:29 AM, Alexey Ivanov wrote: > In progress of migration from FreeBSD to Linux and we found some strange behavior: periodically running tasks (like rsync/p2p deployment) evict mapped pages from memory. 
> > From my little research I've found following lkml thread: > https://lkml.org/lkml/2008/6/11/278 > And more precisely this commit: https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4f98a2fee8acdb4ac84545df98cccecfd130f8db > which along with splitting LRU into "anon" and "file" removed support of reclaim_mapped. > > Is there a knob to prioritize mapped memory over unmapped (without modifying all apps to use O_DIRECT/fadvise/madvise or mlocking our data in memory) or at least some way to change proportion of Active(file)/Inactive(file)? > Hi Alexey, Cc to linux-mm mailing list. I have met the similar problem and I have sent a mail to discuss it. Maybe it can help you (http://marc.info/?l=linux-mm&m=132947026019538&w=2). Now Konstantin has sent a patch set to try to expand vm_flags from 32 bit to 64 bit. Then we can add the new flag into vm_flags and prioritize mmaped pages in madvise(2). Regards, Zheng From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755385Ab2DIRLU (ORCPT ); Mon, 9 Apr 2012 13:11:20 -0400 Received: from forward17.mail.yandex.net ([95.108.253.142]:59498 "EHLO forward17.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751672Ab2DIRLT (ORCPT ); Mon, 9 Apr 2012 13:11:19 -0400 From: Alexey Ivanov To: "gnehzuil.lzheng@gmail.com" Cc: "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" In-Reply-To: <4F7E9854.1020904@gmail.com> References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com> Subject: Re: mapped pagecache pages vs unmapped pages MIME-Version: 1.0 Message-Id: <12701333991475@webcorp7.yandex-team.ru> Date: Mon, 09 Apr 2012 21:11:14 +0400 X-Mailer: Yamail [ http://yandex.ru ] 5.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=koi8-r Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Thanks for the hint! 
Can anyone clarify the reason of not using zone->inactive_ratio in inactive_file_is_low_global()? 06.04.2012, 11:16, "gnehzuil.lzheng@gmail.com" : > On 04/06/2012 08:29 AM, Alexey Ivanov wrote: > >> šIn progress of migration from FreeBSD to Linux and we found some strange behavior: periodically running tasks (like rsync/p2p deployment) evict mapped pages from memory. >> >> šFrom my little research I've found following lkml thread: >> šhttps://lkml.org/lkml/2008/6/11/278 >> šAnd more precisely this commit: https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4f98a2fee8acdb4ac84545df98cccecfd130f8db >> šwhich along with splitting LRU into "anon" and "file" removed support of reclaim_mapped. >> >> šIs there a knob to prioritize mapped memory over unmapped (without modifying all apps to use O_DIRECT/fadvise/madvise or mlocking our data in memory) or at least some way to change proportion of Active(file)/Inactive(file)? > > Hi Alexey, > > Cc to linux-mm mailing list. > > I have met the similar problem and I have sent a mail to discuss it. > Maybe it can help you > (http://marc.info/?l=linux-mm&m=132947026019538&w=2). > > Now Konstantin has sent a patch set to try to expand vm_flags from 32 > bit to 64 bit. šThen we can add the new flag into vm_flags and > prioritize mmaped pages in madvise(2). > > Regards, > Zheng > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at šhttp://vger.kernel.org/majordomo-info.html > Please read the FAQ at šhttp://www.tux.org/lkml/ -- Alexey Ivanov Yandex Search Admin Team ************* tel.: +7 (985) 120-35-83 (int. 
7176) http://staff.yandex-team.ru/rbtz ************* From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757654Ab2DISYA (ORCPT ); Mon, 9 Apr 2012 14:24:00 -0400 Received: from mail-lb0-f174.google.com ([209.85.217.174]:59486 "EHLO mail-lb0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753041Ab2DISX7 convert rfc822-to-8bit (ORCPT ); Mon, 9 Apr 2012 14:23:59 -0400 MIME-Version: 1.0 In-Reply-To: <12701333991475@webcorp7.yandex-team.ru> References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com> <12701333991475@webcorp7.yandex-team.ru> Date: Mon, 9 Apr 2012 11:17:35 -0700 Message-ID: Subject: Re: mapped pagecache pages vs unmapped pages From: Ying Han To: Alexey Ivanov Cc: "gnehzuil.lzheng@gmail.com" , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , Rik van Riel Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT X-System-Of-Record: true Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Apr 9, 2012 at 10:11 AM, Alexey Ivanov wrote: > Thanks for the hint! > > Can anyone clarify the reason of not using zone->inactive_ratio in inactive_file_is_low_global()? anonymous pages starts out referenced in active list, and scanning the whole active list will only rotate those pages. So we would like to limit the size of inactive anon to save scanning. --Ying > > 06.04.2012, 11:16, "gnehzuil.lzheng@gmail.com" : >> On 04/06/2012 08:29 AM, Alexey Ivanov wrote: >> >>>  In progress of migration from FreeBSD to Linux and we found some strange behavior: periodically running tasks (like rsync/p2p deployment) evict mapped pages from memory. 
>>> >>>  From my little research I've found following lkml thread: >>>  https://lkml.org/lkml/2008/6/11/278 >>>  And more precisely this commit: https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4f98a2fee8acdb4ac84545df98cccecfd130f8db >>>  which along with splitting LRU into "anon" and "file" removed support of reclaim_mapped. >>> >>>  Is there a knob to prioritize mapped memory over unmapped (without modifying all apps to use O_DIRECT/fadvise/madvise or mlocking our data in memory) or at least some way to change proportion of Active(file)/Inactive(file)? >> >> Hi Alexey, >> >> Cc to linux-mm mailing list. >> >> I have met the similar problem and I have sent a mail to discuss it. >> Maybe it can help you >> (http://marc.info/?l=linux-mm&m=132947026019538&w=2). >> >> Now Konstantin has sent a patch set to try to expand vm_flags from 32 >> bit to 64 bit.  Then we can add the new flag into vm_flags and >> prioritize mmaped pages in madvise(2). >> >> Regards, >> Zheng >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at  http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at  http://www.tux.org/lkml/ > > -- > Alexey Ivanov > Yandex Search Admin Team > ************* > tel.: +7 (985) 120-35-83 (int. 7176) > http://staff.yandex-team.ru/rbtz > ************* > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org.  For more info on Linux MM, > see: http://www.linux-mm.org/ . 
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ > Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757626Ab2DIS4N (ORCPT ); Mon, 9 Apr 2012 14:56:13 -0400 Received: from mx1.redhat.com ([209.132.183.28]:45770 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754630Ab2DIS4L (ORCPT ); Mon, 9 Apr 2012 14:56:11 -0400 Message-ID: <4F8326FD.8020507@redhat.com> Date: Mon, 09 Apr 2012 14:14:21 -0400 From: Rik van Riel User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1 MIME-Version: 1.0 To: Alexey Ivanov CC: "gnehzuil.lzheng@gmail.com" , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" Subject: Re: mapped pagecache pages vs unmapped pages References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com> <12701333991475@webcorp7.yandex-team.ru> In-Reply-To: <12701333991475@webcorp7.yandex-team.ru> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/09/2012 01:11 PM, Alexey Ivanov wrote: > Thanks for the hint! > > Can anyone clarify the reason of not using zone->inactive_ratio in inactive_file_is_low_global()? New anonymous pages start out on the active anon list, and are always referenced. If memory fills up, they may end up getting moved to the inactive anon list; being referenced while on the inactive anon list is enough to get them promoted back to the active list. New file pages start out on the INACTIVE file list, and start their lives not referenced at all. Due to readahead extra reads, many file pages may never be referenced. Only file pages that are referenced twice make it onto the active list. 
This means the inactive file list has to be large enough
for all the readahead buffers, and give pages enough time
on the list that frequently accessed ones can get accessed
twice and promoted.

http://linux-mm.org/PageReplacementDesign

--
All rights reversed

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758157Ab2DJACK (ORCPT ); Mon, 9 Apr 2012 20:02:10 -0400
Received: from forward6.mail.yandex.net ([77.88.60.125]:56932 "EHLO forward6.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750858Ab2DJACJ (ORCPT ); Mon, 9 Apr 2012 20:02:09 -0400
X-Greylist: delayed 673 seconds by postgrey-1.27 at vger.kernel.org; Mon, 09 Apr 2012 20:02:09 EDT
From: Alexey Ivanov
To: Rik van Riel
Cc: "gnehzuil.lzheng@gmail.com" , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , yinghan@google.com
In-Reply-To: <4F8326FD.8020507@redhat.com>
References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com> <12701333991475@webcorp7.yandex-team.ru> <4F8326FD.8020507@redhat.com>
Subject: Re: mapped pagecache pages vs unmapped pages
MIME-Version: 1.0
Message-Id: <8041334015453@webcorp4.yandex-team.ru>
Date: Tue, 10 Apr 2012 03:50:53 +0400
X-Mailer: Yamail [ http://yandex.ru ] 5.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=koi8-r
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Did you consider making this ratio tunable, at least manually (i.e. via sysctl)?
I suppose we are not the only ones with almost-whole-ram-mmaped workload.

09.04.2012, 22:56, "Rik van Riel" :
> On 04/09/2012 01:11 PM, Alexey Ivanov wrote:
>
>> Thanks for the hint!
>>
>> Can anyone clarify the reason of not using zone->inactive_ratio in inactive_file_is_low_global()?
>
> New anonymous pages start out on the active anon list, and
> are always referenced.
> If memory fills up, they may end
> up getting moved to the inactive anon list; being referenced
> while on the inactive anon list is enough to get them promoted
> back to the active list.
>
> New file pages start out on the INACTIVE file list, and
> start their lives not referenced at all. Due to readahead
> extra reads, many file pages may never be referenced.
>
> Only file pages that are referenced twice make it onto
> the active list.
>
> This means the inactive file list has to be large enough
> for all the readahead buffers, and give pages enough time
> on the list that frequently accessed ones can get accessed
> twice and promoted.
>
> http://linux-mm.org/PageReplacementDesign
>
> --
> All rights reversed

--
Alexey Ivanov
Yandex Search Admin Team

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758284Ab2DJA3a (ORCPT ); Mon, 9 Apr 2012 20:29:30 -0400
Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:62249 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753320Ab2DJA33 (ORCPT ); Mon, 9 Apr 2012 20:29:29 -0400
X-AuditID: 9c930197-b7b09ae000000a5d-86-4f837ee6fbbd
Message-ID: <4F837F6E.3010508@kernel.org>
Date: Tue, 10 Apr 2012 09:31:42 +0900
From: Minchan Kim
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20120327 Thunderbird/11.0.1
MIME-Version: 1.0
Newsgroups: gmane.linux.kernel.mm,gmane.linux.kernel
To: Alexey Ivanov
CC: Rik van Riel , "gnehzuil.lzheng@gmail.com" , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , yinghan@google.com
Subject: Re: mapped pagecache pages vs unmapped pages
References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com> <12701333991475@webcorp7.yandex-team.ru> <4F8326FD.8020507@redhat.com> <8041334015453@webcorp4.yandex-team.ru>
In-Reply-To: <8041334015453@webcorp4.yandex-team.ru>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Brightmail-Tracker: AAAAAA==
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2012-04-10 8:50 AM, Alexey Ivanov wrote:
> Did you consider making this ratio tunable, at least manually (i.e. via sysctl)?
> I suppose we are not the only ones with almost-whole-ram-mmaped workload.

Personally, I don't think it's a good approach.
It depends on the kernel's internal implementation, which could change
in the future, as we changed it in 2.6.28.

In my opinion, the kernel should just make a best effort to keep the active
working set, except for some critical pages, which are code pages.

If pages are not in the active working set but the user wants to keep them,
we have to add a new feature like fadvise/madvise(WORKING_SET) to give the
hint to the kernel. Although that requires changing legacy programs, it is
not coupled to the kernel's reclaim algorithm, and it's the way to go, I think.

>
> 09.04.2012, 22:56, "Rik van Riel" :
>> On 04/09/2012 01:11 PM, Alexey Ivanov wrote:
>>
>>> Thanks for the hint!
>>>
>>> Can anyone clarify the reason of not using zone->inactive_ratio in inactive_file_is_low_global()?
>>
>> New anonymous pages start out on the active anon list, and
>> are always referenced. If memory fills up, they may end
>> up getting moved to the inactive anon list; being referenced
>> while on the inactive anon list is enough to get them promoted
>> back to the active list.
>>
>> New file pages start out on the INACTIVE file list, and
>> start their lives not referenced at all. Due to readahead
>> extra reads, many file pages may never be referenced.
>>
>> Only file pages that are referenced twice make it onto
>> the active list.
>>
>> This means the inactive file list has to be large enough
>> for all the readahead buffers, and give pages enough time
>> on the list that frequently accessed ones can get accessed
>> twice and promoted.
>>
>> http://linux-mm.org/PageReplacementDesign
>>
>> --
>> All rights reversed
>

--
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932273Ab2DJAtc (ORCPT ); Mon, 9 Apr 2012 20:49:32 -0400
Received: from mx1.redhat.com ([209.132.183.28]:61914 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932102Ab2DJAtb (ORCPT ); Mon, 9 Apr 2012 20:49:31 -0400
Message-ID: <4F838390.1080909@redhat.com>
Date: Mon, 09 Apr 2012 20:49:20 -0400
From: Rik van Riel
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1
MIME-Version: 1.0
To: Minchan Kim
CC: Alexey Ivanov , "gnehzuil.lzheng@gmail.com" , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , yinghan@google.com
Subject: Re: mapped pagecache pages vs unmapped pages
References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com> <12701333991475@webcorp7.yandex-team.ru> <4F8326FD.8020507@redhat.com> <8041334015453@webcorp4.yandex-team.ru> <4F837F6E.3010508@kernel.org>
In-Reply-To: <4F837F6E.3010508@kernel.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/09/2012 08:31 PM, Minchan Kim wrote:
> On 2012-04-10 8:50 AM, Alexey Ivanov wrote:
>
>> Did you consider making this ratio tunable, at least manually (i.e. via sysctl)?
>> I suppose we are not the only ones with almost-whole-ram-mmaped workload.
>
> Personally, I don't think it's a good approach.
> It depends on the kernel's internal implementation, which could change
> in the future, as we changed it in 2.6.28.

I also believe that a tunable for this is not going to be
a very workable approach, for the simple reason that changing
the value does not make a predictable change in the effectiveness
of working set detection or protection.
> In my opinion, the kernel should just make a best effort to keep the active
> working set, except for some critical pages, which are code pages.

Johannes has some experimental code to measure refaults, and
calculate their distance in a multi-zone, multi-cgroup environment.

That would allow us to predictably place things in the working set
as required.

--
All rights reversed

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932362Ab2DJBWt (ORCPT ); Mon, 9 Apr 2012 21:22:49 -0400
Received: from LGEMRELSE6Q.lge.com ([156.147.1.121]:60315 "EHLO LGEMRELSE6Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752521Ab2DJBWs (ORCPT ); Mon, 9 Apr 2012 21:22:48 -0400
X-AuditID: 9c930179-b7b2cae000000ca1-02-4f838b65329d
Message-ID: <4F838BF4.7020104@kernel.org>
Date: Tue, 10 Apr 2012 10:25:08 +0900
From: Minchan Kim
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20120327 Thunderbird/11.0.1
MIME-Version: 1.0
Newsgroups: gmane.linux.kernel.mm,gmane.linux.kernel
To: Rik van Riel , hannes@cmpxchg.org
CC: Alexey Ivanov , "gnehzuil.lzheng@gmail.com" , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , yinghan@google.com
Subject: Re: mapped pagecache pages vs unmapped pages
References: <37371333672160@webcorp7.yandex-team.ru> <4F7E9854.1020904@gmail.com> <12701333991475@webcorp7.yandex-team.ru> <4F8326FD.8020507@redhat.com> <8041334015453@webcorp4.yandex-team.ru> <4F837F6E.3010508@kernel.org> <4F838390.1080909@redhat.com>
In-Reply-To: <4F838390.1080909@redhat.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Brightmail-Tracker: AAAAAA==
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2012-04-10 9:49 AM, Rik van Riel wrote:
> On 04/09/2012 08:31 PM, Minchan Kim wrote:
>> On 2012-04-10 8:50 AM, Alexey Ivanov wrote:
>>
>>> Did you consider making this ratio tunable, at least manually (i.e.
>>> via sysctl)?
>>> I suppose we are not the only ones with almost-whole-ram-mmaped
>>> workload.
>>
>> Personally, I don't think it's a good approach.
>> It depends on the kernel's internal implementation, which could change
>> in the future, as we changed it in 2.6.28.
>
> I also believe that a tunable for this is not going to be
> a very workable approach, for the simple reason that changing
> the value does not make a predictable change in the effectiveness
> of working set detection or protection.
>
>> In my opinion, the kernel should just make a best effort to keep the active
>> working set, except for some critical pages, which are code pages.
>
> Johannes has some experimental code to measure refaults, and
> calculate their distance in a multi-zone, multi-cgroup environment.
>
> That would allow us to predictably place things in the working set
> as required.
>

Hannes, it would help many people if you posted your code. ;)

--
Kind regards,
Minchan Kim