From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx115.postini.com [74.125.245.115])
    by kanga.kvack.org (Postfix) with SMTP id 9D0856B007E
    for ; Fri, 17 Feb 2012 04:17:25 -0500 (EST)
Received: by dadv6 with SMTP id v6so3702001dad.14
    for ; Fri, 17 Feb 2012 01:17:24 -0800 (PST)
Date: Fri, 17 Feb 2012 17:22:05 +0800
From: Zheng Liu
Subject: Fine granularity page reclaim
Message-ID: <20120217092205.GA9462@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org

Hi all,

Currently, we encounter a problem with page reclaim. In our product system,
there are a lot of applications that manipulate a number of files. These
files can be divided into two categories: one is index files, the other is
block files. The number of index files is about 15,000, and the number of
block files is about 23,000 on a 2TB disk. The application accesses index
files using mmap(2), and reads/writes block files using pread(2)/pwrite(2). We hope
to hold index files in memory as much as possible, and it works well in Red Hat
2.6.18-164. About 60-70% of the index files can be held in memory.
However, it doesn't work well in Red Hat 2.6.32-133. I know that in 2.6.18
Linux uses an active list and an inactive list to handle page reclaim, and that in
2.6.32 they are divided into an anonymous list and a file list. So I am
curious about why most of the index files can be held in 2.6.18. The index files
should be replaced because mmap doesn't impact the LRU list.

BTW, I have some questions that need to be discussed.

1. I want index and block files to be reclaimed separately. Is there any
way to do this in current upstream?

2. Maybe we can provide a mechanism to let different files be mapped into
different nodes. We can provide an ioctl(2) to tell the kernel that this file should
be mapped into a specific node id.
A nid member is added to the address_space
struct. When alloc_page is called, the page can be allocated from that specific
node id.

3. Currently, pages can be reclaimed according to pid in memcg. But it is too
coarse. I don't know whether memcg could provide a fine-granularity page
reclaim mechanism. For example, pages could be reclaimed according to inode number.

I am not subscribed to this mailing list, so please Cc me. Thank you.

Regards,
Zheng

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx107.postini.com [74.125.245.107])
    by kanga.kvack.org (Postfix) with SMTP id 4D1BD6B0126
    for ; Fri, 17 Feb 2012 15:20:11 -0500 (EST)
Received: by bkty12 with SMTP id y12so4406590bkt.14
    for ; Fri, 17 Feb 2012 12:20:09 -0800 (PST)
Message-ID: <4F3EB675.9030702@openvz.org>
Date: Sat, 18 Feb 2012 00:20:05 +0400
From: Konstantin Khlebnikov
MIME-Version: 1.0
Subject: Re: Fine granularity page reclaim
References: <20120217092205.GA9462@gmail.com>
In-Reply-To: <20120217092205.GA9462@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Zheng Liu
Cc: "linux-mm@kvack.org"

Zheng Liu wrote:
> Hi all,
>
> Currently, we encounter a problem with page reclaim. In our product system,
> there are a lot of applications that manipulate a number of files. These
> files can be divided into two categories: one is index files, the other is
> block files. The number of index files is about 15,000, and the number of
> block files is about 23,000 on a 2TB disk. The application accesses index
> files using mmap(2), and reads/writes block files using pread(2)/pwrite(2).
> We hope
> to hold index files in memory as much as possible, and it works well in Red Hat
> 2.6.18-164. About 60-70% of the index files can be held in memory.
> However, it doesn't work well in Red Hat 2.6.32-133. I know that in 2.6.18
> Linux uses an active list and an inactive list to handle page reclaim, and that in
> 2.6.32 they are divided into an anonymous list and a file list. So I am
> curious about why most of the index files can be held in 2.6.18. The index files
> should be replaced because mmap doesn't impact the LRU list.

There was my patch for fixing a similar problem with shared/executable mapped pages:
"vmscan: promote shared file mapped pages", commit 34dbc67a644f and commit c909e99364c.
Maybe it will help in your case.

>
> BTW, I have some questions that need to be discussed.
>
> 1. I want index and block files to be reclaimed separately. Is there any
> way to do this in current upstream?
>
> 2. Maybe we can provide a mechanism to let different files be mapped into
> different nodes. We can provide an ioctl(2) to tell the kernel that this file should
> be mapped into a specific node id. A nid member is added to the address_space
> struct. When alloc_page is called, the page can be allocated from that specific
> node id.
>
> 3. Currently, pages can be reclaimed according to pid in memcg. But it is too
> coarse. I don't know whether memcg could provide a fine-granularity page
> reclaim mechanism. For example, pages could be reclaimed according to inode number.
>
> I am not subscribed to this mailing list, so please Cc me. Thank you.
>
> Regards,
> Zheng
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: email@kvack.org

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.
For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx130.postini.com [74.125.245.130])
    by kanga.kvack.org (Postfix) with SMTP id CFBBD6B004D
    for ; Mon, 20 Feb 2012 01:15:23 -0500 (EST)
Received: by pbcwz17 with SMTP id wz17so7291867pbc.14
    for ; Sun, 19 Feb 2012 22:15:23 -0800 (PST)
Date: Mon, 20 Feb 2012 14:20:06 +0800
From: Zheng Liu
Subject: Re: Fine granularity page reclaim
Message-ID: <20120220062006.GA5028@gmail.com>
References: <20120217092205.GA9462@gmail.com>
    <4F3EB675.9030702@openvz.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4F3EB675.9030702@openvz.org>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Konstantin Khlebnikov
Cc: "linux-mm@kvack.org" , linux-kernl@vger.kernel.org

Cc linux-kernel mailing list.

On Sat, Feb 18, 2012 at 12:20:05AM +0400, Konstantin Khlebnikov wrote:
> Zheng Liu wrote:
> >Hi all,
> >
> >Currently, we encounter a problem with page reclaim. In our product system,
> >there are a lot of applications that manipulate a number of files. These
> >files can be divided into two categories: one is index files, the other is
> >block files. The number of index files is about 15,000, and the number of
> >block files is about 23,000 on a 2TB disk. The application accesses index
> >files using mmap(2), and reads/writes block files using pread(2)/pwrite(2). We hope
> >to hold index files in memory as much as possible, and it works well in Red Hat
> >2.6.18-164. About 60-70% of the index files can be held in memory.
> >However, it doesn't work well in Red Hat 2.6.32-133. I know that in 2.6.18
> >Linux uses an active list and an inactive list to handle page reclaim, and that in
> >2.6.32 they are divided into an anonymous list and a file list.
> >So I am
> >curious about why most of the index files can be held in 2.6.18. The index files
> >should be replaced because mmap doesn't impact the LRU list.
>
> There was my patch for fixing a similar problem with shared/executable mapped pages:
> "vmscan: promote shared file mapped pages", commit 34dbc67a644f and commit c909e99364c.
> Maybe it will help in your case.

Hi Konstantin,

Thank you for your reply. I have tested it in the upstream kernel. These
patches are useful for multi-process applications. But in our product
system, there are some applications that are multi-threaded. So
'referenced_ptes > 1' cannot help these applications to hold the data in
memory.

Regards,
Zheng

>
> >
> >BTW, I have some questions that need to be discussed.
> >
> >1. I want index and block files to be reclaimed separately. Is there any
> >way to do this in current upstream?
> >
> >2. Maybe we can provide a mechanism to let different files be mapped into
> >different nodes. We can provide an ioctl(2) to tell the kernel that this file should
> >be mapped into a specific node id. A nid member is added to the address_space
> >struct. When alloc_page is called, the page can be allocated from that specific
> >node id.
> >
> >3. Currently, pages can be reclaimed according to pid in memcg. But it is too
> >coarse. I don't know whether memcg could provide a fine-granularity page
> >reclaim mechanism. For example, pages could be reclaimed according to inode number.
> >
> >I am not subscribed to this mailing list, so please Cc me. Thank you.
> >
> >Regards,
> >Zheng
> >
> >--
> >To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >the body to majordomo@kvack.org. For more info on Linux MM,
> >see: http://www.linux-mm.org/ .
> >Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> >Don't email: email@kvack.org
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx131.postini.com [74.125.245.131])
    by kanga.kvack.org (Postfix) with SMTP id ED79F6B007E
    for ; Mon, 20 Feb 2012 02:09:59 -0500 (EST)
Received: by bkty12 with SMTP id y12so5677296bkt.14
    for ; Sun, 19 Feb 2012 23:09:58 -0800 (PST)
Message-ID: <4F41F1C2.3030908@openvz.org>
Date: Mon, 20 Feb 2012 11:09:54 +0400
From: Konstantin Khlebnikov
MIME-Version: 1.0
Subject: Re: Fine granularity page reclaim
References: <20120217092205.GA9462@gmail.com>
    <4F3EB675.9030702@openvz.org>
    <20120220062006.GA5028@gmail.com>
In-Reply-To: <20120220062006.GA5028@gmail.com>
Content-Type: multipart/mixed;
    boundary="------------010609060400070508090203"
Sender: owner-linux-mm@kvack.org
List-ID:
To: Zheng Liu
Cc: "linux-mm@kvack.org" , "linux-kernl@vger.kernel.org"

This is a multi-part message in MIME format.
--------------010609060400070508090203
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Zheng Liu wrote:
> Cc linux-kernel mailing list.
>
> On Sat, Feb 18, 2012 at 12:20:05AM +0400, Konstantin Khlebnikov wrote:
>> Zheng Liu wrote:
>>> Hi all,
>>>
>>> Currently, we encounter a problem with page reclaim. In our product system,
>>> there are a lot of applications that manipulate a number of files. These
>>> files can be divided into two categories: one is index files, the other is
>>> block files. The number of index files is about 15,000, and the number of
>>> block files is about 23,000 on a 2TB disk. The application accesses index
>>> files using mmap(2), and reads/writes block files using pread(2)/pwrite(2). We hope
>>> to hold index files in memory as much as possible, and it works well in Red Hat
>>> 2.6.18-164. About 60-70% of the index files can be held in memory.
>>> However, it doesn't work well in Red Hat 2.6.32-133.
>>> I know that in 2.6.18
>>> Linux uses an active list and an inactive list to handle page reclaim, and that in
>>> 2.6.32 they are divided into an anonymous list and a file list. So I am
>>> curious about why most of the index files can be held in 2.6.18. The index files
>>> should be replaced because mmap doesn't impact the LRU list.
>>
>> There was my patch for fixing a similar problem with shared/executable mapped pages:
>> "vmscan: promote shared file mapped pages", commit 34dbc67a644f and commit c909e99364c.
>> Maybe it will help in your case.
>
> Hi Konstantin,
>
> Thank you for your reply. I have tested it in the upstream kernel. These
> patches are useful for multi-process applications. But in our product
> system, there are some applications that are multi-threaded. So
> 'referenced_ptes > 1' cannot help these applications to hold the data in
> memory.

Ok, what if you mmap your data as executable, just to test?
Then these pages will be activated after the first touch.
Attached is a patch with a per-mm flag with the same effect.

>
> Regards,
> Zheng
>
>>
>>>
>>> BTW, I have some questions that need to be discussed.
>>>
>>> 1. I want index and block files to be reclaimed separately. Is there any
>>> way to do this in current upstream?
>>>
>>> 2. Maybe we can provide a mechanism to let different files be mapped into
>>> different nodes. We can provide an ioctl(2) to tell the kernel that this file should
>>> be mapped into a specific node id. A nid member is added to the address_space
>>> struct. When alloc_page is called, the page can be allocated from that specific
>>> node id.
>>>
>>> 3. Currently, pages can be reclaimed according to pid in memcg. But it is too
>>> coarse. I don't know whether memcg could provide a fine-granularity page
>>> reclaim mechanism. For example, pages could be reclaimed according to inode number.
>>>
>>> I am not subscribed to this mailing list, so please Cc me. Thank you.
>>>
>>> Regards,
>>> Zheng
>>>
>>> --
>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> the body to majordomo@kvack.org. For more info on Linux MM,
>>> see: http://www.linux-mm.org/ .
>>> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
>>> Don't email: email@kvack.org
>>

--------------010609060400070508090203
Content-Type: text/plain;
    name="mm-introduce-mmf_vm_preferrded-flag"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
    filename="mm-introduce-mmf_vm_preferrded-flag"

mm: introduce MMF_VM_PREFERRED flag

From: Konstantin Khlebnikov

This patch introduces an mm->flags bit, MMF_VM_PREFERRED, which doubles the
access bit weight for this mm. Currently it has only one effect: an mm with
this bit activates mapped file pages after the first touch, if the vma is not
marked as sequentially accessed.

This should be a per-vma flag, but there are no free bits in vma->vm_flags;
maybe we can make this 64-bit only.

Interface: prctl(PR_SET_MM_PREFERRED, 1) to set and
prctl(PR_SET_MM_PREFERRED, 0) to clear.
Signed-off-by: Konstantin Khlebnikov
---
 include/linux/prctl.h |    2 ++
 include/linux/sched.h |    1 +
 kernel/sys.c          |   17 +++++++++++++++++
 mm/rmap.c             |    5 ++++-
 4 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index 7ddc7f1..d0f9ceb 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -114,4 +114,6 @@
 # define PR_SET_MM_START_BRK		6
 # define PR_SET_MM_BRK			7
 
+#define PR_SET_MM_PREFERRED	36
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 75c15c5..b60883a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -437,6 +437,7 @@ extern int get_dumpable(struct mm_struct *mm);
 					/* leave room for more dump flags */
 #define MMF_VM_MERGEABLE	16	/* KSM may merge identical pages */
 #define MMF_VM_HUGEPAGE		17	/* set when VM_HUGEPAGE is set on vma */
+#define MMF_VM_PREFERRED	18	/* Double pte access bits weight */
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK)
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 4070153..bacf8d5 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1810,6 +1810,20 @@ static int prctl_set_mm(int opt, unsigned long addr,
 }
 #endif
 
+static int set_mm_preferred(struct mm_struct *mm, int state)
+{
+	switch (state) {
+	case 0:
+		clear_bit(MMF_VM_PREFERRED, &mm->flags);
+		return 0;
+	case 1:
+		set_bit(MMF_VM_PREFERRED, &mm->flags);
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -1962,6 +1976,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		case PR_SET_MM:
 			error = prctl_set_mm(arg2, arg3, arg4, arg5);
 			break;
+		case PR_SET_MM_PREFERRED:
+			error = set_mm_preferred(me->mm, arg2);
+			break;
 		default:
 			error = -EINVAL;
 			break;
diff --git a/mm/rmap.c b/mm/rmap.c
index 78cc46b..b0fd1d1 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -766,8 +766,11 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 
 	(*mapcount)--;
 
-	if (referenced)
+	if (referenced) {
+		if (test_bit(MMF_VM_PREFERRED, &mm->flags))
+			referenced <<= 1;
 		*vm_flags |= vma->vm_flags;
+	}
 out:
 	return referenced;
 }

--------------010609060400070508090203--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx121.postini.com [74.125.245.121])
    by kanga.kvack.org (Postfix) with SMTP id 2EB146B007E
    for ; Wed, 7 Mar 2012 12:45:07 -0500 (EST)
Received: by pbcup15 with SMTP id up15so884891pbc.14
    for ; Wed, 07 Mar 2012 09:45:06 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <4F41F1C2.3030908@openvz.org>
References: <20120217092205.GA9462@gmail.com>
    <4F3EB675.9030702@openvz.org>
    <20120220062006.GA5028@gmail.com>
    <4F41F1C2.3030908@openvz.org>
Date: Thu, 8 Mar 2012 01:45:06 +0800
Message-ID:
Subject: Re: Fine granularity page reclaim
From: Zheng Liu
Content-Type: multipart/alternative; boundary=047d7b2eda4f4e414f04baaab87d
Sender: owner-linux-mm@kvack.org
List-ID:
To: Konstantin Khlebnikov
Cc: "linux-mm@kvack.org" , linux-kernel@vger.kernel.org

--047d7b2eda4f4e414f04baaab87d
Content-Type: text/plain; charset=ISO-8859-1

On Monday, February 20, 2012, Konstantin Khlebnikov wrote:
> Zheng Liu wrote:
>>
>> Cc linux-kernel mailing list.
>>
>> On Sat, Feb 18, 2012 at 12:20:05AM +0400, Konstantin Khlebnikov wrote:
>>>
>>> Zheng Liu wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Currently, we encounter a problem with page reclaim. In our product system,
>>>> there are a lot of applications that manipulate a number of files. These
>>>> files can be divided into two categories: one is index files, the other is
>>>> block files.
>>>> The number of index files is about 15,000, and the number of
>>>> block files is about 23,000 on a 2TB disk. The application accesses index
>>>> files using mmap(2), and reads/writes block files using pread(2)/pwrite(2). We hope
>>>> to hold index files in memory as much as possible, and it works well in Red Hat
>>>> 2.6.18-164. About 60-70% of the index files can be held in memory.
>>>> However, it doesn't work well in Red Hat 2.6.32-133. I know that in 2.6.18
>>>> Linux uses an active list and an inactive list to handle page reclaim, and that in
>>>> 2.6.32 they are divided into an anonymous list and a file list. So I am
>>>> curious about why most of the index files can be held in 2.6.18. The index files
>>>> should be replaced because mmap doesn't impact the LRU list.
>>>
>>> There was my patch for fixing a similar problem with shared/executable mapped pages:
>>> "vmscan: promote shared file mapped pages", commit 34dbc67a644f and commit c909e99364c.
>>> Maybe it will help in your case.
>>
>> Hi Konstantin,
>>
>> Thank you for your reply. I have tested it in the upstream kernel. These
>> patches are useful for multi-process applications. But in our product
>> system, there are some applications that are multi-threaded. So
>> 'referenced_ptes > 1' cannot help these applications to hold the data in
>> memory.
>
> Ok, what if you mmap your data as executable, just to test?
> Then these pages will be activated after the first touch.
> Attached is a patch with a per-mm flag with the same effect.
>

Hi Konstantin,

Sorry for the delayed reply. For the last two weeks I was trying these two solutions
and evaluating their impact on performance in our product system.
The good news is that both solutions work well. They can keep
mapped files in memory for multi-threaded applications. But I have a question about
the first solution (mapping the file with the PROT_EXEC flag). I think this way is
too tricky.
As I said previously, the files that need to be mapped
are just normal index files, and they shouldn't be mapped with the PROT_EXEC flag
from the point of view of an application programmer. So actually the key issue is
that we should provide a mechanism that lets different file sets be
reclaimed separately. I am not sure whether this idea is useful or not, so
any feedback is welcome. :-) Thank you.

Regards,
Zheng

>>
>> Regards,
>> Zheng
>>
>>>
>>>>
>>>> BTW, I have some questions that need to be discussed.
>>>>
>>>> 1. I want index and block files to be reclaimed separately. Is there any
>>>> way to do this in current upstream?
>>>>
>>>> 2. Maybe we can provide a mechanism to let different files be mapped into
>>>> different nodes. We can provide an ioctl(2) to tell the kernel that this file should
>>>> be mapped into a specific node id. A nid member is added to the address_space
>>>> struct. When alloc_page is called, the page can be allocated from that specific
>>>> node id.
>>>>
>>>> 3. Currently, pages can be reclaimed according to pid in memcg. But it is too
>>>> coarse. I don't know whether memcg could provide a fine-granularity page
>>>> reclaim mechanism. For example, pages could be reclaimed according to inode number.
>>>>
>>>> I am not subscribed to this mailing list, so please Cc me. Thank you.
>>>>
>>>> Regards,
>>>> Zheng
>>>>
>>>> --
>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> the body to majordomo@kvack.org. For more info on Linux MM,
>>>> see: http://www.linux-mm.org/ .
>>>> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
>>>> Don't email: email@kvack.org
>>>
>
>

--047d7b2eda4f4e414f04baaab87d
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

--047d7b2eda4f4e414f04baaab87d--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx182.postini.com [74.125.245.182])
    by kanga.kvack.org (Postfix) with SMTP id 316636B002C
    for ; Wed, 7 Mar 2012 15:33:27 -0500 (EST)
Received: by bkwq16 with SMTP id q16so7725834bkw.14
    for ; Wed, 07 Mar 2012 12:33:25 -0800 (PST)
Message-ID: <4F57C610.8050101@openvz.org>
Date: Thu, 08 Mar 2012 00:33:20 +0400
From: Konstantin Khlebnikov
MIME-Version: 1.0
Subject: Re: Fine granularity page reclaim
References: <20120217092205.GA9462@gmail.com>
    <4F3EB675.9030702@openvz.org>
    <20120220062006.GA5028@gmail.com>
    <4F41F1C2.3030908@openvz.org>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Zheng Liu
Cc: "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org"

Zheng Liu wrote:
>
>
> On Monday, February 20, 2012, Konstantin Khlebnikov wrote:
> > Zheng Liu wrote:
> >>
> >> Cc linux-kernel mailing list.
> >>
> >> On Sat, Feb 18, 2012 at 12:20:05AM +0400, Konstantin Khlebnikov wrote:
> >>>
> >>> Zheng Liu wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> Currently, we encounter a problem with page reclaim. In our product system,
> >>>> there are a lot of applications that manipulate a number of files. These
> >>>> files can be divided into two categories: one is index files, the other is
> >>>> block files. The number of index files is about 15,000, and the number of
> >>>> block files is about 23,000 on a 2TB disk. The application accesses index
> >>>> files using mmap(2), and reads/writes block files using pread(2)/pwrite(2).
> >>>> We hope
> >>>> to hold index files in memory as much as possible, and it works well in Red Hat
> >>>> 2.6.18-164. About 60-70% of the index files can be held in memory.
> >>>> However, it doesn't work well in Red Hat 2.6.32-133. I know that in 2.6.18
> >>>> Linux uses an active list and an inactive list to handle page reclaim, and that in
> >>>> 2.6.32 they are divided into an anonymous list and a file list. So I am
> >>>> curious about why most of the index files can be held in 2.6.18. The index files
> >>>> should be replaced because mmap doesn't impact the LRU list.
> >>>
> >>> There was my patch for fixing a similar problem with shared/executable mapped pages:
> >>> "vmscan: promote shared file mapped pages", commit 34dbc67a644f and commit c909e99364c.
> >>> Maybe it will help in your case.
> >>
> >> Hi Konstantin,
> >>
> >> Thank you for your reply. I have tested it in the upstream kernel. These
> >> patches are useful for multi-process applications. But in our product
> >> system, there are some applications that are multi-threaded. So
> >> 'referenced_ptes > 1' cannot help these applications to hold the data in
> >> memory.
> >
> > Ok, what if you mmap your data as executable, just to test?
> > Then these pages will be activated after the first touch.
> > Attached is a patch with a per-mm flag with the same effect.
> >
>
> Hi Konstantin,
>
> Sorry for the delayed reply. For the last two weeks I was trying these two solutions
> and evaluating their impact on performance in our product system.
> The good news is that both solutions work well. They can keep
> mapped files in memory for multi-threaded applications. But I have a question about
> the first solution (mapping the file with the PROT_EXEC flag). I think this way is
> too tricky. As I said previously, the files that need to be mapped
> are just normal index files, and they shouldn't be mapped with the PROT_EXEC flag
> from the point of view of an application programmer.
> So actually the key issue is
> that we should provide a mechanism that lets different file sets be
> reclaimed separately. I am not sure whether this idea is useful or not, so
> any feedback is welcome. :-) Thank you.
>

Sounds good. Yes, PROT_EXEC isn't very usable or secure, and a per-mm flag isn't
very flexible either. I would prefer setting some kind of memory pressure priorities
for each vma and inode. Probably we can sort vmas and inodes into different
cgroup-like sets and balance memory pressure between them.
Maybe someone has already thought about it...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx207.postini.com [74.125.245.207])
    by kanga.kvack.org (Postfix) with SMTP id E5EB66B002C
    for ; Wed, 7 Mar 2012 21:49:35 -0500 (EST)
Received: by dadv6 with SMTP id v6so31649dad.14
    for ; Wed, 07 Mar 2012 18:49:35 -0800 (PST)
Date: Thu, 8 Mar 2012 10:54:52 +0800
From: Zheng Liu
Subject: Re: Fine granularity page reclaim
Message-ID: <20120308025452.GA6196@gmail.com>
References: <20120217092205.GA9462@gmail.com>
    <4F3EB675.9030702@openvz.org>
    <20120220062006.GA5028@gmail.com>
    <4F41F1C2.3030908@openvz.org>
    <4F57C610.8050101@openvz.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4F57C610.8050101@openvz.org>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Konstantin Khlebnikov
Cc: "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org"

On Thu, Mar 08, 2012 at 12:33:20AM +0400, Konstantin Khlebnikov wrote:
> Zheng Liu wrote:
> >
> >
> >On Monday, February 20, 2012, Konstantin Khlebnikov wrote:
> > > Zheng Liu wrote:
> > >>
> > >> Cc linux-kernel mailing list.
> > >> > > >> On Sat, Feb 18, 2012 at 12:20:05AM +0400, Konstantin Khlebnikov wrote: > > >>> > > >>> Zheng Liu wrote: > > >>>> > > >>>> Hi all, > > >>>> > > >>>> Currently, we encounter a problem about page reclaim. In our product system, > > >>>> there are a lot of applications that manipulate a number of files. These > > >>>> files can be divided into two categories. One is index file, another is > > >>>> block file. The number of index files is about 15,000, and the number of > > >>>> block files is about 23,000 in a 2TB disk. The application accesses index > > >>>> file using mmap(2), and read/write block file using pread(2)/pwrite(2). We hope > > >>>> to hold index file in memory as much as possible, and it works well in Redhat > > >>>> 2.6.18-164. It is about 60-70% of index files that can be held in memory. > > >>>> However, it doesn't work well in Redhat 2.6.32-133. I know in 2.6.18 that > > >>>> Linux uses an active list and an inactive list to handle page reclaim, and in > > >>>> 2.6.32 that they are divided into anonymous list and file list. So I am > > >>>> curious about why most of index files can be held in 2.6.18? The index file > > >>>> should be replaced because mmap doesn't impact the lru list. > > >>> > > >>> There was my patch for fixing a similar problem with shared/executable mapped pages > > >>> "vmscan: promote shared file mapped pages" commit 34dbc67a644f and commit c909e99364c > > >>> maybe it will help in your case. > > >> > > >> Hi Konstantin, > > >> > > >> Thank you for your reply. I have tested it in upstream kernel. These > > >> patches are useful for multi-process applications. But, in our product > > >> system, there are some applications that are multi-threaded. So > > >> 'referenced_ptes > 1' cannot help these applications to hold the data in > > >> memory. > > > > > > Ok, what if you mmap your data as executable, just to test. > > > Then these pages will be activated after first touch.
> > > Attached is a patch with a per-mm flag that has the same effect. > > > > > > >Hi Konstantin, > > > >Sorry for the delayed reply. For the last two weeks I was trying these two solutions > >and evaluating their impact on performance in our product system. > >The good news is that both solutions work well. They can keep > >mapped files in memory in multi-threaded applications. But I have a question about > >the first solution (map the file with PROT_EXEC flag). I think this way is > >too tricky. As I said previously, the files that need to be mapped are only > >normal index files, and they shouldn't be mapped with the PROT_EXEC flag > >from the view of an application programmer. So actually the key issue is > >that we should provide a mechanism that lets different file sets be > >reclaimed separately. I am not sure whether this idea is useful or not. So > >any feedback is welcome. :-) Thank you. > > > > Sounds good. Yes, PROT_EXEC isn't very usable or secure, and the per-mm flag is not > very flexible either. I prefer setting some kind of memory pressure priorities > for each vma and inode. Probably we can sort vmas and inodes into different > cgroup-like sets and balance memory pressure between them. > Maybe someone has thought about it...
Thanks for your advice. About setting pressure priorities for each vma and inode, I will send a new mail to the mailing list to discuss this problem. Maybe someone has some good ideas for it. ;-) Regards, Zheng -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx201.postini.com [74.125.245.201]) by kanga.kvack.org (Postfix) with SMTP id 5FE516B004D for ; Fri, 6 Apr 2012 20:18:44 -0400 (EDT) Received: by lagz14 with SMTP id z14so3361863lag.14 for ; Fri, 06 Apr 2012 17:18:42 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20120217092205.GA9462@gmail.com> References: <20120217092205.GA9462@gmail.com> Date: Fri, 6 Apr 2012 17:18:42 -0700 Message-ID: Subject: Re: Fine granularity page reclaim From: Ying Han Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Zheng Liu Cc: linux-mm@kvack.org On Fri, Feb 17, 2012 at 1:22 AM, Zheng Liu wrote: > Hi all, > > Currently, we encounter a problem about page reclaim. In our product system, > there are a lot of applications that manipulate a number of files. These > files can be divided into two categories. One is index file, another is > block file. The number of index files is about 15,000, and the number of > block files is about 23,000 in a 2TB disk. The application accesses index > file using mmap(2), and read/write block file using pread(2)/pwrite(2). We hope > to hold index file in memory as much as possible, and it works well in Redhat > 2.6.18-164. It is about 60-70% of index files that can be held in memory. > However, it doesn't work well in Redhat 2.6.32-133. I know in 2.6.18 that > Linux uses an active list and an inactive list to handle page reclaim, and in > 2.6.32 that they are divided into anonymous list and file list. So I am > curious about why most of index files can be held in 2.6.18?
One of the changes after the split-lru is a different scan ratio (active vs inactive) for file-lru and anon-lru. You can check the following two functions:
inactive_anon_is_low_global()
inactive_file_is_low_global()
Depending on your machine size, we might end up scanning more pages on the file lru.
--Ying
> The index file > should be replaced because mmap doesn't impact the lru list. > > BTW, I have some problems that need to be discussed. > > 1. I want to let index and block files be reclaimed separately. Is there any > way to satisfy this in current upstream? > > 2. Maybe we can provide a mechanism to let different files be mapped into > different nodes. We can provide an ioctl(2) to tell the kernel that this file should > be mapped into a specific node id. A nid member is added into the address_space > struct. When alloc_page is called, the page can be allocated from that specific > node id. > > 3. Currently the page can be reclaimed according to pid in memcg. But it is too > coarse. I don't know whether memcg could provide a fine granularity page > reclaim mechanism. For example, the page is reclaimed according to inode number. > > I am not subscribed to this mailing list, so please Cc me. Thank you. > > Regards, > Zheng -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752296Ab2BTGT3 (ORCPT ); Mon, 20 Feb 2012 01:19:29 -0500 Received: from mail-pw0-f46.google.com ([209.85.160.46]:60584 "EHLO mail-pw0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751252Ab2BTGT2 convert rfc822-to-8bit (ORCPT ); Mon, 20 Feb 2012 01:19:28 -0500 Authentication-Results: mr.google.com; spf=pass (google.com: domain of gnehzuil.liu@gmail.com designates 10.68.217.67 as permitted sender) smtp.mail=gnehzuil.liu@gmail.com; dkim=pass header.i=gnehzuil.liu@gmail.com MIME-Version: 1.0 In-Reply-To: <20120220062006.GA5028@gmail.com> References: <20120217092205.GA9462@gmail.com> <4F3EB675.9030702@openvz.org> <20120220062006.GA5028@gmail.com> Date: Mon, 20 Feb 2012 14:19:28 +0800 Message-ID: Subject: Fwd: Fine granularity page reclaim From: Zheng Liu Cc: linux-kernel@vger.kernel.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT To: unlisted-recipients:; (no To-header on input) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org ---------- Forwarded message ---------- From: Zheng Liu Date: Mon, Feb 20, 2012 at 2:20 PM Subject: Re: Fine granularity page reclaim To: Konstantin Khlebnikov Cc: "linux-mm@kvack.org" , linux-kernl@vger.kernel.org Cc linux-kernel mailing list. On Sat, Feb 18, 2012 at 12:20:05AM +0400, Konstantin Khlebnikov wrote: > Zheng Liu wrote: > >Hi all, > > > >Currently, we encounter a problem about page reclaim. In our product > > system, > >there are a lot of applications that manipulate a number of files. These > >files can be divided into two categories. One is index file, > > another is > >block file. The number of index files is about 15,000, and the number of > >block files is about 23,000 in a 2TB disk. The application accesses index > >file using mmap(2), and read/write block file using pread(2)/pwrite(2). > > We hope > >to hold index file in memory as much as possible, and it works well in > > Redhat > >2.6.18-164. It is about 60-70% of index files that can be held in memory. > >However, it doesn't work well in Redhat 2.6.32-133. I know in 2.6.18 that > >Linux uses an active list and an inactive list to handle page reclaim, > > and in > >2.6.32 that they are divided into anonymous list and file list. So I am > >curious about why most of index files can be held in 2.6.18? The index > > file > >should be replaced because mmap doesn't impact the lru list. > > There was my patch for fixing a similar problem with shared/executable > mapped pages > "vmscan: promote shared file mapped pages" commit 34dbc67a644f and commit > c909e99364c > maybe it will help in your case.
Hi Konstantin, Thank you for your reply. I have tested it in upstream kernel. These patches are useful for multi-process applications. But, in our product system, there are some applications that are multi-threaded. So 'referenced_ptes > 1' cannot help these applications to hold the data in memory. Regards, Zheng
> > > > >BTW, I have some problems that need to be discussed. > > > >1. I want to let index and block files be reclaimed separately. Is there > > any > >way to satisfy this in current upstream? > > > >2. Maybe we can provide a mechanism to let different files be mapped > > into > >different nodes. We can provide an ioctl(2) to tell the kernel that this file > > should > >be mapped into a specific node id. A nid member is added into the > > address_space > >struct. When alloc_page is called, the page can be allocated from that > > specific > >node id. > > > >3. Currently the page can be reclaimed according to pid in memcg. But it > > is too > >coarse. I don't know whether memcg could provide a fine granularity page > >reclaim mechanism.
For example, the page is reclaimed according to inode > > number. > > > >I don't subscribe this mailing list, So please Cc me. Thank you. > > > >Regards, > >Zheng
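For anyone who wants to repeat the PROT_EXEC experiment suggested in this thread, here is a minimal userspace sketch. It is in Python only for brevity; the same flags apply to mmap(2) from C. The helper names map_index_file and touch_pages are made up for illustration and do not come from the thread or from any patch, and the fallback to a plain read-only mapping is an assumption for filesystems mounted noexec. The sketch only shows how to request the mapping. Whether the pages are then activated on first touch depends on the kernel carrying the vmscan commits mentioned above.

```python
import mmap
import os


def map_index_file(path, try_exec=True):
    """Map a whole file read-only and shared, optionally with PROT_EXEC.

    Returns (mapping, used_exec). If the kernel refuses PROT_EXEC
    (for example on a noexec mount), fall back to a plain PROT_READ mapping.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if try_exec:
            try:
                prot = mmap.PROT_READ | mmap.PROT_EXEC
                return mmap.mmap(fd, size, flags=mmap.MAP_SHARED, prot=prot), True
            except OSError:
                pass  # assumed cause: noexec mount or a security policy
        return mmap.mmap(fd, size, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ), False
    finally:
        os.close(fd)  # mmap keeps its own duplicate of the descriptor


def touch_pages(mapping):
    """Read one byte from every page so each page is faulted in and referenced."""
    return sum(mapping[off] for off in range(0, len(mapping), mmap.PAGESIZE))
```

Calling map_index_file() on an index file and then touch_pages() on the result gives a mapping that has been referenced once per page. Comparing the Active(file) and Inactive(file) counters in /proc/meminfo before and after, with and without try_exec, is one rough way to see whether the executable mapping changes anything on a given kernel.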