public inbox for linux-kernel@vger.kernel.org
* Fwd: Fine granularity page reclaim
       [not found]   ` <20120220062006.GA5028@gmail.com>
@ 2012-02-20  6:19     ` Zheng Liu
       [not found]     ` <4F41F1C2.3030908@openvz.org>
  1 sibling, 0 replies; 3+ messages in thread
From: Zheng Liu @ 2012-02-20  6:19 UTC (permalink / raw)
  Cc: linux-kernel

---------- Forwarded message ----------
From: Zheng Liu <gnehzuil.liu@gmail.com>
Date: Mon, Feb 20, 2012 at 2:20 PM
Subject: Re: Fine granularity page reclaim
To: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>, linux-kernel@vger.kernel.org


Cc linux-kernel mailing list.

On Sat, Feb 18, 2012 at 12:20:05AM +0400, Konstantin Khlebnikov wrote:
> Zheng Liu wrote:
> >Hi all,
> >
> >Currently we are encountering a problem with page reclaim. In our
> >production system, many applications manipulate a large number of files,
> >which fall into two categories: index files and block files. There are
> >about 15,000 index files and about 23,000 block files on a 2TB disk. The
> >application accesses index files using mmap(2), and reads/writes block
> >files using pread(2)/pwrite(2). We hope to hold the index files in memory
> >as much as possible, and this works well on Red Hat's 2.6.18-164 kernel:
> >about 60-70% of the index files can be held in memory. However, it does
> >not work well on Red Hat's 2.6.32-133 kernel. I know that in 2.6.18 Linux
> >uses one active list and one inactive list to handle page reclaim, and
> >that in 2.6.32 these are split into anonymous and file lists. So I am
> >curious why most of the index files can be held in memory on 2.6.18. I
> >would expect the index file pages to be evicted, because mmap accesses do
> >not influence the LRU lists.
>
> There was my patch fixing a similar problem with shared/executable
> mapped pages: "vmscan: promote shared file mapped pages" (commits
> 34dbc67a644f and c909e99364c). Maybe it will help in your case.

Hi Konstantin,

Thank you for your reply.  I have tested these patches on an upstream
kernel.  They are useful for multi-process applications.  But in our
production system some applications are multi-threaded, so the
'referenced_ptes > 1' check cannot help them keep their data in memory.

Regards,
Zheng

>
> >
> >BTW, I have some questions I would like to discuss.
> >
> >1. I want index files and block files to be reclaimed separately. Is
> >there any way to achieve this in the current upstream kernel?
> >
> >2. Maybe we can provide a mechanism that lets different files be mapped
> >into different NUMA nodes. We could provide an ioctl(2) to tell the
> >kernel that a given file should be mapped into a specific node ID. A nid
> >member would be added to struct address_space, and when alloc_page() is
> >called, the page could be allocated from that specific node.
> >
> >3. Currently pages can be reclaimed according to pid via memcg, but that
> >is too coarse. I don't know whether memcg could provide a finer-grained
> >page reclaim mechanism, for example reclaiming pages according to inode
> >number.
> >
> >I am not subscribed to this mailing list, so please Cc me. Thank you.
> >
> >Regards,
> >Zheng
> >
>


* Re: Fine granularity page reclaim
       [not found]       ` <CANWLp03njY11Swiic7_mv6Gk3C=v4YYe5nLzbAjLH0KftyQftA@mail.gmail.com>
@ 2012-03-07 20:33         ` Konstantin Khlebnikov
  2012-03-08  2:54           ` Zheng Liu
  0 siblings, 1 reply; 3+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-07 20:33 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org

Zheng Liu wrote:
>
>
> On Monday, February 20, 2012, Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
>  > [...]
>  >
>  > Ok, just as a test, what if you mmap your data as executable?
>  > Then these pages will be activated on first touch.
>  > Attached is a patch adding a per-mm flag with the same effect.
>  >
>
> Hi Konstantin,
>
> Sorry for the delayed reply.  For the last two weeks I have been trying
> these two solutions and evaluating their performance impact in our
> production system.  The good news is that both solutions work well: they
> can keep the mapped files in memory under multi-threaded workloads.  But
> I have a concern about the first solution (mapping the file with the
> PROT_EXEC flag): I think it is too tricky.  As I said previously, the
> files that need to be mapped are just ordinary index files, and from an
> application programmer's point of view they should not be mapped with
> PROT_EXEC.  So the key issue is really that we should provide a
> mechanism that lets different file sets be reclaimed separately.  I am
> not sure whether this idea is useful, so any feedback is welcome. :-)
> Thank you.
>

Sounds good. Yes, PROT_EXEC isn't very usable or secure, and a per-mm
flag isn't very flexible either. I would prefer setting some kind of
memory-pressure priority on each vma and inode. We could probably sort
vmas and inodes into different cgroup-like sets and balance memory
pressure between them. Maybe someone has already thought about this...


* Re: Fine granularity page reclaim
  2012-03-07 20:33         ` Konstantin Khlebnikov
@ 2012-03-08  2:54           ` Zheng Liu
  0 siblings, 0 replies; 3+ messages in thread
From: Zheng Liu @ 2012-03-08  2:54 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Thu, Mar 08, 2012 at 12:33:20AM +0400, Konstantin Khlebnikov wrote:
> Zheng Liu wrote:
> >
> >
> > [...]
> 
> Sounds good. Yes, PROT_EXEC isn't very usable or secure, and a per-mm
> flag isn't very flexible either. I would prefer setting some kind of
> memory-pressure priority on each vma and inode. We could probably sort
> vmas and inodes into different cgroup-like sets and balance memory
> pressure between them. Maybe someone has already thought about this...

Thanks for your advice.  Regarding setting pressure priorities on each
vma and inode, I will send a new mail to the mailing list to discuss
this problem.  Maybe someone will have good ideas about it. ;-)

Regards,
Zheng



Thread overview: 3+ messages
-- links below jump to the message on this page --
     [not found] <20120217092205.GA9462@gmail.com>
     [not found] ` <4F3EB675.9030702@openvz.org>
     [not found]   ` <20120220062006.GA5028@gmail.com>
2012-02-20  6:19     ` Fwd: Fine granularity page reclaim Zheng Liu
     [not found]     ` <4F41F1C2.3030908@openvz.org>
     [not found]       ` <CANWLp03njY11Swiic7_mv6Gk3C=v4YYe5nLzbAjLH0KftyQftA@mail.gmail.com>
2012-03-07 20:33         ` Konstantin Khlebnikov
2012-03-08  2:54           ` Zheng Liu
