From: Minchan Kim <minchan@kernel.org>
To: Minchan Kim <minchan@kernel.org>,
	Konstantin Khlebnikov <khlebnikov@openvz.org>,
	linux-mm <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	riel@redhat.com, kosaki.motohiro@jp.fujitsu.com
Subject: Re: Fwd: Control page reclaim granularity
Date: Tue, 13 Mar 2012 11:51:08 +0900
Message-ID: <20120313025108.GB7125@barrios>
In-Reply-To: <20120312151542.GA16253@gmail.com>

On Mon, Mar 12, 2012 at 11:15:43PM +0800, Zheng Liu wrote:
> On Mon, Mar 12, 2012 at 10:42:26PM +0900, Minchan Kim wrote:
> > On Mon, Mar 12, 2012 at 04:14:14PM +0800, Zheng Liu wrote:
> > > On 03/12/2012 02:20 PM, Konstantin Khlebnikov wrote:
> > > > Minchan Kim wrote:
> > > >> On Mon, Mar 12, 2012 at 10:06:09AM +0800, Zheng Liu wrote:
> > > >>> On Mon, Mar 12, 2012 at 09:29:34AM +0900, Minchan Kim wrote:
> > > > >>>> I forgot to Cc you.
> > > >>>> Sorry.
> > > >>>>
> > > >>>> ---------- Forwarded message ----------
> > > >>>> From: Minchan Kim<minchan@kernel.org>
> > > >>>> Date: Mon, Mar 12, 2012 at 9:28 AM
> > > >>>> Subject: Re: Control page reclaim granularity
> > > >>>> To: Minchan Kim<minchan@kernel.org>, linux-mm<linux-mm@kvack.org>,
> > > >>>> linux-kernel<linux-kernel@vger.kernel.org>, Konstantin Khlebnikov<
> > > >>>> khlebnikov@openvz.org>, riel@redhat.com, kosaki.motohiro@jp.fujitsu.com
> > > >>>>
> > > >>>>
> > > >>>> On Fri, Mar 09, 2012 at 12:54:03AM +0800, Zheng Liu wrote:
> > > >>>>> Hi Minchan,
> > > >>>>>
> > > > >>>>> Sorry, I forgot to say that I am not subscribed to the linux-mm
> > > > >>>>> and linux-kernel mailing lists, so please Cc me.
> > > > >>>>>
> > > > >>>>> IMHO, maybe we should rethink how users use mmap(2).  I will
> > > > >>>>> describe the cases I know of in our product system.  They fall
> > > > >>>>> into two categories.  One mmaps all data files into memory and
> > > > >>>>> sometimes uses write(2) to append data; the other uses
> > > > >>>>> mmap(2)/munmap(2) and read(2)/write(2) to manipulate the files.
> > > > >>>>> In the second case, the application wants to keep mmaped pages
> > > > >>>>> in memory and let file pages be reclaimed first.  So, IMO, when
> > > > >>>>> an application uses mmap(2) to manipulate files, it may imply
> > > > >>>>> that it wants to keep these mmaped pages in memory rather than
> > > > >>>>> have them reclaimed.  At the least, these pages should not be
> > > > >>>>> reclaimed earlier than file pages.  I think that maybe we can
> > > > >>>>> restore that routine and provide a sysctl parameter that lets
> > > > >>>>> the user set the ratio between mmaped pages and file pages.
> > > >>>>
> > > > >>>> I am not convinced that we should handle mapped pages specially.
> > > > >>>> Sometimes, someone may use mmap only to reduce buffer copies
> > > > >>>> compared to the read system call.
> > > > >>>> So I think we can't be sure that mmaped pages always win.
> > > > >>>>
> > > > >>>> My suggestion is that it would be better for the user to declare
> > > > >>>> it explicitly.
> > > > >>>> I think we can implement it with the WILLNEED option of madvise
> > > > >>>> and fadvise.
> > > > >>>> The current implementation just does readahead if the page isn't
> > > > >>>> in memory, but I think we can promote the page from inactive to
> > > > >>>> active if it is already in memory.
> > > > >>>>
> > > > >>>> That is clearer, and it would not be affected by a change to the
> > > > >>>> kernel page reclaim algorithm like this one.
> > > >>>
> > > > >>> Thank you for your advice.  But I still have a question about this
> > > > >>> solution.  If we extend the WILLNEED option of madvise(2) and
> > > > >>> fadvise(2), it will leave the pages manipulated by madvise(2)
> > > > >>> and/or fadvise(2) in an inconsistent state.  For example, when I
> > > > >>> call madvise with the WILLNEED flag, pages that are already in
> > > > >>> memory will be moved to the active list, while the other pages
> > > > >>> will be read into memory and put on the inactive list.  The pages
> > > > >>> on the inactive list can then be reclaimed.  So from the user's
> > > > >>> point of view it is inconsistent, because some pages stay in
> > > > >>> memory and some are reclaimed, while the user actually hopes that
> > > > >>> all of the pages can be kept in memory.  IMHO, this inconsistency
> > > > >>> is weird and will puzzle users.
> > > >>
> > > > >> Now the problem is that
> > > > >>
> > > > >> 1. The user wants to keep in memory pages that are used only once
> > > > >>    in a while.
> > > > >> 2. The kernel wants to reclaim them because, from the LRU's point
> > > > >>    of view, they are clearly reclaim targets.
> > > > >>
> > > > >> The most desirable approach is for the user to use mlock to
> > > > >> guarantee that the pages stay in memory. But mlock has too much
> > > > >> overhead, and the user doesn't want to keep all the pages in
> > > > >> memory at once (i.e., he wants demand paging when he needs a page).
> > > > >> Right?
> > > > >>
> > > > >> madvise is just a hint to the kernel, and the kernel doesn't need
> > > > >> to guarantee madvise's behavior.
> > > > >> From that point of view, such inconsistency might not be a big
> > > > >> problem.
> > > > >>
> > > > >> The big problem I see now is that the user would have to call
> > > > >> madvise(WILLNEED) periodically, because the activation happens only
> > > > >> once, when the user calls madvise. If the user doesn't use the page
> > > > >> frequently after calling it, the page ends up moving to the
> > > > >> inactive list and could even be reclaimed.
> > > > >> It's not good. :-(
> > > > >>
> > > > >> Okay. How about adding a new VM_WORKINGSET flag?
> > > > >> The reclaimer would then give the page one more round trip in the
> > > > >> active/inactive list when reclaim happens, if the page is
> > > > >> referenced.
> > > > >>
> > > > >> Sigh. We have no room for a new VM_FLAG on 32 bit.
> > > >
> > > > It would be nice to mark struct address_space with this flag and export
> > > > AS_UNEVICTABLE somehow.
> > > > Maybe we can reuse the file-locking engine for managing these bits =)
> > > 
> > > Makes sense to me.  We can mark this flag in struct address_space and check
> > > it in page_referenced_file().  If this flag is set, it will be cleared and
> > 
> > The disadvantage is that we could set reclaim granularity only per-inode.
> > I want to set it per-vma, not per-inode.
> 
> I don't think this is a disadvantage.  This per-inode reclaim
> granularity is useful for us.  Actually, I have thought about implementing a
> per-inode memcg to let different sets of files be reclaimed separately.
> So maybe we can provide both mechanisms and let the user choose which to
> use.

I don't oppose supporting both mechanisms, but I don't want to provide only the
per-inode approach.
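
To make the difference concrete, here is a rough userspace toy model, not
kernel code, of a reclaim check that honors either kind of hint. Everything
in it is hypothetical: struct toy_inode, struct toy_vma, toy_scan_page() and
the workingset fields are names made up for illustration (VM_WORKINGSET is
only a proposal in this thread), and clearing the referenced bit when the
extra round is granted is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hints; neither exists in the kernel under these names. */
struct toy_inode { bool workingset; };  /* per-inode, address_space-like flag */
struct toy_vma   { bool workingset; };  /* per-VMA, VM_WORKINGSET-like flag   */

struct toy_page {
    bool referenced;                /* touched since the last scan */
    const struct toy_inode *inode;  /* backing file, may be NULL */
    const struct toy_vma *vma;      /* a VMA mapping the page, may be NULL */
};

enum toy_verdict { TOY_RECLAIM, TOY_KEEP_ONE_MORE_ROUND };

/*
 * Decide what a reclaim scan does with one page taken from the tail of a
 * list. A page that carries a hint (through its VMA or its inode) and was
 * referenced gets one more round trip in the same list; it is not promoted.
 */
enum toy_verdict toy_scan_page(struct toy_page *page)
{
    assert(page != NULL);

    bool hinted = (page->vma && page->vma->workingset) ||
                  (page->inode && page->inode->workingset);

    if (hinted && page->referenced) {
        /* Assumption: the reference is consumed by the extra round. */
        page->referenced = false;
        return TOY_KEEP_ONE_MORE_ROUND;
    }
    return TOY_RECLAIM;
}
```

With only the per-inode flag, every mapping of the file gets the same
treatment; with the per-VMA flag, two mappings of the same file can be
treated differently, which is the case I care about.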

> 
> > 
> > > the function returns referenced > 1.  Then this page can be promoted to the
> > > active list.  But I prefer to set/clear this flag in madvise.
> > 
> > Hmm, my idea is as follows.
> > If we can set a new VM flag on the VMA or something, the reclaimer can check it in
> > shrink_[in]active_list and avoid deactivating/reclaiming the page if it sees that the
> > page is in a VMA marked with the new VM flag and the page has been referenced recently
> > at least once.
> > That means the page gets one more round trip in its list (i.e., the active/inactive list)
> > rather than being activated, so that the page becomes less reclaimable.
> 
> Whether the page is given one more round trip or is promoted to the
> active list, either satisfies our current requirement.  So now the
> question is which is better.  If we add a new VM flag, as you said
> before, vma->vm_flags has no room for it on 32 bit.  I have noticed that
> this topic has been discussed [1] and the result is that vm_flags is
> still an unsigned long.  So we would need some tricky technique to solve
> it.  If we add a new flag to struct address_space, it might be easy to
> implement.

The per-inode case is fine, but it doesn't work for per-vma or file-range granularity.
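
For comparison, the hints userspace already has are range based: madvise(2)
takes an address range inside a mapping and posix_fadvise(2) takes a byte
range of an open file. A minimal sketch, assuming Linux and page-aligned
offsets; hint_mapped_range(), hint_file_range() and demo_range_hints() are
helper names invented here. Note that WILLNEED today only asks for readahead,
it does not protect the pages from reclaim.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hint for bytes [off, off + len) of a mapping; off must be page aligned.
 * Returns 0 on success, -1 on error (errno set), like madvise(2). */
int hint_mapped_range(void *base, size_t off, size_t len)
{
    assert(base != NULL);
    return madvise((char *)base + off, len, MADV_WILLNEED);
}

/* Hint for bytes [off, off + len) of an open file.
 * Returns 0 on success or an error number, like posix_fadvise(2). */
int hint_file_range(int fd, off_t off, off_t len)
{
    return posix_fadvise(fd, off, len, POSIX_FADV_WILLNEED);
}

/* Create a four-page scratch file, map it, and apply both hints to its
 * second page only. Returns 0 if every step succeeded. */
int demo_range_hints(void)
{
    char path[] = "/tmp/range-hint-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    size_t size = 4 * (size_t)page;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* the file lives on until fd is closed */

    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }

    void *base = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int rc = hint_mapped_range(base, (size_t)page, (size_t)page);
    if (rc == 0)
        rc = hint_file_range(fd, (off_t)page, (off_t)page);

    munmap(base, size);
    close(fd);
    return rc;
}
```

A flag that lives only in struct address_space covers the whole file, so it
could not express "just this range" the way these two calls can.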

> 
> 1. http://lkml.indiana.edu/hypermail/linux/kernel/1104.1/00975.html
> 
> Regards,
> Zheng


