Re: Fwd: Control page reclaim granularity


From: Konstantin Khlebnikov <khlebnikov@openvz.org>
To: Minchan Kim <minchan@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	"riel@redhat.com" <riel@redhat.com>,
	"kosaki.motohiro@jp.fujitsu.com" <kosaki.motohiro@jp.fujitsu.com>
Subject: Re: Fwd: Control page reclaim granularity
Date: Tue, 13 Mar 2012 08:37:21 +0400
Message-ID: <4F5ECF01.2000402@openvz.org>
In-Reply-To: <20120313024818.GA7125@barrios>

Minchan Kim wrote:
> On Mon, Mar 12, 2012 at 06:18:21PM +0400, Konstantin Khlebnikov wrote:
>> Minchan Kim wrote:
>>> On Mon, Mar 12, 2012 at 04:14:14PM +0800, Zheng Liu wrote:
>>>> On 03/12/2012 02:20 PM, Konstantin Khlebnikov wrote:
>>>>> Minchan Kim wrote:
>>>>>> On Mon, Mar 12, 2012 at 10:06:09AM +0800, Zheng Liu wrote:
<CUT>
>>>>>>
>>>>>> Now the problem is that
>>>>>>
>>>>>> 1. The user wants to keep pages that are used once in a while in memory.
>>>>>> 2. The kernel wants to reclaim them because, from the LRU's point of
>>>>>>      view, they are clearly reclaim targets.
>>>>>>
>>>>>> The most desirable approach is for the user to use mlock to guarantee
>>>>>> them in memory. But mlock has too much overhead, and the user doesn't
>>>>>> want to keep all pages in memory at once (i.e., he wants demand paging
>>>>>> when he needs the page).
>>>>>> Right?
>>>>>>
>>>>>> madvise is just a hint to the kernel, and the kernel doesn't need to
>>>>>> guarantee madvise's behavior.
>>>>>> From that point of view, such inconsistency might not be a big problem.
>>>>>>
>>>>>> The big problem I see now is that the user must call madvise(WILLNEED)
>>>>>> periodically, because the activation happens only once, when the user
>>>>>> calls madvise. If the user doesn't use the page frequently after
>>>>>> calling it, the page ends up on the inactive list and could even be
>>>>>> reclaimed.
>>>>>> It's not good. :-(
>>>>>>
>>>>>> Okay. How about adding a new VM_WORKINGSET?
>>>>>> The reclaimer would then give the page one more round trip through the
>>>>>> active/inactive lists when reclaim happens, if the page is referenced.
>>>>>>
>>>>>> Sigh. We have no room for a new VM_FLAG on 32-bit.
>>>>>
>>>>> It would be nice to mark struct address_space with this flag and export
>>>>> AS_UNEVICTABLE somehow.
>>>>> Maybe we can reuse the file-locking engine for managing these bits =)
>>>>
>>>> Makes sense to me.  We can mark this flag in struct address_space and check
>>>> it in page_referenced_file().  If this flag is set, it will be cleared and
>>>
>>> The disadvantage is that we could then set reclaim granularity only per-inode.
>>> I want to set it per-vma, not per-inode.
>>
>> But with a per-inode flag we can tune all files, not only memory-mapped ones.
>
> I don't oppose a per-inode setting, but I believe we still need a file range or
> mmapped vma. One file may have parts with different characteristics: some parts
> are working set, some are streaming.
>
>> See the attached patch. Currently I am thinking about the managing code;
>> the file-locking engine really fits perfectly =)
>
> file-locking engine?
> Are you considering fcntl as the interface for it?
> What do you mean?
>

If we set bits on the inode, we must somehow account for its users and clear AS_WORKINGSET and
AS_UNEVICTABLE at the last file close. We can use the file-locking engine for locking inodes in memory:
a file lock automatically releases the inode at the last fput(). Maybe that is too tricky and we should
add a couple of simple atomic counters to the generic struct inode (like i_writecount/i_readcount), but
in that case we would add new code on the fast path. So it looks like inventing a new kind of
struct file_lock is the best approach.
I don't want to implement range-locking for now, but I can do it if somebody really wants this.
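
To make the counter alternative concrete, here is a minimal user-space sketch of the
accounting: each user that asks for a hint is counted, and the last one to go away clears
the bits. This is a toy model, not kernel code. struct toy_mapping, the TOY_AS_* bits and
the toy_hint_get()/toy_hint_put() helpers are all hypothetical names made up for
illustration, and the sketch ignores the race between the final put and a concurrent get
that real code would have to close.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits, named after the ones discussed in this thread. */
#define TOY_AS_WORKINGSET  (1u << 0)
#define TOY_AS_UNEVICTABLE (1u << 1)

/* Toy stand-in for an inode's address_space; not the kernel structure. */
struct toy_mapping {
    atomic_uint flags;      /* hint bits currently set */
    atomic_int  hint_users; /* open files that asked for a hint */
};

/* An open file asks for a hint: count the user, then set the bits. */
static void toy_hint_get(struct toy_mapping *m, unsigned int bits)
{
    atomic_fetch_add(&m->hint_users, 1);
    atomic_fetch_or(&m->flags, bits);
}

/*
 * Called when a hinting file is closed for the last time.
 * Returns true if this was the last user and the bits were cleared.
 * Note: a concurrent toy_hint_get() between the decrement and the
 * store is not handled here; real code would need a lock for that.
 */
static bool toy_hint_put(struct toy_mapping *m)
{
    int before = atomic_fetch_sub(&m->hint_users, 1);

    assert(before > 0); /* a put without a matching get is a bug */
    if (before == 1) {
        atomic_store(&m->flags, 0);
        return true;
    }
    return false;
}
```

The extra atomic operations on every get and put are the kind of fast-path cost mentioned
above; tying the lifetime to a file_lock-like object would move that work off the common
open/close path.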

Yes, we can use fcntl(), but fadvise() is much better.


