From: Konstantin Khlebnikov <khlebnikov@openvz.org>
To: Minchan Kim <minchan@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	"riel@redhat.com" <riel@redhat.com>,
	"kosaki.motohiro@jp.fujitsu.com" <kosaki.motohiro@jp.fujitsu.com>
Subject: Re: Fwd: Control page reclaim granularity
Date: Mon, 12 Mar 2012 18:18:21 +0400
Message-ID: <4F5E05AD.20200@openvz.org>
In-Reply-To: <20120312134226.GA5120@barrios>

[-- Attachment #1: Type: text/plain, Size: 6473 bytes --]

Minchan Kim wrote:
> On Mon, Mar 12, 2012 at 04:14:14PM +0800, Zheng Liu wrote:
>> On 03/12/2012 02:20 PM, Konstantin Khlebnikov wrote:
>>> Minchan Kim wrote:
>>>> On Mon, Mar 12, 2012 at 10:06:09AM +0800, Zheng Liu wrote:
>>>>> On Mon, Mar 12, 2012 at 09:29:34AM +0900, Minchan Kim wrote:
>>>>>> I forgot to Cc you.
>>>>>> Sorry.
>>>>>>
>>>>>> ---------- Forwarded message ----------
>>>>>> From: Minchan Kim<minchan@kernel.org>
>>>>>> Date: Mon, Mar 12, 2012 at 9:28 AM
>>>>>> Subject: Re: Control page reclaim granularity
>>>>>> To: Minchan Kim<minchan@kernel.org>, linux-mm<linux-mm@kvack.org>,
>>>>>> linux-kernel<linux-kernel@vger.kernel.org>, Konstantin Khlebnikov<
>>>>>> khlebnikov@openvz.org>, riel@redhat.com, kosaki.motohiro@jp.fujitsu.com
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 09, 2012 at 12:54:03AM +0800, Zheng Liu wrote:
>>>>>>> Hi Minchan,
>>>>>>>
>>>>>>> Sorry, I forgot to say that I am not subscribed to the linux-mm and
>>>>>>> linux-kernel mailing lists.  So please Cc me.
>>>>>>>
>>>>>>> IMHO, maybe we should rethink how users use mmap(2).  I will
>>>>>>> describe the cases I know of in our product system.  They can be
>>>>>>> categorized into two cases.  One mmaps all data files into memory
>>>>>>> and sometimes uses write(2) to append some data; the other uses
>>>>>>> mmap(2)/munmap(2) and read(2)/write(2) to manipulate the files.  In
>>>>>>> the second case, the application wants to keep mmapped pages in
>>>>>>> memory and let file pages be reclaimed first.  So, IMO, when an
>>>>>>> application uses mmap(2) to manipulate files, it is possible to
>>>>>>> infer that it wants to keep these mmapped pages in memory and not
>>>>>>> have them reclaimed.  At least these pages should not be reclaimed
>>>>>>> earlier than file pages.  I think that maybe we can restore that
>>>>>>> routine and provide a sysctl parameter to let the user set this
>>>>>>> ratio between mmapped pages and file pages.
>>>>>>
>>>>>> I am not convinced that we should handle mapped pages specially.
>>>>>> Sometimes, someone may use mmap to reduce buffer copies compared to
>>>>>> the read system call.
>>>>>> So I think we can't be sure that mmapped pages should always win.
>>>>>>
>>>>>> My suggestion is that it would be better for the user to declare it
>>>>>> explicitly.
>>>>>> I think we can implement it with madvise and fadvise's WILLNEED option.
>>>>>> The current implementation just does readahead if the page isn't in
>>>>>> memory, but I think we can promote the page from inactive to active
>>>>>> if it is already in memory.
>>>>>>
>>>>>> It's clearer, and it wouldn't be affected by changes to the kernel
>>>>>> page reclaim algorithm like this.
>>>>>
>>>>> Thank you for your advice.  But I still have a question about this
>>>>> solution.  If we improve the WILLNEED option of madvise(2) and
>>>>> fadvise(2), it will cause an inconsistent status for pages that are
>>>>> manipulated by madvise(2) and/or fadvise(2).  For example, when I call
>>>>> madvise with the WILLNEED flag, some pages will be moved to the active
>>>>> list if they are already in memory, and other pages will be read into
>>>>> memory and put on the inactive list if they are not in memory.  Then
>>>>> pages that are on the inactive list may be reclaimed.  So from the
>>>>> user's point of view, it is inconsistent because some pages are in
>>>>> memory and some pages are reclaimed.  But actually the user hopes that
>>>>> all of the pages can be kept in memory.  IMHO, this inconsistency is
>>>>> weird and will puzzle users.
>>>>
>>>> Now the problem is that
>>>>
>>>> 1. The user wants to keep pages which are used once in a while in memory.
>>>> 2. The kernel wants to reclaim them because, from the LRU's point of
>>>>      view, they are clearly reclaim targets.
>>>>
>>>> The most desirable approach is for the user to use mlock to guarantee
>>>> that they stay in memory. But mlock has too much overhead, and the user
>>>> doesn't want to bring all pages into memory at once. (I.e., he wants
>>>> demand paging when he needs the page.)
>>>> Right?
>>>>
>>>> madvise is just a hint for the kernel, and the kernel doesn't need to
>>>> guarantee madvise's behavior.
>>>> From that point of view, such inconsistency might not be a big problem.
>>>>
>>>> The big problem I see now is that the user has to call madvise(WILLNEED)
>>>> periodically, because such activation happens only once, when the user
>>>> calls madvise. If the user doesn't use the page frequently after
>>>> calling it, the page ends up moving to the inactive list and could even
>>>> be reclaimed.
>>>> It's not good. :-(
>>>>
>>>> Okay. How about adding a new VM_WORKINGSET?
>>>> The reclaimer would then give the page one more round trip in the
>>>> active/inactive list when reclaim happens,
>>>> if the page is referenced.
>>>>
>>>> Sigh. We have no room for a new VM_FLAG on 32-bit.
>>>
>>> It would be nice to mark struct address_space with this flag and export
>>> AS_UNEVICTABLE somehow.
>>> Maybe we can reuse the file-locking engine for managing these bits =)
>>
>> Makes sense to me.  We can mark this flag in struct address_space and check
>> it in page_referenced_file().  If this flag is set, it will be cleared and
>
> The disadvantage is that reclaim granularity would then be per-inode.
> I want to set it per-vma, not per-inode.

But with a per-inode flag we can tune all files, not only memory-mapped ones.
See the attached patch. Currently I am thinking about the managing code;
the file-locking engine really fits perfectly =)

>
>> the function returns referenced > 1.  Then this page can be promoted to the
>> active list.  But I prefer to set/clear this flag in madvise.
>
> Hmm, my idea is as follows.
> If we can set a new VM flag on the VMA or something, the reclaimer can check it in shrink_[in]active_list
> and avoid deactivating/reclaiming the page if it sees that the page is in a VMA which
> has the new VM flag set and the page has been referenced at least once recently.
> This means the page gets one more round trip in its list (i.e., active/inactive list)
> rather than activation, so that the page becomes less reclaimable.
>
>>
>> PS, I have subscribed to the linux-mm mailing list. :-)
>
> Congratulations! :)
>
>>
>> Regards,
>> Zheng
>


[-- Attachment #2: mm-introduce-mapping-as_workingset-flag --]
[-- Type: text/plain, Size: 3095 bytes --]

mm: introduce mapping AS_WORKINGSET flag

From: Konstantin Khlebnikov <khlebnikov@openvz.org>

This patch introduces a new flag, AS_WORKINGSET, in mapping->flags.
If it is set, the reclaimer will activate all pages for this inode after first usage.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/pagemap.h |   16 ++++++++++++++++
 mm/vmscan.c             |   15 ++++++++++++---
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index cfaaa69..c15fc17 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -24,6 +24,7 @@ enum mapping_flags {
 	AS_ENOSPC	= __GFP_BITS_SHIFT + 1,	/* ENOSPC on async write */
 	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
 	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
+	AS_WORKINGSET	= __GFP_BITS_SHIFT + 4,	/* promote pages activation */
 };
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
@@ -53,6 +54,21 @@ static inline int mapping_unevictable(struct address_space *mapping)
 	return !!mapping;
 }
 
+static inline void mapping_set_workingset(struct address_space *mapping)
+{
+	set_bit(AS_WORKINGSET, &mapping->flags);
+}
+
+static inline void mapping_clear_workingset(struct address_space *mapping)
+{
+	clear_bit(AS_WORKINGSET, &mapping->flags);
+}
+
+static inline int mapping_test_workingset(struct address_space *mapping)
+{
+	return mapping && test_bit(AS_WORKINGSET, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
 	return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 57b9658..5ccbe8c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -701,6 +701,7 @@ enum page_references {
 };
 
 static enum page_references page_check_references(struct page *page,
+						  struct address_space *mapping,
 						  struct mem_cgroup_zone *mz,
 						  struct scan_control *sc)
 {
@@ -721,6 +722,13 @@ static enum page_references page_check_references(struct page *page,
 	if (vm_flags & VM_LOCKED)
 		return PAGEREF_RECLAIM;
 
+	/*
+	 * Activate workingset page if referenced at least once.
+	 */
+	if (mapping_test_workingset(mapping) &&
+	    (referenced_ptes || referenced_page))
+		return PAGEREF_ACTIVATE;
+
 	if (referenced_ptes) {
 		if (PageAnon(page))
 			return PAGEREF_ACTIVATE;
@@ -828,7 +836,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			}
 		}
 
-		references = page_check_references(page, mz, sc);
+		mapping = page_mapping(page);
+
+		references = page_check_references(page, mapping, mz, sc);
 		switch (references) {
 		case PAGEREF_ACTIVATE:
 			goto activate_locked;
@@ -848,11 +858,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 			if (!add_to_swap(page))
 				goto activate_locked;
+			mapping = &swapper_space;
 			may_enter_fs = 1;
 		}
 
-		mapping = page_mapping(page);
-
 		/*
 		 * The page is mapped into the page tables of one or more
 		 * processes. Try to unmap it here.
