linux-mm.kvack.org archive mirror
From: Konstantin Khlebnikov <khlebnikov@openvz.org>
To: Zheng Liu <gnehzuil.liu@gmail.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernl@vger.kernel.org" <linux-kernl@vger.kernel.org>
Subject: Re: Fine granularity page reclaim
Date: Mon, 20 Feb 2012 11:09:54 +0400
Message-ID: <4F41F1C2.3030908@openvz.org>
In-Reply-To: <20120220062006.GA5028@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3142 bytes --]

Zheng Liu wrote:
> Cc linux-kernel mailing list.
>
> On Sat, Feb 18, 2012 at 12:20:05AM +0400, Konstantin Khlebnikov wrote:
>> Zheng Liu wrote:
>>> Hi all,
>>>
>>> Currently, we have encountered a problem with page reclaim. In our production
>>> system, there are many applications that manipulate a large number of files.
>>> These files fall into two categories: index files and block files. There are
>>> about 15,000 index files and about 23,000 block files on a 2TB disk. The
>>> applications access index files using mmap(2), and read/write block files
>>> using pread(2)/pwrite(2). We want to keep the index files in memory as much
>>> as possible, and this works well on Red Hat 2.6.18-164, where about 60-70% of
>>> the index files can be held in memory. However, it doesn't work well on Red
>>> Hat 2.6.32-133. I know that 2.6.18 uses an active list and an inactive list
>>> to handle page reclaim, and that in 2.6.32 these are split into anonymous
>>> and file lists. So I am curious why most of the index files can be held in
>>> memory on 2.6.18. I would expect the index files to be evicted, because
>>> mmap access doesn't affect the LRU lists.
>>
>> I wrote patches fixing a similar problem with shared/executable mapped pages:
>> "vmscan: promote shared file mapped pages" (commit 34dbc67a644f) and commit
>> c909e99364c. Maybe they will help in your case.
>
> Hi Konstantin,
>
> Thank you for your reply. I have tested them on the upstream kernel. These
> patches are useful for multi-process applications. But in our production
> system, some applications are multi-threaded, so 'referenced_ptes > 1'
> cannot help these applications keep their data in memory.

OK, what if you mmap your data as executable, just to test?
Then these pages will be activated after the first touch.
The attached patch adds a per-mm flag with the same effect.
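A minimal user-space sketch of that test, assuming the index file is mapped read-only with MAP_SHARED. On 3.x-era kernels, page_check_references() activates a referenced file page whose mapping has VM_EXEC, which is what makes "activated after the first touch" work. The helpers `map_index_file()` and `touch_pages()` are hypothetical. If PROT_EXEC is refused (for example on a noexec mount), the sketch falls back to a plain read-only mapping and reports that through `is_exec`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a whole index file read-only and, if the
 * kernel allows it, executable. *is_exec is 1 only when the PROT_EXEC
 * mapping succeeded; on EPERM/EACCES it retries without PROT_EXEC.
 */
static void *map_index_file(const char *path, size_t *len, int *is_exec)
{
    struct stat st;
    void *p;
    int fd;

    *is_exec = 0;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }
    *len = (size_t)st.st_size;

    p = mmap(NULL, *len, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    if (p != MAP_FAILED)
        *is_exec = 1;
    else if (errno == EPERM || errno == EACCES)
        p = mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);

    close(fd);  /* the mapping keeps its own reference to the file */
    return p;
}

/* Touch one byte per page so every page gets a pte with the access bit set. */
static unsigned long touch_pages(const unsigned char *p, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned long sum = 0;

    for (size_t off = 0; off < len; off += page)
        sum += p[off];
    return sum;
}
```

Note that all threads of a process share one mm, so such a mapping still has a single pte per page; any activation comes from the VM_EXEC check, not from the pte count.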

>
> Regards,
> Zheng
>
>>
>>>
>>> BTW, I have some problems that need to be discussed.
>>>
>>> 1. I want index files and block files to be reclaimed separately. Is there
>>> any way to achieve this in current upstream?
>>>
>>> 2. Maybe we can provide a mechanism to map different files onto different
>>> nodes. We could provide an ioctl(2) to tell the kernel that a file should be
>>> mapped onto a specific node id. A nid member would be added to struct
>>> address_space, and when alloc_page is called, the page would be allocated
>>> from that node.
>>>
>>> 3. Currently memcg reclaims pages per group of processes (pids), but this is
>>> too coarse. I don't know whether memcg could provide a finer-grained page
>>> reclaim mechanism, for example, reclaiming pages according to inode number.
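For item 1, one commonly used user-space workaround (not something proposed in this thread) is to drop block-file pages from the page cache right after reading them, so they compete less with the mmap'ed index files. A minimal sketch, assuming block I/O goes through pread(2); `read_block_once()` is a hypothetical helper. POSIX_FADV_DONTNEED is only advice: Linux skips pages that are dirty or still mapped, so data written with pwrite(2) is not dropped until it has been written back.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one block with pread(2), then advise the
 * kernel that this range will not be needed again so its clean
 * page-cache pages can be dropped. The advice is best effort: the
 * posix_fadvise() result (0 or an error number) goes to *advice_err
 * and does not affect the value returned for the read.
 */
static ssize_t read_block_once(int fd, void *buf, size_t len, off_t off,
                               int *advice_err)
{
    ssize_t n = pread(fd, buf, len, off);

    if (n > 0)
        *advice_err = posix_fadvise(fd, off, (off_t)n, POSIX_FADV_DONTNEED);
    else
        *advice_err = 0;  /* nothing read, nothing to advise */
    return n;
}
```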
>>>
>>> I am not subscribed to this mailing list, so please Cc me. Thank you.
>>>
>>> Regards,
>>> Zheng
>>>
>>


[-- Attachment #2: mm-introduce-mmf_vm_preferrded-flag --]
[-- Type: text/plain, Size: 2899 bytes --]

mm: introduce MMF_VM_PREFERRED flag

From: Konstantin Khlebnikov <khlebnikov@openvz.org>

This patch introduces a new mm->flags bit, MMF_VM_PREFERRED,
which doubles the weight of pte access bits for this mm.

Currently it has only one effect:
an mm with this bit set activates mapped file pages after the first touch,
unless the vma is marked as sequentially accessed.

This should be a per-vma flag, but there are no free bits left in
vma->vm_flags; maybe we can make this 64-bit only.
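A simplified user-space model of why doubling the access-bit weight activates a page after the first touch, assuming the 3.x-era logic of page_check_references() in mm/vmscan.c: a file page with more than one referenced pte, with PG_referenced already set, or with a referenced VM_EXEC mapping is activated. All names below are hypothetical; swap-backed pages, mlocked vmas and the clean/dirty distinction are left out. It also shows why multi-threaded applications gained nothing from the earlier patches: all threads share one mm, so each page has only one pte to count.

```c
#include <stdbool.h>

/* Hypothetical names: a simplified model, not kernel code. */
enum ref_verdict { MODEL_RECLAIM, MODEL_KEEP, MODEL_ACTIVATE };

/*
 * Per-mapping contribution, modeled on page_referenced_one() with the
 * patch applied: a young pte counts once unless the vma carries a
 * sequential-read hint, and counts twice if the mm is "preferred".
 */
static int model_referenced_one(bool pte_young, bool seq_read_hint,
                                bool mm_preferred)
{
    int referenced = (pte_young && !seq_read_hint) ? 1 : 0;

    if (referenced && mm_preferred)
        referenced <<= 1;  /* the line the patch adds */
    return referenced;
}

/* File-page decision, modeled loosely on page_check_references(). */
static enum ref_verdict model_check_references(int referenced_ptes,
                                               bool referenced_page,
                                               bool vm_exec)
{
    if (referenced_ptes) {
        if (referenced_page || referenced_ptes > 1)
            return MODEL_ACTIVATE;
        if (vm_exec)
            return MODEL_ACTIVATE;
        /* the kernel sets PG_referenced here, so the next referenced
         * scan sees referenced_page and activates the page */
        return MODEL_KEEP;
    }
    return MODEL_RECLAIM;
}
```

In this model a single young pte in a preferred mm yields 2, crossing the same `> 1` threshold that two separate processes mapping the page would reach.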

interface:
prctl(PR_SET_MM_PREFERRED, 1) to set and
prctl(PR_SET_MM_PREFERRED, 0) to clear.
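A minimal user-space sketch of this interface, assuming a kernel with this patch applied. The value 36 is copied from the patch's prctl.h hunk and is not a mainline constant: mainline (since 3.4) uses option 36 for PR_SET_CHILD_SUBREAPER, so on an unpatched kernel this call would do something else entirely. The wrapper `mm_set_preferred()` and the macro `PATCH_PR_SET_MM_PREFERRED` are hypothetical names.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Option number copied from the patch; NOT a mainline value.
 * Mainline assigns 36 to PR_SET_CHILD_SUBREAPER.
 */
#define PATCH_PR_SET_MM_PREFERRED 36

/*
 * Hypothetical wrapper: on = 1 sets MMF_VM_PREFERRED on the caller's mm,
 * on = 0 clears it. Other values are rejected locally with EINVAL,
 * mirroring the check in the patch's set_mm_preferred().
 */
static int mm_set_preferred(unsigned long on)
{
    if (on > 1) {
        errno = EINVAL;
        return -1;
    }
    return prctl(PATCH_PR_SET_MM_PREFERRED, on, 0UL, 0UL, 0UL);
}
```

Because the bit lives in mm->flags, a single call covers every thread of a multi-threaded process.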

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/prctl.h |    2 ++
 include/linux/sched.h |    1 +
 kernel/sys.c          |   17 +++++++++++++++++
 mm/rmap.c             |    5 ++++-
 4 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index 7ddc7f1..d0f9ceb 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -114,4 +114,6 @@
 # define PR_SET_MM_START_BRK		6
 # define PR_SET_MM_BRK			7
 
+#define PR_SET_MM_PREFERRED	36
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 75c15c5..b60883a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -437,6 +437,7 @@ extern int get_dumpable(struct mm_struct *mm);
 					/* leave room for more dump flags */
 #define MMF_VM_MERGEABLE	16	/* KSM may merge identical pages */
 #define MMF_VM_HUGEPAGE		17	/* set when VM_HUGEPAGE is set on vma */
+#define MMF_VM_PREFERRED	18	/* Double pte access bits weight */
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK)
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 4070153..bacf8d5 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1810,6 +1810,20 @@ static int prctl_set_mm(int opt, unsigned long addr,
 }
 #endif
 
+static int set_mm_preferred(struct mm_struct *mm, unsigned long state)
+{
+	switch (state) {
+	case 0:
+		clear_bit(MMF_VM_PREFERRED, &mm->flags);
+		return 0;
+	case 1:
+		set_bit(MMF_VM_PREFERRED, &mm->flags);
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -1962,6 +1976,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		case PR_SET_MM:
 			error = prctl_set_mm(arg2, arg3, arg4, arg5);
 			break;
+		case PR_SET_MM_PREFERRED:
+			error = set_mm_preferred(me->mm, arg2);
+			break;
 		default:
 			error = -EINVAL;
 			break;
diff --git a/mm/rmap.c b/mm/rmap.c
index 78cc46b..b0fd1d1 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -766,8 +766,11 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 
 	(*mapcount)--;
 
-	if (referenced)
+	if (referenced) {
+		if (test_bit(MMF_VM_PREFERRED, &mm->flags))
+			referenced <<= 1;
 		*vm_flags |= vma->vm_flags;
+	}
 out:
 	return referenced;
 }


Thread overview: 8+ messages
2012-02-17  9:22 Fine granularity page reclaim Zheng Liu
2012-02-17 20:20 ` Konstantin Khlebnikov
2012-02-20  6:20   ` Zheng Liu
2012-02-20  7:09     ` Konstantin Khlebnikov [this message]
2012-03-07 17:45       ` Zheng Liu
2012-03-07 20:33         ` Konstantin Khlebnikov
2012-03-08  2:54           ` Zheng Liu
2012-04-07  0:18 ` Ying Han
