From: Konstantin Khlebnikov <khlebnikov@openvz.org>
To: Zheng Liu <gnehzuil.liu@gmail.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernl@vger.kernel.org" <linux-kernl@vger.kernel.org>
Subject: Re: Fine granularity page reclaim
Date: Mon, 20 Feb 2012 11:09:54 +0400 [thread overview]
Message-ID: <4F41F1C2.3030908@openvz.org> (raw)
In-Reply-To: <20120220062006.GA5028@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3142 bytes --]
Zheng Liu wrote:
> Cc linux-kernel mailing list.
>
> On Sat, Feb 18, 2012 at 12:20:05AM +0400, Konstantin Khlebnikov wrote:
>> Zheng Liu wrote:
>>> Hi all,
>>>
>>> Currently, we encounter a problem about page reclaim. In our product system,
>>> there is a lot of applictions that manipulate a number of files. In these
>>> files, they can be divided into two categories. One is index file, another is
>>> block file. The number of index files is about 15,000, and the number of
>>> block files is about 23,000 in a 2TB disk. The application accesses index
>>> file using mmap(2), and read/write block file using pread(2)/pwrite(2). We hope
>>> to hold index file in memory as much as possible, and it works well in Redhat
>>> 2.6.18-164. It is about 60-70% of index files that can be hold in memory.
>>> However, it doesn't work well in Redhat 2.6.32-133. I know in 2.6.18 that the
>>> linux uses an active list and an inactive list to handle page reclaim, and in
>>> 2.6.32 that they are divided into anonymous list and file list. So I am
>>> curious about why most of index files can be hold in 2.6.18? The index file
>>> should be replaced because mmap doesn't impact the lru list.
>>
>> There was my patch for fixing similar problem with shared/executable mapped pages
>> "vmscan: promote shared file mapped pages" commit 34dbc67a644f and commit c909e99364c
>> maybe it will help in your case.
>
> Hi Konstantin,
>
> Thank you for your reply. I have tested it in upstream kernel. These
> patches are useful for multi-processes applications. But, in our product
> system, there are some applications that are multi-thread. So
> 'references_ptes> 1' cannot help these applications to hold the data in
> memory.
Ok, what if you mmap you data as executable, just to test.
Then these pages will be activated after first touch.
In attachment patch with per-mm flag with the same effect.
>
> Regards,
> Zheng
>
>>
>>>
>>> BTW, I have some problems that need to be discussed.
>>>
>>> 1. I want to let index and block files are separately reclaimed. Is there any
>>> ways to satisify me in current upstream?
>>>
>>> 2. Maybe we can provide a mechansim to let different files to be mapped into
>>> differnet nodes. we can provide a ioctl(2) to tell kernel that this file should
>>> be mapped into a specific node id. A nid member is added into addpress_space
>>> struct. When alloc_page is called, the page can be allocated from that specific
>>> node id.
>>>
>>> 3. Currently the page can be reclaimed according to pid in memcg. But it is too
>>> coarse. I don't know whether memcg could provide a fine granularity page
>>> reclaim mechansim. For example, the page is reclaimed according to inode number.
>>>
>>> I don't subscribe this mailing list, So please Cc me. Thank you.
>>>
>>> Regards,
>>> Zheng
>>>
>>> --
>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> the body to majordomo@kvack.org. For more info on Linux MM,
>>> see: http://www.linux-mm.org/ .
>>> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
>>> Don't email:<a href=mailto:"dont@kvack.org"> email@kvack.org</a>
>>
[-- Attachment #2: mm-introduce-mmf_vm_preferrded-flag --]
[-- Type: text/plain, Size: 2899 bytes --]
mm: introduce MMF_VM_PREFERRDED flag
From: Konstantin Khlebnikov <khlebnikov@openvz.org>
This patch introduce mm->flags bit: MMF_VM_PREFERRED,
which doubles access bit weight for this mm.
Currently the only one effect:
mm with this bit activates mapped file pages after first touch,
if vma does not marked as sequentially accessed.
This should be per-vma sign, but there no free bits in vma->vm_flags,
maybe we can make this stuff 64-only.
interface:
prctl(PR_SET_MM_PREFERRED, 1) to set and
prctl(PR_SET_MM_PREFERRED, 0) to clear.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
include/linux/prctl.h | 2 ++
include/linux/sched.h | 1 +
kernel/sys.c | 17 +++++++++++++++++
mm/rmap.c | 5 ++++-
4 files changed, 24 insertions(+), 1 deletions(-)
diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index 7ddc7f1..d0f9ceb 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -114,4 +114,6 @@
# define PR_SET_MM_START_BRK 6
# define PR_SET_MM_BRK 7
+#define PR_SET_MM_PREFERRED 36
+
#endif /* _LINUX_PRCTL_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 75c15c5..b60883a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -437,6 +437,7 @@ extern int get_dumpable(struct mm_struct *mm);
/* leave room for more dump flags */
#define MMF_VM_MERGEABLE 16 /* KSM may merge identical pages */
#define MMF_VM_HUGEPAGE 17 /* set when VM_HUGEPAGE is set on vma */
+#define MMF_VM_PREFERRED 18 /* Double pte access bits weight */
#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK)
diff --git a/kernel/sys.c b/kernel/sys.c
index 4070153..bacf8d5 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1810,6 +1810,20 @@ static int prctl_set_mm(int opt, unsigned long addr,
}
#endif
+static int set_mm_preferred(struct mm_struct *mm, int state)
+{
+ switch (state) {
+ case 0:
+ clear_bit(MMF_VM_PREFERRED, &mm->flags);
+ return 0;
+ case 1:
+ set_bit(MMF_VM_PREFERRED, &mm->flags);
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
unsigned long, arg4, unsigned long, arg5)
{
@@ -1962,6 +1976,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
case PR_SET_MM:
error = prctl_set_mm(arg2, arg3, arg4, arg5);
break;
+ case PR_SET_MM_PREFERRED:
+ error = set_mm_preferred(me->mm, arg2);
+ break;
default:
error = -EINVAL;
break;
diff --git a/mm/rmap.c b/mm/rmap.c
index 78cc46b..b0fd1d1 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -766,8 +766,11 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
(*mapcount)--;
- if (referenced)
+ if (referenced) {
+ if (test_bit(MMF_VM_PREFERRED, &mm->flags))
+ referenced <<= 1;
*vm_flags |= vma->vm_flags;
+ }
out:
return referenced;
}
next prev parent reply other threads:[~2012-02-20 7:09 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-17 9:22 Fine granularity page reclaim Zheng Liu
2012-02-17 20:20 ` Konstantin Khlebnikov
2012-02-20 6:20 ` Zheng Liu
2012-02-20 7:09 ` Konstantin Khlebnikov [this message]
2012-03-07 17:45 ` Zheng Liu
2012-03-07 20:33 ` Konstantin Khlebnikov
2012-03-08 2:54 ` Zheng Liu
2012-04-07 0:18 ` Ying Han
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F41F1C2.3030908@openvz.org \
--to=khlebnikov@openvz.org \
--cc=gnehzuil.liu@gmail.com \
--cc=linux-kernl@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).