linux-mm.kvack.org archive mirror
* Fine granularity page reclaim
@ 2012-02-17  9:22 Zheng Liu
  2012-02-17 20:20 ` Konstantin Khlebnikov
  2012-04-07  0:18 ` Ying Han
  0 siblings, 2 replies; 8+ messages in thread
From: Zheng Liu @ 2012-02-17  9:22 UTC (permalink / raw)
  To: linux-mm

Hi all,

We have run into a problem with page reclaim. On our production system, many
applications manipulate a large number of files, which fall into two
categories: index files and block files. A 2TB disk holds about 15,000 index
files and about 23,000 block files. The applications access index files with
mmap(2) and read/write block files with pread(2)/pwrite(2). We want to keep
the index files in memory as much as possible. This works well on Red Hat
2.6.18-164, where about 60-70% of the index files stay in memory, but it does
not work well on Red Hat 2.6.32-133. As I understand it, 2.6.18 uses an active
list and an inactive list for page reclaim, and in 2.6.32 these are split into
anonymous and file lists. So I am curious: why can most of the index files be
held in memory on 2.6.18? I would expect the index files to be evicted,
because mmap access doesn't affect the LRU lists.
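
For anyone who wants to reproduce the "how much of the index files stays in
memory" measurement, here is a minimal user-space sketch, assuming a Linux
system with mincore(2). It is not the tool used for the numbers above; the
function name resident_fraction() is made up for illustration. It maps a file
without touching it and counts how many of its pages are resident.

```c
#define _DEFAULT_SOURCE	/* mincore(2) is a non-POSIX extension on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the fraction (0.0 to 1.0) of a file's pages
 * that are currently resident in memory, or -1.0 on any error (including an
 * empty file).
 */
double resident_fraction(const char *path)
{
	double result = -1.0;
	struct stat st;
	unsigned char *vec = NULL;
	void *map = MAP_FAILED;
	size_t len = 0, npages = 0, resident = 0, i;
	long page = sysconf(_SC_PAGESIZE);
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1.0;
	if (page <= 0 || fstat(fd, &st) < 0 || st.st_size <= 0)
		goto out;

	len = (size_t)st.st_size;
	npages = (len + (size_t)page - 1) / (size_t)page;

	/* Mapping alone does not fault pages in, so the result is not skewed. */
	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	vec = malloc(npages);
	if (vec == NULL || mincore(map, len, vec) < 0)
		goto out;

	for (i = 0; i < npages; i++)
		resident += vec[i] & 1;	/* bit 0 set: page is resident */

	assert(resident <= npages);
	result = (double)resident / (double)npages;
out:
	free(vec);
	if (map != MAP_FAILED)
		munmap(map, len);
	close(fd);
	return result;
}
```

Summing this over all index files before and after a load run gives a rough
picture of how well a given kernel keeps them cached.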

BTW, I have some questions that I would like to discuss.

1. I want index files and block files to be reclaimed separately. Is there any
way to do this in current upstream?

2. Maybe we could provide a mechanism that lets different files be mapped to
different nodes. We could provide an ioctl(2) to tell the kernel that a file
should be mapped to a specific node id. A nid member would be added to struct
address_space. When alloc_page is called, the page would be allocated from
that node.

3. Currently, memcg reclaims pages according to pid, which is too coarse. I
don't know whether memcg could provide a finer-grained page reclaim mechanism,
for example reclaiming pages according to inode number.

I am not subscribed to this mailing list, so please Cc me. Thank you.

Regards,
Zheng

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: Fine granularity page reclaim
  2012-02-17  9:22 Fine granularity page reclaim Zheng Liu
@ 2012-02-17 20:20 ` Konstantin Khlebnikov
  2012-02-20  6:20   ` Zheng Liu
  2012-04-07  0:18 ` Ying Han
  1 sibling, 1 reply; 8+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-17 20:20 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-mm@kvack.org

Zheng Liu wrote:
> Hi all,
>
> We have run into a problem with page reclaim. On our production system, many
> applications manipulate a large number of files, which fall into two
> categories: index files and block files. A 2TB disk holds about 15,000 index
> files and about 23,000 block files. The applications access index files with
> mmap(2) and read/write block files with pread(2)/pwrite(2). We want to keep
> the index files in memory as much as possible. This works well on Red Hat
> 2.6.18-164, where about 60-70% of the index files stay in memory, but it does
> not work well on Red Hat 2.6.32-133. As I understand it, 2.6.18 uses an active
> list and an inactive list for page reclaim, and in 2.6.32 these are split into
> anonymous and file lists. So I am curious: why can most of the index files be
> held in memory on 2.6.18? I would expect the index files to be evicted,
> because mmap access doesn't affect the LRU lists.

I wrote patches fixing a similar problem with shared/executable mapped pages:
"vmscan: promote shared file mapped pages" (commit 34dbc67a644f) and commit
c909e99364c. Maybe they will help in your case.


* Re: Fine granularity page reclaim
  2012-02-17 20:20 ` Konstantin Khlebnikov
@ 2012-02-20  6:20   ` Zheng Liu
  2012-02-20  7:09     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 8+ messages in thread
From: Zheng Liu @ 2012-02-20  6:20 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: linux-mm@kvack.org, linux-kernl

Cc linux-kernel mailing list.

On Sat, Feb 18, 2012 at 12:20:05AM +0400, Konstantin Khlebnikov wrote:
> I wrote patches fixing a similar problem with shared/executable mapped pages:
> "vmscan: promote shared file mapped pages" (commit 34dbc67a644f) and commit
> c909e99364c. Maybe they will help in your case.

Hi Konstantin,

Thank you for your reply.  I have tested them on an upstream kernel.  These
patches help multi-process applications.  But on our production system some
applications are multi-threaded, so the 'referenced_ptes > 1' check cannot
help them keep the data in memory.
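
To make the multi-process vs. multi-thread difference concrete, here is a
simplified user-space model of the reclaim decision for a mapped file page.
It is a sketch based on my reading of page_check_references() in mm/vmscan.c
around v3.3, not the kernel code itself: struct page_sample and
check_references() are made-up names, and cases such as VM_LOCKED,
swap-backed pages and PAGEREF_RECLAIM_CLEAN are left out. Threads share one
mm, so a page has a single pte and referenced_ptes never exceeds 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome names mirror a subset of the kernel's enum page_references. */
enum pageref { PAGEREF_RECLAIM, PAGEREF_KEEP, PAGEREF_ACTIVATE };

/* Hypothetical, flattened view of what reclaim learns about a file page. */
struct page_sample {
	int referenced_ptes;	/* ptes found with the accessed bit set */
	bool referenced_page;	/* PG_referenced left over from an earlier scan */
	bool vm_exec;		/* a referencing vma is executable */
};

enum pageref check_references(const struct page_sample *p)
{
	assert(p->referenced_ptes >= 0);

	if (p->referenced_ptes == 0)
		return PAGEREF_RECLAIM;

	/* Used more than once: by two mappings, or again since the last scan. */
	if (p->referenced_page || p->referenced_ptes > 1)
		return PAGEREF_ACTIVATE;

	/* Executable file pages are activated after the first use. */
	if (p->vm_exec)
		return PAGEREF_ACTIVATE;

	/* Referenced once through one pte: one more trip around the list. */
	return PAGEREF_KEEP;
}
```

In this model a page touched by two processes yields referenced_ptes == 2 and
is activated at once, while the same page touched by two threads of one
process yields 1 and is only kept on the inactive list.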

Regards,
Zheng


* Re: Fine granularity page reclaim
  2012-02-20  6:20   ` Zheng Liu
@ 2012-02-20  7:09     ` Konstantin Khlebnikov
  2012-03-07 17:45       ` Zheng Liu
  0 siblings, 1 reply; 8+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-20  7:09 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-mm@kvack.org, linux-kernl@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 3142 bytes --]

Zheng Liu wrote:
> Hi Konstantin,
>
> Thank you for your reply.  I have tested them on an upstream kernel.  These
> patches help multi-process applications.  But on our production system some
> applications are multi-threaded, so the 'referenced_ptes > 1' check cannot
> help them keep the data in memory.

OK, what if you mmap your data as executable, just as a test?
Those pages will then be activated after the first touch.
Attached is a patch that adds a per-mm flag with the same effect.
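
The PROT_EXEC test can be sketched in user space as below. This is only an
illustration under a few assumptions: map_index_file() is a made-up helper,
the file is mapped read-only and shared, and the data is never executed.
Note that mmap(2) with PROT_EXEC fails with EPERM when the file lives on a
filesystem mounted noexec. The prctl() variant is left out on purpose because
PR_SET_MM_PREFERRED exists only with the attached patch; as far as I know,
option value 36 means PR_SET_CHILD_SUBREAPER on mainline kernels since 3.4,
so it should not be called on an unpatched kernel.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a whole index file read-only. If want_exec is
 * nonzero, PROT_EXEC is requested as well, which only changes how reclaim
 * treats the pages. Returns the mapping and stores its length in *len_out,
 * or returns MAP_FAILED.
 */
void *map_index_file(const char *path, int want_exec, size_t *len_out)
{
	struct stat st;
	void *map = MAP_FAILED;
	int prot = PROT_READ | (want_exec ? PROT_EXEC : 0);
	int fd;

	assert(len_out != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return MAP_FAILED;

	if (fstat(fd, &st) == 0 && st.st_size > 0) {
		map = mmap(NULL, (size_t)st.st_size, prot, MAP_SHARED, fd, 0);
		if (map != MAP_FAILED)
			*len_out = (size_t)st.st_size;
	}

	close(fd);	/* the mapping stays valid after the descriptor is closed */
	return map;
}
```

Calling map_index_file(path, 1, &len) for index files and leaving block files
on pread(2)/pwrite(2) is enough for the experiment; unmap with
munmap(map, len) when done.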



[-- Attachment #2: mm-introduce-mmf_vm_preferrded-flag --]
[-- Type: text/plain, Size: 2899 bytes --]

mm: introduce MMF_VM_PREFERRED flag

From: Konstantin Khlebnikov <khlebnikov@openvz.org>

This patch introduces an mm->flags bit, MMF_VM_PREFERRED,
which doubles the weight of the pte access bit for this mm.

Currently it has only one effect:
an mm with this bit set activates mapped file pages after the first touch,
unless the vma is marked as sequentially accessed.

This should be a per-vma flag, but there are no free bits in vma->vm_flags;
maybe we can make this feature 64-bit only.

interface:
prctl(PR_SET_MM_PREFERRED, 1) to set and
prctl(PR_SET_MM_PREFERRED, 0) to clear.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/prctl.h |    2 ++
 include/linux/sched.h |    1 +
 kernel/sys.c          |   17 +++++++++++++++++
 mm/rmap.c             |    5 ++++-
 4 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index 7ddc7f1..d0f9ceb 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -114,4 +114,6 @@
 # define PR_SET_MM_START_BRK		6
 # define PR_SET_MM_BRK			7
 
+#define PR_SET_MM_PREFERRED	36
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 75c15c5..b60883a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -437,6 +437,7 @@ extern int get_dumpable(struct mm_struct *mm);
 					/* leave room for more dump flags */
 #define MMF_VM_MERGEABLE	16	/* KSM may merge identical pages */
 #define MMF_VM_HUGEPAGE		17	/* set when VM_HUGEPAGE is set on vma */
+#define MMF_VM_PREFERRED	18	/* Double pte access bits weight */
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK)
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 4070153..bacf8d5 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1810,6 +1810,20 @@ static int prctl_set_mm(int opt, unsigned long addr,
 }
 #endif
 
+static int set_mm_preferred(struct mm_struct *mm, int state)
+{
+	switch (state) {
+		case 0:
+			clear_bit(MMF_VM_PREFERRED, &mm->flags);
+			return 0;
+		case 1:
+			set_bit(MMF_VM_PREFERRED, &mm->flags);
+			return 0;
+		default:
+			return -EINVAL;
+	}
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -1962,6 +1976,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		case PR_SET_MM:
 			error = prctl_set_mm(arg2, arg3, arg4, arg5);
 			break;
+		case PR_SET_MM_PREFERRED:
+			error = set_mm_preferred(me->mm, arg2);
+			break;
 		default:
 			error = -EINVAL;
 			break;
diff --git a/mm/rmap.c b/mm/rmap.c
index 78cc46b..b0fd1d1 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -766,8 +766,11 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 
 	(*mapcount)--;
 
-	if (referenced)
+	if (referenced) {
+		if (test_bit(MMF_VM_PREFERRED, &mm->flags))
+			referenced <<= 1;
 		*vm_flags |= vma->vm_flags;
+	}
 out:
 	return referenced;
 }
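
The effect of the rmap.c hunk can be illustrated with a small user-space
model. This is a sketch, not kernel code: struct mapping_sample,
referenced_one() and referenced_total() are made-up names, and the model
assumes that each mapping contributes at most one reference and that reclaim
activates a file page once the total exceeds 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one mapping of a page found by the rmap walk. */
struct mapping_sample {
	bool accessed;		/* pte accessed bit was set */
	bool sequential;	/* vma is marked as sequentially accessed */
	bool mm_preferred;	/* owning mm has MMF_VM_PREFERRED set */
};

/* Contribution of a single mapping, with the patch applied. */
static int referenced_one(const struct mapping_sample *m)
{
	int referenced = (m->accessed && !m->sequential) ? 1 : 0;

	if (referenced && m->mm_preferred)
		referenced <<= 1;	/* the patch: double the weight */
	return referenced;
}

/* Sum over all mappings of the page. */
int referenced_total(const struct mapping_sample *maps, size_t n)
{
	int total = 0;
	size_t i;

	assert(maps != NULL || n == 0);
	for (i = 0; i < n; i++)
		total += referenced_one(&maps[i]);
	return total;
}
```

With the flag set, a single touched mapping already counts as 2, which is
what lets a multi-threaded process (one mm, one pte per page) pass a
"more than one reference" threshold.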


* Re: Fine granularity page reclaim
  2012-02-20  7:09     ` Konstantin Khlebnikov
@ 2012-03-07 17:45       ` Zheng Liu
  2012-03-07 20:33         ` Konstantin Khlebnikov
  0 siblings, 1 reply; 8+ messages in thread
From: Zheng Liu @ 2012-03-07 17:45 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: linux-mm@kvack.org, linux-kernel

On Monday, February 20, 2012, Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
> OK, what if you mmap your data as executable, just as a test?
> Those pages will then be activated after the first touch.
> Attached is a patch that adds a per-mm flag with the same effect.
>

Hi Konstantin,

Sorry for the delayed reply.  For the last two weeks I have been trying both
solutions and evaluating their performance impact on our production system.
The good news is that both work well: they keep mapped files in memory for
multi-threaded applications.  But I have a concern about the first solution
(mapping the file with the PROT_EXEC flag).  I think it is too tricky.  As I
said previously, the files that need to be mapped are just normal index
files, and from an application programmer's point of view they shouldn't be
mapped with PROT_EXEC.  So the key issue is that we should provide a
mechanism that lets different sets of files be reclaimed separately.  I am
not sure whether this idea is useful, so any feedback is welcome. :-)
Thank you.

Regards,
Zheng


* Re: Fine granularity page reclaim
  2012-03-07 17:45       ` Zheng Liu
@ 2012-03-07 20:33         ` Konstantin Khlebnikov
  2012-03-08  2:54           ` Zheng Liu
  0 siblings, 1 reply; 8+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-07 20:33 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org

Zheng Liu wrote:
> Hi Konstantin,
>
> Sorry for the delayed reply.  For the last two weeks I have been trying both
> solutions and evaluating their performance impact on our production system.
> The good news is that both work well: they keep mapped files in memory for
> multi-threaded applications.  But I have a concern about the first solution
> (mapping the file with the PROT_EXEC flag).  I think it is too tricky.  As I
> said previously, the files that need to be mapped are just normal index
> files, and from an application programmer's point of view they shouldn't be
> mapped with PROT_EXEC.  So the key issue is that we should provide a
> mechanism that lets different sets of files be reclaimed separately.  I am
> not sure whether this idea is useful, so any feedback is welcome. :-)
> Thank you.
>

Sounds good. Yes, PROT_EXEC isn't very usable or secure, and a per-mm flag
isn't very flexible either. I would prefer setting some kind of
memory-pressure priority for each vma and inode. We could probably sort vmas
and inodes into different cgroup-like sets and balance memory pressure
between them. Maybe someone has already thought about this...


* Re: Fine granularity page reclaim
  2012-03-07 20:33         ` Konstantin Khlebnikov
@ 2012-03-08  2:54           ` Zheng Liu
  0 siblings, 0 replies; 8+ messages in thread
From: Zheng Liu @ 2012-03-08  2:54 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Thu, Mar 08, 2012 at 12:33:20AM +0400, Konstantin Khlebnikov wrote:
> Sounds good. Yes, PROT_EXEC isn't very usable or secure, and a per-mm flag
> isn't very flexible either. I would prefer setting some kind of
> memory-pressure priority for each vma and inode. We could probably sort vmas
> and inodes into different cgroup-like sets and balance memory pressure
> between them. Maybe someone has already thought about this...

Thanks for your advice.  I will send a new mail to the mailing list to
discuss setting pressure priorities for each vma and inode.  Maybe someone
has some good ideas for it. ;-)

Regards,
Zheng


* Re: Fine granularity page reclaim
  2012-02-17  9:22 Fine granularity page reclaim Zheng Liu
  2012-02-17 20:20 ` Konstantin Khlebnikov
@ 2012-04-07  0:18 ` Ying Han
  1 sibling, 0 replies; 8+ messages in thread
From: Ying Han @ 2012-04-07  0:18 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-mm

On Fri, Feb 17, 2012 at 1:22 AM, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> Hi all,
>
> We have run into a problem with page reclaim. On our production system, many
> applications manipulate a large number of files, which fall into two
> categories: index files and block files. A 2TB disk holds about 15,000 index
> files and about 23,000 block files. The applications access index files with
> mmap(2) and read/write block files with pread(2)/pwrite(2). We want to keep
> the index files in memory as much as possible. This works well on Red Hat
> 2.6.18-164, where about 60-70% of the index files stay in memory, but it does
> not work well on Red Hat 2.6.32-133. As I understand it, 2.6.18 uses an active
> list and an inactive list for page reclaim, and in 2.6.32 these are split into
> anonymous and file lists. So I am curious: why can most of the index files be
> held in memory on 2.6.18?

One of the changes that came with the split LRU is a different scan ratio
(active vs. inactive) for the file LRU and the anon LRU. You can check the
following two functions:

inactive_anon_is_low_global()
inactive_file_is_low_global()

Depending on your machine size, we might end up scanning more pages on the
file LRU.
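
As a rough illustration of the difference between the two checks, here is a
user-space model written from memory of kernels around v3.3, so please verify
it against your own tree. The function names below are shortened stand-ins,
and isqrt() replaces the kernel's int_sqrt(). The assumption is that the anon
check uses a per-zone ratio of about sqrt(10 * zone size in GB), while the
file check simply compares active against inactive.

```c
#include <assert.h>
#include <stdbool.h>

/* Integer square root (floor), standing in for the kernel's int_sqrt(). */
static unsigned long isqrt(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	assert(r * r <= x);
	return r;
}

/* Assumed target active:inactive ratio for anon pages in a zone. */
unsigned long inactive_ratio(unsigned long zone_gb)
{
	return zone_gb ? isqrt(10 * zone_gb) : 1;
}

/* Anon: the inactive list is "low" only when it is far below the active one. */
bool inactive_anon_is_low(unsigned long active, unsigned long inactive,
			  unsigned long zone_gb)
{
	return inactive * inactive_ratio(zone_gb) < active;
}

/* File: the inactive list is "low" as soon as active outgrows inactive. */
bool inactive_file_is_low(unsigned long active, unsigned long inactive)
{
	return active > inactive;
}
```

Under these assumptions, a 10GB zone with 600 active and 400 inactive pages
would deactivate file pages (600 > 400) but not anon pages (400 * 10 is not
below 600), so active file pages face more pressure as the machine grows.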

--Ying


end of thread, other threads:[~2012-04-07  0:18 UTC | newest]
