* [RFC][PATCH] the proposal of improve page reclaim by throttle
From: KOSAKI Motohiro @ 2008-02-19 5:44 UTC
To: KAMEZAWA Hiroyuki, Balbir Singh, Rik van Riel, Lee Schermerhorn,
linux-mm, linux-kernel
Cc: kosaki.motohiro
background
========================================
The current VM implementation places no limit on the number of parallel
reclaimers. Under heavy workload, this causes two problems:
- heavy lock contention
- unnecessary swap-out
About two months ago, KAMEZAWA Hiroyuki proposed a page reclaim
throttle patch and explained that it improves reclaim time.
http://marc.info/?l=linux-mm&m=119667465917215&w=2
Unfortunately, it works only for memcgroup reclaim.
Today I reimplemented it to support global reclaim as well, and measured it.
test machine, method and result
==================================================
<test machine>
CPU: IA64 x8
MEM: 8GB
SWAP: 2GB
<test method>
hackbench was obtained from
http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c
$ /usr/bin/time hackbench 120 process 1000
With these parameters, the run consumes all physical memory and
1GB of swap space in my test environment.
<test result (average of 3 measurements)>
before:
  hackbench result:            282.30
  /usr/bin/time result
    user:                       14.16
    sys:                      1248.47
    elapse:                    432.93
    major fault:                29026
  max parallel reclaim tasks:    1298
  max consumption time of
    try_to_free_pages():        70394
after:
  hackbench result:             30.36
  /usr/bin/time result
    user:                       14.26
    sys:                       294.44
    elapse:                    118.01
    major fault:                 3064
  max parallel reclaim tasks:       4
  max consumption time of
    try_to_free_pages():        12234
conclusion
=========================================
This patch improves 4 things.
1. It reduces unnecessary swap.
(see major fault above: about 90% reduction)
2. It improves throughput.
(see hackbench result above: about 90% reduction)
3. It improves interactive performance.
(see max consumption time of try_to_free_pages() above:
about 80% reduction)
4. It reduces lock contention.
(see sys time above: about 80% reduction)
Overall, hackbench now runs about 9 times faster (282.30 -> 30.36) :)
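
The percentages in the conclusion follow directly from the before/after
figures in the test result. As a quick cross-check, here is a small C sketch
that recomputes them. The helper names (reduction_pct, speedup,
print_summary) are made up for this illustration and are not part of the
patch.

```c
#include <stdio.h>

/* Percentage by which `after` is lower than `before`. */
static double reduction_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Ratio of the two run times: how many times faster the "after" run is. */
static double speedup(double before, double after)
{
	return before / after;
}

/* Recompute the summary from the figures reported in the test result. */
static void print_summary(void)
{
	printf("major fault:         %.1f%% lower\n", reduction_pct(29026, 3064));
	printf("hackbench result:    %.1f%% lower\n", reduction_pct(282.30, 30.36));
	printf("try_to_free_pages(): %.1f%% lower\n", reduction_pct(70394, 12234));
	printf("sys time:            %.1f%% lower\n", reduction_pct(1248.47, 294.44));
	printf("hackbench speedup:   %.1fx\n", speedup(282.30, 30.36));
}
```

Calling print_summary() from a main() gives reductions of roughly 89%, 89%,
83% and 76%, and a hackbench run about 9.3 times faster.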
future work
==========================================================
- more discussion with the memory controller guys.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Balbir Singh <balbir@linux.vnet.ibm.com>
CC: Rik van Riel <riel@redhat.com>
CC: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
---
include/linux/nodemask.h | 1
mm/vmscan.c | 49 +++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 48 insertions(+), 2 deletions(-)
Index: b/include/linux/nodemask.h
===================================================================
--- a/include/linux/nodemask.h 2008-02-19 13:58:05.000000000 +0900
+++ b/include/linux/nodemask.h 2008-02-19 13:58:23.000000000 +0900
@@ -431,6 +431,7 @@ static inline int num_node_state(enum no
#define num_online_nodes() num_node_state(N_ONLINE)
#define num_possible_nodes() num_node_state(N_POSSIBLE)
+#define num_highmem_nodes() num_node_state(N_HIGH_MEMORY)
#define node_online(node) node_state((node), N_ONLINE)
#define node_possible(node) node_state((node), N_POSSIBLE)
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c 2008-02-19 13:58:05.000000000 +0900
+++ b/mm/vmscan.c 2008-02-19 14:04:06.000000000 +0900
@@ -127,6 +127,11 @@ long vm_total_pages; /* The total number
static LIST_HEAD(shrinker_list);
static DECLARE_RWSEM(shrinker_rwsem);
+static atomic_t nr_reclaimers = ATOMIC_INIT(0);
+static DECLARE_WAIT_QUEUE_HEAD(reclaim_throttle_waitq);
+#define RECLAIM_LIMIT (2 * num_highmem_nodes())
+
+
#ifdef CONFIG_CGROUP_MEM_CONT
#define scan_global_lru(sc) (!(sc)->mem_cgroup)
#else
@@ -1421,6 +1426,46 @@ out:
return ret;
}
+static unsigned long try_to_free_pages_throttled(struct zone **zones,
+ int order,
+ gfp_t gfp_mask,
+ struct scan_control *sc)
+{
+ unsigned long nr_reclaimed = 0;
+ unsigned long start_time;
+ int i;
+
+ start_time = jiffies;
+
+ wait_event(reclaim_throttle_waitq,
+ atomic_add_unless(&nr_reclaimers, 1, RECLAIM_LIMIT));
+
+ /* more reclaim until needed? */
+ if (unlikely(time_after(jiffies, start_time + HZ))) {
+ for (i = 0; zones[i] != NULL; i++) {
+ struct zone *zone = zones[i];
+ int classzone_idx = zone_idx(zones[0]);
+
+ if (!populated_zone(zone))
+ continue;
+
+ if (zone_watermark_ok(zone, order, 4*zone->pages_high,
+ classzone_idx, 0)) {
+ nr_reclaimed = 1;
+ goto out;
+ }
+ }
+ }
+
+ nr_reclaimed = do_try_to_free_pages(zones, gfp_mask, sc);
+
+out:
+ atomic_dec(&nr_reclaimers);
+ wake_up_all(&reclaim_throttle_waitq);
+
+ return nr_reclaimed;
+}
+
unsigned long try_to_free_pages(struct zone **zones, int order, gfp_t gfp_mask)
{
struct scan_control sc = {
@@ -1434,7 +1479,7 @@ unsigned long try_to_free_pages(struct z
.isolate_pages = isolate_pages_global,
};
- return do_try_to_free_pages(zones, gfp_mask, &sc);
+ return try_to_free_pages_throttled(zones, order, gfp_mask, &sc);
}
#ifdef CONFIG_CGROUP_MEM_CONT
@@ -1456,7 +1501,7 @@ unsigned long try_to_free_mem_cgroup_pag
int target_zone = gfp_zone(GFP_HIGHUSER_MOVABLE);
zones = NODE_DATA(numa_node_id())->node_zonelists[target_zone].zones;
- if (do_try_to_free_pages(zones, sc.gfp_mask, &sc))
+ if (try_to_free_pages_throttled(zones, 0, sc.gfp_mask, &sc))
return 1;
return 0;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
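
The admission check in the patch above hinges on atomic_add_unless(): the
counter is incremented only while it differs from RECLAIM_LIMIT, and
wait_event() sleeps until that increment succeeds. Below is a minimal
userspace sketch of just that counting logic, written with C11 atomics. The
names add_unless(), reclaim_try_enter() and reclaim_leave() are hypothetical,
and the sleeping and wakeup side (wait_event/wake_up_all) is deliberately
left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model of the kernel's atomic_add_unless():
 * add `a` to *v unless *v == u; return true if the add happened.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, `cur` is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/* Number of tasks currently inside reclaim (static storage starts at 0). */
static atomic_int nr_reclaimers;

/* Non-blocking admission: true if the caller may start reclaiming. */
static bool reclaim_try_enter(int limit)
{
	return add_unless(&nr_reclaimers, 1, limit);
}

/* Leave reclaim; the patch follows its atomic_dec() with wake_up_all(). */
static void reclaim_leave(void)
{
	atomic_fetch_sub(&nr_reclaimers, 1);
}
```

With a limit of 2, the first two reclaim_try_enter(2) calls succeed and the
third fails until someone calls reclaim_leave(). Note that the test is for
equality with the limit, not "less than": in this model, if the limit were
lowered below the current count, further callers would be admitted until the
count drained back to the limit.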

* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
From: Nick Piggin @ 2008-02-19 6:34 UTC
To: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki, Balbir Singh, Rik van Riel, Lee Schermerhorn,
linux-mm, linux-kernel

On Tuesday 19 February 2008 16:44, KOSAKI Motohiro wrote:
> background
> ========================================
> The current VM implementation places no limit on the number of parallel
> reclaimers. Under heavy workload, this causes two problems:
> - heavy lock contention
> - unnecessary swap-out

Hi,

Yeah this is definitely needed and a nice result.

I'm worried about a) placing a global limit on parallelism, and b)
placing a limit on parallelism at all.

I think it should maybe be a per-zone thing...

What happens if you make it a per-zone mutex, and allow just a single
process to reclaim pages from a given zone at a time? I guess that is
going to slow down throughput a little bit in some cases though...

* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
From: KOSAKI Motohiro @ 2008-02-19 7:09 UTC
To: Nick Piggin
Cc: kosaki.motohiro, KAMEZAWA Hiroyuki, Balbir Singh, Rik van Riel,
Lee Schermerhorn, linux-mm, linux-kernel

Hi Nick,

> Yeah this is definitely needed and a nice result.
>
> I'm worried about a) placing a global limit on parallelism, and b)
> placing a limit on parallelism at all.

Sorry, I don't understand yet. How are a) and b) related?

> I think it should maybe be a per-zone thing...
>
> What happens if you make it a per-zone mutex, and allow just a single
> process to reclaim pages from a given zone at a time? I guess that is
> going to slow down throughput a little bit in some cases though...

That makes sense.
OK, I'll repost in 2-3 days.

Thanks.

- kosaki

* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
From: Rik van Riel @ 2008-02-19 13:31 UTC
To: Nick Piggin
Cc: KOSAKI Motohiro, KAMEZAWA Hiroyuki, Balbir Singh, Lee Schermerhorn,
linux-mm, linux-kernel

On Tue, 19 Feb 2008 17:34:59 +1100
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> On Tuesday 19 February 2008 16:44, KOSAKI Motohiro wrote:
> > background
> > ========================================
> > The current VM implementation places no limit on the number of parallel
> > reclaimers. Under heavy workload, this causes two problems:
> > - heavy lock contention
> > - unnecessary swap-out

> I think it should maybe be a per-zone thing...
>
> What happens if you make it a per-zone mutex, and allow just a single
> process to reclaim pages from a given zone at a time? I guess that is
> going to slow down throughput a little bit in some cases though...

I agree, doing things per zone will probably work better, because
that way one process can do page reclaim on every NUMA node at the
same time.

--
All rights reversed.
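
The per-zone idea discussed above can be pictured as one "reclaim in
progress" flag per zone instead of a single global counter. The following is
only a userspace sketch with C11 atomics, not kernel code: NR_ZONES,
zone_reclaim_trylock() and zone_reclaim_unlock() are hypothetical names, and
the thread does not say what a task should do when its zone is busy (sleep,
or move on to the next zone), so only the try-lock step is shown.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_ZONES 4	/* arbitrary size for this example */

/* One "reclaim in progress" flag per zone; static storage starts as false. */
static atomic_bool zone_reclaiming[NR_ZONES];

/* Try to become the only reclaimer of `zone`; true on success. */
static bool zone_reclaim_trylock(int zone)
{
	/* atomic_exchange() returns the previous value: false means we won. */
	return !atomic_exchange(&zone_reclaiming[zone], true);
}

/* Let the next task reclaim from `zone`. */
static void zone_reclaim_unlock(int zone)
{
	atomic_store(&zone_reclaiming[zone], false);
}
```

In this model a second task asking for zone 0 is refused while the first one
holds it, but a task working on zone 1 is not affected, which is the
property described above: reclaim on different zones (and thus different
NUMA nodes) can proceed at the same time.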

* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
From: minchan Kim @ 2008-02-20 8:56 UTC
To: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki, Balbir Singh, Rik van Riel, Lee Schermerhorn,
linux-mm, linux-kernel

Hi, KOSAKI.

I am very interested in your patch, so I want to test it with exactly
the same method as you did.
I will test it in an embedded environment (ARM 920T, 32M RAM) and on my
desktop machine (Core2Duo 2.2G, 2G RAM).

I guess this patch won't help much in an embedded environment, since
many embedded boards have just one processor and no swap device.
What I want to know is whether this patch causes a regression on UP
systems with no swap device, such as embedded boards.

I don't think I can get these fields from top or freemem alone, because
top and freemem won't work well if the system has a large overhead from
page reclaim and swapping.
So, how do I evaluate the following fields as you did?

* elapse (what do you mean by it?)
* major fault
* max parallel reclaim tasks:
* max consumption time of
  try_to_free_pages():

If you have a patch for testing, please send it to me.

--
Thanks,
barrios

* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
From: KOSAKI Motohiro @ 2008-02-20 9:24 UTC
To: minchan Kim
Cc: kosaki.motohiro, KAMEZAWA Hiroyuki, Balbir Singh, Rik van Riel,
Lee Schermerhorn, linux-mm, linux-kernel

Hi Kim-san

Did you adjust the hackbench parameters?
My parameters are tuned for my test machine (8GB mem);
if left unchanged, the test may not work for lack of memory.

> I am very interested in your patch, so I want to test it with exactly
> the same method as you did.
> I will test it in an embedded environment (ARM 920T, 32M RAM) and on my
> desktop machine (Core2Duo 2.2G, 2G RAM).

Hm
I don't have an embedded test machine,
but I can test on a desktop.
I will test it around the weekend.
If you don't mind, could you please send me your .config file
and tell me your test kernel version?

Thanks, interesting report.

> I guess this patch won't help much in an embedded environment, since
> many embedded boards have just one processor and no swap device.

Reclaim conflicts rarely happen on UP,
so I expect no improvement from my patch there.
But (of course) I will fix any regression.

> So, how do I evaluate the following fields as you did?
>
> * elapse (what do you mean by it?)
> * major fault

The /usr/bin/time command outputs those.

> * max parallel reclaim tasks:
> * max consumption time of
>   try_to_free_pages():

Sorry, I had inserted debug code into my patch at that time.
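
For readers wondering where /usr/bin/time gets "elapse" and "major fault"
from: both can be obtained by timing a child process and reading the resource
usage the kernel returns when the child is reaped. Below is a minimal sketch
of that idea, assuming Linux with glibc (wait4() is a BSD-derived call, not
plain POSIX). The struct run_stats type and the run_and_measure() helper are
names invented for this example; this is not how /usr/bin/time is actually
written.

```c
#define _GNU_SOURCE
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

struct run_stats {
	double elapsed;	/* wall-clock seconds ("elapse") */
	double user;	/* user CPU seconds */
	double sys;	/* system CPU seconds */
	long majflt;	/* major page faults (those that required I/O) */
	int exit_code;	/* child's exit status, or -1 if it did not exit normally */
};

static double tv_to_sec(struct timeval tv)
{
	return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Run argv[0] with arguments argv, wait for it, fill *out. 0 on success. */
static int run_and_measure(char *const argv[], struct run_stats *out)
{
	struct timespec t0, t1;
	struct rusage ru;
	int status;
	pid_t pid;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* only reached if exec failed */
	}
	/* wait4() reaps the child and returns its resource usage. */
	if (wait4(pid, &status, 0, &ru) < 0)
		return -1;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	out->elapsed = (double)(t1.tv_sec - t0.tv_sec) +
		       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	out->user = tv_to_sec(ru.ru_utime);
	out->sys = tv_to_sec(ru.ru_stime);
	out->majflt = ru.ru_majflt;
	out->exit_code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
	return 0;
}
```

For the benchmark in this thread, argv would be something like
{ "./hackbench", "120", "process", "1000", NULL }. The two remaining fields
(max parallel reclaim tasks and max consumption time of try_to_free_pages())
are not available this way; they need instrumentation inside the kernel.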

* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
From: minchan Kim @ 2008-02-20 9:49 UTC
To: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki, Balbir Singh, Rik van Riel, Lee Schermerhorn,
linux-mm, linux-kernel

On Feb 20, 2008 6:24 PM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> Hi Kim-san
>
> Did you adjust the hackbench parameters?
> My parameters are tuned for my test machine (8GB mem);
> if left unchanged, the test may not work for lack of memory.

I already adjusted them. :-)
But on my desktop, I couldn't get the test to consume more than half of
my swap device (my swap device is 512M), because my kernel almost hung
before much swapping happened.
Perhaps it was not really a hang. However, although I waited a very long
time, my box gave no response.
I will keep trying.

> > I am very interested in your patch, so I want to test it with exactly
> > the same method as you did.
> > I will test it in an embedded environment (ARM 920T, 32M RAM) and on my
> > desktop machine (Core2Duo 2.2G, 2G RAM).
>
> Hm
> I don't have an embedded test machine,
> but I can test on a desktop.
> I will test it around the weekend.
> If you don't mind, could you please send me your .config file
> and tell me your test kernel version?

I mean I will test your patch by myself, because I already have an
embedded board and a desktop.

> Thanks, interesting report.
>
> > I guess this patch won't help much in an embedded environment, since
> > many embedded boards have just one processor and no swap device.
>
> Reclaim conflicts rarely happen on UP,
> so I expect no improvement from my patch there.

I agree with you.

> But (of course) I will fix any regression.

I didn't say your patch had a regression. What I mean is just that I am
concerned about it.
Actually, many VM guys are working on server environments. They didn't
try to do performance tests on embedded systems, and those patches were
submitted to mainline.
That is what I am concerned about.

> > So, how do I evaluate the following fields as you did?
> >
> > * elapse (what do you mean by it?)
> > * major fault
>
> The /usr/bin/time command outputs those.
>
> > * max parallel reclaim tasks:
> > * max consumption time of
> >   try_to_free_pages():
>
> Sorry, I had inserted debug code into my patch at that time.

Could you send me that debug code?
If you send it to me, I will test it in my environments (ARM-920T,
Core2Duo) and report the test results.

--
Thanks,
barrios

* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
From: KOSAKI Motohiro @ 2008-02-20 10:09 UTC
To: minchan Kim
Cc: kosaki.motohiro, KAMEZAWA Hiroyuki, Balbir Singh, Rik van Riel,
Lee Schermerhorn, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 959 bytes --]

Hi

> > > * max parallel reclaim tasks:
> > > * max consumption time of
> > >   try_to_free_pages():
> >
> > Sorry, I had inserted debug code into my patch at that time.
>
> Could you send me that debug code?
> If you send it to me, I will test it in my environments (ARM-920T,
> Core2Duo) and report the test results.

Attached. But it is very messy ;-)

usage:
./benchloop.sh

sample output
=========================================================
max reclaim 2
Running with 120*40 (== 4800) tasks.
Time: 34.177
14.17user 284.38system 1:43.85elapsed 287%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (3813major+148922minor)pagefaults 0swaps
max prepare time: 4599 0
max reclaim time: 2350 5781
total
8271
max reclaimer
4
max overkill
62131
max saved overkill
9740

"max reclaimer" corresponds to max parallel reclaim tasks.
"total" corresponds to max consumption time of try_to_free_pages().

Thanks

[-- Attachment #2: reclaim-throttle-3.patch --]
[-- Type: application/octet-stream, Size: 7106 bytes --]

---
 include/linux/mmzone.h   |    1
 include/linux/nodemask.h |    2
 kernel/sysctl.c          |   78 ++++++++++++++++++++++++++++++++++++++
 mm/vmscan.c              |   95 ++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 173 insertions(+), 3 deletions(-)

Index: b/kernel/sysctl.c
===================================================================
--- a/kernel/sysctl.c 2008-02-15 20:14:40.000000000 +0900
+++ b/kernel/sysctl.c 2008-02-16 16:45:58.000000000 +0900
@@ -187,6 +187,18 @@ int sysctl_legacy_va_layout;
 extern int prove_locking;
 extern int lock_stat;
+extern int max_reclaimer;
+extern unsigned long max_reclaim_time;
+extern unsigned long max_reclaim_prepare_time;
+extern int reclaim_limit;
+extern unsigned long max_overkill_reclaim;
+
+extern unsigned long max_reclaim_time_aux;
+extern unsigned long max_reclaim_prepare_time_aux;
+extern unsigned long max_total_time;
+
+
+
 /* The default sysctl tables: */
 static struct ctl_table root_table[] = {
@@ -1155,6 +1167,72 @@ static struct ctl_table vm_table[] = {
 		.extra2 = &one,
 	},
 #endif
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "max_reclaimer",
+		.data = &max_reclaimer,
+		.maxlen = sizeof(max_reclaimer),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+		.strategy = &sysctl_intvec,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "max_reclaim_time",
+		.data = &max_reclaim_time,
+		.maxlen = sizeof(max_reclaim_time),
+		.mode = 0644,
+		.proc_handler = &proc_doulongvec_minmax,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "max_reclaim_prepare_time",
+		.data = &max_reclaim_prepare_time,
+		.maxlen = sizeof(max_reclaim_prepare_time),
+		.mode = 0644,
+		.proc_handler = &proc_doulongvec_minmax,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "reclaim_limit",
+		.data = &reclaim_limit,
+		.maxlen = sizeof(reclaim_limit),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "max_overkill_reclaim",
+		.data = &max_overkill_reclaim,
+		.maxlen = sizeof(max_overkill_reclaim),
+		.mode = 0644,
+		.proc_handler = &proc_doulongvec_minmax,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "max_reclaim_time_aux",
+		.data = &max_reclaim_time_aux,
+		.maxlen = sizeof(max_reclaim_time_aux),
+		.mode = 0644,
+		.proc_handler = &proc_doulongvec_minmax,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "max_reclaim_prepare_time_aux",
+		.data = &max_reclaim_prepare_time_aux,
+		.maxlen = sizeof(max_reclaim_prepare_time_aux),
+		.mode = 0644,
+		.proc_handler = &proc_doulongvec_minmax,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "max_total_time",
+		.data = &max_total_time,
+		.maxlen = sizeof(max_total_time),
+		.mode = 0644,
+		.proc_handler = &proc_doulongvec_minmax,
+	},
+
 /*
  * NOTE: do not add new entries to this table unless you have read
  * Documentation/sysctl/ctl_unnumbered.txt
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c 2008-02-15 20:14:40.000000000 +0900
+++ b/mm/vmscan.c 2008-02-17 11:50:50.000000000 +0900
@@ -1421,6 +1421,23 @@ out:
 	return ret;
 }
+static DEFINE_SPINLOCK(research_reclaim_max_lock);
+static atomic_t nr_reclaimers = ATOMIC_INIT(0);
+static DECLARE_WAIT_QUEUE_HEAD(reclaim_throttle_waitq);
+
+// limit
+int reclaim_limit = 2;
+#define RECLAIM_LIMIT (reclaim_limit * num_highmem_nodes())
+
+// record
+int max_reclaimer = 0;
+unsigned long max_reclaim_time = 0;
+unsigned long max_reclaim_time_aux = 0;
+unsigned long max_reclaim_prepare_time = 0;
+unsigned long max_reclaim_prepare_time_aux = 0;
+unsigned long max_overkill_reclaim = 0;
+unsigned long max_total_time = 0;
+
 unsigned long try_to_free_pages(struct zone **zones, int order, gfp_t gfp_mask)
 {
 	struct scan_control sc = {
@@ -1433,8 +1450,82 @@ unsigned long try_to_free_pages(struct z
 		.mem_cgroup = NULL,
 		.isolate_pages = isolate_pages_global,
 	};
-
-	return do_try_to_free_pages(zones, gfp_mask, &sc);
+	unsigned long nr_reclaimed;
+	u64 start_time;
+	u64 prepared_time;
+	u64 end_time;
+	u64 preparing_time;
+	u64 reclaiming_time;
+	unsigned long free_mem;
+	int record_max_prepare_time = 0;
+	unsigned long total_time;
+
+	start_time = jiffies_64;
+
+	if (unlikely(!atomic_add_unless(&nr_reclaimers, 1, RECLAIM_LIMIT)))
+		wait_event(reclaim_throttle_waitq,
+			   atomic_add_unless(&nr_reclaimers, 1, RECLAIM_LIMIT));
+
+	spin_lock(&research_reclaim_max_lock);
+	if (atomic_read(&nr_reclaimers) > max_reclaimer)
+		max_reclaimer = atomic_read(&nr_reclaimers);
+
+	prepared_time = jiffies_64;
+	preparing_time = prepared_time - start_time;
+	if (preparing_time > max_reclaim_time) {
+		record_max_prepare_time = 1;
+		max_reclaim_prepare_time = preparing_time;
+	}
+	spin_unlock(&research_reclaim_max_lock);
+
+	/* more reclaim until needed? */
+	if (preparing_time > HZ) {
+		int i;
+
+		for (i = 0; zones[i] != NULL; i++) {
+			struct zone *zone = zones[i];
+			int classzone_idx = zone_idx(zones[0]);
+
+			if (!populated_zone(zone))
+				continue;
+
+			if (zone_watermark_ok(zone, order, 4*zone->pages_high,
+					      classzone_idx, 0)) {
+				nr_reclaimed = 1;
+				goto out;
+			}
+		}
+	}
+
+	nr_reclaimed = do_try_to_free_pages(zones, gfp_mask, &sc);
+
+	spin_lock(&research_reclaim_max_lock);
+	end_time = jiffies_64;
+	reclaiming_time = end_time - prepared_time;
+
+	if (record_max_prepare_time)
+		max_reclaim_prepare_time_aux = reclaiming_time;
+
+	if (reclaiming_time > max_reclaim_time) {
+		max_reclaim_time_aux = preparing_time;
+		max_reclaim_time = reclaiming_time;
+	}
+
+	total_time = preparing_time + reclaiming_time;
+	if( total_time > max_total_time ){
+		max_total_time = total_time;
+	}
+
+	free_mem = global_page_state(NR_FREE_PAGES);
+	if (free_mem > max_overkill_reclaim)
+		max_overkill_reclaim = free_mem;
+	spin_unlock(&research_reclaim_max_lock);
+
+out:
+	atomic_dec(&nr_reclaimers);
+	wake_up_all(&reclaim_throttle_waitq);
+
+	return nr_reclaimed;
 }
 #ifdef CONFIG_CGROUP_MEM_CONT
Index: b/include/linux/mmzone.h
===================================================================
--- a/include/linux/mmzone.h 2008-02-15 20:14:40.000000000 +0900
+++ b/include/linux/mmzone.h 2008-02-15 20:14:49.000000000 +0900
@@ -334,7 +334,6 @@ struct zone {
 	 */
 	unsigned long spanned_pages; /* total size, including holes */
 	unsigned long present_pages; /* amount of memory (excluding holes) */
-
 	/*
 	 * rarely used fields:
 	 */
Index: b/include/linux/nodemask.h
===================================================================
--- a/include/linux/nodemask.h 2008-02-15 20:14:40.000000000 +0900
+++ b/include/linux/nodemask.h 2008-02-15 20:14:49.000000000 +0900
@@ -431,6 +431,8 @@ static inline int num_node_state(enum no
 #define num_online_nodes() num_node_state(N_ONLINE)
 #define num_possible_nodes() num_node_state(N_POSSIBLE)
+#define num_highmem_nodes() num_node_state(N_HIGH_MEMORY)
+
 #define node_online(node) node_state((node), N_ONLINE)
 #define node_possible(node) node_state((node), N_POSSIBLE)

[-- Attachment #3: benchloop.sh --]
[-- Type: application/octet-stream, Size: 1515 bytes --]

#!/bin/sh

for i in 2 10000; do
	for j in 1 2 3; do
		sudo sh -c "echo $i > /proc/sys/vm/reclaim_limit"
		sudo sh -c "echo 0 > /proc/sys/vm/max_reclaim_prepare_time"
		sudo sh -c "echo 0 > /proc/sys/vm/max_reclaim_time"
		sudo sh -c "echo 0 > /proc/sys/vm/max_reclaim_prepare_time_aux"
		sudo sh -c "echo 0 > /proc/sys/vm/max_reclaim_time_aux"
		sudo sh -c "echo 0 > /proc/sys/vm/max_total_time"
		sudo sh -c "echo 0 > /proc/sys/vm/max_reclaimer"
		sudo sh -c "echo 0 > /proc/sys/vm/max_overkill_reclaim"
		sudo sh -c "echo 0 > /proc/sys/vm/max_saved_overkill_reclaim"
		sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"

		echo "max reclaim $i"
		/usr/bin/time ./hackbench 120 process 1000 2>&1 | uniq

		prepare_time=`cat /proc/sys/vm/max_reclaim_prepare_time`
		prepare_time_aux=`cat /proc/sys/vm/max_reclaim_prepare_time_aux`
		echo "max prepare time: $prepare_time $prepare_time_aux"

		reclaim_time_aux=`cat /proc/sys/vm/max_reclaim_time_aux`
		reclaim_time=`cat /proc/sys/vm/max_reclaim_time`
		echo "max reclaim time: $reclaim_time_aux $reclaim_time"

		echo "total"
		cat /proc/sys/vm/max_total_time
		echo "max reclaimer"
		cat /proc/sys/vm/max_reclaimer
		echo "max overkill"
		cat /proc/sys/vm/max_overkill_reclaim
		echo "max saved overkill"
		cat /proc/sys/vm/max_saved_overkill_reclaim

		sudo sh -c "echo 1000 > /proc/sys/vm/reclaim_limit"
	done
done
* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
2008-02-20 10:09 ` KOSAKI Motohiro
@ 2008-02-21 9:38 ` minchan Kim
2008-02-21 10:55 ` KOSAKI Motohiro
0 siblings, 1 reply; 15+ messages in thread
From: minchan Kim @ 2008-02-21 9:38 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki, Balbir Singh, Rik van Riel, Lee Schermerhorn,
linux-mm, linux-kernel

I missed the CCs, so I am resending this.

First of all, I tested it on an embedded board.

---
<test machine>
CPU: 200MHz (ARM926EJ-S)
MEM: 32M
SWAP: none
KERNEL: 2.6.25-rc1

<test 1> - NO SWAP

before:
Running with 5*40 (== 200) tasks.
Time: 12.591
Command being timed: "./hackbench.arm 5 process 100"
User time (seconds): 0.78
System time (seconds): 13.39
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 14.22s
Major (requiring I/O) page faults: 20
max parallel reclaim tasks: 30
max consumption time of
try_to_free_pages(): 789

after:
Running with 5*40 (== 200) tasks.
Time: 11.535
Command being timed: "./hackbench.arm 5 process 100"
User time (seconds): 0.69
System time (seconds): 12.42
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 13.16s
Major (requiring I/O) page faults: 18
max parallel reclaim tasks: 4
max consumption time of
try_to_free_pages(): 740

<test 2> - SWAP

before:
Running with 6*40 (== 240) tasks.
Time: 121.686
Command being timed: "./hackbench.arm 6 process 100"
User time (seconds): 1.89
System time (seconds): 44.95
Percent of CPU this job got: 37%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2m 3.79s
Major (requiring I/O) page faults: 230
max parallel reclaim tasks: 56
max consumption time of
try_to_free_pages(): 10811

after:
Running with 6*40 (== 240) tasks.
Time: 67.757
Command being timed: "./hackbench.arm 6 process 100"
User time (seconds): 1.56
System time (seconds): 35.41
Percent of CPU this job got: 52%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1m 9.87s
Major (requiring I/O) page faults: 16
max parallel reclaim tasks: 4
max consumption time of
try_to_free_pages(): 6419

<test 3> - NO SWAP

before:
'OOM killer kill hackbench!!!'

after:
Time: 16.578
Command being timed: "./hackbench.arm 6 process 100"
User time (seconds): 0.71
System time (seconds): 17.92
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 18.69s
Major (requiring I/O) page faults: 22
max parallel reclaim tasks: 4
max consumption time of
try_to_free_pages(): 1785

===============================

It was a very interesting result.
On an embedded system, your patch improves performance a little in the
no-swap case (the normal case for embedded systems).
More importantly, with the vanilla kernel and no swap device, the OOM
killer fired when I ran 240 processes.
With your patch applied, the same workload ran very well without OOM.

I think the reason is that the zone's pages_scanned became six times
greater than the number of LRU pages.
As a result, OOM happened.

So I think your patch also improves performance on embedded systems.
In the case where OOM did not occur, reclaim performance without a swap
device was better than with one.

Now I think we need to improve the reclaim procedure for embedded
systems (UP and no swap).

On Wed, Feb 20, 2008 at 7:09 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> Hi
>
>
> > > > * max parallel reclaim tasks:
> > > > * max consumption time of
> > > > try_to_free_pages():
> > >
> > > sorry, I inserted debug code to my patch at that time.
> >
> > Could you send me that debug code ?
> > If you will send it to me, I will test it my environment (ARM-920T, Core2Duo).
> > And I will report test result.
>
> attached it.
> but it is very messy ;-)
>
> usage:
> ./benchloop.sh
>
> sample output
> =========================================================
> max reclaim 2
> Running with 120*40 (== 4800) tasks.
> Time: 34.177
> 14.17user 284.38system 1:43.85elapsed 287%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (3813major+148922minor)pagefaults 0swaps
> max prepare time: 4599 0
> max reclaim time: 2350 5781
> total
> 8271
> max reclaimer
> 4
> max overkill
> 62131
> max saved overkill
> 9740
>
>
> max reclaimer represent to max parallel reclaim tasks.
> total represetnto max consumption time of try_to_free_pages().
>
> Thanks
>
>

--
Thanks,
barrios

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
2008-02-21 9:38 ` minchan Kim
@ 2008-02-21 10:55 ` KOSAKI Motohiro
2008-02-21 12:29 ` minchan Kim
0 siblings, 1 reply; 15+ messages in thread
From: KOSAKI Motohiro @ 2008-02-21 10:55 UTC (permalink / raw)
To: minchan Kim
Cc: KAMEZAWA Hiroyuki, Balbir Singh, Rik van Riel, Lee Schermerhorn,
linux-mm, linux-kernel

Hi Kim-san,

Thank you very much.
BTW, what is the difference between <test 1> and <test 2>?

> It was a very interesting result.
> In embedded system, your patch improve performance a little in case
> without noswap(normal case in embedded system).
> But, more important thing is OOM occured when I made 240 process
> without swap device and vanilla kernel.
> Then, I applied your patch, it worked very well without OOM.

Wow, that is a very interesting result!
I am very happy.

> I think that's why zone's page_scanned was six times greater than
> number of lru pages.
> At result, OOM happened.

Please repost this question with a changed subject.
I don't know the reason for the vanilla kernel's behavior, sorry.

* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
2008-02-21 10:55 ` KOSAKI Motohiro
@ 2008-02-21 12:29 ` minchan Kim
2008-02-21 12:41 ` KOSAKI Motohiro
0 siblings, 1 reply; 15+ messages in thread
From: minchan Kim @ 2008-02-21 12:29 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki, Balbir Singh, Rik van Riel, Lee Schermerhorn,
linux-mm, linux-kernel

On Thu, Feb 21, 2008 at 7:55 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> Hi Kim-san,
>
> Thank you very much.
> btw, what different between <test 1> and <test 2>?

<test 1> has no swap device and runs 200 hackbench tasks.
<test 2> has a swap device (32M) and runs 240 hackbench tasks.
If <test 2> is run with no swap device and without your patch, it is
killed by the OOM killer.

<test 1> - NO SWAP

Running with 5*40 (== 200) tasks.
...

<test 2> - SWAP

Running with 6*40 (== 240) tasks.
...

>
> > It was a very interesting result.
> > In embedded system, your patch improve performance a little in case
> > without noswap(normal case in embedded system).
> > But, more important thing is OOM occured when I made 240 process
> > without swap device and vanilla kernel.
> > Then, I applied your patch, it worked very well without OOM.
>
> Wow, it is very interesting result!
> I am very happy.
>
>
> > I think that's why zone's page_scanned was six times greater than
> > number of lru pages.
> > At result, OOM happened.
>
> please repost question with change subject.
> i don't know reason of vanilla kernel behavior, sorry.

Normally, embedded Linux has only one zone (DMA).

Without your patch, several processes can reclaim memory in parallel,
and the DMA zone's pages_scanned then increases sharply. Because
embedded Linux has no swap device, the kernel cannot stop scanning the
LRU list until it meets a page cache page. So once zone->pages_scanned
is more than six times the number of LRU pages, the kernel marks the
zone as unreclaimable. As a result, the OOM killer kills the task, too.
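
The arithmetic behind this hypothesis can be sketched in userspace C.
This is only a model under stated assumptions, not kernel code:
struct zone_model, zone_looks_unreclaimable() and
simulate_parallel_reclaim() are hypothetical names, each reclaimer is
assumed to scan the whole LRU once without freeing anything, and the
factor 6 is the threshold mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few zone fields discussed in this mail. */
struct zone_model {
	unsigned long lru_pages;	/* pages on the LRU lists */
	unsigned long pages_scanned;	/* pages scanned without any page being freed */
	bool all_unreclaimable;		/* set once the "six times" rule triggers */
};

/* The rule described above: scanned >= 6 * LRU size means "give up". */
bool zone_looks_unreclaimable(const struct zone_model *z)
{
	return z->pages_scanned >= z->lru_pages * 6;
}

/*
 * Assumption: with no swap device, each concurrent reclaimer walks the
 * whole LRU once and frees nothing, so every reclaimer adds lru_pages
 * to pages_scanned.
 */
void simulate_parallel_reclaim(struct zone_model *z, unsigned int nr_reclaimers)
{
	unsigned int i;

	assert(z != NULL && z->lru_pages > 0);

	z->pages_scanned = 0;
	z->all_unreclaimable = false;

	for (i = 0; i < nr_reclaimers; i++) {
		z->pages_scanned += z->lru_pages;
		if (zone_looks_unreclaimable(z))
			z->all_unreclaimable = true;
	}
}
```

Under these assumptions, 56 reclaimers (the pre-patch maximum seen in
the SWAP test) on a zone with 8192 LRU pages reach 56 * 8192 scanned
pages and the model flags the zone, while 4 reclaimers (the throttled
maximum) stay at 4 * 8192 and do not. As far as I know the real kernel
resets pages_scanned whenever pages are freed, so this is a worst-case
illustration only.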

--
Thanks,
barrios

* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
2008-02-21 12:29 ` minchan Kim
@ 2008-02-21 12:41 ` KOSAKI Motohiro
0 siblings, 0 replies; 15+ messages in thread
From: KOSAKI Motohiro @ 2008-02-21 12:41 UTC (permalink / raw)
To: minchan Kim
Cc: KAMEZAWA Hiroyuki, Balbir Singh, Rik van Riel, Lee Schermerhorn,
linux-mm, linux-kernel

> > please repost question with change subject.
> > i don't know reason of vanilla kernel behavior, sorry.
>
> Normally, embedded linux have only one zone(DMA).
>
> If your patch isn't applied, several processes can reclaim memory in parallel.
> then, DMA zone's pages_scanned is suddenly increased largely. Because
> embedded linux have no swap device, kernel can't stop to scan lru
> list until meeting page cache page. so if zone->pages_scanned is
> greater six time than lru list pages, kernel make the zone with
> unreclaimable state, As a result, OOM will kill it, too.

Sorry, my last mail was confusing.

If you want to discuss a vanilla kernel bug, you should post it in a
separate thread. Otherwise, your mail will be read by only a few people.

Thanks.

* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
2008-02-19 5:44 [RFC][PATCH] the proposal of improve page reclaim by throttle KOSAKI Motohiro
2008-02-19 6:34 ` Nick Piggin
2008-02-20 8:56 ` minchan Kim
@ 2008-02-21 9:48 ` Balbir Singh
2008-02-21 11:01 ` KOSAKI Motohiro
2 siblings, 1 reply; 15+ messages in thread
From: Balbir Singh @ 2008-02-21 9:48 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki, Rik van Riel, Lee Schermerhorn, linux-mm,
linux-kernel

KOSAKI Motohiro wrote:
> background
> ========================================
> current VM implementation doesn't has limit of # of parallel reclaim.
> when heavy workload, it bring to 2 bad things
> - heavy lock contention
> - unnecessary swap out
>
> abount 2 month ago, KAMEZA Hiroyuki proposed the patch of page
> reclaim throttle and explain it improve reclaim time.
> http://marc.info/?l=linux-mm&m=119667465917215&w=2
>
> but unfortunately it works only memcgroup reclaim.
> Today, I implement it again for support global reclaim and mesure it.
>

Hi, Kosaki,

It's good to keep the main reclaim code and the memory controller reclaim in
sync, so this is a nice effort.

> @@ -1456,7 +1501,7 @@ unsigned long try_to_free_mem_cgroup_pag
>  	int target_zone = gfp_zone(GFP_HIGHUSER_MOVABLE);
>
>  	zones = NODE_DATA(numa_node_id())->node_zonelists[target_zone].zones;
> -	if (do_try_to_free_pages(zones, sc.gfp_mask, &sc))
> +	if (try_to_free_pages_throttled(zones, 0, sc.gfp_mask, &sc))
>  		return 1;
>  	return 0;
>  }
>

try_to_free_pages_throttled() checks zone_watermark_ok(), which will not work
when we are reclaiming from a cgroup that is over its limit. We need a
different check, to see whether the mem_cgroup is still over its limit or not.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
2008-02-21 9:48 ` Balbir Singh
@ 2008-02-21 11:01 ` KOSAKI Motohiro
2008-02-21 11:02 ` Balbir Singh
0 siblings, 1 reply; 15+ messages in thread
From: KOSAKI Motohiro @ 2008-02-21 11:01 UTC (permalink / raw)
To: balbir
Cc: KAMEZAWA Hiroyuki, Rik van Riel, Lee Schermerhorn, linux-mm,
linux-kernel

Hi balbir-san

> It's good to keep the main reclaim code and the memory controller reclaim in
> sync, so this is a nice effort.

Thank you.
I will repost the next version (addressing Nick's comments) within a few days.

> > @@ -1456,7 +1501,7 @@ unsigned long try_to_free_mem_cgroup_pag
> >  	int target_zone = gfp_zone(GFP_HIGHUSER_MOVABLE);
> >
> >  	zones = NODE_DATA(numa_node_id())->node_zonelists[target_zone].zones;
> > -	if (do_try_to_free_pages(zones, sc.gfp_mask, &sc))
> > +	if (try_to_free_pages_throttled(zones, 0, sc.gfp_mask, &sc))
> >  		return 1;
> >  	return 0;
> >  }
>
> try_to_free_pages_throttled checks for zone_watermark_ok(), that will not work
> in the case that we are reclaiming from a cgroup which over it's limit. We need
> a different check, to see if the mem_cgroup is still over it's limit or not.

That makes sense.

Unfortunately, I don't know mem-cgroup very well.
Which function should I use instead?

* Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
2008-02-21 11:01 ` KOSAKI Motohiro
@ 2008-02-21 11:02 ` Balbir Singh
0 siblings, 0 replies; 15+ messages in thread
From: Balbir Singh @ 2008-02-21 11:02 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki, Rik van Riel, Lee Schermerhorn, linux-mm,
linux-kernel

KOSAKI Motohiro wrote:
> Hi balbir-san
>
>> It's good to keep the main reclaim code and the memory controller reclaim in
>> sync, so this is a nice effort.
>
> thank you.
> I will repost next version (fixed nick's opinion) while a few days.
>
>> > @@ -1456,7 +1501,7 @@ unsigned long try_to_free_mem_cgroup_pag
>> >  	int target_zone = gfp_zone(GFP_HIGHUSER_MOVABLE);
>> >
>> >  	zones = NODE_DATA(numa_node_id())->node_zonelists[target_zone].zones;
>> > -	if (do_try_to_free_pages(zones, sc.gfp_mask, &sc))
>> > +	if (try_to_free_pages_throttled(zones, 0, sc.gfp_mask, &sc))
>> >  		return 1;
>> >  	return 0;
>> >  }
>>
>> try_to_free_pages_throttled checks for zone_watermark_ok(), that will not work
>> in the case that we are reclaiming from a cgroup which over it's limit. We need
>> a different check, to see if the mem_cgroup is still over it's limit or not.
>
> That makes sense.
>
> unfortunately, I don't know mem-cgroup so much.
> What do i use function, instead?

One option could be that once the memory controller has this feature, we'll need
no changes in try_to_free_mem_cgroup_pages.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

end of thread, other threads:[~2008-02-21 12:41 UTC | newest]

Thread overview: 15+ messages
2008-02-19 5:44 [RFC][PATCH] the proposal of improve page reclaim by throttle KOSAKI Motohiro
2008-02-19 6:34 ` Nick Piggin
2008-02-19 7:09 ` KOSAKI Motohiro
2008-02-19 13:31 ` Rik van Riel
2008-02-20 8:56 ` minchan Kim
2008-02-20 9:24 ` KOSAKI Motohiro
2008-02-20 9:49 ` minchan Kim
2008-02-20 10:09 ` KOSAKI Motohiro
2008-02-21 9:38 ` minchan Kim
2008-02-21 10:55 ` KOSAKI Motohiro
2008-02-21 12:29 ` minchan Kim
2008-02-21 12:41 ` KOSAKI Motohiro
2008-02-21 9:48 ` Balbir Singh
2008-02-21 11:01 ` KOSAKI Motohiro
2008-02-21 11:02 ` Balbir Singh