* [PATCH 1/6] mm: kswapd: Stop high-order balancing when any suitable zone is balanced
2010-12-10 15:46 [PATCH 0/6] Prevent kswapd dumping excessive amounts of memory in response to high-order allocations V4 Mel Gorman
@ 2010-12-10 15:46 ` Mel Gorman
2010-12-13 19:34 ` Eric B Munson
2010-12-14 22:33 ` Andrew Morton
2010-12-10 15:46 ` [PATCH 2/6] mm: kswapd: Keep kswapd awake for high-order allocations until a percentage of the node " Mel Gorman
` (4 subsequent siblings)
5 siblings, 2 replies; 24+ messages in thread
From: Mel Gorman @ 2010-12-10 15:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Simon Kirby, KOSAKI Motohiro, Shaohua Li, Dave Hansen,
Johannes Weiner, linux-mm, linux-kernel, Mel Gorman
When the allocator enters its slow path, kswapd is woken up to balance the
node. It continues working until all zones within the node are balanced. For
order-0 allocations, this makes perfect sense but for higher orders it can
have unintended side-effects. If the zone sizes are imbalanced, kswapd may
reclaim heavily within a smaller zone discarding an excessive number of
pages. The user-visible behaviour is that kswapd is awake and reclaiming
even though plenty of pages are free from a suitable zone.
This patch alters the "balance" logic for high-order reclaim allowing kswapd
to stop if any suitable zone becomes balanced to reduce the number of pages
it reclaims from other zones. kswapd still tries to ensure that order-0
watermarks for all zones are met before sleeping.
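The modified stop condition reduces to the following sketch (a hypothetical helper for illustration only, with per-zone watermark results collapsed into boolean flags; the real check in balance_pgdat() uses zone_watermark_ok()):

```c
#include <stdbool.h>

/*
 * Illustrative reduction of the new break test in balance_pgdat():
 * zone_ok[i] stands in for "zone i meets its high watermark for the
 * requested order".
 */
static bool node_balanced(const bool *zone_ok, int nr_zones,
			  int classzone_idx, int order)
{
	bool all_zones_ok = true;
	bool any_zone_ok = false;
	int i;

	for (i = 0; i < nr_zones; i++) {
		if (!zone_ok[i])
			all_zones_ok = false;
		else if (i <= classzone_idx)
			any_zone_ok = true;
	}

	/* order-0 needs every zone; high-order needs one suitable zone */
	return all_zones_ok || (order && any_zone_ok);
}
```

Note that a balanced zone above classzone_idx does not count: kswapd was woken for an allocation that cannot use those zones.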
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/mmzone.h | 3 +-
mm/page_alloc.c | 8 +++--
mm/vmscan.c | 68 +++++++++++++++++++++++++++++++++++++++++------
3 files changed, 66 insertions(+), 13 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4890662..dad3612 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -639,6 +639,7 @@ typedef struct pglist_data {
wait_queue_head_t kswapd_wait;
struct task_struct *kswapd;
int kswapd_max_order;
+ enum zone_type classzone_idx;
} pg_data_t;
#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
@@ -654,7 +655,7 @@ typedef struct pglist_data {
extern struct mutex zonelists_mutex;
void build_all_zonelists(void *data);
-void wakeup_kswapd(struct zone *zone, int order);
+void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx);
bool zone_watermark_ok(struct zone *z, int order, unsigned long mark,
int classzone_idx, int alloc_flags);
bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1845a97..1497fe8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1936,13 +1936,14 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
static inline
void wake_all_kswapd(unsigned int order, struct zonelist *zonelist,
- enum zone_type high_zoneidx)
+ enum zone_type high_zoneidx,
+ enum zone_type classzone_idx)
{
struct zoneref *z;
struct zone *zone;
for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
- wakeup_kswapd(zone, order);
+ wakeup_kswapd(zone, order, classzone_idx);
}
static inline int
@@ -2020,7 +2021,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
goto nopage;
restart:
- wake_all_kswapd(order, zonelist, high_zoneidx);
+ wake_all_kswapd(order, zonelist, high_zoneidx,
+ zone_idx(preferred_zone));
/*
* OK, we're below the kswapd watermark and have kicked background
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 42a4859..625dfba 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2239,11 +2239,14 @@ static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
* interoperates with the page allocator fallback scheme to ensure that aging
* of pages is balanced across the zones.
*/
-static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
+static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
+ int classzone_idx)
{
int all_zones_ok;
+ int any_zone_ok;
int priority;
int i;
+ int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */
unsigned long total_scanned;
struct reclaim_state *reclaim_state = current->reclaim_state;
struct scan_control sc = {
@@ -2266,7 +2269,6 @@ loop_again:
count_vm_event(PAGEOUTRUN);
for (priority = DEF_PRIORITY; priority >= 0; priority--) {
- int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */
unsigned long lru_pages = 0;
int has_under_min_watermark_zone = 0;
@@ -2275,6 +2277,7 @@ loop_again:
disable_swap_token();
all_zones_ok = 1;
+ any_zone_ok = 0;
/*
* Scan in the highmem->dma direction for the highest
@@ -2393,10 +2396,12 @@ loop_again:
* speculatively avoid congestion waits
*/
zone_clear_flag(zone, ZONE_CONGESTED);
+ if (i <= classzone_idx)
+ any_zone_ok = 1;
}
}
- if (all_zones_ok)
+ if (all_zones_ok || (order && any_zone_ok))
break; /* kswapd: all done */
/*
* OK, kswapd is getting into trouble. Take a nap, then take
@@ -2419,7 +2424,13 @@ loop_again:
break;
}
out:
- if (!all_zones_ok) {
+
+ /*
+ * order-0: All zones must meet high watermark for a balanced node
+ * high-order: Any zone below pgdat's classzone_idx must meet the high
+ * watermark for a balanced node
+ */
+ if (!(all_zones_ok || (order && any_zone_ok))) {
cond_resched();
try_to_freeze();
@@ -2444,6 +2455,36 @@ out:
goto loop_again;
}
+ /*
+ * If kswapd was reclaiming at a higher order, it has the option of
+ * sleeping without all zones being balanced. Before it does, it must
+ * ensure that the watermarks for order-0 on *all* zones are met and
+ * that the congestion flags are cleared. The congestion flag must
+ * be cleared as kswapd is the only mechanism that clears the flag
+ * and it is potentially going to sleep here.
+ */
+ if (order) {
+ for (i = 0; i <= end_zone; i++) {
+ struct zone *zone = pgdat->node_zones + i;
+
+ if (!populated_zone(zone))
+ continue;
+
+ if (zone->all_unreclaimable && priority != DEF_PRIORITY)
+ continue;
+
+ /* Confirm the zone is balanced for order-0 */
+ if (!zone_watermark_ok(zone, 0,
+ high_wmark_pages(zone), 0, 0)) {
+ order = sc.order = 0;
+ goto loop_again;
+ }
+
+ /* If balanced, clear the congested flag */
+ zone_clear_flag(zone, ZONE_CONGESTED);
+ }
+ }
+
return sc.nr_reclaimed;
}
@@ -2507,6 +2548,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order)
static int kswapd(void *p)
{
unsigned long order;
+ int classzone_idx;
pg_data_t *pgdat = (pg_data_t*)p;
struct task_struct *tsk = current;
@@ -2537,21 +2579,27 @@ static int kswapd(void *p)
set_freezable();
order = 0;
+ classzone_idx = MAX_NR_ZONES - 1;
for ( ; ; ) {
unsigned long new_order;
+ int new_classzone_idx;
int ret;
new_order = pgdat->kswapd_max_order;
+ new_classzone_idx = pgdat->classzone_idx;
pgdat->kswapd_max_order = 0;
- if (order < new_order) {
+ pgdat->classzone_idx = MAX_NR_ZONES - 1;
+ if (order < new_order || classzone_idx > new_classzone_idx) {
/*
* Don't sleep if someone wants a larger 'order'
- * allocation
+ * allocation or has tighter zone constraints
*/
order = new_order;
+ classzone_idx = new_classzone_idx;
} else {
kswapd_try_to_sleep(pgdat, order);
order = pgdat->kswapd_max_order;
+ classzone_idx = pgdat->classzone_idx;
}
ret = try_to_freeze();
@@ -2564,7 +2612,7 @@ static int kswapd(void *p)
*/
if (!ret) {
trace_mm_vmscan_kswapd_wake(pgdat->node_id, order);
- balance_pgdat(pgdat, order);
+ balance_pgdat(pgdat, order, classzone_idx);
}
}
return 0;
@@ -2573,7 +2621,7 @@ static int kswapd(void *p)
/*
* A zone is low on free memory, so wake its kswapd task to service it.
*/
-void wakeup_kswapd(struct zone *zone, int order)
+void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
{
pg_data_t *pgdat;
@@ -2583,8 +2631,10 @@ void wakeup_kswapd(struct zone *zone, int order)
if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
return;
pgdat = zone->zone_pgdat;
- if (pgdat->kswapd_max_order < order)
+ if (pgdat->kswapd_max_order < order) {
pgdat->kswapd_max_order = order;
+ pgdat->classzone_idx = min(pgdat->classzone_idx, classzone_idx);
+ }
if (!waitqueue_active(&pgdat->kswapd_wait))
return;
if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
--
1.7.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH 1/6] mm: kswapd: Stop high-order balancing when any suitable zone is balanced
2010-12-10 15:46 ` [PATCH 1/6] mm: kswapd: Stop high-order balancing when any suitable zone is balanced Mel Gorman
@ 2010-12-13 19:34 ` Eric B Munson
2010-12-14 22:33 ` Andrew Morton
1 sibling, 0 replies; 24+ messages in thread
From: Eric B Munson @ 2010-12-13 19:34 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Simon Kirby, KOSAKI Motohiro, Shaohua Li,
Dave Hansen, Johannes Weiner, linux-mm, linux-kernel
On Fri, 10 Dec 2010, Mel Gorman wrote:
> When the allocator enters its slow path, kswapd is woken up to balance the
> node. It continues working until all zones within the node are balanced. For
> order-0 allocations, this makes perfect sense but for higher orders it can
> have unintended side-effects. If the zone sizes are imbalanced, kswapd may
> reclaim heavily within a smaller zone discarding an excessive number of
> pages. The user-visible behaviour is that kswapd is awake and reclaiming
> even though plenty of pages are free from a suitable zone.
>
> This patch alters the "balance" logic for high-order reclaim allowing kswapd
> to stop if any suitable zone becomes balanced to reduce the number of pages
> it reclaims from other zones. kswapd still tries to ensure that order-0
> watermarks for all zones are met before sleeping.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Started reviewing before I saw this series.
Reviewed-by: Eric B Munson <emunson@mgebm.net>
* Re: [PATCH 1/6] mm: kswapd: Stop high-order balancing when any suitable zone is balanced
2010-12-10 15:46 ` [PATCH 1/6] mm: kswapd: Stop high-order balancing when any suitable zone is balanced Mel Gorman
2010-12-13 19:34 ` Eric B Munson
@ 2010-12-14 22:33 ` Andrew Morton
2010-12-15 10:42 ` Mel Gorman
1 sibling, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2010-12-14 22:33 UTC (permalink / raw)
To: Mel Gorman
Cc: Simon Kirby, KOSAKI Motohiro, Shaohua Li, Dave Hansen,
Johannes Weiner, linux-mm, linux-kernel
On Fri, 10 Dec 2010 15:46:20 +0000
Mel Gorman <mel@csn.ul.ie> wrote:
> When the allocator enters its slow path, kswapd is woken up to balance the
> node. It continues working until all zones within the node are balanced. For
> order-0 allocations, this makes perfect sense but for higher orders it can
> have unintended side-effects. If the zone sizes are imbalanced, kswapd may
> reclaim heavily within a smaller zone discarding an excessive number of
> pages.
Why was it doing this?
> The user-visible behaviour is that kswapd is awake and reclaiming
> even though plenty of pages are free from a suitable zone.
Suitable for what? I assume you refer to a future allocation which can
be satisfied from more than one of the zones?
But what if that allocation wanted to allocate a high-order page from
a zone which we just abandoned?
> This patch alters the "balance" logic for high-order reclaim allowing kswapd
> to stop if any suitable zone becomes balanced to reduce the number of pages
again, suitable for what?
> it reclaims from other zones. kswapd still tries to ensure that order-0
> watermarks for all zones are met before sleeping.
Handling order-0 pages differently from higher-order pages sounds weird
and wrong.
I don't think I understand this patch.
* Re: [PATCH 1/6] mm: kswapd: Stop high-order balancing when any suitable zone is balanced
2010-12-14 22:33 ` Andrew Morton
@ 2010-12-15 10:42 ` Mel Gorman
0 siblings, 0 replies; 24+ messages in thread
From: Mel Gorman @ 2010-12-15 10:42 UTC (permalink / raw)
To: Andrew Morton
Cc: Simon Kirby, KOSAKI Motohiro, Shaohua Li, Dave Hansen,
Johannes Weiner, linux-mm, linux-kernel
On Tue, Dec 14, 2010 at 02:33:06PM -0800, Andrew Morton wrote:
> On Fri, 10 Dec 2010 15:46:20 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
>
> > When the allocator enters its slow path, kswapd is woken up to balance the
> > node. It continues working until all zones within the node are balanced. For
> > order-0 allocations, this makes perfect sense but for higher orders it can
> > have unintended side-effects. If the zone sizes are imbalanced, kswapd may
> > reclaim heavily within a smaller zone discarding an excessive number of
> > pages.
>
> Why was it doing this?
>
Partially because of lumpy reclaim but mostly because it simply stays
awake. If the zone is unbalanced, kswapd will reclaim in there,
shrinking slabs, rotating lists etc. even if ultimately it cannot
balance that zone.
> > The user-visible behaviour is that kswapd is awake and reclaiming
> > even though plenty of pages are free from a suitable zone.
>
> Suitable for what? I assume you refer to a future allocation which can
> be satisfied from more than one of the zones?
>
Yes.
> But what if that allocation wanted to allocate a high-order page from
> a zone which we just abandoned?
>
classzone_idx is taken into account by the series overall and it doesn't
count zones above the classzone_idx.
> > This patch alters the "balance" logic for high-order reclaim allowing kswapd
> > to stop if any suitable zone becomes balanced to reduce the number of pages
>
> again, suitable for what?
>
Suitable for a future allocation of the same type that woke kswapd.
> > it reclaims from other zones. kswapd still tries to ensure that order-0
> > watermarks for all zones are met before sleeping.
>
> Handling order-0 pages differently from higher-order pages sounds weird
> and wrong.
>
> I don't think I understand this patch.
>
The objective is that kswapd will go to sleep again. It has been found
when there is a constant source of high-order allocations that kswapd
stays awake constantly trying to reclaim even though a suitable zone had
free pages.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* [PATCH 2/6] mm: kswapd: Keep kswapd awake for high-order allocations until a percentage of the node is balanced
2010-12-10 15:46 [PATCH 0/6] Prevent kswapd dumping excessive amounts of memory in response to high-order allocations V4 Mel Gorman
2010-12-10 15:46 ` [PATCH 1/6] mm: kswapd: Stop high-order balancing when any suitable zone is balanced Mel Gorman
@ 2010-12-10 15:46 ` Mel Gorman
2010-12-13 2:03 ` KAMEZAWA Hiroyuki
` (2 more replies)
2010-12-10 15:46 ` [PATCH 3/6] mm: kswapd: Use the order that kswapd was reclaiming at for sleeping_prematurely() Mel Gorman
` (3 subsequent siblings)
5 siblings, 3 replies; 24+ messages in thread
From: Mel Gorman @ 2010-12-10 15:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Simon Kirby, KOSAKI Motohiro, Shaohua Li, Dave Hansen,
Johannes Weiner, linux-mm, linux-kernel, Mel Gorman
When reclaiming for high-orders, kswapd is responsible for balancing a
node but it should not reclaim excessively. It avoids excessive reclaim by
considering the node balanced if any suitable zone in it is balanced. In
the cases where there are imbalanced zone sizes (e.g. ZONE_DMA with both
ZONE_DMA32 and ZONE_NORMAL), kswapd can go to sleep prematurely as just
one small zone was balanced.
This alters the sleep logic of kswapd slightly. It counts the number of pages
that make up the balanced zones. If the total number of balanced pages is
more than a quarter of the node, kswapd will go back to sleep. This should
keep a node balanced without reclaiming an excessive number of pages.
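The threshold test can be sketched as follows (a hypothetical reduction of the pgdat_balanced() check introduced below; the zone sizes used in the accompanying test are illustrative, not taken from any real machine):

```c
#include <stdbool.h>

/*
 * Hypothetical reduction of the pgdat_balanced() test: the node counts
 * as balanced when the balanced zones hold more than 25% of the pages
 * in zones up to and including classzone_idx.
 */
static bool pgdat_balanced_sketch(const unsigned long *zone_pages,
				  const bool *zone_ok, int classzone_idx)
{
	unsigned long present_pages = 0, balanced_pages = 0;
	int i;

	for (i = 0; i <= classzone_idx; i++) {
		present_pages += zone_pages[i];
		if (zone_ok[i])
			balanced_pages += zone_pages[i];
	}

	return balanced_pages > (present_pages >> 2);
}
```

With illustrative sizes of 16M DMA, 1G DMA32 and 3G Normal, a balanced DMA zone alone fails the test while a balanced Normal zone alone passes it, which is the behaviour the 25% figure is chosen to produce.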
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
---
mm/vmscan.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++---------
1 files changed, 49 insertions(+), 9 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 625dfba..6723101 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2191,10 +2191,40 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
}
#endif
+/*
+ * pgdat_balanced is used when checking if a node is balanced for high-order
+ * allocations. Only zones that meet watermarks and are in a zone allowed
+ * by the callers classzone_idx are added to balanced_pages. The total of
+ * balanced pages must be at least 25% of the zones allowed by classzone_idx
+ * for the node to be considered balanced. Forcing all zones to be balanced
+ * for high orders can cause excessive reclaim when there are imbalanced zones.
+ * The choice of 25% is due to
+ * o a 16M DMA zone that is balanced will not balance a zone on any
+ * reasonable sized machine
+ * o On all other machines, the top zone must be at least a reasonable
+ * percentage of the middle zones. For example, on 32-bit x86, highmem
+ * would need to be at least 256M for it to balance a whole node.
+ * Similarly, on x86-64 the Normal zone would need to be at least 1G
+ * to balance a node on its own. These seemed like reasonable ratios.
+ */
+static bool pgdat_balanced(pg_data_t *pgdat, unsigned long balanced_pages,
+ int classzone_idx)
+{
+ unsigned long present_pages = 0;
+ int i;
+
+ for (i = 0; i <= classzone_idx; i++)
+ present_pages += pgdat->node_zones[i].present_pages;
+
+ return balanced_pages > (present_pages >> 2);
+}
+
/* is kswapd sleeping prematurely? */
static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
{
int i;
+ unsigned long balanced = 0;
+ bool all_zones_ok = true;
/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
if (remaining)
@@ -2212,10 +2242,20 @@ static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone),
0, 0))
- return 1;
+ all_zones_ok = false;
+ else
+ balanced += zone->present_pages;
}
- return 0;
+ /*
+ * For high-order requests, the balanced zones must contain at least
+ * 25% of the node's pages for kswapd to sleep. For order-0, all zones
+ * must be balanced
+ */
+ if (order)
+ return pgdat_balanced(pgdat, balanced, 0);
+ else
+ return !all_zones_ok;
}
/*
@@ -2243,7 +2283,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
int classzone_idx)
{
int all_zones_ok;
- int any_zone_ok;
+ unsigned long balanced;
int priority;
int i;
int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */
@@ -2277,7 +2317,7 @@ loop_again:
disable_swap_token();
all_zones_ok = 1;
- any_zone_ok = 0;
+ balanced = 0;
/*
* Scan in the highmem->dma direction for the highest
@@ -2397,11 +2437,11 @@ loop_again:
*/
zone_clear_flag(zone, ZONE_CONGESTED);
if (i <= classzone_idx)
- any_zone_ok = 1;
+ balanced += zone->present_pages;
}
}
- if (all_zones_ok || (order && any_zone_ok))
+ if (all_zones_ok || (order && pgdat_balanced(pgdat, balanced, classzone_idx)))
break; /* kswapd: all done */
/*
* OK, kswapd is getting into trouble. Take a nap, then take
@@ -2427,10 +2467,10 @@ out:
/*
* order-0: All zones must meet high watermark for a balanced node
> - * high-order: Any zone below pgdat's classzone_idx must meet the high
- * watermark for a balanced node
+ * high-order: Balanced zones must make up at least 25% of the node
+ * for the node to be balanced
*/
- if (!(all_zones_ok || (order && any_zone_ok))) {
+ if (!(all_zones_ok || (order && pgdat_balanced(pgdat, balanced, classzone_idx)))) {
cond_resched();
try_to_freeze();
--
1.7.1
* Re: [PATCH 2/6] mm: kswapd: Keep kswapd awake for high-order allocations until a percentage of the node is balanced
2010-12-10 15:46 ` [PATCH 2/6] mm: kswapd: Keep kswapd awake for high-order allocations until a percentage of the node " Mel Gorman
@ 2010-12-13 2:03 ` KAMEZAWA Hiroyuki
2010-12-13 19:37 ` Eric B Munson
2010-12-14 22:43 ` Andrew Morton
2 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-13 2:03 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Simon Kirby, KOSAKI Motohiro, Shaohua Li,
Dave Hansen, Johannes Weiner, linux-mm, linux-kernel
On Fri, 10 Dec 2010 15:46:21 +0000
Mel Gorman <mel@csn.ul.ie> wrote:
> When reclaiming for high-orders, kswapd is responsible for balancing a
> node but it should not reclaim excessively. It avoids excessive reclaim by
> considering if any zone in a node is balanced then the node is balanced. In
> the cases where there are imbalanced zone sizes (e.g. ZONE_DMA with both
> ZONE_DMA32 and ZONE_NORMAL), kswapd can go to sleep prematurely as just
> one small zone was balanced.
>
> This alters the sleep logic of kswapd slightly. It counts the number of pages
> that make up the balanced zones. If the total number of balanced pages is
> more than a quarter of the zone, kswapd will go back to sleep. This should
> keep a node balanced without reclaiming an excessive number of pages.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
* Re: [PATCH 2/6] mm: kswapd: Keep kswapd awake for high-order allocations until a percentage of the node is balanced
2010-12-10 15:46 ` [PATCH 2/6] mm: kswapd: Keep kswapd awake for high-order allocations until a percentage of the node " Mel Gorman
2010-12-13 2:03 ` KAMEZAWA Hiroyuki
@ 2010-12-13 19:37 ` Eric B Munson
2010-12-14 22:43 ` Andrew Morton
2 siblings, 0 replies; 24+ messages in thread
From: Eric B Munson @ 2010-12-13 19:37 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Simon Kirby, KOSAKI Motohiro, Shaohua Li,
Dave Hansen, Johannes Weiner, linux-mm, linux-kernel
On Fri, 10 Dec 2010, Mel Gorman wrote:
> When reclaiming for high-orders, kswapd is responsible for balancing a
> node but it should not reclaim excessively. It avoids excessive reclaim by
> considering if any zone in a node is balanced then the node is balanced. In
> the cases where there are imbalanced zone sizes (e.g. ZONE_DMA with both
> ZONE_DMA32 and ZONE_NORMAL), kswapd can go to sleep prematurely as just
> one small zone was balanced.
>
> This alters the sleep logic of kswapd slightly. It counts the number of pages
> that make up the balanced zones. If the total number of balanced pages is
> more than a quarter of the zone, kswapd will go back to sleep. This should
> keep a node balanced without reclaiming an excessive number of pages.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Eric B Munson <emunson@mgebm.net>
* Re: [PATCH 2/6] mm: kswapd: Keep kswapd awake for high-order allocations until a percentage of the node is balanced
2010-12-10 15:46 ` [PATCH 2/6] mm: kswapd: Keep kswapd awake for high-order allocations until a percentage of the node " Mel Gorman
2010-12-13 2:03 ` KAMEZAWA Hiroyuki
2010-12-13 19:37 ` Eric B Munson
@ 2010-12-14 22:43 ` Andrew Morton
2010-12-15 10:54 ` Mel Gorman
2 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2010-12-14 22:43 UTC (permalink / raw)
To: Mel Gorman
Cc: Simon Kirby, KOSAKI Motohiro, Shaohua Li, Dave Hansen,
Johannes Weiner, linux-mm, linux-kernel
On Fri, 10 Dec 2010 15:46:21 +0000
Mel Gorman <mel@csn.ul.ie> wrote:
> When reclaiming for high-orders, kswapd is responsible for balancing a
> node but it should not reclaim excessively. It avoids excessive reclaim by
> considering if any zone in a node is balanced then the node is balanced.
Here you're referring to your [patch 1/6] yes? Not to current upstream.
> In
> the cases where there are imbalanced zone sizes (e.g. ZONE_DMA with both
> ZONE_DMA32 and ZONE_NORMAL), kswapd can go to sleep prematurely as just
> one small zone was balanced.
Since [1/6]?
> This alters the sleep logic of kswapd slightly. It counts the number of pages
> that make up the balanced zones. If the total number of balanced pages is
Define "balanced page"? Seems to be the sum of the total sizes of all
zones which have reached their desired free-pages threshold?
But this includes all page orders, whereas here we're targetting a
particular order. Although things should work out OK due to the
scaling/sizing proportionality.
> more than a quarter of the zone, kswapd will go back to sleep. This should
> keep a node balanced without reclaiming an excessive number of pages.
ick.
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> ---
> mm/vmscan.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++---------
> 1 files changed, 49 insertions(+), 9 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 625dfba..6723101 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2191,10 +2191,40 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> }
> #endif
>
> +/*
> + * pgdat_balanced is used when checking if a node is balanced for high-order
> + * allocations.
Is this the correct use of the term "balanced"? I think "balanced" is
something that happens *between* zones: They've all achieved the same
(perhaps weighted) ratio of free pages.
> Only zones that meet watermarks and are in a zone allowed
> + * by the callers classzone_idx are added to balanced_pages. The total of
caller's
> + * balanced pages must be at least 25% of the zones allowed by classzone_idx
> + * for the node to be considered balanced. Forcing all zones to be balanced
> + * for high orders can cause excessive reclaim when there are imbalanced zones.
Excessive reclaim of what?
If one particular zone is having trouble achieving its desired level of
free pages of a partocular order, are you saying that kswapd sits there
madly scanning other zones, which have already reached their desired
level? If so, that would be bad.
I think you're saying that we just keep on scanning away at this one
zone. But what was wrong with doing that?
> + * The choice of 25% is due to
> + * o a 16M DMA zone that is balanced will not balance a zone on any
> + * reasonable sized machine
How does a zone balance another zone?
> + * o On all other machines, the top zone must be at least a reasonable
> + * precentage of the middle zones. For example, on 32-bit x86, highmem
> + * would need to be at least 256M for it to be balance a whole node.
> + * Similarly, on x86-64 the Normal zone would need to be at least 1G
> + * to balance a node on its own. These seemed like reasonable ratios.
> + */
> +static bool pgdat_balanced(pg_data_t *pgdat, unsigned long balanced_pages,
> + int classzone_idx)
> +{
> + unsigned long present_pages = 0;
> + int i;
> +
> + for (i = 0; i <= classzone_idx; i++)
> + present_pages += pgdat->node_zones[i].present_pages;
> +
> + return balanced_pages > (present_pages >> 2);
> +}
> +
>
> ...
>
* Re: [PATCH 2/6] mm: kswapd: Keep kswapd awake for high-order allocations until a percentage of the node is balanced
2010-12-14 22:43 ` Andrew Morton
@ 2010-12-15 10:54 ` Mel Gorman
0 siblings, 0 replies; 24+ messages in thread
From: Mel Gorman @ 2010-12-15 10:54 UTC (permalink / raw)
To: Andrew Morton
Cc: Simon Kirby, KOSAKI Motohiro, Shaohua Li, Dave Hansen,
Johannes Weiner, linux-mm, linux-kernel
On Tue, Dec 14, 2010 at 02:43:41PM -0800, Andrew Morton wrote:
> On Fri, 10 Dec 2010 15:46:21 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
>
> > When reclaiming for high-orders, kswapd is responsible for balancing a
> > node but it should not reclaim excessively. It avoids excessive reclaim by
> > considering if any zone in a node is balanced then the node is balanced.
>
> Here you're referring to your [patch 1/6] yes? Not to current upstream.
>
Yes.
> > In
> > the cases where there are imbalanced zone sizes (e.g. ZONE_DMA with both
> > ZONE_DMA32 and ZONE_NORMAL), kswapd can go to sleep prematurely as just
> > one small zone was balanced.
>
> Since [1/6]?
>
Yes.
> > This alters the sleep logic of kswapd slightly. It counts the number of pages
> > that make up the balanced zones. If the total number of balanced pages is
>
> Define "balanced page"? Seems to be the sum of the total sizes of all
> zones which have reached their desired free-pages threshold?
>
Correct.
> But this includes all page orders, whereas here we're targetting a
> particular order. Although things should work out OK due to the
> scaling/sizing proportionality.
>
It's the size of the whole zone that is being accounted for, and since
it's a watermark check, the order is taken into account.
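A minimal sketch of how an order-aware watermark check takes the order into account, loosely modeled on the zone_watermark_ok() of this era (the exact kernel logic differs; names and the free-list layout here are illustrative):

```c
#include <stdbool.h>

/*
 * Simplified order-aware watermark check. nr_free[o] is the number of
 * free blocks of order o; a block of order o contains (1 << o) pages.
 */
static bool watermark_ok_sketch(const unsigned long *nr_free, int max_order,
				int order, unsigned long mark)
{
	unsigned long free_pages = 0;
	int o;

	for (o = 0; o < max_order; o++)
		free_pages += nr_free[o] << o;

	if (free_pages <= mark)
		return false;

	/*
	 * Pages locked up in blocks smaller than the requested order
	 * cannot satisfy the request: subtract them, halving the mark
	 * as the pool of usable candidates shrinks.
	 */
	for (o = 0; o < order; o++) {
		free_pages -= nr_free[o] << o;
		mark >>= 1;
		if (free_pages <= mark)
			return false;
	}
	return true;
}
```

So a zone full of order-0 pages can pass the check for order-0 while failing it for order-1, which is why counting whole balanced zones still reflects the requested order.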
> > more than a quarter of the zone, kswapd will go back to sleep. This should
> > keep a node balanced without reclaiming an excessive number of pages.
>
> ick.
>
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> > ---
> > mm/vmscan.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++---------
> > 1 files changed, 49 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 625dfba..6723101 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2191,10 +2191,40 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> > }
> > #endif
> >
> > +/*
> > + * pgdat_balanced is used when checking if a node is balanced for high-order
> > + * allocations.
>
> Is this the correct use of the term "balanced"? I think "balanced" is
> something that happens *between* zones: They've all achieved the same
> (perhaps weighted) ratio of free pages.
>
What would be a better term? pgdat_sufficiently_but_not_fully_balanced()? If
it returns true, it can mean the node is either fully "balanced" as you
define it or that enough zones have enough free suitably-ordered pages for
allocations to succeed.
> > Only zones that meet watermarks and are in a zone allowed
> > + * by the callers classzone_idx are added to balanced_pages. The total of
>
> caller's
>
Right.
> > + * balanced pages must be at least 25% of the zones allowed by classzone_idx
> > + * for the node to be considered balanced. Forcing all zones to be balanced
> > + * for high orders can cause excessive reclaim when there are imbalanced zones.
>
> Excessive reclaim of what?
>
Slab, list rotations and pages within the imbalanced zones that may never
become balanced. At a minimum, kswapd just stays awake consuming CPU.
> If one particular zone is having trouble achieving its desired level of
> free pages of a particular order, are you saying that kswapd sits there
> madly scanning other zones, which have already reached their desired
> level? If so, that would be bad.
>
As far as I can gather, yes, this is what is happening. I don't have a local
reproduction case so I'm basing this on a bug report. The reporter has two
problems: kswapd stays awake constantly and far too many pages are free.
> I think you're saying that we just keep on scanning away at this one
> zone. But what was wrong with doing that?
>
It wastes CPU.
> > + * The choice of 25% is due to
> > + * o a 16M DMA zone that is balanced will not balance a zone on any
> > + * reasonable sized machine
>
> How does a zone balance another zone?
>
That should have been "will not balance a node".
> > + * o On all other machines, the top zone must be at least a reasonable
> > + * percentage of the middle zones. For example, on 32-bit x86, highmem
> > + * would need to be at least 256M for it to balance a whole node.
> > + * Similarly, on x86-64 the Normal zone would need to be at least 1G
> > + * to balance a node on its own. These seemed like reasonable ratios.
> > + */
> > +static bool pgdat_balanced(pg_data_t *pgdat, unsigned long balanced_pages,
> > + int classzone_idx)
> > +{
> > + unsigned long present_pages = 0;
> > + int i;
> > +
> > + for (i = 0; i <= classzone_idx; i++)
> > + present_pages += pgdat->node_zones[i].present_pages;
> > +
> > + return balanced_pages > (present_pages >> 2);
> > +}
> > +
> >
> > ...
> >
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href="mailto:dont@kvack.org">email@kvack.org</a>
* [PATCH 3/6] mm: kswapd: Use the order that kswapd was reclaiming at for sleeping_prematurely()
2010-12-10 15:46 [PATCH 0/6] Prevent kswapd dumping excessive amounts of memory in response to high-order allocations V4 Mel Gorman
2010-12-10 15:46 ` [PATCH 1/6] mm: kswapd: Stop high-order balancing when any suitable zone is balanced Mel Gorman
2010-12-10 15:46 ` [PATCH 2/6] mm: kswapd: Keep kswapd awake for high-order allocations until a percentage of the node " Mel Gorman
@ 2010-12-10 15:46 ` Mel Gorman
2010-12-13 19:38 ` Eric B Munson
2010-12-10 15:46 ` [PATCH 4/6] mm: kswapd: Reset kswapd_max_order and classzone_idx after reading Mel Gorman
` (2 subsequent siblings)
5 siblings, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2010-12-10 15:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Simon Kirby, KOSAKI Motohiro, Shaohua Li, Dave Hansen,
Johannes Weiner, linux-mm, linux-kernel, Mel Gorman
Before kswapd goes to sleep, it uses sleeping_prematurely() to check if
there was a race pushing a zone below its watermark. If the race happened,
it stays awake. However, balance_pgdat() can decide to reclaim at order-0
if it decides that high-order reclaim is not working as expected. This
information is not passed back to sleeping_prematurely(). The impact is
that kswapd remains awake reclaiming pages long after it should have gone
to sleep. This patch passes the adjusted order to sleeping_prematurely and
uses the same logic as balance_pgdat to decide if it's ok to go to sleep.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/vmscan.c | 16 +++++++++++-----
1 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6723101..4d968b0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2220,7 +2220,7 @@ static bool pgdat_balanced(pg_data_t *pgdat, unsigned long balanced_pages,
}
/* is kswapd sleeping prematurely? */
-static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
+static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
{
int i;
unsigned long balanced = 0;
@@ -2230,7 +2230,7 @@ static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
if (remaining)
return 1;
- /* If after HZ/10, a zone is below the high mark, it's premature */
+ /* Check the watermark levels */
for (i = 0; i < pgdat->nr_zones; i++) {
struct zone *zone = pgdat->node_zones + i;
@@ -2262,7 +2262,7 @@ static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
* For kswapd, balance_pgdat() will work across all this node's zones until
* they are all at high_wmark_pages(zone).
*
- * Returns the number of pages which were actually freed.
+ * Returns the final order kswapd was reclaiming at
*
* There is special handling here for zones which are full of pinned pages.
* This can happen if the pages are all mlocked, or if they are all used by
@@ -2525,7 +2525,13 @@ out:
}
}
- return sc.nr_reclaimed;
+ /*
+ * Return the order we were reclaiming at so sleeping_prematurely()
+ * makes a decision on the order we were last reclaiming at. However,
+ * if another caller entered the allocator slow path while kswapd
+ * was awake, order will remain at the higher level
+ */
+ return order;
}
static void kswapd_try_to_sleep(pg_data_t *pgdat, int order)
@@ -2652,7 +2658,7 @@ static int kswapd(void *p)
*/
if (!ret) {
trace_mm_vmscan_kswapd_wake(pgdat->node_id, order);
- balance_pgdat(pgdat, order, classzone_idx);
+ order = balance_pgdat(pgdat, order, classzone_idx);
}
}
return 0;
--
1.7.1
* Re: [PATCH 3/6] mm: kswapd: Use the order that kswapd was reclaiming at for sleeping_prematurely()
2010-12-10 15:46 ` [PATCH 3/6] mm: kswapd: Use the order that kswapd was reclaiming at for sleeping_prematurely() Mel Gorman
@ 2010-12-13 19:38 ` Eric B Munson
0 siblings, 0 replies; 24+ messages in thread
From: Eric B Munson @ 2010-12-13 19:38 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Simon Kirby, KOSAKI Motohiro, Shaohua Li,
Dave Hansen, Johannes Weiner, linux-mm, linux-kernel
On Fri, 10 Dec 2010, Mel Gorman wrote:
> Before kswapd goes to sleep, it uses sleeping_prematurely() to check if
> there was a race pushing a zone below its watermark. If the race happened,
> it stays awake. However, balance_pgdat() can decide to reclaim at order-0
> if it decides that high-order reclaim is not working as expected. This
> information is not passed back to sleeping_prematurely(). The impact is
> that kswapd remains awake reclaiming pages long after it should have gone
> to sleep. This patch passes the adjusted order to sleeping_prematurely and
> uses the same logic as balance_pgdat to decide if it's ok to go to sleep.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Eric B Munson <emunson@mgebm.net>
* [PATCH 4/6] mm: kswapd: Reset kswapd_max_order and classzone_idx after reading
2010-12-10 15:46 [PATCH 0/6] Prevent kswapd dumping excessive amounts of memory in response to high-order allocations V4 Mel Gorman
` (2 preceding siblings ...)
2010-12-10 15:46 ` [PATCH 3/6] mm: kswapd: Use the order that kswapd was reclaiming at for sleeping_prematurely() Mel Gorman
@ 2010-12-10 15:46 ` Mel Gorman
2010-12-13 19:39 ` Eric B Munson
2010-12-10 15:46 ` [PATCH 5/6] mm: kswapd: Treat zone->all_unreclaimable in sleeping_prematurely similar to balance_pgdat() Mel Gorman
2010-12-10 15:46 ` [PATCH 6/6] mm: kswapd: Use the classzone idx that kswapd was using for sleeping_prematurely() Mel Gorman
5 siblings, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2010-12-10 15:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Simon Kirby, KOSAKI Motohiro, Shaohua Li, Dave Hansen,
Johannes Weiner, linux-mm, linux-kernel, Mel Gorman
When kswapd wakes up, it reads its order and classzone from pgdat and
calls balance_pgdat(). While it's awake, it potentially reclaims at a high
order and a low classzone index. This might have been a once-off request
that is not required by subsequent callers. However, because the pgdat
values were not reset, they remain artificially high while
balance_pgdat() is running and kswapd potentially enters a second
unnecessary reclaim cycle. Reset the pgdat order and classzone index
after reading.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/vmscan.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4d968b0..e1be4e8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2646,6 +2646,8 @@ static int kswapd(void *p)
kswapd_try_to_sleep(pgdat, order);
order = pgdat->kswapd_max_order;
classzone_idx = pgdat->classzone_idx;
+ pgdat->kswapd_max_order = 0;
+ pgdat->classzone_idx = MAX_NR_ZONES - 1;
}
ret = try_to_freeze();
--
1.7.1
* Re: [PATCH 4/6] mm: kswapd: Reset kswapd_max_order and classzone_idx after reading
2010-12-10 15:46 ` [PATCH 4/6] mm: kswapd: Reset kswapd_max_order and classzone_idx after reading Mel Gorman
@ 2010-12-13 19:39 ` Eric B Munson
0 siblings, 0 replies; 24+ messages in thread
From: Eric B Munson @ 2010-12-13 19:39 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Simon Kirby, KOSAKI Motohiro, Shaohua Li,
Dave Hansen, Johannes Weiner, linux-mm, linux-kernel
On Fri, 10 Dec 2010, Mel Gorman wrote:
> When kswapd wakes up, it reads its order and classzone from pgdat and
> calls balance_pgdat(). While it's awake, it potentially reclaims at a high
> order and a low classzone index. This might have been a once-off request
> that is not required by subsequent callers. However, because the pgdat
> values were not reset, they remain artificially high while
> balance_pgdat() is running and kswapd potentially enters a second
> unnecessary reclaim cycle. Reset the pgdat order and classzone index
> after reading.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Eric B Munson <emunson@mgebm.net>
* [PATCH 5/6] mm: kswapd: Treat zone->all_unreclaimable in sleeping_prematurely similar to balance_pgdat()
2010-12-10 15:46 [PATCH 0/6] Prevent kswapd dumping excessive amounts of memory in response to high-order allocations V4 Mel Gorman
` (3 preceding siblings ...)
2010-12-10 15:46 ` [PATCH 4/6] mm: kswapd: Reset kswapd_max_order and classzone_idx after reading Mel Gorman
@ 2010-12-10 15:46 ` Mel Gorman
2010-12-13 19:40 ` Eric B Munson
2010-12-10 15:46 ` [PATCH 6/6] mm: kswapd: Use the classzone idx that kswapd was using for sleeping_prematurely() Mel Gorman
5 siblings, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2010-12-10 15:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Simon Kirby, KOSAKI Motohiro, Shaohua Li, Dave Hansen,
Johannes Weiner, linux-mm, linux-kernel, Mel Gorman
After DEF_PRIORITY, balance_pgdat() considers all_unreclaimable zones to
be balanced but sleeping_prematurely does not. This can force kswapd to
stay awake longer than it should. This patch fixes it.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/vmscan.c | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index e1be4e8..5995121 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2237,8 +2237,16 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
if (!populated_zone(zone))
continue;
- if (zone->all_unreclaimable)
+ /*
+ * balance_pgdat() skips over all_unreclaimable after
+ * DEF_PRIORITY. Effectively, it considers them balanced so
+ * they must be considered balanced here as well if kswapd
+ * is to sleep
+ */
+ if (zone->all_unreclaimable) {
+ balanced += zone->present_pages;
continue;
+ }
if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone),
0, 0))
--
1.7.1
* Re: [PATCH 5/6] mm: kswapd: Treat zone->all_unreclaimable in sleeping_prematurely similar to balance_pgdat()
2010-12-10 15:46 ` [PATCH 5/6] mm: kswapd: Treat zone->all_unreclaimable in sleeping_prematurely similar to balance_pgdat() Mel Gorman
@ 2010-12-13 19:40 ` Eric B Munson
0 siblings, 0 replies; 24+ messages in thread
From: Eric B Munson @ 2010-12-13 19:40 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Simon Kirby, KOSAKI Motohiro, Shaohua Li,
Dave Hansen, Johannes Weiner, linux-mm, linux-kernel
On Fri, 10 Dec 2010, Mel Gorman wrote:
> After DEF_PRIORITY, balance_pgdat() considers all_unreclaimable zones to
> be balanced but sleeping_prematurely does not. This can force kswapd to
> stay awake longer than it should. This patch fixes it.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Eric B Munson <emunson@mgebm.net>
* [PATCH 6/6] mm: kswapd: Use the classzone idx that kswapd was using for sleeping_prematurely()
2010-12-10 15:46 [PATCH 0/6] Prevent kswapd dumping excessive amounts of memory in response to high-order allocations V4 Mel Gorman
` (4 preceding siblings ...)
2010-12-10 15:46 ` [PATCH 5/6] mm: kswapd: Treat zone->all_unreclaimable in sleeping_prematurely similar to balance_pgdat() Mel Gorman
@ 2010-12-10 15:46 ` Mel Gorman
2010-12-13 19:43 ` Eric B Munson
5 siblings, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2010-12-10 15:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Simon Kirby, KOSAKI Motohiro, Shaohua Li, Dave Hansen,
Johannes Weiner, linux-mm, linux-kernel, Mel Gorman
When kswapd is woken up for a high-order allocation, it takes account of
the highest zone usable by the caller (the classzone idx). During
allocation, this index is used to select the lowmem_reserve[] that
should be applied to the watermark calculation in zone_watermark_ok().
When balancing a node, kswapd considers the highest unbalanced zone to be the
classzone index. This will always be at least the caller's classzone_idx
and can be higher. However, sleeping_prematurely() always considers the
lowest zone (e.g. ZONE_DMA) to be the classzone index. This means that
sleeping_prematurely() can consider a zone to be balanced that is unusable
by the allocation request that originally woke kswapd. This patch changes
sleeping_prematurely() to use a classzone_idx matching the value it used
in balance_pgdat().
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
---
mm/vmscan.c | 29 ++++++++++++++++-------------
1 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5995121..cf03a11 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2220,7 +2220,8 @@ static bool pgdat_balanced(pg_data_t *pgdat, unsigned long balanced_pages,
}
/* is kswapd sleeping prematurely? */
-static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
+static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
+ int classzone_idx)
{
int i;
unsigned long balanced = 0;
@@ -2228,7 +2229,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
if (remaining)
- return 1;
+ return true;
/* Check the watermark levels */
for (i = 0; i < pgdat->nr_zones; i++) {
@@ -2249,7 +2250,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
}
if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone),
- 0, 0))
+ classzone_idx, 0))
all_zones_ok = false;
else
balanced += zone->present_pages;
@@ -2261,7 +2262,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
* must be balanced
*/
if (order)
- return pgdat_balanced(pgdat, balanced, 0);
+ return pgdat_balanced(pgdat, balanced, classzone_idx);
else
return !all_zones_ok;
}
@@ -2288,7 +2289,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
* of pages is balanced across the zones.
*/
static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
- int classzone_idx)
+ int *classzone_idx)
{
int all_zones_ok;
unsigned long balanced;
@@ -2351,6 +2352,7 @@ loop_again:
if (!zone_watermark_ok_safe(zone, order,
high_wmark_pages(zone), 0, 0)) {
end_zone = i;
+ *classzone_idx = i;
break;
}
}
@@ -2444,12 +2446,12 @@ loop_again:
* speculatively avoid congestion waits
*/
zone_clear_flag(zone, ZONE_CONGESTED);
- if (i <= classzone_idx)
+ if (i <= *classzone_idx)
balanced += zone->present_pages;
}
}
- if (all_zones_ok || (order && pgdat_balanced(pgdat, balanced, classzone_idx)))
+ if (all_zones_ok || (order && pgdat_balanced(pgdat, balanced, *classzone_idx)))
break; /* kswapd: all done */
/*
* OK, kswapd is getting into trouble. Take a nap, then take
@@ -2478,7 +2480,7 @@ out:
* high-order: Balanced zones must make up at least 25% of the node
* for the node to be balanced
*/
- if (!(all_zones_ok || (order && pgdat_balanced(pgdat, balanced, classzone_idx)))) {
+ if (!(all_zones_ok || (order && pgdat_balanced(pgdat, balanced, *classzone_idx)))) {
cond_resched();
try_to_freeze();
@@ -2539,10 +2541,11 @@ out:
* if another caller entered the allocator slow path while kswapd
* was awake, order will remain at the higher level
*/
+ *classzone_idx = end_zone;
return order;
}
-static void kswapd_try_to_sleep(pg_data_t *pgdat, int order)
+static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
{
long remaining = 0;
DEFINE_WAIT(wait);
@@ -2553,7 +2556,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order)
prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
/* Try to sleep for a short interval */
- if (!sleeping_prematurely(pgdat, order, remaining)) {
+ if (!sleeping_prematurely(pgdat, order, remaining, classzone_idx)) {
remaining = schedule_timeout(HZ/10);
finish_wait(&pgdat->kswapd_wait, &wait);
prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
@@ -2563,7 +2566,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order)
* After a short sleep, check if it was a premature sleep. If not, then
* go fully to sleep until explicitly woken up.
*/
- if (!sleeping_prematurely(pgdat, order, remaining)) {
+ if (!sleeping_prematurely(pgdat, order, remaining, classzone_idx)) {
trace_mm_vmscan_kswapd_sleep(pgdat->node_id);
/*
@@ -2651,7 +2654,7 @@ static int kswapd(void *p)
order = new_order;
classzone_idx = new_classzone_idx;
} else {
- kswapd_try_to_sleep(pgdat, order);
+ kswapd_try_to_sleep(pgdat, order, classzone_idx);
order = pgdat->kswapd_max_order;
classzone_idx = pgdat->classzone_idx;
pgdat->kswapd_max_order = 0;
@@ -2668,7 +2671,7 @@ static int kswapd(void *p)
*/
if (!ret) {
trace_mm_vmscan_kswapd_wake(pgdat->node_id, order);
- order = balance_pgdat(pgdat, order, classzone_idx);
+ order = balance_pgdat(pgdat, order, &classzone_idx);
}
}
return 0;
--
1.7.1
* Re: [PATCH 6/6] mm: kswapd: Use the classzone idx that kswapd was using for sleeping_prematurely()
2010-12-10 15:46 ` [PATCH 6/6] mm: kswapd: Use the classzone idx that kswapd was using for sleeping_prematurely() Mel Gorman
@ 2010-12-13 19:43 ` Eric B Munson
0 siblings, 0 replies; 24+ messages in thread
From: Eric B Munson @ 2010-12-13 19:43 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Simon Kirby, KOSAKI Motohiro, Shaohua Li,
Dave Hansen, Johannes Weiner, linux-mm, linux-kernel
On Fri, 10 Dec 2010, Mel Gorman wrote:
> When kswapd is woken up for a high-order allocation, it takes account of
> the highest zone usable by the caller (the classzone idx). During
> allocation, this index is used to select the lowmem_reserve[] that
> should be applied to the watermark calculation in zone_watermark_ok().
>
> When balancing a node, kswapd considers the highest unbalanced zone to be the
> classzone index. This will always be at least the caller's classzone_idx
> and can be higher. However, sleeping_prematurely() always considers the
> lowest zone (e.g. ZONE_DMA) to be the classzone index. This means that
> sleeping_prematurely() can consider a zone to be balanced that is unusable
> by the allocation request that originally woke kswapd. This patch changes
> sleeping_prematurely() to use a classzone_idx matching the value it used
> in balance_pgdat().
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Eric B Munson <emunson@mgebm.net>