[PATCH] mm: wait for congestion to clear on all zones

* [PATCH] mm: wait for congestion to clear on all zones
@ 2013-01-09 21:41 Zlatko Calusic
  2013-01-09 21:48 ` Andrew Morton
  2013-01-11  1:25 ` Simon Jeons
  0 siblings, 2 replies; 8+ messages in thread
From: Zlatko Calusic @ 2013-01-09 21:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Hugh Dickins, Minchan Kim, linux-mm,
	Linux Kernel Mailing List

From: Zlatko Calusic <zlatko.calusic@iskon.hr>

Currently we take a short nap (HZ/10) and wait for congestion to clear
before taking another pass with lower priority in balance_pgdat(). But
we do that only for the highest zone that we find to be unbalanced
and congested.

This patch changes that to wait on all congested zones in a single
pass, in the hope that this will save us some scanning. Also, we
take a nap as soon as a congested zone is encountered and sc.priority <
DEF_PRIORITY - 2 (aka kswapd in trouble).

Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Zlatko Calusic <zlatko.calusic@iskon.hr>
---
The patch is against the mm tree. Make sure that
mm-avoid-calling-pgdat_balanced-needlessly.patch is applied first (not
yet in the mmotm tree). Tested on half a dozen systems with different
workloads for the last few days, working really well!

 mm/vmscan.c | 35 ++++++++++++-----------------------
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 002ade6..1c5d38a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2565,7 +2565,6 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 							int *classzone_idx)
 {
 	bool pgdat_is_balanced = false;
-	struct zone *unbalanced_zone;
 	int i;
 	int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
 	unsigned long total_scanned;
@@ -2596,9 +2595,6 @@ loop_again:
 
 	do {
 		unsigned long lru_pages = 0;
-		int has_under_min_watermark_zone = 0;
-
-		unbalanced_zone = NULL;
 
 		/*
 		 * Scan in the highmem->dma direction for the highest
@@ -2739,15 +2735,20 @@ loop_again:
 			}
 
 			if (!zone_balanced(zone, testorder, 0, end_zone)) {
-				unbalanced_zone = zone;
-				/*
-				 * We are still under min water mark.  This
-				 * means that we have a GFP_ATOMIC allocation
-				 * failure risk. Hurry up!
-				 */
+			    if (total_scanned && sc.priority < DEF_PRIORITY - 2) {
+				/* OK, kswapd is getting into trouble. */
 				if (!zone_watermark_ok_safe(zone, order,
 					    min_wmark_pages(zone), end_zone, 0))
-					has_under_min_watermark_zone = 1;
+				    /*
+				     * We are still under min water mark.
+				     * This means that we have a GFP_ATOMIC
+				     * allocation failure risk. Hurry up!
+				     */
+				    count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
+				else
+				    /* Take a nap if a zone is congested. */
+				    wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
+			    }
 			} else {
 				/*
 				 * If a zone reaches its high watermark,
@@ -2758,7 +2759,6 @@ loop_again:
 				 */
 				zone_clear_flag(zone, ZONE_CONGESTED);
 			}
-
 		}
 
 		/*
@@ -2776,17 +2776,6 @@ loop_again:
 		}
 
 		/*
-		 * OK, kswapd is getting into trouble.  Take a nap, then take
-		 * another pass across the zones.
-		 */
-		if (total_scanned && (sc.priority < DEF_PRIORITY - 2)) {
-			if (has_under_min_watermark_zone)
-				count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
-			else if (unbalanced_zone)
-				wait_iff_congested(unbalanced_zone, BLK_RW_ASYNC, HZ/10);
-		}
-
-		/*
 		 * We do this so kswapd doesn't build up large priorities for
 		 * example when it is freeing in parallel with allocators. It
 		 * matches the direct reclaim path behaviour in terms of impact
-- 
1.8.1

-- 
Zlatko

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH] mm: wait for congestion to clear on all zones
  2013-01-09 21:41 [PATCH] mm: wait for congestion to clear on all zones Zlatko Calusic
@ 2013-01-09 21:48 ` Andrew Morton
  2013-01-09 22:15   ` Zlatko Calusic
  2013-01-09 22:52   ` Zlatko Calusic
  2013-01-11  1:25 ` Simon Jeons
  1 sibling, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2013-01-09 21:48 UTC (permalink / raw)
  To: Zlatko Calusic
  Cc: Mel Gorman, Hugh Dickins, Minchan Kim, linux-mm,
	Linux Kernel Mailing List

On Wed, 09 Jan 2013 22:41:48 +0100
Zlatko Calusic <zlatko.calusic@iskon.hr> wrote:

> Currently we take a short nap (HZ/10) and wait for congestion to clear
> before taking another pass with lower priority in balance_pgdat(). But
> we do that only for the highest zone that we encounter is unbalanced
> and congested.
> 
> This patch changes that to wait on all congested zones in a single
> pass in the hope that it will save us some scanning that way. Also we
> take a nap as soon as congested zone is encountered and sc.priority <
> DEF_PRIORITY - 2 (aka kswapd in trouble).
> 
> ...
>
> The patch is against the mm tree. Make sure that
> mm-avoid-calling-pgdat_balanced-needlessly.patch is applied first (not
> yet in the mmotm tree). Tested on half a dozen systems with different
> workloads for the last few days, working really well!

But what are the user-observable effects of this change?  Less kernel
CPU consumption, presumably?  Did you quantify it?

* Re: [PATCH] mm: wait for congestion to clear on all zones
  2013-01-09 21:48 ` Andrew Morton
@ 2013-01-09 22:15   ` Zlatko Calusic
  2013-01-09 22:52   ` Zlatko Calusic
  1 sibling, 0 replies; 8+ messages in thread
From: Zlatko Calusic @ 2013-01-09 22:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Hugh Dickins, Minchan Kim, linux-mm,
	Linux Kernel Mailing List

On 09.01.2013 22:48, Andrew Morton wrote:
> On Wed, 09 Jan 2013 22:41:48 +0100
> Zlatko Calusic <zlatko.calusic@iskon.hr> wrote:
>
>> Currently we take a short nap (HZ/10) and wait for congestion to clear
>> before taking another pass with lower priority in balance_pgdat(). But
>> we do that only for the highest zone that we encounter is unbalanced
>> and congested.
>>
>> This patch changes that to wait on all congested zones in a single
>> pass in the hope that it will save us some scanning that way. Also we
>> take a nap as soon as congested zone is encountered and sc.priority <
>> DEF_PRIORITY - 2 (aka kswapd in trouble).
>>
>> ...
>>
>> The patch is against the mm tree. Make sure that
>> mm-avoid-calling-pgdat_balanced-needlessly.patch is applied first (not
>> yet in the mmotm tree). Tested on half a dozen systems with different
>> workloads for the last few days, working really well!
>
> But what are the user-observable effects of this change?  Less kernel
> CPU consumption, presumably?  Did you quantify it?
>

I have an observation that without it, under some circumstances that are 
VERY HARD to repeat (many days need to pass and some stars to align to 
see the effect), the page cache gets hit hard, 2/3 of it evicted in a 
split second. And it's not even under high load! So, I'm still 
monitoring it, but so far the memory utilization really seems better 
with the patch applied (no more mysterious page cache shootdowns).

Other than that, it just seems more correct to wait on all congested 
zones, not just the highest one. When I sent my first patch that 
replaced congestion_wait() I didn't have much time to do elaborate 
analysis (3.7.0 was released in a matter of hours). So, I just plugged 
the hole and continued working on the proper solution.

I do think that this is my last patch in this particular area
(balance_pgdat() & friends). But I'll continue investigating the
root cause of this interesting imbalance that happens only on this
particular system, because I think balance_pgdat() behaviour was just
revealing it, and the real problem is somewhere else.
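
The control-flow change in the diff can be modelled in user space. The
sketch below is not kernel code: struct zone_model, struct nap_stats,
nap_before_patch() and nap_after_patch() are hypothetical names, and
total_scanned is treated as one fixed value although balance_pgdat()
accumulates it while walking the zones. It only counts how often each
version would call wait_iff_congested() (a call that sleeps only when
there really is congestion) and how often it would bump
KSWAPD_SKIP_CONGESTION_WAIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY 12	/* same value the kernel uses */

/* Hypothetical stand-in for the per-zone checks in balance_pgdat(). */
struct zone_model {
	bool balanced;	/* models zone_balanced() */
	bool under_min;	/* models !zone_watermark_ok_safe(..., min_wmark_pages(zone), ...) */
};

/* What one pass over the zones would have done. */
struct nap_stats {
	unsigned int waits;	/* calls to wait_iff_congested() */
	unsigned int skips;	/* KSWAPD_SKIP_CONGESTION_WAIT events */
};

static bool kswapd_in_trouble(unsigned long total_scanned, int priority)
{
	return total_scanned != 0 && priority < DEF_PRIORITY - 2;
}

/* Pre-patch model: remember the last unbalanced zone, decide once after the loop. */
struct nap_stats nap_before_patch(const struct zone_model *zones, size_t n,
				  unsigned long total_scanned, int priority)
{
	struct nap_stats st = { 0, 0 };
	const struct zone_model *unbalanced_zone = NULL;
	bool has_under_min_watermark_zone = false;

	assert(zones != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!zones[i].balanced) {
			unbalanced_zone = &zones[i];
			if (zones[i].under_min)
				has_under_min_watermark_zone = true;
		}
	}

	if (kswapd_in_trouble(total_scanned, priority)) {
		if (has_under_min_watermark_zone)
			st.skips++;
		else if (unbalanced_zone)
			st.waits++;
	}
	return st;
}

/* Post-patch model: decide for every unbalanced zone inside the loop. */
struct nap_stats nap_after_patch(const struct zone_model *zones, size_t n,
				 unsigned long total_scanned, int priority)
{
	struct nap_stats st = { 0, 0 };

	assert(zones != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (zones[i].balanced)
			continue;
		if (!kswapd_in_trouble(total_scanned, priority))
			continue;
		if (zones[i].under_min)
			st.skips++;	/* hurry up, no nap for this zone */
		else
			st.waits++;	/* nap if this zone is congested */
	}
	return st;
}
```

With three unbalanced zones, one of them below its min watermark, the
pre-patch model records one skip and no wait at all, while the
post-patch model records one skip and two wait_iff_congested() calls.
In this model a single under-min zone no longer suppresses the nap for
the other zones.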
-- 
Zlatko

* Re: [PATCH] mm: wait for congestion to clear on all zones
  2013-01-09 21:48 ` Andrew Morton
  2013-01-09 22:15   ` Zlatko Calusic
@ 2013-01-09 22:52   ` Zlatko Calusic
  1 sibling, 0 replies; 8+ messages in thread
From: Zlatko Calusic @ 2013-01-09 22:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Hugh Dickins, Minchan Kim, linux-mm,
	Linux Kernel Mailing List

On 09.01.2013 22:48, Andrew Morton wrote:
> On Wed, 09 Jan 2013 22:41:48 +0100
> Zlatko Calusic <zlatko.calusic@iskon.hr> wrote:
>
>> Currently we take a short nap (HZ/10) and wait for congestion to clear
>> before taking another pass with lower priority in balance_pgdat(). But
>> we do that only for the highest zone that we encounter is unbalanced
>> and congested.
>>
>> This patch changes that to wait on all congested zones in a single
>> pass in the hope that it will save us some scanning that way. Also we
>> take a nap as soon as congested zone is encountered and sc.priority <
>> DEF_PRIORITY - 2 (aka kswapd in trouble).
>>
>> ...
>>
>> The patch is against the mm tree. Make sure that
>> mm-avoid-calling-pgdat_balanced-needlessly.patch is applied first (not
>> yet in the mmotm tree). Tested on half a dozen systems with different
>> workloads for the last few days, working really well!
>
> But what are the user-observable effects of this change?  Less kernel
> CPU consumption, presumably?  Did you quantify it?
>

And I forgot to answer all the questions... :(

Actually, I did record kswapd CPU usage after 5 days of uptime and I
intend to compare it with the new data (after a few more days pass). I
expect maybe slightly better results.

But I think it's obvious from my first reply that my primary goal with
this patch is correctness, not optimization. So I won't be disappointed
in the least if kswapd CPU usage stays the same, so long as the memory
utilization remains this smooth. ;)

-- 
Zlatko

* Re: [PATCH] mm: wait for congestion to clear on all zones
  2013-01-09 21:41 [PATCH] mm: wait for congestion to clear on all zones Zlatko Calusic
  2013-01-09 21:48 ` Andrew Morton
@ 2013-01-11  1:25 ` Simon Jeons
  2013-01-11 11:25   ` Zlatko Calusic
  1 sibling, 1 reply; 8+ messages in thread
From: Simon Jeons @ 2013-01-11  1:25 UTC (permalink / raw)
  To: Zlatko Calusic
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Minchan Kim, linux-mm,
	Linux Kernel Mailing List

On Wed, 2013-01-09 at 22:41 +0100, Zlatko Calusic wrote:
> From: Zlatko Calusic <zlatko.calusic@iskon.hr>
> 
> Currently we take a short nap (HZ/10) and wait for congestion to clear
> before taking another pass with lower priority in balance_pgdat(). But
> we do that only for the highest zone that we encounter is unbalanced
> and congested.
> 
> This patch changes that to wait on all congested zones in a single
> pass in the hope that it will save us some scanning that way. Also we
> take a nap as soon as congested zone is encountered and sc.priority <
> DEF_PRIORITY - 2 (aka kswapd in trouble).

But you still haven't explained what problem you met and which
scenario would benefit from your change.



* Re: [PATCH] mm: wait for congestion to clear on all zones
  2013-01-11  1:25 ` Simon Jeons
@ 2013-01-11 11:25   ` Zlatko Calusic
  2013-01-13  0:46     ` Simon Jeons
  0 siblings, 1 reply; 8+ messages in thread
From: Zlatko Calusic @ 2013-01-11 11:25 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Minchan Kim, linux-mm,
	Linux Kernel Mailing List

On 11.01.2013 02:25, Simon Jeons wrote:
> On Wed, 2013-01-09 at 22:41 +0100, Zlatko Calusic wrote:
>> From: Zlatko Calusic <zlatko.calusic@iskon.hr>
>>
>> Currently we take a short nap (HZ/10) and wait for congestion to clear
>> before taking another pass with lower priority in balance_pgdat(). But
>> we do that only for the highest zone that we encounter is unbalanced
>> and congested.
>>
>> This patch changes that to wait on all congested zones in a single
>> pass in the hope that it will save us some scanning that way. Also we
>> take a nap as soon as congested zone is encountered and sc.priority <
>> DEF_PRIORITY - 2 (aka kswapd in trouble).
> 
> But you still didn't explain what's the problem you met and what
> scenario can get benefit from your change.
> 

I did in my reply to Andrew. Here's the relevant part:

> I have an observation that without it, under some circumstances that 
> are VERY HARD to repeat (many days need to pass and some stars to align
> to see the effect), the page cache gets hit hard, 2/3 of it evicted in
> a split second. And it's not even under high load! So, I'm still
> monitoring it, but so far the memory utilization really seems better
> with the patch applied (no more mysterious page cache shootdowns). 

The scenario that should benefit is an everyday one. I observed the problem
during light but constant reading from disk (< 10MB/s) while sending that data
over the network at the same time. Think of a backup that compresses data on
the fly before pushing it over the network (so it's not very fast).

The trouble is that you can't just put together a quick benchmark and measure
the impact, because many days need to pass for the bug to show up in all its
beauty.

Is there anybody out there who'd like to comment on the patch logic? I.e. do
you think that waiting on every congested zone is a more correct solution
than waiting on only one (only the highest one, ignoring the fact that
there may be other, even more congested zones)?

Regards,
-- 
Zlatko

* Re: [PATCH] mm: wait for congestion to clear on all zones
  2013-01-11 11:25   ` Zlatko Calusic
@ 2013-01-13  0:46     ` Simon Jeons
  2013-01-14 14:37       ` Zlatko Calusic
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Jeons @ 2013-01-13  0:46 UTC (permalink / raw)
  To: Zlatko Calusic
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Minchan Kim, linux-mm,
	Linux Kernel Mailing List

On Fri, 2013-01-11 at 12:25 +0100, Zlatko Calusic wrote:
> On 11.01.2013 02:25, Simon Jeons wrote:
> > On Wed, 2013-01-09 at 22:41 +0100, Zlatko Calusic wrote:
> >> From: Zlatko Calusic <zlatko.calusic@iskon.hr>
> >>
> >> Currently we take a short nap (HZ/10) and wait for congestion to clear
> >> before taking another pass with lower priority in balance_pgdat(). But
> >> we do that only for the highest zone that we encounter is unbalanced
> >> and congested.
> >>
> >> This patch changes that to wait on all congested zones in a single
> >> pass in the hope that it will save us some scanning that way. Also we
> >> take a nap as soon as congested zone is encountered and sc.priority <
> >> DEF_PRIORITY - 2 (aka kswapd in trouble).
> > 
> > But you still didn't explain what's the problem you met and what
> > scenario can get benefit from your change.
> > 
> 
> I did in my reply to Andrew. Here's the relevant part:
> 
> > I have an observation that without it, under some circumstances that 
> > are VERY HARD to repeat (many days need to pass and some stars to align
> > to see the effect), the page cache gets hit hard, 2/3 of it evicted in
> > a split second. And it's not even under high load! So, I'm still
> > monitoring it, but so far the memory utilization really seems better
> > with the patch applied (no more mysterious page cache shootdowns). 
> 
> The scenario that should get benefit is everyday. I observed problems during
> light but constant reading from disk (< 10MB/s). And sending that data
> over the network at the same time. Think backup that compresses data on the
> fly before pushing it over the network (so it's not very fast).
> 
> The trouble is that you can't just fix up a quick benchmark and measure the
> impact, because many days need to pass for the bug to show up in all its beauty.
> 
> Is there anybody out there who'd like to comment on the patch logic? I.e. do
> you think that waiting on every congested zone is the more correct solution
> than waiting on only one (only the highest one, and ignoring the fact that
> there may be other even more congested zones)?

What's the benefit of waiting on every congested zone, rather than on
only one, in your scenario?

> 
> Regards,

-- 
Simon Jeons <simon.jeons@gmail.com>

* Re: [PATCH] mm: wait for congestion to clear on all zones
  2013-01-13  0:46     ` Simon Jeons
@ 2013-01-14 14:37       ` Zlatko Calusic
  0 siblings, 0 replies; 8+ messages in thread
From: Zlatko Calusic @ 2013-01-14 14:37 UTC (permalink / raw)
  To: Simon Jeons, Andrew Morton
  Cc: Mel Gorman, Hugh Dickins, Minchan Kim, linux-mm,
	Linux Kernel Mailing List

On 13.01.2013 01:46, Simon Jeons wrote:
> On Fri, 2013-01-11 at 12:25 +0100, Zlatko Calusic wrote:
>> On 11.01.2013 02:25, Simon Jeons wrote:
>>> On Wed, 2013-01-09 at 22:41 +0100, Zlatko Calusic wrote:
>>>> From: Zlatko Calusic <zlatko.calusic@iskon.hr>
>>>>
>>>> Currently we take a short nap (HZ/10) and wait for congestion to clear
>>>> before taking another pass with lower priority in balance_pgdat(). But
>>>> we do that only for the highest zone that we encounter is unbalanced
>>>> and congested.
>>>>
>>>> This patch changes that to wait on all congested zones in a single
>>>> pass in the hope that it will save us some scanning that way. Also we
>>>> take a nap as soon as congested zone is encountered and sc.priority <
>>>> DEF_PRIORITY - 2 (aka kswapd in trouble).
>>>
>>> But you still didn't explain what's the problem you met and what
>>> scenario can get benefit from your change.
>>>
>>
>> I did in my reply to Andrew. Here's the relevant part:
>>
>>> I have an observation that without it, under some circumstances that
>>> are VERY HARD to repeat (many days need to pass and some stars to align
>>> to see the effect), the page cache gets hit hard, 2/3 of it evicted in
>>> a split second. And it's not even under high load! So, I'm still
>>> monitoring it, but so far the memory utilization really seems better
>>> with the patch applied (no more mysterious page cache shootdowns).
>>
>> The scenario that should get benefit is everyday. I observed problems during
>> light but constant reading from disk (< 10MB/s). And sending that data
>> over the network at the same time. Think backup that compresses data on the
>> fly before pushing it over the network (so it's not very fast).
>>
>> The trouble is that you can't just fix up a quick benchmark and measure the
>> impact, because many days need to pass for the bug to show up in all its beauty.
>>
>> Is there anybody out there who'd like to comment on the patch logic? I.e. do
>> you think that waiting on every congested zone is the more correct solution
>> than waiting on only one (only the highest one, and ignoring the fact that
>> there may be other even more congested zones)?
> 
> What's the benefit of waiting on every congested zone than waiting on
> only one against your scenario?
> 

The good:

Actually, we are _already_ waiting on every congested zone. And have
been for more than a year. So, all this discussion is... moot.

Andrew, ignore this patch, I'll send you a much better one in a minute.
There shouldn't be nearly so many questions about that one. ;)

The bad:

Obviously then, this patch didn't fix my issue. It just took a little
bit longer for it to appear again.

The ugly:

Here's what I observe on one of my machines:

Node 0, zone      DMA
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
Node 0, zone    DMA32
    nr_vmscan_write 23164
    nr_vmscan_immediate_reclaim 582038
Node 0, zone   Normal
    nr_vmscan_write 16584344  <-- ugh!
    nr_vmscan_immediate_reclaim 1118415

But that's just a sneak peek, I'll open a proper thread to discuss this
when I collect a little bit more data. BTW, that Normal zone with its
extraordinary amount of writeback under memory pressure is 4 times
smaller than the DMA32 zone, that's why I consider it ugly. :P
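
Per-zone counters like the ones quoted above can be pulled out
programmatically. A minimal sketch in C, assuming zoneinfo-style text
laid out exactly as in the excerpt ("Node N, zone NAME" headers followed
by indented "counter value" lines). zone_counter() is a hypothetical
helper, not an existing API, and other kernel versions may expose these
counters elsewhere or not per zone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find `counter` for zone `zone` on node `node` in zoneinfo-style text.
 * Returns 1 and stores the value in *out if found, 0 otherwise.
 * Anything after the number on a counter line is ignored.
 */
int zone_counter(const char *text, int node, const char *zone,
		 const char *counter, unsigned long long *out)
{
	int cur_node = -1;		/* node of the header seen last */
	char cur_zone[32] = "";		/* zone of the header seen last */
	const char *line = text;

	assert(text && zone && counter && out);

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t full = eol ? (size_t)(eol - line) : strlen(line);
		size_t len = full;
		char buf[256], name[64], hdr_zone[32];
		unsigned long long value;
		int hdr_node;

		/* Copy one line into buf, truncating overlong lines. */
		if (len >= sizeof(buf))
			len = sizeof(buf) - 1;
		memcpy(buf, line, len);
		buf[len] = '\0';

		if (sscanf(buf, "Node %d, zone %31s", &hdr_node, hdr_zone) == 2) {
			/* Zone header: remember which zone we are in. */
			cur_node = hdr_node;
			strcpy(cur_zone, hdr_zone);
		} else if (cur_node == node && strcmp(cur_zone, zone) == 0 &&
			   sscanf(buf, " %63s %llu", name, &value) == 2 &&
			   strcmp(name, counter) == 0) {
			*out = value;
			return 1;
		}

		/* Advance past this line and its newline, if any. */
		line += full;
		if (*line == '\n')
			line++;
	}
	return 0;
}
```

Called on the excerpt above with node 0, zone "Normal" and counter
"nr_vmscan_write", this would store 16584344; for zone "DMA32" it would
store 23164.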
-- 
Zlatko
