* [PATCH -mm] throttle direct reclaim when too many pages are isolated already
@ 2009-07-16  2:38 Rik van Riel
  2009-07-16  2:48 ` Andrew Morton
  2009-07-16  3:19 ` [PATCH -mm] throttle direct reclaim when too many pages are isolated already KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 19+ messages in thread
From: Rik van Riel @ 2009-07-16  2:38 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton, Wu Fengguang
When way too many processes go into direct reclaim, it is possible
for all of the pages to be taken off the LRU.  One result of this
is that the next process in the page reclaim code thinks there are
no reclaimable pages left and triggers an out of memory kill.
One solution to this problem is to never let so many processes into
the page reclaim path that the entire LRU is emptied.  Limiting the
system to only having half of each inactive list isolated for
reclaim should be safe.
Signed-off-by: Rik van Riel <riel@redhat.com>
---
This patch goes on top of Kosaki's "Account the number of isolated pages"
patch series.
 mm/vmscan.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
Index: mmotm/mm/vmscan.c
===================================================================
--- mmotm.orig/mm/vmscan.c	2009-07-08 21:37:01.000000000 -0400
+++ mmotm/mm/vmscan.c	2009-07-08 21:39:02.000000000 -0400
@@ -1035,6 +1035,27 @@ int isolate_lru_page(struct page *page)
 }
 
 /*
+ * Are there way too many processes in the direct reclaim path already?
+ */
+static int too_many_isolated(struct zone *zone, int file)
+{
+	unsigned long inactive, isolated;
+
+	if (current_is_kswapd())
+		return 0;
+
+	if (file) {
+		inactive = zone_page_state(zone, NR_INACTIVE_FILE);
+		isolated = zone_page_state(zone, NR_ISOLATED_FILE);
+	} else {
+		inactive = zone_page_state(zone, NR_INACTIVE_ANON);
+		isolated = zone_page_state(zone, NR_ISOLATED_ANON);
+	}
+
+	return isolated > inactive;
+}
+
+/*
  * shrink_inactive_list() is a helper for shrink_zone().  It returns the number
  * of reclaimed pages
  */
@@ -1049,6 +1070,10 @@ static unsigned long shrink_inactive_lis
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 	int lumpy_reclaim = 0;
 
+	while (unlikely(too_many_isolated(zone, file))) {
+		schedule_timeout_interruptible(HZ/10);
+	}
+
 	/*
 	 * If we need a large contiguous chunk of memory, or have
 	 * trouble getting a small set of contiguous pages, we
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already
  2009-07-16  2:38 [PATCH -mm] throttle direct reclaim when too many pages are isolated already Rik van Riel
@ 2009-07-16  2:48 ` Andrew Morton
  2009-07-16  3:10   ` Rik van Riel
  2009-07-16  3:36   ` [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v2) Rik van Riel
  2009-07-16  3:19 ` [PATCH -mm] throttle direct reclaim when too many pages are isolated already KAMEZAWA Hiroyuki
  1 sibling, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2009-07-16  2:48 UTC (permalink / raw)
  To: Rik van Riel; +Cc: KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
On Wed, 15 Jul 2009 22:38:53 -0400 Rik van Riel <riel@redhat.com> wrote:
> When way too many processes go into direct reclaim, it is possible
> for all of the pages to be taken off the LRU.  One result of this
> is that the next process in the page reclaim code thinks there are
> no reclaimable pages left and triggers an out of memory kill.
> 
> One solution to this problem is to never let so many processes into
> the page reclaim path that the entire LRU is emptied.  Limiting the
> system to only having half of each inactive list isolated for
> reclaim should be safe.
> 
Since when?  Linux page reclaim has had a billion machine-years of testing,
and now stuff like this turns up.  Did we break it, or is this a
never-before-discovered workload?
> ---
> This patch goes on top of Kosaki's "Account the number of isolated pages"
> patch series.
> 
>  mm/vmscan.c |   25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> Index: mmotm/mm/vmscan.c
> ===================================================================
> --- mmotm.orig/mm/vmscan.c	2009-07-08 21:37:01.000000000 -0400
> +++ mmotm/mm/vmscan.c	2009-07-08 21:39:02.000000000 -0400
> @@ -1035,6 +1035,27 @@ int isolate_lru_page(struct page *page)
>  }
>  
>  /*
> + * Are there way too many processes in the direct reclaim path already?
> + */
> +static int too_many_isolated(struct zone *zone, int file)
> +{
> +	unsigned long inactive, isolated;
> +
> +	if (current_is_kswapd())
> +		return 0;
> +
> +	if (file) {
> +		inactive = zone_page_state(zone, NR_INACTIVE_FILE);
> +		isolated = zone_page_state(zone, NR_ISOLATED_FILE);
> +	} else {
> +		inactive = zone_page_state(zone, NR_INACTIVE_ANON);
> +		isolated = zone_page_state(zone, NR_ISOLATED_ANON);
> +	}
> +
> +	return isolated > inactive;
> +}
> +
> +/*
>   * shrink_inactive_list() is a helper for shrink_zone().  It returns the number
>   * of reclaimed pages
>   */
> @@ -1049,6 +1070,10 @@ static unsigned long shrink_inactive_lis
>  	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
>  	int lumpy_reclaim = 0;
>  
> +	while (unlikely(too_many_isolated(zone, file))) {
> +		schedule_timeout_interruptible(HZ/10);
> +	}
This (incorrectly-laid-out) code is a no-op if signal_pending().
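The no-op behaviour can be modelled in userspace. This is an illustrative sketch only: `struct task_model`, `sleep_interruptible_model()` and `throttle_loop_model()` are made-up names, and the model simply assumes that an interruptible timed sleep returns at once when a signal is pending, which is what the remark above relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a task; not the kernel's struct task_struct. */
struct task_model {
	bool signal_pending;
	unsigned long slept_ticks;	/* time actually spent asleep */
};

/*
 * Model of an interruptible timed sleep: with a signal pending the
 * task is not put to sleep and the call returns immediately.
 */
static void sleep_interruptible_model(struct task_model *t, unsigned long ticks)
{
	assert(t);
	if (t->signal_pending)
		return;
	t->slept_ticks += ticks;
}

/*
 * Model of the throttle loop while the "too many isolated" condition
 * holds for congested_polls iterations.  Returns the number of polls.
 */
static unsigned long throttle_loop_model(struct task_model *t,
					 unsigned long congested_polls)
{
	unsigned long polls = 0;

	while (polls < congested_polls) {
		sleep_interruptible_model(t, 10);
		polls++;
	}
	return polls;
}
```

With `signal_pending` set, the loop still polls but never sleeps, so in the model it degenerates into a busy loop for as long as the condition holds.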
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already
  2009-07-16  2:48 ` Andrew Morton
@ 2009-07-16  3:10   ` Rik van Riel
  2009-07-16  3:21     ` Andrew Morton
  2009-07-16  3:36   ` [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v2) Rik van Riel
  1 sibling, 1 reply; 19+ messages in thread
From: Rik van Riel @ 2009-07-16  3:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
Andrew Morton wrote:
> On Wed, 15 Jul 2009 22:38:53 -0400 Rik van Riel <riel@redhat.com> wrote:
> 
>> When way too many processes go into direct reclaim, it is possible
>> for all of the pages to be taken off the LRU.  One result of this
>> is that the next process in the page reclaim code thinks there are
>> no reclaimable pages left and triggers an out of memory kill.
>>
>> One solution to this problem is to never let so many processes into
>> the page reclaim path that the entire LRU is emptied.  Limiting the
>> system to only having half of each inactive list isolated for
>> reclaim should be safe.
>>
> 
> Since when?  Linux page reclaim has had a billion machine-years of testing,
> and now stuff like this turns up.  Did we break it, or is this a
> never-before-discovered workload?
It's been there for years, in various forms.  It hardly ever
shows up, but Kosaki's patch series gives us a nice chance to
fix it for good.
>> @@ -1049,6 +1070,10 @@ static unsigned long shrink_inactive_lis
>>  	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
>>  	int lumpy_reclaim = 0;
>>  
>> +	while (unlikely(too_many_isolated(zone, file))) {
>> +		schedule_timeout_interruptible(HZ/10);
>> +	}
> 
> This (incorrectly-laid-out) code is a no-op if signal_pending().
Good point, I should add some code to break out of page reclaim
if a fatal signal is pending, and use a normal schedule_timeout
otherwise.
Btw, how is this laid out wrong?  How do I do this better?
-- 
All rights reversed.
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already
  2009-07-16  2:38 [PATCH -mm] throttle direct reclaim when too many pages are isolated already Rik van Riel
  2009-07-16  2:48 ` Andrew Morton
@ 2009-07-16  3:19 ` KAMEZAWA Hiroyuki
  2009-07-16  3:32   ` Rik van Riel
  1 sibling, 1 reply; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-16  3:19 UTC (permalink / raw)
  To: Rik van Riel; +Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Wu Fengguang
On Wed, 15 Jul 2009 22:38:53 -0400
Rik van Riel <riel@redhat.com> wrote:
> When way too many processes go into direct reclaim, it is possible
> for all of the pages to be taken off the LRU.  One result of this
> is that the next process in the page reclaim code thinks there are
> no reclaimable pages left and triggers an out of memory kill.
> 
> One solution to this problem is to never let so many processes into
> the page reclaim path that the entire LRU is emptied.  Limiting the
> system to only having half of each inactive list isolated for
> reclaim should be safe.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
> This patch goes on top of Kosaki's "Account the number of isolated pages"
> patch series.
> 
>  mm/vmscan.c |   25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> Index: mmotm/mm/vmscan.c
> ===================================================================
> --- mmotm.orig/mm/vmscan.c	2009-07-08 21:37:01.000000000 -0400
> +++ mmotm/mm/vmscan.c	2009-07-08 21:39:02.000000000 -0400
> @@ -1035,6 +1035,27 @@ int isolate_lru_page(struct page *page)
>  }
>  
>  /*
> + * Are there way too many processes in the direct reclaim path already?
> + */
> +static int too_many_isolated(struct zone *zone, int file)
> +{
> +	unsigned long inactive, isolated;
> +
> +	if (current_is_kswapd())
> +		return 0;
> +
> +	if (file) {
> +		inactive = zone_page_state(zone, NR_INACTIVE_FILE);
> +		isolated = zone_page_state(zone, NR_ISOLATED_FILE);
> +	} else {
> +		inactive = zone_page_state(zone, NR_INACTIVE_ANON);
> +		isolated = zone_page_state(zone, NR_ISOLATED_ANON);
> +	}
> +
> +	return isolated > inactive;
> +}
Why does this mean "too much"?
And could you put this check under scanning_global_lru(sc)?
Thanks,
-Kame
> +
> +/*
>   * shrink_inactive_list() is a helper for shrink_zone().  It returns the number
>   * of reclaimed pages
>   */
> @@ -1049,6 +1070,10 @@ static unsigned long shrink_inactive_lis
>  	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
>  	int lumpy_reclaim = 0;
>  
> +	while (unlikely(too_many_isolated(zone, file))) {
> +		schedule_timeout_interruptible(HZ/10);
> +	}
> +
>  	/*
>  	 * If we need a large contiguous chunk of memory, or have
>  	 * trouble getting a small set of contiguous pages, we
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already
  2009-07-16  3:10   ` Rik van Riel
@ 2009-07-16  3:21     ` Andrew Morton
  2009-07-16  3:28       ` Rik van Riel
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2009-07-16  3:21 UTC (permalink / raw)
  To: Rik van Riel; +Cc: KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
On Wed, 15 Jul 2009 23:10:43 -0400 Rik van Riel <riel@redhat.com> wrote:
> Andrew Morton wrote:
> > On Wed, 15 Jul 2009 22:38:53 -0400 Rik van Riel <riel@redhat.com> wrote:
> > 
> >> When way too many processes go into direct reclaim, it is possible
> >> for all of the pages to be taken off the LRU.  One result of this
> >> is that the next process in the page reclaim code thinks there are
> >> no reclaimable pages left and triggers an out of memory kill.
> >>
> >> One solution to this problem is to never let so many processes into
> >> the page reclaim path that the entire LRU is emptied.  Limiting the
> >> system to only having half of each inactive list isolated for
> >> reclaim should be safe.
> >>
> > 
> > Since when?  Linux page reclaim has had a billion machine-years of testing,
> > and now stuff like this turns up.  Did we break it, or is this a
> > never-before-discovered workload?
> 
> It's been there for years, in various forms.  It hardly ever
> shows up, but Kosaki's patch series gives us a nice chance to
> fix it for good.
OK.
> >> @@ -1049,6 +1070,10 @@ static unsigned long shrink_inactive_lis
> >>  	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
> >>  	int lumpy_reclaim = 0;
> >>  
> >> +	while (unlikely(too_many_isolated(zone, file))) {
> >> +		schedule_timeout_interruptible(HZ/10);
> >> +	}
> > 
> > This (incorrectly-laid-out) code is a no-op if signal_pending().
> 
> Good point, I should add some code to break out of page reclaim
> if a fatal signal is pending,
We can't just return NULL from __alloc_pages(), and if we can't
get a page from the freelists then we're just going to have to keep
reclaiming.  So I'm not sure how we can do this.
> and use a normal schedule_timeout
> otherwise.
congestion_wait() would be typical.
> Btw, how is this laid out wrong?  How do I do this better?
ask checkpatch ;)
WARNING: braces {} are not necessary for single statement blocks
#99: FILE: mm/vmscan.c:1073:
+	while (unlikely(too_many_isolated(zone, file))) {
+		schedule_timeout_interruptible(HZ/10);
+	}
total: 0 errors, 1 warnings, 37 lines checked
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already
  2009-07-16  3:21     ` Andrew Morton
@ 2009-07-16  3:28       ` Rik van Riel
  2009-07-16  3:38         ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Rik van Riel @ 2009-07-16  3:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
Andrew Morton wrote:
> On Wed, 15 Jul 2009 23:10:43 -0400 Rik van Riel <riel@redhat.com> wrote:
> 
>> Andrew Morton wrote:
>>> On Wed, 15 Jul 2009 22:38:53 -0400 Rik van Riel <riel@redhat.com> wrote:
>>>
>>>> When way too many processes go into direct reclaim, it is possible
>>>> for all of the pages to be taken off the LRU.  One result of this
>>>> is that the next process in the page reclaim code thinks there are
>>>> no reclaimable pages left and triggers an out of memory kill.
>>>>
>>>> One solution to this problem is to never let so many processes into
>>>> the page reclaim path that the entire LRU is emptied.  Limiting the
>>>> system to only having half of each inactive list isolated for
>>>> reclaim should be safe.
>>>>
>>> Since when?  Linux page reclaim has had a billion machine-years of testing,
>>> and now stuff like this turns up.  Did we break it, or is this a
>>> never-before-discovered workload?
>> It's been there for years, in various forms.  It hardly ever
>> shows up, but Kosaki's patch series gives us a nice chance to
>> fix it for good.
> 
> OK.
> 
>>>> @@ -1049,6 +1070,10 @@ static unsigned long shrink_inactive_lis
>>>>  	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
>>>>  	int lumpy_reclaim = 0;
>>>>  
>>>> +	while (unlikely(too_many_isolated(zone, file))) {
>>>> +		schedule_timeout_interruptible(HZ/10);
>>>> +	}
>>> This (incorrectly-laid-out) code is a no-op if signal_pending().
>> Good point, I should add some code to break out of page reclaim
>> if a fatal signal is pending,
> 
> We can't just return NULL from __alloc_pages(), and if we can't
> get a page from the freelists then we're just going to have to keep
> reclaiming.  So I'm not sure how we can do this.
If we are stuck at this point in the page reclaim code,
it is because too many other tasks are reclaiming pages.
That makes it fairly safe to just return SWAP_CLUSTER_MAX
here and hope that __alloc_pages() can get a page.
After all, if __alloc_pages() thinks it made progress,
but still cannot make the allocation, it will call the
pageout code again.
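The retry argument can be sketched as a small userspace model. Everything here is hypothetical: `struct alloc_model`, `reclaim_model()` and `alloc_page_model()` are invented for illustration, the constant only stands in for SWAP_CLUSTER_MAX, and the real allocator slow path is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define SWAP_CLUSTER_MAX_MODEL 32UL	/* illustrative stand-in value */

/* Hypothetical allocator state; not a kernel structure. */
struct alloc_model {
	unsigned long free_pages;		/* pages on the free lists */
	unsigned long freed_by_others_per_pass;	/* progress made by other reclaimers */
	unsigned long reclaim_calls;		/* how often reclaim was entered */
};

/* Reclaim that bails out early and merely reports progress. */
static unsigned long reclaim_model(struct alloc_model *m)
{
	assert(m);
	m->reclaim_calls++;
	/* Other tasks complete their reclaim while we are in here. */
	m->free_pages += m->freed_by_others_per_pass;
	return SWAP_CLUSTER_MAX_MODEL;
}

/* Returns true once a page could be taken from the free lists. */
static bool alloc_page_model(struct alloc_model *m, unsigned long max_retries)
{
	for (unsigned long i = 0; i <= max_retries; i++) {
		if (m->free_pages > 0) {
			m->free_pages--;
			return true;
		}
		if (reclaim_model(m) == 0)
			return false;	/* no progress reported */
	}
	return false;	/* retry budget of the model exhausted */
}
```

If other reclaimers free pages, the allocation succeeds on the next pass. If they do not, the model keeps re-entering reclaim until its retry budget runs out, which is the case the early return has to be careful about.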
-- 
All rights reversed.
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already
  2009-07-16  3:19 ` [PATCH -mm] throttle direct reclaim when too many pages are isolated already KAMEZAWA Hiroyuki
@ 2009-07-16  3:32   ` Rik van Riel
  2009-07-16  3:42     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 19+ messages in thread
From: Rik van Riel @ 2009-07-16  3:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Wu Fengguang
KAMEZAWA Hiroyuki wrote:
> On Wed, 15 Jul 2009 22:38:53 -0400
> Rik van Riel <riel@redhat.com> wrote:
> 
>> When way too many processes go into direct reclaim, it is possible
>> for all of the pages to be taken off the LRU.  One result of this
>> is that the next process in the page reclaim code thinks there are
>> no reclaimable pages left and triggers an out of memory kill.
>>
>> One solution to this problem is to never let so many processes into
>> the page reclaim path that the entire LRU is emptied.  Limiting the
>> system to only having half of each inactive list isolated for
>> reclaim should be safe.
>>
>> Signed-off-by: Rik van Riel <riel@redhat.com>
>> ---
>> This patch goes on top of Kosaki's "Account the number of isolated pages"
>> patch series.
>>
>>  mm/vmscan.c |   25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>>
>> Index: mmotm/mm/vmscan.c
>> ===================================================================
>> --- mmotm.orig/mm/vmscan.c	2009-07-08 21:37:01.000000000 -0400
>> +++ mmotm/mm/vmscan.c	2009-07-08 21:39:02.000000000 -0400
>> @@ -1035,6 +1035,27 @@ int isolate_lru_page(struct page *page)
>>  }
>>  
>>  /*
>> + * Are there way too many processes in the direct reclaim path already?
>> + */
>> +static int too_many_isolated(struct zone *zone, int file)
>> +{
>> +	unsigned long inactive, isolated;
>> +
>> +	if (current_is_kswapd())
>> +		return 0;
>> +
>> +	if (file) {
>> +		inactive = zone_page_state(zone, NR_INACTIVE_FILE);
>> +		isolated = zone_page_state(zone, NR_ISOLATED_FILE);
>> +	} else {
>> +		inactive = zone_page_state(zone, NR_INACTIVE_ANON);
>> +		isolated = zone_page_state(zone, NR_ISOLATED_ANON);
>> +	}
>> +
>> +	return isolated > inactive;
>> +}
> 
> Why does this mean "too much"?
This triggers when most of the pages in the zone (in the
category we are trying to reclaim) have already been
isolated by other tasks, to be reclaimed.  There is really
no need to reclaim all of the pages in a zone all at once,
plus it can cause false OOM kills.
Setting the threshold at isolated > inactive gives us
enough of a safety margin that we can do this comparison
lockless.
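To make the threshold concrete: isolated pages are no longer counted on the inactive list, so isolated > inactive holds once more than half of the pages that started out on that list have been isolated. A minimal userspace sketch of that arithmetic follows; `struct lru_counts`, `too_many_isolated_model()` and `isolate_pages()` are made-up names, and the kswapd exemption from the patch is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-zone counters of one LRU category. */
struct lru_counts {
	unsigned long inactive;	/* pages still on the inactive list */
	unsigned long isolated;	/* pages taken off the list for reclaim */
};

/* Same comparison as in the patch, without the kswapd check. */
static bool too_many_isolated_model(const struct lru_counts *c)
{
	assert(c);
	return c->isolated > c->inactive;
}

/* Move up to nr pages from the inactive list to the isolated pool. */
static void isolate_pages(struct lru_counts *c, unsigned long nr)
{
	assert(c);
	if (nr > c->inactive)
		nr = c->inactive;
	c->inactive -= nr;
	c->isolated += nr;
}
```

Starting from 100 inactive pages, isolating 50 leaves the counters equal and the check false; isolating one more page makes it true.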
> And could you put this check under scanning_global_lru(sc)?
When most of the pages in a zone have been isolated from
the LRU already by page reclaim, chances are that cgroup
reclaim will suffer from the same problem.
Am I overlooking something?
-- 
All rights reversed.
* [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v2)
  2009-07-16  2:48 ` Andrew Morton
  2009-07-16  3:10   ` Rik van Riel
@ 2009-07-16  3:36   ` Rik van Riel
  1 sibling, 0 replies; 19+ messages in thread
From: Rik van Riel @ 2009-07-16  3:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
When way too many processes go into direct reclaim, it is possible
for all of the pages to be taken off the LRU.  One result of this
is that the next process in the page reclaim code thinks there are
no reclaimable pages left and triggers an out of memory kill.
One solution to this problem is to never let so many processes into
the page reclaim path that the entire LRU is emptied.  Limiting the
system to only having half of each inactive list isolated for
reclaim should be safe.
Signed-off-by: Rik van Riel <riel@redhat.com>
---
v2: fix the bugs pointed out by Andrew Morton
This patch goes on top of Kosaki's "Account the number of isolated pages"
patch series.
 mm/vmscan.c |   29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
Index: mmotm/mm/vmscan.c
===================================================================
--- mmotm.orig/mm/vmscan.c	2009-07-15 22:32:35.000000000 -0400
+++ mmotm/mm/vmscan.c	2009-07-15 23:26:37.000000000 -0400
@@ -1035,6 +1035,27 @@ int isolate_lru_page(struct page *page)
 }
 
 /*
+ * Are there way too many processes in the direct reclaim path already?
+ */
+static int too_many_isolated(struct zone *zone, int file)
+{
+	unsigned long inactive, isolated;
+
+	if (current_is_kswapd())
+		return 0;
+
+	if (file) {
+		inactive = zone_page_state(zone, NR_INACTIVE_FILE);
+		isolated = zone_page_state(zone, NR_ISOLATED_FILE);
+	} else {
+		inactive = zone_page_state(zone, NR_INACTIVE_ANON);
+		isolated = zone_page_state(zone, NR_ISOLATED_ANON);
+	}
+
+	return isolated > inactive;
+}
+
+/*
  * shrink_inactive_list() is a helper for shrink_zone().  It returns the number
  * of reclaimed pages
  */
@@ -1049,6 +1070,14 @@ static unsigned long shrink_inactive_lis
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 	int lumpy_reclaim = 0;
 
+	while (unlikely(too_many_isolated(zone, file))) {
+		/* We are about to die and free our memory. Return now. */
+		if (fatal_signal_pending(current))
+			return SWAP_CLUSTER_MAX;
+
+		congestion_wait(WRITE, HZ/10);
+	}
+
 	/*
 	 * If we need a large contiguous chunk of memory, or have
 	 * trouble getting a small set of contiguous pages, we
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already
  2009-07-16  3:28       ` Rik van Riel
@ 2009-07-16  3:38         ` Andrew Morton
  2009-07-16  3:42           ` Rik van Riel
  2009-07-16  3:53           ` [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v3) Rik van Riel
  0 siblings, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2009-07-16  3:38 UTC (permalink / raw)
  To: Rik van Riel; +Cc: KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
On Wed, 15 Jul 2009 23:28:14 -0400 Rik van Riel <riel@redhat.com> wrote:
> Andrew Morton wrote:
> > On Wed, 15 Jul 2009 23:10:43 -0400 Rik van Riel <riel@redhat.com> wrote:
> > 
> >> Andrew Morton wrote:
> >>> On Wed, 15 Jul 2009 22:38:53 -0400 Rik van Riel <riel@redhat.com> wrote:
> >>>
> >>>> When way too many processes go into direct reclaim, it is possible
> >>>> for all of the pages to be taken off the LRU.  One result of this
> >>>> is that the next process in the page reclaim code thinks there are
> >>>> no reclaimable pages left and triggers an out of memory kill.
> >>>>
> >>>> One solution to this problem is to never let so many processes into
> >>>> the page reclaim path that the entire LRU is emptied.  Limiting the
> >>>> system to only having half of each inactive list isolated for
> >>>> reclaim should be safe.
> >>>>
> >>> Since when?  Linux page reclaim has had a billion machine-years of testing,
> >>> and now stuff like this turns up.  Did we break it, or is this a
> >>> never-before-discovered workload?
> >> It's been there for years, in various forms.  It hardly ever
> >> shows up, but Kosaki's patch series gives us a nice chance to
> >> fix it for good.
> > 
> > OK.
> > 
> >>>> @@ -1049,6 +1070,10 @@ static unsigned long shrink_inactive_lis
> >>>>  	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
> >>>>  	int lumpy_reclaim = 0;
> >>>>  
> >>>> +	while (unlikely(too_many_isolated(zone, file))) {
> >>>> +		schedule_timeout_interruptible(HZ/10);
> >>>> +	}
> >>> This (incorrectly-laid-out) code is a no-op if signal_pending().
> >> Good point, I should add some code to break out of page reclaim
> >> if a fatal signal is pending,
> > 
> > We can't just return NULL from __alloc_pages(), and if we can't
> > get a page from the freelists then we're just going to have to keep
> > reclaiming.  So I'm not sure how we can do this.
> 
> If we are stuck at this point in the page reclaim code,
> it is because too many other tasks are reclaiming pages.
> 
> That makes it fairly safe to just return SWAP_CLUSTER_MAX
> here and hope that __alloc_pages() can get a page.
> 
> After all, if __alloc_pages() thinks it made progress,
> but still cannot make the allocation, it will call the
> pageout code again.
Which will immediately return because the caller still has
fatal_signal_pending()?
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already
  2009-07-16  3:38         ` Andrew Morton
@ 2009-07-16  3:42           ` Rik van Riel
  2009-07-16  3:51             ` Andrew Morton
  2009-07-16  3:53           ` [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v3) Rik van Riel
  1 sibling, 1 reply; 19+ messages in thread
From: Rik van Riel @ 2009-07-16  3:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
Andrew Morton wrote:
> On Wed, 15 Jul 2009 23:28:14 -0400 Rik van Riel <riel@redhat.com> wrote:
>> If we are stuck at this point in the page reclaim code,
>> it is because too many other tasks are reclaiming pages.
>>
>> That makes it fairly safe to just return SWAP_CLUSTER_MAX
>> here and hope that __alloc_pages() can get a page.
>>
>> After all, if __alloc_pages() thinks it made progress,
>> but still cannot make the allocation, it will call the
>> pageout code again.
> 
> Which will immediately return because the caller still has
> fatal_signal_pending()?
Other processes are in the middle of freeing pages at
this point, so we should succeed in __alloc_pages()
fairly quickly (and then die and free all our memory).
-- 
All rights reversed.
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already
  2009-07-16  3:32   ` Rik van Riel
@ 2009-07-16  3:42     ` KAMEZAWA Hiroyuki
  2009-07-16  3:47       ` Rik van Riel
  0 siblings, 1 reply; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-16  3:42 UTC (permalink / raw)
  To: Rik van Riel; +Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Wu Fengguang
On Wed, 15 Jul 2009 23:32:13 -0400
Rik van Riel <riel@redhat.com> wrote:
> KAMEZAWA Hiroyuki wrote:
> > On Wed, 15 Jul 2009 22:38:53 -0400
> > Rik van Riel <riel@redhat.com> wrote:
> > 
> >> When way too many processes go into direct reclaim, it is possible
> >> for all of the pages to be taken off the LRU.  One result of this
> >> is that the next process in the page reclaim code thinks there are
> >> no reclaimable pages left and triggers an out of memory kill.
> >>
> >> One solution to this problem is to never let so many processes into
> >> the page reclaim path that the entire LRU is emptied.  Limiting the
> >> system to only having half of each inactive list isolated for
> >> reclaim should be safe.
> >>
> >> Signed-off-by: Rik van Riel <riel@redhat.com>
> >> ---
> >> This patch goes on top of Kosaki's "Account the number of isolated pages"
> >> patch series.
> >>
> >>  mm/vmscan.c |   25 +++++++++++++++++++++++++
> >>  1 file changed, 25 insertions(+)
> >>
> >> Index: mmotm/mm/vmscan.c
> >> ===================================================================
> >> --- mmotm.orig/mm/vmscan.c	2009-07-08 21:37:01.000000000 -0400
> >> +++ mmotm/mm/vmscan.c	2009-07-08 21:39:02.000000000 -0400
> >> @@ -1035,6 +1035,27 @@ int isolate_lru_page(struct page *page)
> >>  }
> >>  
> >>  /*
> >> + * Are there way too many processes in the direct reclaim path already?
> >> + */
> >> +static int too_many_isolated(struct zone *zone, int file)
> >> +{
> >> +	unsigned long inactive, isolated;
> >> +
> >> +	if (current_is_kswapd())
> >> +		return 0;
> >> +
> >> +	if (file) {
> >> +		inactive = zone_page_state(zone, NR_INACTIVE_FILE);
> >> +		isolated = zone_page_state(zone, NR_ISOLATED_FILE);
> >> +	} else {
> >> +		inactive = zone_page_state(zone, NR_INACTIVE_ANON);
> >> +		isolated = zone_page_state(zone, NR_ISOLATED_ANON);
> >> +	}
> >> +
> >> +	return isolated > inactive;
> >> +}
> > 
> > Why does this mean "too much"?
> 
> This triggers when most of the pages in the zone (in the
> category we are trying to reclaim) have already been
> isolated by other tasks, to be reclaimed.  There is really
> no need to reclaim all of the pages in a zone all at once,
> plus it can cause false OOM kills.
> 
> Setting the threshold at isolated > inactive gives us
> enough of a safety margin that we can do this comparison
> lockless.
> 
> > And could you put this check under scanning_global_lru(sc)?
> 
> When most of the pages in a zone have been isolated from
> the LRU already by page reclaim, chances are that cgroup
> reclaim will suffer from the same problem.
> 
> Am I overlooking something?
> 
Reclaim from a cgroup isn't triggered by a memory shortage but by the
cgroup hitting its limit.  So it isn't necessary to reclaim pages from
this particular zone; falling back to another zone is always OK.
I think this check will trigger unnecessary waits.
Thanks,
-Kame
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already
  2009-07-16  3:42     ` KAMEZAWA Hiroyuki
@ 2009-07-16  3:47       ` Rik van Riel
  0 siblings, 0 replies; 19+ messages in thread
From: Rik van Riel @ 2009-07-16  3:47 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Wu Fengguang
KAMEZAWA Hiroyuki wrote:
>> Am I overlooking something?
>>
> Reclaim from a cgroup isn't triggered by a memory shortage but by the
> cgroup hitting its limit.  So it isn't necessary to reclaim pages from
> this particular zone; falling back to another zone is always OK.
> I think this check will trigger unnecessary waits.
Fair enough.
I'll also change the patch so tasks with a fatal signal
pending will go through congestion_wait at least once,
to give other tasks a chance to free up memory.
That should address everybody's concerns.
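
As a rough illustration of the conditions discussed above, the throttle
check can be modelled in userspace C. This is only a sketch: struct
reclaim_ctx and too_many_isolated_model() are hypothetical names, and the
fields merely stand in for current_is_kswapd(), scanning_global_lru(sc)
and the per-zone counters. It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the state the check reads. */
struct reclaim_ctx {
	bool is_kswapd;          /* models current_is_kswapd() */
	bool global_lru;         /* models scanning_global_lru(sc) */
	unsigned long inactive;  /* models NR_INACTIVE_FILE or NR_INACTIVE_ANON */
	unsigned long isolated;  /* models NR_ISOLATED_FILE or NR_ISOLATED_ANON */
};

/* Returns true when a direct reclaimer would be asked to wait. */
static bool too_many_isolated_model(const struct reclaim_ctx *ctx)
{
	assert(ctx);

	/* kswapd is never throttled. */
	if (ctx->is_kswapd)
		return false;

	/* cgroup (limit-driven) reclaim can fall back to another zone. */
	if (!ctx->global_lru)
		return false;

	/* Throttle only when more pages are isolated than remain inactive. */
	return ctx->isolated > ctx->inactive;
}
```

In this model only a global direct reclaimer that sees more isolated than
inactive pages is made to wait; kswapd and cgroup reclaim always pass
straight through.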
-- 
All rights reversed.
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already
  2009-07-16  3:42           ` Rik van Riel
@ 2009-07-16  3:51             ` Andrew Morton
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2009-07-16  3:51 UTC (permalink / raw)
  To: Rik van Riel; +Cc: KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
On Wed, 15 Jul 2009 23:42:28 -0400 Rik van Riel <riel@redhat.com> wrote:
> Andrew Morton wrote:
> > On Wed, 15 Jul 2009 23:28:14 -0400 Rik van Riel <riel@redhat.com> wrote:
> 
> >> If we are stuck at this point in the page reclaim code,
> >> it is because too many other tasks are reclaiming pages.
> >>
> >> That makes it fairly safe to just return SWAP_CLUSTER_MAX
> >> here and hope that __alloc_pages() can get a page.
> >>
> >> After all, if __alloc_pages() thinks it made progress,
> >> but still cannot make the allocation, it will call the
> >> pageout code again.
> > 
> > Which will immediately return because the caller still has
> > fatal_signal_pending()?
> 
> Other processes are in the middle of freeing pages at
> this point, so we should succeed in __alloc_pages()
> fairly quickly (and then die and free all our memory).
What if it's a uniprocessor machine and all those processes are
scheduled out?  We sit there chewing 100% CPU and not doing anything
afaict.
Even if it _is_ SMP, we could still chew decent-sized blips of CPU time
rattling around waiting for something to happen.
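
A toy model may make the concern concrete. It assumes a uniprocessor where
the other reclaimers only make progress while the current task sleeps, and
it compares returning on a pending fatal signal before waiting with going
through one wait first. Every name below (up_model, model_wait,
throttle_pass, retries_until_clear) is hypothetical, and the retry bound
exists only so the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy uniprocessor state; not kernel data structures. */
struct up_model {
	unsigned long isolated;  /* pages held by scheduled-out reclaimers */
	unsigned long inactive;  /* pages still on the inactive list */
	unsigned long sleeps;    /* times the task yielded the CPU */
	unsigned long spins;     /* times it returned without yielding */
};

/* Stand-in for congestion_wait(): sleeping lets the others put pages back. */
static void model_wait(struct up_model *m)
{
	m->sleeps++;
	m->inactive += m->isolated;
	m->isolated = 0;
}

/* One pass through the throttle for a task with a fatal signal pending. */
static void throttle_pass(struct up_model *m, bool check_first)
{
	if (m->isolated <= m->inactive)
		return;         /* nothing to throttle */

	if (check_first) {
		m->spins++;     /* signal tested first: return without yielding */
		return;
	}

	model_wait(m);          /* wait once, then return */
}

/* Caller-side retry loop; returns how many passes were needed. */
static unsigned long retries_until_clear(struct up_model *m, bool check_first,
					 unsigned long max_retries)
{
	unsigned long n = 0;

	assert(m);
	while (m->isolated > m->inactive && n < max_retries) {
		throttle_pass(m, check_first);
		n++;
	}
	return n;
}
```

Under these assumptions the check-first variant exhausts its retry budget
without ever sleeping, while the wait-first variant clears after a single
pass. Real scheduling is of course far less tidy than this.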
* [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v3)
  2009-07-16  3:38         ` Andrew Morton
  2009-07-16  3:42           ` Rik van Riel
@ 2009-07-16  3:53           ` Rik van Riel
  2009-07-16  4:02             ` Andrew Morton
  2009-07-29 15:04             ` Pavel Machek
  1 sibling, 2 replies; 19+ messages in thread
From: Rik van Riel @ 2009-07-16  3:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
When way too many processes go into direct reclaim, it is possible
for all of the pages to be taken off the LRU.  One result of this
is that the next process in the page reclaim code thinks there are
no reclaimable pages left and triggers an out of memory kill.
One solution to this problem is to never let so many processes into
the page reclaim path that the entire LRU is emptied.  Limiting the
system to only having half of each inactive list isolated for
reclaim should be safe.
Signed-off-by: Rik van Riel <riel@redhat.com>
---
v3: - only wait if scanning global lru (Kamezawa Hiroyuki)
    - make sure tasks with a fatal signal pending go through
      congestion_wait at least once, to give other tasks a
      chance to free memory
v2: - fix the bugs pointed out by Andrew Morton
This patch goes on top of Kosaki's "Account the number of isolated pages"
patch series.
 mm/vmscan.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
Index: mmotm/mm/vmscan.c
===================================================================
--- mmotm.orig/mm/vmscan.c	2009-07-15 22:32:35.000000000 -0400
+++ mmotm/mm/vmscan.c	2009-07-15 23:50:01.000000000 -0400
@@ -1035,6 +1035,31 @@ int isolate_lru_page(struct page *page)
 }
 
 /*
+ * Are there way too many processes in the direct reclaim path already?
+ */
+static int too_many_isolated(struct zone *zone, int file,
+		struct scan_control *sc)
+{
+	unsigned long inactive, isolated;
+
+	if (current_is_kswapd())
+		return 0;
+
+	if (!scanning_global_lru(sc))
+		return 0;
+
+	if (file) {
+		inactive = zone_page_state(zone, NR_INACTIVE_FILE);
+		isolated = zone_page_state(zone, NR_ISOLATED_FILE);
+	} else {
+		inactive = zone_page_state(zone, NR_INACTIVE_ANON);
+		isolated = zone_page_state(zone, NR_ISOLATED_ANON);
+	}
+
+	return isolated > inactive;
+}
+
+/*
  * shrink_inactive_list() is a helper for shrink_zone().  It returns the number
  * of reclaimed pages
  */
@@ -1049,6 +1074,14 @@ static unsigned long shrink_inactive_lis
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 	int lumpy_reclaim = 0;
 
+	while (unlikely(too_many_isolated(zone, file, sc))) {
+		congestion_wait(WRITE, HZ/10);
+
+		/* We are about to die and free our memory. Return now. */
+		if (fatal_signal_pending(current))
+			return SWAP_CLUSTER_MAX;
+	}
+
 	/*
 	 * If we need a large contiguous chunk of memory, or have
 	 * trouble getting a small set of contiguous pages, we
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v3)
  2009-07-16  3:53           ` [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v3) Rik van Riel
@ 2009-07-16  4:02             ` Andrew Morton
  2009-07-16  4:09               ` Rik van Riel
  2009-07-29 15:04             ` Pavel Machek
  1 sibling, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2009-07-16  4:02 UTC (permalink / raw)
  To: Rik van Riel; +Cc: KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
On Wed, 15 Jul 2009 23:53:18 -0400 Rik van Riel <riel@redhat.com> wrote:
> @@ -1049,6 +1074,14 @@ static unsigned long shrink_inactive_lis
>  	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
>  	int lumpy_reclaim = 0;
>  
> +	while (unlikely(too_many_isolated(zone, file, sc))) {
> +		congestion_wait(WRITE, HZ/10);
> +
> +		/* We are about to die and free our memory. Return now. */
> +		if (fatal_signal_pending(current))
> +			return SWAP_CLUSTER_MAX;
> +	}
mutter.
While I agree that handling fatal signals on the direct reclaim path
is probably a good thing, this seems like a fairly random place at
which to start the enhancement.
If we were to step back and approach this in a broader fashion, perhaps
we would find some commonality with the existing TIF_MEMDIE handling,
dunno.
And I question the testedness of v3 :)
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v3)
  2009-07-16  4:02             ` Andrew Morton
@ 2009-07-16  4:09               ` Rik van Riel
  2009-07-16  4:26                 ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Rik van Riel @ 2009-07-16  4:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
Andrew Morton wrote:
> While I agree that handling fatal signals on the direct reclaim path
> is probably a good thing, this seems like a fairly random place at
> which to start the enhancement.
You are right, the direct reclaim path has one other
place where congestion_wait is called in a loop,
do_try_to_free_pages itself - we'll probably want to
break out of that loop too, if the task is about to
die and free all its memory.
> If we were to step back and approach this in a broader fashion, perhaps
> we would find some commonality with the existing TIF_MEMDIE handling,
> dunno.
Good point - what is it that makes TIF_MEMDIE special
wrt. other fatal signals, anyway?
I wonder if we should not simply "help along" any task
with fatal signals pending, anywhere in the VM (and maybe
other places in the kernel, too).
The faster we get rid of a killed process, the sooner its
resources become available to the other processes.
> And I question the testedness of v3 :)
No question about that :)
-- 
All rights reversed.
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v3)
  2009-07-16  4:09               ` Rik van Riel
@ 2009-07-16  4:26                 ` Andrew Morton
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2009-07-16  4:26 UTC (permalink / raw)
  To: Rik van Riel; +Cc: KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
On Thu, 16 Jul 2009 00:09:05 -0400 Rik van Riel <riel@redhat.com> wrote:
> > If we were to step back and approach this in a broader fashion, perhaps
> > we would find some commonality with the existing TIF_MEMDIE handling,
> > dunno.
> 
> Good point - what is it that makes TIF_MEMDIE special
> wrt. other fatal signals, anyway?
> 
> I wonder if we should not simply "help along" any task
> with fatal signals pending, anywhere in the VM (and maybe
> other places in the kernel, too).
> 
> The faster we get rid of a killed process, the sooner its
> resources become available to the other processes.
Spose so.
Are there any known (or makeable uppable) situations in which such a
change would be beneficial?  Maybe if the system is in a hopeless
swapstorm and someone is killing processes in an attempt to get control
back.
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v3)
  2009-07-16  3:53           ` [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v3) Rik van Riel
  2009-07-16  4:02             ` Andrew Morton
@ 2009-07-29 15:04             ` Pavel Machek
  2009-07-29 16:19               ` Rik van Riel
  1 sibling, 1 reply; 19+ messages in thread
From: Pavel Machek @ 2009-07-29 15:04 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Andrew Morton, KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
On Wed 2009-07-15 23:53:18, Rik van Riel wrote:
> When way too many processes go into direct reclaim, it is possible
> for all of the pages to be taken off the LRU.  One result of this
> is that the next process in the page reclaim code thinks there are
> no reclaimable pages left and triggers an out of memory kill.
> 
> One solution to this problem is to never let so many processes into
> the page reclaim path that the entire LRU is emptied.  Limiting the
> system to only having half of each inactive list isolated for
> reclaim should be safe.
Is this still racy? Like on 100cpu machine, with LRU size of 50...?
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v3)
  2009-07-29 15:04             ` Pavel Machek
@ 2009-07-29 16:19               ` Rik van Riel
  0 siblings, 0 replies; 19+ messages in thread
From: Rik van Riel @ 2009-07-29 16:19 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Andrew Morton, KOSAKI Motohiro, LKML, linux-mm, Wu Fengguang
Pavel Machek wrote:
> On Wed 2009-07-15 23:53:18, Rik van Riel wrote:
>> When way too many processes go into direct reclaim, it is possible
>> for all of the pages to be taken off the LRU.  One result of this
>> is that the next process in the page reclaim code thinks there are
>> no reclaimable pages left and triggers an out of memory kill.
>>
>> One solution to this problem is to never let so many processes into
>> the page reclaim path that the entire LRU is emptied.  Limiting the
>> system to only having half of each inactive list isolated for
>> reclaim should be safe.
> 
> Is this still racy? Like on 100cpu machine, with LRU size of 50...?
If a 100 CPU system gets down to just 100 reclaimable pages,
getting the OOM killer to trigger sounds desirable.
The goal of this patch is to avoid _false_ OOM kills, when
the system still has enough reclaimable memory available.
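
For reference, the "half of each inactive list" figure in the changelog
can be checked with a little arithmetic. The sketch below assumes that
isolating a page removes it from the inactive count, so that
inactive = total - isolated; "total" is a hypothetical name for the list
size before anyone isolated pages, and the helpers are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Would the isolated > inactive comparison trip after 'isolated' pages
 * have been taken off a list that started with 'total' pages?
 * Assumes inactive = total - isolated.
 */
static bool throttled_after_isolating(unsigned long total,
				      unsigned long isolated)
{
	unsigned long inactive;

	assert(isolated <= total);
	inactive = total - isolated;
	return isolated > inactive;
}

/*
 * Smallest isolated count that trips the comparison:
 * isolated > total - isolated  <=>  2 * isolated > total.
 */
static unsigned long throttle_threshold(unsigned long total)
{
	return total / 2 + 1;
}
```

For a 50-page list the comparison first trips at 26 isolated pages, which
is just over half. The counters are read without locking, so this only
describes what a single reader sees at one instant.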
-- 
All rights reversed.
Thread overview: 19+ messages
2009-07-16  2:38 [PATCH -mm] throttle direct reclaim when too many pages are isolated already Rik van Riel
2009-07-16  2:48 ` Andrew Morton
2009-07-16  3:10   ` Rik van Riel
2009-07-16  3:21     ` Andrew Morton
2009-07-16  3:28       ` Rik van Riel
2009-07-16  3:38         ` Andrew Morton
2009-07-16  3:42           ` Rik van Riel
2009-07-16  3:51             ` Andrew Morton
2009-07-16  3:53           ` [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v3) Rik van Riel
2009-07-16  4:02             ` Andrew Morton
2009-07-16  4:09               ` Rik van Riel
2009-07-16  4:26                 ` Andrew Morton
2009-07-29 15:04             ` Pavel Machek
2009-07-29 16:19               ` Rik van Riel
2009-07-16  3:36   ` [PATCH -mm] throttle direct reclaim when too many pages are isolated already (v2) Rik van Riel
2009-07-16  3:19 ` [PATCH -mm] throttle direct reclaim when too many pages are isolated already KAMEZAWA Hiroyuki
2009-07-16  3:32   ` Rik van Riel
2009-07-16  3:42     ` KAMEZAWA Hiroyuki
2009-07-16  3:47       ` Rik van Riel