linux-mm.kvack.org archive mirror
* readahead and oom
@ 2011-04-26  5:49 Dave Young
  2011-04-26  5:55 ` Wu Fengguang
  2011-04-26  9:37 ` [PATCH] mm: readahead page allocations are OK to fail Wu Fengguang
  0 siblings, 2 replies; 43+ messages in thread
From: Dave Young @ 2011-04-26  5:49 UTC (permalink / raw)
  To: linux-mm, Linux Kernel Mailing List, Wu Fengguang

Hi,

When memory pressure is high, readahead could cause OOM killing.
IMHO we should stop readahead under such circumstances. If that's true,
how do we fix it?

-- 
Regards
dave


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  5:49 readahead and oom Dave Young
@ 2011-04-26  5:55 ` Wu Fengguang
  2011-04-26  6:05   ` Dave Young
  2011-04-26  9:37 ` [PATCH] mm: readahead page allocations are OK to fail Wu Fengguang
  1 sibling, 1 reply; 43+ messages in thread
From: Wu Fengguang @ 2011-04-26  5:55 UTC (permalink / raw)
  To: Dave Young; +Cc: linux-mm, Linux Kernel Mailing List

On Tue, Apr 26, 2011 at 01:49:25PM +0800, Dave Young wrote:
> Hi,
> 
> When memory pressure is high, readahead could cause OOM killing.
> IMHO we should stop readahead under such circumstances. If that's true,
> how do we fix it?

Good question. Before OOM there will be readahead thrashings, which
can be addressed by this patch:

http://lkml.org/lkml/2010/2/2/229

However, there doesn't seem to be much interest in that feature. I can
separate it out and resubmit it standalone if necessary.

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  5:55 ` Wu Fengguang
@ 2011-04-26  6:05   ` Dave Young
  2011-04-26  6:07     ` Dave Young
  2011-04-26  6:13     ` readahead and oom Wu Fengguang
  0 siblings, 2 replies; 43+ messages in thread
From: Dave Young @ 2011-04-26  6:05 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: linux-mm, Linux Kernel Mailing List

On Tue, Apr 26, 2011 at 1:55 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Tue, Apr 26, 2011 at 01:49:25PM +0800, Dave Young wrote:
>> Hi,
>>
>> When memory pressure is high, readahead could cause OOM killing.
>> IMHO we should stop readahead under such circumstances. If that's true,
>> how do we fix it?
>
> Good question. Before OOM there will be readahead thrashings, which
> can be addressed by this patch:
>
> http://lkml.org/lkml/2010/2/2/229

Hi, I'm not clear about the patch; could it be regarded as covering the cases below?
1) readahead allocation fails due to low memory, e.g. caused by another large allocation
2) readahead thrashing caused by readahead itself

>
> However there seems no much interest on that feature.. I can separate
> that out and resubmit it standalone if necessary.
>
> Thanks,
> Fengguang
>



-- 
Regards
dave


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  6:05   ` Dave Young
@ 2011-04-26  6:07     ` Dave Young
  2011-04-26  6:25       ` Wu Fengguang
  2011-04-26  6:13     ` readahead and oom Wu Fengguang
  1 sibling, 1 reply; 43+ messages in thread
From: Dave Young @ 2011-04-26  6:07 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: linux-mm, Linux Kernel Mailing List

On Tue, Apr 26, 2011 at 2:05 PM, Dave Young <hidave.darkstar@gmail.com> wrote:
> On Tue, Apr 26, 2011 at 1:55 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> On Tue, Apr 26, 2011 at 01:49:25PM +0800, Dave Young wrote:
>>> Hi,
>>>
>>> When memory pressure is high, readahead could cause OOM killing.
>>> IMHO we should stop readahead under such circumstances. If that's true,
>>> how do we fix it?
>>
>> Good question. Before OOM there will be readahead thrashings, which
>> can be addressed by this patch:
>>
>> http://lkml.org/lkml/2010/2/2/229
>
> Hi, I'm not clear about the patch, could be regard as below cases?
> 1) readahead alloc fail due to low memory such as other large allocation

For example, if the vm balloon allocates lots of memory, readahead could
fail immediately and then trigger the OOM killer.

> 2) readahead thrashing caused by itself
>
>>
>> However there seems no much interest on that feature.. I can separate
>> that out and resubmit it standalone if necessary.
>>
>> Thanks,
>> Fengguang
>>
>
>
>
> --
> Regards
> dave
>



-- 
Regards
dave


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  6:05   ` Dave Young
  2011-04-26  6:07     ` Dave Young
@ 2011-04-26  6:13     ` Wu Fengguang
  2011-04-26  6:23       ` Dave Young
  1 sibling, 1 reply; 43+ messages in thread
From: Wu Fengguang @ 2011-04-26  6:13 UTC (permalink / raw)
  To: Dave Young; +Cc: linux-mm, Linux Kernel Mailing List

On Tue, Apr 26, 2011 at 02:05:12PM +0800, Dave Young wrote:
> On Tue, Apr 26, 2011 at 1:55 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > On Tue, Apr 26, 2011 at 01:49:25PM +0800, Dave Young wrote:
> >> Hi,
> >>
> >> When memory pressure is high, readahead could cause OOM killing.
> >> IMHO we should stop readahead under such circumstances. If that's true,
> >> how do we fix it?
> >
> > Good question. Before OOM there will be readahead thrashings, which
> > can be addressed by this patch:
> >
> > http://lkml.org/lkml/2010/2/2/229
> 
> Hi, I'm not clear about the patch, could be regard as below cases?
> 1) readahead alloc fail due to low memory such as other large allocation
> 2) readahead thrashing caused by itself

When memory pressure goes up (though not yet as high as to cause
allocation failures and OOM), readahead pages may be reclaimed before
they are accessed by user space via read(). When read() later asks for
such a page, it has to be read from disk _again_. This is called
readahead thrashing.

What the patch does is to automatically detect readahead thrashing and
shrink the readahead size adaptively, which in turn reduces the memory
consumed by readahead buffers.
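
Roughly, the adaptive shrinking could look like the sketch below. This
is only a simplified illustration, not the code in the patch linked
above, and the names in it (ra_window, ra_window_hit, ra_window_thrashed)
are made up for the example: grow the window slowly while readahead
pages keep being found in cache, and back off quickly whenever a page
that was read ahead turns out to have been reclaimed before read()
consumed it.

struct ra_window {
	unsigned long start;	/* first page index of the current window */
	unsigned long size;	/* current window size, in pages */
	unsigned long max;	/* upper bound, e.g. the per-bdi ra_pages */
};

/* a readahead page was still in cache when read() asked for it */
static void ra_window_hit(struct ra_window *ra)
{
	if (ra->size < ra->max)
		ra->size++;		/* cautious linear growth */
}

/* a readahead page was reclaimed before read() could consume it */
static void ra_window_thrashed(struct ra_window *ra)
{
	ra->size /= 2;			/* back off fast under pressure */
	if (ra->size < 4)
		ra->size = 4;		/* keep a small floor */
}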

Thanks,
Fengguang

> >
> > However there seems no much interest on that feature.. I can separate
> > that out and resubmit it standalone if necessary.
> >
> > Thanks,
> > Fengguang
> >
> 
> 
> 
> -- 
> Regards
> dave


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  6:13     ` readahead and oom Wu Fengguang
@ 2011-04-26  6:23       ` Dave Young
  0 siblings, 0 replies; 43+ messages in thread
From: Dave Young @ 2011-04-26  6:23 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: linux-mm, Linux Kernel Mailing List

On Tue, Apr 26, 2011 at 2:13 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Tue, Apr 26, 2011 at 02:05:12PM +0800, Dave Young wrote:
>> On Tue, Apr 26, 2011 at 1:55 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> > On Tue, Apr 26, 2011 at 01:49:25PM +0800, Dave Young wrote:
>> >> Hi,
>> >>
>> >> When memory pressure is high, readahead could cause OOM killing.
>> >> IMHO we should stop readahead under such circumstances. If that's true,
>> >> how do we fix it?
>> >
>> > Good question. Before OOM there will be readahead thrashings, which
>> > can be addressed by this patch:
>> >
>> > http://lkml.org/lkml/2010/2/2/229
>>
>> Hi, I'm not clear about the patch, could be regard as below cases?
>> 1) readahead alloc fail due to low memory such as other large allocation
>> 2) readahead thrashing caused by itself
>
> When memory pressure goes up (not as much as allocation failures and OOM),
> the readahead pages may be reclaimed before they are read() accessed
> by the user space. At the time read() asks for the page, it will have
> to be read from disk _again_. This is called readahead thrashing.
>
> What the patch does is to automatically detect readahead thrashing and
> shrink the readahead size adaptively, which will the reduce memory
> consumption by readahead buffers.

Thanks for the explanation.

But there's still the question: if an allocation storm occurs at
system startup, the allocations are so quick that thrashing is
detected too late to throttle readahead. Is this possible?

>
> Thanks,
> Fengguang
>
>> >
>> > However there seems no much interest on that feature.. I can separate
>> > that out and resubmit it standalone if necessary.

I would like to test your new patch.

>> >
>> > Thanks,
>> > Fengguang
>> >
>>
>>
>>
>> --
>> Regards
>> dave
>



-- 
Regards
dave


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  6:07     ` Dave Young
@ 2011-04-26  6:25       ` Wu Fengguang
  2011-04-26  6:29         ` Dave Young
  0 siblings, 1 reply; 43+ messages in thread
From: Wu Fengguang @ 2011-04-26  6:25 UTC (permalink / raw)
  To: Dave Young; +Cc: linux-mm, Linux Kernel Mailing List

On Tue, Apr 26, 2011 at 02:07:17PM +0800, Dave Young wrote:
> On Tue, Apr 26, 2011 at 2:05 PM, Dave Young <hidave.darkstar@gmail.com> wrote:
> > On Tue, Apr 26, 2011 at 1:55 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> >> On Tue, Apr 26, 2011 at 01:49:25PM +0800, Dave Young wrote:
> >>> Hi,
> >>>
> >>> When memory pressure is high, readahead could cause OOM killing.
> >>> IMHO we should stop readahead under such circumstances. If that's true,
> >>> how do we fix it?
> >>
> >> Good question. Before OOM there will be readahead thrashings, which
> >> can be addressed by this patch:
> >>
> >> http://lkml.org/lkml/2010/2/2/229
> >
> > Hi, I'm not clear about the patch, could be regard as below cases?
> > 1) readahead alloc fail due to low memory such as other large allocation
> 
> For example vm balloon allocate lots of memory, then readahead could
> fail immediately and then oom

If true, that would be the problem of vm balloon. It's not good to
consume lots of memory all of a sudden, which will likely impact lots
of kernel subsystems.

btw readahead page allocations are completely optional. They are OK to
fail and in theory shall not trigger OOM on themselves. We may
consider passing __GFP_NORETRY for readahead page allocations.

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  6:25       ` Wu Fengguang
@ 2011-04-26  6:29         ` Dave Young
  2011-04-26  6:34           ` Wu Fengguang
  0 siblings, 1 reply; 43+ messages in thread
From: Dave Young @ 2011-04-26  6:29 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: linux-mm, Linux Kernel Mailing List

On Tue, Apr 26, 2011 at 2:25 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Tue, Apr 26, 2011 at 02:07:17PM +0800, Dave Young wrote:
>> On Tue, Apr 26, 2011 at 2:05 PM, Dave Young <hidave.darkstar@gmail.com> wrote:
>> > On Tue, Apr 26, 2011 at 1:55 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> >> On Tue, Apr 26, 2011 at 01:49:25PM +0800, Dave Young wrote:
>> >>> Hi,
>> >>>
>> >>> When memory pressure is high, readahead could cause OOM killing.
>> >>> IMHO we should stop readahead under such circumstances. If that's true,
>> >>> how do we fix it?
>> >>
>> >> Good question. Before OOM there will be readahead thrashings, which
>> >> can be addressed by this patch:
>> >>
>> >> http://lkml.org/lkml/2010/2/2/229
>> >
>> > Hi, I'm not clear about the patch, could be regard as below cases?
>> > 1) readahead alloc fail due to low memory such as other large allocation
>>
>> For example vm balloon allocate lots of memory, then readahead could
>> fail immediately and then oom
>
> If true, that would be the problem of vm balloon. It's not good to
> consume lots of memory all of a sudden, which will likely impact lots
> of kernel subsystems.
>
> btw readahead page allocations are completely optional. They are OK to
> fail and in theory shall not trigger OOM on themselves. We may
> consider passing __GFP_NORETRY for readahead page allocations.

Good idea, care to submit a patch?

-- 
Regards
dave


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  6:29         ` Dave Young
@ 2011-04-26  6:34           ` Wu Fengguang
  2011-04-26  6:50             ` KOSAKI Motohiro
  2011-04-26  7:41             ` Minchan Kim
  0 siblings, 2 replies; 43+ messages in thread
From: Wu Fengguang @ 2011-04-26  6:34 UTC (permalink / raw)
  To: Dave Young; +Cc: linux-mm, Linux Kernel Mailing List, Andrew Morton, Mel Gorman

On Tue, Apr 26, 2011 at 02:29:15PM +0800, Dave Young wrote:
> On Tue, Apr 26, 2011 at 2:25 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > On Tue, Apr 26, 2011 at 02:07:17PM +0800, Dave Young wrote:
> >> On Tue, Apr 26, 2011 at 2:05 PM, Dave Young <hidave.darkstar@gmail.com> wrote:
> >> > On Tue, Apr 26, 2011 at 1:55 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> >> >> On Tue, Apr 26, 2011 at 01:49:25PM +0800, Dave Young wrote:
> >> >>> Hi,
> >> >>>
> >> >>> When memory pressure is high, readahead could cause OOM killing.
> >> >>> IMHO we should stop readahead under such circumstances. If that's true,
> >> >>> how do we fix it?
> >> >>
> >> >> Good question. Before OOM there will be readahead thrashings, which
> >> >> can be addressed by this patch:
> >> >>
> >> >> http://lkml.org/lkml/2010/2/2/229
> >> >
> >> > Hi, I'm not clear about the patch, could be regard as below cases?
> >> > 1) readahead alloc fail due to low memory such as other large allocation
> >>
> >> For example vm balloon allocate lots of memory, then readahead could
> >> fail immediately and then oom
> >
> > If true, that would be the problem of vm balloon. It's not good to
> > consume lots of memory all of a sudden, which will likely impact lots
> > of kernel subsystems.
> >
> > btw readahead page allocations are completely optional. They are OK to
> > fail and in theory shall not trigger OOM on themselves. We may
> > consider passing __GFP_NORETRY for readahead page allocations.
> 
> Good idea, care to submit a patch?

Here it is :)

Thanks,
Fengguang
---
readahead: readahead page allocations are OK to fail

Pass __GFP_NORETRY for readahead page allocations.

readahead page allocations are completely optional. They are OK to
fail and in particular shall not trigger OOM on themselves.

Reported-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/pagemap.h |    5 +++++
 mm/readahead.c          |    2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

--- linux-next.orig/include/linux/pagemap.h	2011-04-26 14:27:46.000000000 +0800
+++ linux-next/include/linux/pagemap.h	2011-04-26 14:29:31.000000000 +0800
@@ -219,6 +219,11 @@ static inline struct page *page_cache_al
 	return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD);
 }
 
+static inline struct page *page_cache_alloc_cold_noretry(struct address_space *x)
+{
+	return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD|__GFP_NORETRY);
+}
+
 typedef int filler_t(void *, struct page *);
 
 extern struct page * find_get_page(struct address_space *mapping,
--- linux-next.orig/mm/readahead.c	2011-04-26 14:27:02.000000000 +0800
+++ linux-next/mm/readahead.c	2011-04-26 14:27:24.000000000 +0800
@@ -180,7 +180,7 @@ __do_page_cache_readahead(struct address
 		if (page)
 			continue;
 
-		page = page_cache_alloc_cold(mapping);
+		page = page_cache_alloc_cold_noretry(mapping);
 		if (!page)
 			break;
 		page->index = page_offset;


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  6:34           ` Wu Fengguang
@ 2011-04-26  6:50             ` KOSAKI Motohiro
  2011-04-26  7:41             ` Minchan Kim
  1 sibling, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2011-04-26  6:50 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: kosaki.motohiro, Dave Young, linux-mm, Linux Kernel Mailing List,
	Andrew Morton, Mel Gorman

> > > btw readahead page allocations are completely optional. They are OK to
> > > fail and in theory shall not trigger OOM on themselves. We may
> > > consider passing __GFP_NORETRY for readahead page allocations.
> > 
> > Good idea, care to submit a patch?
> 
> Here it is :)
> 
> Thanks,
> Fengguang
> ---
> readahead: readahead page allocations are OK to fail
> 
> Pass __GFP_NORETRY for readahead page allocations.
> 
> readahead page allocations are completely optional. They are OK to
> fail and in particular shall not trigger OOM on themselves.
> 
> Reported-by: Dave Young <hidave.darkstar@gmail.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  include/linux/pagemap.h |    5 +++++
>  mm/readahead.c          |    2 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> --- linux-next.orig/include/linux/pagemap.h	2011-04-26 14:27:46.000000000 +0800
> +++ linux-next/include/linux/pagemap.h	2011-04-26 14:29:31.000000000 +0800
> @@ -219,6 +219,11 @@ static inline struct page *page_cache_al
>  	return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD);
>  }
>  
> +static inline struct page *page_cache_alloc_cold_noretry(struct address_space *x)
> +{
> +	return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD|__GFP_NORETRY);
> +}
> +
>  typedef int filler_t(void *, struct page *);
>  
>  extern struct page * find_get_page(struct address_space *mapping,
> --- linux-next.orig/mm/readahead.c	2011-04-26 14:27:02.000000000 +0800
> +++ linux-next/mm/readahead.c	2011-04-26 14:27:24.000000000 +0800
> @@ -180,7 +180,7 @@ __do_page_cache_readahead(struct address
>  		if (page)
>  			continue;
>  
> -		page = page_cache_alloc_cold(mapping);
> +		page = page_cache_alloc_cold_noretry(mapping);
>  		if (!page)
>  			break;
>  		page->index = page_offset;

I like this patch.

Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  6:34           ` Wu Fengguang
  2011-04-26  6:50             ` KOSAKI Motohiro
@ 2011-04-26  7:41             ` Minchan Kim
  2011-04-26  9:20               ` Wu Fengguang
  1 sibling, 1 reply; 43+ messages in thread
From: Minchan Kim @ 2011-04-26  7:41 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Dave Young, linux-mm, Linux Kernel Mailing List, Andrew Morton,
	Mel Gorman

Hi Wu,

On Tue, Apr 26, 2011 at 3:34 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Tue, Apr 26, 2011 at 02:29:15PM +0800, Dave Young wrote:
>> On Tue, Apr 26, 2011 at 2:25 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> > On Tue, Apr 26, 2011 at 02:07:17PM +0800, Dave Young wrote:
>> >> On Tue, Apr 26, 2011 at 2:05 PM, Dave Young <hidave.darkstar@gmail.com> wrote:
>> >> > On Tue, Apr 26, 2011 at 1:55 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> >> >> On Tue, Apr 26, 2011 at 01:49:25PM +0800, Dave Young wrote:
>> >> >>> Hi,
>> >> >>>
>> >> >>> When memory pressure is high, readahead could cause OOM killing.
>> >> >>> IMHO we should stop readahead under such circumstances. If that's true,
>> >> >>> how do we fix it?
>> >> >>
>> >> >> Good question. Before OOM there will be readahead thrashings, which
>> >> >> can be addressed by this patch:
>> >> >>
>> >> >> http://lkml.org/lkml/2010/2/2/229
>> >> >
>> >> > Hi, I'm not clear about the patch, could be regard as below cases?
>> >> > 1) readahead alloc fail due to low memory such as other large allocation
>> >>
>> >> For example vm balloon allocate lots of memory, then readahead could
>> >> fail immediately and then oom
>> >
>> > If true, that would be the problem of vm balloon. It's not good to
>> > consume lots of memory all of a sudden, which will likely impact lots
>> > of kernel subsystems.
>> >
>> > btw readahead page allocations are completely optional. They are OK to
>> > fail and in theory shall not trigger OOM on themselves. We may
>> > consider passing __GFP_NORETRY for readahead page allocations.
>>
>> Good idea, care to submit a patch?
>
> Here it is :)
>
> Thanks,
> Fengguang
> ---
> readahead: readahead page allocations are OK to fail
>
> Pass __GFP_NORETRY for readahead page allocations.
>
> readahead page allocations are completely optional. They are OK to
> fail and in particular shall not trigger OOM on themselves.
>
> Reported-by: Dave Young <hidave.darkstar@gmail.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  include/linux/pagemap.h |    5 +++++
>  mm/readahead.c          |    2 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> --- linux-next.orig/include/linux/pagemap.h     2011-04-26 14:27:46.000000000 +0800
> +++ linux-next/include/linux/pagemap.h  2011-04-26 14:29:31.000000000 +0800
> @@ -219,6 +219,11 @@ static inline struct page *page_cache_al
>        return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD);
>  }
>
> +static inline struct page *page_cache_alloc_cold_noretry(struct address_space *x)
> +{
> +       return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD|__GFP_NORETRY);

It makes sense to me, but it could generate noise about page allocation
failures, which I think is not desirable.
How about adding __GFP_NOWARN?


-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  7:41             ` Minchan Kim
@ 2011-04-26  9:20               ` Wu Fengguang
  2011-04-26  9:28                 ` Minchan Kim
  2011-04-26 19:47                 ` Andrew Morton
  0 siblings, 2 replies; 43+ messages in thread
From: Wu Fengguang @ 2011-04-26  9:20 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Dave Young, linux-mm, Linux Kernel Mailing List, Andrew Morton,
	Mel Gorman

Minchan,

> > +static inline struct page *page_cache_alloc_cold_noretry(struct address_space *x)
> > +{
> > +       return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD|__GFP_NORETRY);
> 
> It makes sense to me but it could make a noise about page allocation
> failure. I think it's not desirable.
> How about adding __GFP_NOWARN?

Yeah it makes sense. Here is the new version.

Thanks,
Fengguang
---
Subject: readahead: readahead page allocations are OK to fail
Date: Tue Apr 26 14:29:40 CST 2011

Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations.

readahead page allocations are completely optional. They are OK to
fail and in particular shall not trigger OOM on themselves.

Reported-by: Dave Young <hidave.darkstar@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/pagemap.h |    6 ++++++
 mm/readahead.c          |    2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

--- linux-next.orig/include/linux/pagemap.h	2011-04-26 14:27:46.000000000 +0800
+++ linux-next/include/linux/pagemap.h	2011-04-26 17:17:13.000000000 +0800
@@ -219,6 +219,12 @@ static inline struct page *page_cache_al
 	return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD);
 }
 
+static inline struct page *page_cache_alloc_readahead(struct address_space *x)
+{
+	return __page_cache_alloc(mapping_gfp_mask(x) |
+				  __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN);
+}
+
 typedef int filler_t(void *, struct page *);
 
 extern struct page * find_get_page(struct address_space *mapping,
--- linux-next.orig/mm/readahead.c	2011-04-26 14:27:02.000000000 +0800
+++ linux-next/mm/readahead.c	2011-04-26 17:17:25.000000000 +0800
@@ -180,7 +180,7 @@ __do_page_cache_readahead(struct address
 		if (page)
 			continue;
 
-		page = page_cache_alloc_cold(mapping);
+		page = page_cache_alloc_readahead(mapping);
 		if (!page)
 			break;
 		page->index = page_offset;


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  9:20               ` Wu Fengguang
@ 2011-04-26  9:28                 ` Minchan Kim
  2011-04-26 10:18                   ` Pekka Enberg
  2011-04-26 19:47                 ` Andrew Morton
  1 sibling, 1 reply; 43+ messages in thread
From: Minchan Kim @ 2011-04-26  9:28 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Dave Young, linux-mm, Linux Kernel Mailing List, Andrew Morton,
	Mel Gorman

On Tue, Apr 26, 2011 at 6:20 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> Minchan,
>
>> > +static inline struct page *page_cache_alloc_cold_noretry(struct address_space *x)
>> > +{
>> > +       return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD|__GFP_NORETRY);
>>
>> It makes sense to me but it could make a noise about page allocation
>> failure. I think it's not desirable.
>> How about adding __GFP_NOWARN?
>
> Yeah it makes sense. Here is the new version.
>
> Thanks,
> Fengguang
> ---
> Subject: readahead: readahead page allocations are OK to fail
> Date: Tue Apr 26 14:29:40 CST 2011
>
> Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations.
>
> readahead page allocations are completely optional. They are OK to
> fail and in particular shall not trigger OOM on themselves.
>
> Reported-by: Dave Young <hidave.darkstar@gmail.com>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH] mm: readahead page allocations are OK to fail
  2011-04-26  5:49 readahead and oom Dave Young
  2011-04-26  5:55 ` Wu Fengguang
@ 2011-04-26  9:37 ` Wu Fengguang
  1 sibling, 0 replies; 43+ messages in thread
From: Wu Fengguang @ 2011-04-26  9:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Young, linux-mm, Linux Kernel Mailing List, Minchan Kim,
	KOSAKI Motohiro, Mel Gorman

Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations.

readahead page allocations are completely optional. They are OK to
fail and in particular shall not trigger OOM on themselves.

Reported-by: Dave Young <hidave.darkstar@gmail.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/pagemap.h |    6 ++++++
 mm/readahead.c          |    2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

--- linux-next.orig/include/linux/pagemap.h	2011-04-26 14:27:46.000000000 +0800
+++ linux-next/include/linux/pagemap.h	2011-04-26 17:17:13.000000000 +0800
@@ -219,6 +219,12 @@ static inline struct page *page_cache_al
 	return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD);
 }
 
+static inline struct page *page_cache_alloc_readahead(struct address_space *x)
+{
+	return __page_cache_alloc(mapping_gfp_mask(x) |
+				  __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN);
+}
+
 typedef int filler_t(void *, struct page *);
 
 extern struct page * find_get_page(struct address_space *mapping,
--- linux-next.orig/mm/readahead.c	2011-04-26 14:27:02.000000000 +0800
+++ linux-next/mm/readahead.c	2011-04-26 17:17:25.000000000 +0800
@@ -180,7 +180,7 @@ __do_page_cache_readahead(struct address
 		if (page)
 			continue;
 
-		page = page_cache_alloc_cold(mapping);
+		page = page_cache_alloc_readahead(mapping);
 		if (!page)
 			break;
 		page->index = page_offset;


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  9:28                 ` Minchan Kim
@ 2011-04-26 10:18                   ` Pekka Enberg
  0 siblings, 0 replies; 43+ messages in thread
From: Pekka Enberg @ 2011-04-26 10:18 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Wu Fengguang, Dave Young, linux-mm, Linux Kernel Mailing List,
	Andrew Morton, Mel Gorman

On Tue, Apr 26, 2011 at 12:28 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> On Tue, Apr 26, 2011 at 6:20 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> Minchan,
>>
>>> > +static inline struct page *page_cache_alloc_cold_noretry(struct address_space *x)
>>> > +{
>>> > +       return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD|__GFP_NORETRY);
>>>
>>> It makes sense to me but it could make a noise about page allocation
>>> failure. I think it's not desirable.
>>> How about adding __GFP_NOWARN?
>>
>> Yeah it makes sense. Here is the new version.
>>
>> Thanks,
>> Fengguang
>> ---
>> Subject: readahead: readahead page allocations are OK to fail
>> Date: Tue Apr 26 14:29:40 CST 2011
>>
>> Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations.
>>
>> readahead page allocations are completely optional. They are OK to
>> fail and in particular shall not trigger OOM on themselves.
>>
>> Reported-by: Dave Young <hidave.darkstar@gmail.com>
>> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

Reviewed-by: Pekka Enberg <penberg@kernel.org>


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26  9:20               ` Wu Fengguang
  2011-04-26  9:28                 ` Minchan Kim
@ 2011-04-26 19:47                 ` Andrew Morton
  2011-04-28  4:19                   ` Wu Fengguang
  2011-04-28 13:36                   ` [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures Wu Fengguang
  1 sibling, 2 replies; 43+ messages in thread
From: Andrew Morton @ 2011-04-26 19:47 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Minchan Kim, Dave Young, linux-mm, Linux Kernel Mailing List,
	Mel Gorman

On Tue, 26 Apr 2011 17:20:29 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations.
> 
> readahead page allocations are completely optional. They are OK to
> fail and in particular shall not trigger OOM on themselves.

I have distinct recollections of trying this many years ago, finding
that it caused problems then deciding not to do it.  But I can't find
an email trail and I don't remember the reasons :(

If the system is so stressed for memory that the oom-killer might get
involved then the readahead pages may well be getting reclaimed before
the application actually gets to use them.  But that's just an aside.

Ho hum.  The patch *seems* good (as it did 5-10 years ago ;)) but there
may be surprising side-effects which could be exposed under heavy
testing.  Testing which I'm sure hasn't been performed...



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: readahead and oom
  2011-04-26 19:47                 ` Andrew Morton
@ 2011-04-28  4:19                   ` Wu Fengguang
  2011-04-28 13:36                   ` [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures Wu Fengguang
  1 sibling, 0 replies; 43+ messages in thread
From: Wu Fengguang @ 2011-04-28  4:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Dave Young, linux-mm, Linux Kernel Mailing List,
	Mel Gorman

[-- Attachment #1: Type: text/plain, Size: 3278 bytes --]

On Wed, Apr 27, 2011 at 03:47:43AM +0800, Andrew Morton wrote:
> On Tue, 26 Apr 2011 17:20:29 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations.
> > 
> > readahead page allocations are completely optional. They are OK to
> > fail and in particular shall not trigger OOM on themselves.
> 
> I have distinct recollections of trying this many years ago, finding
> that it caused problems then deciding not to do it.  But I can't find
> an email trail and I don't remember the reasons :(

The most likely reason would be page allocation failures even when there
are plenty of _global_ reclaimable pages.

> If the system is so stressed for memory that the oom-killer might get
> involved then the readahead pages may well be getting reclaimed before
> the application actually gets to use them.  But that's just an aside.

Yes, when direct reclaim is working as expected, readahead thrashing
should happen long before NORETRY page allocation failures and OOM.

With that assumption I think it's OK to do this patch.  As for
readahead, sporadic allocation failures are acceptable. But there is a
problem, see below.

> Ho hum.  The patch *seems* good (as it did 5-10 years ago ;)) but there
> may be surprising side-effects which could be exposed under heavy
> testing.  Testing which I'm sure hasn't been performed...

The NORETRY direct reclaim does tend to fail a lot more under concurrent
reclaims, where one task's reclaimed pages can be stolen by others
before it is able to grab them.

        __alloc_pages_direct_reclaim()
        {
                did_some_progress = try_to_free_pages();

                // pages stolen by others

                page = get_page_from_freelist();
        }

Here are the tests to demonstrate this problem.

Out of 1000GB reads and page allocations,

        test-ra-thrash.sh: read 1000 1G files interleaved in 1 single task:

        nr_alloc_fail 733

        test-dd-sparse.sh: read 1000 1G files concurrently in 1000 tasks:

        nr_alloc_fail 11799

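For reference, the "interleaved reads in 1 single task" pattern is
roughly the following. This is a hypothetical C equivalent for
illustration only; the attached test-ra-thrash.sh itself is not
reproduced here, and the /fs/sparse-* file names are simply borrowed
from the dd test that appears later in the thread:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NR_FILES	1000
#define CHUNK		(128 * 1024)	/* bytes read from each file per round */

int main(void)
{
	static char buf[CHUNK];
	static int fd[NR_FILES];
	char name[64];
	int i, active = NR_FILES;

	for (i = 0; i < NR_FILES; i++) {
		snprintf(name, sizeof(name), "/fs/sparse-%d", i + 1);
		fd[i] = open(name, O_RDONLY);
		if (fd[i] < 0) {
			perror(name);
			exit(1);
		}
	}

	/* single task, round-robin: the 1000 readahead streams interleave */
	while (active) {
		active = 0;
		for (i = 0; i < NR_FILES; i++) {
			if (fd[i] < 0)
				continue;
			if (read(fd[i], buf, sizeof(buf)) > 0) {
				active++;
			} else {
				close(fd[i]);
				fd[i] = -1;
			}
		}
	}
	return 0;
}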

Thanks,
Fengguang
---

--- linux-next.orig/include/linux/mmzone.h	2011-04-27 21:58:27.000000000 +0800
+++ linux-next/include/linux/mmzone.h	2011-04-27 21:58:39.000000000 +0800
@@ -106,6 +106,7 @@ enum zone_stat_item {
 	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
 	NR_DIRTIED,		/* page dirtyings since bootup */
 	NR_WRITTEN,		/* page writings since bootup */
+	NR_ALLOC_FAIL,
 #ifdef CONFIG_NUMA
 	NUMA_HIT,		/* allocated in intended node */
 	NUMA_MISS,		/* allocated in non intended node */
--- linux-next.orig/mm/page_alloc.c	2011-04-27 21:58:27.000000000 +0800
+++ linux-next/mm/page_alloc.c	2011-04-27 21:58:39.000000000 +0800
@@ -2176,6 +2176,8 @@ rebalance:
 	}
 
 nopage:
+	inc_zone_state(preferred_zone, NR_ALLOC_FAIL);
+	/* count_zone_vm_events(PGALLOCFAIL, preferred_zone, 1 << order); */
 	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
 		unsigned int filter = SHOW_MEM_FILTER_NODES;
 
--- linux-next.orig/mm/vmstat.c	2011-04-27 21:58:27.000000000 +0800
+++ linux-next/mm/vmstat.c	2011-04-27 21:58:53.000000000 +0800
@@ -879,6 +879,7 @@ static const char * const vmstat_text[] 
 	"nr_shmem",
 	"nr_dirtied",
 	"nr_written",
+	"nr_alloc_fail",
 
 #ifdef CONFIG_NUMA
 	"numa_hit",

[-- Attachment #2: test-dd-sparse.sh --]
[-- Type: application/x-sh, Size: 135 bytes --]

[-- Attachment #3: test-ra-thrash.sh --]
[-- Type: application/x-sh, Size: 124 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-04-26 19:47                 ` Andrew Morton
  2011-04-28  4:19                   ` Wu Fengguang
@ 2011-04-28 13:36                   ` Wu Fengguang
  2011-04-28 13:38                     ` [patch] vmstat: account " Wu Fengguang
                                       ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: Wu Fengguang @ 2011-04-28 13:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Dave Young, linux-mm, Linux Kernel Mailing List,
	Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Christoph Lameter,
	Dave Chinner, David Rientjes

Concurrent page allocations are suffering from high failure rates.

On an 8p, 3GB RAM test box, when reading 1000 sparse files of size 1GB,
the page allocation failures are:

nr_alloc_fail 733 	# interleaved reads by 1 single task
nr_alloc_fail 11799	# concurrent reads by 1000 tasks

The concurrent read test script is:

	for i in `seq 1000`
	do
		truncate -s 1G /fs/sparse-$i
		dd if=/fs/sparse-$i of=/dev/null &
	done

In order for get_page_from_freelist() to get a free page,

(1) try_to_free_pages() should use a much higher .nr_to_reclaim than the
    current SWAP_CLUSTER_MAX=32, in order to draw the zone out of a
    possible low-watermark state as well as fill the pcp lists with enough
    free pages to overflow their high watermark.

(2) the get_page_from_freelist() _after_ direct reclaim should use a lower
    watermark than its normal invocations, so that it can reasonably
    "reserve" some free pages for itself and prevent other concurrent
    page allocators from stealing all its reclaimed pages.

Some notes:

- commit 9ee493ce ("mm: page allocator: drain per-cpu lists after direct
  reclaim allocation fails") has the same target, however it is obviously
  costly and less effective. It seems cleaner to just remove the
  retry and drain code than to retain it.

- it's a bit hacky to reclaim more pages than requested inside
  do_try_to_free_pages(), and it won't help cgroup reclaim for now

- it only aims to reduce failures when there are plenty of reclaimable
  pages, so it stops the opportunistic reclaim once it has scanned twice
  the requested number of pages

Test results:

- the failure rate is pretty sensitive to the page reclaim size,
  from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 10496 (SWAP_CLUSTER_MAX)

- the IPIs are reduced by over 100 times (most of the function call
  interrupts come from the drain_all_pages() retry that this patch
  removes, which sends IPIs to all CPUs)

base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
-------------------------------------------------------------------------------
nr_alloc_fail 10496
allocstall 1576602

slabs_scanned 21632
kswapd_steal 4393382
kswapd_inodesteal 124
kswapd_low_wmark_hit_quickly 885
kswapd_high_wmark_hit_quickly 2321
kswapd_skip_congestion_wait 0
pageoutrun 29426

CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts

LOC:     536274     532529     531734     536801     536510     533676     534853     532038   Local timer interrupts
RES:       3032       2128       1792       1765       2184       1703       1754       1865   Rescheduling interrupts
TLB:        189         15         13         17         64        294         97         63   TLB shootdowns

patched (WMARK_MIN)
-------------------
nr_alloc_fail 704
allocstall 105551

slabs_scanned 33280
kswapd_steal 4525537
kswapd_inodesteal 187
kswapd_low_wmark_hit_quickly 4980
kswapd_high_wmark_hit_quickly 2573
kswapd_skip_congestion_wait 0
pageoutrun 35429

CAL:         93        286        396        754        272        297        275        281   Function call interrupts

LOC:     520550     517751     517043     522016     520302     518479     519329     517179   Local timer interrupts
RES:       2131       1371       1376       1269       1390       1181       1409       1280   Rescheduling interrupts
TLB:        280         26         27         30         65        305        134         75   TLB shootdowns

patched (WMARK_HIGH)
--------------------
nr_alloc_fail 282
allocstall 53860

slabs_scanned 23936
kswapd_steal 4561178
kswapd_inodesteal 0
kswapd_low_wmark_hit_quickly 2760
kswapd_high_wmark_hit_quickly 1748
kswapd_skip_congestion_wait 0
pageoutrun 32639

CAL:         93        463        410        540        298        282        272        306   Function call interrupts

LOC:     513956     510749     509890     514897     514300     512392     512825     510574   Local timer interrupts
RES:       1174       2081       1411       1320       1742       2683       1380       1230   Rescheduling interrupts
TLB:        274         21         19         22         57        317        131         61   TLB shootdowns

this patch (WMARK_HIGH, limited scan)
-------------------------------------
nr_alloc_fail 276
allocstall 54034

slabs_scanned 24320
kswapd_steal 4507482
kswapd_inodesteal 262
kswapd_low_wmark_hit_quickly 2638
kswapd_high_wmark_hit_quickly 1710
kswapd_skip_congestion_wait 0
pageoutrun 32182

CAL:         69        443        421        567        273        279        269        334   Function call interrupts

LOC:     514736     511698     510993     514069     514185     512986     513838     511229   Local timer interrupts
RES:       2153       1556       1126       1351       3047       1554       1131       1560   Rescheduling interrupts
TLB:        209         26         20         15         71        315        117         71   TLB shootdowns

CC: Mel Gorman <mel@linux.vnet.ibm.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page_alloc.c |   17 +++--------------
 mm/vmscan.c     |    6 ++++++
 2 files changed, 9 insertions(+), 14 deletions(-)
--- linux-next.orig/mm/vmscan.c	2011-04-28 21:16:16.000000000 +0800
+++ linux-next/mm/vmscan.c	2011-04-28 21:28:57.000000000 +0800
@@ -1978,6 +1978,8 @@ static void shrink_zones(int priority, s
 				continue;
 			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 				continue;	/* Let kswapd poll it */
+			sc->nr_to_reclaim = max(sc->nr_to_reclaim,
+						zone->watermark[WMARK_HIGH]);
 		}
 
 		shrink_zone(priority, zone, sc);
@@ -2034,6 +2036,7 @@ static unsigned long do_try_to_free_page
 	struct zoneref *z;
 	struct zone *zone;
 	unsigned long writeback_threshold;
+	unsigned long min_reclaim = sc->nr_to_reclaim;
 
 	get_mems_allowed();
 	delayacct_freepages_start();
@@ -2067,6 +2070,9 @@ static unsigned long do_try_to_free_page
 			}
 		}
 		total_scanned += sc->nr_scanned;
+		if (sc->nr_reclaimed >= min_reclaim &&
+		    total_scanned > 2 * sc->nr_to_reclaim)
+			goto out;
 		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
 			goto out;
 
--- linux-next.orig/mm/page_alloc.c	2011-04-28 21:16:16.000000000 +0800
+++ linux-next/mm/page_alloc.c	2011-04-28 21:16:18.000000000 +0800
@@ -1888,9 +1888,8 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
 	int migratetype, unsigned long *did_some_progress)
 {
-	struct page *page = NULL;
+	struct page *page;
 	struct reclaim_state reclaim_state;
-	bool drained = false;
 
 	cond_resched();
 
@@ -1912,22 +1911,12 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 	if (unlikely(!(*did_some_progress)))
 		return NULL;
 
-retry:
+	alloc_flags |= ALLOC_HARDER;
+
 	page = get_page_from_freelist(gfp_mask, nodemask, order,
 					zonelist, high_zoneidx,
 					alloc_flags, preferred_zone,
 					migratetype);
-
-	/*
-	 * If an allocation failed after direct reclaim, it could be because
-	 * pages are pinned on the per-cpu lists. Drain them and try again
-	 */
-	if (!page && !drained) {
-		drain_all_pages();
-		drained = true;
-		goto retry;
-	}
-
 	return page;
 }
 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch] vmstat: account page allocation failures
  2011-04-28 13:36                   ` [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures Wu Fengguang
@ 2011-04-28 13:38                     ` Wu Fengguang
  2011-04-28 13:50                       ` KOSAKI Motohiro
  2011-04-29  2:28                     ` [RFC][PATCH] mm: cut down __GFP_NORETRY " Wu Fengguang
  2011-05-04  1:56                     ` Dave Young
  2 siblings, 1 reply; 43+ messages in thread
From: Wu Fengguang @ 2011-04-28 13:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Dave Young, linux-mm, Linux Kernel Mailing List,
	Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Christoph Lameter,
	Dave Chinner, David Rientjes

Just for your reference.

It seems unnecessary now, given that the page allocation failure rate is no
longer high.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/mmzone.h |    1 +
 mm/page_alloc.c        |    2 ++
 mm/vmstat.c            |    1 +
 3 files changed, 4 insertions(+)

--- linux-next.orig/include/linux/mmzone.h	2011-04-28 21:34:30.000000000 +0800
+++ linux-next/include/linux/mmzone.h	2011-04-28 21:34:35.000000000 +0800
@@ -106,6 +106,7 @@ enum zone_stat_item {
 	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
 	NR_DIRTIED,		/* page dirtyings since bootup */
 	NR_WRITTEN,		/* page writings since bootup */
+	NR_ALLOC_FAIL,
 #ifdef CONFIG_NUMA
 	NUMA_HIT,		/* allocated in intended node */
 	NUMA_MISS,		/* allocated in non intended node */
--- linux-next.orig/mm/page_alloc.c	2011-04-28 21:34:34.000000000 +0800
+++ linux-next/mm/page_alloc.c	2011-04-28 21:34:35.000000000 +0800
@@ -2165,6 +2165,8 @@ rebalance:
 	}
 
 nopage:
+	inc_zone_state(preferred_zone, NR_ALLOC_FAIL);
+	/* count_zone_vm_events(PGALLOCFAIL, preferred_zone, 1 << order); */
 	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
 		unsigned int filter = SHOW_MEM_FILTER_NODES;
 
--- linux-next.orig/mm/vmstat.c	2011-04-28 21:34:30.000000000 +0800
+++ linux-next/mm/vmstat.c	2011-04-28 21:34:35.000000000 +0800
@@ -879,6 +879,7 @@ static const char * const vmstat_text[] 
 	"nr_shmem",
 	"nr_dirtied",
 	"nr_written",
+	"nr_alloc_fail",
 
 #ifdef CONFIG_NUMA
 	"numa_hit",


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] vmstat: account page allocation failures
  2011-04-28 13:38                     ` [patch] vmstat: account " Wu Fengguang
@ 2011-04-28 13:50                       ` KOSAKI Motohiro
  0 siblings, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2011-04-28 13:50 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: kosaki.motohiro, Andrew Morton, Minchan Kim, Dave Young, linux-mm,
	Linux Kernel Mailing List, Mel Gorman, KAMEZAWA Hiroyuki,
	Christoph Lameter, Dave Chinner, David Rientjes

>  nopage:
> +	inc_zone_state(preferred_zone, NR_ALLOC_FAIL);
> +	/* count_zone_vm_events(PGALLOCFAIL, preferred_zone, 1 << order); */
>  	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
>  		unsigned int filter = SHOW_MEM_FILTER_NODES;
>  
> --- linux-next.orig/mm/vmstat.c	2011-04-28 21:34:30.000000000 +0800
> +++ linux-next/mm/vmstat.c	2011-04-28 21:34:35.000000000 +0800
> @@ -879,6 +879,7 @@ static const char * const vmstat_text[] 
>  	"nr_shmem",
>  	"nr_dirtied",
>  	"nr_written",
> +	"nr_alloc_fail",

I'm using a very similar patch for debugging. However, this is useless for
admins, because a typical Linux load has plenty of GFP_ATOMIC allocation
failures. So a typical user has no way to tell whether the failure rate is
high or not.





^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-04-28 13:36                   ` [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures Wu Fengguang
  2011-04-28 13:38                     ` [patch] vmstat: account " Wu Fengguang
@ 2011-04-29  2:28                     ` Wu Fengguang
  2011-04-29  2:58                       ` Wu Fengguang
  2011-04-30 14:17                       ` Wu Fengguang
  2011-05-04  1:56                     ` Dave Young
  2 siblings, 2 replies; 43+ messages in thread
From: Wu Fengguang @ 2011-04-29  2:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Dave Young, linux-mm, Linux Kernel Mailing List,
	Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Christoph Lameter,
	Dave Chinner, David Rientjes

> Test results:
> 
> - the failure rate is pretty sensitive to the page reclaim size,
>   from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 10496 (SWAP_CLUSTER_MAX)
> 
> - the IPIs are reduced by over 100 times

It's reduced by 500 times indeed.

CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts
CAL:         93        463        410        540        298        282        272        306   Function call interrupts

> base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
> -------------------------------------------------------------------------------
> nr_alloc_fail 10496
> allocstall 1576602

> patched (WMARK_MIN)
> -------------------
> nr_alloc_fail 704
> allocstall 105551

> patched (WMARK_HIGH)
> --------------------
> nr_alloc_fail 282
> allocstall 53860

> this patch (WMARK_HIGH, limited scan)
> -------------------------------------
> nr_alloc_fail 276
> allocstall 54034

There is a bad side effect though: the much reduced "allocstall" count means
each direct reclaim will take much more time to complete. A simple solution
is to terminate direct reclaim after 10ms. I noticed that a 100ms
time threshold can reduce the reclaim latency from 621ms to 358ms.
Further lowering the time threshold to 20ms does not help reduce the
real latencies, though.

However, the (admittedly subjective) perception is that in such a heavy
1000-dd workload, the reduced reclaim latency hardly improves the overall
responsiveness.

base kernel
-----------

start time: 243
total time: 529

wfg@fat ~% getdelays -dip 3971
print delayacct stats ON
printing IO accounting
PID     3971


CPU             count     real total  virtual total    delay total
                  961     3176517096     3158468847   313952766099
IO              count    delay total  delay average
                    2      181251847             60ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                 1205    38120615476             31ms
dd: read=16384, write=0, cancelled_write=0
wfg@fat ~% getdelays -dip 3383
print delayacct stats ON
printing IO accounting
PID     3383


CPU             count     real total  virtual total    delay total
                 1270     4206360536     4181445838   358641985177
IO              count    delay total  delay average
                    0              0              0ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                 1606    39897314399             24ms
dd: read=0, write=0, cancelled_write=0

no time limit
-------------
wfg@fat ~% getdelays -dip `pidof dd`
print delayacct stats ON
printing IO accounting
PID     9609


CPU             count     real total  virtual total    delay total
                  865     2792575464     2779071029   235345541230
IO              count    delay total  delay average
                    4      300247552             60ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                   32    20504634169            621ms
dd: read=106496, write=0, cancelled_write=0

100ms limit
-----------

start time: 288
total time: 514
nr_alloc_fail 1269
allocstall 128915

wfg@fat ~% getdelays -dip `pidof dd`
print delayacct stats ON
printing IO accounting
PID     5077


CPU             count     real total  virtual total    delay total
                  937     2949551600     2935087806   207877301298
IO              count    delay total  delay average
                    1      151891691            151ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                   71    25475514278            358ms
dd: read=507904, write=0, cancelled_write=0

PID     5101


CPU             count     real total  virtual total    delay total
                 1201     3827418144     3805399187   221075772599
IO              count    delay total  delay average
                    4      300331997             60ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                   94    31996779648            336ms
dd: read=618496, write=0, cancelled_write=0

nr_alloc_fail 937
allocstall 128684

slabs_scanned 63616
kswapd_steal 4616011
kswapd_inodesteal 5
kswapd_low_wmark_hit_quickly 5394
kswapd_high_wmark_hit_quickly 2826
kswapd_skip_congestion_wait 0
pageoutrun 36679

20ms limit
----------

start time: 294
total time: 516
nr_alloc_fail 1662
allocstall 132101

CPU             count     real total  virtual total    delay total
                  839     2750581848     2734464704   198489159459
IO              count    delay total  delay average
                    1       43566814             43ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                   95    35234061367            370ms
dd: read=20480, write=0, cancelled_write=0

test script
-----------
tic=$(date +'%s')

for i in `seq 1000`
do
        truncate -s 1G /fs/sparse-$i
        dd if=/fs/sparse-$i of=/dev/null &>/dev/null &
done

tac=$(date +'%s')
echo start time: $((tac-tic))

wait

tac=$(date +'%s')
echo total time: $((tac-tic))

egrep '(nr_alloc_fail|allocstall)' /proc/vmstat
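
For reference, the per-task delay numbers above come from the getdelays
tool (Documentation/accounting/getdelays.c). A minimal sketch of collecting
them for the running dd tasks, assuming CONFIG_TASK_DELAY_ACCT is enabled;
the loop is only illustrative, not necessarily the exact invocation used:

	# -d: print delayacct stats, -i: print IO accounting, -p: pid
	for pid in $(pidof dd)
	do
		getdelays -dip $pid
	done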

Thanks,
Fengguang
---
Subject: mm: limit direct reclaim delays
Date: Fri Apr 29 09:04:11 CST 2011

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/vmscan.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

--- linux-next.orig/mm/vmscan.c	2011-04-29 09:02:42.000000000 +0800
+++ linux-next/mm/vmscan.c	2011-04-29 09:04:10.000000000 +0800
@@ -2037,6 +2037,7 @@ static unsigned long do_try_to_free_page
 	struct zone *zone;
 	unsigned long writeback_threshold;
 	unsigned long min_reclaim = sc->nr_to_reclaim;
+	unsigned long start_time = jiffies;
 
 	get_mems_allowed();
 	delayacct_freepages_start();
@@ -2070,11 +2071,14 @@ static unsigned long do_try_to_free_page
 			}
 		}
 		total_scanned += sc->nr_scanned;
-		if (sc->nr_reclaimed >= min_reclaim &&
-		    total_scanned > 2 * sc->nr_to_reclaim)
-			goto out;
-		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
-			goto out;
+		if (sc->nr_reclaimed >= min_reclaim) {
+			if (sc->nr_reclaimed >= sc->nr_to_reclaim)
+				goto out;
+			if (total_scanned > 2 * sc->nr_to_reclaim)
+				goto out;
+			if (jiffies - start_time > HZ / 100)
+				goto out;
+		}
 
 		/*
 		 * Try to write back as many pages as we just scanned.  This
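
Note that HZ/100 in the hunk above is the 10ms case; the 100ms and 20ms
runs earlier in this mail are assumed to simply use a different threshold
in the same check. A roughly equivalent, slightly more readable form,
sketch only (the helper below is not part of the posted patch):

	static inline bool reclaim_time_exceeded(unsigned long start,
						 unsigned int ms)
	{
		return time_after(jiffies, start + msecs_to_jiffies(ms));
	}

	/* the 10ms limit above then reads: */
	if (reclaim_time_exceeded(start_time, 10))
		goto out;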



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-04-29  2:28                     ` [RFC][PATCH] mm: cut down __GFP_NORETRY " Wu Fengguang
@ 2011-04-29  2:58                       ` Wu Fengguang
  2011-04-30 14:17                       ` Wu Fengguang
  1 sibling, 0 replies; 43+ messages in thread
From: Wu Fengguang @ 2011-04-29  2:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Dave Young, linux-mm, Linux Kernel Mailing List,
	Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Christoph Lameter,
	Dave Chinner, David Rientjes

Andrew,

I tested the more realistic 100-dd case. The results are:

- nr_alloc_fail: 892 => 146
- reclaim delay: 4ms => 68ms

Thanks,
Fengguang
---

base kernel, 100 dd
-------------------

start time: 3
total time: 52
nr_alloc_fail 892
allocstall 131341

2nd run (no reboot):

start time: 3
total time: 53
nr_alloc_fail 1555
allocstall 265718


CPU             count     real total  virtual total    delay total
                  962     3125524848     3113269116    37972729582
IO              count    delay total  delay average
                    3       25204838              8ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average    
                 1032     5130797747              4ms  

(IPIs accumulated in two runs)
CAL:      34898      35428      35182      35553      35320      35291      35298      35102   Function call interrupts

10ms limit, 100 dd
------------------

start time: 2
total time: 50
nr_alloc_fail 146
allocstall 10598

CPU             count     real total  virtual total    delay total
                 1038     3349490800     3331087137    40156395960
IO              count    delay total  delay average
                    0              0              0ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                   84     5795410854             68ms
dd: read=0, write=0, cancelled_write=0

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-04-29  2:28                     ` [RFC][PATCH] mm: cut down __GFP_NORETRY " Wu Fengguang
  2011-04-29  2:58                       ` Wu Fengguang
@ 2011-04-30 14:17                       ` Wu Fengguang
  2011-05-01 16:35                         ` Minchan Kim
  1 sibling, 1 reply; 43+ messages in thread
From: Wu Fengguang @ 2011-04-30 14:17 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman
  Cc: Minchan Kim, Dave Young, linux-mm, Linux Kernel Mailing List,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Christoph Lameter,
	Dave Chinner, David Rientjes

On Fri, Apr 29, 2011 at 10:28:24AM +0800, Wu Fengguang wrote:
> > Test results:
> > 
> > - the failure rate is pretty sensible to the page reclaim size,
> >   from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 10496 (SWAP_CLUSTER_MAX)
> > 
> > - the IPIs are reduced by over 100 times
> 
> It's reduced by 500 times indeed.
> 
> CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts
> CAL:         93        463        410        540        298        282        272        306   Function call interrupts
> 
> > base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
> > -------------------------------------------------------------------------------
> > nr_alloc_fail 10496
> > allocstall 1576602
> 
> > patched (WMARK_MIN)
> > -------------------
> > nr_alloc_fail 704
> > allocstall 105551
> 
> > patched (WMARK_HIGH)
> > --------------------
> > nr_alloc_fail 282
> > allocstall 53860
> 
> > this patch (WMARK_HIGH, limited scan)
> > -------------------------------------
> > nr_alloc_fail 276
> > allocstall 54034
> 
> There is a bad side effect though: the much reduced "allocstall" means
> each direct reclaim will take much more time to complete. A simple solution
> is to terminate direct reclaim after 10ms. I noticed that an 100ms
> time threshold can reduce the reclaim latency from 621ms to 358ms.
> Further lowering the time threshold to 20ms does not help reducing the
> real latencies though.

Experiments going on...

I tried a more reasonable termination condition: stop direct reclaim
once the preferred zone is above its high watermark (see the chunk below).

This helps reduce the average reclaim latency to under 100ms in the
1000-dd case.

However nr_alloc_fail is around 5000, which is not ideal. The interesting
thing is that even when the zone watermark looks fine, the task may still
fail to get a free page, presumably because concurrent allocators grab the
freed pages before this task reaches get_page_from_freelist().

@@ -2067,8 +2072,17 @@ static unsigned long do_try_to_free_page
                        }
                }
                total_scanned += sc->nr_scanned;
-               if (sc->nr_reclaimed >= sc->nr_to_reclaim)
-                       goto out;
+               if (sc->nr_reclaimed >= min_reclaim) {
+                       if (sc->nr_reclaimed >= sc->nr_to_reclaim)
+                               goto out;
+                       if (total_scanned > 2 * sc->nr_to_reclaim)
+                               goto out;
+                       if (preferred_zone &&
+                           zone_watermark_ok_safe(preferred_zone, sc->order,
+                                       high_wmark_pages(preferred_zone),
+                                       zone_idx(preferred_zone), 0))
+                               goto out;
+               }
               
                /*
                 * Try to write back as many pages as we just scanned.  This

Thanks,
Fengguang
---
Subject: mm: cut down __GFP_NORETRY page allocation failures
Date: Thu Apr 28 13:46:39 CST 2011

Concurrent page allocations are suffering from high failure rates.

On an 8p, 3GB RAM test box, when reading 1000 sparse files of size 1GB,
the page allocation failures are

nr_alloc_fail 733 	# interleaved reads by 1 single task
nr_alloc_fail 11799	# concurrent reads by 1000 tasks

The concurrent read test script is:

	for i in `seq 1000`
	do
		truncate -s 1G /fs/sparse-$i
		dd if=/fs/sparse-$i of=/dev/null &
	done

In order for get_page_from_freelist() to get a free page,

(1) try_to_free_pages() should use a much higher .nr_to_reclaim than the
    current SWAP_CLUSTER_MAX=32, in order to draw the zone out of a
    possible low watermark state as well as fill the pcp lists with enough
    free pages to overflow their high watermark.

(2) the get_page_from_freelist() call _after_ direct reclaim should use a
    lower watermark than its normal invocations, so that it can reasonably
    "reserve" some free pages for itself and prevent other concurrent
    page allocators from stealing all its reclaimed pages (see the
    watermark note below).
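
The watermark note: in this kernel, __zone_watermark_ok() lowers the
effective watermark for ALLOC_HARDER allocations, which is what gives the
post-reclaim get_page_from_freelist() its small reserve. Paraphrased for
context only; this is existing code, not part of this patch:

	/* inside __zone_watermark_ok(): */
	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;
	/* i.e. ALLOC_HARDER leaves roughly min/4 pages of headroom */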

Some notes:

- commit 9ee493ce ("mm: page allocator: drain per-cpu lists after direct
  reclaim allocation fails") has the same target, however it is obviously
  more costly and less effective. It seems cleaner to just remove the
  retry and drain code than to retain it.

- it's a bit hacky to reclaim more pages than requested inside
  do_try_to_free_pages(), and it won't help the cgroup case for now

- it only aims to reduce failures when there are plenty of reclaimable
  pages, so it stops the opportunistic reclaim once it has scanned twice
  the requested number of pages

Test results:

- the failure rate is pretty sensitive to the page reclaim size,
  from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 10496 (SWAP_CLUSTER_MAX)

- the IPIs are reduced by over 100 times

base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
-------------------------------------------------------------------------------
nr_alloc_fail 10496
allocstall 1576602

slabs_scanned 21632
kswapd_steal 4393382
kswapd_inodesteal 124
kswapd_low_wmark_hit_quickly 885
kswapd_high_wmark_hit_quickly 2321
kswapd_skip_congestion_wait 0
pageoutrun 29426

CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts

LOC:     536274     532529     531734     536801     536510     533676     534853     532038   Local timer interrupts
RES:       3032       2128       1792       1765       2184       1703       1754       1865   Rescheduling interrupts
TLB:        189         15         13         17         64        294         97         63   TLB shootdowns

patched (WMARK_MIN)
-------------------
nr_alloc_fail 704
allocstall 105551

slabs_scanned 33280
kswapd_steal 4525537
kswapd_inodesteal 187
kswapd_low_wmark_hit_quickly 4980
kswapd_high_wmark_hit_quickly 2573
kswapd_skip_congestion_wait 0
pageoutrun 35429

CAL:         93        286        396        754        272        297        275        281   Function call interrupts

LOC:     520550     517751     517043     522016     520302     518479     519329     517179   Local timer interrupts
RES:       2131       1371       1376       1269       1390       1181       1409       1280   Rescheduling interrupts
TLB:        280         26         27         30         65        305        134         75   TLB shootdowns

patched (WMARK_HIGH)
--------------------
nr_alloc_fail 282
allocstall 53860

slabs_scanned 23936
kswapd_steal 4561178
kswapd_inodesteal 0
kswapd_low_wmark_hit_quickly 2760
kswapd_high_wmark_hit_quickly 1748
kswapd_skip_congestion_wait 0
pageoutrun 32639

CAL:         93        463        410        540        298        282        272        306   Function call interrupts

LOC:     513956     510749     509890     514897     514300     512392     512825     510574   Local timer interrupts
RES:       1174       2081       1411       1320       1742       2683       1380       1230   Rescheduling interrupts
TLB:        274         21         19         22         57        317        131         61   TLB shootdowns

patched (WMARK_HIGH, limited scan)
----------------------------------
nr_alloc_fail 276
allocstall 54034

slabs_scanned 24320
kswapd_steal 4507482
kswapd_inodesteal 262
kswapd_low_wmark_hit_quickly 2638
kswapd_high_wmark_hit_quickly 1710
kswapd_skip_congestion_wait 0
pageoutrun 32182

CAL:         69        443        421        567        273        279        269        334   Function call interrupts

LOC:     514736     511698     510993     514069     514185     512986     513838     511229   Local timer interrupts
RES:       2153       1556       1126       1351       3047       1554       1131       1560   Rescheduling interrupts
TLB:        209         26         20         15         71        315        117         71   TLB shootdowns

patched (WMARK_HIGH, limited scan, stop on watermark OK), 100 dd
----------------------------------------------------------------

start time: 3
total time: 50
nr_alloc_fail 162
allocstall 45523

CPU             count     real total  virtual total    delay total
                  921     3024540200     3009244668    37123129525
IO              count    delay total  delay average
                    0              0              0ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                  357     4891766796             13ms
dd: read=0, write=0, cancelled_write=0

patched (WMARK_HIGH, limited scan, stop on watermark OK), 1000 dd
-----------------------------------------------------------------

start time: 272
total time: 509
nr_alloc_fail 3913
allocstall 541789

CPU             count     real total  virtual total    delay total
                 1044     3445476208     3437200482   229919915202
IO              count    delay total  delay average
                    0              0              0ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                  452    34691441605             76ms
dd: read=0, write=0, cancelled_write=0

patched (WMARK_HIGH, limited scan, stop on watermark OK, no time limit), 1000 dd
--------------------------------------------------------------------------------

start time: 278
total time: 513
nr_alloc_fail 4737
allocstall 436392


CPU             count     real total  virtual total    delay total
                 1024     3371487456     3359441487   225088210977
IO              count    delay total  delay average
                    1      160631171            160ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                  367    30809994722             83ms
dd: read=20480, write=0, cancelled_write=0


no cond_resched():

start time: 263
total time: 516
nr_alloc_fail 5144
allocstall 436787

CPU             count     real total  virtual total    delay total
                 1018     3305497488     3283831119   241982934044
IO              count    delay total  delay average
                    0              0              0ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                  328    31398481378             95ms
dd: read=0, write=0, cancelled_write=0

zone_watermark_ok_safe():

start time: 266
total time: 513
nr_alloc_fail 4526
allocstall 440246

CPU             count     real total  virtual total    delay total
                 1119     3640446568     3619184439   240945024724
IO              count    delay total  delay average
                    3      303620082            101ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                  372    27320731898             73ms
dd: read=77824, write=0, cancelled_write=0


start time: 275
total time: 517
nr_alloc_fail 4694
allocstall 431021


CPU             count     real total  virtual total    delay total
                 1073     3534462680     3512544928   234056498221
IO              count    delay total  delay average
                    0              0              0ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                  386    34751778363             89ms
dd: read=0, write=0, cancelled_write=0

CC: Mel Gorman <mel@linux.vnet.ibm.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/buffer.c          |    4 ++--
 include/linux/swap.h |    3 ++-
 mm/page_alloc.c      |   20 +++++---------------
 mm/vmscan.c          |   31 +++++++++++++++++++++++--------
 4 files changed, 32 insertions(+), 26 deletions(-)
--- linux-next.orig/mm/vmscan.c	2011-04-29 10:42:14.000000000 +0800
+++ linux-next/mm/vmscan.c	2011-04-30 21:59:33.000000000 +0800
@@ -2025,8 +2025,9 @@ static bool all_unreclaimable(struct zon
  * returns:	0, if no pages reclaimed
  * 		else, the number of pages reclaimed
  */
-static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
-					struct scan_control *sc)
+static unsigned long do_try_to_free_pages(struct zone *preferred_zone,
+					  struct zonelist *zonelist,
+					  struct scan_control *sc)
 {
 	int priority;
 	unsigned long total_scanned = 0;
@@ -2034,6 +2035,7 @@ static unsigned long do_try_to_free_page
 	struct zoneref *z;
 	struct zone *zone;
 	unsigned long writeback_threshold;
+	unsigned long min_reclaim = sc->nr_to_reclaim;
 
 	get_mems_allowed();
 	delayacct_freepages_start();
@@ -2041,6 +2043,9 @@ static unsigned long do_try_to_free_page
 	if (scanning_global_lru(sc))
 		count_vm_event(ALLOCSTALL);
 
+	if (preferred_zone)
+		sc->nr_to_reclaim += preferred_zone->watermark[WMARK_HIGH];
+
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		sc->nr_scanned = 0;
 		if (!priority)
@@ -2067,8 +2072,17 @@ static unsigned long do_try_to_free_page
 			}
 		}
 		total_scanned += sc->nr_scanned;
-		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
-			goto out;
+		if (sc->nr_reclaimed >= min_reclaim) {
+			if (sc->nr_reclaimed >= sc->nr_to_reclaim)
+				goto out;
+			if (total_scanned > 2 * sc->nr_to_reclaim)
+				goto out;
+			if (preferred_zone &&
+			    zone_watermark_ok_safe(preferred_zone, sc->order,
+					high_wmark_pages(preferred_zone),
+					zone_idx(preferred_zone), 0))
+				goto out;
+		}
 
 		/*
 		 * Try to write back as many pages as we just scanned.  This
@@ -2117,7 +2131,8 @@ out:
 	return 0;
 }
 
-unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
+unsigned long try_to_free_pages(struct zone *preferred_zone,
+				struct zonelist *zonelist, int order,
 				gfp_t gfp_mask, nodemask_t *nodemask)
 {
 	unsigned long nr_reclaimed;
@@ -2137,7 +2152,7 @@ unsigned long try_to_free_pages(struct z
 				sc.may_writepage,
 				gfp_mask);
 
-	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+	nr_reclaimed = do_try_to_free_pages(preferred_zone, zonelist, &sc);
 
 	trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
 
@@ -2207,7 +2222,7 @@ unsigned long try_to_free_mem_cgroup_pag
 					    sc.may_writepage,
 					    sc.gfp_mask);
 
-	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+	nr_reclaimed = do_try_to_free_pages(NULL, zonelist, &sc);
 
 	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
 
@@ -2796,7 +2811,7 @@ unsigned long shrink_all_memory(unsigned
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
-	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+	nr_reclaimed = do_try_to_free_pages(NULL, zonelist, &sc);
 
 	p->reclaim_state = NULL;
 	lockdep_clear_current_reclaim_state();
--- linux-next.orig/mm/page_alloc.c	2011-04-29 10:42:15.000000000 +0800
+++ linux-next/mm/page_alloc.c	2011-04-30 21:29:40.000000000 +0800
@@ -1888,9 +1888,8 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
 	int migratetype, unsigned long *did_some_progress)
 {
-	struct page *page = NULL;
+	struct page *page;
 	struct reclaim_state reclaim_state;
-	bool drained = false;
 
 	cond_resched();
 
@@ -1901,7 +1900,8 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 	reclaim_state.reclaimed_slab = 0;
 	current->reclaim_state = &reclaim_state;
 
-	*did_some_progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask);
+	*did_some_progress = try_to_free_pages(preferred_zone, zonelist, order,
+					       gfp_mask, nodemask);
 
 	current->reclaim_state = NULL;
 	lockdep_clear_current_reclaim_state();
@@ -1912,22 +1912,12 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 	if (unlikely(!(*did_some_progress)))
 		return NULL;
 
-retry:
+	alloc_flags |= ALLOC_HARDER;
+
 	page = get_page_from_freelist(gfp_mask, nodemask, order,
 					zonelist, high_zoneidx,
 					alloc_flags, preferred_zone,
 					migratetype);
-
-	/*
-	 * If an allocation failed after direct reclaim, it could be because
-	 * pages are pinned on the per-cpu lists. Drain them and try again
-	 */
-	if (!page && !drained) {
-		drain_all_pages();
-		drained = true;
-		goto retry;
-	}
-
 	return page;
 }
 
--- linux-next.orig/fs/buffer.c	2011-04-30 13:26:57.000000000 +0800
+++ linux-next/fs/buffer.c	2011-04-30 13:29:08.000000000 +0800
@@ -288,8 +288,8 @@ static void free_more_memory(void)
 						gfp_zone(GFP_NOFS), NULL,
 						&zone);
 		if (zone)
-			try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0,
-						GFP_NOFS, NULL);
+			try_to_free_pages(zone, node_zonelist(nid, GFP_NOFS),
+					  0, GFP_NOFS, NULL);
 	}
 }
 
--- linux-next.orig/include/linux/swap.h	2011-04-30 13:30:36.000000000 +0800
+++ linux-next/include/linux/swap.h	2011-04-30 13:31:03.000000000 +0800
@@ -249,7 +249,8 @@ static inline void lru_cache_add_file(st
 #define ISOLATE_BOTH 2		/* Isolate both active and inactive pages. */
 
 /* linux/mm/vmscan.c */
-extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
+extern unsigned long try_to_free_pages(struct zone *preferred_zone,
+					struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
 						  gfp_t gfp_mask, bool noswap,


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-04-30 14:17                       ` Wu Fengguang
@ 2011-05-01 16:35                         ` Minchan Kim
  2011-05-01 16:37                           ` Minchan Kim
                                             ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Minchan Kim @ 2011-05-01 16:35 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Mel Gorman, Dave Young, linux-mm,
	Linux Kernel Mailing List, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Christoph Lameter, Dave Chinner, David Rientjes

Hi Wu, 

On Sat, Apr 30, 2011 at 10:17:41PM +0800, Wu Fengguang wrote:
> On Fri, Apr 29, 2011 at 10:28:24AM +0800, Wu Fengguang wrote:
> > > Test results:
> > > 
> > > - the failure rate is pretty sensible to the page reclaim size,
> > >   from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 10496 (SWAP_CLUSTER_MAX)
> > > 
> > > - the IPIs are reduced by over 100 times
> > 
> > It's reduced by 500 times indeed.
> > 
> > CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts
> > CAL:         93        463        410        540        298        282        272        306   Function call interrupts
> > 
> > > base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
> > > -------------------------------------------------------------------------------
> > > nr_alloc_fail 10496
> > > allocstall 1576602
> > 
> > > patched (WMARK_MIN)
> > > -------------------
> > > nr_alloc_fail 704
> > > allocstall 105551
> > 
> > > patched (WMARK_HIGH)
> > > --------------------
> > > nr_alloc_fail 282
> > > allocstall 53860
> > 
> > > this patch (WMARK_HIGH, limited scan)
> > > -------------------------------------
> > > nr_alloc_fail 276
> > > allocstall 54034
> > 
> > There is a bad side effect though: the much reduced "allocstall" means
> > each direct reclaim will take much more time to complete. A simple solution
> > is to terminate direct reclaim after 10ms. I noticed that an 100ms
> > time threshold can reduce the reclaim latency from 621ms to 358ms.
> > Further lowering the time threshold to 20ms does not help reducing the
> > real latencies though.
> 
> Experiments going on...
> 
> I tried the more reasonable terminate condition: stop direct reclaim
> when the preferred zone is above high watermark (see the below chunk).
> 
> This helps reduce the average reclaim latency to under 100ms in the
> 1000-dd case.
> 
> However nr_alloc_fail is around 5000 and not ideal. The interesting
> thing is, even if zone watermark is high, the task still may fail to
> get a free page..
> 
> @@ -2067,8 +2072,17 @@ static unsigned long do_try_to_free_page
>                         }
>                 }
>                 total_scanned += sc->nr_scanned;
> -               if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> -                       goto out;
> +               if (sc->nr_reclaimed >= min_reclaim) {
> +                       if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> +                               goto out;
> +                       if (total_scanned > 2 * sc->nr_to_reclaim)
> +                               goto out;
> +                       if (preferred_zone &&
> +                           zone_watermark_ok_safe(preferred_zone, sc->order,
> +                                       high_wmark_pages(preferred_zone),
> +                                       zone_idx(preferred_zone), 0))
> +                               goto out;
> +               }
>                
>                 /*
>                  * Try to write back as many pages as we just scanned.  This
> 
> Thanks,
> Fengguang
> ---
> Subject: mm: cut down __GFP_NORETRY page allocation failures
> Date: Thu Apr 28 13:46:39 CST 2011
> 
> Concurrent page allocations are suffering from high failure rates.
> 
> On a 8p, 3GB ram test box, when reading 1000 sparse files of size 1GB,
> the page allocation failures are
> 
> nr_alloc_fail 733 	# interleaved reads by 1 single task
> nr_alloc_fail 11799	# concurrent reads by 1000 tasks
> 
> The concurrent read test script is:
> 
> 	for i in `seq 1000`
> 	do
> 		truncate -s 1G /fs/sparse-$i
> 		dd if=/fs/sparse-$i of=/dev/null &
> 	done
> 
> In order for get_page_from_freelist() to get free page,
> 
> (1) try_to_free_pages() should use much higher .nr_to_reclaim than the
>     current SWAP_CLUSTER_MAX=32, in order to draw the zone out of the
>     possible low watermark state as well as fill the pcp with enough free
>     pages to overflow its high watermark.
> 
> (2) the get_page_from_freelist() _after_ direct reclaim should use lower
>     watermark than its normal invocations, so that it can reasonably
>     "reserve" some free pages for itself and prevent other concurrent
>     page allocators stealing all its reclaimed pages.

Do you see my old patch? The patch want't incomplet but it's not bad for showing an idea.
http://marc.info/?l=linux-mm&m=129187231129887&w=4
The idea is to keep at least one page for the direct reclaiming process.
Could it mitigate your problem, or could you enhance the idea?
I think it's a very simple and fair solution.

> 
> Some notes:
> 
> - commit 9ee493ce ("mm: page allocator: drain per-cpu lists after direct
>   reclaim allocation fails") has the same target, however is obviously
>   costly and less effective. It seems more clean to just remove the
>   retry and drain code than to retain it.

Tend to agree.
My old patch can solve it, I think.

> 
> - it's a bit hacky to reclaim more than requested pages inside
>   do_try_to_free_page(), and it won't help cgroup for now
> 
> - it only aims to reduce failures when there are plenty of reclaimable
>   pages, so it stops the opportunistic reclaim when scanned 2 times pages
> 
> Test results:
> 
> - the failure rate is pretty sensible to the page reclaim size,
>   from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 10496 (SWAP_CLUSTER_MAX)
> 
> - the IPIs are reduced by over 100 times
> 
> base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
> -------------------------------------------------------------------------------
> nr_alloc_fail 10496
> allocstall 1576602
> 
> slabs_scanned 21632
> kswapd_steal 4393382
> kswapd_inodesteal 124
> kswapd_low_wmark_hit_quickly 885
> kswapd_high_wmark_hit_quickly 2321
> kswapd_skip_congestion_wait 0
> pageoutrun 29426
> 
> CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts
> 
> LOC:     536274     532529     531734     536801     536510     533676     534853     532038   Local timer interrupts
> RES:       3032       2128       1792       1765       2184       1703       1754       1865   Rescheduling interrupts
> TLB:        189         15         13         17         64        294         97         63   TLB shootdowns
> 
> patched (WMARK_MIN)
> -------------------
> nr_alloc_fail 704
> allocstall 105551
> 
> slabs_scanned 33280
> kswapd_steal 4525537
> kswapd_inodesteal 187
> kswapd_low_wmark_hit_quickly 4980
> kswapd_high_wmark_hit_quickly 2573
> kswapd_skip_congestion_wait 0
> pageoutrun 35429
> 
> CAL:         93        286        396        754        272        297        275        281   Function call interrupts
> 
> LOC:     520550     517751     517043     522016     520302     518479     519329     517179   Local timer interrupts
> RES:       2131       1371       1376       1269       1390       1181       1409       1280   Rescheduling interrupts
> TLB:        280         26         27         30         65        305        134         75   TLB shootdowns
> 
> patched (WMARK_HIGH)
> --------------------
> nr_alloc_fail 282
> allocstall 53860
> 
> slabs_scanned 23936
> kswapd_steal 4561178
> kswapd_inodesteal 0
> kswapd_low_wmark_hit_quickly 2760
> kswapd_high_wmark_hit_quickly 1748
> kswapd_skip_congestion_wait 0
> pageoutrun 32639
> 
> CAL:         93        463        410        540        298        282        272        306   Function call interrupts
> 
> LOC:     513956     510749     509890     514897     514300     512392     512825     510574   Local timer interrupts
> RES:       1174       2081       1411       1320       1742       2683       1380       1230   Rescheduling interrupts
> TLB:        274         21         19         22         57        317        131         61   TLB shootdowns
> 
> patched (WMARK_HIGH, limited scan)
> ----------------------------------
> nr_alloc_fail 276
> allocstall 54034
> 
> slabs_scanned 24320
> kswapd_steal 4507482
> kswapd_inodesteal 262
> kswapd_low_wmark_hit_quickly 2638
> kswapd_high_wmark_hit_quickly 1710
> kswapd_skip_congestion_wait 0
> pageoutrun 32182
> 
> CAL:         69        443        421        567        273        279        269        334   Function call interrupts

Looks amazing.

> 
> LOC:     514736     511698     510993     514069     514185     512986     513838     511229   Local timer interrupts
> RES:       2153       1556       1126       1351       3047       1554       1131       1560   Rescheduling interrupts
> TLB:        209         26         20         15         71        315        117         71   TLB shootdowns
> 
> patched (WMARK_HIGH, limited scan, stop on watermark OK), 100 dd
> ----------------------------------------------------------------
> 
> start time: 3
> total time: 50
> nr_alloc_fail 162
> allocstall 45523
> 
> CPU             count     real total  virtual total    delay total
>                   921     3024540200     3009244668    37123129525
> IO              count    delay total  delay average
>                     0              0              0ms
> SWAP            count    delay total  delay average
>                     0              0              0ms
> RECLAIM         count    delay total  delay average
>                   357     4891766796             13ms
> dd: read=0, write=0, cancelled_write=0
> 
> patched (WMARK_HIGH, limited scan, stop on watermark OK), 1000 dd
> -----------------------------------------------------------------
> 
> start time: 272
> total time: 509
> nr_alloc_fail 3913
> allocstall 541789
> 
> CPU             count     real total  virtual total    delay total
>                  1044     3445476208     3437200482   229919915202
> IO              count    delay total  delay average
>                     0              0              0ms
> SWAP            count    delay total  delay average
>                     0              0              0ms
> RECLAIM         count    delay total  delay average
>                   452    34691441605             76ms
> dd: read=0, write=0, cancelled_write=0
> 
> patched (WMARK_HIGH, limited scan, stop on watermark OK, no time limit), 1000 dd
> --------------------------------------------------------------------------------
> 
> start time: 278
> total time: 513
> nr_alloc_fail 4737
> allocstall 436392
> 
> 
> CPU             count     real total  virtual total    delay total
>                  1024     3371487456     3359441487   225088210977
> IO              count    delay total  delay average
>                     1      160631171            160ms
> SWAP            count    delay total  delay average
>                     0              0              0ms
> RECLAIM         count    delay total  delay average
>                   367    30809994722             83ms
> dd: read=20480, write=0, cancelled_write=0
> 
> 
> no cond_resched():

What's this?

> 
> start time: 263
> total time: 516
> nr_alloc_fail 5144
> allocstall 436787
> 
> CPU             count     real total  virtual total    delay total
>                  1018     3305497488     3283831119   241982934044
> IO              count    delay total  delay average
>                     0              0              0ms
> SWAP            count    delay total  delay average
>                     0              0              0ms
> RECLAIM         count    delay total  delay average
>                   328    31398481378             95ms
> dd: read=0, write=0, cancelled_write=0
> 
> zone_watermark_ok_safe():
> 
> start time: 266
> total time: 513
> nr_alloc_fail 4526
> allocstall 440246
> 
> CPU             count     real total  virtual total    delay total
>                  1119     3640446568     3619184439   240945024724
> IO              count    delay total  delay average
>                     3      303620082            101ms
> SWAP            count    delay total  delay average
>                     0              0              0ms
> RECLAIM         count    delay total  delay average
>                   372    27320731898             73ms
> dd: read=77824, write=0, cancelled_write=0
> 
> 
> start time: 275

What's the meaning of start time?

> total time: 517

Is total time the elapsed time of your experiment?

> nr_alloc_fail 4694
> allocstall 431021
> 
> 
> CPU             count     real total  virtual total    delay total
>                  1073     3534462680     3512544928   234056498221

What's the meaning of the CPU fields?

> IO              count    delay total  delay average
>                     0              0              0ms
> SWAP            count    delay total  delay average
>                     0              0              0ms
> RECLAIM         count    delay total  delay average
>                   386    34751778363             89ms
> dd: read=0, write=0, cancelled_write=0
> 

Where is the vanilla data for comparing latency?
Personally, I find your data hard to parse.


> CC: Mel Gorman <mel@linux.vnet.ibm.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  fs/buffer.c          |    4 ++--
>  include/linux/swap.h |    3 ++-
>  mm/page_alloc.c      |   20 +++++---------------
>  mm/vmscan.c          |   31 +++++++++++++++++++++++--------
>  4 files changed, 32 insertions(+), 26 deletions(-)
> --- linux-next.orig/mm/vmscan.c	2011-04-29 10:42:14.000000000 +0800
> +++ linux-next/mm/vmscan.c	2011-04-30 21:59:33.000000000 +0800
> @@ -2025,8 +2025,9 @@ static bool all_unreclaimable(struct zon
>   * returns:	0, if no pages reclaimed
>   * 		else, the number of pages reclaimed
>   */
> -static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
> -					struct scan_control *sc)
> +static unsigned long do_try_to_free_pages(struct zone *preferred_zone,
> +					  struct zonelist *zonelist,
> +					  struct scan_control *sc)
>  {
>  	int priority;
>  	unsigned long total_scanned = 0;
> @@ -2034,6 +2035,7 @@ static unsigned long do_try_to_free_page
>  	struct zoneref *z;
>  	struct zone *zone;
>  	unsigned long writeback_threshold;
> +	unsigned long min_reclaim = sc->nr_to_reclaim;

Hmm,

>  
>  	get_mems_allowed();
>  	delayacct_freepages_start();
> @@ -2041,6 +2043,9 @@ static unsigned long do_try_to_free_page
>  	if (scanning_global_lru(sc))
>  		count_vm_event(ALLOCSTALL);
>  
> +	if (preferred_zone)
> +		sc->nr_to_reclaim += preferred_zone->watermark[WMARK_HIGH];
> +

Hmm, I don't like this idea.
The goal of the direct reclaim path is to reclaim pages asap, I believe.
Many things should be achieved by background kswapd instead.
If the admin changes min_free_kbytes, it can affect the latency of direct
reclaim. That doesn't make sense to me.
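
To be concrete, the watermarks are derived from min_free_kbytes, roughly
(paraphrasing __setup_per_zone_wmarks(); each zone gets its proportional
share):

	zone->watermark[WMARK_MIN]  = min;
	zone->watermark[WMARK_LOW]  = min + (min >> 2);	/* min + 25% */
	zone->watermark[WMARK_HIGH] = min + (min >> 1);	/* min + 50% */

so "sc->nr_to_reclaim += preferred_zone->watermark[WMARK_HIGH]" scales the
direct reclaim work with that tunable.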


>  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
>  		sc->nr_scanned = 0;
>  		if (!priority)
> @@ -2067,8 +2072,17 @@ static unsigned long do_try_to_free_page
>  			}
>  		}
>  		total_scanned += sc->nr_scanned;
> -		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> -			goto out;
> +		if (sc->nr_reclaimed >= min_reclaim) {
> +			if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> +				goto out;

I can't understand the logic.
If nr_reclaimed is bigger than min_reclaim, isn't it always greater than
nr_to_reclaim as well? What's the meaning of min_reclaim?


> +			if (total_scanned > 2 * sc->nr_to_reclaim)
> +				goto out;

What if there are lots of dirty pages in the LRU?
What if there are lots of unevictable pages in the LRU?
What if there are lots of mapped pages in the LRU but may_unmap = 0?
I mean, it's a rather risky early conclusion.


> +			if (preferred_zone &&
> +			    zone_watermark_ok_safe(preferred_zone, sc->order,
> +					high_wmark_pages(preferred_zone),
> +					zone_idx(preferred_zone), 0))
> +				goto out;
> +		}

As I said, I think the direct reclaim path should be fast if possible, and
it should not be a function of min_free_kbytes.
Of course, there are lots of obstacles to keeping the direct reclaim path's
latency consistent, but at least I don't want to add another source of them.


-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-01 16:35                         ` Minchan Kim
@ 2011-05-01 16:37                           ` Minchan Kim
  2011-05-02 10:14                             ` KOSAKI Motohiro
  2011-05-02 10:29                           ` Wu Fengguang
  2011-05-02 13:29                           ` Wu Fengguang
  2 siblings, 1 reply; 43+ messages in thread
From: Minchan Kim @ 2011-05-01 16:37 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Mel Gorman, Dave Young, linux-mm,
	Linux Kernel Mailing List, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Christoph Lameter, Dave Chinner, David Rientjes

On Mon, May 02, 2011 at 01:35:42AM +0900, Minchan Kim wrote:
 
> Do you see my old patch? The patch want't incomplet but it's not bad for showing an idea.
                                     ^^^^^^^^^^^^^^^^
                              typo : wasn't complete
-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-01 16:37                           ` Minchan Kim
@ 2011-05-02 10:14                             ` KOSAKI Motohiro
  2011-05-03  0:53                               ` Minchan Kim
  0 siblings, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2011-05-02 10:14 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Wu Fengguang, Andrew Morton, Mel Gorman,
	Dave Young, linux-mm, Linux Kernel Mailing List,
	KAMEZAWA Hiroyuki, Christoph Lameter, Dave Chinner,
	David Rientjes

> On Mon, May 02, 2011 at 01:35:42AM +0900, Minchan Kim wrote:
>  
> > Do you see my old patch? The patch want't incomplet but it's not bad for showing an idea.
>                                      ^^^^^^^^^^^^^^^^
>                               typo : wasn't complete

I think your idea is sound. Wu's approach may increase throughput but
may worsen latency. So, do you have a plan to finish the work?




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-01 16:35                         ` Minchan Kim
  2011-05-01 16:37                           ` Minchan Kim
@ 2011-05-02 10:29                           ` Wu Fengguang
  2011-05-02 11:08                             ` Wu Fengguang
  2011-05-03  0:49                             ` Minchan Kim
  2011-05-02 13:29                           ` Wu Fengguang
  2 siblings, 2 replies; 43+ messages in thread
From: Wu Fengguang @ 2011-05-02 10:29 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Mel Gorman, Dave Young, linux-mm,
	Linux Kernel Mailing List, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Christoph Lameter, Dave Chinner, David Rientjes, Li Shaohua,
	Hugh Dickins


Hi Minchan,

On Mon, May 02, 2011 at 12:35:42AM +0800, Minchan Kim wrote:
> Hi Wu,
> 
> On Sat, Apr 30, 2011 at 10:17:41PM +0800, Wu Fengguang wrote:
> > On Fri, Apr 29, 2011 at 10:28:24AM +0800, Wu Fengguang wrote:
> > > > Test results:
> > > >
> > > > - the failure rate is pretty sensible to the page reclaim size,
> > > >   from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 10496 (SWAP_CLUSTER_MAX)
> > > >
> > > > - the IPIs are reduced by over 100 times
> > >
> > > It's reduced by 500 times indeed.
> > >
> > > CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts
> > > CAL:         93        463        410        540        298        282        272        306   Function call interrupts
> > >
> > > > base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
> > > > -------------------------------------------------------------------------------
> > > > nr_alloc_fail 10496
> > > > allocstall 1576602
> > >
> > > > patched (WMARK_MIN)
> > > > -------------------
> > > > nr_alloc_fail 704
> > > > allocstall 105551
> > >
> > > > patched (WMARK_HIGH)
> > > > --------------------
> > > > nr_alloc_fail 282
> > > > allocstall 53860
> > >
> > > > this patch (WMARK_HIGH, limited scan)
> > > > -------------------------------------
> > > > nr_alloc_fail 276
> > > > allocstall 54034
> > >
> > > There is a bad side effect though: the much reduced "allocstall" means
> > > each direct reclaim will take much more time to complete. A simple solution
> > > is to terminate direct reclaim after 10ms. I noticed that an 100ms
> > > time threshold can reduce the reclaim latency from 621ms to 358ms.
> > > Further lowering the time threshold to 20ms does not help reducing the
> > > real latencies though.
> >
> > Experiments going on...
> >
> > I tried the more reasonable terminate condition: stop direct reclaim
> > when the preferred zone is above high watermark (see the below chunk).
> >
> > This helps reduce the average reclaim latency to under 100ms in the
> > 1000-dd case.
> >
> > However nr_alloc_fail is around 5000 and not ideal. The interesting
> > thing is, even if zone watermark is high, the task still may fail to
> > get a free page..
> >
> > @@ -2067,8 +2072,17 @@ static unsigned long do_try_to_free_page
> >                         }
> >                 }
> >                 total_scanned += sc->nr_scanned;
> > -               if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> > -                       goto out;
> > +               if (sc->nr_reclaimed >= min_reclaim) {
> > +                       if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> > +                               goto out;
> > +                       if (total_scanned > 2 * sc->nr_to_reclaim)
> > +                               goto out;
> > +                       if (preferred_zone &&
> > +                           zone_watermark_ok_safe(preferred_zone, sc->order,
> > +                                       high_wmark_pages(preferred_zone),
> > +                                       zone_idx(preferred_zone), 0))
> > +                               goto out;
> > +               }
> >
> >                 /*
> >                  * Try to write back as many pages as we just scanned.  This
> >
> > Thanks,
> > Fengguang
> > ---
> > Subject: mm: cut down __GFP_NORETRY page allocation failures
> > Date: Thu Apr 28 13:46:39 CST 2011
> >
> > Concurrent page allocations are suffering from high failure rates.
> >
> > On a 8p, 3GB ram test box, when reading 1000 sparse files of size 1GB,
> > the page allocation failures are
> >
> > nr_alloc_fail 733     # interleaved reads by 1 single task
> > nr_alloc_fail 11799   # concurrent reads by 1000 tasks
> >
> > The concurrent read test script is:
> >
> >       for i in `seq 1000`
> >       do
> >               truncate -s 1G /fs/sparse-$i
> >               dd if=/fs/sparse-$i of=/dev/null &
> >       done
> >
> > In order for get_page_from_freelist() to get free page,
> >
> > (1) try_to_free_pages() should use much higher .nr_to_reclaim than the
> >     current SWAP_CLUSTER_MAX=32, in order to draw the zone out of the
> >     possible low watermark state as well as fill the pcp with enough free
> >     pages to overflow its high watermark.
> >
> > (2) the get_page_from_freelist() _after_ direct reclaim should use lower
> >     watermark than its normal invocations, so that it can reasonably
> >     "reserve" some free pages for itself and prevent other concurrent
> >     page allocators stealing all its reclaimed pages.
> 
> Do you see my old patch? The patch want't incomplet but it's not bad for showing an idea.
> http://marc.info/?l=linux-mm&m=129187231129887&w=4
> The idea is to keep a page at leat for direct reclaimed process.
> Could it mitigate your problem or could you enhacne the idea?
> I think it's very simple and fair solution.

No, it's not helping my problem; nr_alloc_fail and CAL are still high:

root@fat /home/wfg# ./test-dd-sparse.sh
start time: 246
total time: 531
nr_alloc_fail 14097
allocstall 1578332
LOC:     542698     538947     536986     567118     552114     539605     541201     537623   Local timer interrupts
RES:       3368       1908       1474       1476       2809       1602       1500       1509   Rescheduling interrupts
CAL:     223844     224198     224268     224436     223952     224056     223700     223743   Function call interrupts
TLB:        381         27         22         19         96        404        111         67   TLB shootdowns

root@fat /home/wfg# getdelays -dip `pidof dd`
print delayacct stats ON
printing IO accounting
PID     5202


CPU             count     real total  virtual total    delay total
                 1132     3635447328     3627947550   276722091605
IO              count    delay total  delay average
                    2      187809974             62ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                 1334    35304580824             26ms
dd: read=278528, write=0, cancelled_write=0

I guess your patch mainly fixes high-order allocations, while my
workload is mainly order-0 readahead page allocations. There are
1000 forks, however, and the "start time: 246" seems to indicate that
the order-1 reclaim latency (presumably the fork-time kernel stack
allocations) is not improved either.

I'll try modifying your patch and see how it works out. The obvious
change is to apply it to the order-0 case too. I hope this won't create
too many more isolated pages.

Attached is your patch rebased to 2.6.39-rc3, after resolving some
merge conflicts and fixing a trivial NULL pointer bug.

> >
> > Some notes:
> >
> > - commit 9ee493ce ("mm: page allocator: drain per-cpu lists after direct
> >   reclaim allocation fails") has the same target, however is obviously
> >   costly and less effective. It seems more clean to just remove the
> >   retry and drain code than to retain it.
> 
> Tend to agree.
> My old patch can solve it, I think.

Sadly nope. See above.

> >
> > - it's a bit hacky to reclaim more than requested pages inside
> >   do_try_to_free_page(), and it won't help cgroup for now
> >
> > - it only aims to reduce failures when there are plenty of reclaimable
> >   pages, so it stops the opportunistic reclaim when scanned 2 times pages
> >
> > Test results:
> >
> > - the failure rate is pretty sensible to the page reclaim size,
> >   from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 10496 (SWAP_CLUSTER_MAX)
> >
> > - the IPIs are reduced by over 100 times
> >
> > base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
> > -------------------------------------------------------------------------------
> > nr_alloc_fail 10496
> > allocstall 1576602
> >
> > slabs_scanned 21632
> > kswapd_steal 4393382
> > kswapd_inodesteal 124
> > kswapd_low_wmark_hit_quickly 885
> > kswapd_high_wmark_hit_quickly 2321
> > kswapd_skip_congestion_wait 0
> > pageoutrun 29426
> >
> > CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts
> >
> > LOC:     536274     532529     531734     536801     536510     533676     534853     532038   Local timer interrupts
> > RES:       3032       2128       1792       1765       2184       1703       1754       1865   Rescheduling interrupts
> > TLB:        189         15         13         17         64        294         97         63   TLB shootdowns
> >
> > patched (WMARK_MIN)
> > -------------------
> > nr_alloc_fail 704
> > allocstall 105551
> >
> > slabs_scanned 33280
> > kswapd_steal 4525537
> > kswapd_inodesteal 187
> > kswapd_low_wmark_hit_quickly 4980
> > kswapd_high_wmark_hit_quickly 2573
> > kswapd_skip_congestion_wait 0
> > pageoutrun 35429
> >
> > CAL:         93        286        396        754        272        297        275        281   Function call interrupts
> >
> > LOC:     520550     517751     517043     522016     520302     518479     519329     517179   Local timer interrupts
> > RES:       2131       1371       1376       1269       1390       1181       1409       1280   Rescheduling interrupts
> > TLB:        280         26         27         30         65        305        134         75   TLB shootdowns
> >
> > patched (WMARK_HIGH)
> > --------------------
> > nr_alloc_fail 282
> > allocstall 53860
> >
> > slabs_scanned 23936
> > kswapd_steal 4561178
> > kswapd_inodesteal 0
> > kswapd_low_wmark_hit_quickly 2760
> > kswapd_high_wmark_hit_quickly 1748
> > kswapd_skip_congestion_wait 0
> > pageoutrun 32639
> >
> > CAL:         93        463        410        540        298        282        272        306   Function call interrupts
> >
> > LOC:     513956     510749     509890     514897     514300     512392     512825     510574   Local timer interrupts
> > RES:       1174       2081       1411       1320       1742       2683       1380       1230   Rescheduling interrupts
> > TLB:        274         21         19         22         57        317        131         61   TLB shootdowns
> >
> > patched (WMARK_HIGH, limited scan)
> > ----------------------------------
> > nr_alloc_fail 276
> > allocstall 54034
> >
> > slabs_scanned 24320
> > kswapd_steal 4507482
> > kswapd_inodesteal 262
> > kswapd_low_wmark_hit_quickly 2638
> > kswapd_high_wmark_hit_quickly 1710
> > kswapd_skip_congestion_wait 0
> > pageoutrun 32182
> >
> > CAL:         69        443        421        567        273        279        269        334   Function call interrupts
> 
> Looks amazing.

Yeah, I have strong feelings against drain_all_pages() in the direct
reclaim path. The intuition is that once drain_all_pages() is called,
later direct reclaims will have less chance to refill the drained per-cpu
lists and are therefore forced into drain_all_pages() again and again.

drain_all_pages() is probably overkill for preventing OOM.
Generally speaking, it's questionable to "squeeze out the last page
before OOM".

A typical desktop enters thrashing storms before OOM; as Hugh pointed
out, this may well not be what end users want. I agree with him and
personally would prefer some applications to be OOM killed rather than
have the whole system become unusable, thrashing like mad.

> > LOC:     514736     511698     510993     514069     514185     512986     513838     511229   Local timer interrupts
> > RES:       2153       1556       1126       1351       3047       1554       1131       1560   Rescheduling interrupts
> > TLB:        209         26         20         15         71        315        117         71   TLB shootdowns
> >
> > patched (WMARK_HIGH, limited scan, stop on watermark OK), 100 dd
> > ----------------------------------------------------------------
> >
> > start time: 3
> > total time: 50
> > nr_alloc_fail 162
> > allocstall 45523
> >
> > CPU             count     real total  virtual total    delay total
> >                   921     3024540200     3009244668    37123129525
> > IO              count    delay total  delay average
> >                     0              0              0ms
> > SWAP            count    delay total  delay average
> >                     0              0              0ms
> > RECLAIM         count    delay total  delay average
> >                   357     4891766796             13ms
> > dd: read=0, write=0, cancelled_write=0
> >
> > patched (WMARK_HIGH, limited scan, stop on watermark OK), 1000 dd
> > -----------------------------------------------------------------
> >
> > start time: 272
> > total time: 509
> > nr_alloc_fail 3913
> > allocstall 541789
> >
> > CPU             count     real total  virtual total    delay total
> >                  1044     3445476208     3437200482   229919915202
> > IO              count    delay total  delay average
> >                     0              0              0ms
> > SWAP            count    delay total  delay average
> >                     0              0              0ms
> > RECLAIM         count    delay total  delay average
> >                   452    34691441605             76ms
> > dd: read=0, write=0, cancelled_write=0
> >
> > patched (WMARK_HIGH, limited scan, stop on watermark OK, no time limit), 1000 dd
> > --------------------------------------------------------------------------------
> >
> > start time: 278
> > total time: 513
> > nr_alloc_fail 4737
> > allocstall 436392
> >
> >
> > CPU             count     real total  virtual total    delay total
> >                  1024     3371487456     3359441487   225088210977
> > IO              count    delay total  delay average
> >                     1      160631171            160ms
> > SWAP            count    delay total  delay average
> >                     0              0              0ms
> > RECLAIM         count    delay total  delay average
> >                   367    30809994722             83ms
> > dd: read=20480, write=0, cancelled_write=0
> >
> >
> > no cond_resched():
> 
> What's this?

I tried a modified patch that also removes the cond_resched() call in
__alloc_pages_direct_reclaim(), between try_to_free_pages() and
get_page_from_freelist(). It doesn't seem to help noticeably.

It looks safe to remove that cond_resched() as we already have such
calls in shrink_page_list().

> >
> > start time: 263
> > total time: 516
> > nr_alloc_fail 5144
> > allocstall 436787
> >
> > CPU             count     real total  virtual total    delay total
> >                  1018     3305497488     3283831119   241982934044
> > IO              count    delay total  delay average
> >                     0              0              0ms
> > SWAP            count    delay total  delay average
> >                     0              0              0ms
> > RECLAIM         count    delay total  delay average
> >                   328    31398481378             95ms
> > dd: read=0, write=0, cancelled_write=0
> >
> > zone_watermark_ok_safe():
> >
> > start time: 266
> > total time: 513
> > nr_alloc_fail 4526
> > allocstall 440246
> >
> > CPU             count     real total  virtual total    delay total
> >                  1119     3640446568     3619184439   240945024724
> > IO              count    delay total  delay average
> >                     3      303620082            101ms
> > SWAP            count    delay total  delay average
> >                     0              0              0ms
> > RECLAIM         count    delay total  delay average
> >                   372    27320731898             73ms
> > dd: read=77824, write=0, cancelled_write=0
> >

> > start time: 275
> 
> What's meaing of start time?

It's the time taken to start 1000 dd's.
 
> > total time: 517
> 
> Total time is elapsed time on your experiment?

Yeah. They are generated with this script.

$ cat ~/bin/test-dd-sparse.sh
 
#!/bin/sh

mount /dev/sda7 /fs

tic=$(date +'%s')

for i in `seq 1000`
do
	truncate -s 1G /fs/sparse-$i
	dd if=/fs/sparse-$i of=/dev/null &>/dev/null &
done

tac=$(date +'%s')
echo start time: $((tac-tic))

wait

tac=$(date +'%s')
echo total time: $((tac-tic))

egrep '(nr_alloc_fail|allocstall)' /proc/vmstat
egrep '(CAL|RES|LOC|TLB)' /proc/interrupts

> > nr_alloc_fail 4694
> > allocstall 431021
> >
> >
> > CPU             count     real total  virtual total    delay total
> >                  1073     3534462680     3512544928   234056498221
> 
> What's meaning of CPU fields?

It's "waiting for a CPU (while being runnable)" as described in
Documentation/accounting/delay-accounting.txt.

> > IO              count    delay total  delay average
> >                     0              0              0ms
> > SWAP            count    delay total  delay average
> >                     0              0              0ms
> > RECLAIM         count    delay total  delay average
> >                   386    34751778363             89ms
> > dd: read=0, write=0, cancelled_write=0
> >
> 
> Where is vanilla data for comparing latency?
> Personally, It's hard to parse your data.

Sorry, that is admittedly a lot of data and kernel revisions. The base
kernel's average reclaim latency is 29ms:

base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
-------------------------------------------------------------------------------

CPU             count     real total  virtual total    delay total
                 1122     3676441096     3656793547   274182127286
IO              count    delay total  delay average
                    3      291765493             97ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                 1350    39229752193             29ms
dd: read=45056, write=0, cancelled_write=0

start time: 245
total time: 526
nr_alloc_fail 14586
allocstall 1578343
LOC:     533981     529210     528283     532346     533392     531314     531705     528983   Local timer interrupts
RES:       3123       2177       1676       1580       2157       1974       1606       1696   Rescheduling interrupts
CAL:     218392     218631     219167     219217     218840     218985     218429     218440   Function call interrupts
TLB:        175         13         21         18         62        309        119         42   TLB shootdowns
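(As a side note on reading these numbers: the "delay average" column is
simply the delay total, which is in nanoseconds, divided by the count.
A tiny check against the RECLAIM line above:)

#include <stdio.h>

int main(void)
{
        /* RECLAIM line from the base kernel run above */
        unsigned long long count = 1350;
        unsigned long long delay_total_ns = 39229752193ULL;

        printf("average reclaim delay: %.1f ms\n",
               (double)delay_total_ns / count / 1e6);   /* prints ~29.1 ms */
        return 0;
}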

> 
> > CC: Mel Gorman <mel@linux.vnet.ibm.com>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> >  fs/buffer.c          |    4 ++--
> >  include/linux/swap.h |    3 ++-
> >  mm/page_alloc.c      |   20 +++++---------------
> >  mm/vmscan.c          |   31 +++++++++++++++++++++++--------
> >  4 files changed, 32 insertions(+), 26 deletions(-)
> > --- linux-next.orig/mm/vmscan.c       2011-04-29 10:42:14.000000000 +0800
> > +++ linux-next/mm/vmscan.c    2011-04-30 21:59:33.000000000 +0800
> > @@ -2025,8 +2025,9 @@ static bool all_unreclaimable(struct zon
> >   * returns:  0, if no pages reclaimed
> >   *           else, the number of pages reclaimed
> >   */
> > -static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
> > -                                     struct scan_control *sc)
> > +static unsigned long do_try_to_free_pages(struct zone *preferred_zone,
> > +                                       struct zonelist *zonelist,
> > +                                       struct scan_control *sc)
> >  {
> >       int priority;
> >       unsigned long total_scanned = 0;
> > @@ -2034,6 +2035,7 @@ static unsigned long do_try_to_free_page
> >       struct zoneref *z;
> >       struct zone *zone;
> >       unsigned long writeback_threshold;
> > +     unsigned long min_reclaim = sc->nr_to_reclaim;
> 
> Hmm,
> 
> >
> >       get_mems_allowed();
> >       delayacct_freepages_start();
> > @@ -2041,6 +2043,9 @@ static unsigned long do_try_to_free_page
> >       if (scanning_global_lru(sc))
> >               count_vm_event(ALLOCSTALL);
> >
> > +     if (preferred_zone)
> > +             sc->nr_to_reclaim += preferred_zone->watermark[WMARK_HIGH];
> > +
> 
> Hmm, I don't like this idea.
> The goal of direct reclaim path is to reclaim pages asap, I beleive.
> Many thing should be achieve of background kswapd.
> If admin changes min_free_kbytes, it can affect latency of direct reclaim.
> It doesn't make sense to me.

Yeah, it does increase delays: in the 1000 dd case, roughly from 30ms
to 90ms. This is a major drawback.

> >       for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> >               sc->nr_scanned = 0;
> >               if (!priority)
> > @@ -2067,8 +2072,17 @@ static unsigned long do_try_to_free_page
> >                       }
> >               }
> >               total_scanned += sc->nr_scanned;
> > -             if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> > -                     goto out;
> > +             if (sc->nr_reclaimed >= min_reclaim) {
> > +                     if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> > +                             goto out;
> 
> I can't understand the logic.
> if nr_reclaimed is bigger than min_reclaim, it's always greater than
> nr_to_reclaim. What's meaning of min_reclaim?

In direct reclaim, min_reclaim is the legacy SWAP_CLUSTER_MAX target,
while sc->nr_to_reclaim is increased by the zone's high watermark and
acts as a kind of "max to reclaim".

> 
> > +                     if (total_scanned > 2 * sc->nr_to_reclaim)
> > +                             goto out;
> 
> If there are lots of dirty pages in LRU?
> If there are lots of unevictable pages in LRU?
> If there are lots of mapped page in LRU but may_unmap = 0 cases?
> I means it's rather risky early conclusion.

That test is meant to avoid scanning too much on __GFP_NORETRY direct
reclaims. My assumption for __GFP_NORETRY is that it should fail fast
when the LRU pages look hard to reclaim. The problem in the 1000 dd
case is that the LRU pages are all easy to reclaim, yet __GFP_NORETRY
still fails from time to time, with lots of IPIs that may hurt large
machines a lot. (A small restatement of the exit condition follows.)
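For reference, the exit condition in the patched do_try_to_free_pages()
can be restated as a small self-contained C helper (a user-space model
of the quoted hunk, not the kernel code itself):

#include <stdbool.h>
#include <stdio.h>

#define SWAP_CLUSTER_MAX 32

/*
 * Model of the patched exit test: min_reclaim keeps the legacy
 * SWAP_CLUSTER_MAX behaviour, nr_to_reclaim has been raised by the
 * preferred zone's high watermark, and the scan cap makes
 * __GFP_NORETRY reclaim fail fast when the LRU looks hard to reclaim.
 */
static bool reclaim_should_stop(unsigned long nr_reclaimed,
                                unsigned long total_scanned,
                                unsigned long nr_to_reclaim,
                                bool preferred_zone_watermark_ok)
{
        const unsigned long min_reclaim = SWAP_CLUSTER_MAX;

        if (nr_reclaimed < min_reclaim)
                return false;          /* always reclaim at least one cluster */
        if (nr_reclaimed >= nr_to_reclaim)
                return true;           /* hit the opportunistic "max to reclaim" */
        if (total_scanned > 2 * nr_to_reclaim)
                return true;           /* scanned twice the target: stop early */
        return preferred_zone_watermark_ok;   /* zone is already refilled */
}

int main(void)
{
        /* reclaimed one cluster, zone still below the watermark: keep going */
        printf("%d\n", reclaim_should_stop(32, 64, 32 + 1024, false));
        return 0;
}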

> 
> > +                     if (preferred_zone &&
> > +                         zone_watermark_ok_safe(preferred_zone, sc->order,
> > +                                     high_wmark_pages(preferred_zone),
> > +                                     zone_idx(preferred_zone), 0))
> > +                             goto out;
> > +             }
> 
> As I said, I think direct reclaim path sould be fast if possbile and
> it should not a function of min_free_kbytes.

Right.

> Of course, there are lots of tackle for keep direct reclaim path's consistent
> latency but at least, I don't want to add another source.

OK.

Thanks,
Fengguang

[-- Attachment #2: mm-keep-freed-pages-in-direct-reclaim.patch --]
[-- Type: text/x-diff, Size: 6104 bytes --]

Subject: Keep freed pages in direct reclaim
Date: Thu, 9 Dec 2010 14:01:32 +0900

From: Minchan Kim <minchan.kim@gmail.com>

A direct reclaiming process often sleeps and races with other
processes. Even when a direct reclaimer needs high order pages
(order > 0) and reclaims pages successfully, other processes that only
need order-0 pages can steal the high order page from the direct
reclaimer.

The direct reclaimer then has to try again, and the same scenario can
repeat. This has several bad effects:

1. the direct reclaimer's latency is large
2. working set pages are evicted due to lumpy reclaim
3. kswapd keeps getting woken up

This patch solves it.

Fengguang:
fix 
[ 1514.892933] BUG: unable to handle kernel
[ 1514.892958] ---[ end trace be7cb17861e1d25b ]---
[ 1514.893589] NULL pointer dereference at           (null)
[ 1514.893968] IP: [<ffffffff81101b2e>] shrink_page_list+0x3dc/0x501

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/buffer.c          |    2 +-
 include/linux/swap.h |    4 +++-
 mm/page_alloc.c      |   27 +++++++++++++++++++++++----
 mm/vmscan.c          |   23 +++++++++++++++++++----
 4 files changed, 46 insertions(+), 10 deletions(-)

--- linux-next.orig/fs/buffer.c	2011-05-02 10:34:06.000000000 +0800
+++ linux-next/fs/buffer.c	2011-05-02 10:45:24.000000000 +0800
@@ -289,7 +289,7 @@ static void free_more_memory(void)
 						&zone);
 		if (zone)
 			try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0,
-						GFP_NOFS, NULL);
+						GFP_NOFS, NULL, NULL);
 	}
 }
 
--- linux-next.orig/include/linux/swap.h	2011-05-02 10:34:06.000000000 +0800
+++ linux-next/include/linux/swap.h	2011-05-02 10:45:24.000000000 +0800
@@ -249,8 +249,10 @@ static inline void lru_cache_add_file(st
 #define ISOLATE_BOTH 2		/* Isolate both active and inactive pages. */
 
 /* linux/mm/vmscan.c */
+extern noinline_for_stack void free_page_list(struct list_head *free_pages);
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
-					gfp_t gfp_mask, nodemask_t *mask);
+					gfp_t gfp_mask, nodemask_t *mask,
+					struct list_head *freed_pages);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
 						  gfp_t gfp_mask, bool noswap,
 						  unsigned int swappiness);
--- linux-next.orig/mm/page_alloc.c	2011-05-02 10:34:06.000000000 +0800
+++ linux-next/mm/page_alloc.c	2011-05-02 10:45:26.000000000 +0800
@@ -1890,8 +1890,11 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 {
 	struct page *page = NULL;
 	struct reclaim_state reclaim_state;
+	bool high_order;
 	bool drained = false;
+	LIST_HEAD(freed_pages);
 
+	high_order = order ? true : false;
 	cond_resched();
 
 	/* We now go into synchronous reclaim */
@@ -1901,16 +1904,31 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 	reclaim_state.reclaimed_slab = 0;
 	current->reclaim_state = &reclaim_state;
 
-	*did_some_progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask);
-
+	/*
+	 * If request is high order, keep the pages which are reclaimed
+	 * in own list for preventing the lose by other processes.
+	 */
+	*did_some_progress = try_to_free_pages(zonelist, order, gfp_mask,
+				nodemask, high_order ? &freed_pages : NULL);
 	current->reclaim_state = NULL;
 	lockdep_clear_current_reclaim_state();
 	current->flags &= ~PF_MEMALLOC;
 
+	if (high_order && !list_empty(&freed_pages)) {
+		free_page_list(&freed_pages);
+		drain_all_pages();
+		drained = true;
+		page = get_page_from_freelist(gfp_mask, nodemask, order,
+					zonelist, high_zoneidx,
+					alloc_flags, preferred_zone,
+					migratetype);
+		if (page)
+			goto out;
+	}
 	cond_resched();
 
 	if (unlikely(!(*did_some_progress)))
-		return NULL;
+		goto out;
 
 retry:
 	page = get_page_from_freelist(gfp_mask, nodemask, order,
@@ -1927,7 +1945,8 @@ retry:
 		drained = true;
 		goto retry;
 	}
-
+out:
+	VM_BUG_ON(!list_empty(&freed_pages));
 	return page;
 }
 
--- linux-next.orig/mm/vmscan.c	2011-05-02 10:34:06.000000000 +0800
+++ linux-next/mm/vmscan.c	2011-05-02 10:46:31.000000000 +0800
@@ -112,6 +112,9 @@ struct scan_control {
 	 * are scanned.
 	 */
 	nodemask_t	*nodemask;
+
+	/* keep freed pages */
+	struct list_head *freed_pages;
 };
 
 #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
@@ -681,7 +684,7 @@ static enum page_references page_check_r
 	return PAGEREF_RECLAIM;
 }
 
-static noinline_for_stack void free_page_list(struct list_head *free_pages)
+noinline_for_stack void free_page_list(struct list_head *free_pages)
 {
 	struct pagevec freed_pvec;
 	struct page *page, *tmp;
@@ -712,6 +715,10 @@ static unsigned long shrink_page_list(st
 	unsigned long nr_dirty = 0;
 	unsigned long nr_congested = 0;
 	unsigned long nr_reclaimed = 0;
+	struct list_head *free_list = &free_pages;
+
+	if (sc->freed_pages)
+		free_list = sc->freed_pages;
 
 	cond_resched();
 
@@ -904,7 +911,7 @@ free_it:
 		 * Is there need to periodically free_page_list? It would
 		 * appear not as the counts should be low
 		 */
-		list_add(&page->lru, &free_pages);
+		list_add(&page->lru, free_list);
 		continue;
 
 cull_mlocked:
@@ -940,7 +947,13 @@ keep_lumpy:
 	if (nr_dirty == nr_congested && nr_dirty != 0)
 		zone_set_flag(zone, ZONE_CONGESTED);
 
-	free_page_list(&free_pages);
+	/*
+	 * If reclaim is direct path and high order, caller should
+	 * free reclaimed pages. It is for preventing reclaimed pages
+	 * lose by other processes.
+	 */
+	if (!sc->freed_pages)
+		free_page_list(&free_pages);
 
 	list_splice(&ret_pages, page_list);
 	count_vm_events(PGACTIVATE, pgactivate);
@@ -2118,7 +2131,8 @@ out:
 }
 
 unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
-				gfp_t gfp_mask, nodemask_t *nodemask)
+				gfp_t gfp_mask, nodemask_t *nodemask,
+				struct list_head *freed_pages)
 {
 	unsigned long nr_reclaimed;
 	struct scan_control sc = {
@@ -2131,6 +2145,7 @@ unsigned long try_to_free_pages(struct z
 		.order = order,
 		.mem_cgroup = NULL,
 		.nodemask = nodemask,
+		.freed_pages = freed_pages,
 	};
 
 	trace_mm_vmscan_direct_reclaim_begin(order,

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-02 10:29                           ` Wu Fengguang
@ 2011-05-02 11:08                             ` Wu Fengguang
  2011-05-03  0:49                             ` Minchan Kim
  1 sibling, 0 replies; 43+ messages in thread
From: Wu Fengguang @ 2011-05-02 11:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Mel Gorman, Dave Young, linux-mm,
	Linux Kernel Mailing List, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Christoph Lameter, Dave Chinner, David Rientjes, Li Shaohua,
	Hugh Dickins

> > Did you see my old patch? The patch wasn't complete but it's not bad for showing the idea.
> > http://marc.info/?l=linux-mm&m=129187231129887&w=4
> > The idea is to keep a page at least for the direct reclaiming process.
> > Could it mitigate your problem, or could you enhance the idea?
> > I think it's a very simple and fair solution.
> 
> No it's not helping my problem, nr_alloc_fail and CAL are still high:
> 
> root@fat /home/wfg# ./test-dd-sparse.sh
> start time: 246
> total time: 531
> nr_alloc_fail 14097
> allocstall 1578332
> LOC:     542698     538947     536986     567118     552114     539605     541201     537623   Local timer interrupts
> RES:       3368       1908       1474       1476       2809       1602       1500       1509   Rescheduling interrupts
> CAL:     223844     224198     224268     224436     223952     224056     223700     223743   Function call interrupts
> TLB:        381         27         22         19         96        404        111         67   TLB shootdowns
> 
> root@fat /home/wfg# getdelays -dip `pidof dd`
> print delayacct stats ON
> printing IO accounting
> PID     5202
> 
> 
> CPU             count     real total  virtual total    delay total
>                  1132     3635447328     3627947550   276722091605
> IO              count    delay total  delay average
>                     2      187809974             62ms
> SWAP            count    delay total  delay average
>                     0              0              0ms
> RECLAIM         count    delay total  delay average
>                  1334    35304580824             26ms
> dd: read=278528, write=0, cancelled_write=0
> 
> I guess your patch is mainly fixing the high order allocations while
> my workload is mainly order 0 readahead page allocations. There are
> 1000 forks, however the "start time: 246" seems to indicate that the
> order-1 reclaim latency is not improved.
> 
> I'll try modifying your patch and see how it works out. The obvious
> change is to apply it to the order-0 case. Hope this won't create much
> more isolated pages.

I tried the modified patch below, removing the high order test and the
drain_all_pages() call. The results are not ideal either:

root@fat /home/wfg# ./test-dd-sparse.sh
start time: 246
total time: 526
nr_alloc_fail 15582
allocstall 1583727
LOC:     532518     528880     528184     533426     532765     530526     531177     528757   Local timer interrupts
RES:       2350       1929       1538       1430       3359       1547       1422       1502   Rescheduling interrupts
CAL:     200017     200384     200336     199763     200369     199776     199504     199407   Function call interrupts
TLB:        285         19         24         10        121        306        113         69   TLB shootdowns

CPU             count     real total  virtual total    delay total
                 1154     3767427264     3742671454   273770720370
IO              count    delay total  delay average
                    1      279795961            279ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                 1385    27228068276             19ms
dd: read=12288, write=0, cancelled_write=0

Thanks,
Fengguang
---
Subject: Keep freed pages in direct reclaim
Date: Thu, 9 Dec 2010 14:01:32 +0900

From: Minchan Kim <minchan.kim@gmail.com>

A direct reclaiming process often sleeps and races with other
processes. Even when a direct reclaimer needs high order pages
(order > 0) and reclaims pages successfully, other processes that only
need order-0 pages can steal the high order page from the direct
reclaimer.

The direct reclaimer then has to try again, and the same scenario can
repeat. This has several bad effects:

1. the direct reclaimer's latency is large
2. working set pages are evicted due to lumpy reclaim
3. kswapd keeps getting woken up

This patch solves it.

Fengguang:
fix
[ 1514.892933] BUG: unable to handle kernel
[ 1514.892958] ---[ end trace be7cb17861e1d25b ]---
[ 1514.893589] NULL pointer dereference at           (null)
[ 1514.893968] IP: [<ffffffff81101b2e>] shrink_page_list+0x3dc/0x501

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/buffer.c          |    2 +-
 include/linux/swap.h |    4 +++-
 mm/page_alloc.c      |   25 +++++++++++++++++++++----
 mm/vmscan.c          |   23 +++++++++++++++++++----
 4 files changed, 44 insertions(+), 10 deletions(-)

--- linux-next.orig/fs/buffer.c	2011-05-02 17:18:01.000000000 +0800
+++ linux-next/fs/buffer.c	2011-05-02 18:30:17.000000000 +0800
@@ -289,7 +289,7 @@ static void free_more_memory(void)
 						&zone);
 		if (zone)
 			try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0,
-						GFP_NOFS, NULL);
+						GFP_NOFS, NULL, NULL);
 	}
 }
 
--- linux-next.orig/include/linux/swap.h	2011-05-02 17:18:01.000000000 +0800
+++ linux-next/include/linux/swap.h	2011-05-02 18:30:17.000000000 +0800
@@ -249,8 +249,10 @@ static inline void lru_cache_add_file(st
 #define ISOLATE_BOTH 2		/* Isolate both active and inactive pages. */
 
 /* linux/mm/vmscan.c */
+extern noinline_for_stack void free_page_list(struct list_head *free_pages);
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
-					gfp_t gfp_mask, nodemask_t *mask);
+					gfp_t gfp_mask, nodemask_t *mask,
+					struct list_head *freed_pages);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
 						  gfp_t gfp_mask, bool noswap,
 						  unsigned int swappiness);
--- linux-next.orig/mm/page_alloc.c	2011-05-02 17:18:01.000000000 +0800
+++ linux-next/mm/page_alloc.c	2011-05-02 18:31:30.000000000 +0800
@@ -1891,6 +1891,7 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 	struct page *page = NULL;
 	struct reclaim_state reclaim_state;
 	bool drained = false;
+	LIST_HEAD(freed_pages);
 
 	cond_resched();
 
@@ -1901,16 +1902,31 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 	reclaim_state.reclaimed_slab = 0;
 	current->reclaim_state = &reclaim_state;
 
-	*did_some_progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask);
-
+	/*
+	 * If request is high order, keep the pages which are reclaimed
+	 * in own list for preventing the lose by other processes.
+	 */
+	*did_some_progress = try_to_free_pages(zonelist, order, gfp_mask,
+				nodemask, &freed_pages);
 	current->reclaim_state = NULL;
 	lockdep_clear_current_reclaim_state();
 	current->flags &= ~PF_MEMALLOC;
 
+	if (!list_empty(&freed_pages)) {
+		free_page_list(&freed_pages);
+		/* drain_all_pages(); */
+		/* drained = true; */
+		page = get_page_from_freelist(gfp_mask, nodemask, order,
+					zonelist, high_zoneidx,
+					alloc_flags, preferred_zone,
+					migratetype);
+		if (page)
+			goto out;
+	}
 	cond_resched();
 
 	if (unlikely(!(*did_some_progress)))
-		return NULL;
+		goto out;
 
 retry:
 	page = get_page_from_freelist(gfp_mask, nodemask, order,
@@ -1927,7 +1943,8 @@ retry:
 		drained = true;
 		goto retry;
 	}
-
+out:
+	VM_BUG_ON(!list_empty(&freed_pages));
 	return page;
 }
 
--- linux-next.orig/mm/vmscan.c	2011-05-02 17:18:01.000000000 +0800
+++ linux-next/mm/vmscan.c	2011-05-02 18:30:17.000000000 +0800
@@ -112,6 +112,9 @@ struct scan_control {
 	 * are scanned.
 	 */
 	nodemask_t	*nodemask;
+
+	/* keep freed pages */
+	struct list_head *freed_pages;
 };
 
 #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
@@ -681,7 +684,7 @@ static enum page_references page_check_r
 	return PAGEREF_RECLAIM;
 }
 
-static noinline_for_stack void free_page_list(struct list_head *free_pages)
+noinline_for_stack void free_page_list(struct list_head *free_pages)
 {
 	struct pagevec freed_pvec;
 	struct page *page, *tmp;
@@ -712,6 +715,10 @@ static unsigned long shrink_page_list(st
 	unsigned long nr_dirty = 0;
 	unsigned long nr_congested = 0;
 	unsigned long nr_reclaimed = 0;
+	struct list_head *free_list = &free_pages;
+
+	if (sc->freed_pages)
+		free_list = sc->freed_pages;
 
 	cond_resched();
 
@@ -904,7 +911,7 @@ free_it:
 		 * Is there need to periodically free_page_list? It would
 		 * appear not as the counts should be low
 		 */
-		list_add(&page->lru, &free_pages);
+		list_add(&page->lru, free_list);
 		continue;
 
 cull_mlocked:
@@ -940,7 +947,13 @@ keep_lumpy:
 	if (nr_dirty == nr_congested && nr_dirty != 0)
 		zone_set_flag(zone, ZONE_CONGESTED);
 
-	free_page_list(&free_pages);
+	/*
+	 * If reclaim is direct path and high order, caller should
+	 * free reclaimed pages. It is for preventing reclaimed pages
+	 * lose by other processes.
+	 */
+	if (!sc->freed_pages)
+		free_page_list(&free_pages);
 
 	list_splice(&ret_pages, page_list);
 	count_vm_events(PGACTIVATE, pgactivate);
@@ -2118,7 +2131,8 @@ out:
 }
 
 unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
-				gfp_t gfp_mask, nodemask_t *nodemask)
+				gfp_t gfp_mask, nodemask_t *nodemask,
+				struct list_head *freed_pages)
 {
 	unsigned long nr_reclaimed;
 	struct scan_control sc = {
@@ -2131,6 +2145,7 @@ unsigned long try_to_free_pages(struct z
 		.order = order,
 		.mem_cgroup = NULL,
 		.nodemask = nodemask,
+		.freed_pages = freed_pages,
 	};
 
 	trace_mm_vmscan_direct_reclaim_begin(order,


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-01 16:35                         ` Minchan Kim
  2011-05-01 16:37                           ` Minchan Kim
  2011-05-02 10:29                           ` Wu Fengguang
@ 2011-05-02 13:29                           ` Wu Fengguang
  2011-05-02 13:49                             ` Wu Fengguang
  2 siblings, 1 reply; 43+ messages in thread
From: Wu Fengguang @ 2011-05-02 13:29 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Mel Gorman, Dave Young, linux-mm,
	Linux Kernel Mailing List, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Christoph Lameter, Dave Chinner, David Rientjes

> > +                     if (preferred_zone &&
> > +                         zone_watermark_ok_safe(preferred_zone, sc->order,
> > +                                     high_wmark_pages(preferred_zone),
> > +                                     zone_idx(preferred_zone), 0))
> > +                             goto out;
> > +             }
> 
> As I said, I think direct reclaim path sould be fast if possbile and
> it should not a function of min_free_kbytes.

It can be made independent of min_free_kbytes by simply changing
high_wmark_pages() to low_wmark_pages() in the above chunk, since
direct reclaim is triggered when ALLOC_WMARK_LOW cannot be satisfied,
i.e. the zone has just dropped below low_wmark_pages().

But it still costs 62ms of reclaim latency (the base kernel is at 29ms).

Thanks,
Fengguang
---
Subject: mm: cut down __GFP_NORETRY page allocation failures
Date: Thu Apr 28 13:46:39 CST 2011

Concurrent page allocations are suffering from high failure rates.

On an 8p, 3GB RAM test box, when reading 1000 sparse files of size 1GB,
the page allocation failures are

nr_alloc_fail 733 	# interleaved reads by 1 single task
nr_alloc_fail 11799	# concurrent reads by 1000 tasks

The concurrent read test script is:

	for i in `seq 1000`
	do
		truncate -s 1G /fs/sparse-$i
		dd if=/fs/sparse-$i of=/dev/null &
	done

In order for get_page_from_freelist() to get a free page,

(1) try_to_free_pages() should use a much higher .nr_to_reclaim than
    the current SWAP_CLUSTER_MAX=32, in order to draw the zone out of a
    possible low watermark state

(2) the get_page_from_freelist() _after_ direct reclaim should use a
    lower watermark than its normal invocations, so that it can
    reasonably "reserve" some free pages for itself and prevent other
    concurrent page allocators from stealing all its reclaimed pages.

Some notes:

- commit 9ee493ce ("mm: page allocator: drain per-cpu lists after direct
  reclaim allocation fails") has the same target, however it is obviously
  more costly and less effective. It seems cleaner to just remove the
  retry and drain code than to retain it.

- it's a bit hacky to reclaim more than the requested pages inside
  do_try_to_free_pages(), and it won't help cgroup for now

- it only aims to reduce failures when there are plenty of reclaimable
  pages, so the opportunistic reclaim stops once twice the reclaim
  target has been scanned

Test results (1000 dd case):

- the failure rate is pretty sensitive to the page reclaim size,
  from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 5004 (WMARK_HIGH, stop on low
  watermark OK) to 10496 (SWAP_CLUSTER_MAX)

- the IPIs are reduced by over 500 times

- the reclaim delay is doubled, from 29ms to 62ms

Base kernel is vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocations.

base kernel, 1000 dd
--------------------

start time: 245
total time: 526
nr_alloc_fail 14586
allocstall 1578343
LOC:     533981     529210     528283     532346     533392     531314     531705     528983   Local timer interrupts
RES:       3123       2177       1676       1580       2157       1974       1606       1696   Rescheduling interrupts
CAL:     218392     218631     219167     219217     218840     218985     218429     218440   Function call interrupts
TLB:        175         13         21         18         62        309        119         42   TLB shootdowns


CPU             count     real total  virtual total    delay total
                 1122     3676441096     3656793547   274182127286
IO              count    delay total  delay average
                    3      291765493             97ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                 1350    39229752193             29ms
dd: read=45056, write=0, cancelled_write=0

patched, 1000 dd
----------------

root@fat /home/wfg# ./test-dd-sparse.sh
start time: 260
total time: 519
nr_alloc_fail 5004
allocstall 551429
LOC:     524861     521832     520945     524632     524666     523334     523797     521562   Local timer interrupts
RES:       1323       1976       2505       1610       1544       1848       3310       1644   Rescheduling interrupts
CAL:         67        335        353        614        289        287        293        325   Function call interrupts
TLB:        288         29         26         34        103        321        123         70   TLB shootdowns

CPU             count     real total  virtual total    delay total
                 1177     3797422704     3775174301   253228435955
IO              count    delay total  delay average
                    1      198528820            198ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                  508    31660219699             62ms

base kernel, 100 dd
-------------------
root@fat /home/wfg# ./test-dd-sparse.sh
start time: 3
total time: 53
nr_alloc_fail 849
allocstall 131330
LOC:      59843      56506      55838      65283      61774      57929      58880      56246   Local timer interrupts
RES:        376        308        372        239        374        307        491        239   Rescheduling interrupts
CAL:      17737      18083      17948      18192      17929      17845      17893      17906   Function call interrupts
TLB:        307         26         25         21         80        324        137         79   TLB shootdowns

CPU             count     real total  virtual total    delay total
                  974     3197513904     3180727460    38504429363
IO              count    delay total  delay average
                    1       18156696             18ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                 1036     3439387298              3ms
dd: read=12288, write=0, cancelled_write=0

patched, 100 dd
---------------

root@fat /home/wfg# ./test-dd-sparse.sh
start time: 3
total time: 52
nr_alloc_fail 307
allocstall 48178
LOC:      56486      53514      52792      55879      56317      55383      55311      53168   Local timer interrupts
RES:        604        345        257        250        775        371        272        252   Rescheduling interrupts
CAL:         75        373        369        543        272        278        295        296   Function call interrupts
TLB:        259         24         19         24         82        306        139         53   TLB shootdowns


CPU             count     real total  virtual total    delay total
                  974     3177516944     3161771347    38508053977
IO              count    delay total  delay average
                    0              0              0ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                  393     5389030889             13ms
dd: read=0, write=0, cancelled_write=0

CC: Mel Gorman <mel@linux.vnet.ibm.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/buffer.c          |    4 ++--
 include/linux/swap.h |    3 ++-
 mm/page_alloc.c      |   22 +++++-----------------
 mm/vmscan.c          |   38 ++++++++++++++++++++++++++++++--------
 4 files changed, 39 insertions(+), 28 deletions(-)
--- linux-next.orig/mm/vmscan.c	2011-05-02 19:15:21.000000000 +0800
+++ linux-next/mm/vmscan.c	2011-05-02 19:47:05.000000000 +0800
@@ -2025,8 +2025,9 @@ static bool all_unreclaimable(struct zon
  * returns:	0, if no pages reclaimed
  * 		else, the number of pages reclaimed
  */
-static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
-					struct scan_control *sc)
+static unsigned long do_try_to_free_pages(struct zone *preferred_zone,
+					  struct zonelist *zonelist,
+					  struct scan_control *sc)
 {
 	int priority;
 	unsigned long total_scanned = 0;
@@ -2034,6 +2035,7 @@ static unsigned long do_try_to_free_page
 	struct zoneref *z;
 	struct zone *zone;
 	unsigned long writeback_threshold;
+	unsigned long min_reclaim = sc->nr_to_reclaim;
 
 	get_mems_allowed();
 	delayacct_freepages_start();
@@ -2041,6 +2043,16 @@ static unsigned long do_try_to_free_page
 	if (scanning_global_lru(sc))
 		count_vm_event(ALLOCSTALL);
 
+	for_each_zone_zonelist_nodemask(zone, z, zonelist,
+					gfp_zone(sc->gfp_mask), sc->nodemask) {
+		if (!populated_zone(zone))
+			continue;
+		preferred_zone = zone;
+		break;
+	}
+	if (preferred_zone)
+		sc->nr_to_reclaim += preferred_zone->watermark[WMARK_HIGH];
+
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		sc->nr_scanned = 0;
 		if (!priority)
@@ -2067,8 +2079,17 @@ static unsigned long do_try_to_free_page
 			}
 		}
 		total_scanned += sc->nr_scanned;
-		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
-			goto out;
+		if (sc->nr_reclaimed >= min_reclaim) {
+			if (sc->nr_reclaimed >= sc->nr_to_reclaim)
+				goto out;
+			if (total_scanned > 2 * sc->nr_to_reclaim)
+				goto out;
+			if (preferred_zone &&
+			    zone_watermark_ok(preferred_zone, sc->order,
+					low_wmark_pages(preferred_zone),
+					zone_idx(preferred_zone), 0))
+				goto out;
+		}
 
 		/*
 		 * Try to write back as many pages as we just scanned.  This
@@ -2117,7 +2138,8 @@ out:
 	return 0;
 }
 
-unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
+unsigned long try_to_free_pages(struct zone *preferred_zone,
+				struct zonelist *zonelist, int order,
 				gfp_t gfp_mask, nodemask_t *nodemask)
 {
 	unsigned long nr_reclaimed;
@@ -2137,7 +2159,7 @@ unsigned long try_to_free_pages(struct z
 				sc.may_writepage,
 				gfp_mask);
 
-	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+	nr_reclaimed = do_try_to_free_pages(preferred_zone, zonelist, &sc);
 
 	trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
 
@@ -2207,7 +2229,7 @@ unsigned long try_to_free_mem_cgroup_pag
 					    sc.may_writepage,
 					    sc.gfp_mask);
 
-	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+	nr_reclaimed = do_try_to_free_pages(NULL, zonelist, &sc);
 
 	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
 
@@ -2796,7 +2818,7 @@ unsigned long shrink_all_memory(unsigned
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
-	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+	nr_reclaimed = do_try_to_free_pages(NULL, zonelist, &sc);
 
 	p->reclaim_state = NULL;
 	lockdep_clear_current_reclaim_state();
--- linux-next.orig/mm/page_alloc.c	2011-05-02 19:15:21.000000000 +0800
+++ linux-next/mm/page_alloc.c	2011-05-02 19:39:51.000000000 +0800
@@ -1888,9 +1888,8 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
 	int migratetype, unsigned long *did_some_progress)
 {
-	struct page *page = NULL;
+	struct page *page;
 	struct reclaim_state reclaim_state;
-	bool drained = false;
 
 	cond_resched();
 
@@ -1901,33 +1900,22 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 	reclaim_state.reclaimed_slab = 0;
 	current->reclaim_state = &reclaim_state;
 
-	*did_some_progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask);
+	*did_some_progress = try_to_free_pages(preferred_zone, zonelist, order,
+					       gfp_mask, nodemask);
 
 	current->reclaim_state = NULL;
 	lockdep_clear_current_reclaim_state();
 	current->flags &= ~PF_MEMALLOC;
 
-	cond_resched();
-
 	if (unlikely(!(*did_some_progress)))
 		return NULL;
 
-retry:
+	alloc_flags |= ALLOC_HARDER;
+
 	page = get_page_from_freelist(gfp_mask, nodemask, order,
 					zonelist, high_zoneidx,
 					alloc_flags, preferred_zone,
 					migratetype);
-
-	/*
-	 * If an allocation failed after direct reclaim, it could be because
-	 * pages are pinned on the per-cpu lists. Drain them and try again
-	 */
-	if (!page && !drained) {
-		drain_all_pages();
-		drained = true;
-		goto retry;
-	}
-
 	return page;
 }
 
--- linux-next.orig/fs/buffer.c	2011-05-02 19:15:21.000000000 +0800
+++ linux-next/fs/buffer.c	2011-05-02 19:15:33.000000000 +0800
@@ -288,8 +288,8 @@ static void free_more_memory(void)
 						gfp_zone(GFP_NOFS), NULL,
 						&zone);
 		if (zone)
-			try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0,
-						GFP_NOFS, NULL);
+			try_to_free_pages(zone, node_zonelist(nid, GFP_NOFS),
+					  0, GFP_NOFS, NULL);
 	}
 }
 
--- linux-next.orig/include/linux/swap.h	2011-05-02 19:15:21.000000000 +0800
+++ linux-next/include/linux/swap.h	2011-05-02 19:15:33.000000000 +0800
@@ -249,7 +249,8 @@ static inline void lru_cache_add_file(st
 #define ISOLATE_BOTH 2		/* Isolate both active and inactive pages. */
 
 /* linux/mm/vmscan.c */
-extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
+extern unsigned long try_to_free_pages(struct zone *preferred_zone,
+					struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
 						  gfp_t gfp_mask, bool noswap,


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-02 13:29                           ` Wu Fengguang
@ 2011-05-02 13:49                             ` Wu Fengguang
  2011-05-03  0:27                               ` Satoru Moriya
  0 siblings, 1 reply; 43+ messages in thread
From: Wu Fengguang @ 2011-05-02 13:49 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Mel Gorman, Dave Young, linux-mm,
	Linux Kernel Mailing List, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Christoph Lameter, Dave Chinner, David Rientjes

On Mon, May 02, 2011 at 09:29:58PM +0800, Wu Fengguang wrote:
> > > +                     if (preferred_zone &&
> > > +                         zone_watermark_ok_safe(preferred_zone, sc->order,
> > > +                                     high_wmark_pages(preferred_zone),
> > > +                                     zone_idx(preferred_zone), 0))
> > > +                             goto out;
> > > +             }
> > 
> > As I said, I think direct reclaim path sould be fast if possbile and
> > it should not a function of min_free_kbytes.
> 
> It can be made not a function of min_free_kbytes by simply changing
> high_wmark_pages() to low_wmark_pages() in the above chunk, since
> direct reclaim is triggered when ALLOC_WMARK_LOW cannot be satisfied,
> ie. it just dropped below low_wmark_pages().
> 
> But still, it costs 62ms reclaim latency (base kernel is 29ms).

I have new findings: the CPU scheduling delays are much larger than
the reclaim delays. That makes the "direct reclaim until low watermark
OK" latency less of a problem :)

1000 dd test case:
                RECLAIM delay   CPU delay       nr_alloc_fail   CAL (last CPU)
base kernel     29ms            244ms           14586           218440
patched         62ms            215ms           5004            325

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-02 13:49                             ` Wu Fengguang
@ 2011-05-03  0:27                               ` Satoru Moriya
  2011-05-03  2:49                                 ` Wu Fengguang
  0 siblings, 1 reply; 43+ messages in thread
From: Satoru Moriya @ 2011-05-03  0:27 UTC (permalink / raw)
  To: Wu Fengguang, Minchan Kim
  Cc: Andrew Morton, Mel Gorman, Dave Young, linux-mm,
	Linux Kernel Mailing List, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Christoph Lameter, Dave Chinner, David Rientjes

Hi Wu,
 
> On Mon, May 02, 2011 at 09:29:58PM +0800, Wu Fengguang wrote:
> > > > +                     if (preferred_zone &&
> > > > +                         zone_watermark_ok_safe(preferred_zone, sc->order,
> > > > +                                     high_wmark_pages(preferred_zone),
> > > > +                                     zone_idx(preferred_zone), 0))
> > > > +                             goto out;
> > > > +             }
> > >
> > > As I said, I think direct reclaim path sould be fast if possbile and
> > > it should not a function of min_free_kbytes.
> >
> > It can be made not a function of min_free_kbytes by simply changing
> > high_wmark_pages() to low_wmark_pages() in the above chunk, since
> > direct reclaim is triggered when ALLOC_WMARK_LOW cannot be satisfied,
> > ie. it just dropped below low_wmark_pages().
> >
> > But still, it costs 62ms reclaim latency (base kernel is 29ms).
> 
> I got new findings: the CPU schedule delays are much larger than
> reclaim delays. It does make the "direct reclaim until low watermark
> OK" latency less a problem :)
> 
> 1000 dd test case:
>                 RECLAIM delay   CPU delay       nr_alloc_fail   CAL (last CPU)
> base kernel     29ms            244ms           14586           218440
> patched         62ms            215ms           5004            325

Hmm, on your system the latency of direct reclaim may be less of a problem.

But, generally speaking, a latency-sensitive system in the enterprise area
runs two kinds of processes: latency-sensitive ones (A) and
non-latency-sensitive ones (B). We usually set CPU affinity for both kinds
of processes to avoid scheduling issues in (A). In that situation the CPU
delay tends to be lower than the above and is less of a problem, while the
reclaim delay is more critical.
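For readers unfamiliar with that setup, pinning a latency-sensitive
task to a dedicated CPU is typically done along these lines (a minimal
sketch; the CPU number is only an example):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(0, &set);                       /* pin to CPU 0 (illustrative) */
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {  /* 0 = this task */
                perror("sched_setaffinity");
                return 1;
        }
        printf("pid %d now runs only on CPU 0\n", getpid());
        return 0;
}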

Regards,
Satoru

> 
> Thanks,
> Fengguang
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-02 10:29                           ` Wu Fengguang
  2011-05-02 11:08                             ` Wu Fengguang
@ 2011-05-03  0:49                             ` Minchan Kim
  2011-05-03  3:51                               ` Wu Fengguang
  1 sibling, 1 reply; 43+ messages in thread
From: Minchan Kim @ 2011-05-03  0:49 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Mel Gorman, Dave Young, linux-mm,
	Linux Kernel Mailing List, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Christoph Lameter, Dave Chinner, David Rientjes, Li Shaohua,
	Hugh Dickins

Hi Wu, sorry for the slow response.
I guess you know why I am slow. :)

On Mon, May 2, 2011 at 7:29 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> Hi Minchan,
>
> On Mon, May 02, 2011 at 12:35:42AM +0800, Minchan Kim wrote:
>> Hi Wu,
>>
>> On Sat, Apr 30, 2011 at 10:17:41PM +0800, Wu Fengguang wrote:
>> > On Fri, Apr 29, 2011 at 10:28:24AM +0800, Wu Fengguang wrote:
>> > > > Test results:
>> > > >
>> > > > - the failure rate is pretty sensible to the page reclaim size,
>> > > >   from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 10496 (SWAP_CLUSTER_MAX)
>> > > >
>> > > > - the IPIs are reduced by over 100 times
>> > >
>> > > It's reduced by 500 times indeed.
>> > >
>> > > CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts
>> > > CAL:         93        463        410        540        298        282        272        306   Function call interrupts
>> > >
>> > > > base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
>> > > > -------------------------------------------------------------------------------
>> > > > nr_alloc_fail 10496
>> > > > allocstall 1576602
>> > >
>> > > > patched (WMARK_MIN)
>> > > > -------------------
>> > > > nr_alloc_fail 704
>> > > > allocstall 105551
>> > >
>> > > > patched (WMARK_HIGH)
>> > > > --------------------
>> > > > nr_alloc_fail 282
>> > > > allocstall 53860
>> > >
>> > > > this patch (WMARK_HIGH, limited scan)
>> > > > -------------------------------------
>> > > > nr_alloc_fail 276
>> > > > allocstall 54034
>> > >
>> > > There is a bad side effect though: the much reduced "allocstall" means
>> > > each direct reclaim will take much more time to complete. A simple solution
>> > > is to terminate direct reclaim after 10ms. I noticed that an 100ms
>> > > time threshold can reduce the reclaim latency from 621ms to 358ms.
>> > > Further lowering the time threshold to 20ms does not help reducing the
>> > > real latencies though.
>> >
>> > Experiments going on...
>> >
>> > I tried the more reasonable terminate condition: stop direct reclaim
>> > when the preferred zone is above high watermark (see the below chunk).
>> >
>> > This helps reduce the average reclaim latency to under 100ms in the
>> > 1000-dd case.
>> >
>> > However nr_alloc_fail is around 5000 and not ideal. The interesting
>> > thing is, even if zone watermark is high, the task still may fail to
>> > get a free page..
>> >
>> > @@ -2067,8 +2072,17 @@ static unsigned long do_try_to_free_page
>> >                         }
>> >                 }
>> >                 total_scanned += sc->nr_scanned;
>> > -               if (sc->nr_reclaimed >= sc->nr_to_reclaim)
>> > -                       goto out;
>> > +               if (sc->nr_reclaimed >= min_reclaim) {
>> > +                       if (sc->nr_reclaimed >= sc->nr_to_reclaim)
>> > +                               goto out;
>> > +                       if (total_scanned > 2 * sc->nr_to_reclaim)
>> > +                               goto out;
>> > +                       if (preferred_zone &&
>> > +                           zone_watermark_ok_safe(preferred_zone, sc->order,
>> > +                                       high_wmark_pages(preferred_zone),
>> > +                                       zone_idx(preferred_zone), 0))
>> > +                               goto out;
>> > +               }
>> >
>> >                 /*
>> >                  * Try to write back as many pages as we just scanned.  This
>> >
>> > Thanks,
>> > Fengguang
>> > ---
>> > Subject: mm: cut down __GFP_NORETRY page allocation failures
>> > Date: Thu Apr 28 13:46:39 CST 2011
>> >
>> > Concurrent page allocations are suffering from high failure rates.
>> >
>> > On a 8p, 3GB ram test box, when reading 1000 sparse files of size 1GB,
>> > the page allocation failures are
>> >
>> > nr_alloc_fail 733     # interleaved reads by 1 single task
>> > nr_alloc_fail 11799   # concurrent reads by 1000 tasks
>> >
>> > The concurrent read test script is:
>> >
>> >       for i in `seq 1000`
>> >       do
>> >               truncate -s 1G /fs/sparse-$i
>> >               dd if=/fs/sparse-$i of=/dev/null &
>> >       done
>> >
>> > In order for get_page_from_freelist() to get free page,
>> >
>> > (1) try_to_free_pages() should use much higher .nr_to_reclaim than the
>> >     current SWAP_CLUSTER_MAX=32, in order to draw the zone out of the
>> >     possible low watermark state as well as fill the pcp with enough free
>> >     pages to overflow its high watermark.
>> >
>> > (2) the get_page_from_freelist() _after_ direct reclaim should use lower
>> >     watermark than its normal invocations, so that it can reasonably
>> >     "reserve" some free pages for itself and prevent other concurrent
>> >     page allocators stealing all its reclaimed pages.
>>
>> Did you see my old patch? The patch wasn't complete but it's not bad for showing the idea.
>> http://marc.info/?l=linux-mm&m=129187231129887&w=4
>> The idea is to keep a page at least for the direct reclaiming process.
>> Could it mitigate your problem, or could you enhance the idea?
>> I think it's a very simple and fair solution.
>
> No it's not helping my problem, nr_alloc_fail and CAL are still high:

Unfortunately, my patch doesn't consider order-0 pages, as you mentioned below.
I read your mail which says it doesn't help even when it is extended to
order-0 pages and draining.
Actually, I tried to look into that, but on my poor system (Core2 Duo,
2GB RAM) nr_alloc_fail never happens. :(
I will try another desktop, but I am not sure I can reproduce it.

>
> root@fat /home/wfg# ./test-dd-sparse.sh
> start time: 246
> total time: 531
> nr_alloc_fail 14097
> allocstall 1578332
> LOC:     542698     538947     536986     567118     552114     539605     541201     537623   Local timer interrupts
> RES:       3368       1908       1474       1476       2809       1602       1500       1509   Rescheduling interrupts
> CAL:     223844     224198     224268     224436     223952     224056     223700     223743   Function call interrupts
> TLB:        381         27         22         19         96        404        111         67   TLB shootdowns
>
> root@fat /home/wfg# getdelays -dip `pidof dd`
> print delayacct stats ON
> printing IO accounting
> PID     5202
>
>
> CPU             count     real total  virtual total    delay total
>                 1132     3635447328     3627947550   276722091605
> IO              count    delay total  delay average
>                    2      187809974             62ms
> SWAP            count    delay total  delay average
>                    0              0              0ms
> RECLAIM         count    delay total  delay average
>                 1334    35304580824             26ms
> dd: read=278528, write=0, cancelled_write=0
>
> I guess your patch is mainly fixing the high order allocations while
> my workload is mainly order 0 readahead page allocations. There are
> 1000 forks, however the "start time: 246" seems to indicate that the
> order-1 reclaim latency is not improved.

Maybe; 8K * 1000 isn't a big footprint, so I think reclaim doesn't happen there.

>
> I'll try modifying your patch and see how it works out. The obvious
> change is to apply it to the order-0 case. Hope this won't create much
> more isolated pages.
>
> Attached is your patch rebased to 2.6.39-rc3, after resolving some
> merge conflicts and fixing a trivial NULL pointer bug.

Thanks!
I would like to look at it in detail on my system if I can reproduce the problem.

>
>> >
>> > Some notes:
>> >
>> > - commit 9ee493ce ("mm: page allocator: drain per-cpu lists after direct
>> >   reclaim allocation fails") has the same target, however is obviously
>> >   costly and less effective. It seems more clean to just remove the
>> >   retry and drain code than to retain it.
>>
>> Tend to agree.
>> My old patch can solve it, I think.
>
> Sadly nope. See above.
>
>> >
>> > - it's a bit hacky to reclaim more than requested pages inside
>> >   do_try_to_free_page(), and it won't help cgroup for now
>> >
>> > - it only aims to reduce failures when there are plenty of reclaimable
>> >   pages, so it stops the opportunistic reclaim when scanned 2 times pages
>> >
>> > Test results:
>> >
>> > - the failure rate is pretty sensible to the page reclaim size,
>> >   from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 10496 (SWAP_CLUSTER_MAX)
>> >
>> > - the IPIs are reduced by over 100 times
>> >
>> > base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
>> > -------------------------------------------------------------------------------
>> > nr_alloc_fail 10496
>> > allocstall 1576602
>> >
>> > slabs_scanned 21632
>> > kswapd_steal 4393382
>> > kswapd_inodesteal 124
>> > kswapd_low_wmark_hit_quickly 885
>> > kswapd_high_wmark_hit_quickly 2321
>> > kswapd_skip_congestion_wait 0
>> > pageoutrun 29426
>> >
>> > CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts
>> >
>> > LOC:     536274     532529     531734     536801     536510     533676     534853     532038   Local timer interrupts
>> > RES:       3032       2128       1792       1765       2184       1703       1754       1865   Rescheduling interrupts
>> > TLB:        189         15         13         17         64        294         97         63   TLB shootdowns
>> >
>> > patched (WMARK_MIN)
>> > -------------------
>> > nr_alloc_fail 704
>> > allocstall 105551
>> >
>> > slabs_scanned 33280
>> > kswapd_steal 4525537
>> > kswapd_inodesteal 187
>> > kswapd_low_wmark_hit_quickly 4980
>> > kswapd_high_wmark_hit_quickly 2573
>> > kswapd_skip_congestion_wait 0
>> > pageoutrun 35429
>> >
>> > CAL:         93        286        396        754        272        297        275        281   Function call interrupts
>> >
>> > LOC:     520550     517751     517043     522016     520302     518479     519329     517179   Local timer interrupts
>> > RES:       2131       1371       1376       1269       1390       1181       1409       1280   Rescheduling interrupts
>> > TLB:        280         26         27         30         65        305        134         75   TLB shootdowns
>> >
>> > patched (WMARK_HIGH)
>> > --------------------
>> > nr_alloc_fail 282
>> > allocstall 53860
>> >
>> > slabs_scanned 23936
>> > kswapd_steal 4561178
>> > kswapd_inodesteal 0
>> > kswapd_low_wmark_hit_quickly 2760
>> > kswapd_high_wmark_hit_quickly 1748
>> > kswapd_skip_congestion_wait 0
>> > pageoutrun 32639
>> >
>> > CAL:         93        463        410        540        298        282        272        306   Function call interrupts
>> >
>> > LOC:     513956     510749     509890     514897     514300     512392     512825     510574   Local timer interrupts
>> > RES:       1174       2081       1411       1320       1742       2683       1380       1230   Rescheduling interrupts
>> > TLB:        274         21         19         22         57        317        131         61   TLB shootdowns
>> >
>> > patched (WMARK_HIGH, limited scan)
>> > ----------------------------------
>> > nr_alloc_fail 276
>> > allocstall 54034
>> >
>> > slabs_scanned 24320
>> > kswapd_steal 4507482
>> > kswapd_inodesteal 262
>> > kswapd_low_wmark_hit_quickly 2638
>> > kswapd_high_wmark_hit_quickly 1710
>> > kswapd_skip_congestion_wait 0
>> > pageoutrun 32182
>> >
>> > CAL:         69        443        421        567        273        279        269        334   Function call interrupts
>>
>> Looks amazing.
>
> Yeah, I have strong feelings against drain_all_pages() in the direct
> reclaim path. The intuition is, once drain_all_pages() is called, the
> subsequent direct reclaims have less chance to refill the drained
> per-cpu lists and are therefore forced into drain_all_pages() again
> and again.
>
> drain_all_pages() is probably overkill for preventing OOM. Generally
> speaking, it's questionable to "squeeze out the last page before OOM".
>
> A typical desktop enters thrashing storms before OOM and, as Hugh
> pointed out, that may well not be what end users want. I agree with
> him and would personally prefer some applications to be OOM killed
> rather than have the whole system become unusable, thrashing like mad.

Tend to agree. The same rule applies to embedded systems, too.
Couldn't we mitigate the draining by doing it only for high-order allocations?
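
For reference, the drain-and-retry step under discussion is the one that
commit 9ee493ce added to __alloc_pages_direct_reclaim() and which the patch
later in this thread removes. Below is a compilable user-space model of that
step; the *_stub helpers are stand-ins I added (not kernel functions), so
only the control flow is meaningful:

#include <stdbool.h>
#include <stddef.h>

struct page;                            /* opaque stand-in */

static struct page *get_page_from_freelist_stub(void)
{
        return NULL;                    /* pretend the freelist is still empty */
}

static void drain_all_pages_stub(void)
{
        /* in the kernel this IPIs every CPU to flush its per-cpu page lists */
}

struct page *direct_reclaim_tail_model(void)
{
        struct page *page;
        bool drained = false;

retry:
        page = get_page_from_freelist_stub();
        /*
         * If the allocation failed after direct reclaim, pages may still be
         * pinned on the per-cpu lists: drain them all and retry exactly once.
         * With many concurrent allocators the drained pages are stolen again,
         * so this path keeps firing and every pass costs a round of IPIs.
         */
        if (!page && !drained) {
                drain_all_pages_stub();
                drained = true;
                goto retry;
        }
        return page;
}

The patch's answer is to drop this retry entirely and instead reclaim enough
pages (plus use ALLOC_HARDER) so that the first get_page_from_freelist()
call after direct reclaim succeeds.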

>
>> > LOC:     514736     511698     510993     514069     514185     512986     513838     511229   Local timer interrupts
>> > RES:       2153       1556       1126       1351       3047       1554       1131       1560   Rescheduling interrupts
>> > TLB:        209         26         20         15         71        315        117         71   TLB shootdowns
>> >
>> > patched (WMARK_HIGH, limited scan, stop on watermark OK), 100 dd
>> > ----------------------------------------------------------------
>> >
>> > start time: 3
>> > total time: 50
>> > nr_alloc_fail 162
>> > allocstall 45523
>> >
>> > CPU             count     real total  virtual total    delay total
>> >                   921     3024540200     3009244668    37123129525
>> > IO              count    delay total  delay average
>> >                     0              0              0ms
>> > SWAP            count    delay total  delay average
>> >                     0              0              0ms
>> > RECLAIM         count    delay total  delay average
>> >                   357     4891766796             13ms
>> > dd: read=0, write=0, cancelled_write=0
>> >
>> > patched (WMARK_HIGH, limited scan, stop on watermark OK), 1000 dd
>> > -----------------------------------------------------------------
>> >
>> > start time: 272
>> > total time: 509
>> > nr_alloc_fail 3913
>> > allocstall 541789
>> >
>> > CPU             count     real total  virtual total    delay total
>> >                  1044     3445476208     3437200482   229919915202
>> > IO              count    delay total  delay average
>> >                     0              0              0ms
>> > SWAP            count    delay total  delay average
>> >                     0              0              0ms
>> > RECLAIM         count    delay total  delay average
>> >                   452    34691441605             76ms
>> > dd: read=0, write=0, cancelled_write=0
>> >
>> > patched (WMARK_HIGH, limited scan, stop on watermark OK, no time limit), 1000 dd
>> > --------------------------------------------------------------------------------
>> >
>> > start time: 278
>> > total time: 513
>> > nr_alloc_fail 4737
>> > allocstall 436392
>> >
>> >
>> > CPU             count     real total  virtual total    delay total
>> >                  1024     3371487456     3359441487   225088210977
>> > IO              count    delay total  delay average
>> >                     1      160631171            160ms
>> > SWAP            count    delay total  delay average
>> >                     0              0              0ms
>> > RECLAIM         count    delay total  delay average
>> >                   367    30809994722             83ms
>> > dd: read=20480, write=0, cancelled_write=0
>> >
>> >
>> > no cond_resched():
>>
>> What's this?
>
> I tried a modified patch that also removes the cond_resched() call in
> __alloc_pages_direct_reclaim(), between try_to_free_pages() and
> get_page_from_freelist(). It seems not helping noticeably.
>
> It looks safe to remove that cond_resched() as we already have such
> calls in shrink_page_list().

I tried a similar thing but Andrew had a concern about it.
https://lkml.org/lkml/2011/3/24/138

>
>> >
>> > start time: 263
>> > total time: 516
>> > nr_alloc_fail 5144
>> > allocstall 436787
>> >
>> > CPU             count     real total  virtual total    delay total
>> >                  1018     3305497488     3283831119   241982934044
>> > IO              count    delay total  delay average
>> >                     0              0              0ms
>> > SWAP            count    delay total  delay average
>> >                     0              0              0ms
>> > RECLAIM         count    delay total  delay average
>> >                   328    31398481378             95ms
>> > dd: read=0, write=0, cancelled_write=0
>> >
>> > zone_watermark_ok_safe():
>> >
>> > start time: 266
>> > total time: 513
>> > nr_alloc_fail 4526
>> > allocstall 440246
>> >
>> > CPU             count     real total  virtual total    delay total
>> >                  1119     3640446568     3619184439   240945024724
>> > IO              count    delay total  delay average
>> >                     3      303620082            101ms
>> > SWAP            count    delay total  delay average
>> >                     0              0              0ms
>> > RECLAIM         count    delay total  delay average
>> >                   372    27320731898             73ms
>> > dd: read=77824, write=0, cancelled_write=0
>> >
>
>> > start time: 275
>>
>> What's meaing of start time?
>
> It's the time taken to start 1000 dd's.
>
>> > total time: 517
>>
>> Total time is elapsed time on your experiment?
>
> Yeah. They are generated with this script.
>
> $ cat ~/bin/test-dd-sparse.sh
>
> #!/bin/sh
>
> mount /dev/sda7 /fs
>
> tic=$(date +'%s')
>
> for i in `seq 1000`
> do
>        truncate -s 1G /fs/sparse-$i
>        dd if=/fs/sparse-$i of=/dev/null &>/dev/null &
> done
>
> tac=$(date +'%s')
> echo start time: $((tac-tic))
>
> wait
>
> tac=$(date +'%s')
> echo total time: $((tac-tic))
>
> egrep '(nr_alloc_fail|allocstall)' /proc/vmstat
> egrep '(CAL|RES|LOC|TLB)' /proc/interrupts
>
>> > nr_alloc_fail 4694
>> > allocstall 431021
>> >
>> >
>> > CPU             count     real total  virtual total    delay total
>> >                  1073     3534462680     3512544928   234056498221
>>
>> What's meaning of CPU fields?
>
> It's "waiting for a CPU (while being runnable)" as described in
> Documentation/accounting/delay-accounting.txt.

Thanks

>
>> > IO              count    delay total  delay average
>> >                     0              0              0ms
>> > SWAP            count    delay total  delay average
>> >                     0              0              0ms
>> > RECLAIM         count    delay total  delay average
>> >                   386    34751778363             89ms
>> > dd: read=0, write=0, cancelled_write=0
>> >
>>
>> Where is vanilla data for comparing latency?
>> Personally, It's hard to parse your data.
>
> Sorry it's somehow too much data and kernel revisions.. The base kernel's
> average latency is 29ms:
>
> base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
> -------------------------------------------------------------------------------
>
> CPU             count     real total  virtual total    delay total
>                 1122     3676441096     3656793547   274182127286
> IO              count    delay total  delay average
>                    3      291765493             97ms
> SWAP            count    delay total  delay average
>                    0              0              0ms
> RECLAIM         count    delay total  delay average
>                 1350    39229752193             29ms
> dd: read=45056, write=0, cancelled_write=0
>
> start time: 245
> total time: 526
> nr_alloc_fail 14586
> allocstall 1578343
> LOC:     533981     529210     528283     532346     533392     531314     531705     528983   Local timer interrupts
> RES:       3123       2177       1676       1580       2157       1974       1606       1696   Rescheduling interrupts
> CAL:     218392     218631     219167     219217     218840     218985     218429     218440   Function call interrupts
> TLB:        175         13         21         18         62        309        119         42   TLB shootdowns
>
>>
>> > CC: Mel Gorman <mel@linux.vnet.ibm.com>
>> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
>> > ---
>> >  fs/buffer.c          |    4 ++--
>> >  include/linux/swap.h |    3 ++-
>> >  mm/page_alloc.c      |   20 +++++---------------
>> >  mm/vmscan.c          |   31 +++++++++++++++++++++++--------
>> >  4 files changed, 32 insertions(+), 26 deletions(-)
>> > --- linux-next.orig/mm/vmscan.c       2011-04-29 10:42:14.000000000 +0800
>> > +++ linux-next/mm/vmscan.c    2011-04-30 21:59:33.000000000 +0800
>> > @@ -2025,8 +2025,9 @@ static bool all_unreclaimable(struct zon
>> >   * returns:  0, if no pages reclaimed
>> >   *           else, the number of pages reclaimed
>> >   */
>> > -static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
>> > -                                     struct scan_control *sc)
>> > +static unsigned long do_try_to_free_pages(struct zone *preferred_zone,
>> > +                                       struct zonelist *zonelist,
>> > +                                       struct scan_control *sc)
>> >  {
>> >       int priority;
>> >       unsigned long total_scanned = 0;
>> > @@ -2034,6 +2035,7 @@ static unsigned long do_try_to_free_page
>> >       struct zoneref *z;
>> >       struct zone *zone;
>> >       unsigned long writeback_threshold;
>> > +     unsigned long min_reclaim = sc->nr_to_reclaim;
>>
>> Hmm,
>>
>> >
>> >       get_mems_allowed();
>> >       delayacct_freepages_start();
>> > @@ -2041,6 +2043,9 @@ static unsigned long do_try_to_free_page
>> >       if (scanning_global_lru(sc))
>> >               count_vm_event(ALLOCSTALL);
>> >
>> > +     if (preferred_zone)
>> > +             sc->nr_to_reclaim += preferred_zone->watermark[WMARK_HIGH];
>> > +
>>
>> Hmm, I don't like this idea.
>> The goal of direct reclaim path is to reclaim pages asap, I beleive.
>> Many thing should be achieve of background kswapd.
>> If admin changes min_free_kbytes, it can affect latency of direct reclaim.
>> It doesn't make sense to me.
>
> Yeah, it does increase delays.. in the 1000 dd case, roughly from 30ms
> to 90ms. This is a major drawback.

Yes.
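
To make the min_free_kbytes dependence mentioned above concrete: the zone
watermarks that the patch reclaims up to are derived from min_free_kbytes,
roughly as in setup_per_zone_wmarks(). A compilable sketch of my own, under
simplifying assumptions (single zone, 4K pages, no lowmem/highmem scaling),
so the numbers are illustrative only:

#include <stdio.h>

#define PAGE_SHIFT      12              /* assume 4K pages for the example */

int main(void)
{
        unsigned long min_free_kbytes = 65536;  /* example admin setting */
        unsigned long wmark_min, wmark_low, wmark_high;

        wmark_min  = min_free_kbytes >> (PAGE_SHIFT - 10);  /* kbytes -> pages */
        wmark_low  = wmark_min + (wmark_min >> 2);          /* min + 25% */
        wmark_high = wmark_min + (wmark_min >> 1);          /* min + 50% */

        printf("min=%lu low=%lu high=%lu pages\n",
               wmark_min, wmark_low, wmark_high);
        return 0;
}

Under this single-zone simplification, min_free_kbytes=65536 already asks a
single direct reclaim pass to target on the order of 24k pages (~96MB),
which is where the latency scaling with min_free_kbytes comes from.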

>
>> >       for (priority = DEF_PRIORITY; priority >= 0; priority--) {
>> >               sc->nr_scanned = 0;
>> >               if (!priority)
>> > @@ -2067,8 +2072,17 @@ static unsigned long do_try_to_free_page
>> >                       }
>> >               }
>> >               total_scanned += sc->nr_scanned;
>> > -             if (sc->nr_reclaimed >= sc->nr_to_reclaim)
>> > -                     goto out;
>> > +             if (sc->nr_reclaimed >= min_reclaim) {
>> > +                     if (sc->nr_reclaimed >= sc->nr_to_reclaim)
>> > +                             goto out;
>>
>> I can't understand the logic.
>> if nr_reclaimed is bigger than min_reclaim, it's always greater than
>> nr_to_reclaim. What's meaning of min_reclaim?
>
> In direct reclaim, min_reclaim will be the legacy SWAP_CLUSTER_MAX and
> sc->nr_to_reclaim will be increased to the zone's high watermark and
> is kind of "max to reclaim".
>
>>
>> > +                     if (total_scanned > 2 * sc->nr_to_reclaim)
>> > +                             goto out;
>>
>> If there are lots of dirty pages in LRU?
>> If there are lots of unevictable pages in LRU?
>> If there are lots of mapped page in LRU but may_unmap = 0 cases?
>> I means it's rather risky early conclusion.
>
> That test means to avoid scanning too much on __GFP_NORETRY direct
> reclaims. My assumption for __GFP_NORETRY is, it should fail fast when
> the LRU pages seem hard to reclaim. And the problem in the 1000 dd
> case is, it's all easy to reclaim LRU pages but __GFP_NORETRY still
> fails from time to time, with lots of IPIs that may hurt large
> machines a lot.

I don't have enough time or an environment to test it.
So I can't be sure of it, but my concern is latency.
If you solve the latency problem while considering CPU scaling, I won't oppose it. :)
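
For reference, the logic discussed above can be modeled in isolation: the
original SWAP_CLUSTER_MAX batch acts as min_reclaim, sc->nr_to_reclaim is
raised toward the preferred zone's watermark as a "max to reclaim", and
reclaim stops early on the conditions from the patch variants in this
thread. A compilable user-space model of my own; the struct, constant and
the watermark flag are simplified stand-ins, not the kernel definitions:

#include <stdbool.h>

#define SWAP_CLUSTER_MAX        32UL

struct scan_control_model {
        unsigned long nr_to_reclaim;    /* raised by the preferred zone's watermark */
        unsigned long nr_reclaimed;
        unsigned long total_scanned;
};

/* Return true when the opportunistic (beyond-min_reclaim) reclaim should stop. */
bool should_stop_reclaim(const struct scan_control_model *sc,
                         bool preferred_zone_watermark_ok)
{
        unsigned long min_reclaim = SWAP_CLUSTER_MAX;

        if (sc->nr_reclaimed < min_reclaim)
                return false;           /* not even the legacy batch reclaimed yet */
        if (sc->nr_reclaimed >= sc->nr_to_reclaim)
                return true;            /* reached the raised target */
        if (sc->total_scanned > 2 * sc->nr_to_reclaim)
                return true;            /* fail fast instead of scanning on and on */
        if (preferred_zone_watermark_ok)
                return true;            /* preferred zone is back above the watermark */
        return false;
}

One later variant of the patch also adds a jiffies-based time limit (HZ/100)
as a further stop condition; it is omitted here for brevity.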



-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-02 10:14                             ` KOSAKI Motohiro
@ 2011-05-03  0:53                               ` Minchan Kim
  2011-05-03  1:25                                 ` KOSAKI Motohiro
  0 siblings, 1 reply; 43+ messages in thread
From: Minchan Kim @ 2011-05-03  0:53 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Wu Fengguang, Andrew Morton, Mel Gorman, Dave Young, linux-mm,
	Linux Kernel Mailing List, KAMEZAWA Hiroyuki, Christoph Lameter,
	Dave Chinner, David Rientjes

On Mon, May 2, 2011 at 7:14 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> On Mon, May 02, 2011 at 01:35:42AM +0900, Minchan Kim wrote:
>>
>> > Do you see my old patch? The patch want't incomplet but it's not bad for showing an idea.
>>                                      ^^^^^^^^^^^^^^^^
>>                               typo : wasn't complete
>
> I think your idea is eligible. Wu's approach may increase throughput but

Yes. It doesn't change many subtle things and is fairer, but Wu's
concern is order-0 pages with __GFP_NORETRY. By his experiment,
my patch doesn't help much with that concern.
The problem I have is that I don't have any infrastructure for
reproducing his experiment. :(

> may decrease latency. So, do you have a plan to finish the work?

I want to, but that would have to wait until I finish the inorder-putback series. :)
Maybe you have an environment (8-core system). If you want to take it on, go ahead. :)

Thanks, KOSAKI.


-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-03  0:53                               ` Minchan Kim
@ 2011-05-03  1:25                                 ` KOSAKI Motohiro
  0 siblings, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2011-05-03  1:25 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Wu Fengguang, Andrew Morton, Mel Gorman, Dave Young, linux-mm,
	Linux Kernel Mailing List, KAMEZAWA Hiroyuki, Christoph Lameter,
	Dave Chinner, David Rientjes

2011/5/3 Minchan Kim <minchan.kim@gmail.com>:
> On Mon, May 2, 2011 at 7:14 PM, KOSAKI Motohiro
> <kosaki.motohiro@jp.fujitsu.com> wrote:
>>> On Mon, May 02, 2011 at 01:35:42AM +0900, Minchan Kim wrote:
>>>
>>> > Do you see my old patch? The patch want't incomplet but it's not bad for showing an idea.
>>>                                      ^^^^^^^^^^^^^^^^
>>>                               typo : wasn't complete
>>
>> I think your idea is eligible. Wu's approach may increase throughput but
>
> Yes. it doesn't change many subtle things and make much fair but the
> Wu's concern is order-0 pages with __GFP_NORETRY. By his experiment,
> my patch doesn't help much his concern.
> The problem I have is I don't have any infrastructure for reproducing
> his experiment. :(
>
>> may decrease latency. So, do you have a plan to finish the work?
>
> I want it but the day would be after finishing inorder-putback series. :)
> Maybe you have a environment(8core system). If you want it, go ahead. :)

Hahaha, no. I lost the ia64 box I had to a physical crash. ;-)
I haven't reproduced his issue yet either.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-03  0:27                               ` Satoru Moriya
@ 2011-05-03  2:49                                 ` Wu Fengguang
  0 siblings, 0 replies; 43+ messages in thread
From: Wu Fengguang @ 2011-05-03  2:49 UTC (permalink / raw)
  To: Satoru Moriya
  Cc: Minchan Kim, Andrew Morton, Mel Gorman, Dave Young, linux-mm,
	Linux Kernel Mailing List, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Christoph Lameter, Dave Chinner, David Rientjes

Hi Satoru,

On Tue, May 03, 2011 at 08:27:43AM +0800, Satoru Moriya wrote:
> Hi Wu,
>  
> > On Mon, May 02, 2011 at 09:29:58PM +0800, Wu Fengguang wrote:
> > > > > +                     if (preferred_zone &&
> > > > > +                         zone_watermark_ok_safe(preferred_zone, sc->order,
> > > > > +                                     high_wmark_pages(preferred_zone),
> > > > > +                                     zone_idx(preferred_zone), 0))
> > > > > +                             goto out;
> > > > > +             }
> > > >
> > > > As I said, I think direct reclaim path sould be fast if possbile and
> > > > it should not a function of min_free_kbytes.
> > >
> > > It can be made not a function of min_free_kbytes by simply changing
> > > high_wmark_pages() to low_wmark_pages() in the above chunk, since
> > > direct reclaim is triggered when ALLOC_WMARK_LOW cannot be satisfied,
> > > ie. it just dropped below low_wmark_pages().
> > >
> > > But still, it costs 62ms reclaim latency (base kernel is 29ms).
> > 
> > I got new findings: the CPU schedule delays are much larger than
> > reclaim delays. It does make the "direct reclaim until low watermark
> > OK" latency less a problem :)
> > 
> > 1000 dd test case:
> >                 RECLAIM delay   CPU delay       nr_alloc_fail   CAL (last CPU)
> > base kernel     29ms            244ms           14586           218440
> > patched         62ms            215ms           5004            325
> 
> Hmm, in your system, the latency of direct reclaim may be less of a problem.
> 
> But, generally speaking, in a latency-sensitive system in the enterprise area
> there are two kinds of processes: one is latency sensitive (A), the other
> is not latency sensitive (B). And usually we set cpu affinity for both processes
> to avoid scheduling issues in (A). In this situation, CPU delay tends to be lower
> than the above and less of a problem, but reclaim delay is more critical.
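
A minimal illustration of the pinning described above, of my own and not
from this thread: a latency-sensitive process can be confined to a dedicated
CPU set with sched_setaffinity() (the CPU numbers below are arbitrary):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(0, &set);               /* pin the calling process to CPU 0 */
        CPU_SET(1, &set);               /* ... and CPU 1 */

        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
                perror("sched_setaffinity");
                return 1;
        }

        /* latency-sensitive work runs here, confined to CPUs 0-1 */
        return 0;
}

The same effect is usually achieved from the shell with taskset.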

Good point, thanks!

I also tried increasing min_free_kbytes as indicated by Minchan and
found 1-second-long reclaim delays... Even with explicit time limits
added, it's still over 100ms with a very high nr_alloc_fail.

I'm listing the code and results here as a record, but in general I'll
stop experimenting in this direction. We need a more targeted approach
that can guarantee the page allocation request is satisfied after
small, bounded direct reclaims.

Thanks,
Fengguang
---

root@fat /home/wfg# ./test-dd-sparse.sh
start time: 250
total time: 518
nr_alloc_fail 18551
allocstall 234468
LOC:     525770     523124     520782     529151     526192     525004     524166     521527   Local timer interrupts
RES:       2174       1674       1301       1420       3329       1563       1314       1563   Rescheduling interrupts
CAL:         67        402        602        267        240        270        291        274   Function call interrupts
TLB:        197         25         23         17         80        321        121         58   TLB shootdowns

CPU             count     real total  virtual total    delay total  delay average
                 1078     3408481832     3400786094   256971188317        238.378ms
IO              count    delay total  delay average
                    5      414363739             82ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                  187    28564728545            152ms

Subject: mm: cut down __GFP_NORETRY page allocation failures
Date: Thu Apr 28 13:46:39 CST 2011

Concurrent page allocations are suffering from high failure rates.

On a 8p, 3GB ram test box, when reading 1000 sparse files of size 1GB,
the page allocation failures are

nr_alloc_fail 733 	# interleaved reads by 1 single task
nr_alloc_fail 11799	# concurrent reads by 1000 tasks

The concurrent read test script is:

	for i in `seq 1000`
	do
		truncate -s 1G /fs/sparse-$i
		dd if=/fs/sparse-$i of=/dev/null &
	done

In order for get_page_from_freelist() to get free page,

(1) try_to_free_pages() should use much higher .nr_to_reclaim than the
    current SWAP_CLUSTER_MAX=32, in order to draw the zone out of the
    possible low watermark state

(2) the get_page_from_freelist() _after_ direct reclaim should use lower
    watermark than its normal invocations, so that it can reasonably
    "reserve" some free pages for itself and prevent other concurrent
    page allocators stealing all its reclaimed pages.

Some notes:

- commit 9ee493ce ("mm: page allocator: drain per-cpu lists after direct
  reclaim allocation fails") has the same target, however is obviously
  costly and less effective. It seems more clean to just remove the
  retry and drain code than to retain it.

- it's a bit hacky to reclaim more than requested pages inside
  do_try_to_free_page(), and it won't help cgroup for now

- it only aims to reduce failures when there are plenty of reclaimable
  pages, so it stops the opportunistic reclaim when scanned 2 times pages

Test results (1000 dd case):

- the failure rate is pretty sensible to the page reclaim size,
  from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 5004 (WMARK_HIGH, stop on low
  watermark ok) to 10496 (SWAP_CLUSTER_MAX)

- the IPIs are reduced by over 500 times

- the reclaim delay is doubled, from 29ms to 62ms

Base kernel is vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocations.

base kernel, 1000 dd
--------------------

start time: 245
total time: 526
nr_alloc_fail 14586
allocstall 1578343
LOC:     533981     529210     528283     532346     533392     531314     531705     528983   Local timer interrupts
RES:       3123       2177       1676       1580       2157       1974       1606       1696   Rescheduling interrupts
CAL:     218392     218631     219167     219217     218840     218985     218429     218440   Function call interrupts
TLB:        175         13         21         18         62        309        119         42   TLB shootdowns


CPU             count     real total  virtual total    delay total
                 1122     3676441096     3656793547   274182127286
IO              count    delay total  delay average
                    3      291765493             97ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                 1350    39229752193             29ms
dd: read=45056, write=0, cancelled_write=0

patched, 1000 dd
----------------

root@fat /home/wfg# ./test-dd-sparse.sh
start time: 260
total time: 519
nr_alloc_fail 5004
allocstall 551429
LOC:     524861     521832     520945     524632     524666     523334     523797     521562   Local timer interrupts
RES:       1323       1976       2505       1610       1544       1848       3310       1644   Rescheduling interrupts
CAL:         67        335        353        614        289        287        293        325   Function call interrupts
TLB:        288         29         26         34        103        321        123         70   TLB shootdowns

CPU             count     real total  virtual total    delay total
                 1177     3797422704     3775174301   253228435955
IO              count    delay total  delay average
                    1      198528820            198ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                  508    31660219699             62ms

base kernel, 100 dd
-------------------
root@fat /home/wfg# ./test-dd-sparse.sh
start time: 3
total time: 53
nr_alloc_fail 849
allocstall 131330
LOC:      59843      56506      55838      65283      61774      57929      58880      56246   Local timer interrupts
RES:        376        308        372        239        374        307        491        239   Rescheduling interrupts
CAL:      17737      18083      17948      18192      17929      17845      17893      17906   Function call interrupts
TLB:        307         26         25         21         80        324        137         79   TLB shootdowns

CPU             count     real total  virtual total    delay total
                  974     3197513904     3180727460    38504429363
IO              count    delay total  delay average
                    1       18156696             18ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                 1036     3439387298              3ms
dd: read=12288, write=0, cancelled_write=0

patched, 100 dd
---------------

root@fat /home/wfg# ./test-dd-sparse.sh
start time: 3
total time: 52
nr_alloc_fail 307
allocstall 48178
LOC:      56486      53514      52792      55879      56317      55383      55311      53168   Local timer interrupts
RES:        604        345        257        250        775        371        272        252   Rescheduling interrupts
CAL:         75        373        369        543        272        278        295        296   Function call interrupts
TLB:        259         24         19         24         82        306        139         53   TLB shootdowns


CPU             count     real total  virtual total    delay total
                  974     3177516944     3161771347    38508053977
IO              count    delay total  delay average
                    0              0              0ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                  393     5389030889             13ms
dd: read=0, write=0, cancelled_write=0

CC: Mel Gorman <mel@linux.vnet.ibm.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/buffer.c          |    4 ++--
 include/linux/swap.h |    3 ++-
 mm/page_alloc.c      |   22 +++++-----------------
 mm/vmscan.c          |   34 ++++++++++++++++++++++++++--------
 4 files changed, 35 insertions(+), 28 deletions(-)
--- linux-next.orig/mm/vmscan.c	2011-05-02 22:14:14.000000000 +0800
+++ linux-next/mm/vmscan.c	2011-05-03 10:07:14.000000000 +0800
@@ -2025,8 +2025,9 @@ static bool all_unreclaimable(struct zon
  * returns:	0, if no pages reclaimed
  * 		else, the number of pages reclaimed
  */
-static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
-					struct scan_control *sc)
+static unsigned long do_try_to_free_pages(struct zone *preferred_zone,
+					  struct zonelist *zonelist,
+					  struct scan_control *sc)
 {
 	int priority;
 	unsigned long total_scanned = 0;
@@ -2034,6 +2035,8 @@ static unsigned long do_try_to_free_page
 	struct zoneref *z;
 	struct zone *zone;
 	unsigned long writeback_threshold;
+	unsigned long min_reclaim = sc->nr_to_reclaim;
+	unsigned long start_time = jiffies;
 
 	get_mems_allowed();
 	delayacct_freepages_start();
@@ -2041,6 +2044,9 @@ static unsigned long do_try_to_free_page
 	if (scanning_global_lru(sc))
 		count_vm_event(ALLOCSTALL);
 
+	if (preferred_zone)
+		sc->nr_to_reclaim += preferred_zone->watermark[WMARK_LOW];
+
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		sc->nr_scanned = 0;
 		if (!priority)
@@ -2067,8 +2073,19 @@ static unsigned long do_try_to_free_page
 			}
 		}
 		total_scanned += sc->nr_scanned;
-		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
-			goto out;
+		if (sc->nr_reclaimed >= min_reclaim) {
+			if (sc->nr_reclaimed >= sc->nr_to_reclaim)
+				goto out;
+			if (total_scanned > 2 * sc->nr_to_reclaim)
+				goto out;
+			if (preferred_zone &&
+			    zone_watermark_ok(preferred_zone, sc->order,
+					low_wmark_pages(preferred_zone),
+					zone_idx(preferred_zone), 0))
+				goto out;
+			if (jiffies - start_time > HZ / 100)
+				goto out;
+		}
 
 		/*
 		 * Try to write back as many pages as we just scanned.  This
@@ -2117,7 +2134,8 @@ out:
 	return 0;
 }
 
-unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
+unsigned long try_to_free_pages(struct zone *preferred_zone,
+				struct zonelist *zonelist, int order,
 				gfp_t gfp_mask, nodemask_t *nodemask)
 {
 	unsigned long nr_reclaimed;
@@ -2137,7 +2155,7 @@ unsigned long try_to_free_pages(struct z
 				sc.may_writepage,
 				gfp_mask);
 
-	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+	nr_reclaimed = do_try_to_free_pages(preferred_zone, zonelist, &sc);
 
 	trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
 
@@ -2207,7 +2225,7 @@ unsigned long try_to_free_mem_cgroup_pag
 					    sc.may_writepage,
 					    sc.gfp_mask);
 
-	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+	nr_reclaimed = do_try_to_free_pages(NULL, zonelist, &sc);
 
 	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
 
@@ -2796,7 +2814,7 @@ unsigned long shrink_all_memory(unsigned
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
-	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+	nr_reclaimed = do_try_to_free_pages(NULL, zonelist, &sc);
 
 	p->reclaim_state = NULL;
 	lockdep_clear_current_reclaim_state();
--- linux-next.orig/mm/page_alloc.c	2011-05-02 22:14:14.000000000 +0800
+++ linux-next/mm/page_alloc.c	2011-05-02 22:14:21.000000000 +0800
@@ -1888,9 +1888,8 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
 	int migratetype, unsigned long *did_some_progress)
 {
-	struct page *page = NULL;
+	struct page *page;
 	struct reclaim_state reclaim_state;
-	bool drained = false;
 
 	cond_resched();
 
@@ -1901,33 +1900,22 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 	reclaim_state.reclaimed_slab = 0;
 	current->reclaim_state = &reclaim_state;
 
-	*did_some_progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask);
+	*did_some_progress = try_to_free_pages(preferred_zone, zonelist, order,
+					       gfp_mask, nodemask);
 
 	current->reclaim_state = NULL;
 	lockdep_clear_current_reclaim_state();
 	current->flags &= ~PF_MEMALLOC;
 
-	cond_resched();
-
 	if (unlikely(!(*did_some_progress)))
 		return NULL;
 
-retry:
+	alloc_flags |= ALLOC_HARDER;
+
 	page = get_page_from_freelist(gfp_mask, nodemask, order,
 					zonelist, high_zoneidx,
 					alloc_flags, preferred_zone,
 					migratetype);
-
-	/*
-	 * If an allocation failed after direct reclaim, it could be because
-	 * pages are pinned on the per-cpu lists. Drain them and try again
-	 */
-	if (!page && !drained) {
-		drain_all_pages();
-		drained = true;
-		goto retry;
-	}
-
 	return page;
 }
 
--- linux-next.orig/fs/buffer.c	2011-05-02 22:14:14.000000000 +0800
+++ linux-next/fs/buffer.c	2011-05-02 22:14:21.000000000 +0800
@@ -288,8 +288,8 @@ static void free_more_memory(void)
 						gfp_zone(GFP_NOFS), NULL,
 						&zone);
 		if (zone)
-			try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0,
-						GFP_NOFS, NULL);
+			try_to_free_pages(zone, node_zonelist(nid, GFP_NOFS),
+					  0, GFP_NOFS, NULL);
 	}
 }
 
--- linux-next.orig/include/linux/swap.h	2011-05-02 22:14:14.000000000 +0800
+++ linux-next/include/linux/swap.h	2011-05-02 22:14:21.000000000 +0800
@@ -249,7 +249,8 @@ static inline void lru_cache_add_file(st
 #define ISOLATE_BOTH 2		/* Isolate both active and inactive pages. */
 
 /* linux/mm/vmscan.c */
-extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
+extern unsigned long try_to_free_pages(struct zone *preferred_zone,
+					struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
 						  gfp_t gfp_mask, bool noswap,


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-03  0:49                             ` Minchan Kim
@ 2011-05-03  3:51                               ` Wu Fengguang
  2011-05-03  4:17                                 ` Minchan Kim
  0 siblings, 1 reply; 43+ messages in thread
From: Wu Fengguang @ 2011-05-03  3:51 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Mel Gorman, Dave Young, linux-mm,
	Linux Kernel Mailing List, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Christoph Lameter, Dave Chinner, David Rientjes, Li, Shaohua,
	Hugh Dickins

Hi Minchan,

On Tue, May 03, 2011 at 08:49:20AM +0800, Minchan Kim wrote:
> Hi Wu, Sorry for slow response.
> I guess you know why I am slow. :)

Yeah, never mind :)

> Unfortunately, my patch doesn't consider order-0 pages, as you mentioned below.
> I read your mail which states it doesn't help although it considers
> order-0 pages and drain.
> Actually, I tried to look into that but in my poor system(core2duo, 2G
> ram), nr_alloc_fail never happens. :(

I'm running a 4-core 8-thread CPU with 3G ram.

Did you run with this patch?

[PATCH] mm: readahead page allocations are OK to fail
https://lkml.org/lkml/2011/4/26/129

It's very good at generating lots of __GFP_NORETRY order-0 page
allocation requests.

> I will try it in other desktop but I am not sure I can reproduce it.
> 
> >
> > root@fat /home/wfg# ./test-dd-sparse.sh
> > start time: 246
> > total time: 531
> > nr_alloc_fail 14097
> > allocstall 1578332
> > LOC:     542698     538947     536986     567118     552114     539605     541201     537623   Local timer interrupts
> > RES:       3368       1908       1474       1476       2809       1602       1500       1509   Rescheduling interrupts
> > CAL:     223844     224198     224268     224436     223952     224056     223700     223743   Function call interrupts
> > TLB:        381         27         22         19         96        404        111         67   TLB shootdowns
> >
> > root@fat /home/wfg# getdelays -dip `pidof dd`
> > print delayacct stats ON
> > printing IO accounting
> > PID     5202
> >
> >
> > CPU             count     real total  virtual total    delay total
> >                 1132     3635447328     3627947550   276722091605
> > IO              count    delay total  delay average
> >                    2      187809974             62ms
> > SWAP            count    delay total  delay average
> >                    0              0              0ms
> > RECLAIM         count    delay total  delay average
> >                 1334    35304580824             26ms
> > dd: read=278528, write=0, cancelled_write=0
> >
> > I guess your patch is mainly fixing the high order allocations while
> > my workload is mainly order 0 readahead page allocations. There are
> > 1000 forks, however the "start time: 246" seems to indicate that the
> > order-1 reclaim latency is not improved.
> 
> Maybe, 8K * 1000 isn't big footprint so I think reclaim doesn't happen.

It's mainly a guess. In an earlier experiment of simply increasing
nr_to_reclaim to high_wmark_pages() without any other constraints, it
does manage to reduce start time to about 25 seconds.

> > I'll try modifying your patch and see how it works out. The obvious
> > change is to apply it to the order-0 case. Hope this won't create much
> > more isolated pages.
> >
> > Attached is your patch rebased to 2.6.39-rc3, after resolving some
> > merge conflicts and fixing a trivial NULL pointer bug.
> 
> Thanks!
> I would like to see detail with it in my system if I can reproduce it.

OK.

> >> > no cond_resched():
> >>
> >> What's this?
> >
> > I tried a modified patch that also removes the cond_resched() call in
> > __alloc_pages_direct_reclaim(), between try_to_free_pages() and
> > get_page_from_freelist(). It seems not helping noticeably.
> >
> > It looks safe to remove that cond_resched() as we already have such
> > calls in shrink_page_list().
> 
> I tried similar thing but Andrew have a concern about it.
> https://lkml.org/lkml/2011/3/24/138

Yeah cond_resched() is at least not the root cause of our problems..

> >> > +                     if (total_scanned > 2 * sc->nr_to_reclaim)
> >> > +                             goto out;
> >>
> >> If there are lots of dirty pages in LRU?
> >> If there are lots of unevictable pages in LRU?
> >> If there are lots of mapped page in LRU but may_unmap = 0 cases?
> >> I means it's rather risky early conclusion.
> >
> > That test means to avoid scanning too much on __GFP_NORETRY direct
> > reclaims. My assumption for __GFP_NORETRY is, it should fail fast when
> > the LRU pages seem hard to reclaim. And the problem in the 1000 dd
> > case is, it's all easy to reclaim LRU pages but __GFP_NORETRY still
> > fails from time to time, with lots of IPIs that may hurt large
> > machines a lot.
> 
> I don't have  enough time and a environment to test it.
> So I can't make sure of it but my concern is a latency.
> If you solve latency problem considering CPU scaling, I won't oppose it. :)

OK, let's head for that direction :)

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-03  3:51                               ` Wu Fengguang
@ 2011-05-03  4:17                                 ` Minchan Kim
  0 siblings, 0 replies; 43+ messages in thread
From: Minchan Kim @ 2011-05-03  4:17 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Mel Gorman, Dave Young, linux-mm,
	Linux Kernel Mailing List, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Christoph Lameter, Dave Chinner, David Rientjes, Li, Shaohua,
	Hugh Dickins

On Tue, May 3, 2011 at 12:51 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> Hi Minchan,
>
> On Tue, May 03, 2011 at 08:49:20AM +0800, Minchan Kim wrote:
>> Hi Wu, Sorry for slow response.
>> I guess you know why I am slow. :)
>
> Yeah, never mind :)
>
>> Unfortunately, my patch doesn't consider order-0 pages, as you mentioned below.
>> I read your mail which states it doesn't help although it considers
>> order-0 pages and drain.
>> Actually, I tried to look into that but in my poor system(core2duo, 2G
>> ram), nr_alloc_fail never happens. :(
>
> I'm running a 4-core 8-thread CPU with 3G ram.
>
> Did you run with this patch?
>
> [PATCH] mm: readahead page allocations are OK to fail
> https://lkml.org/lkml/2011/4/26/129
>

Of course.
I will try it on my better machine: an i5 with 4 cores and 3G ram.

> It's very good at generating lots of __GFP_NORETRY order-0 page
> allocation requests.
>
>> I will try it in other desktop but I am not sure I can reproduce it.
>>
>> >
>> > root@fat /home/wfg# ./test-dd-sparse.sh
>> > start time: 246
>> > total time: 531
>> > nr_alloc_fail 14097
>> > allocstall 1578332
>> > LOC:     542698     538947     536986     567118     552114     539605     541201     537623   Local timer interrupts
>> > RES:       3368       1908       1474       1476       2809       1602       1500       1509   Rescheduling interrupts
>> > CAL:     223844     224198     224268     224436     223952     224056     223700     223743   Function call interrupts
>> > TLB:        381         27         22         19         96        404        111         67   TLB shootdowns
>> >
>> > root@fat /home/wfg# getdelays -dip `pidof dd`
>> > print delayacct stats ON
>> > printing IO accounting
>> > PID     5202
>> >
>> >
>> > CPU             count     real total  virtual total    delay total
>> >                 1132     3635447328     3627947550   276722091605
>> > IO              count    delay total  delay average
>> >                    2      187809974             62ms
>> > SWAP            count    delay total  delay average
>> >                    0              0              0ms
>> > RECLAIM         count    delay total  delay average
>> >                 1334    35304580824             26ms
>> > dd: read=278528, write=0, cancelled_write=0
>> >
>> > I guess your patch is mainly fixing the high order allocations while
>> > my workload is mainly order 0 readahead page allocations. There are
>> > 1000 forks, however the "start time: 246" seems to indicate that the
>> > order-1 reclaim latency is not improved.
>>
>> Maybe, 8K * 1000 isn't big footprint so I think reclaim doesn't happen.
>
> It's mainly a guess. In an earlier experiment of simply increasing
> nr_to_reclaim to high_wmark_pages() without any other constraints, it
> does manage to reduce start time to about 25 seconds.

If so, I guess the workload might depend on order-0 pages, not stack allocations.

>
>> > I'll try modifying your patch and see how it works out. The obvious
>> > change is to apply it to the order-0 case. Hope this won't create much
>> > more isolated pages.
>> >
>> > Attached is your patch rebased to 2.6.39-rc3, after resolving some
>> > merge conflicts and fixing a trivial NULL pointer bug.
>>
>> Thanks!
>> I would like to see detail with it in my system if I can reproduce it.
>
> OK.
>
>> >> > no cond_resched():
>> >>
>> >> What's this?
>> >
>> > I tried a modified patch that also removes the cond_resched() call in
>> > __alloc_pages_direct_reclaim(), between try_to_free_pages() and
>> > get_page_from_freelist(). It seems not helping noticeably.
>> >
>> > It looks safe to remove that cond_resched() as we already have such
>> > calls in shrink_page_list().
>>
>> I tried similar thing but Andrew have a concern about it.
>> https://lkml.org/lkml/2011/3/24/138
>
> Yeah cond_resched() is at least not the root cause of our problems..
>
>> >> > +                     if (total_scanned > 2 * sc->nr_to_reclaim)
>> >> > +                             goto out;
>> >>
>> >> If there are lots of dirty pages in LRU?
>> >> If there are lots of unevictable pages in LRU?
>> >> If there are lots of mapped page in LRU but may_unmap = 0 cases?
>> >> I means it's rather risky early conclusion.
>> >
>> > That test means to avoid scanning too much on __GFP_NORETRY direct
>> > reclaims. My assumption for __GFP_NORETRY is, it should fail fast when
>> > the LRU pages seem hard to reclaim. And the problem in the 1000 dd
>> > case is, it's all easy to reclaim LRU pages but __GFP_NORETRY still
>> > fails from time to time, with lots of IPIs that may hurt large
>> > machines a lot.
>>
>> I don't have  enough time and a environment to test it.
>> So I can't make sure of it but my concern is a latency.
>> If you solve latency problem considering CPU scaling, I won't oppose it. :)
>
> OK, let's head for that direction :)

Anyway, the problem of the draining overhead with __GFP_NORETRY is
worth solving, I think.
We should handle it.

>
> Thanks,
> Fengguang
>

Thanks for the good experiments and numbers.


-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-04-28 13:36                   ` [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures Wu Fengguang
  2011-04-28 13:38                     ` [patch] vmstat: account " Wu Fengguang
  2011-04-29  2:28                     ` [RFC][PATCH] mm: cut down __GFP_NORETRY " Wu Fengguang
@ 2011-05-04  1:56                     ` Dave Young
  2011-05-04  2:32                       ` Dave Young
  2011-05-04  4:00                       ` Wu Fengguang
  2 siblings, 2 replies; 43+ messages in thread
From: Dave Young @ 2011-05-04  1:56 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Minchan Kim, linux-mm, Linux Kernel Mailing List,
	Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Christoph Lameter,
	Dave Chinner, David Rientjes

On Thu, Apr 28, 2011 at 9:36 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> Concurrent page allocations are suffering from high failure rates.
>
> On a 8p, 3GB ram test box, when reading 1000 sparse files of size 1GB,
> the page allocation failures are
>
> nr_alloc_fail 733       # interleaved reads by 1 single task
> nr_alloc_fail 11799     # concurrent reads by 1000 tasks
>
> The concurrent read test script is:
>
>        for i in `seq 1000`
>        do
>                truncate -s 1G /fs/sparse-$i
>                dd if=/fs/sparse-$i of=/dev/null &
>        done
>

With a Core2 Duo, 3G ram and no swap partition, I cannot reproduce the alloc failures.

> In order for get_page_from_freelist() to get free page,
>
> (1) try_to_free_pages() should use much higher .nr_to_reclaim than the
>    current SWAP_CLUSTER_MAX=32, in order to draw the zone out of the
>    possible low watermark state as well as fill the pcp with enough free
>    pages to overflow its high watermark.
>
> (2) the get_page_from_freelist() _after_ direct reclaim should use lower
>    watermark than its normal invocations, so that it can reasonably
>    "reserve" some free pages for itself and prevent other concurrent
>    page allocators stealing all its reclaimed pages.
>
> Some notes:
>
> - commit 9ee493ce ("mm: page allocator: drain per-cpu lists after direct
>  reclaim allocation fails") has the same target, however is obviously
>  costly and less effective. It seems more clean to just remove the
>  retry and drain code than to retain it.
>
> - it's a bit hacky to reclaim more than requested pages inside
>  do_try_to_free_page(), and it won't help cgroup for now
>
> - it only aims to reduce failures when there are plenty of reclaimable
>  pages, so it stops the opportunistic reclaim when scanned 2 times pages
>
> Test results:
>
> - the failure rate is pretty sensible to the page reclaim size,
>  from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 10496 (SWAP_CLUSTER_MAX)
>
> - the IPIs are reduced by over 100 times
>
> base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
> -------------------------------------------------------------------------------
> nr_alloc_fail 10496
> allocstall 1576602
>
> slabs_scanned 21632
> kswapd_steal 4393382
> kswapd_inodesteal 124
> kswapd_low_wmark_hit_quickly 885
> kswapd_high_wmark_hit_quickly 2321
> kswapd_skip_congestion_wait 0
> pageoutrun 29426
>
> CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts
>
> LOC:     536274     532529     531734     536801     536510     533676     534853     532038   Local timer interrupts
> RES:       3032       2128       1792       1765       2184       1703       1754       1865   Rescheduling interrupts
> TLB:        189         15         13         17         64        294         97         63   TLB shootdowns

Could you tell me how to get the above info?

>
> patched (WMARK_MIN)
> -------------------
> nr_alloc_fail 704
> allocstall 105551
>
> slabs_scanned 33280
> kswapd_steal 4525537
> kswapd_inodesteal 187
> kswapd_low_wmark_hit_quickly 4980
> kswapd_high_wmark_hit_quickly 2573
> kswapd_skip_congestion_wait 0
> pageoutrun 35429
>
> CAL:         93        286        396        754        272        297        275        281   Function call interrupts
>
> LOC:     520550     517751     517043     522016     520302     518479     519329     517179   Local timer interrupts
> RES:       2131       1371       1376       1269       1390       1181       1409       1280   Rescheduling interrupts
> TLB:        280         26         27         30         65        305        134         75   TLB shootdowns
>
> patched (WMARK_HIGH)
> --------------------
> nr_alloc_fail 282
> allocstall 53860
>
> slabs_scanned 23936
> kswapd_steal 4561178
> kswapd_inodesteal 0
> kswapd_low_wmark_hit_quickly 2760
> kswapd_high_wmark_hit_quickly 1748
> kswapd_skip_congestion_wait 0
> pageoutrun 32639
>
> CAL:         93        463        410        540        298        282        272        306   Function call interrupts
>
> LOC:     513956     510749     509890     514897     514300     512392     512825     510574   Local timer interrupts
> RES:       1174       2081       1411       1320       1742       2683       1380       1230   Rescheduling interrupts
> TLB:        274         21         19         22         57        317        131         61   TLB shootdowns
>
> this patch (WMARK_HIGH, limited scan)
> -------------------------------------
> nr_alloc_fail 276
> allocstall 54034
>
> slabs_scanned 24320
> kswapd_steal 4507482
> kswapd_inodesteal 262
> kswapd_low_wmark_hit_quickly 2638
> kswapd_high_wmark_hit_quickly 1710
> kswapd_skip_congestion_wait 0
> pageoutrun 32182
>
> CAL:         69        443        421        567        273        279        269        334   Function call interrupts
>
> LOC:     514736     511698     510993     514069     514185     512986     513838     511229   Local timer interrupts
> RES:       2153       1556       1126       1351       3047       1554       1131       1560   Rescheduling interrupts
> TLB:        209         26         20         15         71        315        117         71   TLB shootdowns
>
> CC: Mel Gorman <mel@linux.vnet.ibm.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  mm/page_alloc.c |   17 +++--------------
>  mm/vmscan.c     |    6 ++++++
>  2 files changed, 9 insertions(+), 14 deletions(-)
> --- linux-next.orig/mm/vmscan.c 2011-04-28 21:16:16.000000000 +0800
> +++ linux-next/mm/vmscan.c      2011-04-28 21:28:57.000000000 +0800
> @@ -1978,6 +1978,8 @@ static void shrink_zones(int priority, s
>                                continue;
>                        if (zone->all_unreclaimable && priority != DEF_PRIORITY)
>                                continue;       /* Let kswapd poll it */
> +                       sc->nr_to_reclaim = max(sc->nr_to_reclaim,
> +                                               zone->watermark[WMARK_HIGH]);
>                }
>
>                shrink_zone(priority, zone, sc);
> @@ -2034,6 +2036,7 @@ static unsigned long do_try_to_free_page
>        struct zoneref *z;
>        struct zone *zone;
>        unsigned long writeback_threshold;
> +       unsigned long min_reclaim = sc->nr_to_reclaim;
>
>        get_mems_allowed();
>        delayacct_freepages_start();
> @@ -2067,6 +2070,9 @@ static unsigned long do_try_to_free_page
>                        }
>                }
>                total_scanned += sc->nr_scanned;
> +               if (sc->nr_reclaimed >= min_reclaim &&
> +                   total_scanned > 2 * sc->nr_to_reclaim)
> +                       goto out;
>                if (sc->nr_reclaimed >= sc->nr_to_reclaim)
>                        goto out;
>
> --- linux-next.orig/mm/page_alloc.c     2011-04-28 21:16:16.000000000 +0800
> +++ linux-next/mm/page_alloc.c  2011-04-28 21:16:18.000000000 +0800
> @@ -1888,9 +1888,8 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
>        nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
>        int migratetype, unsigned long *did_some_progress)
>  {
> -       struct page *page = NULL;
> +       struct page *page;
>        struct reclaim_state reclaim_state;
> -       bool drained = false;
>
>        cond_resched();
>
> @@ -1912,22 +1911,12 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
>        if (unlikely(!(*did_some_progress)))
>                return NULL;
>
> -retry:
> +       alloc_flags |= ALLOC_HARDER;
> +
>        page = get_page_from_freelist(gfp_mask, nodemask, order,
>                                        zonelist, high_zoneidx,
>                                        alloc_flags, preferred_zone,
>                                        migratetype);
> -
> -       /*
> -        * If an allocation failed after direct reclaim, it could be because
> -        * pages are pinned on the per-cpu lists. Drain them and try again
> -        */
> -       if (!page && !drained) {
> -               drain_all_pages();
> -               drained = true;
> -               goto retry;
> -       }
> -
>        return page;
>  }
>
>



-- 
Regards
dave

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-04  1:56                     ` Dave Young
@ 2011-05-04  2:32                       ` Dave Young
  2011-05-04  2:56                         ` Wu Fengguang
  2011-05-04  4:00                       ` Wu Fengguang
  1 sibling, 1 reply; 43+ messages in thread
From: Dave Young @ 2011-05-04  2:32 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Minchan Kim, linux-mm, Linux Kernel Mailing List,
	Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Christoph Lameter,
	Dave Chinner, David Rientjes

On Wed, May 4, 2011 at 9:56 AM, Dave Young <hidave.darkstar@gmail.com> wrote:
> On Thu, Apr 28, 2011 at 9:36 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> Concurrent page allocations are suffering from high failure rates.
>>
>> On a 8p, 3GB ram test box, when reading 1000 sparse files of size 1GB,
>> the page allocation failures are
>>
>> nr_alloc_fail 733       # interleaved reads by 1 single task
>> nr_alloc_fail 11799     # concurrent reads by 1000 tasks
>>
>> The concurrent read test script is:
>>
>>        for i in `seq 1000`
>>        do
>>                truncate -s 1G /fs/sparse-$i
>>                dd if=/fs/sparse-$i of=/dev/null &
>>        done
>>
>
> With Core2 Duo, 3G ram, No swap partition I can not produce the alloc fail

Unsetting CONFIG_SCHED_AUTOGROUP and CONFIG_CGROUP_SCHED seems to affect
the test results; now I see several nr_alloc_fail events (dd has not
finished yet):

dave@darkstar-32:$ grep fail /proc/vmstat:
nr_alloc_fail 4
compact_pagemigrate_failed 0
compact_fail 3
htlb_buddy_alloc_fail 0
thp_collapse_alloc_fail 4

So the result is related to the cpu scheduler.

>
>> In order for get_page_from_freelist() to get free page,
>>
>> (1) try_to_free_pages() should use much higher .nr_to_reclaim than the
>>    current SWAP_CLUSTER_MAX=32, in order to draw the zone out of the
>>    possible low watermark state as well as fill the pcp with enough free
>>    pages to overflow its high watermark.
>>
>> (2) the get_page_from_freelist() _after_ direct reclaim should use lower
>>    watermark than its normal invocations, so that it can reasonably
>>    "reserve" some free pages for itself and prevent other concurrent
>>    page allocators stealing all its reclaimed pages.
>>
>> Some notes:
>>
>> - commit 9ee493ce ("mm: page allocator: drain per-cpu lists after direct
>>  reclaim allocation fails") has the same target, however is obviously
>>  costly and less effective. It seems more clean to just remove the
>>  retry and drain code than to retain it.
>>
>> - it's a bit hacky to reclaim more than requested pages inside
>>  do_try_to_free_page(), and it won't help cgroup for now
>>
>> - it only aims to reduce failures when there are plenty of reclaimable
>>  pages, so it stops the opportunistic reclaim when scanned 2 times pages
>>
>> Test results:
>>
>> - the failure rate is pretty sensible to the page reclaim size,
>>  from 282 (WMARK_HIGH) to 704 (WMARK_MIN) to 10496 (SWAP_CLUSTER_MAX)
>>
>> - the IPIs are reduced by over 100 times
>>
>> base kernel: vanilla 2.6.39-rc3 + __GFP_NORETRY readahead page allocation patch
>> -------------------------------------------------------------------------------
>> nr_alloc_fail 10496
>> allocstall 1576602
>>
>> slabs_scanned 21632
>> kswapd_steal 4393382
>> kswapd_inodesteal 124
>> kswapd_low_wmark_hit_quickly 885
>> kswapd_high_wmark_hit_quickly 2321
>> kswapd_skip_congestion_wait 0
>> pageoutrun 29426
>>
>> CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts
>>
>> LOC:     536274     532529     531734     536801     536510     533676     534853     532038   Local timer interrupts
>> RES:       3032       2128       1792       1765       2184       1703       1754       1865   Rescheduling interrupts
>> TLB:        189         15         13         17         64        294         97         63   TLB shootdowns
>
> Could you tell how to get above info?
>
>>
>> patched (WMARK_MIN)
>> -------------------
>> nr_alloc_fail 704
>> allocstall 105551
>>
>> slabs_scanned 33280
>> kswapd_steal 4525537
>> kswapd_inodesteal 187
>> kswapd_low_wmark_hit_quickly 4980
>> kswapd_high_wmark_hit_quickly 2573
>> kswapd_skip_congestion_wait 0
>> pageoutrun 35429
>>
>> CAL:         93        286        396        754        272        297        275        281   Function call interrupts
>>
>> LOC:     520550     517751     517043     522016     520302     518479     519329     517179   Local timer interrupts
>> RES:       2131       1371       1376       1269       1390       1181       1409       1280   Rescheduling interrupts
>> TLB:        280         26         27         30         65        305        134         75   TLB shootdowns
>>
>> patched (WMARK_HIGH)
>> --------------------
>> nr_alloc_fail 282
>> allocstall 53860
>>
>> slabs_scanned 23936
>> kswapd_steal 4561178
>> kswapd_inodesteal 0
>> kswapd_low_wmark_hit_quickly 2760
>> kswapd_high_wmark_hit_quickly 1748
>> kswapd_skip_congestion_wait 0
>> pageoutrun 32639
>>
>> CAL:         93        463        410        540        298        282        272        306   Function call interrupts
>>
>> LOC:     513956     510749     509890     514897     514300     512392     512825     510574   Local timer interrupts
>> RES:       1174       2081       1411       1320       1742       2683       1380       1230   Rescheduling interrupts
>> TLB:        274         21         19         22         57        317        131         61   TLB shootdowns
>>
>> this patch (WMARK_HIGH, limited scan)
>> -------------------------------------
>> nr_alloc_fail 276
>> allocstall 54034
>>
>> slabs_scanned 24320
>> kswapd_steal 4507482
>> kswapd_inodesteal 262
>> kswapd_low_wmark_hit_quickly 2638
>> kswapd_high_wmark_hit_quickly 1710
>> kswapd_skip_congestion_wait 0
>> pageoutrun 32182
>>
>> CAL:         69        443        421        567        273        279        269        334   Function call interrupts
>>
>> LOC:     514736     511698     510993     514069     514185     512986     513838     511229   Local timer interrupts
>> RES:       2153       1556       1126       1351       3047       1554       1131       1560   Rescheduling interrupts
>> TLB:        209         26         20         15         71        315        117         71   TLB shootdowns
>>
>> CC: Mel Gorman <mel@linux.vnet.ibm.com>
>> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
>> ---
>>  mm/page_alloc.c |   17 +++--------------
>>  mm/vmscan.c     |    6 ++++++
>>  2 files changed, 9 insertions(+), 14 deletions(-)
>> --- linux-next.orig/mm/vmscan.c 2011-04-28 21:16:16.000000000 +0800
>> +++ linux-next/mm/vmscan.c      2011-04-28 21:28:57.000000000 +0800
>> @@ -1978,6 +1978,8 @@ static void shrink_zones(int priority, s
>>                                continue;
>>                        if (zone->all_unreclaimable && priority != DEF_PRIORITY)
>>                                continue;       /* Let kswapd poll it */
>> +                       sc->nr_to_reclaim = max(sc->nr_to_reclaim,
>> +                                               zone->watermark[WMARK_HIGH]);
>>                }
>>
>>                shrink_zone(priority, zone, sc);
>> @@ -2034,6 +2036,7 @@ static unsigned long do_try_to_free_page
>>        struct zoneref *z;
>>        struct zone *zone;
>>        unsigned long writeback_threshold;
>> +       unsigned long min_reclaim = sc->nr_to_reclaim;
>>
>>        get_mems_allowed();
>>        delayacct_freepages_start();
>> @@ -2067,6 +2070,9 @@ static unsigned long do_try_to_free_page
>>                        }
>>                }
>>                total_scanned += sc->nr_scanned;
>> +               if (sc->nr_reclaimed >= min_reclaim &&
>> +                   total_scanned > 2 * sc->nr_to_reclaim)
>> +                       goto out;
>>                if (sc->nr_reclaimed >= sc->nr_to_reclaim)
>>                        goto out;
>>
>> --- linux-next.orig/mm/page_alloc.c     2011-04-28 21:16:16.000000000 +0800
>> +++ linux-next/mm/page_alloc.c  2011-04-28 21:16:18.000000000 +0800
>> @@ -1888,9 +1888,8 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
>>        nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
>>        int migratetype, unsigned long *did_some_progress)
>>  {
>> -       struct page *page = NULL;
>> +       struct page *page;
>>        struct reclaim_state reclaim_state;
>> -       bool drained = false;
>>
>>        cond_resched();
>>
>> @@ -1912,22 +1911,12 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
>>        if (unlikely(!(*did_some_progress)))
>>                return NULL;
>>
>> -retry:
>> +       alloc_flags |= ALLOC_HARDER;
>> +
>>        page = get_page_from_freelist(gfp_mask, nodemask, order,
>>                                        zonelist, high_zoneidx,
>>                                        alloc_flags, preferred_zone,
>>                                        migratetype);
>> -
>> -       /*
>> -        * If an allocation failed after direct reclaim, it could be because
>> -        * pages are pinned on the per-cpu lists. Drain them and try again
>> -        */
>> -       if (!page && !drained) {
>> -               drain_all_pages();
>> -               drained = true;
>> -               goto retry;
>> -       }
>> -
>>        return page;
>>  }
>>
>>
>
>
>
> --
> Regards
> dave
>



-- 
Regards
dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-04  2:32                       ` Dave Young
@ 2011-05-04  2:56                         ` Wu Fengguang
  2011-05-04  4:23                           ` Wu Fengguang
  0 siblings, 1 reply; 43+ messages in thread
From: Wu Fengguang @ 2011-05-04  2:56 UTC (permalink / raw)
  To: Dave Young
  Cc: Andrew Morton, Minchan Kim, linux-mm, Linux Kernel Mailing List,
	Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Christoph Lameter,
	Dave Chinner, David Rientjes

[-- Attachment #1: Type: text/plain, Size: 1358 bytes --]

On Wed, May 04, 2011 at 10:32:01AM +0800, Dave Young wrote:
> On Wed, May 4, 2011 at 9:56 AM, Dave Young <hidave.darkstar@gmail.com> wrote:
> > On Thu, Apr 28, 2011 at 9:36 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> >> Concurrent page allocations are suffering from high failure rates.
> >>
> >> On a 8p, 3GB ram test box, when reading 1000 sparse files of size 1GB,
> >> the page allocation failures are
> >>
> >> nr_alloc_fail 733       # interleaved reads by 1 single task
> >> nr_alloc_fail 11799     # concurrent reads by 1000 tasks
> >>
> >> The concurrent read test script is:
> >>
> >>        for i in `seq 1000`
> >>        do
> >>                truncate -s 1G /fs/sparse-$i
> >>                dd if=/fs/sparse-$i of=/dev/null &
> >>        done
> >>
> >
> > With Core2 Duo, 3G ram, No swap partition I can not produce the alloc fail
> 
> Unsetting CONFIG_SCHED_AUTOGROUP and CONFIG_CGROUP_SCHED seems to affect the
> test results; now I see several nr_alloc_fail events (dd has not finished
> yet):
> 
> dave@darkstar-32:$ grep fail /proc/vmstat
> nr_alloc_fail 4
> compact_pagemigrate_failed 0
> compact_fail 3
> htlb_buddy_alloc_fail 0
> thp_collapse_alloc_fail 4
> 
> So the result is related to the CPU scheduler.

Good catch! My kernel also disabled CONFIG_CGROUP_SCHED and
CONFIG_SCHED_AUTOGROUP.

Thanks,
Fengguang

[-- Attachment #2: .config --]
[-- Type: text/plain, Size: 77510 bytes --]

#
# Automatically generated make config: don't edit
# Linux/x86_64 2.6.39-rc3 Kernel Configuration
# Mon Apr 18 09:06:55 2011
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
# CONFIG_KTIME_SCALAR is not set
CONFIG_ARCH_CPU_PROBE_RELEASE=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_HAVE_IRQ_WORK=y
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_FHANDLE is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y
CONFIG_HAVE_GENERIC_HARDIRQS=y

#
# IRQ subsystem
#
CONFIG_GENERIC_HARDIRQS=y
CONFIG_HAVE_SPARSE_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
CONFIG_RCU_TRACE=y
CONFIG_RCU_FANOUT=64
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_RCU_FAST_NO_HZ is not set
CONFIG_TREE_RCU_TRACE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED=y
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_SCHED is not set
CONFIG_BLK_CGROUP=y
CONFIG_DEBUG_BLK_CGROUP=y
# CONFIG_NAMESPACES is not set
# CONFIG_SCHED_AUTOGROUP is not set
CONFIG_MM_OWNER=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_RD_XZ is not set
# CONFIG_RD_LZO is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_EXPERT=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_PERF_COUNTERS is not set
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_COMPAT_BRK=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_OPROFILE=y
# CONFIG_OPROFILE_EVENT_MULTIPLEX is not set
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
# CONFIG_JUMP_LABEL is not set
CONFIG_OPTPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_KRETPROBES=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_THROTTLING=y
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_CFQ_GROUP_IOSCHED=y
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_INLINE_SPIN_TRYLOCK is not set
# CONFIG_INLINE_SPIN_TRYLOCK_BH is not set
# CONFIG_INLINE_SPIN_LOCK is not set
# CONFIG_INLINE_SPIN_LOCK_BH is not set
# CONFIG_INLINE_SPIN_LOCK_IRQ is not set
# CONFIG_INLINE_SPIN_LOCK_IRQSAVE is not set
# CONFIG_INLINE_SPIN_UNLOCK is not set
# CONFIG_INLINE_SPIN_UNLOCK_BH is not set
# CONFIG_INLINE_SPIN_UNLOCK_IRQ is not set
# CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE is not set
# CONFIG_INLINE_READ_TRYLOCK is not set
# CONFIG_INLINE_READ_LOCK is not set
# CONFIG_INLINE_READ_LOCK_BH is not set
# CONFIG_INLINE_READ_LOCK_IRQ is not set
# CONFIG_INLINE_READ_LOCK_IRQSAVE is not set
# CONFIG_INLINE_READ_UNLOCK is not set
# CONFIG_INLINE_READ_UNLOCK_BH is not set
# CONFIG_INLINE_READ_UNLOCK_IRQ is not set
# CONFIG_INLINE_READ_UNLOCK_IRQRESTORE is not set
# CONFIG_INLINE_WRITE_TRYLOCK is not set
# CONFIG_INLINE_WRITE_LOCK is not set
# CONFIG_INLINE_WRITE_LOCK_BH is not set
# CONFIG_INLINE_WRITE_LOCK_IRQ is not set
# CONFIG_INLINE_WRITE_LOCK_IRQSAVE is not set
# CONFIG_INLINE_WRITE_UNLOCK is not set
# CONFIG_INLINE_WRITE_UNLOCK_BH is not set
# CONFIG_INLINE_WRITE_UNLOCK_IRQ is not set
# CONFIG_INLINE_WRITE_UNLOCK_IRQRESTORE is not set
# CONFIG_MUTEX_SPIN_ON_OWNER is not set
CONFIG_FREEZER=y

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_MPPARSE=y
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_VSMP is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_PARAVIRT_GUEST=y
# CONFIG_XEN is not set
# CONFIG_XEN_PRIVILEGED_GUEST is not set
CONFIG_KVM_CLOCK=y
CONFIG_KVM_GUEST=y
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_SPINLOCKS is not set
CONFIG_PARAVIRT_CLOCK=y
# CONFIG_PARAVIRT_DEBUG is not set
CONFIG_NO_BOOTMEM=y
CONFIG_MEMTEST=y
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
CONFIG_MCORE2=y
# CONFIG_MATOM is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_INTERNODE_CACHE_SHIFT=7
CONFIG_X86_CMPXCHG=y
CONFIG_CMPXCHG_LOCAL=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_P6_NOP=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_PROCESSOR_SELECT=y
CONFIG_CPU_SUP_INTEL=y
# CONFIG_CPU_SUP_AMD is not set
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
# CONFIG_CALGARY_IOMMU is not set
# CONFIG_AMD_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
# CONFIG_IOMMU_API is not set
# CONFIG_MAXSMP is not set
CONFIG_NR_CPUS=64
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
# CONFIG_X86_MCE_AMD is not set
CONFIG_X86_MCE_THRESHOLD=y
CONFIG_X86_MCE_INJECT=y
CONFIG_X86_THERMAL_VECTOR=y
# CONFIG_I8K is not set
CONFIG_MICROCODE=y
CONFIG_MICROCODE_INTEL=y
# CONFIG_MICROCODE_AMD is not set
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_DIRECT_GBPAGES=y
CONFIG_NUMA=y
CONFIG_AMD_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NODES_SPAN_OTHER_NODES=y
CONFIG_NUMA_EMU=y
CONFIG_NODES_SHIFT=6
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_MEMORY_PROBE=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=999999
# CONFIG_COMPACTION is not set
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
CONFIG_MEMORY_FAILURE=y
CONFIG_HWPOISON_INJECT=y
# CONFIG_TRANSPARENT_HUGEPAGE is not set
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y
CONFIG_X86_RESERVE_LOW=64
CONFIG_MTRR=y
# CONFIG_MTRR_SANITIZER is not set
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_EFI=y
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
# CONFIG_KEXEC_JUMP is not set
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_PHYSICAL_ALIGN=0x1000000
CONFIG_HOTPLUG_CPU=y
CONFIG_COMPAT_VDSO=y
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y
CONFIG_USE_PERCPU_NUMA_NODE_ID=y

#
# Power management and ACPI options
#
CONFIG_ARCH_HIBERNATION_HEADER=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
# CONFIG_PM_RUNTIME is not set
CONFIG_PM=y
CONFIG_PM_DEBUG=y
# CONFIG_PM_VERBOSE is not set
CONFIG_PM_ADVANCED_DEBUG=y
# CONFIG_PM_TEST_SUSPEND is not set
CONFIG_CAN_PM_TRACE=y
CONFIG_PM_TRACE=y
CONFIG_PM_TRACE_RTC=y
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_PROCFS=y
CONFIG_ACPI_PROCFS_POWER=y
CONFIG_ACPI_POWER_METER=y
CONFIG_ACPI_EC_DEBUGFS=y
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
# CONFIG_ACPI_PROCESSOR_AGGREGATOR is not set
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
CONFIG_ACPI_CUSTOM_DSDT_FILE=""
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
CONFIG_ACPI_DEBUG=y
CONFIG_ACPI_DEBUG_FUNC_TRACE=y
CONFIG_ACPI_PCI_SLOT=y
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
# CONFIG_ACPI_HOTPLUG_MEMORY is not set
CONFIG_ACPI_SBS=y
# CONFIG_ACPI_HED is not set
# CONFIG_ACPI_APEI is not set
# CONFIG_SFI is not set

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y

#
# CPUFreq processor drivers
#
# CONFIG_X86_PCC_CPUFREQ is not set
CONFIG_X86_ACPI_CPUFREQ=y
# CONFIG_X86_POWERNOW_K8 is not set
CONFIG_X86_SPEEDSTEP_CENTRINO=y
# CONFIG_X86_P4_CLOCKMOD is not set

#
# shared options
#
# CONFIG_X86_SPEEDSTEP_LIB is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_INTEL_IDLE is not set

#
# Memory power savings
#
CONFIG_I7300_IDLE_IOAT_CHANNEL=y
CONFIG_I7300_IDLE=y

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
# CONFIG_PCI_CNB20LE_QUIRK is not set
# CONFIG_DMAR is not set
# CONFIG_INTR_REMAP is not set
CONFIG_PCIEPORTBUS=y
# CONFIG_HOTPLUG_PCI_PCIE is not set
CONFIG_PCIEAER=y
# CONFIG_PCIE_ECRC is not set
# CONFIG_PCIEAER_INJECT is not set
# CONFIG_PCIEASPM is not set
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_STUB is not set
CONFIG_HT_IRQ=y
# CONFIG_PCI_IOV is not set
CONFIG_PCI_IOAPIC=y
CONFIG_ISA_DMA_API=y
# CONFIG_PCCARD is not set
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_FAKE is not set
# CONFIG_HOTPLUG_PCI_ACPI is not set
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set
# CONFIG_RAPIDIO is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_HAVE_AOUT is not set
# CONFIG_BINFMT_MISC is not set
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_HAVE_TEXT_POKE_SMP=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_USER=y
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_IPCOMP=y
CONFIG_NET_KEY=y
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_ROUTE_CLASSID=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE_DEMUX is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
CONFIG_INET_AH=y
CONFIG_INET_ESP=y
CONFIG_INET_IPCOMP=y
CONFIG_INET_XFRM_TUNNEL=y
CONFIG_INET_TUNNEL=y
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
CONFIG_INET_LRO=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=y
CONFIG_TCP_CONG_HTCP=y
CONFIG_TCP_CONG_HSTCP=y
CONFIG_TCP_CONG_HYBLA=y
CONFIG_TCP_CONG_VEGAS=y
CONFIG_TCP_CONG_SCALABLE=y
CONFIG_TCP_CONG_LP=y
CONFIG_TCP_CONG_VENO=y
CONFIG_TCP_CONG_YEAH=y
CONFIG_TCP_CONG_ILLINOIS=y
# CONFIG_DEFAULT_BIC is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_HTCP is not set
# CONFIG_DEFAULT_HYBLA is not set
# CONFIG_DEFAULT_VEGAS is not set
# CONFIG_DEFAULT_VENO is not set
# CONFIG_DEFAULT_WESTWOOD is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
# CONFIG_IPV6 is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=y

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_NETLINK=y
CONFIG_NETFILTER_NETLINK_QUEUE=y
CONFIG_NETFILTER_NETLINK_LOG=y
CONFIG_NF_CONNTRACK=y
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_EVENTS=y
# CONFIG_NF_CONNTRACK_TIMESTAMP is not set
CONFIG_NF_CT_PROTO_DCCP=y
CONFIG_NF_CT_PROTO_GRE=y
CONFIG_NF_CT_PROTO_SCTP=y
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_NF_CONNTRACK_AMANDA=y
CONFIG_NF_CONNTRACK_FTP=y
CONFIG_NF_CONNTRACK_H323=y
CONFIG_NF_CONNTRACK_IRC=y
CONFIG_NF_CONNTRACK_BROADCAST=y
CONFIG_NF_CONNTRACK_NETBIOS_NS=y
# CONFIG_NF_CONNTRACK_SNMP is not set
CONFIG_NF_CONNTRACK_PPTP=y
CONFIG_NF_CONNTRACK_SANE=y
CONFIG_NF_CONNTRACK_SIP=y
CONFIG_NF_CONNTRACK_TFTP=y
CONFIG_NF_CT_NETLINK=y
CONFIG_NETFILTER_TPROXY=y
CONFIG_NETFILTER_XTABLES=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=y
CONFIG_NETFILTER_XT_CONNMARK=y

#
# Xtables targets
#
# CONFIG_NETFILTER_XT_TARGET_AUDIT is not set
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=y
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
CONFIG_NETFILTER_XT_TARGET_CT=y
CONFIG_NETFILTER_XT_TARGET_DSCP=y
CONFIG_NETFILTER_XT_TARGET_HL=y
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
CONFIG_NETFILTER_XT_TARGET_LED=y
CONFIG_NETFILTER_XT_TARGET_MARK=y
CONFIG_NETFILTER_XT_TARGET_NFLOG=y
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
CONFIG_NETFILTER_XT_TARGET_NOTRACK=y
CONFIG_NETFILTER_XT_TARGET_RATEEST=y
CONFIG_NETFILTER_XT_TARGET_TEE=y
CONFIG_NETFILTER_XT_TARGET_TPROXY=y
CONFIG_NETFILTER_XT_TARGET_TRACE=y
CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=y

#
# Xtables matches
#
# CONFIG_NETFILTER_XT_MATCH_ADDRTYPE is not set
CONFIG_NETFILTER_XT_MATCH_CLUSTER=y
CONFIG_NETFILTER_XT_MATCH_COMMENT=y
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=y
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
CONFIG_NETFILTER_XT_MATCH_CPU=y
CONFIG_NETFILTER_XT_MATCH_DCCP=y
# CONFIG_NETFILTER_XT_MATCH_DEVGROUP is not set
CONFIG_NETFILTER_XT_MATCH_DSCP=y
CONFIG_NETFILTER_XT_MATCH_ESP=y
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
CONFIG_NETFILTER_XT_MATCH_HELPER=y
CONFIG_NETFILTER_XT_MATCH_HL=y
CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
CONFIG_NETFILTER_XT_MATCH_LENGTH=y
CONFIG_NETFILTER_XT_MATCH_LIMIT=y
CONFIG_NETFILTER_XT_MATCH_MAC=y
CONFIG_NETFILTER_XT_MATCH_MARK=y
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=y
CONFIG_NETFILTER_XT_MATCH_OSF=y
CONFIG_NETFILTER_XT_MATCH_OWNER=y
CONFIG_NETFILTER_XT_MATCH_POLICY=y
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=y
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
CONFIG_NETFILTER_XT_MATCH_QUOTA=y
CONFIG_NETFILTER_XT_MATCH_RATEEST=y
CONFIG_NETFILTER_XT_MATCH_REALM=y
CONFIG_NETFILTER_XT_MATCH_RECENT=y
CONFIG_NETFILTER_XT_MATCH_SCTP=y
CONFIG_NETFILTER_XT_MATCH_SOCKET=y
CONFIG_NETFILTER_XT_MATCH_STATE=y
CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
CONFIG_NETFILTER_XT_MATCH_STRING=y
CONFIG_NETFILTER_XT_MATCH_TCPMSS=y
CONFIG_NETFILTER_XT_MATCH_TIME=y
CONFIG_NETFILTER_XT_MATCH_U32=y
# CONFIG_IP_SET is not set
# CONFIG_IP_VS is not set

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=y
CONFIG_NF_CONNTRACK_IPV4=y
CONFIG_NF_CONNTRACK_PROC_COMPAT=y
# CONFIG_IP_NF_QUEUE is not set
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_AH=y
CONFIG_IP_NF_MATCH_ECN=y
CONFIG_IP_NF_MATCH_TTL=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_TARGET_REJECT=y
CONFIG_IP_NF_TARGET_LOG=y
CONFIG_IP_NF_TARGET_ULOG=y
CONFIG_NF_NAT=y
CONFIG_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=y
CONFIG_IP_NF_TARGET_NETMAP=y
CONFIG_IP_NF_TARGET_REDIRECT=y
CONFIG_NF_NAT_PROTO_DCCP=y
CONFIG_NF_NAT_PROTO_GRE=y
CONFIG_NF_NAT_PROTO_UDPLITE=y
CONFIG_NF_NAT_PROTO_SCTP=y
CONFIG_NF_NAT_FTP=y
CONFIG_NF_NAT_IRC=y
CONFIG_NF_NAT_TFTP=y
CONFIG_NF_NAT_AMANDA=y
CONFIG_NF_NAT_PPTP=y
CONFIG_NF_NAT_H323=y
CONFIG_NF_NAT_SIP=y
CONFIG_IP_NF_MANGLE=y
CONFIG_IP_NF_TARGET_CLUSTERIP=y
CONFIG_IP_NF_TARGET_ECN=y
CONFIG_IP_NF_TARGET_TTL=y
CONFIG_IP_NF_RAW=y
CONFIG_IP_NF_ARPTABLES=y
CONFIG_IP_NF_ARPFILTER=y
CONFIG_IP_NF_ARP_MANGLE=y
CONFIG_BRIDGE_NF_EBTABLES=y
CONFIG_BRIDGE_EBT_BROUTE=y
CONFIG_BRIDGE_EBT_T_FILTER=y
CONFIG_BRIDGE_EBT_T_NAT=y
CONFIG_BRIDGE_EBT_802_3=y
CONFIG_BRIDGE_EBT_AMONG=y
CONFIG_BRIDGE_EBT_ARP=y
CONFIG_BRIDGE_EBT_IP=y
CONFIG_BRIDGE_EBT_LIMIT=y
CONFIG_BRIDGE_EBT_MARK=y
CONFIG_BRIDGE_EBT_PKTTYPE=y
CONFIG_BRIDGE_EBT_STP=y
CONFIG_BRIDGE_EBT_VLAN=y
CONFIG_BRIDGE_EBT_ARPREPLY=y
CONFIG_BRIDGE_EBT_DNAT=y
CONFIG_BRIDGE_EBT_MARK_T=y
CONFIG_BRIDGE_EBT_REDIRECT=y
CONFIG_BRIDGE_EBT_SNAT=y
CONFIG_BRIDGE_EBT_LOG=y
CONFIG_BRIDGE_EBT_ULOG=y
CONFIG_BRIDGE_EBT_NFLOG=y
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_RDS is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_L2TP is not set
CONFIG_STP=y
CONFIG_BRIDGE=y
CONFIG_BRIDGE_IGMP_SNOOPING=y
# CONFIG_NET_DSA is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
CONFIG_LLC=y
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_PHONET is not set
# CONFIG_IEEE802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
# CONFIG_NET_SCH_CBQ is not set
# CONFIG_NET_SCH_HTB is not set
# CONFIG_NET_SCH_HFSC is not set
# CONFIG_NET_SCH_PRIO is not set
# CONFIG_NET_SCH_MULTIQ is not set
# CONFIG_NET_SCH_RED is not set
# CONFIG_NET_SCH_SFB is not set
# CONFIG_NET_SCH_SFQ is not set
# CONFIG_NET_SCH_TEQL is not set
# CONFIG_NET_SCH_TBF is not set
# CONFIG_NET_SCH_GRED is not set
# CONFIG_NET_SCH_DSMARK is not set
CONFIG_NET_SCH_NETEM=y
# CONFIG_NET_SCH_DRR is not set
# CONFIG_NET_SCH_MQPRIO is not set
# CONFIG_NET_SCH_CHOKE is not set

#
# Classification
#
# CONFIG_NET_CLS_BASIC is not set
# CONFIG_NET_CLS_TCINDEX is not set
# CONFIG_NET_CLS_ROUTE4 is not set
# CONFIG_NET_CLS_FW is not set
# CONFIG_NET_CLS_U32 is not set
# CONFIG_NET_CLS_RSVP is not set
# CONFIG_NET_CLS_RSVP6 is not set
# CONFIG_NET_CLS_FLOW is not set
# CONFIG_NET_CLS_CGROUP is not set
# CONFIG_NET_EMATCH is not set
# CONFIG_NET_CLS_ACT is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
# CONFIG_BATMAN_ADV is not set
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_XPS=y

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_TCPPROBE is not set
# CONFIG_NET_DROP_MONITOR is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
# CONFIG_WIRELESS is not set
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
# CONFIG_CEPH_LIB is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
# CONFIG_DEVTMPFS is not set
# CONFIG_STANDALONE is not set
# CONFIG_PREVENT_FIRMWARE_BUILD is not set
CONFIG_FW_LOADER=y
# CONFIG_FIRMWARE_IN_KERNEL is not set
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set
CONFIG_ARCH_NO_SYSDEV_OPS=y
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
# CONFIG_MTD is not set
# CONFIG_PARPORT is not set
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_CRYPTOLOOP=y
# CONFIG_BLK_DEV_DRBD is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=65536
# CONFIG_BLK_DEV_XIP is not set
CONFIG_CDROM_PKTCDVD=y
CONFIG_CDROM_PKTCDVD_BUFFERS=128
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
# CONFIG_ATA_OVER_ETH is not set
CONFIG_VIRTIO_BLK=y
# CONFIG_BLK_DEV_HD is not set
# CONFIG_BLK_DEV_RBD is not set
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_MISC_DEVICES is not set
CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
# CONFIG_SCSI_TGT is not set
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
# CONFIG_BLK_DEV_SR is not set
# CONFIG_CHR_DEV_SG is not set
# CONFIG_CHR_DEV_SCH is not set
# CONFIG_SCSI_MULTI_LUN is not set
CONFIG_SCSI_CONSTANTS=y
# CONFIG_SCSI_LOGGING is not set
CONFIG_SCSI_SCAN_ASYNC=y
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
CONFIG_SCSI_ISCSI_ATTRS=y
CONFIG_SCSI_SAS_ATTRS=y
CONFIG_SCSI_SAS_LIBSAS=y
# CONFIG_SCSI_SAS_ATA is not set
CONFIG_SCSI_SAS_HOST_SMP=y
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_SCSI_BNX2X_FCOE is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
CONFIG_SCSI_AACRAID=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_CMDS_PER_DEVICE=32
CONFIG_AIC7XXX_RESET_DELAY_MS=5000
# CONFIG_AIC7XXX_BUILD_FIRMWARE is not set
CONFIG_AIC7XXX_DEBUG_ENABLE=y
CONFIG_AIC7XXX_DEBUG_MASK=0
CONFIG_AIC7XXX_REG_PRETTY_PRINT=y
CONFIG_SCSI_AIC7XXX_OLD=y
CONFIG_SCSI_AIC79XX=y
CONFIG_AIC79XX_CMDS_PER_DEVICE=32
CONFIG_AIC79XX_RESET_DELAY_MS=5000
# CONFIG_AIC79XX_BUILD_FIRMWARE is not set
CONFIG_AIC79XX_DEBUG_ENABLE=y
CONFIG_AIC79XX_DEBUG_MASK=0
CONFIG_AIC79XX_REG_PRETTY_PRINT=y
CONFIG_SCSI_AIC94XX=y
CONFIG_AIC94XX_DEBUG=y
CONFIG_SCSI_MVSAS=y
CONFIG_SCSI_MVSAS_DEBUG=y
CONFIG_SCSI_DPT_I2O=y
CONFIG_SCSI_ADVANSYS=y
# CONFIG_SCSI_ARCMSR is not set
CONFIG_MEGARAID_NEWGEN=y
CONFIG_MEGARAID_MM=y
CONFIG_MEGARAID_MAILBOX=y
CONFIG_MEGARAID_LEGACY=y
CONFIG_MEGARAID_SAS=y
CONFIG_SCSI_MPT2SAS=y
CONFIG_SCSI_MPT2SAS_MAX_SGE=128
# CONFIG_SCSI_MPT2SAS_LOGGING is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_VMWARE_PVSCSI is not set
# CONFIG_LIBFC is not set
# CONFIG_LIBFCOE is not set
# CONFIG_FCOE is not set
# CONFIG_FCOE_FNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
CONFIG_SCSI_QLOGIC_1280=y
CONFIG_SCSI_QLA_FC=y
CONFIG_SCSI_QLA_ISCSI=y
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_SCSI_BFA_FC is not set
# CONFIG_SCSI_DH is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
CONFIG_ATA=y
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_ACPI=y
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
CONFIG_SATA_AHCI=y
# CONFIG_SATA_AHCI_PLATFORM is not set
# CONFIG_SATA_INIC162X is not set
# CONFIG_SATA_ACARD_AHCI is not set
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y

#
# SFF controllers with custom DMA interface
#
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_SX4 is not set
CONFIG_ATA_BMDMA=y

#
# SATA SFF controllers with BMDMA
#
CONFIG_ATA_PIIX=y
# CONFIG_SATA_MV is not set
# CONFIG_SATA_NV is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SIL is not set
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_SVW is not set
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set

#
# PATA SFF controllers with BMDMA
#
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARASAN_CF is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_ATP867X is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CS5520 is not set
# CONFIG_PATA_CS5530 is not set
# CONFIG_PATA_CS5536 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RDC is not set
# CONFIG_PATA_SC1200 is not set
# CONFIG_PATA_SCH is not set
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_SIL680 is not set
# CONFIG_PATA_SIS is not set
# CONFIG_PATA_TOSHIBA is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set

#
# PIO-only SFF controllers
#
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_OPTI is not set
CONFIG_PATA_PLATFORM=y
# CONFIG_PATA_RZ1000 is not set

#
# Generic fallback / legacy drivers
#
# CONFIG_PATA_ACPI is not set
CONFIG_ATA_GENERIC=y
# CONFIG_PATA_LEGACY is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=y
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
CONFIG_MD_RAID10=y
CONFIG_MD_RAID456=y
CONFIG_MULTICORE_RAID456=y
CONFIG_MD_MULTIPATH=y
CONFIG_MD_FAULTY=y
CONFIG_BLK_DEV_DM=y
CONFIG_DM_DEBUG=y
CONFIG_DM_CRYPT=y
CONFIG_DM_SNAPSHOT=y
CONFIG_DM_MIRROR=y
# CONFIG_DM_RAID is not set
CONFIG_DM_LOG_USERSPACE=y
CONFIG_DM_ZERO=y
CONFIG_DM_MULTIPATH=y
CONFIG_DM_MULTIPATH_QL=y
CONFIG_DM_MULTIPATH_ST=y
CONFIG_DM_DELAY=y
CONFIG_DM_UEVENT=y
# CONFIG_DM_FLAKEY is not set
# CONFIG_TARGET_CORE is not set
CONFIG_FUSION=y
CONFIG_FUSION_SPI=y
CONFIG_FUSION_FC=y
CONFIG_FUSION_SAS=y
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_CTL=y
CONFIG_FUSION_LOGGING=y

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_FIREWIRE_NOSY is not set
# CONFIG_I2O is not set
# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
# CONFIG_BONDING is not set
# CONFIG_MACVLAN is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=y
# CONFIG_VETH is not set
# CONFIG_NET_SB1000 is not set
# CONFIG_ARCNET is not set
CONFIG_MII=y
CONFIG_PHYLIB=y

#
# MII PHY device drivers
#
CONFIG_MARVELL_PHY=y
CONFIG_DAVICOM_PHY=y
CONFIG_QSEMI_PHY=y
CONFIG_LXT_PHY=y
CONFIG_CICADA_PHY=y
CONFIG_VITESSE_PHY=y
CONFIG_SMSC_PHY=y
CONFIG_BROADCOM_PHY=y
# CONFIG_BCM63XX_PHY is not set
CONFIG_ICPLUS_PHY=y
# CONFIG_REALTEK_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_MICREL_PHY is not set
# CONFIG_FIXED_PHY is not set
# CONFIG_MDIO_BITBANG is not set
CONFIG_NET_ETHERNET=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_ETHOC is not set
# CONFIG_DNET is not set
# CONFIG_NET_TULIP is not set
# CONFIG_HP100 is not set
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set
# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set
# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set
CONFIG_NET_PCI=y
# CONFIG_PCNET32 is not set
# CONFIG_AMD8111_ETH is not set
# CONFIG_ADAPTEC_STARFIRE is not set
# CONFIG_KSZ884X_PCI is not set
# CONFIG_B44 is not set
# CONFIG_FORCEDETH is not set
# CONFIG_E100 is not set
# CONFIG_FEALNX is not set
# CONFIG_NATSEMI is not set
# CONFIG_NE2K_PCI is not set
CONFIG_8139CP=y
CONFIG_8139TOO=y
CONFIG_8139TOO_PIO=y
# CONFIG_8139TOO_TUNE_TWISTER is not set
# CONFIG_8139TOO_8129 is not set
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_R6040 is not set
# CONFIG_SIS900 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SMSC9420 is not set
# CONFIG_SUNDANCE is not set
# CONFIG_TLAN is not set
# CONFIG_KS8842 is not set
# CONFIG_KS8851_MLL is not set
# CONFIG_VIA_RHINE is not set
# CONFIG_SC92031 is not set
# CONFIG_ATL2 is not set
CONFIG_NETDEV_1000=y
CONFIG_ACENIC=y
# CONFIG_ACENIC_OMIT_TIGON_I is not set
CONFIG_DL2K=y
CONFIG_E1000=y
CONFIG_E1000E=y
CONFIG_IP1000=y
CONFIG_IGB=y
CONFIG_IGB_DCA=y
CONFIG_IGBVF=y
CONFIG_NS83820=y
CONFIG_HAMACHI=y
CONFIG_YELLOWFIN=y
CONFIG_R8169=y
CONFIG_SIS190=y
CONFIG_SKGE=y
# CONFIG_SKGE_DEBUG is not set
CONFIG_SKY2=y
# CONFIG_SKY2_DEBUG is not set
CONFIG_VIA_VELOCITY=y
CONFIG_TIGON3=y
CONFIG_BNX2=y
CONFIG_CNIC=y
CONFIG_QLA3XXX=y
CONFIG_ATL1=y
CONFIG_ATL1E=y
CONFIG_ATL1C=y
CONFIG_JME=y
# CONFIG_STMMAC_ETH is not set
# CONFIG_PCH_GBE is not set
CONFIG_NETDEV_10000=y
CONFIG_MDIO=y
# CONFIG_CHELSIO_T1 is not set
# CONFIG_CHELSIO_T3 is not set
# CONFIG_CHELSIO_T4 is not set
# CONFIG_CHELSIO_T4VF is not set
# CONFIG_ENIC is not set
CONFIG_IXGBE=y
CONFIG_IXGBE_DCA=y
# CONFIG_IXGBEVF is not set
CONFIG_IXGB=y
# CONFIG_S2IO is not set
# CONFIG_VXGE is not set
# CONFIG_MYRI10GE is not set
# CONFIG_NETXEN_NIC is not set
# CONFIG_NIU is not set
# CONFIG_MLX4_EN is not set
# CONFIG_MLX4_CORE is not set
# CONFIG_TEHUTI is not set
# CONFIG_BNX2X is not set
# CONFIG_QLCNIC is not set
# CONFIG_QLGE is not set
# CONFIG_BNA is not set
# CONFIG_SFC is not set
# CONFIG_BE2NET is not set
# CONFIG_TR is not set
# CONFIG_WLAN is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#

#
# USB Network Adapters
#
CONFIG_USB_CATC=y
CONFIG_USB_KAWETH=y
CONFIG_USB_PEGASUS=y
CONFIG_USB_RTL8150=y
CONFIG_USB_USBNET=y
CONFIG_USB_NET_AX8817X=y
CONFIG_USB_NET_CDCETHER=y
CONFIG_USB_NET_CDC_EEM=y
CONFIG_USB_NET_CDC_NCM=y
CONFIG_USB_NET_DM9601=y
CONFIG_USB_NET_SMSC75XX=y
CONFIG_USB_NET_SMSC95XX=y
CONFIG_USB_NET_GL620A=y
CONFIG_USB_NET_NET1080=y
CONFIG_USB_NET_PLUSB=y
CONFIG_USB_NET_MCS7830=y
CONFIG_USB_NET_RNDIS_HOST=y
CONFIG_USB_NET_CDC_SUBSET=y
CONFIG_USB_ALI_M5632=y
CONFIG_USB_AN2720=y
CONFIG_USB_BELKIN=y
CONFIG_USB_ARMLINUX=y
CONFIG_USB_EPSON2888=y
CONFIG_USB_KC2190=y
CONFIG_USB_NET_ZAURUS=y
# CONFIG_USB_NET_CX82310_ETH is not set
CONFIG_USB_NET_INT51X1=y
# CONFIG_USB_IPHETH is not set
# CONFIG_USB_SIERRA_NET is not set
# CONFIG_USB_VL600 is not set
# CONFIG_WAN is not set

#
# CAIF transport drivers
#
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
CONFIG_NETCONSOLE=y
CONFIG_NETCONSOLE_DYNAMIC=y
CONFIG_NETPOLL=y
# CONFIG_NETPOLL_TRAP is not set
CONFIG_NET_POLL_CONTROLLER=y
CONFIG_VIRTIO_NET=y
# CONFIG_VMXNET3 is not set
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
CONFIG_INPUT_POLLDEV=y
CONFIG_INPUT_SPARSEKMAP=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5588 is not set
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_LM8323 is not set
# CONFIG_KEYBOARD_MAX7359 is not set
# CONFIG_KEYBOARD_MCS is not set
CONFIG_KEYBOARD_NEWTON=y
# CONFIG_KEYBOARD_OPENCORES is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
CONFIG_KEYBOARD_XTKBD=y
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
# CONFIG_MOUSE_PS2_SENTELIC is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_MOUSE_SYNAPTICS_I2C is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
# CONFIG_INPUT_AD714X is not set
# CONFIG_INPUT_PCSPKR is not set
# CONFIG_INPUT_APANEL is not set
# CONFIG_INPUT_ATLAS_BTNS is not set
# CONFIG_INPUT_ATI_REMOTE is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
# CONFIG_INPUT_CM109 is not set
CONFIG_INPUT_UINPUT=y
# CONFIG_INPUT_PCF8574 is not set
# CONFIG_INPUT_ADXL34X is not set
# CONFIG_INPUT_CMA3000 is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_SERIO_ALTERA_PS2 is not set
# CONFIG_SERIO_PS2MULT is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
# CONFIG_SERIAL_NONSTANDARD is not set
CONFIG_NOZOMI=y
# CONFIG_N_GSM is not set
CONFIG_DEVKMEM=y

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_NR_UARTS=16
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
# CONFIG_SERIAL_8250_DETECT_IRQ is not set
CONFIG_SERIAL_8250_RSA=y

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_MFD_HSU is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_TIMBERDALE is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_PCH_UART is not set
# CONFIG_TTY_PRINTK is not set
CONFIG_HVC_DRIVER=y
CONFIG_VIRTIO_CONSOLE=y
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_INTEL=y
# CONFIG_HW_RANDOM_AMD is not set
CONFIG_HW_RANDOM_VIA=y
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_NVRAM=y
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_MWAVE is not set
# CONFIG_RAW_DRIVER is not set
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
CONFIG_HANGCHECK_TIMER=y
# CONFIG_TCG_TPM is not set
# CONFIG_TELCLOCK is not set
CONFIG_DEVPORT=y
# CONFIG_RAMOOPS is not set
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
# CONFIG_I2C_CHARDEV is not set
# CONFIG_I2C_MUX is not set
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=y

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_ISCH is not set
# CONFIG_I2C_PIIX4 is not set
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set

#
# ACPI drivers
#
# CONFIG_I2C_SCMI is not set

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_INTEL_MID is not set
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_PXA_PCI is not set
# CONFIG_I2C_SIMTEC is not set
# CONFIG_I2C_XILINX is not set
# CONFIG_I2C_EG20T is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_DIOLAN_U2C is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_STUB is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_SPI is not set

#
# PPS support
#
# CONFIG_PPS is not set

#
# PPS generators support
#
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_TEST_POWER is not set
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_BQ20Z75 is not set
# CONFIG_BATTERY_BQ27x00 is not set
# CONFIG_BATTERY_MAX17040 is not set
# CONFIG_BATTERY_MAX17042 is not set
CONFIG_HWMON=y
# CONFIG_HWMON_VID is not set
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Native drivers
#
# CONFIG_SENSORS_ABITUGURU is not set
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7414 is not set
# CONFIG_SENSORS_AD7418 is not set
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
# CONFIG_SENSORS_ADM1029 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ADM9240 is not set
# CONFIG_SENSORS_ADT7411 is not set
# CONFIG_SENSORS_ADT7462 is not set
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7475 is not set
# CONFIG_SENSORS_ASC7621 is not set
# CONFIG_SENSORS_K8TEMP is not set
# CONFIG_SENSORS_K10TEMP is not set
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_ATXP1 is not set
# CONFIG_SENSORS_DS620 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_I5K_AMB is not set
# CONFIG_SENSORS_F71805F is not set
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
# CONFIG_SENSORS_FSCHMD is not set
# CONFIG_SENSORS_G760A is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_GL520SM is not set
# CONFIG_SENSORS_CORETEMP is not set
# CONFIG_SENSORS_PKGTEMP is not set
# CONFIG_SENSORS_IT87 is not set
# CONFIG_SENSORS_JC42 is not set
# CONFIG_SENSORS_LINEAGE is not set
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM73 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_LM92 is not set
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LTC4151 is not set
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4245 is not set
# CONFIG_SENSORS_LTC4261 is not set
# CONFIG_SENSORS_LM95241 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_MAX6639 is not set
# CONFIG_SENSORS_MAX6650 is not set
# CONFIG_SENSORS_PC87360 is not set
# CONFIG_SENSORS_PC87427 is not set
# CONFIG_SENSORS_PCF8591 is not set
# CONFIG_PMBUS is not set
# CONFIG_SENSORS_SHT21 is not set
# CONFIG_SENSORS_SIS5595 is not set
# CONFIG_SENSORS_SMM665 is not set
# CONFIG_SENSORS_DME1737 is not set
# CONFIG_SENSORS_EMC1403 is not set
# CONFIG_SENSORS_EMC2103 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
# CONFIG_SENSORS_SMSC47M192 is not set
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_SCH5627 is not set
# CONFIG_SENSORS_ADS1015 is not set
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_AMC6821 is not set
# CONFIG_SENSORS_THMC50 is not set
# CONFIG_SENSORS_TMP102 is not set
# CONFIG_SENSORS_TMP401 is not set
# CONFIG_SENSORS_TMP421 is not set
# CONFIG_SENSORS_VIA_CPUTEMP is not set
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_VT1211 is not set
# CONFIG_SENSORS_VT8231 is not set
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
# CONFIG_SENSORS_W83792D is not set
# CONFIG_SENSORS_W83793 is not set
# CONFIG_SENSORS_W83795 is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83L786NG is not set
# CONFIG_SENSORS_W83627HF is not set
# CONFIG_SENSORS_W83627EHF is not set
# CONFIG_SENSORS_APPLESMC is not set

#
# ACPI drivers
#
# CONFIG_SENSORS_ATK0110 is not set
CONFIG_THERMAL=y
# CONFIG_THERMAL_HWMON is not set
CONFIG_WATCHDOG=y
# CONFIG_WATCHDOG_NOWAYOUT is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=y
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
# CONFIG_ALIM1535_WDT is not set
# CONFIG_ALIM7101_WDT is not set
# CONFIG_F71808E_WDT is not set
# CONFIG_SP5100_TCO is not set
# CONFIG_SC520_WDT is not set
# CONFIG_SBC_FITPC2_WATCHDOG is not set
# CONFIG_EUROTECH_WDT is not set
# CONFIG_IB700_WDT is not set
# CONFIG_IBMASR is not set
# CONFIG_WAFER_WDT is not set
# CONFIG_I6300ESB_WDT is not set
CONFIG_ITCO_WDT=y
CONFIG_ITCO_VENDOR_SUPPORT=y
# CONFIG_IT8712F_WDT is not set
# CONFIG_IT87_WDT is not set
# CONFIG_HP_WATCHDOG is not set
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
# CONFIG_NV_TCO is not set
# CONFIG_60XX_WDT is not set
# CONFIG_SBC8360_WDT is not set
# CONFIG_CPU5_WDT is not set
# CONFIG_SMSC_SCH311X_WDT is not set
# CONFIG_SMSC37B787_WDT is not set
# CONFIG_W83627HF_WDT is not set
# CONFIG_W83697HF_WDT is not set
# CONFIG_W83697UG_WDT is not set
# CONFIG_W83877F_WDT is not set
# CONFIG_W83977F_WDT is not set
# CONFIG_MACHZ_WDT is not set
# CONFIG_SBC_EPX_C3_WATCHDOG is not set

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set

#
# USB-based Watchdog Cards
#
# CONFIG_USBPCWATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
CONFIG_SSB=y
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
# CONFIG_SSB_B43_PCI_BRIDGE is not set
# CONFIG_SSB_SILENT is not set
# CONFIG_SSB_DEBUG is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y
CONFIG_MFD_SUPPORT=y
# CONFIG_MFD_CORE is not set
# CONFIG_MFD_88PM860X is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_TPS6105X is not set
# CONFIG_TPS6507X is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_MFD_STMPE is not set
# CONFIG_MFD_TC3589X is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_PMIC_DA903X is not set
# CONFIG_PMIC_ADP5520 is not set
# CONFIG_MFD_MAX8925 is not set
# CONFIG_MFD_MAX8997 is not set
# CONFIG_MFD_MAX8998 is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM831X_I2C is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_WM8994 is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_CS5535 is not set
# CONFIG_LPC_SCH is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_VX855 is not set
# CONFIG_MFD_WL1273_CORE is not set
# CONFIG_REGULATOR is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_INTEL=y
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_VIA is not set
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
# CONFIG_VGA_SWITCHEROO is not set
CONFIG_DRM=y
CONFIG_DRM_KMS_HELPER=y
# CONFIG_DRM_TDFX is not set
# CONFIG_DRM_R128 is not set
# CONFIG_DRM_RADEON is not set
# CONFIG_DRM_I810 is not set
CONFIG_DRM_I915=y
CONFIG_DRM_I915_KMS=y
# CONFIG_DRM_MGA is not set
# CONFIG_DRM_SIS is not set
# CONFIG_DRM_VIA is not set
# CONFIG_DRM_SAVAGE is not set
# CONFIG_STUB_POULSBO is not set
# CONFIG_VGASTATE is not set
CONFIG_VIDEO_OUTPUT_CONTROL=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
# CONFIG_FB_DDC is not set
# CONFIG_FB_BOOT_VESA_SUPPORT is not set
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
# CONFIG_FB_SYS_FILLRECT is not set
# CONFIG_FB_SYS_COPYAREA is not set
# CONFIG_FB_SYS_IMAGEBLIT is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_SYS_FOPS is not set
# CONFIG_FB_WMT_GE_ROPS is not set
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
# CONFIG_FB_BACKLIGHT is not set
CONFIG_FB_MODE_HELPERS=y
# CONFIG_FB_TILEBLITTING is not set

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_UVESA is not set
# CONFIG_FB_VESA is not set
# CONFIG_FB_EFI is not set
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_LE80578 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
# CONFIG_FB_UDL is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_BROADSHEET is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=y
# CONFIG_LCD_PLATFORM is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_BACKLIGHT_GENERIC=y
# CONFIG_BACKLIGHT_PROGEAR is not set
# CONFIG_BACKLIGHT_APPLE is not set
# CONFIG_BACKLIGHT_SAHARA is not set
# CONFIG_BACKLIGHT_ADP8860 is not set

#
# Display device support
#
CONFIG_DISPLAY_SUPPORT=y

#
# Display hardware drivers
#

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=1024
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
# CONFIG_LOGO is not set
CONFIG_SOUND=y
# CONFIG_SOUND_OSS_CORE is not set
CONFIG_SND=y
CONFIG_SND_TIMER=y
CONFIG_SND_PCM=y
CONFIG_SND_HWDEP=y
CONFIG_SND_JACK=y
CONFIG_SND_SEQUENCER=y
CONFIG_SND_SEQ_DUMMY=y
# CONFIG_SND_MIXER_OSS is not set
# CONFIG_SND_PCM_OSS is not set
# CONFIG_SND_SEQUENCER_OSS is not set
CONFIG_SND_HRTIMER=y
CONFIG_SND_SEQ_HRTIMER_DEFAULT=y
CONFIG_SND_DYNAMIC_MINORS=y
# CONFIG_SND_SUPPORT_OLD_API is not set
CONFIG_SND_VERBOSE_PROCFS=y
CONFIG_SND_VERBOSE_PRINTK=y
CONFIG_SND_DEBUG=y
CONFIG_SND_DEBUG_VERBOSE=y
CONFIG_SND_PCM_XRUN_DEBUG=y
CONFIG_SND_VMASTER=y
CONFIG_SND_DMA_SGBUF=y
# CONFIG_SND_RAWMIDI_SEQ is not set
# CONFIG_SND_OPL3_LIB_SEQ is not set
# CONFIG_SND_OPL4_LIB_SEQ is not set
# CONFIG_SND_SBAWE_SEQ is not set
# CONFIG_SND_EMU10K1_SEQ is not set
CONFIG_SND_DRIVERS=y
CONFIG_SND_PCSP=m
# CONFIG_SND_DUMMY is not set
# CONFIG_SND_ALOOP is not set
# CONFIG_SND_VIRMIDI is not set
# CONFIG_SND_MTPAV is not set
# CONFIG_SND_SERIAL_U16550 is not set
# CONFIG_SND_MPU401 is not set
CONFIG_SND_PCI=y
# CONFIG_SND_AD1889 is not set
# CONFIG_SND_ALS300 is not set
# CONFIG_SND_ALS4000 is not set
# CONFIG_SND_ALI5451 is not set
# CONFIG_SND_ASIHPI is not set
# CONFIG_SND_ATIIXP is not set
# CONFIG_SND_ATIIXP_MODEM is not set
# CONFIG_SND_AU8810 is not set
# CONFIG_SND_AU8820 is not set
# CONFIG_SND_AU8830 is not set
# CONFIG_SND_AW2 is not set
# CONFIG_SND_AZT3328 is not set
# CONFIG_SND_BT87X is not set
# CONFIG_SND_CA0106 is not set
# CONFIG_SND_CMIPCI is not set
# CONFIG_SND_OXYGEN is not set
# CONFIG_SND_CS4281 is not set
# CONFIG_SND_CS46XX is not set
# CONFIG_SND_CS5530 is not set
# CONFIG_SND_CS5535AUDIO is not set
# CONFIG_SND_CTXFI is not set
# CONFIG_SND_DARLA20 is not set
# CONFIG_SND_GINA20 is not set
# CONFIG_SND_LAYLA20 is not set
# CONFIG_SND_DARLA24 is not set
# CONFIG_SND_GINA24 is not set
# CONFIG_SND_LAYLA24 is not set
# CONFIG_SND_MONA is not set
# CONFIG_SND_MIA is not set
# CONFIG_SND_ECHO3G is not set
# CONFIG_SND_INDIGO is not set
# CONFIG_SND_INDIGOIO is not set
# CONFIG_SND_INDIGODJ is not set
# CONFIG_SND_INDIGOIOX is not set
# CONFIG_SND_INDIGODJX is not set
# CONFIG_SND_EMU10K1 is not set
# CONFIG_SND_EMU10K1X is not set
# CONFIG_SND_ENS1370 is not set
# CONFIG_SND_ENS1371 is not set
# CONFIG_SND_ES1938 is not set
# CONFIG_SND_ES1968 is not set
# CONFIG_SND_FM801 is not set
CONFIG_SND_HDA_INTEL=y
CONFIG_SND_HDA_HWDEP=y
CONFIG_SND_HDA_RECONFIG=y
CONFIG_SND_HDA_INPUT_BEEP=y
CONFIG_SND_HDA_INPUT_BEEP_MODE=1
CONFIG_SND_HDA_INPUT_JACK=y
# CONFIG_SND_HDA_PATCH_LOADER is not set
CONFIG_SND_HDA_CODEC_REALTEK=y
CONFIG_SND_HDA_CODEC_ANALOG=y
CONFIG_SND_HDA_CODEC_SIGMATEL=y
CONFIG_SND_HDA_CODEC_VIA=y
CONFIG_SND_HDA_CODEC_HDMI=y
CONFIG_SND_HDA_CODEC_CIRRUS=y
CONFIG_SND_HDA_CODEC_CONEXANT=y
CONFIG_SND_HDA_CODEC_CA0110=y
CONFIG_SND_HDA_CODEC_CMEDIA=y
CONFIG_SND_HDA_CODEC_SI3054=y
CONFIG_SND_HDA_GENERIC=y
CONFIG_SND_HDA_POWER_SAVE=y
CONFIG_SND_HDA_POWER_SAVE_DEFAULT=0
# CONFIG_SND_HDSP is not set
# CONFIG_SND_HDSPM is not set
# CONFIG_SND_ICE1712 is not set
# CONFIG_SND_ICE1724 is not set
# CONFIG_SND_INTEL8X0 is not set
# CONFIG_SND_INTEL8X0M is not set
# CONFIG_SND_KORG1212 is not set
# CONFIG_SND_LX6464ES is not set
# CONFIG_SND_MAESTRO3 is not set
# CONFIG_SND_MIXART is not set
# CONFIG_SND_NM256 is not set
# CONFIG_SND_PCXHR is not set
# CONFIG_SND_RIPTIDE is not set
# CONFIG_SND_RME32 is not set
# CONFIG_SND_RME96 is not set
# CONFIG_SND_RME9652 is not set
# CONFIG_SND_SONICVIBES is not set
# CONFIG_SND_TRIDENT is not set
# CONFIG_SND_VIA82XX is not set
# CONFIG_SND_VIA82XX_MODEM is not set
# CONFIG_SND_VIRTUOSO is not set
# CONFIG_SND_VX222 is not set
# CONFIG_SND_YMFPCI is not set
# CONFIG_SND_USB is not set
# CONFIG_SND_SOC is not set
# CONFIG_SOUND_PRIME is not set
CONFIG_HID_SUPPORT=y
CONFIG_HID=y
# CONFIG_HIDRAW is not set

#
# USB Input Devices
#
CONFIG_USB_HID=y
# CONFIG_HID_PID is not set
CONFIG_USB_HIDDEV=y

#
# Special HID drivers
#
# CONFIG_HID_3M_PCT is not set
# CONFIG_HID_A4TECH is not set
# CONFIG_HID_ACRUX is not set
# CONFIG_HID_APPLE is not set
# CONFIG_HID_BELKIN is not set
# CONFIG_HID_CANDO is not set
# CONFIG_HID_CHERRY is not set
# CONFIG_HID_CHICONY is not set
# CONFIG_HID_PRODIKEYS is not set
# CONFIG_HID_CYPRESS is not set
# CONFIG_HID_DRAGONRISE is not set
# CONFIG_HID_EMS_FF is not set
# CONFIG_HID_EZKEY is not set
# CONFIG_HID_KEYTOUCH is not set
# CONFIG_HID_KYE is not set
# CONFIG_HID_UCLOGIC is not set
# CONFIG_HID_WALTOP is not set
# CONFIG_HID_GYRATION is not set
# CONFIG_HID_TWINHAN is not set
# CONFIG_HID_KENSINGTON is not set
# CONFIG_HID_LCPOWER is not set
# CONFIG_HID_LOGITECH is not set
# CONFIG_HID_MICROSOFT is not set
# CONFIG_HID_MOSART is not set
# CONFIG_HID_MONTEREY is not set
# CONFIG_HID_MULTITOUCH is not set
# CONFIG_HID_NTRIG is not set
# CONFIG_HID_ORTEK is not set
# CONFIG_HID_PANTHERLORD is not set
# CONFIG_HID_PETALYNX is not set
# CONFIG_HID_PICOLCD is not set
# CONFIG_HID_QUANTA is not set
# CONFIG_HID_ROCCAT is not set
# CONFIG_HID_ROCCAT_ARVO is not set
# CONFIG_HID_ROCCAT_KONE is not set
# CONFIG_HID_ROCCAT_KONEPLUS is not set
# CONFIG_HID_ROCCAT_KOVAPLUS is not set
# CONFIG_HID_ROCCAT_PYRA is not set
# CONFIG_HID_SAMSUNG is not set
# CONFIG_HID_SONY is not set
# CONFIG_HID_STANTUM is not set
# CONFIG_HID_SUNPLUS is not set
# CONFIG_HID_GREENASIA is not set
# CONFIG_HID_SMARTJOYPLUS is not set
# CONFIG_HID_TOPSEED is not set
# CONFIG_HID_THRUSTMASTER is not set
# CONFIG_HID_ZEROPLUS is not set
# CONFIG_HID_ZYDACRON is not set
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
CONFIG_USB_DEBUG=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
# CONFIG_USB_DEVICE_CLASS is not set
CONFIG_USB_DYNAMIC_MINORS=y
# CONFIG_USB_OTG_WHITELIST is not set
# CONFIG_USB_OTG_BLACKLIST_HUB is not set
# CONFIG_USB_MON is not set
# CONFIG_USB_WUSB is not set
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
# CONFIG_USB_XHCI_HCD is not set
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
# CONFIG_USB_EHCI_TT_NEWSCHED is not set
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_ISP1760_HCD is not set
# CONFIG_USB_ISP1362_HCD is not set
# CONFIG_USB_OHCI_HCD is not set
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_WHCI_HCD is not set
# CONFIG_USB_HWA_HCD is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=y
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_REALTEK is not set
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_USBAT=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
CONFIG_USB_STORAGE_ALAUDA=y
CONFIG_USB_STORAGE_ONETOUCH=y
CONFIG_USB_STORAGE_KARMA=y
CONFIG_USB_STORAGE_CYPRESS_ATACB=y
# CONFIG_USB_STORAGE_ENE_UB6250 is not set
# CONFIG_USB_UAS is not set
CONFIG_USB_LIBUSUAL=y

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB port drivers
#
CONFIG_USB_SERIAL=y
CONFIG_USB_SERIAL_CONSOLE=y
# CONFIG_USB_EZUSB is not set
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_AIRCABLE is not set
# CONFIG_USB_SERIAL_ARK3116 is not set
CONFIG_USB_SERIAL_BELKIN=y
# CONFIG_USB_SERIAL_CH341 is not set
# CONFIG_USB_SERIAL_WHITEHEAT is not set
# CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set
# CONFIG_USB_SERIAL_CP210X is not set
# CONFIG_USB_SERIAL_CYPRESS_M8 is not set
# CONFIG_USB_SERIAL_EMPEG is not set
# CONFIG_USB_SERIAL_FTDI_SIO is not set
# CONFIG_USB_SERIAL_FUNSOFT is not set
# CONFIG_USB_SERIAL_VISOR is not set
# CONFIG_USB_SERIAL_IPAQ is not set
# CONFIG_USB_SERIAL_IR is not set
# CONFIG_USB_SERIAL_EDGEPORT is not set
# CONFIG_USB_SERIAL_EDGEPORT_TI is not set
# CONFIG_USB_SERIAL_GARMIN is not set
# CONFIG_USB_SERIAL_IPW is not set
# CONFIG_USB_SERIAL_IUU is not set
# CONFIG_USB_SERIAL_KEYSPAN_PDA is not set
# CONFIG_USB_SERIAL_KEYSPAN is not set
# CONFIG_USB_SERIAL_KLSI is not set
# CONFIG_USB_SERIAL_KOBIL_SCT is not set
CONFIG_USB_SERIAL_MCT_U232=y
# CONFIG_USB_SERIAL_MOS7720 is not set
# CONFIG_USB_SERIAL_MOS7840 is not set
# CONFIG_USB_SERIAL_MOTOROLA is not set
# CONFIG_USB_SERIAL_NAVMAN is not set
# CONFIG_USB_SERIAL_PL2303 is not set
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_QCAUX is not set
# CONFIG_USB_SERIAL_QUALCOMM is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
# CONFIG_USB_SERIAL_HP4X is not set
# CONFIG_USB_SERIAL_SAFE is not set
# CONFIG_USB_SERIAL_SAMBA is not set
# CONFIG_USB_SERIAL_SIEMENS_MPI is not set
# CONFIG_USB_SERIAL_SIERRAWIRELESS is not set
# CONFIG_USB_SERIAL_SYMBOL is not set
# CONFIG_USB_SERIAL_TI is not set
# CONFIG_USB_SERIAL_CYBERJACK is not set
# CONFIG_USB_SERIAL_XIRCOM is not set
# CONFIG_USB_SERIAL_OPTION is not set
# CONFIG_USB_SERIAL_OMNINET is not set
# CONFIG_USB_SERIAL_OPTICON is not set
# CONFIG_USB_SERIAL_VIVOPAY_SERIAL is not set
# CONFIG_USB_SERIAL_ZIO is not set
# CONFIG_USB_SERIAL_SSU100 is not set
# CONFIG_USB_SERIAL_DEBUG is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_YUREX is not set
# CONFIG_USB_GADGET is not set

#
# OTG and related infrastructure
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
# CONFIG_LEDS_LM3530 is not set
# CONFIG_LEDS_ALIX2 is not set
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_LP3944 is not set
# CONFIG_LEDS_LP5521 is not set
# CONFIG_LEDS_LP5523 is not set
# CONFIG_LEDS_CLEVO_MAIL is not set
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_BD2802 is not set
# CONFIG_LEDS_INTEL_SS4200 is not set
# CONFIG_LEDS_DELL_NETBOOKS is not set
CONFIG_LEDS_TRIGGERS=y

#
# LED Triggers
#
# CONFIG_LEDS_TRIGGER_TIMER is not set
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set

#
# iptables trigger is under Netfilter config (LED target)
#
# CONFIG_NFC_DEVICES is not set
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
# CONFIG_EDAC is not set
CONFIG_RTC_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
# CONFIG_RTC_DEBUG is not set

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_DS1307 is not set
# CONFIG_RTC_DRV_DS1374 is not set
# CONFIG_RTC_DRV_DS1672 is not set
# CONFIG_RTC_DRV_DS3232 is not set
# CONFIG_RTC_DRV_MAX6900 is not set
# CONFIG_RTC_DRV_RS5C372 is not set
# CONFIG_RTC_DRV_ISL1208 is not set
# CONFIG_RTC_DRV_ISL12022 is not set
# CONFIG_RTC_DRV_X1205 is not set
# CONFIG_RTC_DRV_PCF8563 is not set
# CONFIG_RTC_DRV_PCF8583 is not set
# CONFIG_RTC_DRV_M41T80 is not set
# CONFIG_RTC_DRV_BQ32K is not set
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set
# CONFIG_RTC_DRV_RX8581 is not set
# CONFIG_RTC_DRV_RX8025 is not set

#
# SPI RTC drivers
#

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=y
# CONFIG_RTC_DRV_DS1286 is not set
# CONFIG_RTC_DRV_DS1511 is not set
# CONFIG_RTC_DRV_DS1553 is not set
# CONFIG_RTC_DRV_DS1742 is not set
# CONFIG_RTC_DRV_STK17TA8 is not set
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T35 is not set
# CONFIG_RTC_DRV_M48T59 is not set
# CONFIG_RTC_DRV_MSM6242 is not set
# CONFIG_RTC_DRV_BQ4802 is not set
# CONFIG_RTC_DRV_RP5C01 is not set
# CONFIG_RTC_DRV_V3020 is not set

#
# on-CPU RTC drivers
#
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
# CONFIG_INTEL_MID_DMAC is not set
CONFIG_INTEL_IOATDMA=y
# CONFIG_TIMB_DMA is not set
# CONFIG_PCH_DMA is not set
CONFIG_DMA_ENGINE=y

#
# DMA Clients
#
# CONFIG_NET_DMA is not set
# CONFIG_ASYNC_TX_DMA is not set
# CONFIG_DMATEST is not set
CONFIG_DCA=y
# CONFIG_AUXDISPLAY is not set
CONFIG_UIO=y
# CONFIG_UIO_CIF is not set
# CONFIG_UIO_PDRV is not set
# CONFIG_UIO_PDRV_GENIRQ is not set
# CONFIG_UIO_AEC is not set
# CONFIG_UIO_SERCOS3 is not set
# CONFIG_UIO_PCI_GENERIC is not set
# CONFIG_UIO_NETX is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
CONFIG_ACER_WMI=y
# CONFIG_ASUS_LAPTOP is not set
CONFIG_DELL_WMI=y
# CONFIG_DELL_WMI_AIO is not set
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_HP_ACCEL is not set
CONFIG_HP_WMI=y
# CONFIG_PANASONIC_LAPTOP is not set
CONFIG_THINKPAD_ACPI=y
CONFIG_THINKPAD_ACPI_ALSA_SUPPORT=y
# CONFIG_THINKPAD_ACPI_DEBUGFACILITIES is not set
# CONFIG_THINKPAD_ACPI_DEBUG is not set
# CONFIG_THINKPAD_ACPI_UNSAFE_LEDS is not set
CONFIG_THINKPAD_ACPI_VIDEO=y
CONFIG_THINKPAD_ACPI_HOTKEY_POLL=y
# CONFIG_SENSORS_HDAPS is not set
# CONFIG_INTEL_MENLOW is not set
CONFIG_EEEPC_LAPTOP=y
# CONFIG_ASUS_WMI is not set
CONFIG_ACPI_WMI=y
# CONFIG_MSI_WMI is not set
# CONFIG_ACPI_ASUS is not set
# CONFIG_TOPSTAR_LAPTOP is not set
# CONFIG_ACPI_TOSHIBA is not set
# CONFIG_TOSHIBA_BT_RFKILL is not set
# CONFIG_ACPI_CMPC is not set
# CONFIG_INTEL_IPS is not set
# CONFIG_IBM_RTL is not set
# CONFIG_XO15_EBOOK is not set

#
# Firmware Drivers
#
# CONFIG_EDD is not set
CONFIG_FIRMWARE_MEMMAP=y
# CONFIG_EFI_VARS is not set
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_DMIID=y
# CONFIG_DMI_SYSFS is not set
# CONFIG_ISCSI_IBFT_FIND is not set
# CONFIG_SIGMA is not set

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
# CONFIG_EXT2_FS_SECURITY is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_XATTR=y
# CONFIG_EXT4_FS_POSIX_ACL is not set
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_DEBUG is not set
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_JBD2=y
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=y
# CONFIG_REISERFS_CHECK is not set
CONFIG_REISERFS_PROC_INFO=y
# CONFIG_REISERFS_FS_XATTR is not set
CONFIG_JFS_FS=y
# CONFIG_JFS_POSIX_ACL is not set
# CONFIG_JFS_SECURITY is not set
# CONFIG_JFS_DEBUG is not set
# CONFIG_JFS_STATISTICS is not set
CONFIG_XFS_FS=y
# CONFIG_XFS_QUOTA is not set
CONFIG_XFS_POSIX_ACL=y
# CONFIG_XFS_RT is not set
# CONFIG_XFS_DEBUG is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
CONFIG_BTRFS_FS=y
# CONFIG_BTRFS_FS_POSIX_ACL is not set
CONFIG_NILFS2_FS=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_FANOTIFY is not set
# CONFIG_QUOTA is not set
# CONFIG_QUOTACTL is not set
# CONFIG_AUTOFS4_FS is not set
CONFIG_FUSE_FS=y
# CONFIG_CUSE is not set

#
# Caches
#
# CONFIG_FSCACHE is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
# CONFIG_JOLIET is not set
# CONFIG_ZISOFS is not set
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
# CONFIG_MSDOS_FS is not set
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=y
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_LOGFS is not set
CONFIG_CRAMFS=y
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_PSTORE is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
# CONFIG_NFS_V4 is not set
CONFIG_ROOT_NFS=y
CONFIG_NFSD=y
CONFIG_NFSD_DEPRECATED=y
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V3_ACL is not set
# CONFIG_NFSD_V4 is not set
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
# CONFIG_RPCSEC_GSS_KRB5 is not set
# CONFIG_CEPH_FS is not set
CONFIG_CIFS=y
CONFIG_CIFS_STATS=y
CONFIG_CIFS_STATS2=y
# CONFIG_CIFS_WEAK_PW_HASH is not set
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
CONFIG_CIFS_DEBUG2=y
# CONFIG_CIFS_ACL is not set
# CONFIG_CIFS_EXPERIMENTAL is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
# CONFIG_NLS_CODEPAGE_437 is not set
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
CONFIG_NLS_CODEPAGE_936=y
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_UTF8=y
# CONFIG_DLM is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_PRINTK_TIME=y
CONFIG_DEFAULT_MESSAGE_LOGLEVEL=4
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
# CONFIG_DEBUG_SECTION_MISMATCH is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SHIRQ=y
# CONFIG_LOCKUP_DETECTOR is not set
# CONFIG_HARDLOCKUP_DETECTOR is not set
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
CONFIG_TIMER_STATS=y
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
# CONFIG_PROVE_LOCKING is not set
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
# CONFIG_DEBUG_LOCKDEP is not set
CONFIG_TRACE_IRQFLAGS=y
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_INFO is not set
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VIRTUAL=y
CONFIG_DEBUG_WRITECOUNT=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_LIST=y
# CONFIG_TEST_LIST_SORT is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
# CONFIG_DEBUG_CREDENTIALS is not set
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
CONFIG_BOOT_PRINTK_DELAY=y
# CONFIG_RCU_TORTURE_TEST is not set
CONFIG_RCU_CPU_STALL_DETECTOR=y
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE=y
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# CONFIG_LKDTM is not set
# CONFIG_CPU_NOTIFIER_ERROR_INJECT is not set
CONFIG_FAULT_INJECTION=y
# CONFIG_FAILSLAB is not set
CONFIG_FAIL_PAGE_ALLOC=y
CONFIG_FAIL_MAKE_REQUEST=y
# CONFIG_FAIL_IO_TIMEOUT is not set
CONFIG_FAULT_INJECTION_DEBUG_FS=y
CONFIG_LATENCYTOP=y
CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FTRACE_NMI_ENTER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_RING_BUFFER=y
CONFIG_FTRACE_NMI_ENTER=y
CONFIG_EVENT_TRACING=y
CONFIG_EVENT_POWER_TRACING_DEPRECATED=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_IRQSOFF_TRACER=y
CONFIG_SCHED_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
# CONFIG_STACK_TRACER is not set
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENT=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set
CONFIG_MMIOTRACE=y
CONFIG_MMIOTRACE_TEST=m
# CONFIG_RING_BUFFER_BENCHMARK is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_ATOMIC64_SELFTEST is not set
# CONFIG_ASYNC_RAID6_TEST is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_HAVE_ARCH_KMEMCHECK=y
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_STRICT_DEVMEM is not set
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
# CONFIG_EARLY_PRINTK_DBGP is not set
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_STACK_USAGE=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
# CONFIG_X86_PTDUMP is not set
# CONFIG_DEBUG_RODATA is not set
# CONFIG_DEBUG_SET_MODULE_RONX is not set
# CONFIG_DEBUG_NX_TEST is not set
# CONFIG_IOMMU_STRESS is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
# CONFIG_X86_DECODER_SELFTEST is not set
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
# CONFIG_DEBUG_BOOT_PARAMS is not set
# CONFIG_CPA_DEBUG is not set
# CONFIG_OPTIMIZE_INLINING is not set
# CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITYFS is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_DEFAULT_SECURITY=""
CONFIG_XOR_BLOCKS=y
CONFIG_ASYNC_CORE=y
CONFIG_ASYNC_MEMCPY=y
CONFIG_ASYNC_XOR=y
CONFIG_ASYNC_PQ=y
CONFIG_ASYNC_RAID6_RECOV=y
CONFIG_ASYNC_TX_DISABLE_PQ_VAL_DMA=y
CONFIG_ASYNC_TX_DISABLE_XOR_VAL_DMA=y
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
# CONFIG_CRYPTO_GF128MUL is not set
CONFIG_CRYPTO_NULL=y
# CONFIG_CRYPTO_PCRYPT is not set
CONFIG_CRYPTO_WORKQUEUE=y
# CONFIG_CRYPTO_CRYPTD is not set
CONFIG_CRYPTO_AUTHENC=y
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
CONFIG_CRYPTO_ECB=y
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_XCBC is not set
# CONFIG_CRYPTO_VMAC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
# CONFIG_CRYPTO_CRC32C_INTEL is not set
# CONFIG_CRYPTO_GHASH is not set
CONFIG_CRYPTO_MD4=y
CONFIG_CRYPTO_MD5=y
# CONFIG_CRYPTO_MICHAEL_MIC is not set
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
# CONFIG_CRYPTO_SHA256 is not set
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_TGR192 is not set
# CONFIG_CRYPTO_WP512 is not set
# CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL is not set

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_X86_64 is not set
# CONFIG_CRYPTO_AES_NI_INTEL is not set
# CONFIG_CRYPTO_ANUBIS is not set
CONFIG_CRYPTO_ARC4=y
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
CONFIG_CRYPTO_DES=y
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_X86_64 is not set
# CONFIG_CRYPTO_SEED is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_TWOFISH is not set
# CONFIG_CRYPTO_TWOFISH_X86_64 is not set

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
# CONFIG_CRYPTO_ZLIB is not set
# CONFIG_CRYPTO_LZO is not set

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_USER_API_HASH is not set
# CONFIG_CRYPTO_USER_API_SKCIPHER is not set
CONFIG_CRYPTO_HW=y
# CONFIG_CRYPTO_DEV_PADLOCK is not set
# CONFIG_CRYPTO_DEV_HIFN_795X is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=y
CONFIG_KVM_INTEL=y
# CONFIG_KVM_AMD is not set
# CONFIG_KVM_MMU_AUDIT is not set
CONFIG_VHOST_NET=y
CONFIG_VIRTIO=y
CONFIG_VIRTIO_RING=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_BALLOON=y
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=y
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_FIND_LAST_BIT=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
CONFIG_LIBCRC32C=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
# CONFIG_XZ_DEC is not set
# CONFIG_XZ_DEC_BCJ is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=y
CONFIG_TEXTSEARCH_BM=y
CONFIG_TEXTSEARCH_FSM=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_CPU_RMAP=y
CONFIG_NLATTR=y
# CONFIG_AVERAGE is not set

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-04  1:56                     ` Dave Young
  2011-05-04  2:32                       ` Dave Young
@ 2011-05-04  4:00                       ` Wu Fengguang
  2011-05-04  7:33                         ` Dave Young
  1 sibling, 1 reply; 43+ messages in thread
From: Wu Fengguang @ 2011-05-04  4:00 UTC (permalink / raw)
  To: Dave Young
  Cc: Andrew Morton, Minchan Kim, linux-mm, Linux Kernel Mailing List,
	Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Christoph Lameter,
	Dave Chinner, David Rientjes

[-- Attachment #1: Type: text/plain, Size: 1018 bytes --]

> > CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts
> >
> > LOC:     536274     532529     531734     536801     536510     533676     534853     532038   Local timer interrupts
> > RES:       3032       2128       1792       1765       2184       1703       1754       1865   Rescheduling interrupts
> > TLB:        189         15         13         17         64        294         97         63   TLB shootdowns
> 
> Could you tell how to get the above info?

It's /proc/interrupts.

I have two lines at the end of the attached script to collect the
information, and another script to call getdelays on every 10s. The
posted reclaim delays are the last successful getdelays output. 
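
A minimal sketch of such a collection loop, for reference only; the
chosen dd task, the paths and the getdelays flags here are assumptions,
not the contents of the attached scripts:

        #!/bin/sh
        # Hypothetical sketch: dump the interrupt counters of interest,
        # then poll per-task delay accounting every 10 seconds with
        # getdelays (built from Documentation/accounting/getdelays.c).
        grep -E 'LOC:|RES:|CAL:|TLB:' /proc/interrupts

        pid=$(pidof dd | awk '{print $1}')
        [ -n "$pid" ] || exit 0
        while kill -0 "$pid" 2>/dev/null
        do
                ./getdelays -d -i -p "$pid"
                sleep 10
        done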

I've automated the test process, so that with one single command line
a new kernel will be built and the test box will rerun tests on the
new kernel :)

Thanks,
Fengguang

[-- Attachment #2: test-alloc-fails.sh --]
[-- Type: application/x-sh, Size: 295 bytes --]

[-- Attachment #3: test-dd-sparse.sh --]
[-- Type: application/x-sh, Size: 422 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-04  2:56                         ` Wu Fengguang
@ 2011-05-04  4:23                           ` Wu Fengguang
  0 siblings, 0 replies; 43+ messages in thread
From: Wu Fengguang @ 2011-05-04  4:23 UTC (permalink / raw)
  To: Dave Young
  Cc: Andrew Morton, Minchan Kim, linux-mm, Linux Kernel Mailing List,
	Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Christoph Lameter,
	Dave Chinner, David Rientjes

On Wed, May 04, 2011 at 10:56:09AM +0800, Wu Fengguang wrote:
> On Wed, May 04, 2011 at 10:32:01AM +0800, Dave Young wrote:
> > On Wed, May 4, 2011 at 9:56 AM, Dave Young <hidave.darkstar@gmail.com> wrote:
> > > On Thu, Apr 28, 2011 at 9:36 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > >> Concurrent page allocations are suffering from high failure rates.
> > >>
> > >> On an 8p, 3GB RAM test box, when reading 1000 sparse files of size 1GB,
> > >> the page allocation failures are
> > >>
> > >> nr_alloc_fail 733       # interleaved reads by 1 single task
> > >> nr_alloc_fail 11799     # concurrent reads by 1000 tasks
> > >>
> > >> The concurrent read test script is:
> > >>
> > >>         for i in `seq 1000`
> > >>         do
> > >>                 truncate -s 1G /fs/sparse-$i
> > >>                 dd if=/fs/sparse-$i of=/dev/null &
> > >>         done
> > >>
> > >
> > > With a Core2 Duo, 3GB RAM and no swap partition I cannot reproduce the alloc failures
> > 
> > Unsetting CONFIG_SCHED_AUTOGROUP and CONFIG_CGROUP_SCHED seems to affect
> > the test results; now I see several nr_alloc_fail events (dd has not
> > finished yet):
> > 
> > dave@darkstar-32:$ grep fail /proc/vmstat:
> > nr_alloc_fail 4
> > compact_pagemigrate_failed 0
> > compact_fail 3
> > htlb_buddy_alloc_fail 0
> > thp_collapse_alloc_fail 4
> > 
> > So the result is related to the CPU scheduler.
> 
> Good catch! My kernel also has CONFIG_CGROUP_SCHED and
> CONFIG_SCHED_AUTOGROUP disabled.

I tried enabling the two options and found that "ps ax" runs much faster
while the 1000 dd's are running. The test results on the base kernel are:

start time: 287
total time: 499
nr_alloc_fail 5075
allocstall 20658
LOC:     502393     501303     500813     503814     501972     501775     501949     501143   Local timer interrupts
RES:       5716       8584       7603       2699       7972      15383       8921       4345   Rescheduling interrupts
CAL:       1543       1731       1733       1809       1692       1715       1765       1753   Function call interrupts
TLB:        132         27         31         21         70        175         68         46   TLB shootdowns

CPU             count     real total  virtual total    delay total  delay average
                  916     2803573792     2785739581   200248952651        218.612ms
IO              count    delay total  delay average
                    0              0              0ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                   15      234623427             15ms
dd: read=0, write=0, cancelled_write=0

Compared to the results with cgroup scheduling disabled (cited below),
allocstall is reduced to 1.3% of its previous value and the CALs are
mostly eliminated. nr_alloc_fail is cut down by almost 2/3, and the
RECLAIM delay drops from 29ms to 15ms. Virtually everything improved
considerably!
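
A minimal sketch of how the two counters can be compared across runs
(a hypothetical helper, assuming the workload script waits for the dd
tasks to finish; nr_alloc_fail is only present with the vmstat
accounting patch applied):

        #!/bin/sh
        # Snapshot the /proc/vmstat counters discussed above before and
        # after the workload, then print the per-run increments.
        before_fail=$(awk '/^nr_alloc_fail/ {print $2}' /proc/vmstat)
        before_stall=$(awk '/^allocstall/ {print $2}' /proc/vmstat)

        ./test-dd-sparse.sh        # the concurrent read workload quoted above

        after_fail=$(awk '/^nr_alloc_fail/ {print $2}' /proc/vmstat)
        after_stall=$(awk '/^allocstall/ {print $2}' /proc/vmstat)
        echo "nr_alloc_fail: $((after_fail - before_fail))"
        echo "allocstall:    $((after_stall - before_stall))"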

Thanks,
Fengguang
---

start time: 245
total time: 526
nr_alloc_fail 14586
allocstall 1578343
LOC:     533981     529210     528283     532346     533392     531314     531705     528983   Local timer interrupts
RES:       3123       2177       1676       1580       2157       1974       1606       1696   Rescheduling interrupts
CAL:     218392     218631     219167     219217     218840     218985     218429     218440   Function call interrupts
TLB:        175         13         21         18         62        309        119         42   TLB shootdowns


CPU             count     real total  virtual total    delay total
                 1122     3676441096     3656793547   274182127286
IO              count    delay total  delay average
                    3      291765493             97ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                 1350    39229752193             29ms
dd: read=45056, write=0, cancelled_write=0

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
  2011-05-04  4:00                       ` Wu Fengguang
@ 2011-05-04  7:33                         ` Dave Young
  0 siblings, 0 replies; 43+ messages in thread
From: Dave Young @ 2011-05-04  7:33 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Minchan Kim, linux-mm, Linux Kernel Mailing List,
	Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Christoph Lameter,
	Dave Chinner, David Rientjes

On Wed, May 4, 2011 at 12:00 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> > CAL:     220449     220246     220372     220558     220251     219740     220043     219968   Function call interrupts
>> >
>> > LOC:     536274     532529     531734     536801     536510     533676     534853     532038   Local timer interrupts
>> > RES:       3032       2128       1792       1765       2184       1703       1754       1865   Rescheduling interrupts
>> > TLB:        189         15         13         17         64        294         97         63   TLB shootdowns
>>
>> Could you tell how to get the above info?
>
> It's /proc/interrupts.
>
> I have two lines at the end of the attached script to collect the
> information, and another script to call getdelays on every 10s. The
> posted reclaim delays are the last successful getdelays output.
>
> I've automated the test process, so that with one single command line
> a new kernel will be built and the test box will rerun tests on the
> new kernel :)

Thank you for that effort!

>
> Thanks,
> Fengguang
>



-- 
Regards
dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2011-05-04  7:33 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-26  5:49 readahead and oom Dave Young
2011-04-26  5:55 ` Wu Fengguang
2011-04-26  6:05   ` Dave Young
2011-04-26  6:07     ` Dave Young
2011-04-26  6:25       ` Wu Fengguang
2011-04-26  6:29         ` Dave Young
2011-04-26  6:34           ` Wu Fengguang
2011-04-26  6:50             ` KOSAKI Motohiro
2011-04-26  7:41             ` Minchan Kim
2011-04-26  9:20               ` Wu Fengguang
2011-04-26  9:28                 ` Minchan Kim
2011-04-26 10:18                   ` Pekka Enberg
2011-04-26 19:47                 ` Andrew Morton
2011-04-28  4:19                   ` Wu Fengguang
2011-04-28 13:36                   ` [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures Wu Fengguang
2011-04-28 13:38                     ` [patch] vmstat: account " Wu Fengguang
2011-04-28 13:50                       ` KOSAKI Motohiro
2011-04-29  2:28                     ` [RFC][PATCH] mm: cut down __GFP_NORETRY " Wu Fengguang
2011-04-29  2:58                       ` Wu Fengguang
2011-04-30 14:17                       ` Wu Fengguang
2011-05-01 16:35                         ` Minchan Kim
2011-05-01 16:37                           ` Minchan Kim
2011-05-02 10:14                             ` KOSAKI Motohiro
2011-05-03  0:53                               ` Minchan Kim
2011-05-03  1:25                                 ` KOSAKI Motohiro
2011-05-02 10:29                           ` Wu Fengguang
2011-05-02 11:08                             ` Wu Fengguang
2011-05-03  0:49                             ` Minchan Kim
2011-05-03  3:51                               ` Wu Fengguang
2011-05-03  4:17                                 ` Minchan Kim
2011-05-02 13:29                           ` Wu Fengguang
2011-05-02 13:49                             ` Wu Fengguang
2011-05-03  0:27                               ` Satoru Moriya
2011-05-03  2:49                                 ` Wu Fengguang
2011-05-04  1:56                     ` Dave Young
2011-05-04  2:32                       ` Dave Young
2011-05-04  2:56                         ` Wu Fengguang
2011-05-04  4:23                           ` Wu Fengguang
2011-05-04  4:00                       ` Wu Fengguang
2011-05-04  7:33                         ` Dave Young
2011-04-26  6:13     ` readahead and oom Wu Fengguang
2011-04-26  6:23       ` Dave Young
2011-04-26  9:37 ` [PATCH] mm: readahead page allocations are OK to fail Wu Fengguang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).