From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>,
David Cohen <david.a.cohen@linux.intel.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Damien Ramonda <damien.ramonda@intel.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH RFC] mm readahead: Fix the readahead fail in case of empty numa node
Date: Thu, 05 Dec 2013 11:27:53 +0530 [thread overview]
Message-ID: <52A015E1.50005@linux.vnet.ibm.com> (raw)
In-Reply-To: <20131204134838.a048880a1db9e9acd14a39e4@linux-foundation.org>
On 12/05/2013 03:18 AM, Andrew Morton wrote:
> On Wed, 04 Dec 2013 14:38:11 +0530 Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:
>
>> On 12/04/2013 02:11 PM, Andrew Morton wrote:
> :
> : This patch takes it all out and applies the same upper limit as is used in
> : sys_readahead() - half the inactive list.
> :
> : +/*
> : + * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
> : + * sensible upper limit.
> : + */
> : +unsigned long max_sane_readahead(unsigned long nr)
> : +{
> : + unsigned long active;
> : + unsigned long inactive;
> : +
> : + get_zone_counts(&active, &inactive);
> : + return min(nr, inactive / 2);
> : +}
>
Hi Andrew, Thanks for digging out. So it seems like earlier we had not
even considered free pages?
> And one would need to go back further still to understand the rationale
> for the sys_readahead() decision and that even predates the BK repo.
>
> iirc the thinking was that we need _some_ limit on readahead size so
> the user can't go and do ridiculously large amounts of readahead via
> sys_readahead(). But that doesn't make a lot of sense because the user
> could do the same thing with plain old read().
>
True.
> So for argument's sake I'm thinking we just kill it altogether and
> permit arbitrarily large readahead:
>
> --- a/mm/readahead.c~a
> +++ a/mm/readahead.c
> @@ -238,13 +238,12 @@ int force_page_cache_readahead(struct ad
> }
>
> /*
> - * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
> - * sensible upper limit.
> + * max_sane_readahead() is disabled. It can later be removed altogether, but
> + * let's keep a skeleton in place for now, in case disabling was the wrong call.
> */
> unsigned long max_sane_readahead(unsigned long nr)
> {
> - return min(nr, (node_page_state(numa_node_id(), NR_INACTIVE_FILE)
> - + node_page_state(numa_node_id(), NR_FREE_PAGES)) / 2);
> + return nr;
> }
>
I had something like below in mind for posting. But it looks
simple now with your patch.
unsigned long max_sane_readahead(unsigned long nr)
{
int nid;
unsigned long free_page = 0;
for_each_node_state(nid, N_MEMORY)
free_page += node_page_state(nid, NR_INACTIVE_FILE)
+ node_page_state(nid, NR_FREE_PAGES);
/*
* Readahead onto remote memory is better than no readahead when local
* numa node does not have memory. We sanitize readahead size depending
* on potential free memory in the whole system.
*/
return min(nr, free_page / (2 * nr_node_ids));
Or if we wanted to avoid iteration on nodes simply returning
something like nr/8 or something like that for remote numa fault cases.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>,
David Cohen <david.a.cohen@linux.intel.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Damien Ramonda <damien.ramonda@intel.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH RFC] mm readahead: Fix the readahead fail in case of empty numa node
Date: Thu, 05 Dec 2013 11:27:53 +0530 [thread overview]
Message-ID: <52A015E1.50005@linux.vnet.ibm.com> (raw)
In-Reply-To: <20131204134838.a048880a1db9e9acd14a39e4@linux-foundation.org>
On 12/05/2013 03:18 AM, Andrew Morton wrote:
> On Wed, 04 Dec 2013 14:38:11 +0530 Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:
>
>> On 12/04/2013 02:11 PM, Andrew Morton wrote:
> :
> : This patch takes it all out and applies the same upper limit as is used in
> : sys_readahead() - half the inactive list.
> :
> : +/*
> : + * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
> : + * sensible upper limit.
> : + */
> : +unsigned long max_sane_readahead(unsigned long nr)
> : +{
> : + unsigned long active;
> : + unsigned long inactive;
> : +
> : + get_zone_counts(&active, &inactive);
> : + return min(nr, inactive / 2);
> : +}
>
Hi Andrew, Thanks for digging out. So it seems like earlier we had not
even considered free pages?
> And one would need to go back further still to understand the rationale
> for the sys_readahead() decision and that even predates the BK repo.
>
> iirc the thinking was that we need _some_ limit on readahead size so
> the user can't go and do ridiculously large amounts of readahead via
> sys_readahead(). But that doesn't make a lot of sense because the user
> could do the same thing with plain old read().
>
True.
> So for argument's sake I'm thinking we just kill it altogether and
> permit arbitrarily large readahead:
>
> --- a/mm/readahead.c~a
> +++ a/mm/readahead.c
> @@ -238,13 +238,12 @@ int force_page_cache_readahead(struct ad
> }
>
> /*
> - * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
> - * sensible upper limit.
> + * max_sane_readahead() is disabled. It can later be removed altogether, but
> + * let's keep a skeleton in place for now, in case disabling was the wrong call.
> */
> unsigned long max_sane_readahead(unsigned long nr)
> {
> - return min(nr, (node_page_state(numa_node_id(), NR_INACTIVE_FILE)
> - + node_page_state(numa_node_id(), NR_FREE_PAGES)) / 2);
> + return nr;
> }
>
I had something like below in mind for posting. But it looks
simple now with your patch.
unsigned long max_sane_readahead(unsigned long nr)
{
int nid;
unsigned long free_page = 0;
for_each_node_state(nid, N_MEMORY)
free_page += node_page_state(nid, NR_INACTIVE_FILE)
+ node_page_state(nid, NR_FREE_PAGES);
/*
* Readahead onto remote memory is better than no readahead when local
* numa node does not have memory. We sanitize readahead size depending
* on potential free memory in the whole system.
*/
return min(nr, free_page / (2 * nr_node_ids));
Or if we wanted to avoid iteration on nodes simply returning
something like nr/8 or something like that for remote numa fault cases.
next prev parent reply other threads:[~2013-12-05 5:50 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-03 10:36 [PATCH RFC] mm readahead: Fix the readahead fail in case of empty numa node Raghavendra K T
2013-12-03 10:36 ` Raghavendra K T
2013-12-03 22:38 ` Andrew Morton
2013-12-03 22:38 ` Andrew Morton
2013-12-04 8:30 ` Raghavendra K T
2013-12-04 8:30 ` Raghavendra K T
2013-12-04 8:41 ` Andrew Morton
2013-12-04 8:41 ` Andrew Morton
2013-12-04 9:08 ` Raghavendra K T
2013-12-04 9:08 ` Raghavendra K T
2013-12-04 21:48 ` Andrew Morton
2013-12-04 21:48 ` Andrew Morton
2013-12-05 5:57 ` Raghavendra K T [this message]
2013-12-05 5:57 ` Raghavendra K T
2013-12-11 22:49 ` Jan Kara
2013-12-11 22:49 ` Jan Kara
2013-12-11 23:05 ` Andrew Morton
2013-12-11 23:05 ` Andrew Morton
2013-12-12 11:14 ` Jan Kara
2013-12-12 11:14 ` Jan Kara
2013-12-14 0:39 ` Linus Torvalds
2013-12-14 0:39 ` Linus Torvalds
2013-12-31 11:07 ` Raghavendra K T
2013-12-31 11:07 ` Raghavendra K T
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52A015E1.50005@linux.vnet.ibm.com \
--to=raghavendra.kt@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=damien.ramonda@intel.com \
--cc=david.a.cohen@linux.intel.com \
--cc=fengguang.wu@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.