linux-mm.kvack.org archive mirror
From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.cz>, Fengguang Wu <fengguang.wu@intel.com>,
	David Cohen <david.a.cohen@linux.intel.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Damien Ramonda <damien.ramonda@intel.com>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC] mm readahead: Fix the readahead fail in case of empty numa node
Date: Tue, 31 Dec 2013 16:37:16 +0530
Message-ID: <52C2A564.4040809@linux.vnet.ibm.com>
In-Reply-To: <CA+55aFy-e-uok1K9mSNTYS4bJJfHkxXofY7T1UVWgHOyXuE84A@mail.gmail.com>

On 12/14/2013 06:09 AM, Linus Torvalds wrote:
> On Wed, Dec 11, 2013 at 3:05 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>>
>> But I'm really struggling to think up an implementation!  The current
>> code looks only at the caller's node and doesn't seem to make much
>> sense.  Should we look at all nodes?  Hard to say without prior
>> knowledge of where those pages will be coming from.
>
> I really think we want to put an upper bound on the read-ahead, and
> I'm not convinced we need to try to be excessively clever about it. We
> also probably don't want to make it too expensive to calculate,
> because afaik this ends up being called for each file we open when we
> don't have pages in the page cache yet.
>
> The current function seems reasonable on a single-node system. Let's
> not kill it entirely just because it has some odd corner-case on
> multi-node systems.
>
> In fact, for all I care, I think it would be perfectly ok to just use
> a truly stupid hard limit ("you can't read-ahead more than 16MB" or
> whatever).
>
> What we do *not* want to allow is to have people call "readahead"
> functions and basically kill the machine because you now have an
> unkillable IO that is insanely big. So I'd much rather limit it too
> much than too little. And on absolutely no sane IO subsystem does it
> make sense to read ahead insane amounts.
>
> So I'd rather limit it to something stupid and small, than to not
> limit things at all.
>
> Looking at the interface, for example, the natural thing to do for the
> "readahead()" system call is to just give it a size of ~0ul, and let
> the system limit things, because limiting things in user space is just
> not reasonable.
>
> So I really do *not* think it's fine to just remove the limit entirely.
>
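
A minimal userspace sketch of the kind of hard cap described above, assuming 4K pages and taking 16MB as the cap (both illustrative values). `bytes_to_pages()` and `clamp_readahead()` are hypothetical helpers, not kernel functions. A caller that passes ~0UL as the size, as a readahead() user might, is clamped to the cap instead of turning into an enormous read.

```c
#include <assert.h>

/* Illustrative values (assumptions): 4K pages and a 16MB hard cap. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_MAX_READAHEAD_BYTES (16UL << 20)

/* Round a byte count up to whole pages without overflowing on ~0UL. */
static unsigned long bytes_to_pages(unsigned long bytes)
{
	return (bytes >> SKETCH_PAGE_SHIFT) +
	       ((bytes & (SKETCH_PAGE_SIZE - 1)) != 0);
}

/* Clamp a requested readahead size (in bytes) to the hard cap, in pages. */
static unsigned long clamp_readahead(unsigned long bytes)
{
	unsigned long cap = SKETCH_MAX_READAHEAD_BYTES >> SKETCH_PAGE_SHIFT;
	unsigned long want = bytes_to_pages(bytes);
	unsigned long result = want < cap ? want : cap;

	assert(result <= cap);	/* the hard limit always holds */
	return result;
}
```

Rounding via shift and remainder, rather than adding (page size - 1) first, avoids wrapping around when the caller passes the largest possible size.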

Very sorry for the late reply (I was on a very long vacation).

How about applying the 16MB limit only to remote readahead (i.e. when
the local node has no memory) and keeping the rest as is, something
like this:

#define MAX_REMOTE_READAHEAD    4096UL	/* pages: 16MB with 4K pages */

unsigned long max_sane_readahead(unsigned long nr)
{
	int nid = numa_node_id();
	/* Local pages that are free or easily reclaimable. */
	unsigned long local_free_page = node_page_state(nid, NR_INACTIVE_FILE)
				      + node_page_state(nid, NR_FREE_PAGES);
	/* Fallback for a memoryless node, where the sum above is zero. */
	unsigned long sane_nr = min(nr, MAX_REMOTE_READAHEAD);

	return local_free_page ? min(nr, local_free_page / 2) : sane_nr;
}
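
To see what this does on a memoryless node, here is a userspace model of the function above. It is a sketch only: the node_page_state() lookups are replaced by a hypothetical `struct node_stats` passed in by the caller, and `model_max_sane_readahead()` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REMOTE_READAHEAD 4096UL	/* pages; 16MB with 4K pages */

/* Hypothetical stand-in for the counters read via node_page_state(). */
struct node_stats {
	unsigned long nr_inactive_file;
	unsigned long nr_free_pages;
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Mirrors the proposed logic, with the local node's counters passed in. */
static unsigned long model_max_sane_readahead(const struct node_stats *local,
					      unsigned long nr)
{
	unsigned long local_free_page;

	assert(local != NULL);
	local_free_page = local->nr_inactive_file + local->nr_free_pages;

	/* A memoryless node reports 0: fall back to the fixed remote cap. */
	if (local_free_page == 0)
		return min_ul(nr, MAX_REMOTE_READAHEAD);

	/* Otherwise keep the "half of local free + inactive file" rule. */
	return min_ul(nr, local_free_page / 2);
}
```

With only the local-node rule, a memoryless node yields min(nr, 0 / 2) == 0, i.e. no readahead at all, which matches the failure named in the subject line; the fallback turns that into a bounded readahead instead.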

Or we could enforce the 16MB limit in all cases.
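
Enforcing the 16MB cap in every case could look like the following sketch. The local node's inactive-file and free page counts are passed in as plain arguments rather than read with node_page_state(), and `capped_readahead()` is a hypothetical name.

```c
#include <assert.h>

#define READAHEAD_CAP_PAGES 4096UL	/* assumption: 16MB with 4K pages */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical variant: always apply the fixed cap, and additionally the
 * half-of-local-memory rule when the local node has any memory at all.
 */
static unsigned long capped_readahead(unsigned long nr,
				      unsigned long local_inactive_file,
				      unsigned long local_free)
{
	unsigned long local = local_inactive_file + local_free;
	unsigned long limit = READAHEAD_CAP_PAGES;
	unsigned long result;

	if (local != 0)
		limit = min_ul(limit, local / 2);
	result = min_ul(nr, limit);

	assert(result <= READAHEAD_CAP_PAGES);	/* the cap always holds */
	return result;
}
```

The difference from the remote-only version is that a node with plenty of local memory is also capped at 16MB, rather than at half of its free and inactive-file pages.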

I'll send a patch accordingly.

(The readahead maximum above will be scaled accordingly for page sizes
other than 4K.)
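
One way to keep the cap at 16MB independent of page size is to define it in bytes and divide by the page size. A sketch under that assumption, where `readahead_cap_pages()` is a hypothetical helper and the page size is a parameter rather than the kernel's PAGE_SIZE:

```c
#include <assert.h>

/* Assumption: the intended cap is 16MB of page cache, whatever the page size. */
#define READAHEAD_CAP_BYTES (16UL << 20)

/* Hypothetical helper: number of pages that make up the 16MB cap. */
static unsigned long readahead_cap_pages(unsigned long page_size)
{
	/* Page sizes are non-zero powers of two. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return READAHEAD_CAP_BYTES / page_size;
}
```

With a fixed 4096UL instead, 64K pages would give 4096 * 64KB = 256MB of readahead rather than 16MB.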



Thread overview: 12+ messages
2013-12-03 10:36 [PATCH RFC] mm readahead: Fix the readahead fail in case of empty numa node Raghavendra K T
2013-12-03 22:38 ` Andrew Morton
2013-12-04  8:30   ` Raghavendra K T
2013-12-04  8:41     ` Andrew Morton
2013-12-04  9:08       ` Raghavendra K T
2013-12-04 21:48         ` Andrew Morton
2013-12-05  5:57           ` Raghavendra K T
2013-12-11 22:49           ` Jan Kara
2013-12-11 23:05             ` Andrew Morton
2013-12-12 11:14               ` Jan Kara
2013-12-14  0:39               ` Linus Torvalds
2013-12-31 11:07                 ` Raghavendra K T [this message]
