From: Hugh Dickins <hugh@veritas.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Andi Kleen <andi@firstfloor.org>,
David Miller <davem@davemloft.net>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH mmotm] mm: alloc_large_system_hash check order
Date: Fri, 1 May 2009 15:28:47 +0100 (BST)
Message-ID: <Pine.LNX.4.64.0905011509460.28876@blonde.anvils>
In-Reply-To: <20090501140015.GA27831@csn.ul.ie>

On Fri, 1 May 2009, Mel Gorman wrote:
> On Fri, May 01, 2009 at 12:30:03PM +0100, Hugh Dickins wrote:
> >
> > Andrew noticed another oddity: that if it goes the hashdist __vmalloc()
> > way, it won't be limited by MAX_ORDER. Makes one wonder whether it
> > ought to fall back to __vmalloc() if the alloc_pages_exact() fails.
>
> I don't believe so. __vmalloc() is only used when hashdist= is used
> or on IA-64 (according to the documentation).

Doc out of date: hashdist's default "on" was extended to include
x86_64 ages ago, and to all 64-bit in 2.6.30-rc.

> It is used in the case that the caller is
> willing to deal with the vmalloc() overhead (e.g. using base page PTEs) in
> exchange for the pages being interleaved on different nodes so that access
> to the hash table has average performance[*].
>
> If we automatically fell back to vmalloc(), I bet 2c we'd eventually get
> a mysterious performance regression report for a workload that depended on
> the hash table's performance, on a machine where the hash table could only
> be allocated with vmalloc() instead of alloc_pages_exact().
>
> [*] I speculate that on non-IA64 NUMA machines we see different
> performance for large filesystem benchmarks depending on whether we
> are running on the boot-CPU node and on whether hashdist= is used.

Now that will be "32-bit NUMA machines". I was going to say that's
a tiny sample, but I'm probably out of touch. I thought NUMA-Q was
on its way out, but see it still there in the tree. And presumably
nowadays there's a great swing to NUMA on Arm or netbooks or something.

>
> > I think that's a change we could make _if_ the large_system_hash
> > users ever ask for it, but _not_ one we should make surreptitiously.
> >
>
> If they want it, they'll have to ask with hashdist=.

That's quite a good argument for taking it out from under CONFIG_NUMA.
The name "hashdist" would then be absurd, but we could delight our
grandchildren with the story of how it came to be so named.

> Somehow I doubt it's specified very often :/ .

Our intuitions match! Which is probably why it got extended.

>
> Here is Take 2
>
> ==== CUT HERE ====
>
> Use alloc_pages_exact() in alloc_large_system_hash() to avoid duplicated logic V2
>
> alloc_large_system_hash() has logic for freeing pages at the end
> of an excessively large power-of-two buffer that is a duplicate of what
> is in alloc_pages_exact(). This patch converts alloc_large_system_hash()
> to use alloc_pages_exact().
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Acked-by: Hugh Dickins <hugh@veritas.com>

> ---
> mm/page_alloc.c | 21 ++++-----------------
> 1 file changed, 4 insertions(+), 17 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1b3da0f..8360d59 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4756,26 +4756,13 @@ void *__init alloc_large_system_hash(const char *tablename,
>  		else if (hashdist)
>  			table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
>  		else {
> -			unsigned long order = get_order(size);
> -
> -			if (order < MAX_ORDER)
> -				table = (void *)__get_free_pages(GFP_ATOMIC,
> -								order);
>  			/*
>  			 * If bucketsize is not a power-of-two, we may free
> -			 * some pages at the end of hash table.
> +			 * some pages at the end of hash table which
> +			 * alloc_pages_exact() automatically does
>  			 */
> -			if (table) {
> -				unsigned long alloc_end = (unsigned long)table +
> -						(PAGE_SIZE << order);
> -				unsigned long used = (unsigned long)table +
> -						PAGE_ALIGN(size);
> -				split_page(virt_to_page(table), order);
> -				while (used < alloc_end) {
> -					free_page(used);
> -					used += PAGE_SIZE;
> -				}
> -			}
> +			if (get_order(size) < MAX_ORDER)
> +				table = alloc_pages_exact(size, GFP_ATOMIC);
>  		}
>  	} while (!table && size > PAGE_SIZE && --log2qty);
>
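
For anyone following the arithmetic rather than the diff, here is a rough
userspace sketch of what the removed block worked out by hand and what
alloc_pages_exact() is being relied on to do instead: round the request up
to a power-of-two number of pages, then hand back the pages past
PAGE_ALIGN(size). The sk_* names and the 4K page / MAX_ORDER 11 values are
assumptions for illustration only, not kernel definitions, and the sketch
assumes size > 0.

```c
/*
 * Userspace illustration only. SK_PAGE_SHIFT and SK_MAX_ORDER are assumed
 * values; the kernel's PAGE_SIZE and MAX_ORDER depend on arch and config.
 */
#define SK_PAGE_SHIFT	12
#define SK_PAGE_SIZE	(1UL << SK_PAGE_SHIFT)
#define SK_MAX_ORDER	11

/* Number of whole pages needed to hold size bytes (PAGE_ALIGN, in pages). */
static unsigned long sk_pages_used(unsigned long size)
{
	return (size + SK_PAGE_SIZE - 1) >> SK_PAGE_SHIFT;
}

/* Smallest order with (SK_PAGE_SIZE << order) >= size, in the spirit of get_order(). */
static unsigned int sk_get_order(unsigned long size)
{
	unsigned int order = 0;

	while ((1UL << order) < sk_pages_used(size))
		order++;
	return order;
}

/* Pages at the tail of the 2^order block that would be freed again. */
static unsigned long sk_trailing_pages(unsigned long size)
{
	return (1UL << sk_get_order(size)) - sk_pages_used(size);
}

/* Mirrors the "get_order(size) < MAX_ORDER" check kept by the patch. */
static int sk_fits_buddy(unsigned long size)
{
	return sk_get_order(size) < SK_MAX_ORDER;
}
```

With those assumed values, a request of five pages plus 100 bytes needs six
pages, gets rounded up to an order-3 block of eight, and so has two trailing
pages to give back; an 8MB request comes out at order 11 and fails the
MAX_ORDER check, which is the case where the loop shrinks log2qty and retries.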