From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Oren Twaig <oren@scalemp.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org,
"Shai Fultheim (Shai@ScaleMP.com)" <Shai@scalemp.com>
Subject: Re: x86: vmalloc and THP
Date: Tue, 12 Aug 2014 18:01:31 +0300
Message-ID: <20140812150131.GA12187@node.dhcp.inet.fi>
In-Reply-To: <1407846532.10122.66.camel@edumazet-glaptop2.roam.corp.google.com>
On Tue, Aug 12, 2014 at 05:28:52AM -0700, Eric Dumazet wrote:
> On Tue, 2014-08-12 at 09:07 +0300, Kirill A. Shutemov wrote:
> > On Tue, Aug 12, 2014 at 08:00:54AM +0300, Oren Twaig wrote:
> > >If not, is there any fast way to change this behavior ? Maybe by
> > >changing the granularity/alignment of such allocations to allow such
> > >mapping ?
> >
> > What's the point to use vmalloc() in this case?
>
> Look at various large hashes we have in the system, all using
> vmalloc() :
>
> [ 0.006856] Dentry cache hash table entries: 16777216 (order: 15, 134217728 bytes)
> [ 0.033130] Inode-cache hash table entries: 8388608 (order: 14, 67108864 bytes)
> [ 1.197621] TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
I see lower-order allocation in upstream code. Is it some distribution
tweak?
> I would imagine a performance difference if we were using hugepages.
Okay, it's *probably* a valid point.
The hash tables are only allocated with vmalloc() on NUMA systems, if
hashdist=1 (the default on NUMA). This is done to distribute the memory
between nodes. vmalloc() in the NUMA_NO_NODE case will allocate all memory
with 0-order page allocations: no physically contiguous memory for hugepage
mappings.
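To put rough numbers on that, here is a small userspace sketch (not kernel
code) that counts how many mappings are needed to cover a table of a given
size. The 4 KiB and 2 MiB constants are the usual x86-64 values and are an
assumption here, as are the SKETCH_* names and the helper itself.

```c
#include <assert.h>

/* Assumed x86-64 values; illustrative names, not kernel macros. */
#define SKETCH_PAGE_SIZE (4096UL)
#define SKETCH_PMD_SIZE  (2UL * 1024 * 1024)

/*
 * Number of mappings of size map_size needed to cover `bytes`,
 * rounding up for a partial last mapping.
 */
static unsigned long sketch_mappings_needed(unsigned long bytes,
                                            unsigned long map_size)
{
        assert(map_size != 0);
        return (bytes + map_size - 1) / map_size;
}
```

For the 134217728-byte dentry hash quoted above, this gives 32768 mappings
with 4 KiB pages (matching "order: 15") and 64 with 2 MiB mappings.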
I guess we could teach vmalloc() to interleave between nodes in PMD_SIZE
chunks rather than PAGE_SIZE chunks if the caller asks for a big allocation.
Although, I'm not sure if it would fit all vmalloc() users.
We would also need to allocate a PMD_SIZE-aligned virtual address range
to be able to map the allocated memory with pmds.
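A rough userspace sketch of the two pieces of arithmetic involved (not
kernel code, and not how vmalloc() is implemented): picking a node
round-robin per chunk, and rounding a virtual address up to a PMD_SIZE
boundary. The constants assume x86-64 (4 KiB pages, 2 MiB pmds); the
SKETCH_* names and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 values; illustrative names, not kernel macros. */
#define SKETCH_PAGE_SIZE (4096UL)
#define SKETCH_PMD_SIZE  (2UL * 1024 * 1024)

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t sketch_align_up(uint64_t addr, uint64_t align)
{
        assert(align != 0 && (align & (align - 1)) == 0);
        return (addr + align - 1) & ~(align - 1);
}

/*
 * Node that would back the byte at `offset` if the allocation were
 * interleaved round-robin across nr_nodes in units of `chunk` bytes.
 */
static unsigned int sketch_interleave_node(uint64_t offset, uint64_t chunk,
                                           unsigned int nr_nodes)
{
        assert(chunk != 0 && nr_nodes != 0);
        return (unsigned int)((offset / chunk) % nr_nodes);
}
```

With chunk = SKETCH_PAGE_SIZE, neighbouring 4 KiB pages land on different
nodes, so no 2 MiB range is physically contiguous. With chunk =
SKETCH_PMD_SIZE, each aligned 2 MiB range comes from a single node and
could be backed by one huge page, provided the virtual start address is
also PMD_SIZE-aligned.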
It's a *potentially* interesting research project. Any volunteers?
--
Kirill A. Shutemov