From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: "Michaud, Adrian" <Adrian.Michaud@dell.com>
Cc: "lsf-pc@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [LSF/MM TOPIC][LSF/MM ATTEND] Multiple Page Caches, Memory Tiering, Better LRU evictions,
Date: Sat, 14 Jan 2017 02:56:56 +0300 [thread overview]
Message-ID: <20170113235656.GB26245@node.shutemov.name> (raw)
In-Reply-To: <61F9233AFAF8C541AAEC03A42CB0D8C7025D002B@MX203CL01.corp.emc.com>
On Fri, Jan 13, 2017 at 09:49:14PM +0000, Michaud, Adrian wrote:
> I'd like to attend and propose one or all of the following topics at this year's summit.
>
> Multiple Page Caches (Software Enhancements)
> --------------------------
> Support for multiple page caches can provide many benefits to the kernel.
> Different memory types can be put into different page caches. One page
> cache for native DDR system memory, another page cache for slower
> NV-DIMMs, etc.
> General memory can be partitioned into several page caches of different
> sizes and could also be dedicated to high priority processes or used
> with containers to better isolate memory by dedicating a page cache to a
> cgroup process.
> Each VMA, or process, could have a page cache identifier, or page
> alloc/free callbacks that allow individual VMAs or processes to specify
> which page cache they want to use.
> Some VMAs might want anonymous memory backed by vast amounts of slower
> server class memory like NV-DIMMS.
> Some processes or individual VMAs might want their own private page
> cache.
> Each page cache can have its own eviction policy and low-water marks.
> Individual page caches could also have their own swap device.
Sounds like you're re-inventing NUMA.
What am I missing?
> Memory Tiering (Software Enhancements)
> --------------------
> Using multiple page caches, evictions from one page cache could be moved
> and remapped to another page cache instead of unmapped and written to
> swap.
> If a system has 16GB of high speed DDR memory, and 64GB of slower
> memory, one could create a page cache with high speed DDR memory,
> another page cache with slower 64GB memory, and evict/copy/remap from
> the DDR page cache to the slow memory page cache. Evictions from the
> slow memory page cache would then get unmapped and written to swap.
I guess it's something that can be done as part of NUMA balancing.
> Better LRU evictions (Software and Hardware Enhancements)
> -------------------------
> Add a page fault counter to the page struct to help colorize page demand.
> We could suggest to Intel/AMD and other architecture leaders that TLB
> entries also have a translation counter (8-10 bits is sufficient)
> instead of just an "accessed" bit. Scanning/clearing access bits is
> obviously inefficient; however, if TLBs had a translation counter
> instead of a single accessed bit then scanning and recording the amount
> of activity each TLB has would be significantly better and allow us to
> better calculate LRU pages for evictions.
Except that would make memory accesses slower.
Even accessed-bit handling is a noticeable performance hit: the processor
has to write into the page table entry on the first access to a page.
What you're proposing would make the first 2^8-2^10 accesses to each page
slower, since the counter needs a write on every access until it saturates.
Sounds like a no-go to me.
> TLB Shootdown (Hardware Enhancements)
> --------------------------
> We should stomp our feet and demand that TLB shootdowns should be
> hardware assisted in future architectures. Current TLB shootdown on x86
> is horribly inefficient and obviously doesn't scale. The QPI/UPI local
> bus protocol should provide TLB range invalidation broadcast so that a
> single CPU can concurrently notify other CPU/cores (with a selection
> mask) that a shared TLB entry has changed. Sending an IPI to each core
> is horribly inefficient; especially with the core counts increasing and
> the frequency of TLB unmapping/remapping also possibly increasing
> shortly with new server class memory extension technology.
IIUC, the best you can get from hardware is an IPI behind the scenes.
I doubt it's worth the effort.
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
Thread overview: 3+ messages
2017-01-13 21:49 [LSF/MM TOPIC][LSF/MM ATTEND] Multiple Page Caches, Memory Tiering, Better LRU evictions, Michaud, Adrian
2017-01-13 23:56 ` Kirill A. Shutemov [this message]
2017-01-16 16:34 ` Michaud, Adrian