* Re: [PATCH 0/8] add mTHP support for anonymous shmem
From: Luis Chamberlain @ 2024-05-08 19:23 UTC
To: David Hildenbrand, Matthew Wilcox, Christoph Lameter, Christoph Hellwig, Dave Chinner
Cc: Daniel Gomez, Baolin Wang, akpm@linux-foundation.org, hughd@google.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ying.huang@intel.com, 21cnbao@gmail.com, ryan.roberts@arm.com, shy828301@gmail.com, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Linux FS Devel

On Wed, May 08, 2024 at 01:58:19PM +0200, David Hildenbrand wrote:
> On 08.05.24 13:39, Daniel Gomez wrote:
> > On Mon, May 06, 2024 at 04:46:24PM +0800, Baolin Wang wrote:
> > > The primary strategy is similar to supporting anonymous mTHP. Introduce
> > > a new interface '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
> > > which can take all the same values as the top-level
> > > '/sys/kernel/mm/transparent_hugepage/shmem_enabled', plus a new
> > > "inherit" option. By default all sizes will be set to "never" except
> > > PMD size, which is set to "inherit". This ensures backward
> > > compatibility with the top-level shmem_enabled, while also allowing
> > > independent control of shmem_enabled for each mTHP size.
> >
> > I'm trying to understand the adoption of mTHP and how it fits into the
> > adoption of (large) folios that the kernel is moving towards. Can you,
> > or anyone involved here, explain this? How much do they overlap, and can
> > we benefit from having both? Is there any argument against the adoption
> > of large folios here that I might have missed?
>
> mTHP are implemented using large folios, just like traditional PMD-sized
> THP are.
>
> The biggest challenge with memory that cannot be evicted and reclaimed
> under memory pressure (in contrast to ordinary files in the pagecache) is
> memory waste, and the placement of large chunks of memory in general
> during page faults.
>
> In the worst case (no swap), you allocate a large chunk of memory once
> and it will stick around until freed: no reclaim of that memory.
>
> That's the reason why THP for anonymous memory and SHMEM have toggles to
> manually enable and configure them, in contrast to the pagecache. The
> same was done for mTHP for anonymous memory, and now (anon) shmem
> follows.
>
> There are plans to, at some point, have it all working automatically, but
> a lot of that for anonymous memory (and similarly for shmem) is still
> missing and unclear.

Whereas the use of large folios for filesystems is already automatic, so
long as the filesystem supports it. We already do this in the readahead and
write paths for iomap: we opportunistically use large folios if we can,
otherwise we fall back to smaller folios.

So an approach recommended by Matthew was to use the readahead and write
paths, just as iomap does, to determine the size of the folio to use [0].
The use of large folios would then be automatic and not require any knobs
at all.

The mTHP approach would grow the "THP" use in filesystems through the one
and only filesystem that uses THP today. Meanwhile, the use of large folios
is already automatic with the approach taken by iomap.
We're at a crux where it raises the question of whether we should continue
to chug on with tmpfs being special and doing things differently, extending
the old THP interface with mTHP, or whether it should just use large folios
with the same approach iomap took.

From my perspective the more shared code the better, and the more shared
paths the better. There is a chance to help test swap with large folios
instead of splitting the folios for swap, and that could be done first
with tmpfs. I have not evaluated the difference in testing, or how we
could get the most shared code out of taking either an mTHP approach or
the iomap approach for tmpfs; that should be considered.

Are there other things to consider? Does this require some dialog at
LSFMM?

[0] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/

  Luis
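
A note on the iomap approach referenced above: here is a minimal sketch of
what "automatic" means in that model, using a hypothetical filesystem
"foofs" (the name and the setup helper are made up for illustration;
mapping_set_large_folios() is the existing pagemap helper). The filesystem
only declares that its mappings may hold large folios, and the generic
readahead and buffered-write paths then pick folio sizes opportunistically,
falling back to order-0 when larger allocations are not possible.

/*
 * Minimal sketch, not taken from this patch set: a hypothetical "foofs"
 * opting into large folios the way iomap-based filesystems do. The only
 * filesystem-side step is flagging the mapping; readahead and buffered
 * writes then choose folio sizes on their own, with no tunables.
 */
#include <linux/fs.h>
#include <linux/pagemap.h>

static void foofs_setup_inode(struct inode *inode)
{
	/* Allow arbitrary folio orders in this file's page cache. */
	mapping_set_large_folios(inode->i_mapping);
}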
* Re: [PATCH 0/8] add mTHP support for anonymous shmem
From: David Hildenbrand @ 2024-05-09 17:48 UTC
To: Luis Chamberlain, Matthew Wilcox, Christoph Lameter, Christoph Hellwig, Dave Chinner
Cc: Daniel Gomez, Baolin Wang, akpm@linux-foundation.org, hughd@google.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ying.huang@intel.com, 21cnbao@gmail.com, ryan.roberts@arm.com, shy828301@gmail.com, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Linux FS Devel

On 08.05.24 21:23, Luis Chamberlain wrote:
> On Wed, May 08, 2024 at 01:58:19PM +0200, David Hildenbrand wrote:
>> On 08.05.24 13:39, Daniel Gomez wrote:
>>> On Mon, May 06, 2024 at 04:46:24PM +0800, Baolin Wang wrote:
>>>> The primary strategy is similar to supporting anonymous mTHP. Introduce
>>>> a new interface '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
>>>> which can take all the same values as the top-level
>>>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', plus a new
>>>> "inherit" option. By default all sizes will be set to "never" except
>>>> PMD size, which is set to "inherit". This ensures backward
>>>> compatibility with the top-level shmem_enabled, while also allowing
>>>> independent control of shmem_enabled for each mTHP size.
>>>
>>> I'm trying to understand the adoption of mTHP and how it fits into the
>>> adoption of (large) folios that the kernel is moving towards. Can you,
>>> or anyone involved here, explain this? How much do they overlap, and can
>>> we benefit from having both? Is there any argument against the adoption
>>> of large folios here that I might have missed?
>>
>> mTHP are implemented using large folios, just like traditional PMD-sized
>> THP are.
>>
>> The biggest challenge with memory that cannot be evicted and reclaimed
>> under memory pressure (in contrast to ordinary files in the pagecache) is
>> memory waste, and the placement of large chunks of memory in general
>> during page faults.
>>
>> In the worst case (no swap), you allocate a large chunk of memory once
>> and it will stick around until freed: no reclaim of that memory.
>>
>> That's the reason why THP for anonymous memory and SHMEM have toggles to
>> manually enable and configure them, in contrast to the pagecache. The
>> same was done for mTHP for anonymous memory, and now (anon) shmem
>> follows.
>>
>> There are plans to, at some point, have it all working automatically, but
>> a lot of that for anonymous memory (and similarly for shmem) is still
>> missing and unclear.
>
> Whereas the use of large folios for filesystems is already automatic, so
> long as the filesystem supports it. We already do this in the readahead
> and write paths for iomap: we opportunistically use large folios if we
> can, otherwise we fall back to smaller folios.
>
> So an approach recommended by Matthew was to use the readahead and write
> paths, just as iomap does, to determine the size of the folio to use [0].
> The use of large folios would then be automatic and not require any knobs
> at all.

Yes, I remember discussing that with Willy at some point, including why
shmem is unfortunately a bit more "special", because you might not even
have a disk backend ("swap") at all where you could easily reclaim memory.
In the extreme form, you can consider SHMEM as memory that might be always
mlocked, even without the user requiring special mlock limits ...

>
> The mTHP approach would grow the "THP" use in filesystems through the one
> and only filesystem that uses THP today. Meanwhile, the use of large
> folios is already automatic with the approach taken by iomap.

Yes, it's an extension of the existing shmem_enabled (which -- I'm afraid
-- was added for good reasons).

>
> We're at a crux where it raises the question of whether we should continue
> to chug on with tmpfs being special and doing things differently,
> extending the old THP interface with mTHP, or whether it should just use
> large folios with the same approach iomap took.

I'm afraid shmem will remain special to some degree. Fortunately it's not
alone, hugetlbfs is even more special ;)

>
> From my perspective the more shared code the better, and the more shared
> paths the better. There is a chance to help test swap with large folios
> instead of splitting the folios for swap, and that could be done first
> with tmpfs. I have not evaluated the difference in testing, or how we
> could get the most shared code out of taking either an mTHP approach or
> the iomap approach for tmpfs; that should be considered.

I don't have a clear picture yet of what might be best for ordinary shmem
(IOW, not MAP_SHARED|MAP_PRIVATE), and I'm afraid there is no easy answer.

As long as we don't end up wasting memory, it's not obviously bad. But some
things might be tricky (see my example about large folios stranding in
shmem and never being able to be reclaimed and reused for better purposes).

I'll note that mTHP really is just (supposed to be) a user interface to
enable the various folio sizes (well, and to expose better per-size stats),
not more. From that point of view, it's just a filter. Enable all sizes,
and you get the same behavior as you likely would in the pagecache mode.

From a shared-code and testing point of view, there really wouldn't be a
lot of differences. Again, it's essentially just a filter.

>
> Are there other things to consider? Does this require some dialog at
> LSFMM?

As raised in my reply to Daniel, I'll be at LSF/MM and happy to discuss.
I'm also not a SHMEM expert, so I'm hoping at some point we'd get feedback
from Hugh.

--
Cheers,

David / dhildenb
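
To make the "just a filter" view concrete, a minimal sketch with made-up
names (shmem_filter_orders() and both of its parameters are hypothetical,
not taken from this series): the per-size shmem_enabled knobs reduce to a
bitmask of permitted folio orders, and the allocation path intersects the
orders it would otherwise attempt with that mask. Enabling every size makes
the filter a no-op, which is the pagecache-like behavior described above.

/*
 * Hypothetical illustration of the "just a filter" view; the function
 * name and signature are invented for this sketch, not part of the
 * series. Each bit in the masks stands for one folio order.
 */
static inline unsigned long shmem_filter_orders(unsigned long wanted_orders,
						unsigned long enabled_orders)
{
	/* Keep only the folio orders the admin enabled via shmem_enabled. */
	return wanted_orders & enabled_orders;
}

The fault path would then try the largest order remaining in the returned
mask and fall back toward order-0 as allocations fail.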
* Re: [PATCH 0/8] add mTHP support for anonymous shmem
From: Luis Chamberlain @ 2024-05-10 18:53 UTC
To: David Hildenbrand, Hugh Dickins
Cc: Matthew Wilcox, Christoph Lameter, Christoph Hellwig, Dave Chinner, Daniel Gomez, Baolin Wang, akpm@linux-foundation.org, hughd@google.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ying.huang@intel.com, 21cnbao@gmail.com, ryan.roberts@arm.com, shy828301@gmail.com, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Linux FS Devel

On Thu, May 09, 2024 at 07:48:46PM +0200, David Hildenbrand wrote:
> On 08.05.24 21:23, Luis Chamberlain wrote:
> > From my perspective the more shared code the better, and the more shared
> > paths the better. There is a chance to help test swap with large folios
> > instead of splitting the folios for swap, and that could be done first
> > with tmpfs. I have not evaluated the difference in testing, or how we
> > could get the most shared code out of taking either an mTHP approach or
> > the iomap approach for tmpfs; that should be considered.
>
> I don't have a clear picture yet of what might be best for ordinary shmem
> (IOW, not MAP_SHARED|MAP_PRIVATE), and I'm afraid there is no easy answer.

OK, so it sounds like the different options need to be thought out and
reviewed.

> As long as we don't end up wasting memory, it's not obviously bad.

Sure.

> But some things might be tricky (see my example about large folios
> stranding in shmem and never being able to be reclaimed and reused for
> better purposes).

Where is that stated, BTW? Could that be resolved?

> I'll note that mTHP really is just (supposed to be) a user interface to
> enable the various folio sizes (well, and to expose better per-size
> stats), not more.

Sure, but given that filesystems using large folios don't have silly APIs
for selecting which large folios to enable, it just seems odd for tmpfs to
take a different approach.

> From that point of view, it's just a filter. Enable all sizes, and you
> get the same behavior as you likely would in the pagecache mode.

Which raises the question: *why* have an API just to constrain ourselves to
certain large folios, which diverges from what filesystems are doing with
large folios?

> > Are there other things to consider? Does this require some dialog at
> > LSFMM?
>
> As raised in my reply to Daniel, I'll be at LSF/MM and happy to discuss.
> I'm also not a SHMEM expert, so I'm hoping at some point we'd get feedback
> from Hugh.

Hugh, will you be at LSFMM?

  Luis