Re: [RFC PATCH 0/5] hugetlb: Change huge pmd sharing


From: Mike Kravetz <mike.kravetz@oracle.com>
To: David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko <mhocko@suse.com>, Peter Xu <peterx@redhat.com>,
	Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Prakash Sangappa <prakash.sangappa@oracle.com>,
	James Houghton <jthoughton@google.com>,
	Mina Almasry <almasrymina@google.com>,
	Ray Fucillo <Ray.Fucillo@intersystems.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH 0/5] hugetlb: Change huge pmd sharing
Date: Thu, 7 Apr 2022 09:17:31 -0700	[thread overview]
Message-ID: <4ddf7d53-db45-4201-8ae0-095698ec7e1a@oracle.com> (raw)
In-Reply-To: <045a59a1-0929-a969-b184-1311f81504b8@redhat.com>

On 4/7/22 03:08, David Hildenbrand wrote:
> On 06.04.22 22:48, Mike Kravetz wrote:
>> hugetlb fault scalability regressions have recently been reported [1].
>> This is not the first such report, as regressions were also noted when
>> commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
>> synchronization") was added [2] in v5.7.  At that time, a proposal to
>> address the regression was suggested [3] but went nowhere.
<snip>
>> Please help with comments or suggestions.  I would like to come up with
>> something that is performant and safe.
> 
> May I challenge the existence of huge PMD sharing? TBH I am not
> convinced that the code complexity is worth the benefit.
> 

That is a fair question.
Huge PMD sharing is not a documented or well-known feature.  Most people would
not notice it going away.  However, I suspect some people will notice.

> Let me know if I get something wrong:
> 
> Let's assume a 4 TiB device and 2 MiB hugepage size. That's 2097152 huge
> pages. Each such PMD entry consumes 8 bytes. That's 16 MiB.
> 
> Sure, with thousands of processes sharing that memory, the size of page
> tables required would increase with each and every process. But TBH,
> that's in no way different to other file systems where we're even
> dealing with PTE tables.

The numbers I am frequently quoted for a real use case are something like:
1TB shared mapping, 10,000 processes sharing the mapping
One 4K PMD page per 1GB of shared mapping
4M savings for each sharing process
9,999 * 4M ~= 39GB savings
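
To make the two estimates in this exchange easy to reproduce, here is a
back-of-envelope sketch in Python.  It assumes x86-64 4-level paging (8-byte
entries, 4 KiB page-table pages, 2 MiB huge pages) and reads "TB"/"GB" as
binary units; the helper names are hypothetical and are not kernel functions.
One 4 KiB PMD page holds 512 entries and therefore maps 512 * 2 MiB = 1 GiB,
which is where "one 4K PMD page per 1GB" comes from.

```python
# Hypothetical helpers; assumes x86-64 4-level paging: 8-byte entries,
# 4 KiB page-table pages, 2 MiB PMD-mapped hugetlb pages, binary units.
KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40

ENTRY_SIZE = 8                                 # bytes per PMD entry
PT_PAGE_SIZE = 4 * KiB                         # one page-table (PMD) page
HPAGE_SIZE = 2 * MiB                           # one PMD-mapped huge page
ENTRIES_PER_PAGE = PT_PAGE_SIZE // ENTRY_SIZE  # 512
PMD_PAGE_SPAN = ENTRIES_PER_PAGE * HPAGE_SIZE  # 512 * 2 MiB = 1 GiB


def pmd_entry_bytes(mapping_size: int) -> int:
    """Bytes of PMD entries needed to map mapping_size with 2 MiB pages."""
    return (mapping_size // HPAGE_SIZE) * ENTRY_SIZE


def pmd_pages(mapping_size: int) -> int:
    """Whole PMD pages needed, rounding a partial 1 GiB span up."""
    return -(-mapping_size // PMD_PAGE_SPAN)


def sharing_savings(mapping_size: int, nr_procs: int) -> tuple[int, int]:
    """Return (per-process PMD page bytes, total bytes saved by sharing)."""
    per_proc = pmd_pages(mapping_size) * PT_PAGE_SIZE
    # With sharing, one copy of the PMD pages remains; nr_procs - 1 are saved.
    return per_proc, (nr_procs - 1) * per_proc


print(pmd_entry_bytes(4 * TiB) // MiB)     # 16 (MiB, the 4 TiB estimate)
per_proc, saved = sharing_savings(1 * TiB, 10_000)
print(per_proc // MiB, saved // GiB)       # 4 39
```

The 16 MiB figure counts PMD entries for a 4 TiB mapping in one process,
while sharing_savings() counts whole 4 KiB PMD pages; for mappings that are
multiples of 1 GiB the two agree (1 TiB needs 4 MiB either way).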

However, the message of commit 39dde65c9940c, which introduced huge pmd
sharing, states that performance rather than memory savings was the primary
objective.

"For hugetlb, the saving on page table memory is not the primary
 objective (as hugetlb itself already cuts down page table overhead
 significantly), instead, the purpose of using shared page table on hugetlb is
 to allow faster TLB refill and smaller cache pollution upon TLB miss.
    
 With PT sharing, pte entries are shared among hundreds of processes, the
 cache consumption used by all the page table is smaller and in return,
 application gets much higher cache hit ratio.  One other effect is that
 cache hit ratio with hardware page walker hitting on pte in cache will be
 higher and this helps to reduce tlb miss latency.  These two effects
 contribute to higher application performance."

That 'makes sense', but I have never tried to measure any such performance
benefit.  It is easier to calculate the memory savings.

> 
> Which results in me wondering if
> 
> a) We should simply use gigantic pages for such extreme use case. Allows
>    for freeing up more memory via vmemmap either way.

The only problem with this is that many processors in use today have
limited TLB entries for gigantic pages.

> b) We should instead look into reclaiming reconstruct-able page table.
>    It's hard to imagine that each and every process accesses each and
>    every part of the gigantic file all of the time.
> c) We should instead establish a more generic page table sharing
>    mechanism.

Yes.  I think that is the direction taken by the mshare() proposal.  If we have
a more generic approach, we can certainly start deprecating hugetlb pmd
sharing.

> 
> 
> Consequently, I'd be much more in favor of ripping it out :/ but that's
> just my personal opinion.
> 

-- 
Mike Kravetz

Thread overview: 11+ messages
2022-04-06 20:48 [RFC PATCH 0/5] hugetlb: Change huge pmd sharing Mike Kravetz
2022-04-06 20:48 ` [RFC PATCH 1/5] hugetlbfs: revert use i_mmap_rwsem to address page fault/truncate race Mike Kravetz
2022-04-06 20:48 ` [RFC PATCH 2/5] hugetlbfs: revert use i_mmap_rwsem for more pmd sharing synchronization Mike Kravetz
2022-04-06 20:48 ` [RFC PATCH 3/5] hugetlbfs: move routine remove_huge_page to hugetlb.c Mike Kravetz
2022-04-06 20:48 ` [RFC PATCH 4/5] hugetlbfs: catch and handle truncate racing with page faults Mike Kravetz
2022-04-06 20:48 ` [RFC PATCH 5/5] hugetlb: Check for pmd unshare and fault/lookup races Mike Kravetz
2022-04-07 10:08 ` [RFC PATCH 0/5] hugetlb: Change huge pmd sharing David Hildenbrand
2022-04-07 16:17   ` Mike Kravetz [this message]
2022-04-08  9:26     ` David Hildenbrand
2022-04-19 22:50       ` Mike Kravetz
2022-04-20  7:12         ` David Hildenbrand
