linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Qi Zheng <zhengqi.arch@bytedance.com>,
	akpm@linux-foundation.org, tglx@linutronix.de,
	hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com,
	kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, songmuchun@bytedance.com
Subject: Re: [PATCH v2 6/9] mm: free user PTE page table pages
Date: Thu, 19 Aug 2021 09:01:40 +0200	[thread overview]
Message-ID: <5aa3020c-fcf2-87bd-31fe-e2b5c2aafcf2@redhat.com> (raw)
In-Reply-To: <20210819031858.98043-7-zhengqi.arch@bytedance.com>

On 19.08.21 05:18, Qi Zheng wrote:
> Some malloc libraries(e.g. jemalloc or tcmalloc) usually
> allocate the amount of VAs by mmap() and do not unmap
> those VAs. They will use madvise(MADV_DONTNEED) to free
> physical memory if they want. But the page tables do not
> be freed by madvise(), so it can produce many page tables
> when the process touches an enormous virtual address space.
> 
> The following figures are a memory usage snapshot of one
> process which actually happened on our server:
> 
>          VIRT:  55t
>          RES:   590g
>          VmPTE: 110g
> 
> As we can see, the PTE page tables size is 110g, while the
> RES is 590g. In theory, the process only need 1.2g PTE page
> tables to map those physical memory. The reason why PTE page
> tables occupy a lot of memory is that madvise(MADV_DONTNEED)
> only empty the PTE and free physical memory but doesn't free
> the PTE page table pages. So we can free those empty PTE page
> tables to save memory. In the above cases, we can save memory
> about 108g(best case). And the larger the difference between
> the size of VIRT and RES, the more memory we save.
> 
> In this patch series, we add a pte_refcount field to the
> struct page of page table to track how many users of PTE page
> table. Similar to the mechanism of page refcount, the user of
> PTE page table should hold a refcount to it before accessing.
> The PTE page table page will be freed when the last refcount
> is dropped.
> 
> While we access ->pte_refcount of a PTE page table, any of the
> following ensures the pmd entry corresponding to the PTE page
> table stability:
> 
> 	- mmap_lock
> 	- anon_lock
> 	- i_mmap_lock
> 	- parallel threads are excluded by other means which
> 	  can make ->pmd stable(e.g. gup case)
> 
> This patch does not support THP temporarily, it will be
> supported in the next patch.

Can you clarify (and document here) who exactly takes a reference on the 
page table? Do I understand correctly that

a) each !pte_none() entry inside a page table take a reference to the 
page it's containted in.
b) each page table walker temporarily grabs a page table reference
c) The PMD tables the PTE is referenced in (->currently only ever a 
single one) does *not* take a reference.

So if there are no PTE entries left and nobody walks the page tables, 
you can remove it? You should really extend the 
description/documentation to make it clearer how exactly it's supposed 
to work.


It feels kind of strange to not introduce the CONFIG_FREE_USER_PTE 
Kconfig option in this patch. At least it took me a while to identify it 
in the previous patch.

Maybe you should introduce the empty stubs and use them in a separate 
patch, and then have this patch just introduce CONFIG_FREE_USER_PTE 
along with the actual refcounting magic inside the !stub implementation.

-- 
Thanks,

David / dhildenb



  reply	other threads:[~2021-08-19  7:01 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-19  3:18 [PATCH v2 0/9] Free user PTE page table pages Qi Zheng
2021-08-19  3:18 ` [PATCH v2 1/9] mm: introduce pmd_install() helper Qi Zheng
2021-08-24 16:26   ` David Hildenbrand
2021-08-25 16:20     ` Qi Zheng
2021-08-25 16:32       ` David Hildenbrand
2021-08-26  3:04         ` Qi Zheng
2021-08-19  3:18 ` [PATCH v2 2/9] mm: remove redundant smp_wmb() Qi Zheng
2021-08-19  3:18 ` [PATCH v2 3/9] mm: rework the parameter of lock_page_or_retry() Qi Zheng
2021-08-19  3:18 ` [PATCH v2 4/9] mm: move pte_alloc{,_map,_map_lock}() to a separate file Qi Zheng
2021-08-19  3:18 ` [PATCH v2 5/9] mm: pte_refcount infrastructure Qi Zheng
2021-08-19  3:18 ` [PATCH v2 6/9] mm: free user PTE page table pages Qi Zheng
2021-08-19  7:01   ` David Hildenbrand [this message]
2021-08-19 10:18     ` [External] " Qi Zheng
2021-09-01 13:53   ` Jason Gunthorpe
2021-09-01 13:57     ` David Hildenbrand
2021-09-01 15:32       ` Jason Gunthorpe
2021-09-01 16:13         ` David Hildenbrand
2021-09-01 16:16           ` Jason Gunthorpe
2021-09-01 16:19             ` David Hildenbrand
2021-09-01 17:10               ` Jason Gunthorpe
2021-09-01 17:49                 ` David Hildenbrand
2021-09-01 17:55                   ` Jason Gunthorpe
2021-09-01 17:58                     ` David Hildenbrand
2021-09-01 18:09                       ` Jason Gunthorpe
2021-09-01 18:10                         ` David Hildenbrand
2021-09-02  7:04                     ` Qi Zheng
2021-09-02  6:53         ` Qi Zheng
2021-08-19  3:18 ` [PATCH v2 7/9] mm: add THP support for pte_ref Qi Zheng
2021-08-19  3:18 ` [PATCH v2 8/9] mm: free PTE page table by using rcu mechanism Qi Zheng
2021-08-19  3:18 ` [PATCH v2 9/9] mm: use mmu_gather to free PTE page table Qi Zheng
2021-09-01 12:32 ` [PATCH v2 0/9] Free user PTE page table pages David Hildenbrand
2021-09-01 16:07   ` Jason Gunthorpe
2021-09-01 16:10     ` David Hildenbrand
2021-09-02  3:37   ` Qi Zheng
2021-09-15 14:52   ` Qi Zheng
2021-09-15 14:59     ` Jason Gunthorpe
2021-09-16  5:32       ` Qi Zheng
2021-09-16  8:30         ` David Hildenbrand
2021-09-16  8:41           ` Qi Zheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5aa3020c-fcf2-87bd-31fe-e2b5c2aafcf2@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mika.penttila@nextfour.com \
    --cc=songmuchun@bytedance.com \
    --cc=tglx@linutronix.de \
    --cc=vdavydov.dev@gmail.com \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).