Re: [PATCH v1 0/9] mm/memory: optimize unmap/zap with PTE-mapped THP

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: Yin Fengwei <fengwei.yin@intel.com>, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Nick Piggin <npiggin@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-arch@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, "Huang, Ying" <ying.huang@intel.com>
Subject: Re: [PATCH v1 0/9] mm/memory: optimize unmap/zap with PTE-mapped THP
Date: Wed, 31 Jan 2024 11:43:07 +0100	[thread overview]
Message-ID: <3739b705-b7c7-4d1b-b97d-7e55b99f64f6@redhat.com> (raw)
In-Reply-To: <4ef64fd1-f605-4ddf-82e6-74b5e2c43892@intel.com>

On 31.01.24 03:20, Yin Fengwei wrote:
> On 1/29/24 22:32, David Hildenbrand wrote:
>> This series is based on [1] and must be applied on top of it.
>> Similar to what we did with fork(), let's implement PTE batching
>> during unmap/zap when processing PTE-mapped THPs.
>>
>> We collect consecutive PTEs that map consecutive pages of the same large
>> folio, making sure that the other PTE bits are compatible, and (a) adjust
>> the refcount only once per batch, (b) call rmap handling functions only
>> once per batch, (c) perform batch PTE setting/updates and (d) perform TLB
>> entry removal once per batch.
>>
>> Ryan was previously working on this in the context of cont-pte for
>> arm64, int latest iteration [2] with a focus on arm6 with cont-pte only.
>> This series implements the optimization for all architectures, independent
>> of such PTE bits, teaches MMU gather/TLB code to be fully aware of such
>> large-folio-pages batches as well, and amkes use of our new rmap batching
>> function when removing the rmap.
>>
>> To achieve that, we have to enlighten MMU gather / page freeing code
>> (i.e., everything that consumes encoded_page) to process unmapping
>> of consecutive pages that all belong to the same large folio. I'm being
>> very careful to not degrade order-0 performance, and it looks like I
>> managed to achieve that.
> 
> One possible scenario:
> If all the folio is 2M size folio, then one full batch could hold 510M memory.
> Is it too much regarding one full batch before just can hold (2M - 4096 * 2)
> memory?

Good point, we do have CONFIG_INIT_ON_FREE_DEFAULT_ON. I don't remember 
if init_on_free or init_on_alloc was used in production systems. In 
tlb_batch_pages_flush(), there is a cond_resched() to limit the number 
of entries we process.

So if that is actually problematic, we'd run into a soft-lockup and need 
another cond_resched() [I have some faint recollection that people are 
working on removing cond_resched() completely].

One could do some counting in free_pages_and_swap_cache() (where we 
iterate all entries already) and insert cond_resched+release_pages() for 
every (e.g., 512) pages.

-- 
Cheers,

David / dhildenb

     prev parent reply	other threads:[~2024-01-31 10:43 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-29 14:32 [PATCH v1 0/9] mm/memory: optimize unmap/zap with PTE-mapped THP David Hildenbrand
2024-01-29 14:32 ` [PATCH v1 1/9] mm/memory: factor out zapping of present pte into zap_present_pte() David Hildenbrand
2024-01-30  8:13   ` Ryan Roberts
2024-01-30  8:41     ` David Hildenbrand
2024-01-30  8:46       ` Ryan Roberts
2024-01-30  8:49         ` David Hildenbrand
2024-01-29 14:32 ` [PATCH v1 2/9] mm/memory: handle !page case in zap_present_pte() separately David Hildenbrand
2024-01-30  8:20   ` Ryan Roberts
2024-01-29 14:32 ` [PATCH v1 3/9] mm/memory: further separate anon and pagecache folio handling in zap_present_pte() David Hildenbrand
2024-01-30  8:31   ` Ryan Roberts
2024-01-30  8:37     ` David Hildenbrand
2024-01-30  8:45       ` Ryan Roberts
2024-01-30  8:47         ` David Hildenbrand
2024-01-29 14:32 ` [PATCH v1 4/9] mm/memory: factor out zapping folio pte into zap_present_folio_pte() David Hildenbrand
2024-01-30  8:47   ` Ryan Roberts
2024-01-29 14:32 ` [PATCH v1 5/9] mm/mmu_gather: pass "delay_rmap" instead of encoded page to __tlb_remove_page_size() David Hildenbrand
2024-01-30  8:41   ` Ryan Roberts
2024-01-29 14:32 ` [PATCH v1 6/9] mm/mmu_gather: define ENCODED_PAGE_FLAG_DELAY_RMAP David Hildenbrand
2024-01-30  9:03   ` Ryan Roberts
2024-01-29 14:32 ` [PATCH v1 7/9] mm/mmu_gather: add __tlb_remove_folio_pages() David Hildenbrand
2024-01-30  9:21   ` Ryan Roberts
2024-01-30  9:33     ` David Hildenbrand
2024-01-29 14:32 ` [PATCH v1 8/9] mm/mmu_gather: add tlb_remove_tlb_entries() David Hildenbrand
2024-01-30  9:33   ` Ryan Roberts
2024-01-29 14:32 ` [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP David Hildenbrand
2024-01-30  9:08   ` David Hildenbrand
2024-01-30  9:48   ` Ryan Roberts
2024-01-31 10:21     ` David Hildenbrand
2024-01-31 10:31       ` Ryan Roberts
2024-01-31 11:13         ` David Hildenbrand
2024-01-31  2:30   ` Yin Fengwei
2024-01-31 10:30     ` David Hildenbrand
2024-01-31 10:43       ` Yin, Fengwei
2024-01-31  2:20 ` [PATCH v1 0/9] " Yin Fengwei
2024-01-31 10:16   ` David Hildenbrand
2024-01-31 10:26     ` Ryan Roberts
2024-01-31 14:08       ` Michal Hocko
2024-01-31 14:20         ` David Hildenbrand
2024-01-31 14:03     ` Michal Hocko
2024-01-31 10:43   ` David Hildenbrand [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3739b705-b7c7-4d1b-b97d-7e55b99f64f6@redhat.com \
    --to=david@redhat.com \
    --cc=agordeev@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=arnd@arndb.de \
    --cc=borntraeger@linux.ibm.com \
    --cc=catalin.marinas@arm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=fengwei.yin@intel.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=peterz@infradead.org \
    --cc=ryan.roberts@arm.com \
    --cc=svens@linux.ibm.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).