From: Byungchul Park <byungchul@sk.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kernel_team@skhynix.com, akpm@linux-foundation.org,
ying.huang@intel.com, vernhao@tencent.com,
mgorman@techsingularity.net, hughd@google.com,
willy@infradead.org, peterz@infradead.org, luto@kernel.org,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: Re: [PATCH v10 00/12] LUF(Lazy Unmap Flush) reducing tlb numbers over 90%
Date: Wed, 29 May 2024 13:39:39 +0900
Message-ID: <20240529043938.GA20307@system.software.com>
In-Reply-To: <07686f06-f1a8-4282-bb48-fc4a5b554552@redhat.com>

On Tue, May 28, 2024 at 10:41:54AM +0200, David Hildenbrand wrote:
> On 10.05.24 at 08:51, Byungchul Park wrote:
> > Hi everyone,
> >
> > While I'm working with a tiered memory system e.g. CXL memory, I have
> > been facing migration overhead esp. tlb shootdown on promotion or
> > demotion between different tiers. Yeah.. most tlb shootdowns on
> > migration through hinting fault can be avoided thanks to Huang Ying's
> > work, commit 4d4b6d66db ("mm,unmap: avoid flushing tlb in batch if PTE
> > is inaccessible"). See the following link for more information:
> >
> > https://lore.kernel.org/lkml/20231115025755.GA29979@system.software.com/
> >
> > However, it's only for migration through hinting fault. I thought it'd
> > be much better if we had a general mechanism to reduce the number of
> > tlb flushes, one that we can apply to any unmap code where we normally
> > believe a tlb flush must follow.
> >
> > I'm suggesting a new mechanism, LUF(Lazy Unmap Flush), which defers the
> > tlb flush until folios that have been unmapped and freed eventually get
> > allocated again. It's safe for folios that had been mapped read-only and
> > were unmapped, since the contents of the folios don't change while they
> > stay in pcp or buddy, so we can still read the data through the stale
> > tlb entries.
> >
> > The tlb flush can be deferred when folios get unmapped, as long as the
> > needed tlb flush is guaranteed to be performed before the folios
> > actually become used and, of course, only if none of the corresponding
> > ptes have write permission. Otherwise, the system will get messed up.
> >
> > To achieve that:
> >
> > 1. For the folios that map only to non-writable tlb entries, prevent
> > tlb flush during unmapping but perform it just before the folios
> > actually become used, out of buddy or pcp.
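
To make the condition in item 1 concrete, here is a toy userspace sketch in
Python (not kernel code). The name split_unmap_batch() and the
(name, writable-flags) tuples are made up for illustration; the actual
patches work on real PTEs in the kernel's unmap path.

```python
from typing import Iterable, List, Tuple


def split_unmap_batch(
    folios: Iterable[Tuple[str, List[bool]]],
) -> Tuple[List[str], List[str]]:
    """Split folios into (flush deferred, flush immediately).

    Each folio is (name, [pte_writable, ...]). A folio qualifies for a
    deferred flush only if none of its PTEs had write permission.
    """
    deferred, immediate = [], []
    for name, pte_writable in folios:
        if any(pte_writable):
            immediate.append(name)  # a writable mapping existed: flush now
        else:
            deferred.append(name)   # read-only everywhere: flush can wait
    return deferred, immediate
```

A single writable PTE is enough to push a folio onto the immediate-flush
side, which is the conservative choice the cover letter asks for.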
>
> Trying to understand the impact: Effectively, a CPU could still read data
> from a page that has already been freed, until that page gets reallocated
> again.
>
> The important part I can see is
>
> 1) PCP/buddy must not change page content (e.g., poison, init_on_free),
> otherwise an app might read wrong content.

Exactly. I will take them into account. Thank you.

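As a rough illustration of point 1), the following toy Python model (not
kernel code) shows why free-time zeroing breaks the assumption that a stale
read-only TLB entry still sees the old data. The helpers free_page() and
stale_read_is_consistent() are hypothetical names; the init_on_free flag
simply stands in for any free-time modification such as poisoning.

```python
def free_page(content: bytearray, init_on_free: bool) -> bytearray:
    """Model freeing a page; optionally zero it, as init_on_free would."""
    if init_on_free:
        content[:] = bytes(len(content))  # overwrite in place with zeros
    return content


def stale_read_is_consistent(original: bytes, init_on_free: bool) -> bool:
    """Would a read through a stale TLB entry still return the old data?"""
    page = bytearray(original)
    free_page(page, init_on_free)
    return bytes(page) == original
```

With init_on_free set to False the stale read matches the original content;
with it set to True the reader sees zeros, so the flush must not be deferred
in that configuration unless the free-time write is handled some other way.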
> 2) If we mess up the flush-before-realloc, an app might observe data written
> by whoever allocated the page.

Yes. However, the appropriate TLB flush is performed in prep_new_page().
Basically, you are right. I need to pay close attention to it.

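To show the flush-before-realloc idea in isolation, here is a minimal toy
model in Python (not kernel code). The names ugen and prep_new_page() are
borrowed from this series; everything else (ToyLuf, flushed_gen, the dict
standing in for the TLB) is an assumption made for the sketch and may not
match how the patches track generations.

```python
from dataclasses import dataclass


@dataclass
class Page:
    data: bytes = b""
    ugen: int = 0  # unmap generation stamped at free time (0 = nothing pending)


class ToyLuf:
    def __init__(self) -> None:
        self.gen = 0          # latest unmap generation handed out
        self.flushed_gen = 0  # unmaps up to this generation are already flushed
        self.stale_tlb = {}   # toy TLB: vaddr -> page still reachable
        self.flush_count = 0

    def unmap_readonly_and_free(self, vaddr: int, page: Page) -> None:
        """Unmap a read-only mapping without flushing; stamp the page."""
        self.gen += 1
        page.ugen = self.gen
        self.stale_tlb[vaddr] = page  # entry survives: the flush was deferred

    def flush(self) -> None:
        """Drop every stale entry and record how far the flush reaches."""
        self.stale_tlb.clear()
        self.flushed_gen = self.gen
        self.flush_count += 1

    def prep_new_page(self, page: Page) -> Page:
        """Allocation-time check: flush only if this page is not covered yet."""
        if page.ugen > self.flushed_gen:
            self.flush()
        page.ugen = 0
        return page
```

In this model, unmapping three pages and then allocating all three costs a
single flush: the first allocation flushes and advances flushed_gen past the
other two pages' generations, so their allocations skip the flush.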
> 3) We must reliably detect+handle any read-only PTEs for which we didn't
> flush the TLB yet, otherwise an app could see its memory writes getting
> lost. I recall that at least uffd-wp might defer TLB flushes (see comment in
> do_wp_page()). Not sure about other pte_wrprotect() callers that flush the
> TLB after processing multiple page tables, whereby rmap code might succeed
> in unmapping a page before the TLB flush happened.
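
One way to picture the hazard in point 3) is the toy Python predicate below
(not kernel code). The wrprotect_flush_pending field is purely hypothetical
and does not correspond to any existing kernel field; it only models "this
PTE was write-protected, but the TLB flush for that change has not happened
yet", so a CPU may still hold a writable TLB entry.

```python
from dataclasses import dataclass


@dataclass
class Pte:
    writable: bool
    # Hypothetical: the PTE was write-protected but the matching TLB flush
    # is still outstanding, so a stale writable TLB entry may exist.
    wrprotect_flush_pending: bool = False


def luf_may_defer(pte: Pte) -> bool:
    """Defer the unmap flush only if no CPU can still write via the TLB."""
    return not pte.writable and not pte.wrprotect_flush_pending
```

A PTE that merely looks read-only is not enough here: if its write-protect
flush is still outstanding, the sketch falls back to an immediate flush.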
>
> Any other possible issues you stumbled over that are worth mentioning?

You have covered everything I am concerned about, and put it more clearly
than I did.

	Byungchul

>
> --
> Thanks,
>
> David / dhildenb