From: Byungchul Park <byungchul@sk.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kernel_team@skhynix.com, akpm@linux-foundation.org,
vernhao@tencent.com, mgorman@techsingularity.net,
hughd@google.com, willy@infradead.org, david@redhat.com,
peterz@infradead.org, luto@kernel.org, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
rjgolo@gmail.com
Subject: Re: [PATCH v10 00/12] LUF(Lazy Unmap Flush) reducing tlb numbers over 90%
Date: Fri, 31 May 2024 11:20:57 +0900
Message-ID: <20240531022057.GA80728@system.software.com>
In-Reply-To: <87wmnazrcy.fsf@yhuang6-desk2.ccr.corp.intel.com>
On Fri, May 31, 2024 at 09:45:33AM +0800, Huang, Ying wrote:
> Byungchul Park <byungchul@sk.com> writes:
>
> > On Thu, May 30, 2024 at 04:24:12PM +0800, Huang, Ying wrote:
> >> Byungchul Park <byungchul@sk.com> writes:
> >>
> >> > On Thu, May 30, 2024 at 09:11:45AM +0800, Huang, Ying wrote:
> >> >> Byungchul Park <byungchul@sk.com> writes:
> >> >>
> >> >> > On Wed, May 29, 2024 at 09:41:22AM -0700, Dave Hansen wrote:
> >> >> >> On 5/28/24 22:00, Byungchul Park wrote:
> >> >> >> > All the code updating PTEs already performs the TLB flush needed, in a
> >> >> >> > safe way, when it's unavoidable, e.g. munmap. LUF, which controls when to
> >> >> >> > flush at a higher level than the arch code, only leaves stale read-only
> >> >> >> > TLB entries that are currently supposed to be in use. Could you give a
> >> >> >> > scenario that you are concerned about?
> >> >> >>
> >> >> >> Let's go back to this scenario:
> >> >> >>
> >> >> >> fd = open("/some/file", O_RDONLY);
> >> >> >> ptr1 = mmap(-1, size, PROT_READ, ..., fd, ...);
> >> >> >> foo1 = *ptr1;
> >> >> >>
> >> >> >> There's a read-only PTE at 'ptr1'. Right? The page being pointed to is
> >> >> >> eligible for LUF via the try_to_unmap() paths. In other words, the page
> >> >> >> might be reclaimed at any time. If it is reclaimed, the PTE will be
> >> >> >> cleared.
> >> >> >>
> >> >> >> Then, the user might do:
> >> >> >>
> >> >> >> munmap(ptr1, PAGE_SIZE);
> >> >> >>
> >> >> >> Which will _eventually_ wind up in the zap_pte_range() loop. But that
> >> >> >> loop will only see pte_none(). It doesn't do _anything_ to the 'struct
> >> >> >> mmu_gather'.
> >> >> >>
> >> >> >> The munmap() then lands in tlb_flush_mmu_tlbonly() where it looks at the
> >> >> >> 'struct mmu_gather':
> >> >> >>
> >> >> >>         if (!(tlb->freed_tables || tlb->cleared_ptes ||
> >> >> >>               tlb->cleared_pmds || tlb->cleared_puds ||
> >> >> >>               tlb->cleared_p4ds))
> >> >> >>                 return;
> >> >> >>
> >> >> >> But since there were no cleared PTEs (or anything else) during the
> >> >> >> unmap, this just returns and doesn't flush the TLB.
> >> >> >>
> >> >> >> We now have an address space with a stale TLB entry at 'ptr1' and not
> >> >> >> even a VMA there. There's nothing to stop a new VMA from going in,
> >> >> >> installing a *new* PTE, but getting data from the stale TLB entry that
> >> >> >> still hasn't been flushed.
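
For reference, here is the sequence above written out as a minimal user-space
sketch (illustrative only, no error handling; whether the stale entry is
actually consumed depends on the kernel reclaiming the page in between):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = open("/some/file", O_RDONLY);
            size_t size = 4096;

            /* read-only mapping; the first read installs a TLB entry */
            char *ptr1 = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
            char foo1 = *ptr1;

            /* ... kernel reclaims the page, clears the PTE, LUF defers the flush ... */

            /* zap_pte_range() only sees pte_none(), so nothing is flushed */
            munmap(ptr1, size);

            /* a new mapping at the same address might be read through the stale entry */
            char *ptr2 = mmap(ptr1, size, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
            char foo2 = *ptr2;

            close(fd);
            (void)foo1;
            (void)foo2;
            return 0;
    }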
> >> >> >
> >> >> > Thank you for the explanation. I got it. I think I could handle this
> >> >> > case through a new flag in the vma, or something similar, indicating that
> >> >> > LUF has deferred the necessary TLB flush for it during unmapping, so that
> >> >> > the mmu_gather mechanism can be aware of it. Of course, the performance
> >> >> > change should be checked again. Thoughts?
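
Just to make that idea concrete, roughly what I have in mind is something
like the following (purely a sketch; luf_flush_pending() and the exact hook
point are hypothetical and untested):

    /*
     * Hypothetical sketch only: teach the mmu_gather flush decision that LUF
     * may have left a flush pending for this mm, so an otherwise "empty"
     * unmap still flushes.  luf_flush_pending() is a made-up helper here.
     */
    static void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
    {
            if (!(tlb->freed_tables || tlb->cleared_ptes ||
                  tlb->cleared_pmds || tlb->cleared_puds ||
                  tlb->cleared_p4ds ||
                  luf_flush_pending(tlb->mm)))   /* hypothetical LUF check */
                    return;

            tlb_flush(tlb);
            __tlb_reset_range(tlb);
    }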
> >> >>
> >> >> I suggest you start with the simple case, that is, only support page
> >> >> reclaiming and migration. A TLB flush can be enforced during unmap with
> >> >> something similar to flush_tlb_batched_pending().
> >> >
> >> > While reading flush_tlb_batched_pending(mm), I found it already performs a
> >> > TLB flush for the target mm if set_tlb_ubc_flush_pending(mm) has been hit
> >> > at least once since the last flush_tlb_batched_pending(mm).
> >> >
> >> > Since LUF also relies on set_tlb_ubc_flush_pending(mm), the required TLB
> >> > flush will be performed in flush_tlb_batched_pending(mm) during munmap().
> >> > So it already looks safe to me with regard to munmap().
> >> >
> >> > Is there something that I'm missing?
> >> >
> >> > JFYI, regarding mmap(), I have reworked the fault handler to give up LUF
> >> > when needed, in a better way.
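
For reference, the behavior I'm relying on is roughly the following (a
simplified sketch, not the exact upstream code; the real bookkeeping in
mm/rmap.c uses an atomic pending/flushed pair and the details differ across
kernel versions):

    /*
     * Simplified sketch of the batched-flush bookkeeping: try_to_unmap_one()
     * records a deferred flush via set_tlb_ubc_flush_pending(), and paths
     * that are about to rely on the page tables, e.g. zap_pte_range() during
     * munmap(), call flush_tlb_batched_pending() to carry it out first.
     */
    void flush_tlb_batched_pending_sketch(struct mm_struct *mm)
    {
            if (READ_ONCE(mm->tlb_flush_batched)) {
                    /* perform the deferred flush before trusting the page tables */
                    flush_tlb_mm(mm);

                    /* clear the mark only after the flush has completed */
                    barrier();
                    WRITE_ONCE(mm->tlb_flush_batched, false);
            }
    }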
> >>
> >> If a TLB flush is always enforced during munmap(), then your solution can
> >> only avoid TLB flushing for page reclaiming and migration, not for unmap.
> >
> > I'm not sure if I understand what you meant. Could you explain it in
> > more detail?
> >
> > LUF works only for the *unmapping* that happens during page reclaiming and
> > migration. Unmappings other than those from page reclaiming and migration
> > are not what LUF targets. That's why I thought flush_tlb_batched_pending()
> > could handle the pending TLB flushes in this case.
> >
> > It'd be appreciated if you explain what you meant more.
> >
>
> In the following email, you claimed that LUF can avoid TLB flushing for
> munmap()/mmap().
My bad. Sorry for the confusing expression.

"Give up LUF at mmap()" doesn't mean giving up applying LUF to mmap()
itself. It means giving up the flush that LUF has left pending, in other
words, giving up LUF's benefit, because we are going through mmap() /
munmap().

I will be more careful when expressing these things.
> https://lore.kernel.org/linux-mm/20240527015732.GA61604@system.software.com/
>
> Now, you said it can only avoid TLB flushing for page reclaiming and
> migration.
This is true.
Byungchul
> So, to avoid confusion, I suggest you send out a new series and make it
> explicit that it can only optimize page reclaiming and migration, not
> munmap(). It would also be good to add some words about how it interacts
> with other TLB flushing mechanisms.
>
> --
> Best Regards,
> Huang, Ying