From: Byungchul Park <byungchul@sk.com>
To: Nadav Amit <namit@vmware.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
"kernel_team@skhynix.com" <kernel_team@skhynix.com>,
Andrew Morton <akpm@linux-foundation.org>,
"ying.huang@intel.com" <ying.huang@intel.com>,
"xhao@linux.alibaba.com" <xhao@linux.alibaba.com>,
"mgorman@techsingularity.net" <mgorman@techsingularity.net>,
"hughd@google.com" <hughd@google.com>,
"willy@infradead.org" <willy@infradead.org>,
"david@redhat.com" <david@redhat.com>,
"peterz@infradead.org" <peterz@infradead.org>,
Andy Lutomirski <luto@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
"mingo@redhat.com" <mingo@redhat.com>,
"bp@alien8.de" <bp@alien8.de>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>
Subject: Re: [v3 2/3] mm: Defer TLB flush by keeping both src and dst folios at migration
Date: Wed, 15 Nov 2023 14:48:46 +0900 [thread overview]
Message-ID: <20231115054846.GA74107@system.software.com> (raw)
In-Reply-To: <0EC1ED50-2370-4EA6-9A02-D36E1913224E@vmware.com>
On Fri, Nov 10, 2023 at 10:18:06PM +0000, Nadav Amit wrote:
> > On Nov 10, 2023, at 5:13 AM, Byungchul Park <byungchul@sk.com> wrote:
> >
> >
> >> Here,
> >>
> >> mprotect()
> >> do_mprotect_pkey()
> >> tlb_finish_mmu()
> >> tlb_flush_mmu()
> >>
> >> I thought TLB flush for mprotect() is performed by tlb_flush_mmu() so
> >> any cached TLB entries on other CPUs can have chance to update. Could
> >> you correct me if I get it wrong? Thanks.
> >
> > I guess you tried to inform me that x86 mmu automatically keeps the
> > consistancy based on cached TLB entries. Right? If yes, I should do
> > something on that path. If not, it's not problematic. Thoughts?
>
> Perhaps I lost something in this whole scheme. Overall, I find it overly
> complicated and somewhat hard to follow.
>
> Ideally, a solution should reduce the number of TLB flushing mechanisms
> and not introduce yet another one that would further increase the already
> high complexity.
>
> Anyhow, I a bit convoluted 2 scenarios, and I’m not sure whether they
> are a potential problem.
>
> (1) Assume there is a RO page migration and you skipped a TLB flush.
> Then you set a RO PTE entry for that page. Afterwards, you have mprotect()
> that updates the PTE for that page to be RW.
>
> Now, tlb_finish_mmu() will do a TLB flush eventually in the mprotect()
> flow, but until it is done, you might have one CPU have RO pointing to
> the source page (no TLB flush, right?) and another having RW access
Is it possible for there to be another having RW access even before
mprotect() is done?
> that were loaded from the updated PTE. Did I miss a TLB flush that
> should take place beforehand?
>
>
> (2) Indeed we encountered many problems when TLB flushing decisions
> are based on PTEs that are read from the page-tables and those do
> not reflect the values that are held in the TLBs.
Do you have any example to help me understand what you mean?
> Overall, I think that a solution is to consolidate the TLB batching
> mechanisms and hold per a group of PTEs (e.g., 512 PTEs held in one
> page-table) the deferred TLB flush generation. Track the “done” TLB
> flush generation, if the deferred is greater than the “done”, perform
> a TLB flush.
>
> This scheme is a bit similar to what you were doing, but more general,
> and easier to follow. I think that its value might be more in simplifying
> the code and reasoning than just performance.
Worth noting that, at the beginning, I tried to reduce TLB flushes by:
1. deferring and performing TLB flushes needed in a batch,
2. skipping some TLB flushes that have already been done.
However, I gave up the 2nd optimization to keep only architecture
independent code in the patch set. So the current migrc version couldn't
skip TLB flushes but does only deferring and batching TLB flushes until
it's inevitable.
To cut a long story short, migrc let any other TLB flushes go performed
as is, *except* a special case e.i. RO mapped folios, during migration.
Byungchul
next prev parent reply other threads:[~2023-11-15 5:48 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-30 7:25 [v3 0/3] Reduce TLB flushes under some specific conditions Byungchul Park
2023-10-30 7:25 ` [v3 1/3] mm/rmap: Recognize non-writable TLB entries during TLB batch flush Byungchul Park
2023-10-30 7:52 ` Nadav Amit
2023-10-30 10:26 ` Byungchul Park
2023-10-30 7:25 ` [v3 2/3] mm: Defer TLB flush by keeping both src and dst folios at migration Byungchul Park
2023-10-30 8:00 ` David Hildenbrand
2023-10-30 9:58 ` Byungchul Park
2023-11-01 3:06 ` Huang, Ying
2023-10-30 8:50 ` Nadav Amit
2023-10-30 12:51 ` Byungchul Park
2023-10-30 15:58 ` Nadav Amit
2023-10-30 22:40 ` Byungchul Park
2023-11-08 4:12 ` Byungchul Park
2023-11-09 10:16 ` Nadav Amit
2023-11-10 1:02 ` Byungchul Park
2023-11-10 3:13 ` Byungchul Park
2023-11-10 22:18 ` Nadav Amit
2023-11-15 5:48 ` Byungchul Park [this message]
2023-11-09 5:35 ` Byungchul Park
2023-10-30 7:25 ` [v3 3/3] mm, migrc: Add a sysctl knob to enable/disable MIGRC mechanism Byungchul Park
2023-10-30 8:51 ` Nadav Amit
2023-10-30 10:36 ` Byungchul Park
2023-10-30 17:55 ` [v3 0/3] Reduce TLB flushes under some specific conditions Dave Hansen
2023-10-30 18:32 ` Nadav Amit
2023-10-30 22:55 ` Byungchul Park
2023-10-31 8:46 ` David Hildenbrand
2023-10-31 2:37 ` Byungchul Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231115054846.GA74107@system.software.com \
--to=byungchul@sk.com \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=hughd@google.com \
--cc=kernel_team@skhynix.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=mgorman@techsingularity.net \
--cc=mingo@redhat.com \
--cc=namit@vmware.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=willy@infradead.org \
--cc=xhao@linux.alibaba.com \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).