Re: [v4 0/3] Reduce TLB flushes under some specific conditions


From: Byungchul Park <byungchul@sk.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel_team@skhynix.com, akpm@linux-foundation.org,
	namit@vmware.com, xhao@linux.alibaba.com,
	mgorman@techsingularity.net, hughd@google.com,
	willy@infradead.org, david@redhat.com, peterz@infradead.org,
	luto@kernel.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com
Subject: Re: [v4 0/3] Reduce TLB flushes under some specific conditions
Date: Fri, 10 Nov 2023 10:32:24 +0900
Message-ID: <20231110013224.GD72073@system.software.com>
In-Reply-To: <87il6bijtu.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, Nov 09, 2023 at 01:20:29PM +0800, Huang, Ying wrote:
> Byungchul Park <byungchul@sk.com> writes:
> 
> > Hi everyone,
> >
> > While working with CXL memory, I have been facing migration overhead,
> > especially TLB shootdowns on promotion or demotion between different
> > tiers. Most TLB shootdowns on migration through hinting faults can be
> > avoided thanks to Huang Ying's work, commit 4d4b6d66db ("mm,unmap: avoid
> > flushing TLB in batch if PTE is inaccessible").
> >
> > However, that only helps migrations that go through hinting faults. I
> > thought it would be much better to have a general mechanism to reduce
> > the number of TLB flushes and TLB misses that we can apply to any type
> > of migration. For now, I have tried it only for tiering migration.
> >
> > I'm suggesting a mechanism that reduces TLB flushes by keeping both the
> > source and destination folios of a migration until all required TLB
> > flushes are done, only if those folios are not mapped with
> > write-permission PTEs at all. This work is based on v6.6-rc5.
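
To make the idea above easier to follow, here is a toy user-space model of
the deferral. It is only a sketch under my own assumptions and not the kernel
code from the series: the class name, method names, and the one-flush-per-batch
accounting are all hypothetical. The reasoning it illustrates, as I read the
cover letter, is that a stale TLB entry for a folio with no writable mapping
still reads valid data as long as the source folio is kept intact, so the
flush can wait and be shared by several migrations.

```python
class DeferredFlushModel:
    """Toy model of deferring TLB flushes for read-only folios (hypothetical)."""

    def __init__(self):
        self.pending_src = []  # source folios kept alive until the next flush
        self.flushes = 0       # number of (modelled) TLB flushes issued

    def migrate(self, folio, has_writable_mapping):
        # A folio with a writable mapping cannot be deferred: a stale
        # writable entry could modify the old copy after migration.
        if has_writable_mapping:
            self.flushes += 1
            return "flushed"
        # Otherwise keep the source folio and postpone the flush.
        self.pending_src.append(folio)
        return "deferred"

    def flush_pending(self):
        # One flush covers every deferred migration; only after it may the
        # kept source folios be freed.
        if not self.pending_src:
            return []
        self.flushes += 1
        freed, self.pending_src = self.pending_src, []
        return freed
```

In this model, two read-only migrations followed by one flush_pending() call
cost a single flush instead of two, which is the kind of saving the series
is after.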
> >
> > Can you believe it? I saw the number of full TLB flushes reduced by
> > about 80% and iTLB misses reduced by about 50%, and the run time
> > consistently improved by at least 1% with the workload I tested,
> > XSBench. However, I believe it would help more with other workloads,
> > including real ones. Please let me know if I'm missing something.
> 
> Can you help to test the effect of commit 7e12beb8ca2a ("migrate_pages:
> batch flushing TLB") for your test case?  To test it, you can revert it
> and compare the performance before and after the revert.

I will.

> And, how do you trigger migration when testing XSBench?  Use a tiered
> memory system, and migrate pages between DRAM and CXL memory back and
> forth?  If so, how many pages will you migrate for each migration?

Honestly, I've been focusing on the number of migrations and the number of
TLB flushes. I will get back to you.
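
One way to collect those numbers is to diff two /proc/vmstat snapshots taken
around a run. The following is a minimal sketch, not what was used for the
results in the cover letter. The helper names are made up for illustration,
and the counter list is an assumption about what the running kernel exports:
the nr_tlb_* counters only appear with CONFIG_DEBUG_TLBFLUSH.

```python
from pathlib import Path

# Assumed counter names; any counter missing from a snapshot is skipped.
COUNTERS = (
    "pgmigrate_success",
    "pgmigrate_fail",
    "nr_tlb_remote_flush",
    "nr_tlb_local_flush_all",
)


def parse_vmstat(text):
    """Turn 'name value' lines into a dict of ints, skipping malformed lines."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the current counters from the given vmstat file."""
    return parse_vmstat(Path(path).read_text())


def delta(before, after, names=COUNTERS):
    """Per-counter increase between two snapshots."""
    return {n: after[n] - before[n] for n in names if n in before and n in after}
```

Calling read_vmstat() before and after the benchmark and passing both results
to delta() gives the migrated-page and flush counts for that run.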

	Byungchul

> --
> Best Regards,
> Huang, Ying
> 
> >
> > 	Byungchul
> >
> > ---
> >
> > Changes from v3:
> >
> > 	1. Don't use the kconfig, CONFIG_MIGRC, and remove sysctl knob,
> > 	   migrc_enable. (feedbacked by Nadav)
> > 	2. Remove the optimization skipping CPUs that have already
> > 	   performed TLB flushes needed by any reason when performing
> > 	   TLB flushes by migrc because I can't tell the performance
> > 	   difference between w/ the optimization and w/o that.
> > 	   (feedbacked by Nadav)
> > 	3. Minimize arch-specific code. While at it, move all the migrc
> > 	   declarations and inline functions from include/linux/mm.h to
> > 	   mm/internal.h. (feedbacked by Dave Hansen, Nadav)
> > 	4. Separate a part making migrc paused when the system is in
> > 	   high memory pressure to another patch. (feedbacked by Nadav)
> > 	5. Rename:
> > 	      a. arch_tlbbatch_clean() to arch_tlbbatch_clear(),
> > 	      b. tlb_ubc_nowr to tlb_ubc_ro,
> > 	      c. migrc_try_flush_free_folios() to migrc_flush_free_folios(),
> > 	      d. migrc_stop to migrc_pause.
> > 	   (feedbacked by Nadav)
> > 	6. Use ->lru list_head instead of introducing a new llist_head.
> > 	   (feedbacked by Nadav)
> > 	7. Use non-atomic operations of page-flag when it's safe.
> > 	   (feedbacked by Nadav)
> > 	8. Use stack instead of keeping a pointer of 'struct migrc_req'
> > 	   in struct task, which is for manipulating it locally.
> > 	   (feedbacked by Nadav)
> > 	9. Replace a lot of simple functions with inline functions placed
> > 	   in a header, mm/internal.h. (feedbacked by Nadav)
> > 	10. Add additional sufficient comments. (feedbacked by Nadav)
> > 	11. Remove a lot of wrapper functions. (feedbacked by Nadav)
> >
> > Changes from RFC v2:
> >
> > 	1. Remove additional occupation in struct page. To do that,
> > 	   unioned with lru field for migrc's list and added a page
> > 	   flag. I know page flag is a thing that we don't like to add
> > 	   but no choice because migrc should distinguish folios under
> > 	   migrc's control from others. Instead, I force migrc to be
> > 	   used only on 64 bit system to mitigate you guys from getting
> > 	   angry.
> > 	2. Remove meaningless internal object allocator that I
> > 	   introduced to minimize impact onto the system. However, a ton
> > 	   of tests showed there was no difference.
> > 	3. Stop migrc from working when the system is in high memory
> > 	   pressure like about to perform direct reclaim. At the
> > 	   condition where the swap mechanism is heavily used, I found
> > 	   the system suffered from regression without this control.
> > 	4. Exclude folios that pte_dirty() == true from migrc's interest
> > 	   so that migrc can work simpler.
> > 	5. Combine several patches that work tightly coupled to one.
> > 	6. Add sufficient comments for better review.
> > 	7. Manage migrc's request in per-node manner (from globally).
> > 	8. Add TLB miss improvement in commit message.
> > 	9. Test with more CPUs(4 -> 16) to see bigger improvement.
> >
> > Changes from RFC:
> >
> > 	1. Fix a bug triggered when a destination folio at the previous
> > 	   migration becomes a source folio at the next migration,
> > 	   before the folio gets handled properly so that the folio can
> > 	   play with another migration. There was inconsistency in the
> > 	   folio's state. Fixed it.
> > 	2. Split the patch set into more pieces so that the folks can
> > 	   review better. (Feedbacked by Nadav Amit)
> > 	3. Fix a wrong usage of barrier e.g. smp_mb__after_atomic().
> > 	   (Feedbacked by Nadav Amit)
> > 	4. Tried to add sufficient comments to explain the patch set
> > 	   better. (Feedbacked by Nadav Amit)
> >
> > Byungchul Park (3):
> >   mm/rmap: Recognize read-only TLB entries during batched TLB flush
> >   mm: Defer TLB flush by keeping both src and dst folios at migration
> >   mm: Pause migrc mechanism at high memory pressure
> >
> >  arch/x86/include/asm/tlbflush.h |   3 +
> >  arch/x86/mm/tlb.c               |  11 ++
> >  include/linux/mm_types.h        |  21 +++
> >  include/linux/mmzone.h          |   9 ++
> >  include/linux/page-flags.h      |   4 +
> >  include/linux/sched.h           |   7 +
> >  include/trace/events/mmflags.h  |   3 +-
> >  mm/internal.h                   |  78 ++++++++++
> >  mm/memory.c                     |  11 ++
> >  mm/migrate.c                    | 266 ++++++++++++++++++++++++++++++++
> >  mm/page_alloc.c                 |  30 +++-
> >  mm/rmap.c                       |  35 ++++-
> >  12 files changed, 475 insertions(+), 3 deletions(-)
