[RFC / PoC v1 0/1] powerpc: Add support for batched unmap TLB flush

From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
To: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	"Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Subject: [RFC / PoC v1 0/1] powerpc: Add support for batched unmap TLB flush
Date: Sun, 22 Sep 2024 18:16:23 +0530
Message-ID: <cover.1727001426.git.ritesh.list@gmail.com>

Hello All,

This is a quick PoC to add ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH support to
powerpc for book3s64. The ISA, in section 6.10 "Translation Table Update
Synchronization Requirements", says that the architecture allows translation
cache invalidation to be optimized by doing it in bulk, after the PTE changes
have been made.
That means that if we add ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH support, the
reclaim and page migration paths can defer TLB invalidations and do them in
bulk after all the page unmap operations have completed.
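
To illustrate the idea only, here is a toy user-space model of immediate vs.
deferred invalidation. It is not kernel code and not what the patch does
internally: every name in it (toy_tlb_stats, toy_tlb_flush,
toy_unmap_immediate, toy_unmap_batched) is hypothetical, and it assumes the
simplest case where a whole batch can be covered by a single flush.

```c
#include <stddef.h>

/* Toy model, NOT kernel code. All names below are hypothetical. */
struct toy_tlb_stats {
        unsigned long pte_clears;   /* pages unmapped (PTEs cleared) */
        unsigned long flushes;      /* TLB invalidations issued */
};

/* Stand-in for an expensive TLB invalidation; it only counts calls. */
void toy_tlb_flush(struct toy_tlb_stats *s)
{
        s->flushes++;
}

/* Unbatched: clear each PTE and invalidate right away. */
struct toy_tlb_stats toy_unmap_immediate(size_t npages)
{
        struct toy_tlb_stats s = { 0, 0 };

        for (size_t i = 0; i < npages; i++) {
                s.pte_clears++;
                toy_tlb_flush(&s);
        }
        return s;
}

/*
 * Batched: clear all PTEs first and only record that a flush is pending,
 * then invalidate once after the last unmap. Assumes one flush can cover
 * the whole batch, which is a simplification.
 */
struct toy_tlb_stats toy_unmap_batched(size_t npages)
{
        struct toy_tlb_stats s = { 0, 0 };
        int flush_pending = 0;

        for (size_t i = 0; i < npages; i++) {
                s.pte_clears++;
                flush_pending = 1;
        }
        if (flush_pending)
                toy_tlb_flush(&s);
        return s;
}
```

In this model, unmapping 160 pages costs 160 invalidations in the immediate
variant and a single one in the batched variant, while the number of PTEs
cleared is the same in both.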

This is a quick PoC for the same. Note that this may not be a complete patch yet:
TLB handling on Power is already complex on the hardware side :) and there are
many optimizations done in software on top (e.g. exit_lazy_flush_tlb to avoid
tlbies). But since the current patch looked somewhat sane to me, I wanted to
share it to get early feedback from people who are well versed with this side of
the code.

Meanwhile, I have many TODOs to look into, which I am working on in parallel.
Later I will also get some benchmarks w.r.t. promotion / demotion.

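I ran a micro-benchmark that was shared in the commits adding this support on
other archs, and I see good initial improvements: real time drops from ~23.5s
to ~8.6s (roughly 2.7x), and radix__flush_tlb_page_psize no longer shows up
among the top symbols in the perf report excerpt.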

without patch (perf report showing 7% in radix__flush_tlb_page_psize, even with
single thread)
==================
root# time ./a.out
real    0m23.538s
user    0m0.191s
sys     0m5.270s

# Overhead  Command  Shared Object               Symbol
# ........  .......  ..........................  .............................................
#
     7.19%  a.out    [kernel.vmlinux]            [k] radix__flush_tlb_page_psize
     5.63%  a.out    [kernel.vmlinux]            [k] _raw_spin_lock
     3.21%  a.out    a.out                       [.] main
     2.93%  a.out    [kernel.vmlinux]            [k] page_counter_cancel
     2.58%  a.out    [kernel.vmlinux]            [k] page_counter_try_charge
     2.56%  a.out    [kernel.vmlinux]            [k] _raw_spin_lock_irq
     2.30%  a.out    [kernel.vmlinux]            [k] try_to_unmap_one

with patch
============
root# time ./a.out
real    0m8.593s
user    0m0.064s
sys     0m1.610s

# Overhead  Command  Shared Object               Symbol
# ........  .......  ..........................  .............................................
#
     5.10%  a.out    [kernel.vmlinux]            [k] _raw_spin_lock
     3.55%  a.out    [kernel.vmlinux]            [k] __mod_memcg_lruvec_state
     3.13%  a.out    a.out                       [.] main
     3.00%  a.out    [kernel.vmlinux]            [k] page_counter_try_charge
     2.62%  a.out    [kernel.vmlinux]            [k] _raw_spin_lock_irq
     2.58%  a.out    [kernel.vmlinux]            [k] page_counter_cancel
     2.22%  a.out    [kernel.vmlinux]            [k] try_to_unmap_one


<micro-benchmark>
====================
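#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PAGESIZE 65536
#define SIZE (1 * 1024 * 1024 * 10)

int main(void)
{
        volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /* populate every page so there is something to page out */
        memset((void *)p, 0x88, SIZE);

        for (int k = 0; k < 10000; k++) {
                /* swap in: touch one byte per page */
                for (int i = 0; i < SIZE; i += PAGESIZE) {
                        (void)p[i];
                }

                /* swap out: ask the kernel to reclaim the whole range */
                madvise((void *)p, SIZE, MADV_PAGEOUT);
        }

        return 0;
}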



Ritesh Harjani (IBM) (1):
  powerpc: Add support for batched unmap TLB flush

 arch/powerpc/Kconfig                          |  1 +
 arch/powerpc/include/asm/book3s/64/tlbflush.h |  5 +++
 arch/powerpc/include/asm/tlbbatch.h           | 14 ++++++++
 arch/powerpc/mm/book3s64/radix_tlb.c          | 32 +++++++++++++++++++
 4 files changed, 52 insertions(+)
 create mode 100644 arch/powerpc/include/asm/tlbbatch.h

--
2.46.0
