public inbox for linux-riscv@lists.infradead.org
From: Xu Lu <luxu.kernel@bytedance.com>
To: paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, ardb@kernel.org, anup@brainfault.org,
	atishp@atishpatra.org
Cc: dengliang.1214@bytedance.com, xieyongji@bytedance.com,
	lihangjing@bytedance.com, songmuchun@bytedance.com,
	punit.agrawal@bytedance.com, linux-kernel@vger.kernel.org,
	linux-riscv@lists.infradead.org,
	Xu Lu <luxu.kernel@bytedance.com>
Subject: [RFC PATCH V1 00/11] riscv: Introduce 64K base page
Date: Thu, 23 Nov 2023 14:56:57 +0800
Message-ID: <20231123065708.91345-1-luxu.kernel@bytedance.com>

Some existing architectures, such as ARM, support base pages larger than
4K because their MMUs support more page sizes. Thus, besides hugetlb
pages and transparent huge pages, these architectures have another way
to enjoy the benefits of fewer TLB misses without worrying about the
cost of splitting and merging huge pages. However, on architectures
whose MMU supports only 4K pages, a larger base page is currently
unavailable.

This patch series attempts to break through this MMU limitation and
support a larger base page on RISC-V, which currently supports only a
4K page size.

The key idea for implementing a larger base page on top of a 4K MMU is
to decouple the MMU page from the base page as seen by the kernel mm,
which we call the software page. In contrast, we call the MMU page the
hardware page. The two kinds of pages differ as follows.

1. The kernel memory management module manages, allocates and maps
memory at the granularity of the software page, which should not be
restricted by the MMU and can be larger than the hardware page.

2. Architecture page table operations should be carried out from the
MMU's perspective, and page table entries are encoded at the granularity
of the hardware page, which is currently 4K on the RISC-V MMU.
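
The split between the two granularities can be sketched as plain shift
arithmetic. This is only an illustration: the macro and function names
below are hypothetical and are not taken from the patches, and a 64K
software page over a 4K hardware page is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the patches. */
#define HW_PAGE_SHIFT 12 /* 4K hardware (MMU) page */
#define SW_PAGE_SHIFT 16 /* 64K software base page */
#define HW_PAGE_SIZE (1UL << HW_PAGE_SHIFT)
#define SW_PAGE_SIZE (1UL << SW_PAGE_SHIFT)
#define HW_PAGES_PER_SW_PAGE (1UL << (SW_PAGE_SHIFT - HW_PAGE_SHIFT))

static_assert(HW_PAGES_PER_SW_PAGE == 16, "64K / 4K must be 16");

/* Frame number as the kernel mm sees it (64K units). */
static inline uint64_t addr_to_sw_pfn(uint64_t paddr)
{
	return paddr >> SW_PAGE_SHIFT;
}

/* Frame number as the MMU sees it (4K units). */
static inline uint64_t addr_to_hw_pfn(uint64_t paddr)
{
	return paddr >> HW_PAGE_SHIFT;
}

/* First hardware frame backing a given software frame. */
static inline uint64_t sw_pfn_to_hw_pfn(uint64_t sw_pfn)
{
	return sw_pfn << (SW_PAGE_SHIFT - HW_PAGE_SHIFT);
}
```

Under these assumptions, generic mm code would work with the software
frame number, while anything written into a page table entry would use
the hardware frame number.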

The main work of decoupling these two kinds of pages lies in the
architecture code. For example, we turn the pte_t struct into an array
of page table entries so that it matches a software page, which can be
larger than a hardware page, and adapt the page table operations
accordingly. For a 64K software base page, the pte_t struct now contains
16 contiguous page table entries which point to 16 contiguous 4K
hardware pages.
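
As a rough user-space sketch of that layout (not the code from the
patches), a software pte can be modelled as an array of 16 hardware
entries. The names sw_pte_t, sw_mk_pte() and sw_pte_hw_pfn() are
hypothetical; the only hardware facts assumed are that a RISC-V Sv39/Sv48
pte keeps its PPN in bits 53:10 and its permission bits in the low bits.

```c
#include <assert.h>
#include <stdint.h>

#define HW_PAGE_SHIFT 12
#define SW_PAGE_SHIFT 16
#define HW_PTES_PER_SW_PTE (1 << (SW_PAGE_SHIFT - HW_PAGE_SHIFT)) /* 16 */
#define PTE_PFN_SHIFT 10                 /* PPN starts at bit 10 */
#define PTE_PPN_FIELD ((1ULL << 44) - 1) /* PPN is 44 bits wide */

/* Hypothetical software pte: one slot per 4K hardware page. */
typedef struct {
	uint64_t ptes[HW_PTES_PER_SW_PTE];
} sw_pte_t;

static_assert(sizeof(sw_pte_t) == 16 * sizeof(uint64_t),
	      "a 64K software pte spans 16 hardware ptes");

/* Build a software pte whose slots map 16 contiguous 4K frames. */
static inline sw_pte_t sw_mk_pte(uint64_t sw_pfn, uint64_t prot)
{
	sw_pte_t pte;
	uint64_t hw_pfn = sw_pfn << (SW_PAGE_SHIFT - HW_PAGE_SHIFT);

	for (int i = 0; i < HW_PTES_PER_SW_PTE; i++)
		pte.ptes[i] = ((hw_pfn + i) << PTE_PFN_SHIFT) | prot;
	return pte;
}

/* Hardware frame number stored in slot i of a software pte. */
static inline uint64_t sw_pte_hw_pfn(const sw_pte_t *pte, unsigned int i)
{
	return (pte->ptes[i] >> PTE_PFN_SHIFT) & PTE_PPN_FIELD;
}
```

With this model, software frame 3 would be backed by hardware frames 48
through 63, one per slot.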

To obtain the benefits of a large base page, we apply Svnapot to each
base page's mapping. The Svnapot extension on RISC-V is similar to the
contiguous PTE on ARM64. It allows the ptes of a naturally aligned
power-of-2 sized memory range to be encoded in the same format to save
TLB space.
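
For reference, the 64K Svnapot encoding can be sketched as follows. The
N bit (bit 63) and the 0b1000 pattern in the low four PPN bits come from
the Svnapot specification; the helper names are hypothetical, and how the
patches actually fill the 16 slots of a base page is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PFN_SHIFT 10
#define PTE_PPN_MASK (((1ULL << 44) - 1) << PTE_PFN_SHIFT) /* bits 53:10 */
#define PTE_NAPOT (1ULL << 63) /* N bit defined by Svnapot */
#define NAPOT_64K_ORDER 4      /* 2^4 = 16 hardware pages */
#define NAPOT_64K_LOW_MASK ((1ULL << NAPOT_64K_ORDER) - 1)
#define NAPOT_64K_PATTERN (1ULL << (NAPOT_64K_ORDER - 1)) /* 0b1000 */

static_assert(NAPOT_64K_PATTERN == 0x8, "64K NAPOT uses ppn[3:0] = 1000");

/*
 * Encode a 64K NAPOT pte: the low four PPN bits are replaced by the
 * 0b1000 size pattern and N is set. Every 4K slot of the 64K range
 * would hold this same value.
 */
static inline uint64_t mk_napot_64k_pte(uint64_t hw_pfn, uint64_t prot)
{
	uint64_t ppn = (hw_pfn & ~NAPOT_64K_LOW_MASK) | NAPOT_64K_PATTERN;

	return PTE_NAPOT | (ppn << PTE_PFN_SHIFT) | prot;
}

/* Recover the 64K-aligned base hardware frame of a NAPOT pte. */
static inline uint64_t napot_64k_pte_base_pfn(uint64_t pte)
{
	return ((pte & PTE_PPN_MASK) >> PTE_PFN_SHIFT) & ~NAPOT_64K_LOW_MASK;
}
```

Because all 16 entries carry an identical encoding, an implementation
that supports Svnapot can cache the whole 64K range in one TLB entry.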

This patch series is the first version and is based on v6.7-rc1. This
version supports both bare metal and virtualization scenarios.

In the next versions, we will continue with the following work:

1. Reduce the memory usage of page table pages, since each one uses only
4K of space but costs a whole base page.

2. When an IMSIC interrupt file is smaller than 64K, extra isolation
measures for the interrupt file are needed. (S)PMP and IOPMP may be good
choices.

3. More consideration is needed to make this patch series work better
with folios.

4. Support 64K base page on IOMMU.

5. Performance tests are planned to verify the actual performance
improvement and the decrease in TLB miss rate.

Thanks in advance for comments.

Xu Lu (11):
  mm: Fix misused APIs on huge pte
  riscv: Introduce concept of hardware base page
  riscv: Adapt pte struct to gap between hw page and sw page
  riscv: Adapt pte operations to gap between hw page and sw page
  riscv: Decouple pmd operations and pte operations
  riscv: Distinguish pmd huge pte and napot huge pte
  riscv: Adapt satp operations to gap between hw page and sw page
  riscv: Apply Svnapot for base page mapping
  riscv: Adjust fix_btmap slots number to match variable page size
  riscv: kvm: Adapt kvm to gap between hw page and sw page
  riscv: Introduce 64K page size

 arch/Kconfig                        |   1 +
 arch/riscv/Kconfig                  |  28 +++
 arch/riscv/include/asm/fixmap.h     |   3 +-
 arch/riscv/include/asm/hugetlb.h    |  71 ++++++-
 arch/riscv/include/asm/page.h       |  16 +-
 arch/riscv/include/asm/pgalloc.h    |  21 ++-
 arch/riscv/include/asm/pgtable-32.h |   2 +-
 arch/riscv/include/asm/pgtable-64.h |  45 +++--
 arch/riscv/include/asm/pgtable.h    | 282 +++++++++++++++++++++++-----
 arch/riscv/kernel/efi.c             |   2 +-
 arch/riscv/kernel/head.S            |   4 +-
 arch/riscv/kernel/hibernate.c       |   3 +-
 arch/riscv/kvm/mmu.c                | 198 +++++++++++++------
 arch/riscv/mm/context.c             |   7 +-
 arch/riscv/mm/fault.c               |   1 +
 arch/riscv/mm/hugetlbpage.c         |  42 +++--
 arch/riscv/mm/init.c                |  25 +--
 arch/riscv/mm/kasan_init.c          |   7 +-
 arch/riscv/mm/pageattr.c            |   2 +-
 fs/proc/task_mmu.c                  |   2 +-
 include/asm-generic/hugetlb.h       |   7 +
 include/asm-generic/pgtable-nopmd.h |   1 +
 include/linux/pgtable.h             |   6 +
 mm/hugetlb.c                        |   2 +-
 mm/migrate.c                        |   5 +-
 mm/mprotect.c                       |   2 +-
 mm/rmap.c                           |  10 +-
 mm/vmalloc.c                        |   3 +-
 28 files changed, 616 insertions(+), 182 deletions(-)

-- 
2.20.1


