linux-fsdevel.vger.kernel.org archive mirror
From: Catalin Marinas <catalin.marinas@arm.com>
To: David Hildenbrand <david@redhat.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>,
	will@kernel.org, oliver.upton@linux.dev, maz@kernel.org,
	james.morse@arm.com, suzuki.poulose@arm.com,
	yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org,
	mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org,
	rppt@kernel.org, hughd@google.com, pcc@google.com,
	steven.price@arm.com, anshuman.khandual@arm.com,
	vincenzo.frascino@arm.com, eugenis@google.com, kcc@google.com,
	hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev,
	linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse
Date: Thu, 24 Aug 2023 11:44:13 +0100
Message-ID: <ZOc0fehF02MohuWr@arm.com>
In-Reply-To: <33def4fe-fdb8-6388-1151-fabd2adc8220@redhat.com>

On Thu, Aug 24, 2023 at 09:50:32AM +0200, David Hildenbrand wrote:
> after re-reading it 2 times, I still have no clue what your patch set is
> actually trying to achieve. Probably there is a way to describe how user
> space intends to interact with this feature, so to see which value this
> actually has for user space -- and if we are using the right APIs and
> allocators.

I'll try with an alternative summary, hopefully it becomes clearer (I
think Alex is away until the end of the week, may not reply
immediately). If this still doesn't work, maybe we should try a
different implementation ;).

The way MTE is implemented currently is to have a static carve-out of
the DRAM to store the allocation tags (a.k.a. memory colour). This is
what we call the tag storage. Every 16 bytes of data have a 4-bit tag, so
the tag storage takes up 1/32 of the DRAM, roughly 3%. This is
done transparently by the hardware/interconnect (with firmware setup)
and normally hidden from the OS. So a checked memory access to location
X generates a tag fetch from location Y in the carve-out and this tag is
compared with bits 59:56 of the pointer. The correspondence from X
to Y is linear (subject to a minimum block size to deal with some
address interleaving). The software doesn't need to know about this
correspondence as we have specific instructions like STG/LDG to location
X that lead to a tag store/load to Y.
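
As a rough illustration of the arithmetic above, here is a minimal sketch
that assumes a purely linear X -> Y mapping with no interleaving or
minimum block size. The struct tag_layout type, its fields and the helper
names are hypothetical and only serve the example; real hardware performs
this translation in the interconnect and software never computes Y.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural constants: one 4-bit tag per 16-byte granule. */
#define MTE_GRANULE_SIZE 16u
#define MTE_TAG_BITS     4u
/* 16 bytes * 8 bits / 4 tag bits = 32 data bytes per byte of tag storage. */
#define DATA_BYTES_PER_TAG_BYTE (MTE_GRANULE_SIZE * 8u / MTE_TAG_BITS)

/* Hypothetical layout description (assumption, normally set up by firmware). */
struct tag_layout {
	uint64_t dram_base; /* first data address covered by the carve-out */
	uint64_t tag_base;  /* start of the tag storage carve-out */
};

/* Address Y of the byte holding the tag for data address X (assumed linear). */
uint64_t tag_storage_addr(const struct tag_layout *l, uint64_t x)
{
	assert(x >= l->dram_base);
	return l->tag_base + (x - l->dram_base) / DATA_BYTES_PER_TAG_BYTE;
}

/* Tag carried in bits 59:56 of a pointer. */
unsigned int pointer_tag(uint64_t ptr)
{
	return (unsigned int)((ptr >> 56) & 0xf);
}

/* Size of the carve-out needed to cover dram_size bytes of data. */
uint64_t tag_storage_size(uint64_t dram_size)
{
	return dram_size / DATA_BYTES_PER_TAG_BYTE;
}
```

With these assumptions, 1GB of DRAM needs 32MB of tag storage and a 4K
data page needs 128 bytes of it.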

Now, not all memory used by applications is tagged (mmap(PROT_MTE)).
For example, some large allocations may not use PROT_MTE at all or only
for the first and last page since initialising the tags takes time. The
side effect is that only part of this 3% of DRAM, say 1%, is effectively
used. Some people want the unused tag storage to be released for normal
data usage (i.e. give it to the kernel page allocator).

So the first complication is that a PROT_MTE page allocation at address
X will need to reserve the tag storage at location Y (and migrate any
data in that page if it is in use).
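
To make the reservation step more concrete, a sketch under the same
linear-mapping assumption: with a 1/32 ratio, one page of tag storage
covers 32 data pages, so a tag storage page can only hold normal data
while none of those 32 pages is tagged. The struct tag_storage
bookkeeping and the function names below are hypothetical and are not
what the patches implement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 1/32 ratio: one tag storage page covers 32 data pages. */
#define DATA_PAGES_PER_TAG_PAGE 32u

/* Hypothetical bookkeeping: tagged data pages backed by each tag page. */
struct tag_storage {
	unsigned int *tagged_users; /* one counter per tag storage page */
	size_t nr_tag_pages;
};

/* Index of the tag storage page backing a given data page. */
size_t tag_page_for(size_t data_page)
{
	return data_page / DATA_PAGES_PER_TAG_PAGE;
}

/*
 * Called when a data page becomes tagged. Returns true if this is the
 * first tagged user, i.e. any data living in the tag page must be
 * migrated out before the tags can be used.
 */
bool reserve_tag_storage(struct tag_storage *ts, size_t data_page)
{
	size_t t = tag_page_for(data_page);

	assert(t < ts->nr_tag_pages);
	return ts->tagged_users[t]++ == 0;
}

/*
 * Called when a tagged data page is freed. Returns true if the tag page
 * has no users left and could be handed back for normal data.
 */
bool release_tag_storage(struct tag_storage *ts, size_t data_page)
{
	size_t t = tag_page_for(data_page);

	assert(t < ts->nr_tag_pages && ts->tagged_users[t] > 0);
	return --ts->tagged_users[t] == 0;
}
```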

To make things worse, pages in the tag storage/carve-out range cannot
use PROT_MTE themselves on current hardware, so this adds the second
complication - a heterogeneous memory layout. The kernel needs to know
where to allocate a PROT_MTE page from or migrate a current page if it
becomes PROT_MTE (mprotect()) and the range it is in does not support
tagging.
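
The decision the kernel has to make here can be sketched as a simple
range check. This is only an illustration of the constraint, assuming a
single contiguous carve-out; struct phys_range and both helpers are
hypothetical names, not code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the carve-out: [start, end). */
struct phys_range {
	uint64_t start;
	uint64_t end;
};

/* True if the physical address lies inside the tag storage carve-out. */
bool in_tag_storage(const struct phys_range *carve_out, uint64_t phys)
{
	assert(carve_out->start <= carve_out->end);
	return phys >= carve_out->start && phys < carve_out->end;
}

/*
 * A page backing a PROT_MTE mapping cannot live in the carve-out, so it
 * has to be migrated if the mapping is (or becomes) tagged.
 */
bool needs_migration_for_mte(const struct phys_range *carve_out,
			     uint64_t phys, bool vma_is_mte)
{
	return vma_is_mte && in_tag_storage(carve_out, phys);
}
```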

Some other complications are arm64-specific like cache coherency between
tags and data accesses. There is a draft architecture spec which will be
released soon, detailing how the hardware behaves.

To your question about user APIs/ABIs, that's entirely transparent. As
with the current kernel (without this dynamic tag storage), a user only
needs to ask for PROT_MTE mappings to get tagged pages.

> So some dummy questions / statements
> 
> 1) Is this about repurposing the memory used to hold tags for a different
> purpose?

Yes. To allow part of this 3% to be used for data. It could even be the
whole 3% if no application is enabling MTE.

> Or what exactly is user space going to do with the PROT_MTE memory?
> The whole mprotect(PROT_MTE) approach might not be the right thing to do.

As I mentioned above, there's no difference to the user ABI. PROT_MTE
works as before with the kernel moving pages around as needed.

> 2) Why do we even have to involve the page allocator if this is some
> special-purpose memory? Repurposing the buddy when later using
> alloc_contig_range() either way feels wrong.

The aim here is to rebrand this special-purpose memory as a nearly
general-purpose one (bar the PROT_MTE restriction).

> The core-mm changes don't look particularly appealing :)

OTOH, it's a fun project to learn about the mm ;).

Our aim for now is to get some feedback from the mm community on whether
this special -> nearly general rebranding is acceptable together with
the introduction of a heterogeneous memory concept for the general
purpose page allocator.

There are some alternatives we looked at with a smaller mm impact but we
haven't prototyped them yet: (a) use the available tag storage as a
frontswap accelerator or (b) use it as a (compressed) ramdisk that can
be mounted as swap. The latter has the advantage of showing up in the
available total memory, which keeps customers happy ;). Both options would
need some mm hooks when a PROT_MTE page gets allocated to release the
corresponding page in the tag storage range.

-- 
Catalin
