From: Catalin Marinas <catalin.marinas@arm.com>
To: Evgenii Stepanov <eugenis@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>,
Alexander Potapenko <glider@google.com>,
Andrey Konovalov <andreyknvl@gmail.com>,
Dmitry Vyukov <dvyukov@google.com>, Will Deacon <will@kernel.org>,
Steven Price <steven.price@arm.com>,
Peter Collingbourne <pcc@google.com>,
kasan-dev@googlegroups.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] kasan: speed up mte_set_mem_tag_range
Date: Tue, 18 May 2021 18:44:41 +0100 [thread overview]
Message-ID: <20210518174439.GA28491@arm.com> (raw)
In-Reply-To: <20210517235546.3038875-1-eugenis@google.com>
On Mon, May 17, 2021 at 04:55:46PM -0700, Evgenii Stepanov wrote:
> Use DC GVA / DC GZVA to speed up KASan memory tagging in HW tags mode.
>
> The first cacheline is always tagged using STG/STZG even if the address is
> cacheline-aligned, as benchmarks show it is faster than a conditional
> branch.
[...]
> diff --git a/arch/arm64/include/asm/mte-kasan.h b/arch/arm64/include/asm/mte-kasan.h
> index ddd4d17cf9a0..e29a0e2ab35c 100644
> --- a/arch/arm64/include/asm/mte-kasan.h
> +++ b/arch/arm64/include/asm/mte-kasan.h
> @@ -48,45 +48,7 @@ static inline u8 mte_get_random_tag(void)
> return mte_get_ptr_tag(addr);
> }
>
> -/*
> - * Assign allocation tags for a region of memory based on the pointer tag.
> - * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
> - * size must be non-zero and MTE_GRANULE_SIZE aligned.
> - */
> -static inline void mte_set_mem_tag_range(void *addr, size_t size,
> - u8 tag, bool init)
With commit 2cb34276427a ("arm64: kasan: simplify and inline MTE
functions") you wanted this inlined for performance. Does this not
matter much that it's now out of line?
> diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
> index d31e1169d9b8..c06ada79a437 100644
> --- a/arch/arm64/lib/Makefile
> +++ b/arch/arm64/lib/Makefile
> @@ -18,3 +18,5 @@ obj-$(CONFIG_CRC32) += crc32.o
> obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
>
> obj-$(CONFIG_ARM64_MTE) += mte.o
> +
> +obj-$(CONFIG_KASAN_HW_TAGS) += mte-kasan.o
> diff --git a/arch/arm64/lib/mte-kasan.S b/arch/arm64/lib/mte-kasan.S
> new file mode 100644
> index 000000000000..9f6975e2af60
> --- /dev/null
> +++ b/arch/arm64/lib/mte-kasan.S
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2021 Google Inc.
> + */
> +#include <linux/const.h>
> +#include <linux/linkage.h>
> +
> +#include <asm/mte-def.h>
> +
> + .arch armv8.5-a+memtag
> +
> + .macro __set_mem_tag_range, stg, gva, start, size, linesize, tmp1, tmp2, tmp3
> + add \tmp3, \start, \size
> + cmp \size, \linesize, lsl #1
> + b.lt .Lsmtr3_\@
We could do with some comments here. Why the lsl #1? I think I get it
but it would be good to make this more readable.
It may be easier if you placed it in a file on its own (as it is now but
with a less generic file name) and use a few .req instead of the tmpX.
You can use the macro args only for the stg/gva.
> +
> + sub \tmp1, \linesize, #1
> + bic \tmp2, \tmp3, \tmp1
> + orr \tmp1, \start, \tmp1
> +
> +.Lsmtr1_\@:
> + \stg \start, [\start], #MTE_GRANULE_SIZE
> + cmp \start, \tmp1
> + b.lt .Lsmtr1_\@
> +
> +.Lsmtr2_\@:
> + dc \gva, \start
> + add \start, \start, \linesize
> + cmp \start, \tmp2
> + b.lt .Lsmtr2_\@
> +
> +.Lsmtr3_\@:
> + cmp \start, \tmp3
> + b.ge .Lsmtr4_\@
> + \stg \start, [\start], #MTE_GRANULE_SIZE
> + b .Lsmtr3_\@
> +.Lsmtr4_\@:
> + .endm
If we want to get the best performance out of this, we should look at
the memset implementation and do something similar. In principle it's
not that far from a memzero, though depending on the microarchitecture
it may behave slightly differently.
Anyway, before that I wonder if we wrote all this in C + inline asm
(three while loops or maybe two and some goto), what's the performance
difference? It has the advantage of being easier to maintain even if we
used some C macros to generate gva/gzva variants.
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2021-05-18 17:47 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-17 23:55 [PATCH v3] kasan: speed up mte_set_mem_tag_range Evgenii Stepanov
2021-05-18 17:44 ` Catalin Marinas [this message]
2021-05-18 18:11 ` Peter Collingbourne
2021-05-19 18:12 ` Catalin Marinas
2021-05-19 19:52 ` Evgenii Stepanov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210518174439.GA28491@arm.com \
--to=catalin.marinas@arm.com \
--cc=andreyknvl@gmail.com \
--cc=dvyukov@google.com \
--cc=eugenis@google.com \
--cc=glider@google.com \
--cc=kasan-dev@googlegroups.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pcc@google.com \
--cc=ryabinin.a.a@gmail.com \
--cc=steven.price@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).