From: Catalin Marinas <catalin.marinas@arm.com>
To: Evgenii Stepanov <eugenis@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>,
Alexander Potapenko <glider@google.com>,
Andrey Konovalov <andreyknvl@gmail.com>,
Dmitry Vyukov <dvyukov@google.com>, Will Deacon <will@kernel.org>,
Steven Price <steven.price@arm.com>,
Peter Collingbourne <pcc@google.com>,
kasan-dev@googlegroups.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] kasan: speed up mte_set_mem_tag_range
Date: Tue, 18 May 2021 18:44:41 +0100 [thread overview]
Message-ID: <20210518174439.GA28491@arm.com> (raw)
In-Reply-To: <20210517235546.3038875-1-eugenis@google.com>
On Mon, May 17, 2021 at 04:55:46PM -0700, Evgenii Stepanov wrote:
> Use DC GVA / DC GZVA to speed up KASan memory tagging in HW tags mode.
>
> The first cacheline is always tagged using STG/STZG even if the address is
> cacheline-aligned, as benchmarks show it is faster than a conditional
> branch.
[...]
> diff --git a/arch/arm64/include/asm/mte-kasan.h b/arch/arm64/include/asm/mte-kasan.h
> index ddd4d17cf9a0..e29a0e2ab35c 100644
> --- a/arch/arm64/include/asm/mte-kasan.h
> +++ b/arch/arm64/include/asm/mte-kasan.h
> @@ -48,45 +48,7 @@ static inline u8 mte_get_random_tag(void)
> return mte_get_ptr_tag(addr);
> }
>
> -/*
> - * Assign allocation tags for a region of memory based on the pointer tag.
> - * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
> - * size must be non-zero and MTE_GRANULE_SIZE aligned.
> - */
> -static inline void mte_set_mem_tag_range(void *addr, size_t size,
> - u8 tag, bool init)
With commit 2cb34276427a ("arm64: kasan: simplify and inline MTE
functions") you wanted this inlined for performance. Does this not
matter much that it's now out of line?
> diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
> index d31e1169d9b8..c06ada79a437 100644
> --- a/arch/arm64/lib/Makefile
> +++ b/arch/arm64/lib/Makefile
> @@ -18,3 +18,5 @@ obj-$(CONFIG_CRC32) += crc32.o
> obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
>
> obj-$(CONFIG_ARM64_MTE) += mte.o
> +
> +obj-$(CONFIG_KASAN_HW_TAGS) += mte-kasan.o
> diff --git a/arch/arm64/lib/mte-kasan.S b/arch/arm64/lib/mte-kasan.S
> new file mode 100644
> index 000000000000..9f6975e2af60
> --- /dev/null
> +++ b/arch/arm64/lib/mte-kasan.S
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2021 Google Inc.
> + */
> +#include <linux/const.h>
> +#include <linux/linkage.h>
> +
> +#include <asm/mte-def.h>
> +
> + .arch armv8.5-a+memtag
> +
> + .macro __set_mem_tag_range, stg, gva, start, size, linesize, tmp1, tmp2, tmp3
> + add \tmp3, \start, \size
> + cmp \size, \linesize, lsl #1
> + b.lt .Lsmtr3_\@
We could do with some comments here. Why the lsl #1? I think I get it
but it would be good to make this more readable.
It may be easier if you placed it in a file on its own (as it is now but
with a less generic file name) and use a few .req instead of the tmpX.
You can use the macro args only for the stg/gva.
> +
> + sub \tmp1, \linesize, #1
> + bic \tmp2, \tmp3, \tmp1
> + orr \tmp1, \start, \tmp1
> +
> +.Lsmtr1_\@:
> + \stg \start, [\start], #MTE_GRANULE_SIZE
> + cmp \start, \tmp1
> + b.lt .Lsmtr1_\@
> +
> +.Lsmtr2_\@:
> + dc \gva, \start
> + add \start, \start, \linesize
> + cmp \start, \tmp2
> + b.lt .Lsmtr2_\@
> +
> +.Lsmtr3_\@:
> + cmp \start, \tmp3
> + b.ge .Lsmtr4_\@
> + \stg \start, [\start], #MTE_GRANULE_SIZE
> + b .Lsmtr3_\@
> +.Lsmtr4_\@:
> + .endm
If we want to get the best performance out of this, we should look at
the memset implementation and do something similar. In principle it's
not that far from a memzero, though depending on the microarchitecture
it may behave slightly differently.
Anyway, before that I wonder if we wrote all this in C + inline asm
(three while loops or maybe two and some goto), what's the performance
difference? It has the advantage of being easier to maintain even if we
used some C macros to generate gva/gzva variants.
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Evgenii Stepanov <eugenis@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>,
Alexander Potapenko <glider@google.com>,
Andrey Konovalov <andreyknvl@gmail.com>,
Dmitry Vyukov <dvyukov@google.com>, Will Deacon <will@kernel.org>,
Steven Price <steven.price@arm.com>,
Peter Collingbourne <pcc@google.com>,
kasan-dev@googlegroups.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] kasan: speed up mte_set_mem_tag_range
Date: Tue, 18 May 2021 18:44:41 +0100 [thread overview]
Message-ID: <20210518174439.GA28491@arm.com> (raw)
In-Reply-To: <20210517235546.3038875-1-eugenis@google.com>
On Mon, May 17, 2021 at 04:55:46PM -0700, Evgenii Stepanov wrote:
> Use DC GVA / DC GZVA to speed up KASan memory tagging in HW tags mode.
>
> The first cacheline is always tagged using STG/STZG even if the address is
> cacheline-aligned, as benchmarks show it is faster than a conditional
> branch.
[...]
> diff --git a/arch/arm64/include/asm/mte-kasan.h b/arch/arm64/include/asm/mte-kasan.h
> index ddd4d17cf9a0..e29a0e2ab35c 100644
> --- a/arch/arm64/include/asm/mte-kasan.h
> +++ b/arch/arm64/include/asm/mte-kasan.h
> @@ -48,45 +48,7 @@ static inline u8 mte_get_random_tag(void)
> return mte_get_ptr_tag(addr);
> }
>
> -/*
> - * Assign allocation tags for a region of memory based on the pointer tag.
> - * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
> - * size must be non-zero and MTE_GRANULE_SIZE aligned.
> - */
> -static inline void mte_set_mem_tag_range(void *addr, size_t size,
> - u8 tag, bool init)
With commit 2cb34276427a ("arm64: kasan: simplify and inline MTE
functions") you wanted this inlined for performance. Does this not
matter much that it's now out of line?
> diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
> index d31e1169d9b8..c06ada79a437 100644
> --- a/arch/arm64/lib/Makefile
> +++ b/arch/arm64/lib/Makefile
> @@ -18,3 +18,5 @@ obj-$(CONFIG_CRC32) += crc32.o
> obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
>
> obj-$(CONFIG_ARM64_MTE) += mte.o
> +
> +obj-$(CONFIG_KASAN_HW_TAGS) += mte-kasan.o
> diff --git a/arch/arm64/lib/mte-kasan.S b/arch/arm64/lib/mte-kasan.S
> new file mode 100644
> index 000000000000..9f6975e2af60
> --- /dev/null
> +++ b/arch/arm64/lib/mte-kasan.S
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2021 Google Inc.
> + */
> +#include <linux/const.h>
> +#include <linux/linkage.h>
> +
> +#include <asm/mte-def.h>
> +
> + .arch armv8.5-a+memtag
> +
> + .macro __set_mem_tag_range, stg, gva, start, size, linesize, tmp1, tmp2, tmp3
> + add \tmp3, \start, \size
> + cmp \size, \linesize, lsl #1
> + b.lt .Lsmtr3_\@
We could do with some comments here. Why the lsl #1? I think I get it
but it would be good to make this more readable.
It may be easier if you placed it in a file on its own (as it is now but
with a less generic file name) and use a few .req instead of the tmpX.
You can use the macro args only for the stg/gva.
> +
> + sub \tmp1, \linesize, #1
> + bic \tmp2, \tmp3, \tmp1
> + orr \tmp1, \start, \tmp1
> +
> +.Lsmtr1_\@:
> + \stg \start, [\start], #MTE_GRANULE_SIZE
> + cmp \start, \tmp1
> + b.lt .Lsmtr1_\@
> +
> +.Lsmtr2_\@:
> + dc \gva, \start
> + add \start, \start, \linesize
> + cmp \start, \tmp2
> + b.lt .Lsmtr2_\@
> +
> +.Lsmtr3_\@:
> + cmp \start, \tmp3
> + b.ge .Lsmtr4_\@
> + \stg \start, [\start], #MTE_GRANULE_SIZE
> + b .Lsmtr3_\@
> +.Lsmtr4_\@:
> + .endm
If we want to get the best performance out of this, we should look at
the memset implementation and do something similar. In principle it's
not that far from a memzero, though depending on the microarchitecture
it may behave slightly differently.
Anyway, before that I wonder if we wrote all this in C + inline asm
(three while loops or maybe two and some goto), what's the performance
difference? It has the advantage of being easier to maintain even if we
used some C macros to generate gva/gzva variants.
--
Catalin
next prev parent reply other threads:[~2021-05-18 17:47 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-17 23:55 [PATCH v3] kasan: speed up mte_set_mem_tag_range Evgenii Stepanov
2021-05-17 23:55 ` Evgenii Stepanov
2021-05-18 17:44 ` Catalin Marinas [this message]
2021-05-18 17:44 ` Catalin Marinas
2021-05-18 18:11 ` Peter Collingbourne
2021-05-18 18:11 ` Peter Collingbourne
2021-05-19 18:12 ` Catalin Marinas
2021-05-19 18:12 ` Catalin Marinas
2021-05-19 19:52 ` Evgenii Stepanov
2021-05-19 19:52 ` Evgenii Stepanov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210518174439.GA28491@arm.com \
--to=catalin.marinas@arm.com \
--cc=andreyknvl@gmail.com \
--cc=dvyukov@google.com \
--cc=eugenis@google.com \
--cc=glider@google.com \
--cc=kasan-dev@googlegroups.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pcc@google.com \
--cc=ryabinin.a.a@gmail.com \
--cc=steven.price@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.