From: Catalin Marinas <catalin.marinas@arm.com>
To: Evgenii Stepanov <eugenis@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>,
    Alexander Potapenko <glider@google.com>,
    Andrey Konovalov <andreyknvl@gmail.com>,
    Dmitry Vyukov <dvyukov@google.com>, Will Deacon <will@kernel.org>,
    Steven Price <steven.price@arm.com>,
    Peter Collingbourne <pcc@google.com>,
    kasan-dev@googlegroups.com, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] kasan: speed up mte_set_mem_tag_range
Date: Thu, 20 May 2021 10:21:06 +0100
Message-ID: <20210520092105.GA12251@arm.com>
In-Reply-To: <20210520020305.2826694-1-eugenis@google.com>

On Wed, May 19, 2021 at 07:03:05PM -0700, Evgenii Stepanov wrote:
> Use DC GVA / DC GZVA to speed up KASan memory tagging in HW tags mode.
>
> The first cacheline is always tagged using STG/STZG even if the address is
> cacheline-aligned, as benchmarks show it is faster than a conditional
> branch.
>
> Signed-off-by: Evgenii Stepanov <eugenis@google.com>
> Co-developed-by: Peter Collingbourne <pcc@google.com>
> Signed-off-by: Peter Collingbourne <pcc@google.com>

Some nitpicks below, but it looks fine otherwise.

> diff --git a/arch/arm64/include/asm/mte-kasan.h b/arch/arm64/include/asm/mte-kasan.h
> index ddd4d17cf9a0..34e23886f346 100644
> --- a/arch/arm64/include/asm/mte-kasan.h
> +++ b/arch/arm64/include/asm/mte-kasan.h
> @@ -48,43 +48,85 @@ static inline u8 mte_get_random_tag(void)
>  	return mte_get_ptr_tag(addr);
>  }
>
> +static inline u64 __stg_post(u64 p)
> +{
> +	asm volatile(__MTE_PREAMBLE "stg %0, [%0], #16"
> +		     : "+r"(p)
> +		     :
> +		     : "memory");
> +	return p;
> +}
> +
> +static inline u64 __stzg_post(u64 p)
> +{
> +	asm volatile(__MTE_PREAMBLE "stzg %0, [%0], #16"
> +		     : "+r"(p)
> +		     :
> +		     : "memory");
> +	return p;
> +}
> +
> +static inline void __dc_gva(u64 p)
> +{
> +	asm volatile(__MTE_PREAMBLE "dc gva, %0" : : "r"(p) : "memory");
> +}
> +
> +static inline void __dc_gzva(u64 p)
> +{
> +	asm volatile(__MTE_PREAMBLE "dc gzva, %0" : : "r"(p) : "memory");
> +}
> +
>  /*
>   * Assign allocation tags for a region of memory based on the pointer tag.
>   * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
> - * size must be non-zero and MTE_GRANULE_SIZE aligned.
> + * size must be MTE_GRANULE_SIZE aligned.
>   */
> -static inline void mte_set_mem_tag_range(void *addr, size_t size,
> -						u8 tag, bool init)
> +static inline void mte_set_mem_tag_range(void *addr, size_t size, u8 tag,
> +					 bool init)
>  {
> -	u64 curr, end;
> +	u64 curr, DCZID, mask, line_size, end1, end2, end3;

Nitpick 1: please use lowercase for variable names, even when they are
named after a register.

>
> -	if (!size)
> -		return;
> +	/* Read DC G(Z)VA store size from the register. */
> +	__asm__ __volatile__(__MTE_PREAMBLE "mrs %0, dczid_el0"
> +			     : "=r"(DCZID)::);
> +	line_size = 4ul << (DCZID & 0xf);

No need for __MTE_PREAMBLE here; this register has been available since
ARMv8.0. Even better, just use read_cpuid(DCZID_EL0) directly rather than
inline asm.

I'd also call this variable block_size (or dczid_bs etc.), since it's not
necessarily a cache line size (we have CTR_EL0 for that), though most
implementations probably use the cache line size. There are a few
instances below where the comments refer to cache lines.

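As a rough illustration of the block size computation in the quoted
hunk, here is a plain C sketch that decodes the BS field from a raw
DCZID_EL0 value. The helper name dczid_block_size() is hypothetical and
the register value is passed in as a parameter, so this is a userspace
sketch rather than kernel code:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: return the DC ZVA / DC G(Z)VA
 * block size in bytes for a raw DCZID_EL0 value.
 *
 * DCZID_EL0.BS (bits [3:0]) is log2 of the block size in 4-byte words,
 * hence the "4 << BS" in the patch. Bit 4 (DZP) is masked off here.
 */
static inline uint64_t dczid_block_size(uint64_t dczid)
{
	return UINT64_C(4) << (dczid & 0xf);
}
```

A BS value of 4 gives a 64-byte block, which happens to match a common
cache line size, but the two are reported by different registers.
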
>  	curr = (u64)__tag_set(addr, tag);
> -	end = curr + size;
> -
> -	/*
> -	 * 'asm volatile' is required to prevent the compiler to move
> -	 * the statement outside of the loop.
> +	mask = line_size - 1;
> +	/* STG/STZG up to the end of the first cache line. */
> +	end1 = curr | mask;
> +	end3 = curr + size;
> +	/* DC GVA / GZVA in [end1, end2) */
> +	end2 = end3 & ~mask;
> +
> +	/* The following code uses STG on the first cache line even if the start
> +	 * address is cache line aligned - it appears to be faster than an
> +	 * alignment check + conditional branch. Also, if the size is at least 2
> +	 * cache lines, the first two loops can use post-condition to save one
> +	 * branch each.
>  	 */

Nitpick 2: multi-line comments should start with an otherwise empty /*
line (as per the coding style doc).

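To make the three ranges in the quoted comment easier to follow, the
loop structure of SET_MEMTAG_RANGE can be traced in plain C by counting
operations instead of issuing instructions. This is only a sketch:
count_tag_ops(), struct tag_ops and GRANULE are names made up for the
illustration, and block_size is assumed to be a power of two of at
least 16 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define GRANULE 16u	/* stands in for MTE_GRANULE_SIZE */

/* Illustrative counters, not part of the patch. */
struct tag_ops {
	uint64_t stg;	/* granules tagged one at a time (STG/STZG) */
	uint64_t dc;	/* whole blocks tagged by DC GVA/GZVA */
};

/*
 * Walk [addr, addr + size) with the same loop structure as the quoted
 * SET_MEMTAG_RANGE macro and count what each loop would do.
 */
static struct tag_ops count_tag_ops(uint64_t addr, uint64_t size,
				    uint64_t block_size)
{
	struct tag_ops ops = { 0, 0 };
	uint64_t mask = block_size - 1;
	uint64_t curr = addr;
	uint64_t end1 = curr | mask;	/* last byte of the first block */
	uint64_t end3 = curr + size;	/* end of the whole range */
	uint64_t end2 = end3 & ~mask;	/* start of the trailing partial block */

	assert(addr % GRANULE == 0 && size % GRANULE == 0);

	if (size >= 2 * block_size) {
		/* First block: always granule by granule, even if aligned. */
		do {
			curr += GRANULE;
			ops.stg++;
		} while (curr < end1);

		/* Middle: one operation per full block. */
		do {
			curr += block_size;
			ops.dc++;
		} while (curr < end2);
	}

	/* Tail, or the whole range when it is shorter than two blocks. */
	while (curr < end3) {
		curr += GRANULE;
		ops.stg++;
	}

	return ops;
}
```

With a 64-byte block, a 256-byte range starting at 0x1010 comes out as
three granule stores up to 0x1040, three block operations up to 0x1100
and one trailing granule store.
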
> -	if (init) {
> -		do {
> -			asm volatile(__MTE_PREAMBLE "stzg %0, [%0]"
> -				     :
> -				     : "r" (curr)
> -				     : "memory");
> -			curr += MTE_GRANULE_SIZE;
> -		} while (curr != end);
> -	} else {
> -		do {
> -			asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
> -				     :
> -				     : "r" (curr)
> -				     : "memory");
> -			curr += MTE_GRANULE_SIZE;
> -		} while (curr != end);
> -	}
> +#define SET_MEMTAG_RANGE(stg_post, dc_gva)		\
> +	do {						\
> +		if (size >= 2 * line_size) {		\
> +			do {				\
> +				curr = stg_post(curr);	\
> +			} while (curr < end1);		\
> +							\
> +			do {				\
> +				dc_gva(curr);		\
> +				curr += line_size;	\
> +			} while (curr < end2);		\
> +		}					\
> +							\
> +		while (curr < end3)			\
> +			curr = stg_post(curr);		\
> +	} while (0)
> +
> +	if (init)
> +		SET_MEMTAG_RANGE(__stzg_post, __dc_gzva);
> +	else
> +		SET_MEMTAG_RANGE(__stg_post, __dc_gva);
> +#undef SET_MEMTAG_RANGE
>  }
>
>  void mte_enable_kernel_sync(void);
> --
> 2.31.1.751.gd2f1c929bd-goog

With the above fixed, feel free to add:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

(aiming at 5.14)

--
Catalin