From: Jason Gunthorpe <jgg@nvidia.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>,
Mark Rutland <mark.rutland@arm.com>,
Leon Romanovsky <leon@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-rdma@vger.kernel.org, llvm@lists.linux.dev,
Michael Guralnik <michaelgur@mellanox.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
Will Deacon <will@kernel.org>
Subject: Re: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64
Date: Tue, 16 Jan 2024 14:51:21 -0400
Message-ID: <20240116185121.GB980613@nvidia.com>
In-Reply-To: <20231206125919.GP2692119@nvidia.com>
Hey Catalin,
I'm just revising this and I'm wondering if you know why ARM64 has this:
#define __raw_writeq __raw_writeq
static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
{
	asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
}
Instead of
#define __raw_writeq __raw_writeq
static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
{
	asm volatile("str %x0, %1" : : "rZ" (val), "m" (*(volatile u64 *)addr));
}
?? Like x86 has.
The codegen for a 64-byte unrolled copy loop is much better with "m" on gcc:
"r" constraint (gcc 13.2.0):
.L3:
ldr x3, [x1]
str x3, [x0]
ldr x3, [x1, 8]
add x4, x0, 8
str x3, [x4]
ldr x3, [x1, 16]
add x4, x0, 16
str x3, [x4]
ldr x3, [x1, 24]
add x4, x0, 24
str x3, [x4]
ldr x3, [x1, 32]
add x4, x0, 32
str x3, [x4]
ldr x3, [x1, 40]
add x4, x0, 40
str x3, [x4]
ldr x3, [x1, 48]
add x4, x0, 48
str x3, [x4]
ldr x3, [x1, 56]
add x4, x0, 56
str x3, [x4]
add x1, x1, 64
add x0, x0, 64
cmp x2, x1
bhi .L3
"m" constraint:
.L3:
ldp x10, x9, [x1]
ldp x8, x7, [x1, 16]
ldp x6, x5, [x1, 32]
ldp x4, x3, [x1, 48]
str x10, [x0]
str x9, [x0, 8]
str x8, [x0, 16]
str x7, [x0, 24]
str x6, [x0, 32]
str x5, [x0, 40]
str x4, [x0, 48]
str x3, [x0, 56]
add x1, x1, 64
add x0, x0, 64
cmp x2, x1
bhi .L3
clang 17 doesn't do any better either way; it doesn't seem to do
anything with "m", but I guess it could.
clang 17 (either):
.LBB0_2: // =>This Inner Loop Header: Depth=1
ldp x9, x10, [x1]
add x14, x0, #8
add x18, x0, #40
ldp x11, x12, [x1, #16]
add x2, x0, #48
add x3, x0, #56
ldp x13, x15, [x1, #32]
ldp x16, x17, [x1, #48]
str x9, [x0]
str x10, [x14]
add x9, x0, #16
add x10, x0, #24
add x14, x0, #32
str x11, [x9]
str x12, [x10]
str x13, [x14]
str x15, [x18]
str x16, [x2]
str x17, [x3]
add x1, x1, #64
add x0, x0, #64
cmp x1, x8
b.lo .LBB0_2
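For anyone who wants to reproduce listings like the ones above, here is a
minimal sketch of a test file. It is not the source used for this mail: the
names store64_r, store64_m, copy64_r and copy64_m are hypothetical, the
plain volatile store on non-arm64 targets is only there so the file builds
elsewhere, and the exact output will depend on compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "r" form: the address is forced into a single register, so the
 * compiler cannot fold a constant offset into the store. */
static inline void store64_r(uint64_t val, volatile void *addr)
{
#if defined(__aarch64__)
	__asm__ volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
#else
	*(volatile uint64_t *)addr = val;	/* portable stand-in */
#endif
}

/* "m" form: the compiler picks the addressing mode, e.g. [x0, 8]. */
static inline void store64_m(uint64_t val, volatile void *addr)
{
#if defined(__aarch64__)
	__asm__ volatile("str %x0, %1" : : "rZ" (val),
			 "m" (*(volatile uint64_t *)addr));
#else
	*(volatile uint64_t *)addr = val;	/* portable stand-in */
#endif
}

/* Copy n 64-bit words, 64 bytes per iteration; n must be a multiple of 8. */
#define DEFINE_COPY64(name, store)					\
void name(volatile uint64_t *to, const uint64_t *from, size_t n)	\
{									\
	const uint64_t *end = from + n;					\
									\
	assert(n % 8 == 0);						\
	while (from < end) {						\
		store(from[0], to + 0);					\
		store(from[1], to + 1);					\
		store(from[2], to + 2);					\
		store(from[3], to + 3);					\
		store(from[4], to + 4);					\
		store(from[5], to + 5);					\
		store(from[6], to + 6);					\
		store(from[7], to + 7);					\
		from += 8;						\
		to += 8;						\
	}								\
}

DEFINE_COPY64(copy64_r, store64_r)
DEFINE_COPY64(copy64_m, store64_m)
```

Building this with something like "gcc -O2 -S" for an arm64 target and
comparing the two loop bodies should show the difference in addressing.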
It doesn't matter for this series, but it seems like something ARM64
might want to look at improving.
Jason