From: Catalin Marinas <catalin.marinas@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Justin Stitt <justinstitt@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Leon Romanovsky <leon@kernel.org>,
	linux-rdma@vger.kernel.org, linux-s390@vger.kernel.org,
	llvm@lists.linux.dev, Ingo Molnar <mingo@redhat.com>,
	Bill Wendling <morbo@google.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	netdev@vger.kernel.org, Paolo Abeni <pabeni@redhat.com>,
	Salil Mehta <salil.mehta@huawei.com>,
	Jijie Shao <shaojijie@huawei.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	x86@kernel.org, Yisen Zhuang <yisen.zhuang@huawei.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Leon Romanovsky <leonro@mellanox.com>,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Mark Rutland <mark.rutland@arm.com>,
	Michael Guralnik <michaelgur@mellanox.com>,
	patches@lists.linux.dev, Niklas Schnelle <schnelle@linux.ibm.com>,
	Will Deacon <will@kernel.org>
Subject: Re: [PATCH 4/6] arm64/io: Provide a WC friendly __iowriteXX_copy()
Date: Fri, 1 Mar 2024 18:52:32 +0000
Message-ID: <ZeIj8HtdbKS3eqG6@arm.com>
In-Reply-To: <4-v1-38290193eace+5-mlx5_arm_wc_jgg@nvidia.com>

On Tue, Feb 20, 2024 at 09:17:08PM -0400, Jason Gunthorpe wrote:
> The kernel provides driver support for using write-combining IO memory
> through the __iowriteXX_copy() API, which is commonly used as an optional
> optimization to generate 16/32/64 byte MemWr TLPs in a PCIe environment.
> 
> iomap_copy.c provides a generic implementation as a simple copy loop that
> writes 4 or 8 bytes at a time. It has worked well with past ARM64 CPUs,
> giving a high frequency of large TLPs being successfully formed.
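
For illustration only (this is not code from the patch or from
lib/iomap_copy.c), here is a host-side C sketch of the word-at-a-time loop
described above. The function names are hypothetical. `volatile` pointers
stand in for `__iomem`, and plain volatile stores stand in for
`__raw_writel()`/`__raw_writeq()`. `count` is in 32-bit or 64-bit units, as
in the kernel API. Each iteration pairs one load from the source with one
store, which is the load/store interleaving discussed next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch only: 'volatile' models the __iomem destination, and the volatile
 * store models __raw_writel(). 'count' is the number of 32-bit words.
 */
static void sketch_iowrite32_copy(volatile void *to, const void *from,
				  size_t count)
{
	volatile uint32_t *dst = to;
	const uint32_t *src = from;
	const uint32_t *end = src + count;

	/* One load from 'from', then one 4-byte store, per iteration. */
	while (src < end)
		*dst++ = *src++;
}

/* Same shape for 64-bit words; 'count' is the number of 64-bit words. */
static void sketch_iowrite64_copy(volatile void *to, const void *from,
				  size_t count)
{
	volatile uint64_t *dst = to;
	const uint64_t *src = from;
	const uint64_t *end = src + count;

	while (src < end)
		*dst++ = *src++;
}
```
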
> 
> However, modern ARM64 CPUs are quite sensitive to how the write-combining
> CPU HW is operated, and a compiler-generated loop with intermixed
> loads and stores is not sufficient to frequently generate a large TLP. The
> CPUs would like to see the entire TLP generated by consecutive store
> instructions from registers. Compilers like gcc tend to intermix loads and
> stores and generate poor code, partly because on ARM64 writeq() does not
> codegen any addressing mode other than "[xN]". However, even with that
> resolved, compilers like clang still do not generate good code.
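
For illustration only, here is a portable C sketch of the store shape the
CPUs reportedly prefer, for a single 64-byte block. The name
`sketch_copy_block64` is hypothetical. All eight words are loaded first and
then written with back-to-back volatile stores. C does not guarantee this
instruction order, because the compiler may still move the non-volatile
loads in between the volatile stores. That is why the patch uses inline
assembly rather than plain C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: "all loads, then all stores" for one 64-byte block
 * (eight 64-bit words). 'volatile' models the __iomem destination.
 */
static void sketch_copy_block64(volatile uint64_t *dst, const uint64_t *src)
{
	/* Phase 1: pull the whole block into locals (ideally registers). */
	uint64_t w0 = src[0], w1 = src[1], w2 = src[2], w3 = src[3];
	uint64_t w4 = src[4], w5 = src[5], w6 = src[6], w7 = src[7];

	/* Phase 2: eight consecutive stores, with no loads written in between. */
	dst[0] = w0; dst[1] = w1; dst[2] = w2; dst[3] = w3;
	dst[4] = w4; dst[5] = w5; dst[6] = w6; dst[7] = w7;
}
```
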
> 
> This means that on modern ARM64 CPUs the rate at which __iowriteXX_copy()
> successfully generates large TLPs is very small (less than 1 in 10,000
> tries), to the point that the use of WC is pointless.
> 
> Implement __iowrite32/64_copy() specifically for ARM64 and use inline
> assembly to build consecutive blocks of STR instructions. Provide direct
> support for 64, 32 and 16 byte large TLP generation in this manner. Optimize
> for common constant lengths so that the compiler can directly inline the
> store blocks.
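
For illustration only, and making no claim about the patch's actual helper
names, here is a portable C sketch of the decomposition described above. A
copy of 64-bit words is split into 64, 32 and 16 byte blocks plus a single
trailing store if needed, and the copy ends with a DGH hint.
`sketch_block`, `sketch_dgh` and `sketch_iowrite64_copy` are hypothetical.
On arm64, `sketch_dgh()` emits `hint #6` (the DGH encoding); on other hosts
it is only a compiler barrier. When the function is inlined with a constant
`count`, the compiler can fold away the untaken branches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DGH (Data Gathering Hint) on arm64; a plain compiler barrier elsewhere. */
#if defined(__aarch64__)
#define sketch_dgh() __asm__ volatile("hint #6" ::: "memory")
#else
#define sketch_dgh() __asm__ volatile("" ::: "memory")
#endif

/* Copy n (<= 8) words: all loads into a local buffer, then all stores. */
static inline void sketch_block(volatile uint64_t *dst, const uint64_t *src,
				size_t n)
{
	uint64_t tmp[8];
	size_t i;

	for (i = 0; i < n; i++)
		tmp[i] = src[i];
	for (i = 0; i < n; i++)
		dst[i] = tmp[i];
}

/* Sketch only: 'count' is the number of 64-bit words to copy. */
static inline void sketch_iowrite64_copy(volatile void *to, const void *from,
					 size_t count)
{
	volatile uint64_t *dst = to;
	const uint64_t *src = from;

	/* Whole 64-byte blocks first. */
	while (count >= 8) {
		sketch_block(dst, src, 8);
		dst += 8;
		src += 8;
		count -= 8;
	}
	/* Then at most one 32-byte block and one 16-byte block. */
	if (count >= 4) {
		sketch_block(dst, src, 4);
		dst += 4;
		src += 4;
		count -= 4;
	}
	if (count >= 2) {
		sketch_block(dst, src, 2);
		dst += 2;
		src += 2;
		count -= 2;
	}
	/* A single leftover word, if any. */
	if (count)
		*dst = *src;

	/* Hint that later writes should not be gathered with these. */
	sketch_dgh();
}
```

For example, a 13-word copy in this sketch becomes one 64-byte block, one
32-byte block and one single 8-byte store.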
> 
> This brings the frequency of large TLP generation up to a high level that
> is comparable with older CPU generations.
> 
> As the __iowriteXX_copy() family of APIs is intended for use with WC,
> incorporate the DGH hint directly into the function.
> 
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: linux-arch@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Apart from the slightly more complicated code, I don't expect it to make
things worse on any of the existing hardware.

So, with the typo fix that Will mentioned:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
