From: Jason Gunthorpe <jgg@nvidia.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Leon Romanovsky <leon@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-rdma@vger.kernel.org, llvm@lists.linux.dev,
	Michael Guralnik <michaelgur@mellanox.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>
Subject: Re: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64
Date: Wed, 24 Jan 2024 09:27:19 -0400
Message-ID: <20240124132719.GF1455070@nvidia.com>
In-Reply-To: <ZbEFPbT7vl6HN4lk@arm.com>

On Wed, Jan 24, 2024 at 12:40:29PM +0000, Catalin Marinas wrote:

> > Just to be clear, that means we should drop this patch ("arm64/io: add
> > memcpy_toio_64") for now, right?
> 
> In its current form yes, but that doesn't mean that memcpy_toio_64()
> cannot be reworked differently.

I gave up on touching memcpy_toio_64(); it doesn't work very well
because of the weak alignment.

Instead I followed your suggestion to fix __iowrite64_copy().

There are only a couple of places that use this API:

drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:     __iowrite32_copy(mbox->reg.bar_reg, &init, sizeof(init) / 4);
drivers/mtd/nand/raw/mxc_nand.c:        __iowrite32_copy(trg, src, size / 4);
drivers/net/ethernet/amazon/ena/ena_eth_com.c:  __iowrite64_copy(io_sq->desc_addr.pbuf_dev_addr + dst_offset,
drivers/net/ethernet/broadcom/bnxt/bnxt.c:                      __iowrite64_copy(db, tx_push_buf, 16);
drivers/net/ethernet/broadcom/bnxt/bnxt.c:                      __iowrite32_copy(db + 4, tx_push_buf + 1,
drivers/net/ethernet/broadcom/bnxt/bnxt.c:                      __iowrite64_copy(db, tx_push_buf, push_len);
drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.c: __iowrite32_copy(bp->bar0 + bar_offset, data, msg_len / 4);
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:        __iowrite64_copy(ring->tqp->mem_base, desc,
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:        __iowrite64_copy(ring->tqp->mem_base + HNS3_MEM_DOORBELL_OFFSET,
drivers/net/ethernet/mellanox/mlx4/en_tx.c:     __iowrite64_copy(dst, src, bytecnt / 8);
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:#define myri10ge_pio_copy(to,from,size) __iowrite64_copy(to,from,size/8)
drivers/net/ethernet/sfc/tx.c:  __iowrite64_copy(*piobuf, data, block_len >> 3);
drivers/net/ethernet/sfc/tx.c:          __iowrite64_copy(*piobuf, copy_buf->buf,
drivers/net/ethernet/sfc/tx.c:          __iowrite64_copy(piobuf, copy_buf->buf,
drivers/net/ethernet/sfc/tx.c:          __iowrite64_copy(tx_queue->piobuf, skb->data,
drivers/net/wireless/mediatek/mt76/mmio.c:      __iowrite32_copy(dev->mmio.regs + offset, data, DIV_ROUND_UP(len, 4));
drivers/net/wireless/ralink/rt2x00/rt2x00mmio.h:        __iowrite32_copy(rt2x00dev->csr.base + offset, value, length >> 2);
drivers/remoteproc/mtk_scp_ipi.c:       __iowrite32_copy(dst + i, src + i, (len - i) / 4);
drivers/rpmsg/qcom_glink_rpm.c:         __iowrite32_copy(pipe->fifo + head, data,
drivers/rpmsg/qcom_glink_rpm.c:         __iowrite32_copy(pipe->fifo, data + len,
drivers/rpmsg/qcom_smd.c:               __iowrite32_copy(dst, src, count / sizeof(u32));
drivers/scsi/lpfc/lpfc_compat.h:        __iowrite32_copy(dest, src, bytes / sizeof(uint32_t));
drivers/slimbus/qcom-ctrl.c:    __iowrite32_copy(ctrl->base + tx_reg, buf, count);
drivers/soc/qcom/qcom_aoss.c:   __iowrite32_copy(qmp->msgram + qmp->offset + sizeof(u32),
drivers/soc/qcom/spm.c: __iowrite32_copy(addr, drv->reg_data->seq,
drivers/spi/spi-hisi-sfc-v3xx.c:                __iowrite32_copy(to, from, words);
sound/soc/intel/atom/sst/sst_loader.c:  __iowrite32_copy(dst, src, count / 4);
sound/soc/sof/iomem-utils.c:    __iowrite32_copy(dest, src, m);

I recognize at least the networking ones as performance paths, and we
don't want to degrade them.

The __iowrite64_copy() API is sufficient for the compiler to inline
the STP block, as this patch did.
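
As a rough illustration of why the API is sufficient: the count is given
in 64-bit words and is often a compile-time constant, so an inline
wrapper can route the 64-byte case to a fixed block. This is only a
userspace sketch under assumptions. All the sketch_* names are
hypothetical, __builtin_constant_p() is a GCC/clang builtin, and plain
volatile stores stand in for the arm64 STP sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the generic out-of-line copy loop. */
static void sketch_iowrite64_copy_slow(volatile uint64_t *to,
				       const uint64_t *from, size_t count)
{
	while (count--)
		*to++ = *from++;
}

/*
 * Hypothetical fixed 64-byte block. Plain volatile stores are used as a
 * placeholder; on arm64 this would be a single inline assembly block.
 */
static inline void sketch_iowrite64_copy_8(volatile uint64_t *to,
					   const uint64_t *from)
{
	to[0] = from[0];
	to[1] = from[1];
	to[2] = from[2];
	to[3] = from[3];
	to[4] = from[4];
	to[5] = from[5];
	to[6] = from[6];
	to[7] = from[7];
}

/* count is in 64-bit words, not bytes, matching the callers listed above. */
static inline void sketch_iowrite64_copy(volatile uint64_t *to,
					 const uint64_t *from, size_t count)
{
	if (__builtin_constant_p(count) && count == 8)
		sketch_iowrite64_copy_8(to, from);
	else
		sketch_iowrite64_copy_slow(to, from, count);
}
```

Without optimization the builtin evaluates to 0 and every call takes the
generic loop, so the result is the same either way; only the generated
code differs.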

I experimented with having memcpy_toio_64() invoke __iowrite64_copy(),
but it did not look very nice. There may be a performance win there;
I don't know.

> > > If eight STRs without other operations interleaved give us the
> > > write-combining on most CPUs (with Normal NC), we should go with this
> > > instead of STP.
> > 
> > Agreed; I've sent out a patch to allow the offset addressing at:
> > 
> >   https://lore.kernel.org/linux-arm-kernel/20240124111259.874975-1-mark.rutland@arm.com/
> > 
> > ... and it should be possible to build atop that to use eight STRs.
> 
> That's great, thanks.

It is a nice patch, but it does not really help with this problem. The
compiler cannot be trusted to use the new writeq() properly, e.g. clang
doesn't optimize the new constraint at all.

Regardless, this has to be a fixed inline assembly block of either STR
or STP.
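
To make "fixed inline assembly block" concrete, here is one shape the
STR variant could take. It is a sketch under assumptions, not the actual
patch: sketch_write64B() is a hypothetical name, the "rZ" constraint and
%x modifier follow the style of the existing arm64 writeq(), and the
non-arm64 branch exists only so the example builds elsewhere.

```c
#include <stdint.h>

/*
 * Hypothetical helper: emit eight 64-bit stores to consecutive offsets
 * from one base register with nothing interleaved between them.
 */
static inline void sketch_write64B(volatile void *to, const uint64_t *from)
{
#if defined(__aarch64__)
	/* Operands 0-7 are the data words, operand 8 is the base address. */
	__asm__ volatile("str %x0, [%8, #0]\n"
			 "str %x1, [%8, #8]\n"
			 "str %x2, [%8, #16]\n"
			 "str %x3, [%8, #24]\n"
			 "str %x4, [%8, #32]\n"
			 "str %x5, [%8, #40]\n"
			 "str %x6, [%8, #48]\n"
			 "str %x7, [%8, #56]\n"
			 :
			 : "rZ"(from[0]), "rZ"(from[1]), "rZ"(from[2]),
			   "rZ"(from[3]), "rZ"(from[4]), "rZ"(from[5]),
			   "rZ"(from[6]), "rZ"(from[7]), "r"(to)
			 : "memory");
#else
	/* Portable placeholder; the compiler decides how to emit these. */
	volatile uint64_t *p = to;

	p[0] = from[0];
	p[1] = from[1];
	p[2] = from[2];
	p[3] = from[3];
	p[4] = from[4];
	p[5] = from[5];
	p[6] = from[6];
	p[7] = from[7];
#endif
}
```

Because the stores sit in a single asm statement, the compiler cannot
reorder them, split them, or schedule other instructions between them.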

Jason
