netdev.vger.kernel.org archive mirror
From: Dust Li <dust.li@linux.alibaba.com>
To: Alexandra Winter <wintera@linux.ibm.com>,
	Wenjia Zhang <wenjia@linux.ibm.com>,
	Jan Karcher <jaka@linux.ibm.com>,
	Gerd Bayer <gbayer@linux.ibm.com>,
	Halil Pasic <pasic@linux.ibm.com>,
	"D. Wythe" <alibuda@linux.alibaba.com>,
	Tony Lu <tonylu@linux.alibaba.com>,
	Wen Gu <guwen@linux.alibaba.com>,
	Peter Oberparleiter <oberpar@linux.ibm.com>,
	David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Eric Dumazet <edumazet@google.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Julian Ruess <julianr@linux.ibm.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Thorsten Winkler <twinkler@linux.ibm.com>,
	netdev@vger.kernel.org, linux-s390@vger.kernel.org,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Simon Horman <horms@kernel.org>
Subject: Re: [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism
Date: Mon, 20 Jan 2025 11:55:25 +0800	[thread overview]
Message-ID: <20250120035525.GK89233@linux.alibaba.com> (raw)
In-Reply-To: <20250115195527.2094320-6-wintera@linux.ibm.com>

On 2025-01-15 20:55:25, Alexandra Winter wrote:
>The first stage of ism_loopback was implemented as part of the
>SMC module [1]. Now that we have the ism layer, provide
>access to the ism_loopback device to all ism clients.
>
>Move ism_loopback.* from net/smc to net/ism.
>The following changes are required to ism_loopback.c:
>- Change ism_lo_move_data() to no longer schedule an smcd receive tasklet,
>but instead call ism_client->handle_irq().
>Note: In this RFC patch ism_loopback is not fully generic.
>  The smc-d client uses attached buffers for moves without signalling,
>  and not-attached buffers for moves with signalling.
>  ism_lo_move_data() must not rely on that assumption.
>  ism_lo_move_data() must be able to handle more than one ism client.
>
>In addition the following changes are required to unify ism_loopback and
>ism_vp:
>
>In ism layer and ism_vpci:
>ism_loopback is not backed by a pci device, so use dev instead of pdev in
>ism_dev.
>
>In smc-d:
>In smcd_alloc_dev():
>- use kernel memory instead of device memory for smcd_dev and smcd->conn.
>  An alternative would be to ask the device to allocate the memory.
>- use different smcd_ops and max_dmbs for ism_vp and ism_loopback.
>    A future patch can change smc-d to directly use ism_ops instead of
>    smcd_ops.
>- use ism dev_name instead of pci dev name for ism_evt_wq name
>- allocate an event workqueue for ism_loopback, although it currently does
>  not generate events.
>
>Link: https://lore.kernel.org/linux-kernel//20240428060738.60843-1-guwen@linux.alibaba.com/ [1]
>
>Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
>---
> drivers/s390/net/ism.h     |   6 +-
> drivers/s390/net/ism_drv.c |  31 ++-
> include/linux/ism.h        |  59 +++++
> include/net/smc.h          |   4 +-
> net/ism/Kconfig            |  13 ++
> net/ism/Makefile           |   1 +
> net/ism/ism_loopback.c     | 366 +++++++++++++++++++++++++++++++
> net/ism/ism_loopback.h     |  59 +++++
> net/ism/ism_main.c         |  11 +-
> net/smc/Kconfig            |  13 --
> net/smc/Makefile           |   1 -
> net/smc/af_smc.c           |  12 +-
> net/smc/smc_ism.c          | 108 +++++++---
> net/smc/smc_loopback.c     | 427 -------------------------------------
> net/smc/smc_loopback.h     |  60 ------
> 15 files changed, 606 insertions(+), 565 deletions(-)
> create mode 100644 net/ism/ism_loopback.c
> create mode 100644 net/ism/ism_loopback.h
> delete mode 100644 net/smc/smc_loopback.c
> delete mode 100644 net/smc/smc_loopback.h
>
>diff --git a/drivers/s390/net/ism.h b/drivers/s390/net/ism.h
>index 61cf10334170..0deca6d0e328 100644
>--- a/drivers/s390/net/ism.h
>+++ b/drivers/s390/net/ism.h
>@@ -202,7 +202,7 @@ struct ism_sba {
> static inline void __ism_read_cmd(struct ism_dev *ism, void *data,
> 				  unsigned long offset, unsigned long len)
> {
>-	struct zpci_dev *zdev = to_zpci(ism->pdev);
>+	struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> 	u64 req = ZPCI_CREATE_REQ(zdev->fh, 2, 8);
> 
> 	while (len > 0) {
>@@ -216,7 +216,7 @@ static inline void __ism_read_cmd(struct ism_dev *ism, void *data,
> static inline void __ism_write_cmd(struct ism_dev *ism, void *data,
> 				   unsigned long offset, unsigned long len)
> {
>-	struct zpci_dev *zdev = to_zpci(ism->pdev);
>+	struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> 	u64 req = ZPCI_CREATE_REQ(zdev->fh, 2, len);
> 
> 	if (len)
>@@ -226,7 +226,7 @@ static inline void __ism_write_cmd(struct ism_dev *ism, void *data,
> static inline int __ism_move(struct ism_dev *ism, u64 dmb_req, void *data,
> 			     unsigned int size)
> {
>-	struct zpci_dev *zdev = to_zpci(ism->pdev);
>+	struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> 	u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, size);
> 
> 	return __zpci_store_block(data, req, dmb_req);
>diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
>index ab70debbdd9d..c0954d6dd9f5 100644
>--- a/drivers/s390/net/ism_drv.c
>+++ b/drivers/s390/net/ism_drv.c
>@@ -88,7 +88,7 @@ static int register_sba(struct ism_dev *ism)
> 	dma_addr_t dma_handle;
> 	struct ism_sba *sba;
> 
>-	sba = dma_alloc_coherent(&ism->pdev->dev, PAGE_SIZE, &dma_handle,
>+	sba = dma_alloc_coherent(ism->dev.parent, PAGE_SIZE, &dma_handle,
> 				 GFP_KERNEL);
> 	if (!sba)
> 		return -ENOMEM;
>@@ -99,7 +99,7 @@ static int register_sba(struct ism_dev *ism)
> 	cmd.request.sba = dma_handle;
> 
> 	if (ism_cmd(ism, &cmd)) {
>-		dma_free_coherent(&ism->pdev->dev, PAGE_SIZE, sba, dma_handle);
>+		dma_free_coherent(ism->dev.parent, PAGE_SIZE, sba, dma_handle);
> 		return -EIO;
> 	}
> 
>@@ -115,7 +115,7 @@ static int register_ieq(struct ism_dev *ism)
> 	dma_addr_t dma_handle;
> 	struct ism_eq *ieq;
> 
>-	ieq = dma_alloc_coherent(&ism->pdev->dev, PAGE_SIZE, &dma_handle,
>+	ieq = dma_alloc_coherent(ism->dev.parent, PAGE_SIZE, &dma_handle,
> 				 GFP_KERNEL);
> 	if (!ieq)
> 		return -ENOMEM;
>@@ -127,7 +127,7 @@ static int register_ieq(struct ism_dev *ism)
> 	cmd.request.len = sizeof(*ieq);
> 
> 	if (ism_cmd(ism, &cmd)) {
>-		dma_free_coherent(&ism->pdev->dev, PAGE_SIZE, ieq, dma_handle);
>+		dma_free_coherent(ism->dev.parent, PAGE_SIZE, ieq, dma_handle);
> 		return -EIO;
> 	}
> 
>@@ -149,7 +149,7 @@ static int unregister_sba(struct ism_dev *ism)
> 	if (ret && ret != ISM_ERROR)
> 		return -EIO;
> 
>-	dma_free_coherent(&ism->pdev->dev, PAGE_SIZE,
>+	dma_free_coherent(ism->dev.parent, PAGE_SIZE,
> 			  ism->sba, ism->sba_dma_addr);
> 
> 	ism->sba = NULL;
>@@ -169,7 +169,7 @@ static int unregister_ieq(struct ism_dev *ism)
> 	if (ret && ret != ISM_ERROR)
> 		return -EIO;
> 
>-	dma_free_coherent(&ism->pdev->dev, PAGE_SIZE,
>+	dma_free_coherent(ism->dev.parent, PAGE_SIZE,
> 			  ism->ieq, ism->ieq_dma_addr);
> 
> 	ism->ieq = NULL;
>@@ -216,7 +216,7 @@ static int ism_query_rgid(struct ism_dev *ism, uuid_t *rgid, u32 vid_valid,
> static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> {
> 	clear_bit(dmb->sba_idx, ism->sba_bitmap);
>-	dma_unmap_page(&ism->pdev->dev, dmb->dma_addr, dmb->dmb_len,
>+	dma_unmap_page(ism->dev.parent, dmb->dma_addr, dmb->dmb_len,
> 		       DMA_FROM_DEVICE);
> 	folio_put(virt_to_folio(dmb->cpu_addr));
> }
>@@ -227,7 +227,7 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> 	unsigned long bit;
> 	int rc;
> 
>-	if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(&ism->pdev->dev))
>+	if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(ism->dev.parent))
> 		return -EINVAL;
> 
> 	if (!dmb->sba_idx) {
>@@ -251,10 +251,10 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> 	}
> 
> 	dmb->cpu_addr = folio_address(folio);
>-	dmb->dma_addr = dma_map_page(&ism->pdev->dev,
>+	dmb->dma_addr = dma_map_page(ism->dev.parent,
> 				     virt_to_page(dmb->cpu_addr), 0,
> 				     dmb->dmb_len, DMA_FROM_DEVICE);
>-	if (dma_mapping_error(&ism->pdev->dev, dmb->dma_addr)) {
>+	if (dma_mapping_error(ism->dev.parent, dmb->dma_addr)) {
> 		rc = -ENOMEM;
> 		goto out_free;
> 	}
>@@ -419,10 +419,7 @@ static int ism_supports_v2(void)
> 
> static u16 ism_get_chid(struct ism_dev *ism)
> {
>-	if (!ism || !ism->pdev)
>-		return 0;
>-
>-	return to_zpci(ism->pdev)->pchid;
>+	return to_zpci(to_pci_dev(ism->dev.parent))->pchid;
> }
> 
> static void ism_handle_event(struct ism_dev *ism)
>@@ -499,7 +496,7 @@ static const struct ism_ops ism_vp_ops = {
> 
> static int ism_dev_init(struct ism_dev *ism)
> {
>-	struct pci_dev *pdev = ism->pdev;
>+	struct pci_dev *pdev = to_pci_dev(ism->dev.parent);
> 	int ret;
> 
> 	ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI);
>@@ -565,7 +562,6 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> 
> 	spin_lock_init(&ism->lock);
> 	dev_set_drvdata(&pdev->dev, ism);
>-	ism->pdev = pdev;
> 	ism->dev.parent = &pdev->dev;
> 	device_initialize(&ism->dev);
> 	dev_set_name(&ism->dev, dev_name(&pdev->dev));
>@@ -603,14 +599,13 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> 	device_del(&ism->dev);
> err_dev:
> 	dev_set_drvdata(&pdev->dev, NULL);
>-	kfree(ism);
> 
> 	return ret;
> }
> 
> static void ism_dev_exit(struct ism_dev *ism)
> {
>-	struct pci_dev *pdev = ism->pdev;
>+	struct pci_dev *pdev = to_pci_dev(ism->dev.parent);
> 	unsigned long flags;
> 	int i;
> 
>diff --git a/include/linux/ism.h b/include/linux/ism.h
>index bc165d077071..929a1f275419 100644
>--- a/include/linux/ism.h
>+++ b/include/linux/ism.h
>@@ -144,6 +144,9 @@ int  ism_unregister_client(struct ism_client *client);
>  *	identified by dmb_tok and idx. If signal flag (sf) then signal
>  *	the remote peer that data has arrived in this dmb.
>  *
>+ * int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>+ *	Unregister an ism_dmb buffer
>+ *
>  * int (*supports_v2)(void);
>  *
>  * u16 (*get_chid)(struct ism_dev *dev);
>@@ -218,12 +221,63 @@ struct ism_ops {
> 	int (*reset_vlan_required)(struct ism_dev *dev);
> 	int (*signal_event)(struct ism_dev *dev, uuid_t *rgid,
> 			    u32 trigger_irq, u32 event_code, u64 info);
>+/* no copy option
>+ * --------------
>+ */
>+	/**
>+	 * support_dmb_nocopy() - does this device provide no-copy option?
>+	 * @dev: ism device
>+	 *
>+	 * In addition to using move_data(), a sender device can provide a
>+	 * kernel address + length, that represents a target dmb
>+	 * (like MMIO). If a sender writes into such a ghost-send-buffer
>+	 * (= at this kernel address) the data will automatically
>+	 * immediately appear in the target dmb, even without calling
>+	 * move_data().
>+	 * Note that this is NOT related to the MSG_ZEROCOPY socket flag.
>+	 *
>+	 * Either all 3 function pointers for support_dmb_nocopy(),
>+	 * attach_dmb() and detach_dmb() are defined, or all of them must
>+	 * be NULL.
>+	 *
>+	 * Return: non-zero, if no-copy is supported.
>+	 */
>+	int (*support_dmb_nocopy)(struct ism_dev *dev);
>+	/**
>+	 * attach_dmb() - attach local memory to a remote dmb
>+	 * @dev: Local sending ism device
>+	 * @dmb: all other parameters are passed in the form of a
>+	 *	 dmb struct
>+	 *	 TODO: (THIS IS CONFUSING, should be changed)

Agree.

>+	 *  dmb_tok: (in) Token of the remote dmb, we want to attach to
>+	 *  cpu_addr: (out) MMIO address
>+	 *  dma_addr: (out) MMIO address (if applicable, invalid otherwise)
>+	 *  dmb_len: (out) length of local MMIO region,
>+	 *           equal to length of remote DMB.
>+	 *  sba_idx: (out) index of remote dmb (NOT HELPFUL, should be removed)
>+	 *
>+	 * Provides a memory address to the sender that can be used to
>+	 * directly write into the remote dmb.
>+	 *
>+	 * Return: Zero upon success, Error code otherwise
>+	 */
>+	int (*attach_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>+	/**
>+	 * detach_dmb() - Detach the ghost buffer from a remote dmb
>+	 * @dev: ism device
>+	 * @token: dmb token of the remote dmb
>+	 * Return: Zero upon success, Error code otherwise
>+	 */
>+	int (*detach_dmb)(struct ism_dev *dev, u64 token);
> };
> 

...

>+
>+static int ism_lo_move_data(struct ism_dev *ism, u64 dmb_tok,
>+			    unsigned int idx, bool sf, unsigned int offset,
>+			    void *data, unsigned int size)
>+{
>+	struct ism_lo_dmb_node *rmb_node = NULL, *tmp_node;
>+	struct ism_lo_dev *ldev;
>+	u16 s_mask;
>+	u8 client_id;
>+	u32 sba_idx;
>+
>+	ldev = container_of(ism, struct ism_lo_dev, ism);
>+
>+	if (!sf)
>+		/* since sndbuf is merged with peer DMB, there is
>+		 * no need to copy data from sndbuf to peer DMB.
>+		 */
>+		return 0;
>+
>+	read_lock_bh(&ldev->dmb_ht_lock);
>+	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_tok) {
>+		if (tmp_node->token == dmb_tok) {
>+			rmb_node = tmp_node;
>+			break;
>+		}
>+	}
>+	if (!rmb_node) {
>+		read_unlock_bh(&ldev->dmb_ht_lock);
>+		return -EINVAL;
>+	}
>+	// So why copy the data now?? SMC usecase? Data buffer is attached,
>+	// rw-pointer are not attached?

I understand the confusion here. I had the same confusion the first
time I saw this.

This is actually the tricky part: it assumes the CDC messages are sent
with signalling, while the data is not. We need to copy the CDC
messages, so the copy here applies only to the CDC.

I think we should refine the move_data() API to make this clearer.

Best regards,
Dust

