From: Dust Li <dust.li@linux.alibaba.com>
To: Alexandra Winter <wintera@linux.ibm.com>,
Wenjia Zhang <wenjia@linux.ibm.com>,
Jan Karcher <jaka@linux.ibm.com>,
Gerd Bayer <gbayer@linux.ibm.com>,
Halil Pasic <pasic@linux.ibm.com>,
"D. Wythe" <alibuda@linux.alibaba.com>,
Tony Lu <tonylu@linux.alibaba.com>,
Wen Gu <guwen@linux.alibaba.com>,
Peter Oberparleiter <oberpar@linux.ibm.com>,
David Miller <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Eric Dumazet <edumazet@google.com>,
Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Julian Ruess <julianr@linux.ibm.com>,
Niklas Schnelle <schnelle@linux.ibm.com>,
Thorsten Winkler <twinkler@linux.ibm.com>,
netdev@vger.kernel.org, linux-s390@vger.kernel.org,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
Simon Horman <horms@kernel.org>
Subject: Re: [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism
Date: Mon, 20 Jan 2025 11:55:25 +0800 [thread overview]
Message-ID: <20250120035525.GK89233@linux.alibaba.com> (raw)
In-Reply-To: <20250115195527.2094320-6-wintera@linux.ibm.com>
On 2025-01-15 20:55:25, Alexandra Winter wrote:
>The first stage of ism_loopback was implemented as part of the
>SMC module [1]. Now that we have the ism layer, provide
>access to the ism_loopback device to all ism clients.
>
>Move ism_loopback.* from net/smc to net/ism.
>The following changes are required to ism_loopback.c:
>- Change ism_lo_move_data() to no longer schedule an smcd receive tasklet,
>but instead call ism_client->handle_irq().
>Note: In this RFC patch ism_loppback is not fully generic.
> The smc-d client uses attached buffers, for moves without signalling.
> and not-attached buffers for moves with signalling.
> ism_lo_move_data() must not rely on that assumption.
> ism_lo_move_data() must be able to handle more than one ism client.
>
>In addition the following changes are required to unify ism_loopback and
>ism_vp:
>
>In ism layer and ism_vpci:
>ism_loopback is not backed by a pci device, so use dev instead of pdev in
>ism_dev.
>
>In smc-d:
>in smcd_alloc_dev():
>- use kernel memory instead of device memory for smcd_dev and smcd->conn.
> An alternative would be to ask device to alloc the memory.
>- use different smcd_ops and max_dmbs for ism_vp and ism_loopback.
> A future patch can change smc-d to directly use ism_ops instead of
> smcd_ops.
>- use ism dev_name instead of pci dev name for ism_evt_wq name
>- allocate an event workqueue for ism_loopback, although it currently does
> not generate events.
>
>Link: https://lore.kernel.org/linux-kernel//20240428060738.60843-1-guwen@linux.alibaba.com/ [1]
>
>Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
>---
> drivers/s390/net/ism.h | 6 +-
> drivers/s390/net/ism_drv.c | 31 ++-
> include/linux/ism.h | 59 +++++
> include/net/smc.h | 4 +-
> net/ism/Kconfig | 13 ++
> net/ism/Makefile | 1 +
> net/ism/ism_loopback.c | 366 +++++++++++++++++++++++++++++++
> net/ism/ism_loopback.h | 59 +++++
> net/ism/ism_main.c | 11 +-
> net/smc/Kconfig | 13 --
> net/smc/Makefile | 1 -
> net/smc/af_smc.c | 12 +-
> net/smc/smc_ism.c | 108 +++++++---
> net/smc/smc_loopback.c | 427 -------------------------------------
> net/smc/smc_loopback.h | 60 ------
> 15 files changed, 606 insertions(+), 565 deletions(-)
> create mode 100644 net/ism/ism_loopback.c
> create mode 100644 net/ism/ism_loopback.h
> delete mode 100644 net/smc/smc_loopback.c
> delete mode 100644 net/smc/smc_loopback.h
>
>diff --git a/drivers/s390/net/ism.h b/drivers/s390/net/ism.h
>index 61cf10334170..0deca6d0e328 100644
>--- a/drivers/s390/net/ism.h
>+++ b/drivers/s390/net/ism.h
>@@ -202,7 +202,7 @@ struct ism_sba {
> static inline void __ism_read_cmd(struct ism_dev *ism, void *data,
> unsigned long offset, unsigned long len)
> {
>- struct zpci_dev *zdev = to_zpci(ism->pdev);
>+ struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> u64 req = ZPCI_CREATE_REQ(zdev->fh, 2, 8);
>
> while (len > 0) {
>@@ -216,7 +216,7 @@ static inline void __ism_read_cmd(struct ism_dev *ism, void *data,
> static inline void __ism_write_cmd(struct ism_dev *ism, void *data,
> unsigned long offset, unsigned long len)
> {
>- struct zpci_dev *zdev = to_zpci(ism->pdev);
>+ struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> u64 req = ZPCI_CREATE_REQ(zdev->fh, 2, len);
>
> if (len)
>@@ -226,7 +226,7 @@ static inline void __ism_write_cmd(struct ism_dev *ism, void *data,
> static inline int __ism_move(struct ism_dev *ism, u64 dmb_req, void *data,
> unsigned int size)
> {
>- struct zpci_dev *zdev = to_zpci(ism->pdev);
>+ struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, size);
>
> return __zpci_store_block(data, req, dmb_req);
>diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
>index ab70debbdd9d..c0954d6dd9f5 100644
>--- a/drivers/s390/net/ism_drv.c
>+++ b/drivers/s390/net/ism_drv.c
>@@ -88,7 +88,7 @@ static int register_sba(struct ism_dev *ism)
> dma_addr_t dma_handle;
> struct ism_sba *sba;
>
>- sba = dma_alloc_coherent(&ism->pdev->dev, PAGE_SIZE, &dma_handle,
>+ sba = dma_alloc_coherent(ism->dev.parent, PAGE_SIZE, &dma_handle,
> GFP_KERNEL);
> if (!sba)
> return -ENOMEM;
>@@ -99,7 +99,7 @@ static int register_sba(struct ism_dev *ism)
> cmd.request.sba = dma_handle;
>
> if (ism_cmd(ism, &cmd)) {
>- dma_free_coherent(&ism->pdev->dev, PAGE_SIZE, sba, dma_handle);
>+ dma_free_coherent(ism->dev.parent, PAGE_SIZE, sba, dma_handle);
> return -EIO;
> }
>
>@@ -115,7 +115,7 @@ static int register_ieq(struct ism_dev *ism)
> dma_addr_t dma_handle;
> struct ism_eq *ieq;
>
>- ieq = dma_alloc_coherent(&ism->pdev->dev, PAGE_SIZE, &dma_handle,
>+ ieq = dma_alloc_coherent(ism->dev.parent, PAGE_SIZE, &dma_handle,
> GFP_KERNEL);
> if (!ieq)
> return -ENOMEM;
>@@ -127,7 +127,7 @@ static int register_ieq(struct ism_dev *ism)
> cmd.request.len = sizeof(*ieq);
>
> if (ism_cmd(ism, &cmd)) {
>- dma_free_coherent(&ism->pdev->dev, PAGE_SIZE, ieq, dma_handle);
>+ dma_free_coherent(ism->dev.parent, PAGE_SIZE, ieq, dma_handle);
> return -EIO;
> }
>
>@@ -149,7 +149,7 @@ static int unregister_sba(struct ism_dev *ism)
> if (ret && ret != ISM_ERROR)
> return -EIO;
>
>- dma_free_coherent(&ism->pdev->dev, PAGE_SIZE,
>+ dma_free_coherent(ism->dev.parent, PAGE_SIZE,
> ism->sba, ism->sba_dma_addr);
>
> ism->sba = NULL;
>@@ -169,7 +169,7 @@ static int unregister_ieq(struct ism_dev *ism)
> if (ret && ret != ISM_ERROR)
> return -EIO;
>
>- dma_free_coherent(&ism->pdev->dev, PAGE_SIZE,
>+ dma_free_coherent(ism->dev.parent, PAGE_SIZE,
> ism->ieq, ism->ieq_dma_addr);
>
> ism->ieq = NULL;
>@@ -216,7 +216,7 @@ static int ism_query_rgid(struct ism_dev *ism, uuid_t *rgid, u32 vid_valid,
> static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> {
> clear_bit(dmb->sba_idx, ism->sba_bitmap);
>- dma_unmap_page(&ism->pdev->dev, dmb->dma_addr, dmb->dmb_len,
>+ dma_unmap_page(ism->dev.parent, dmb->dma_addr, dmb->dmb_len,
> DMA_FROM_DEVICE);
> folio_put(virt_to_folio(dmb->cpu_addr));
> }
>@@ -227,7 +227,7 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> unsigned long bit;
> int rc;
>
>- if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(&ism->pdev->dev))
>+ if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(ism->dev.parent))
> return -EINVAL;
>
> if (!dmb->sba_idx) {
>@@ -251,10 +251,10 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> }
>
> dmb->cpu_addr = folio_address(folio);
>- dmb->dma_addr = dma_map_page(&ism->pdev->dev,
>+ dmb->dma_addr = dma_map_page(ism->dev.parent,
> virt_to_page(dmb->cpu_addr), 0,
> dmb->dmb_len, DMA_FROM_DEVICE);
>- if (dma_mapping_error(&ism->pdev->dev, dmb->dma_addr)) {
>+ if (dma_mapping_error(ism->dev.parent, dmb->dma_addr)) {
> rc = -ENOMEM;
> goto out_free;
> }
>@@ -419,10 +419,7 @@ static int ism_supports_v2(void)
>
> static u16 ism_get_chid(struct ism_dev *ism)
> {
>- if (!ism || !ism->pdev)
>- return 0;
>-
>- return to_zpci(ism->pdev)->pchid;
>+ return to_zpci(to_pci_dev(ism->dev.parent))->pchid;
> }
>
> static void ism_handle_event(struct ism_dev *ism)
>@@ -499,7 +496,7 @@ static const struct ism_ops ism_vp_ops = {
>
> static int ism_dev_init(struct ism_dev *ism)
> {
>- struct pci_dev *pdev = ism->pdev;
>+ struct pci_dev *pdev = to_pci_dev(ism->dev.parent);
> int ret;
>
> ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI);
>@@ -565,7 +562,6 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>
> spin_lock_init(&ism->lock);
> dev_set_drvdata(&pdev->dev, ism);
>- ism->pdev = pdev;
> ism->dev.parent = &pdev->dev;
> device_initialize(&ism->dev);
> dev_set_name(&ism->dev, dev_name(&pdev->dev));
>@@ -603,14 +599,13 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> device_del(&ism->dev);
> err_dev:
> dev_set_drvdata(&pdev->dev, NULL);
>- kfree(ism);
>
> return ret;
> }
>
> static void ism_dev_exit(struct ism_dev *ism)
> {
>- struct pci_dev *pdev = ism->pdev;
>+ struct pci_dev *pdev = to_pci_dev(ism->dev.parent);
> unsigned long flags;
> int i;
>
>diff --git a/include/linux/ism.h b/include/linux/ism.h
>index bc165d077071..929a1f275419 100644
>--- a/include/linux/ism.h
>+++ b/include/linux/ism.h
>@@ -144,6 +144,9 @@ int ism_unregister_client(struct ism_client *client);
> * identified by dmb_tok and idx. If signal flag (sf) then signal
> * the remote peer that data has arrived in this dmb.
> *
>+ * int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>+ * Unregister an ism_dmb buffer
>+ *
> * int (*supports_v2)(void);
> *
> * u16 (*get_chid)(struct ism_dev *dev);
>@@ -218,12 +221,63 @@ struct ism_ops {
> int (*reset_vlan_required)(struct ism_dev *dev);
> int (*signal_event)(struct ism_dev *dev, uuid_t *rgid,
> u32 trigger_irq, u32 event_code, u64 info);
>+/* no copy option
>+ * --------------
>+ */
>+ /**
>+ * support_dmb_nocopy() - does this device provide no-copy option?
>+ * @dev: ism device
>+ *
>+ * In addition to using move_data(), a sender device can provide a
>+ * kernel address + length, that represents a target dmb
>+ * (like MMIO). If a sender writes into such a ghost-send-buffer
>+ * (= at this kernel address) the data will automatically
>+ * immediately appear in the target dmb, even without calling
>+ * move_data().
>+ * Note that this is NOT related to the MSG_ZEROCOPY socket flag.
>+ *
>+ * Either all 3 function pointers for support_dmb_nocopy(),
>+ * attach_dmb() and detach_dmb() are defined, or all of them must
>+ * be NULL.
>+ *
>+ * Return: non-zero, if no-copy is supported.
>+ */
>+ int (*support_dmb_nocopy)(struct ism_dev *dev);
>+ /**
>+ * attach_dmb() - attach local memory to a remote dmb
>+ * @dev: Local sending ism device
>+ * @dmb: all other parameters are passed in the form of a
>+ * dmb struct
>+ * TODO: (THIS IS CONFUSING, should be changed)
Agree.
>+ * dmb_tok: (in) Token of the remote dmb, we want to attach to
>+ * cpu_addr: (out) MMIO address
>+ * dma_addr: (out) MMIO address (if applicable, invalid otherwise)
>+ * dmb_len: (out) length of local MMIO region,
>+ * equal to length of remote DMB.
>+ * sba_idx: (out) index of remote dmb (NOT HELPFUL, should be removed)
>+ *
>+ * Provides a memory address to the sender that can be used to
>+ * directly write into the remote dmb.
>+ *
>+ * Return: Zero upon success, Error code otherwise
>+ */
>+ int (*attach_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>+ /**
>+ * detach_dmb() - Detach the ghost buffer from a remote dmb
>+ * @dev: ism device
>+ * @token: dmb token of the remote dmb
>+ * Return: Zero upon success, Error code otherwise
>+ */
>+ int (*detach_dmb)(struct ism_dev *dev, u64 token);
> };
>
...
>+
>+static int ism_lo_move_data(struct ism_dev *ism, u64 dmb_tok,
>+ unsigned int idx, bool sf, unsigned int offset,
>+ void *data, unsigned int size)
>+{
>+ struct ism_lo_dmb_node *rmb_node = NULL, *tmp_node;
>+ struct ism_lo_dev *ldev;
>+ u16 s_mask;
>+ u8 client_id;
>+ u32 sba_idx;
>+
>+ ldev = container_of(ism, struct ism_lo_dev, ism);
>+
>+ if (!sf)
>+ /* since sndbuf is merged with peer DMB, there is
>+ * no need to copy data from sndbuf to peer DMB.
>+ */
>+ return 0;
>+
>+ read_lock_bh(&ldev->dmb_ht_lock);
>+ hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_tok) {
>+ if (tmp_node->token == dmb_tok) {
>+ rmb_node = tmp_node;
>+ break;
>+ }
>+ }
>+ if (!rmb_node) {
>+ read_unlock_bh(&ldev->dmb_ht_lock);
>+ return -EINVAL;
>+ }
>+ // So why copy the data now?? SMC usecase? Data buffer is attached,
>+ // rw-pointer are not attached?
I understand the confusion here. I have the same confusion the first time
I saw this.
This is actually the tricky part: it assumes the CDC will signal, while
the data will not. We need to copy the CDC, so the copy here only to the
CDC.
I think we should refine the move_data() API to make this clearer.
Best regards,
Dust
next prev parent reply other threads:[~2025-01-20 3:55 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-15 19:55 [RFC net-next 0/7] Provide an ism layer Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 1/7] net/ism: Create net/ism Alexandra Winter
2025-01-16 20:08 ` Andrew Lunn
2025-01-17 12:06 ` Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 2/7] net/ism: Remove dependencies between ISM_VPCI and SMC Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 3/7] net/ism: Use uuid_t for ISM GID Alexandra Winter
2025-01-20 17:18 ` Simon Horman
2025-01-22 14:46 ` Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions Alexandra Winter
2025-01-15 22:06 ` Halil Pasic
2025-01-20 6:32 ` Dust Li
2025-01-20 9:56 ` Alexandra Winter
2025-01-20 10:07 ` Julian Ruess
2025-01-20 11:35 ` Alexandra Winter
2025-01-20 10:34 ` Niklas Schnelle
2025-01-22 15:02 ` Dust Li
2025-01-15 19:55 ` [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism Alexandra Winter
2025-01-20 3:55 ` Dust Li [this message]
2025-01-20 9:31 ` Alexandra Winter
2025-02-06 17:36 ` Julian Ruess
2025-02-10 10:39 ` Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 6/7] s390/ism: Define ismvp_dev Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 7/7] net/smc: Use only ism_ops Alexandra Winter
2025-01-16 9:32 ` [RFC net-next 0/7] Provide an ism layer Dust Li
2025-01-16 11:55 ` Julian Ruess
2025-01-16 16:17 ` Alexandra Winter
2025-01-16 17:08 ` Julian Ruess
2025-01-17 2:13 ` Dust Li
2025-01-17 10:38 ` Niklas Schnelle
2025-01-17 15:02 ` Andrew Lunn
2025-01-17 16:00 ` Niklas Schnelle
2025-01-17 16:33 ` Andrew Lunn
2025-01-17 16:57 ` Niklas Schnelle
2025-01-17 20:29 ` Andrew Lunn
2025-01-20 6:21 ` Dust Li
2025-01-20 12:03 ` Alexandra Winter
2025-01-20 16:01 ` Andrew Lunn
2025-01-20 17:25 ` Alexandra Winter
2025-01-18 15:31 ` Dust Li
2025-01-28 16:04 ` Alexandra Winter
2025-02-10 5:08 ` Dust Li
2025-02-10 9:38 ` Alexandra Winter
2025-02-11 1:57 ` Dust Li
2025-02-16 15:40 ` Wen Gu
2025-02-19 11:25 ` [RFC net-next 0/7] Provide an ism layer - naming Alexandra Winter
2025-02-25 1:36 ` Dust Li
2025-02-25 8:40 ` Alexandra Winter
2025-01-17 13:00 ` [RFC net-next 0/7] Provide an ism layer Alexandra Winter
2025-01-17 15:10 ` Andrew Lunn
2025-01-17 16:20 ` Alexandra Winter
2025-01-20 10:28 ` Alexandra Winter
2025-01-22 3:04 ` Dust Li
2025-01-22 12:02 ` Alexandra Winter
2025-01-22 12:05 ` Alexandra Winter
2025-01-22 14:10 ` Dust Li
2025-01-17 15:06 ` Andrew Lunn
2025-01-17 15:38 ` Alexandra Winter
2025-02-16 15:38 ` Wen Gu
2025-01-17 11:04 ` Alexandra Winter
2025-01-18 15:24 ` Dust Li
2025-01-20 11:45 ` Alexandra Winter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250120035525.GK89233@linux.alibaba.com \
--to=dust.li@linux.alibaba.com \
--cc=agordeev@linux.ibm.com \
--cc=alibuda@linux.alibaba.com \
--cc=andrew+netdev@lunn.ch \
--cc=borntraeger@linux.ibm.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=gbayer@linux.ibm.com \
--cc=gor@linux.ibm.com \
--cc=guwen@linux.alibaba.com \
--cc=hca@linux.ibm.com \
--cc=horms@kernel.org \
--cc=jaka@linux.ibm.com \
--cc=julianr@linux.ibm.com \
--cc=kuba@kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=oberpar@linux.ibm.com \
--cc=pabeni@redhat.com \
--cc=pasic@linux.ibm.com \
--cc=schnelle@linux.ibm.com \
--cc=svens@linux.ibm.com \
--cc=tonylu@linux.alibaba.com \
--cc=twinkler@linux.ibm.com \
--cc=wenjia@linux.ibm.com \
--cc=wintera@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).