From: Dust Li <dust.li@linux.alibaba.com>
To: Alexandra Winter <wintera@linux.ibm.com>,
Wenjia Zhang <wenjia@linux.ibm.com>,
Jan Karcher <jaka@linux.ibm.com>,
Gerd Bayer <gbayer@linux.ibm.com>,
Halil Pasic <pasic@linux.ibm.com>,
"D. Wythe" <alibuda@linux.alibaba.com>,
Tony Lu <tonylu@linux.alibaba.com>,
Wen Gu <guwen@linux.alibaba.com>,
Peter Oberparleiter <oberpar@linux.ibm.com>,
David Miller <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Eric Dumazet <edumazet@google.com>,
Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Julian Ruess <julianr@linux.ibm.com>,
Niklas Schnelle <schnelle@linux.ibm.com>,
Thorsten Winkler <twinkler@linux.ibm.com>,
netdev@vger.kernel.org, linux-s390@vger.kernel.org,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
Simon Horman <horms@kernel.org>
Subject: Re: [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism
Date: Mon, 20 Jan 2025 11:55:25 +0800 [thread overview]
Message-ID: <20250120035525.GK89233@linux.alibaba.com> (raw)
In-Reply-To: <20250115195527.2094320-6-wintera@linux.ibm.com>
On 2025-01-15 20:55:25, Alexandra Winter wrote:
>The first stage of ism_loopback was implemented as part of the
>SMC module [1]. Now that we have the ism layer, provide
>access to the ism_loopback device to all ism clients.
>
>Move ism_loopback.* from net/smc to net/ism.
>The following changes are required to ism_loopback.c:
>- Change ism_lo_move_data() to no longer schedule an smcd receive tasklet,
>but instead call ism_client->handle_irq().
>Note: In this RFC patch ism_loppback is not fully generic.
> The smc-d client uses attached buffers, for moves without signalling.
> and not-attached buffers for moves with signalling.
> ism_lo_move_data() must not rely on that assumption.
> ism_lo_move_data() must be able to handle more than one ism client.
>
>In addition the following changes are required to unify ism_loopback and
>ism_vp:
>
>In ism layer and ism_vpci:
>ism_loopback is not backed by a pci device, so use dev instead of pdev in
>ism_dev.
>
>In smc-d:
>in smcd_alloc_dev():
>- use kernel memory instead of device memory for smcd_dev and smcd->conn.
> An alternative would be to ask device to alloc the memory.
>- use different smcd_ops and max_dmbs for ism_vp and ism_loopback.
> A future patch can change smc-d to directly use ism_ops instead of
> smcd_ops.
>- use ism dev_name instead of pci dev name for ism_evt_wq name
>- allocate an event workqueue for ism_loopback, although it currently does
> not generate events.
>
>Link: https://lore.kernel.org/linux-kernel//20240428060738.60843-1-guwen@linux.alibaba.com/ [1]
>
>Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
>---
> drivers/s390/net/ism.h | 6 +-
> drivers/s390/net/ism_drv.c | 31 ++-
> include/linux/ism.h | 59 +++++
> include/net/smc.h | 4 +-
> net/ism/Kconfig | 13 ++
> net/ism/Makefile | 1 +
> net/ism/ism_loopback.c | 366 +++++++++++++++++++++++++++++++
> net/ism/ism_loopback.h | 59 +++++
> net/ism/ism_main.c | 11 +-
> net/smc/Kconfig | 13 --
> net/smc/Makefile | 1 -
> net/smc/af_smc.c | 12 +-
> net/smc/smc_ism.c | 108 +++++++---
> net/smc/smc_loopback.c | 427 -------------------------------------
> net/smc/smc_loopback.h | 60 ------
> 15 files changed, 606 insertions(+), 565 deletions(-)
> create mode 100644 net/ism/ism_loopback.c
> create mode 100644 net/ism/ism_loopback.h
> delete mode 100644 net/smc/smc_loopback.c
> delete mode 100644 net/smc/smc_loopback.h
>
>diff --git a/drivers/s390/net/ism.h b/drivers/s390/net/ism.h
>index 61cf10334170..0deca6d0e328 100644
>--- a/drivers/s390/net/ism.h
>+++ b/drivers/s390/net/ism.h
>@@ -202,7 +202,7 @@ struct ism_sba {
> static inline void __ism_read_cmd(struct ism_dev *ism, void *data,
> unsigned long offset, unsigned long len)
> {
>- struct zpci_dev *zdev = to_zpci(ism->pdev);
>+ struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> u64 req = ZPCI_CREATE_REQ(zdev->fh, 2, 8);
>
> while (len > 0) {
>@@ -216,7 +216,7 @@ static inline void __ism_read_cmd(struct ism_dev *ism, void *data,
> static inline void __ism_write_cmd(struct ism_dev *ism, void *data,
> unsigned long offset, unsigned long len)
> {
>- struct zpci_dev *zdev = to_zpci(ism->pdev);
>+ struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> u64 req = ZPCI_CREATE_REQ(zdev->fh, 2, len);
>
> if (len)
>@@ -226,7 +226,7 @@ static inline void __ism_write_cmd(struct ism_dev *ism, void *data,
> static inline int __ism_move(struct ism_dev *ism, u64 dmb_req, void *data,
> unsigned int size)
> {
>- struct zpci_dev *zdev = to_zpci(ism->pdev);
>+ struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, size);
>
> return __zpci_store_block(data, req, dmb_req);
>diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
>index ab70debbdd9d..c0954d6dd9f5 100644
>--- a/drivers/s390/net/ism_drv.c
>+++ b/drivers/s390/net/ism_drv.c
>@@ -88,7 +88,7 @@ static int register_sba(struct ism_dev *ism)
> dma_addr_t dma_handle;
> struct ism_sba *sba;
>
>- sba = dma_alloc_coherent(&ism->pdev->dev, PAGE_SIZE, &dma_handle,
>+ sba = dma_alloc_coherent(ism->dev.parent, PAGE_SIZE, &dma_handle,
> GFP_KERNEL);
> if (!sba)
> return -ENOMEM;
>@@ -99,7 +99,7 @@ static int register_sba(struct ism_dev *ism)
> cmd.request.sba = dma_handle;
>
> if (ism_cmd(ism, &cmd)) {
>- dma_free_coherent(&ism->pdev->dev, PAGE_SIZE, sba, dma_handle);
>+ dma_free_coherent(ism->dev.parent, PAGE_SIZE, sba, dma_handle);
> return -EIO;
> }
>
>@@ -115,7 +115,7 @@ static int register_ieq(struct ism_dev *ism)
> dma_addr_t dma_handle;
> struct ism_eq *ieq;
>
>- ieq = dma_alloc_coherent(&ism->pdev->dev, PAGE_SIZE, &dma_handle,
>+ ieq = dma_alloc_coherent(ism->dev.parent, PAGE_SIZE, &dma_handle,
> GFP_KERNEL);
> if (!ieq)
> return -ENOMEM;
>@@ -127,7 +127,7 @@ static int register_ieq(struct ism_dev *ism)
> cmd.request.len = sizeof(*ieq);
>
> if (ism_cmd(ism, &cmd)) {
>- dma_free_coherent(&ism->pdev->dev, PAGE_SIZE, ieq, dma_handle);
>+ dma_free_coherent(ism->dev.parent, PAGE_SIZE, ieq, dma_handle);
> return -EIO;
> }
>
>@@ -149,7 +149,7 @@ static int unregister_sba(struct ism_dev *ism)
> if (ret && ret != ISM_ERROR)
> return -EIO;
>
>- dma_free_coherent(&ism->pdev->dev, PAGE_SIZE,
>+ dma_free_coherent(ism->dev.parent, PAGE_SIZE,
> ism->sba, ism->sba_dma_addr);
>
> ism->sba = NULL;
>@@ -169,7 +169,7 @@ static int unregister_ieq(struct ism_dev *ism)
> if (ret && ret != ISM_ERROR)
> return -EIO;
>
>- dma_free_coherent(&ism->pdev->dev, PAGE_SIZE,
>+ dma_free_coherent(ism->dev.parent, PAGE_SIZE,
> ism->ieq, ism->ieq_dma_addr);
>
> ism->ieq = NULL;
>@@ -216,7 +216,7 @@ static int ism_query_rgid(struct ism_dev *ism, uuid_t *rgid, u32 vid_valid,
> static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> {
> clear_bit(dmb->sba_idx, ism->sba_bitmap);
>- dma_unmap_page(&ism->pdev->dev, dmb->dma_addr, dmb->dmb_len,
>+ dma_unmap_page(ism->dev.parent, dmb->dma_addr, dmb->dmb_len,
> DMA_FROM_DEVICE);
> folio_put(virt_to_folio(dmb->cpu_addr));
> }
>@@ -227,7 +227,7 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> unsigned long bit;
> int rc;
>
>- if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(&ism->pdev->dev))
>+ if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(ism->dev.parent))
> return -EINVAL;
>
> if (!dmb->sba_idx) {
>@@ -251,10 +251,10 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> }
>
> dmb->cpu_addr = folio_address(folio);
>- dmb->dma_addr = dma_map_page(&ism->pdev->dev,
>+ dmb->dma_addr = dma_map_page(ism->dev.parent,
> virt_to_page(dmb->cpu_addr), 0,
> dmb->dmb_len, DMA_FROM_DEVICE);
>- if (dma_mapping_error(&ism->pdev->dev, dmb->dma_addr)) {
>+ if (dma_mapping_error(ism->dev.parent, dmb->dma_addr)) {
> rc = -ENOMEM;
> goto out_free;
> }
>@@ -419,10 +419,7 @@ static int ism_supports_v2(void)
>
> static u16 ism_get_chid(struct ism_dev *ism)
> {
>- if (!ism || !ism->pdev)
>- return 0;
>-
>- return to_zpci(ism->pdev)->pchid;
>+ return to_zpci(to_pci_dev(ism->dev.parent))->pchid;
> }
>
> static void ism_handle_event(struct ism_dev *ism)
>@@ -499,7 +496,7 @@ static const struct ism_ops ism_vp_ops = {
>
> static int ism_dev_init(struct ism_dev *ism)
> {
>- struct pci_dev *pdev = ism->pdev;
>+ struct pci_dev *pdev = to_pci_dev(ism->dev.parent);
> int ret;
>
> ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI);
>@@ -565,7 +562,6 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>
> spin_lock_init(&ism->lock);
> dev_set_drvdata(&pdev->dev, ism);
>- ism->pdev = pdev;
> ism->dev.parent = &pdev->dev;
> device_initialize(&ism->dev);
> dev_set_name(&ism->dev, dev_name(&pdev->dev));
>@@ -603,14 +599,13 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> device_del(&ism->dev);
> err_dev:
> dev_set_drvdata(&pdev->dev, NULL);
>- kfree(ism);
>
> return ret;
> }
>
> static void ism_dev_exit(struct ism_dev *ism)
> {
>- struct pci_dev *pdev = ism->pdev;
>+ struct pci_dev *pdev = to_pci_dev(ism->dev.parent);
> unsigned long flags;
> int i;
>
>diff --git a/include/linux/ism.h b/include/linux/ism.h
>index bc165d077071..929a1f275419 100644
>--- a/include/linux/ism.h
>+++ b/include/linux/ism.h
>@@ -144,6 +144,9 @@ int ism_unregister_client(struct ism_client *client);
> * identified by dmb_tok and idx. If signal flag (sf) then signal
> * the remote peer that data has arrived in this dmb.
> *
>+ * int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>+ * Unregister an ism_dmb buffer
>+ *
> * int (*supports_v2)(void);
> *
> * u16 (*get_chid)(struct ism_dev *dev);
>@@ -218,12 +221,63 @@ struct ism_ops {
> int (*reset_vlan_required)(struct ism_dev *dev);
> int (*signal_event)(struct ism_dev *dev, uuid_t *rgid,
> u32 trigger_irq, u32 event_code, u64 info);
>+/* no copy option
>+ * --------------
>+ */
>+ /**
>+ * support_dmb_nocopy() - does this device provide no-copy option?
>+ * @dev: ism device
>+ *
>+ * In addition to using move_data(), a sender device can provide a
>+ * kernel address + length, that represents a target dmb
>+ * (like MMIO). If a sender writes into such a ghost-send-buffer
>+ * (= at this kernel address) the data will automatically
>+ * immediately appear in the target dmb, even without calling
>+ * move_data().
>+ * Note that this is NOT related to the MSG_ZEROCOPY socket flag.
>+ *
>+ * Either all 3 function pointers for support_dmb_nocopy(),
>+ * attach_dmb() and detach_dmb() are defined, or all of them must
>+ * be NULL.
>+ *
>+ * Return: non-zero, if no-copy is supported.
>+ */
>+ int (*support_dmb_nocopy)(struct ism_dev *dev);
>+ /**
>+ * attach_dmb() - attach local memory to a remote dmb
>+ * @dev: Local sending ism device
>+ * @dmb: all other parameters are passed in the form of a
>+ * dmb struct
>+ * TODO: (THIS IS CONFUSING, should be changed)
Agree.
>+ * dmb_tok: (in) Token of the remote dmb, we want to attach to
>+ * cpu_addr: (out) MMIO address
>+ * dma_addr: (out) MMIO address (if applicable, invalid otherwise)
>+ * dmb_len: (out) length of local MMIO region,
>+ * equal to length of remote DMB.
>+ * sba_idx: (out) index of remote dmb (NOT HELPFUL, should be removed)
>+ *
>+ * Provides a memory address to the sender that can be used to
>+ * directly write into the remote dmb.
>+ *
>+ * Return: Zero upon success, Error code otherwise
>+ */
>+ int (*attach_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>+ /**
>+ * detach_dmb() - Detach the ghost buffer from a remote dmb
>+ * @dev: ism device
>+ * @token: dmb token of the remote dmb
>+ * Return: Zero upon success, Error code otherwise
>+ */
>+ int (*detach_dmb)(struct ism_dev *dev, u64 token);
> };
>
...
>+
>+static int ism_lo_move_data(struct ism_dev *ism, u64 dmb_tok,
>+ unsigned int idx, bool sf, unsigned int offset,
>+ void *data, unsigned int size)
>+{
>+ struct ism_lo_dmb_node *rmb_node = NULL, *tmp_node;
>+ struct ism_lo_dev *ldev;
>+ u16 s_mask;
>+ u8 client_id;
>+ u32 sba_idx;
>+
>+ ldev = container_of(ism, struct ism_lo_dev, ism);
>+
>+ if (!sf)
>+ /* since sndbuf is merged with peer DMB, there is
>+ * no need to copy data from sndbuf to peer DMB.
>+ */
>+ return 0;
>+
>+ read_lock_bh(&ldev->dmb_ht_lock);
>+ hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_tok) {
>+ if (tmp_node->token == dmb_tok) {
>+ rmb_node = tmp_node;
>+ break;
>+ }
>+ }
>+ if (!rmb_node) {
>+ read_unlock_bh(&ldev->dmb_ht_lock);
>+ return -EINVAL;
>+ }
>+ // So why copy the data now?? SMC usecase? Data buffer is attached,
>+ // rw-pointer are not attached?
I understand the confusion here. I have the same confusion the first time
I saw this.
This is actually the tricky part: it assumes the CDC will signal, while
the data will not. We need to copy the CDC, so the copy here only to the
CDC.
I think we should refine the move_data() API to make this clearer.
Best regards,
Dust
next prev parent reply other threads:[~2025-01-20 3:55 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-15 19:55 [RFC net-next 0/7] Provide an ism layer Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 1/7] net/ism: Create net/ism Alexandra Winter
2025-01-16 20:08 ` Andrew Lunn
2025-01-17 12:06 ` Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 2/7] net/ism: Remove dependencies between ISM_VPCI and SMC Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 3/7] net/ism: Use uuid_t for ISM GID Alexandra Winter
2025-01-20 17:18 ` Simon Horman
2025-01-22 14:46 ` Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions Alexandra Winter
2025-01-15 22:06 ` Halil Pasic
2025-01-20 6:32 ` Dust Li
2025-01-20 9:56 ` Alexandra Winter
2025-01-20 10:07 ` Julian Ruess
2025-01-20 11:35 ` Alexandra Winter
2025-01-20 10:34 ` Niklas Schnelle
2025-01-22 15:02 ` Dust Li
2025-01-15 19:55 ` [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism Alexandra Winter
2025-01-20 3:55 ` Dust Li [this message]
2025-01-20 9:31 ` Alexandra Winter
2025-02-06 17:36 ` Julian Ruess
2025-02-10 10:39 ` Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 6/7] s390/ism: Define ismvp_dev Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 7/7] net/smc: Use only ism_ops Alexandra Winter
2025-01-16 9:32 ` [RFC net-next 0/7] Provide an ism layer Dust Li
2025-01-16 11:55 ` Julian Ruess
2025-01-16 16:17 ` Alexandra Winter
2025-01-16 17:08 ` Julian Ruess
2025-01-17 2:13 ` Dust Li
2025-01-17 10:38 ` Niklas Schnelle
2025-01-17 15:02 ` Andrew Lunn
2025-01-17 16:00 ` Niklas Schnelle
2025-01-17 16:33 ` Andrew Lunn
2025-01-17 16:57 ` Niklas Schnelle
2025-01-17 20:29 ` Andrew Lunn
2025-01-20 6:21 ` Dust Li
2025-01-20 12:03 ` Alexandra Winter
2025-01-20 16:01 ` Andrew Lunn
2025-01-20 17:25 ` Alexandra Winter
2025-01-18 15:31 ` Dust Li
2025-01-28 16:04 ` Alexandra Winter
2025-02-10 5:08 ` Dust Li
2025-02-10 9:38 ` Alexandra Winter
2025-02-11 1:57 ` Dust Li
2025-02-16 15:40 ` Wen Gu
2025-02-19 11:25 ` [RFC net-next 0/7] Provide an ism layer - naming Alexandra Winter
2025-02-25 1:36 ` Dust Li
2025-02-25 8:40 ` Alexandra Winter
2025-01-17 13:00 ` [RFC net-next 0/7] Provide an ism layer Alexandra Winter
2025-01-17 15:10 ` Andrew Lunn
2025-01-17 16:20 ` Alexandra Winter
2025-01-20 10:28 ` Alexandra Winter
2025-01-22 3:04 ` Dust Li
2025-01-22 12:02 ` Alexandra Winter
2025-01-22 12:05 ` Alexandra Winter
2025-01-22 14:10 ` Dust Li
2025-01-17 15:06 ` Andrew Lunn
2025-01-17 15:38 ` Alexandra Winter
2025-02-16 15:38 ` Wen Gu
2025-01-17 11:04 ` Alexandra Winter
2025-01-18 15:24 ` Dust Li
2025-01-20 11:45 ` Alexandra Winter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250120035525.GK89233@linux.alibaba.com \
--to=dust.li@linux.alibaba.com \
--cc=agordeev@linux.ibm.com \
--cc=alibuda@linux.alibaba.com \
--cc=andrew+netdev@lunn.ch \
--cc=borntraeger@linux.ibm.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=gbayer@linux.ibm.com \
--cc=gor@linux.ibm.com \
--cc=guwen@linux.alibaba.com \
--cc=hca@linux.ibm.com \
--cc=horms@kernel.org \
--cc=jaka@linux.ibm.com \
--cc=julianr@linux.ibm.com \
--cc=kuba@kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=oberpar@linux.ibm.com \
--cc=pabeni@redhat.com \
--cc=pasic@linux.ibm.com \
--cc=schnelle@linux.ibm.com \
--cc=svens@linux.ibm.com \
--cc=tonylu@linux.alibaba.com \
--cc=twinkler@linux.ibm.com \
--cc=wenjia@linux.ibm.com \
--cc=wintera@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.