netdev.vger.kernel.org archive mirror
* [RFC net-next 0/7] Provide an ism layer
@ 2025-01-15 19:55 Alexandra Winter
  2025-01-15 19:55 ` [RFC net-next 1/7] net/ism: Create net/ism Alexandra Winter
                   ` (7 more replies)
  0 siblings, 8 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-15 19:55 UTC (permalink / raw)
  To: Wenjia Zhang, Jan Karcher, Gerd Bayer, Alexandra Winter,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

This RFC proposes a generic shim layer between all kinds of ISM devices
and all kinds of ISM users.

Benefits:
- Cleaner separation of ISM and SMC-D functionality
- Simpler and fewer module dependencies
- Clear interface definition
- Extendable for future devices and clients

Request for comments:
---------------------
Any comments are welcome, but I am aware that this series needs more work.
It may not be worth your time to review the details in depth; I am looking
for feedback on the general idea.
I am mostly interested in your thoughts and recommendations about the general
concept, the location of net/ism, the structure of include/linux/ism.h, the
Kconfig and Makefiles.

Status of this RFC:
-------------------
This is a very early RFC to ask for comments on this general idea.
The RFC does not fulfill all criteria required for a patchset.
The whole set compiles and runs, but I have not yet tried all combinations
of module and built-in. I have not run checkpatch or any other checkers.
Also, I have only done very rudimentary quick tests of SMC-D. More testing
is required.

Background / Status quo:
------------------------
Currently, s390 hardware provides virtual PCI ISM devices (ism_vpci). Their
driver is in drivers/s390/net/ism_drv.c. The main user is SMC-D (net/smc).
The ism_vpci driver offers a client interface so that other users/protocols
can also use these devices, but it is still heavily intermingled with the SMC
code. Namely, the ISM vPCI module cannot be used without the SMC module, which
feels artificial.

The ISM concept is being extended:
[1] proposed an ISM loopback interface (ism_lo) that can be used on non-s390
architectures (e.g. between containers or to test SMC-D). A minimal
implementation went upstream with [2]: ism_lo is currently part of the SMC
protocol and rather hidden.

[3] proposed a virtio definition of ISM (ism_virtio) that can be used between
KVM guests.

We will shortly send an RFC for an ISM client that uses ISM as transport for TTY.

Concept:
--------
Create a shim layer in net/ism that contains common definitions and code for
all ISM devices and all ISM clients.
Any device or client module only needs to depend on this ISM layer module,
and any device or client code only needs to include the definitions in
include/linux/ism.h.

Ideas for next steps:
---------------------
- sysfs representation? E.g. as /sys/class/ism?
- provide a full-fledged ISM loopback interface
    (runtime enable/disable, sysfs device, ...)
- additional clients (tty over ism)
- additional devices (virtio-ism, ...)

Link: [1] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/
Link: [2] https://lore.kernel.org/linux-kernel//20240428060738.60843-1-guwen@linux.alibaba.com/
Link: [3] https://groups.oasis-open.org/communities/community-home/digestviewer/viewthread?GroupId=3973&MessageKey=c060ecf9-ea1a-49a2-9827-c92f0e6447b2&CommunityKey=2f26be99-3aa1-48f6-93a5-018dce262226&hlmlt=VT

Alexandra Winter (7):
  net/ism: Create net/ism
  net/ism: Remove dependencies between ISM_VPCI and SMC
  net/ism: Use uuid_t for ISM GID
  net/ism: Add kernel-doc comments for ism functions
  net/ism: Move ism_loopback to net/ism
  s390/ism: Define ismvp_dev
  net/smc: Use only ism_ops

 MAINTAINERS                |   7 +
 drivers/s390/net/Kconfig   |  10 +-
 drivers/s390/net/Makefile  |   4 +-
 drivers/s390/net/ism.h     |  27 ++-
 drivers/s390/net/ism_drv.c | 467 ++++++++++++-------------------------
 include/linux/ism.h        | 299 +++++++++++++++++++++---
 include/net/smc.h          |  52 +----
 net/Kconfig                |   1 +
 net/Makefile               |   1 +
 net/ism/Kconfig            |  27 +++
 net/ism/Makefile           |   8 +
 net/ism/ism_loopback.c     | 366 +++++++++++++++++++++++++++++
 net/ism/ism_loopback.h     |  59 +++++
 net/ism/ism_main.c         | 171 ++++++++++++++
 net/smc/Kconfig            |  13 --
 net/smc/Makefile           |   1 -
 net/smc/af_smc.c           |  12 +-
 net/smc/smc_clc.c          |   6 +-
 net/smc/smc_core.c         |   6 +-
 net/smc/smc_diag.c         |   2 +-
 net/smc/smc_ism.c          | 112 +++++----
 net/smc/smc_ism.h          |  29 ++-
 net/smc/smc_loopback.c     | 427 ---------------------------------
 net/smc/smc_loopback.h     |  60 -----
 net/smc/smc_pnet.c         |   8 +-
 25 files changed, 1183 insertions(+), 992 deletions(-)
 create mode 100644 net/ism/Kconfig
 create mode 100644 net/ism/Makefile
 create mode 100644 net/ism/ism_loopback.c
 create mode 100644 net/ism/ism_loopback.h
 create mode 100644 net/ism/ism_main.c
 delete mode 100644 net/smc/smc_loopback.c
 delete mode 100644 net/smc/smc_loopback.h

-- 
2.45.2



* [RFC net-next 1/7] net/ism: Create net/ism
  2025-01-15 19:55 [RFC net-next 0/7] Provide an ism layer Alexandra Winter
@ 2025-01-15 19:55 ` Alexandra Winter
  2025-01-16 20:08   ` Andrew Lunn
  2025-01-15 19:55 ` [RFC net-next 2/7] net/ism: Remove dependencies between ISM_VPCI and SMC Alexandra Winter
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 61+ messages in thread
From: Alexandra Winter @ 2025-01-15 19:55 UTC (permalink / raw)
  To: Wenjia Zhang, Jan Karcher, Gerd Bayer, Alexandra Winter,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

Create an 'ISM' shim layer that will provide generic functionality and
declarations for ISM device drivers and ISM clients.

Move the respective pieces from drivers/s390/net/ism_drv.* to net/ism/.

When we need to distinguish between generic ISM interfaces and the
specific s390 virtual PCI ISM device, the latter will be called 'ISM_vPCI'.

No optimizations are done in this patch; it only moves pieces around.
The following patch will further disentangle ISM_vPCI and SMC-D.

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
---
 MAINTAINERS                |   7 ++
 drivers/s390/net/Kconfig   |   9 +--
 drivers/s390/net/Makefile  |   4 +-
 drivers/s390/net/ism_drv.c | 129 ++---------------------------
 include/linux/ism.h        |   8 ++
 include/net/smc.h          |   3 -
 net/Kconfig                |   1 +
 net/Makefile               |   1 +
 net/ism/Kconfig            |  14 ++++
 net/ism/Makefile           |   7 ++
 net/ism/ism_main.c         | 162 +++++++++++++++++++++++++++++++++++++
 11 files changed, 213 insertions(+), 132 deletions(-)
 create mode 100644 net/ism/Kconfig
 create mode 100644 net/ism/Makefile
 create mode 100644 net/ism/ism_main.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 4dcb849e6748..780db61f3f16 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12239,6 +12239,13 @@ F:	Documentation/devicetree/bindings/hwmon/renesas,isl28022.yaml
 F:	Documentation/hwmon/isl28022.rst
 F:	drivers/hwmon/isl28022.c
 
+ISM (INTERNAL SHARED MEMORY)
+M:	Alexandra Winter <wintera@linux.ibm.com>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	include/linux/ism.h
+F:	net/ism/
+
 ISOFS FILESYSTEM
 M:	Jan Kara <jack@suse.cz>
 L:	linux-fsdevel@vger.kernel.org
diff --git a/drivers/s390/net/Kconfig b/drivers/s390/net/Kconfig
index c61e6427384c..2e900d3087d4 100644
--- a/drivers/s390/net/Kconfig
+++ b/drivers/s390/net/Kconfig
@@ -100,15 +100,14 @@ config CCWGROUP
 	tristate
 	default (LCS || CTCM || QETH || SMC)
 
-config ISM
+config ISM_VPCI
 	tristate "Support for ISM vPCI Adapter"
-	depends on PCI
+	depends on PCI && ISM
 	imply SMC
-	default n
+	default y
 	help
 	  Select this option if you want to use the Internal Shared Memory
 	  vPCI Adapter. The adapter can be used with the SMC network protocol.
 
-	  To compile as a module choose M. The module name is ism.
-	  If unsure, choose N.
+	  To compile as a module choose M. The module name is ism_vpci.
 endmenu
diff --git a/drivers/s390/net/Makefile b/drivers/s390/net/Makefile
index bc55ec316adb..87461019184e 100644
--- a/drivers/s390/net/Makefile
+++ b/drivers/s390/net/Makefile
@@ -16,5 +16,5 @@ obj-$(CONFIG_QETH_L2) += qeth_l2.o
 qeth_l3-y += qeth_l3_main.o qeth_l3_sys.o
 obj-$(CONFIG_QETH_L3) += qeth_l3.o
 
-ism-y := ism_drv.o
-obj-$(CONFIG_ISM) += ism.o
+ism_vpci-y += ism_drv.o
+obj-$(CONFIG_ISM_VPCI) += ism_vpci.o
diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
index e36e3ea165d3..2eeccf5ef48d 100644
--- a/drivers/s390/net/ism_drv.c
+++ b/drivers/s390/net/ism_drv.c
@@ -4,7 +4,7 @@
  *
  * Copyright IBM Corp. 2018
  */
-#define KMSG_COMPONENT "ism"
+#define KMSG_COMPONENT "ism-vpci"
 #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
 
 #include <linux/module.h>
@@ -31,100 +31,7 @@ MODULE_DEVICE_TABLE(pci, ism_device_table);
 
 static debug_info_t *ism_debug_info;
 
-#define NO_CLIENT		0xff		/* must be >= MAX_CLIENTS */
-static struct ism_client *clients[MAX_CLIENTS];	/* use an array rather than */
-						/* a list for fast mapping  */
-static u8 max_client;
-static DEFINE_MUTEX(clients_lock);
 static bool ism_v2_capable;
-struct ism_dev_list {
-	struct list_head list;
-	struct mutex mutex; /* protects ism device list */
-};
-
-static struct ism_dev_list ism_dev_list = {
-	.list = LIST_HEAD_INIT(ism_dev_list.list),
-	.mutex = __MUTEX_INITIALIZER(ism_dev_list.mutex),
-};
-
-static void ism_setup_forwarding(struct ism_client *client, struct ism_dev *ism)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&ism->lock, flags);
-	ism->subs[client->id] = client;
-	spin_unlock_irqrestore(&ism->lock, flags);
-}
-
-int ism_register_client(struct ism_client *client)
-{
-	struct ism_dev *ism;
-	int i, rc = -ENOSPC;
-
-	mutex_lock(&ism_dev_list.mutex);
-	mutex_lock(&clients_lock);
-	for (i = 0; i < MAX_CLIENTS; ++i) {
-		if (!clients[i]) {
-			clients[i] = client;
-			client->id = i;
-			if (i == max_client)
-				max_client++;
-			rc = 0;
-			break;
-		}
-	}
-	mutex_unlock(&clients_lock);
-
-	if (i < MAX_CLIENTS) {
-		/* initialize with all devices that we got so far */
-		list_for_each_entry(ism, &ism_dev_list.list, list) {
-			ism->priv[i] = NULL;
-			client->add(ism);
-			ism_setup_forwarding(client, ism);
-		}
-	}
-	mutex_unlock(&ism_dev_list.mutex);
-
-	return rc;
-}
-EXPORT_SYMBOL_GPL(ism_register_client);
-
-int ism_unregister_client(struct ism_client *client)
-{
-	struct ism_dev *ism;
-	unsigned long flags;
-	int rc = 0;
-
-	mutex_lock(&ism_dev_list.mutex);
-	list_for_each_entry(ism, &ism_dev_list.list, list) {
-		spin_lock_irqsave(&ism->lock, flags);
-		/* Stop forwarding IRQs and events */
-		ism->subs[client->id] = NULL;
-		for (int i = 0; i < ISM_NR_DMBS; ++i) {
-			if (ism->sba_client_arr[i] == client->id) {
-				WARN(1, "%s: attempt to unregister '%s' with registered dmb(s)\n",
-				     __func__, client->name);
-				rc = -EBUSY;
-				goto err_reg_dmb;
-			}
-		}
-		spin_unlock_irqrestore(&ism->lock, flags);
-	}
-	mutex_unlock(&ism_dev_list.mutex);
-
-	mutex_lock(&clients_lock);
-	clients[client->id] = NULL;
-	if (client->id + 1 == max_client)
-		max_client--;
-	mutex_unlock(&clients_lock);
-	return rc;
-
-err_reg_dmb:
-	spin_unlock_irqrestore(&ism->lock, flags);
-	mutex_unlock(&ism_dev_list.mutex);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(ism_unregister_client);
 
 static int ism_cmd(struct ism_dev *ism, void *cmd)
 {
@@ -475,7 +382,7 @@ static void ism_handle_event(struct ism_dev *ism)
 
 		entry = &ism->ieq->entry[ism->ieq_idx];
 		debug_event(ism_debug_info, 2, entry, sizeof(*entry));
-		for (i = 0; i < max_client; ++i) {
+		for (i = 0; i < MAX_CLIENTS; ++i) {
 			clt = ism->subs[i];
 			if (clt)
 				clt->handle_event(ism, entry);
@@ -524,7 +431,7 @@ static irqreturn_t ism_handle_irq(int irq, void *data)
 static int ism_dev_init(struct ism_dev *ism)
 {
 	struct pci_dev *pdev = ism->pdev;
-	int i, ret;
+	int ret;
 
 	ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI);
 	if (ret <= 0)
@@ -558,19 +465,7 @@ static int ism_dev_init(struct ism_dev *ism)
 	else
 		ism_v2_capable = false;
 
-	mutex_lock(&ism_dev_list.mutex);
-	mutex_lock(&clients_lock);
-	for (i = 0; i < max_client; ++i) {
-		if (clients[i]) {
-			clients[i]->add(ism);
-			ism_setup_forwarding(clients[i], ism);
-		}
-	}
-	mutex_unlock(&clients_lock);
-
-	list_add(&ism->list, &ism_dev_list.list);
-	mutex_unlock(&ism_dev_list.mutex);
-
+	ism_dev_register(ism);
 	query_info(ism);
 	return 0;
 
@@ -649,17 +544,11 @@ static void ism_dev_exit(struct ism_dev *ism)
 	int i;
 
 	spin_lock_irqsave(&ism->lock, flags);
-	for (i = 0; i < max_client; ++i)
+	for (i = 0; i < MAX_CLIENTS; ++i)
 		ism->subs[i] = NULL;
 	spin_unlock_irqrestore(&ism->lock, flags);
 
-	mutex_lock(&ism_dev_list.mutex);
-	mutex_lock(&clients_lock);
-	for (i = 0; i < max_client; ++i) {
-		if (clients[i])
-			clients[i]->remove(ism);
-	}
-	mutex_unlock(&clients_lock);
+	ism_dev_unregister(ism);
 
 	if (ism_v2_capable)
 		ism_del_vlan_id(ism, ISM_RESERVED_VLANID);
@@ -668,8 +557,6 @@ static void ism_dev_exit(struct ism_dev *ism)
 	free_irq(pci_irq_vector(pdev, 0), ism);
 	kfree(ism->sba_client_arr);
 	pci_free_irq_vectors(pdev);
-	list_del_init(&ism->list);
-	mutex_unlock(&ism_dev_list.mutex);
 }
 
 static void ism_remove(struct pci_dev *pdev)
@@ -700,8 +587,6 @@ static int __init ism_init(void)
 	if (!ism_debug_info)
 		return -ENODEV;
 
-	memset(clients, 0, sizeof(clients));
-	max_client = 0;
 	debug_register_view(ism_debug_info, &debug_hex_ascii_view);
 	ret = pci_register_driver(&ism_driver);
 	if (ret)
@@ -721,7 +606,7 @@ module_exit(ism_exit);
 
 /*************************** SMC-D Implementation *****************************/
 
-#if IS_ENABLED(CONFIG_SMC)
+#if IS_ENABLED(CONFIG_SMC) // needed to avoid unused functions
 static int ism_query_rgid(struct ism_dev *ism, u64 rgid, u32 vid_valid,
 			  u32 vid)
 {
diff --git a/include/linux/ism.h b/include/linux/ism.h
index 5428edd90982..1462296e8ba7 100644
--- a/include/linux/ism.h
+++ b/include/linux/ism.h
@@ -9,6 +9,7 @@
 #ifndef _ISM_H
 #define _ISM_H
 
+#include <linux/device.h>
 #include <linux/workqueue.h>
 
 struct ism_dmb {
@@ -24,6 +25,7 @@ struct ism_dmb {
 
 /* Unless we gain unexpected popularity, this limit should hold for a while */
 #define MAX_CLIENTS		8
+#define NO_CLIENT		0xff		/* must be >= MAX_CLIENTS */
 #define ISM_NR_DMBS		1920
 
 struct ism_dev {
@@ -76,6 +78,9 @@ static inline void *ism_get_priv(struct ism_dev *dev,
 	return dev->priv[client->id];
 }
 
+int ism_dev_register(struct ism_dev *ism);
+void ism_dev_unregister(struct ism_dev *ism);
+
 static inline void ism_set_priv(struct ism_dev *dev, struct ism_client *client,
 				void *priv) {
 	dev->priv[client->id] = priv;
@@ -87,6 +92,9 @@ int  ism_unregister_dmb(struct ism_dev *dev, struct ism_dmb *dmb);
 int  ism_move(struct ism_dev *dev, u64 dmb_tok, unsigned int idx, bool sf,
 	      unsigned int offset, void *data, unsigned int size);
 
+#define ISM_RESERVED_VLANID	0x1FFF
+#define ISM_ERROR	0xFFFF
+
 const struct smcd_ops *ism_get_smcd_ops(void);
 
 #endif	/* _ISM_H */
diff --git a/include/net/smc.h b/include/net/smc.h
index db84e4e35080..ab732b286f91 100644
--- a/include/net/smc.h
+++ b/include/net/smc.h
@@ -42,9 +42,6 @@ struct smcd_dmb {
 #define ISM_EVENT_GID	1
 #define ISM_EVENT_SWR	2
 
-#define ISM_RESERVED_VLANID	0x1FFF
-
-#define ISM_ERROR	0xFFFF
 
 struct smcd_dev;
 
diff --git a/net/Kconfig b/net/Kconfig
index c3fca69a7c83..2dbe9655f7de 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -83,6 +83,7 @@ source "net/tls/Kconfig"
 source "net/xfrm/Kconfig"
 source "net/iucv/Kconfig"
 source "net/smc/Kconfig"
+source "net/ism/Kconfig"
 source "net/xdp/Kconfig"
 
 config NET_HANDSHAKE
diff --git a/net/Makefile b/net/Makefile
index 60ed5190eda8..6f06cf00bfbb 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_TIPC)		+= tipc/
 obj-$(CONFIG_NETLABEL)		+= netlabel/
 obj-$(CONFIG_IUCV)		+= iucv/
 obj-$(CONFIG_SMC)		+= smc/
+obj-$(CONFIG_ISM)		+= ism/
 obj-$(CONFIG_RFKILL)		+= rfkill/
 obj-$(CONFIG_NET_9P)		+= 9p/
 obj-$(CONFIG_CAIF)		+= caif/
diff --git a/net/ism/Kconfig b/net/ism/Kconfig
new file mode 100644
index 000000000000..4329489cc1e9
--- /dev/null
+++ b/net/ism/Kconfig
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0
+config ISM
+	tristate "ISM support"
+	default n
+	help
+	  Internal Shared Memory (ISM)
+	  A communication method that uses common physical memory for
+	  synchronous direct access into a remote buffer.
+
+	  Select this option to provide the abstraction layer between
+	  ISM devices and ISM users like the SMC protocol.
+
+	  To compile as a module choose M. The module name is ism.
+	  If unsure, choose N.
diff --git a/net/ism/Makefile b/net/ism/Makefile
new file mode 100644
index 000000000000..b752baf72003
--- /dev/null
+++ b/net/ism/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# ISM class module
+#
+
+ism-y += ism_main.o
+obj-$(CONFIG_ISM) += ism.o
diff --git a/net/ism/ism_main.c b/net/ism/ism_main.c
new file mode 100644
index 000000000000..268408dbd691
--- /dev/null
+++ b/net/ism/ism_main.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  Internal Shared Memory
+ *
+ *  Implementation of the ISM class module
+ *
+ *  Copyright IBM Corp. 2024
+ */
+#define KMSG_COMPONENT "ism"
+#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/err.h>
+#include <linux/ism.h>
+
+MODULE_DESCRIPTION("Internal Shared Memory class");
+MODULE_LICENSE("GPL");
+
+static struct ism_client *clients[MAX_CLIENTS];	/* use an array rather than */
+						/* a list for fast mapping  */
+static u8 max_client;
+static DEFINE_MUTEX(clients_lock);
+struct ism_dev_list {
+	struct list_head list;
+	struct mutex mutex; /* protects ism device list */
+};
+
+static struct ism_dev_list ism_dev_list = {
+	.list = LIST_HEAD_INIT(ism_dev_list.list),
+	.mutex = __MUTEX_INITIALIZER(ism_dev_list.mutex),
+};
+
+static void ism_setup_forwarding(struct ism_client *client, struct ism_dev *ism)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ism->lock, flags);
+	ism->subs[client->id] = client;
+	spin_unlock_irqrestore(&ism->lock, flags);
+}
+
+int ism_register_client(struct ism_client *client)
+{
+	struct ism_dev *ism;
+	int i, rc = -ENOSPC;
+
+	mutex_lock(&ism_dev_list.mutex);
+	mutex_lock(&clients_lock);
+	for (i = 0; i < MAX_CLIENTS; ++i) {
+		if (!clients[i]) {
+			clients[i] = client;
+			client->id = i;
+			if (i == max_client)
+				max_client++;
+			rc = 0;
+			break;
+		}
+	}
+	mutex_unlock(&clients_lock);
+
+	if (i < MAX_CLIENTS) {
+		/* initialize with all devices that we got so far */
+		list_for_each_entry(ism, &ism_dev_list.list, list) {
+			ism->priv[i] = NULL;
+			client->add(ism);
+			ism_setup_forwarding(client, ism);
+		}
+	}
+	mutex_unlock(&ism_dev_list.mutex);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(ism_register_client);
+
+int ism_unregister_client(struct ism_client *client)
+{
+	struct ism_dev *ism;
+	unsigned long flags;
+	int rc = 0;
+
+	mutex_lock(&ism_dev_list.mutex);
+	list_for_each_entry(ism, &ism_dev_list.list, list) {
+		spin_lock_irqsave(&ism->lock, flags);
+		/* Stop forwarding IRQs and events */
+		ism->subs[client->id] = NULL;
+		for (int i = 0; i < ISM_NR_DMBS; ++i) {
+			if (ism->sba_client_arr[i] == client->id) {
+				WARN(1, "%s: attempt to unregister '%s' with registered dmb(s)\n",
+				     __func__, client->name);
+				rc = -EBUSY;
+				goto err_reg_dmb;
+			}
+		}
+		spin_unlock_irqrestore(&ism->lock, flags);
+	}
+	mutex_unlock(&ism_dev_list.mutex);
+
+	mutex_lock(&clients_lock);
+	clients[client->id] = NULL;
+	if (client->id + 1 == max_client)
+		max_client--;
+	mutex_unlock(&clients_lock);
+	return rc;
+
+err_reg_dmb:
+	spin_unlock_irqrestore(&ism->lock, flags);
+	mutex_unlock(&ism_dev_list.mutex);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(ism_unregister_client);
+
+int ism_dev_register(struct ism_dev *ism)
+{
+	int i;
+
+	mutex_lock(&ism_dev_list.mutex);
+	mutex_lock(&clients_lock);
+	for (i = 0; i < max_client; ++i) {
+		if (clients[i]) {
+			clients[i]->add(ism);
+			ism_setup_forwarding(clients[i], ism);
+		}
+	}
+	mutex_unlock(&clients_lock);
+	list_add(&ism->list, &ism_dev_list.list);
+	mutex_unlock(&ism_dev_list.mutex);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ism_dev_register);
+
+void ism_dev_unregister(struct ism_dev *ism)
+{
+	int i;
+
+	mutex_lock(&ism_dev_list.mutex);
+	mutex_lock(&clients_lock);
+	for (i = 0; i < max_client; ++i) {
+		if (clients[i])
+			clients[i]->remove(ism);
+	}
+	mutex_unlock(&clients_lock);
+	list_del_init(&ism->list);
+	mutex_unlock(&ism_dev_list.mutex);
+}
+EXPORT_SYMBOL_GPL(ism_dev_unregister);
+
+static int __init ism_init(void)
+{
+	memset(clients, 0, sizeof(clients));
+	max_client = 0;
+
+	return 0;
+}
+
+static void __exit ism_exit(void)
+{
+}
+
+module_init(ism_init);
+module_exit(ism_exit);
-- 
2.45.2



* [RFC net-next 2/7] net/ism: Remove dependencies between ISM_VPCI and SMC
  2025-01-15 19:55 [RFC net-next 0/7] Provide an ism layer Alexandra Winter
  2025-01-15 19:55 ` [RFC net-next 1/7] net/ism: Create net/ism Alexandra Winter
@ 2025-01-15 19:55 ` Alexandra Winter
  2025-01-15 19:55 ` [RFC net-next 3/7] net/ism: Use uuid_t for ISM GID Alexandra Winter
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-15 19:55 UTC (permalink / raw)
  To: Wenjia Zhang, Jan Karcher, Gerd Bayer, Alexandra Winter,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

The modules ISM_VPCI and SMC should not depend on each other;
instead, they should both depend on the ISM layer module.

Use only ism_dmb:
Now that SMC depends on ISM, we can safely remove the
duplicate declaration of smcd_dmb and use only ism_dmb.

Move smcd_ops away from ism_drv:
Move smcd_ops from drivers/s390/net/ism_drv.c to net/smc/smc_ism.c.
This leaves fewer exported functions and no more dependencies between
ISM_VPCI and SMC. Once ism_loopback is also moved to the ISM layer,
a follow-on patch can use ism_ops directly and remove smcd_ops.

Now the ISM_VPCI module no longer needs to imply SMC.

Note:
- This patch temporarily moves smcd_gid to ism.h;
  a follow-on patch (uuid_t gid) will undo this.
- Added a comment that VLAN handling in ism_drv.c and SMC
  is incomplete. This should be fixed by a follow-on patch.

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
---
 drivers/s390/net/Kconfig   |   1 -
 drivers/s390/net/ism.h     |   1 -
 drivers/s390/net/ism_drv.c | 236 +++++++++++++------------------------
 include/linux/ism.h        | 146 +++++++++++++++++------
 include/net/smc.h          |  31 ++---
 net/smc/smc_ism.c          | 123 ++++++++++++++++++-
 net/smc/smc_loopback.c     |   6 +-
 7 files changed, 319 insertions(+), 225 deletions(-)

diff --git a/drivers/s390/net/Kconfig b/drivers/s390/net/Kconfig
index 2e900d3087d4..9bb3cc186510 100644
--- a/drivers/s390/net/Kconfig
+++ b/drivers/s390/net/Kconfig
@@ -103,7 +103,6 @@ config CCWGROUP
 config ISM_VPCI
 	tristate "Support for ISM vPCI Adapter"
 	depends on PCI && ISM
-	imply SMC
 	default y
 	help
 	  Select this option if you want to use the Internal Shared Memory
diff --git a/drivers/s390/net/ism.h b/drivers/s390/net/ism.h
index 047fa6101555..8b56e1d82e6b 100644
--- a/drivers/s390/net/ism.h
+++ b/drivers/s390/net/ism.h
@@ -6,7 +6,6 @@
 #include <linux/types.h>
 #include <linux/pci.h>
 #include <linux/ism.h>
-#include <net/smc.h>
 #include <asm/pci_insn.h>
 
 #define UTIL_STR_LEN	16
diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
index 2eeccf5ef48d..112e0d67cdd6 100644
--- a/drivers/s390/net/ism_drv.c
+++ b/drivers/s390/net/ism_drv.c
@@ -191,11 +191,28 @@ static int ism_read_local_gid(struct ism_dev *ism)
 	if (ret)
 		goto out;
 
-	ism->local_gid = cmd.response.gid;
+	ism->gid.gid = cmd.response.gid;
+	ism->gid.gid_ext = 0;
 out:
 	return ret;
 }
 
+static int ism_query_rgid(struct ism_dev *ism, struct smcd_gid *rgid,
+			  u32 vid_valid, u32 vid)
+{
+	union ism_query_rgid cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.request.hdr.cmd = ISM_QUERY_RGID;
+	cmd.request.hdr.len = sizeof(cmd.request);
+
+	cmd.request.rgid = rgid->gid;
+	cmd.request.vlan_valid = vid_valid;
+	cmd.request.vlan_id = vid;
+
+	return ism_cmd(ism, &cmd);
+}
+
 static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 {
 	clear_bit(dmb->sba_idx, ism->sba_bitmap);
@@ -251,8 +268,8 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 	return rc;
 }
 
-int ism_register_dmb(struct ism_dev *ism, struct ism_dmb *dmb,
-		     struct ism_client *client)
+static int ism_register_dmb(struct ism_dev *ism, struct ism_dmb *dmb,
+			    struct ism_client *client)
 {
 	union ism_reg_dmb cmd;
 	unsigned long flags;
@@ -285,9 +302,8 @@ int ism_register_dmb(struct ism_dev *ism, struct ism_dmb *dmb,
 out:
 	return ret;
 }
-EXPORT_SYMBOL_GPL(ism_register_dmb);
 
-int ism_unregister_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
+static int ism_unregister_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 {
 	union ism_unreg_dmb cmd;
 	unsigned long flags;
@@ -311,7 +327,6 @@ int ism_unregister_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 out:
 	return ret;
 }
-EXPORT_SYMBOL_GPL(ism_unregister_dmb);
 
 static int ism_add_vlan_id(struct ism_dev *ism, u64 vlan_id)
 {
@@ -339,14 +354,42 @@ static int ism_del_vlan_id(struct ism_dev *ism, u64 vlan_id)
 	return ism_cmd(ism, &cmd);
 }
 
+static int ism_set_vlan_required(struct ism_dev *ism)
+{
+	return ism_cmd_simple(ism, ISM_SET_VLAN);
+}
+
+static int ism_reset_vlan_required(struct ism_dev *ism)
+{
+	return ism_cmd_simple(ism, ISM_RESET_VLAN);
+}
+
+static int ism_signal_ieq(struct ism_dev *ism, struct smcd_gid *rgid,
+			  u32 trigger_irq, u32 event_code, u64 info)
+{
+	union ism_sig_ieq cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.request.hdr.cmd = ISM_SIGNAL_IEQ;
+	cmd.request.hdr.len = sizeof(cmd.request);
+
+	cmd.request.rgid = rgid->gid;
+	cmd.request.trigger_irq = trigger_irq;
+	cmd.request.event_code = event_code;
+	cmd.request.info = info;
+
+	return ism_cmd(ism, &cmd);
+}
+
 static unsigned int max_bytes(unsigned int start, unsigned int len,
 			      unsigned int boundary)
 {
 	return min(boundary - (start & (boundary - 1)), len);
 }
 
-int ism_move(struct ism_dev *ism, u64 dmb_tok, unsigned int idx, bool sf,
-	     unsigned int offset, void *data, unsigned int size)
+static int ism_move(struct ism_dev *ism, u64 dmb_tok, unsigned int idx,
+		    bool sf, unsigned int offset, void *data,
+		    unsigned int size)
 {
 	unsigned int bytes;
 	u64 dmb_req;
@@ -368,7 +411,19 @@ int ism_move(struct ism_dev *ism, u64 dmb_tok, unsigned int idx, bool sf,
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(ism_move);
+
+static int ism_supports_v2(void)
+{
+	return ism_v2_capable;
+}
+
+static u16 ism_get_chid(struct ism_dev *ism)
+{
+	if (!ism || !ism->pdev)
+		return 0;
+
+	return to_zpci(ism->pdev)->pchid;
+}
 
 static void ism_handle_event(struct ism_dev *ism)
 {
@@ -428,6 +483,20 @@ static irqreturn_t ism_handle_irq(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
+static const struct ism_ops ism_vp_ops = {
+	.query_remote_gid = ism_query_rgid,
+	.register_dmb = ism_register_dmb,
+	.unregister_dmb = ism_unregister_dmb,
+	.add_vlan_id = ism_add_vlan_id,
+	.del_vlan_id = ism_del_vlan_id,
+	.set_vlan_required = ism_set_vlan_required,
+	.reset_vlan_required = ism_reset_vlan_required,
+	.signal_event = ism_signal_ieq,
+	.move_data = ism_move,
+	.supports_v2 = ism_supports_v2,
+	.get_chid = ism_get_chid,
+};
+
 static int ism_dev_init(struct ism_dev *ism)
 {
 	struct pci_dev *pdev = ism->pdev;
@@ -465,6 +534,8 @@ static int ism_dev_init(struct ism_dev *ism)
 	else
 		ism_v2_capable = false;
 
+	ism->ops = &ism_vp_ops;
+
 	ism_dev_register(ism);
 	query_info(ism);
 	return 0;
@@ -603,150 +674,3 @@ static void __exit ism_exit(void)
 
 module_init(ism_init);
 module_exit(ism_exit);
-
-/*************************** SMC-D Implementation *****************************/
-
-#if IS_ENABLED(CONFIG_SMC) // needed to avoid unused functions
-static int ism_query_rgid(struct ism_dev *ism, u64 rgid, u32 vid_valid,
-			  u32 vid)
-{
-	union ism_query_rgid cmd;
-
-	memset(&cmd, 0, sizeof(cmd));
-	cmd.request.hdr.cmd = ISM_QUERY_RGID;
-	cmd.request.hdr.len = sizeof(cmd.request);
-
-	cmd.request.rgid = rgid;
-	cmd.request.vlan_valid = vid_valid;
-	cmd.request.vlan_id = vid;
-
-	return ism_cmd(ism, &cmd);
-}
-
-static int smcd_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
-			   u32 vid_valid, u32 vid)
-{
-	return ism_query_rgid(smcd->priv, rgid->gid, vid_valid, vid);
-}
-
-static int smcd_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb,
-			     void *client)
-{
-	return ism_register_dmb(smcd->priv, (struct ism_dmb *)dmb, client);
-}
-
-static int smcd_unregister_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)
-{
-	return ism_unregister_dmb(smcd->priv, (struct ism_dmb *)dmb);
-}
-
-static int smcd_add_vlan_id(struct smcd_dev *smcd, u64 vlan_id)
-{
-	return ism_add_vlan_id(smcd->priv, vlan_id);
-}
-
-static int smcd_del_vlan_id(struct smcd_dev *smcd, u64 vlan_id)
-{
-	return ism_del_vlan_id(smcd->priv, vlan_id);
-}
-
-static int smcd_set_vlan_required(struct smcd_dev *smcd)
-{
-	return ism_cmd_simple(smcd->priv, ISM_SET_VLAN);
-}
-
-static int smcd_reset_vlan_required(struct smcd_dev *smcd)
-{
-	return ism_cmd_simple(smcd->priv, ISM_RESET_VLAN);
-}
-
-static int ism_signal_ieq(struct ism_dev *ism, u64 rgid, u32 trigger_irq,
-			  u32 event_code, u64 info)
-{
-	union ism_sig_ieq cmd;
-
-	memset(&cmd, 0, sizeof(cmd));
-	cmd.request.hdr.cmd = ISM_SIGNAL_IEQ;
-	cmd.request.hdr.len = sizeof(cmd.request);
-
-	cmd.request.rgid = rgid;
-	cmd.request.trigger_irq = trigger_irq;
-	cmd.request.event_code = event_code;
-	cmd.request.info = info;
-
-	return ism_cmd(ism, &cmd);
-}
-
-static int smcd_signal_ieq(struct smcd_dev *smcd, struct smcd_gid *rgid,
-			   u32 trigger_irq, u32 event_code, u64 info)
-{
-	return ism_signal_ieq(smcd->priv, rgid->gid,
-			      trigger_irq, event_code, info);
-}
-
-static int smcd_move(struct smcd_dev *smcd, u64 dmb_tok, unsigned int idx,
-		     bool sf, unsigned int offset, void *data,
-		     unsigned int size)
-{
-	return ism_move(smcd->priv, dmb_tok, idx, sf, offset, data, size);
-}
-
-static int smcd_supports_v2(void)
-{
-	return ism_v2_capable;
-}
-
-static u64 ism_get_local_gid(struct ism_dev *ism)
-{
-	return ism->local_gid;
-}
-
-static void smcd_get_local_gid(struct smcd_dev *smcd,
-			       struct smcd_gid *smcd_gid)
-{
-	smcd_gid->gid = ism_get_local_gid(smcd->priv);
-	smcd_gid->gid_ext = 0;
-}
-
-static u16 ism_get_chid(struct ism_dev *ism)
-{
-	if (!ism || !ism->pdev)
-		return 0;
-
-	return to_zpci(ism->pdev)->pchid;
-}
-
-static u16 smcd_get_chid(struct smcd_dev *smcd)
-{
-	return ism_get_chid(smcd->priv);
-}
-
-static inline struct device *smcd_get_dev(struct smcd_dev *dev)
-{
-	struct ism_dev *ism = dev->priv;
-
-	return &ism->dev;
-}
-
-static const struct smcd_ops ism_ops = {
-	.query_remote_gid = smcd_query_rgid,
-	.register_dmb = smcd_register_dmb,
-	.unregister_dmb = smcd_unregister_dmb,
-	.add_vlan_id = smcd_add_vlan_id,
-	.del_vlan_id = smcd_del_vlan_id,
-	.set_vlan_required = smcd_set_vlan_required,
-	.reset_vlan_required = smcd_reset_vlan_required,
-	.signal_event = smcd_signal_ieq,
-	.move_data = smcd_move,
-	.supports_v2 = smcd_supports_v2,
-	.get_local_gid = smcd_get_local_gid,
-	.get_chid = smcd_get_chid,
-	.get_dev = smcd_get_dev,
-};
-
-const struct smcd_ops *ism_get_smcd_ops(void)
-{
-	return &ism_ops;
-}
-EXPORT_SYMBOL_GPL(ism_get_smcd_ops);
-#endif
diff --git a/include/linux/ism.h b/include/linux/ism.h
index 1462296e8ba7..ede1a40b408e 100644
--- a/include/linux/ism.h
+++ b/include/linux/ism.h
@@ -12,6 +12,7 @@
 #include <linux/device.h>
 #include <linux/workqueue.h>
 
+/* The remote peer rgid can use dmb_tok to write into this buffer. */
 struct ism_dmb {
 	u64 dmb_tok;
 	u64 rgid;
@@ -23,30 +24,9 @@ struct ism_dmb {
 	dma_addr_t dma_addr;
 };
 
-/* Unless we gain unexpected popularity, this limit should hold for a while */
-#define MAX_CLIENTS		8
-#define NO_CLIENT		0xff		/* must be >= MAX_CLIENTS */
-#define ISM_NR_DMBS		1920
-
-struct ism_dev {
-	spinlock_t lock; /* protects the ism device */
-	struct list_head list;
-	struct pci_dev *pdev;
-
-	struct ism_sba *sba;
-	dma_addr_t sba_dma_addr;
-	DECLARE_BITMAP(sba_bitmap, ISM_NR_DMBS);
-	u8 *sba_client_arr;	/* entries are indices into 'clients' array */
-	void *priv[MAX_CLIENTS];
-
-	struct ism_eq *ieq;
-	dma_addr_t ieq_dma_addr;
-
-	struct device dev;
-	u64 local_gid;
-	int ieq_idx;
-
-	struct ism_client *subs[MAX_CLIENTS];
+struct smcd_gid {
+	u64	gid;
+	u64	gid_ext;
 };
 
 struct ism_event {
@@ -57,6 +37,12 @@ struct ism_event {
 	u64 info;
 };
 
+#define ISM_EVENT_DMB	0
+#define ISM_EVENT_GID	1
+#define ISM_EVENT_SWR	2
+
+struct ism_dev;
+
 struct ism_client {
 	const char *name;
 	void (*add)(struct ism_dev *dev);
@@ -73,28 +59,116 @@ struct ism_client {
 
 int ism_register_client(struct ism_client *client);
 int  ism_unregister_client(struct ism_client *client);
-static inline void *ism_get_priv(struct ism_dev *dev,
-				 struct ism_client *client) {
-	return dev->priv[client->id];
-}
+
+/* Mandatory operations for all ism devices:
+ * int (*query_remote_gid)(struct ism_dev *dev, struct smcd_gid *rgid,
+ *	                   u32 vid_valid, u32 vid);
+ *	Query whether remote GID rgid is reachable via this device and this
+ *	vlan id. Vlan id is only checked if vid_valid != 0.
+ *
+ * int (*register_dmb)(struct ism_dev *dev, struct ism_dmb *dmb,
+ *			    void *client);
+ *	Register an ism_dmb buffer for this device and this client.
+ *
+ * int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
+ *	Unregister an ism_dmb buffer
+ *
+ * int (*move_data)(struct ism_dev *dev, u64 dmb_tok, unsigned int idx,
+ *			 bool sf, unsigned int offset, void *data,
+ *			 unsigned int size);
+ *	Use dev to write data of size at offset into a remote dmb
+ *	identified by dmb_tok and idx. If signal flag (sf) then signal
+ *	the remote peer that data has arrived in this dmb.
+ *
+ * int (*supports_v2)(void);
+ *
+ * u16 (*get_chid)(struct ism_dev *dev);
+ *	Returns ism fabric identifier (channel id) of this device.
+ *	Only devices on the same ism fabric can communicate.
+ *	chid is unique per HW system, except for 0xFFFF, which denotes
+ *	an ism_loopback device that can only communicate with itself.
+ *	Use chid for fast negative checks, but only query_remote_gid()
+ *	can give a reliable positive answer.
+ *
+ * struct device* (*get_dev)(struct ism_dev *dev);
+ *
+ * Optional operations:
+ * int (*add_vlan_id)(struct ism_dev *dev, u64 vlan_id);
+ * int (*del_vlan_id)(struct ism_dev *dev, u64 vlan_id);
+ * int (*set_vlan_required)(struct ism_dev *dev);
+ * int (*reset_vlan_required)(struct ism_dev *dev);
+ *	VLAN handling is broken - don't use it
+ *	Ability to assign dmbs to VLANs is missing
+ *	- do we really want / need this?
+ *
+ * int (*signal_event)(struct ism_dev *dev, struct smcd_gid *rgid,
+ *			    u32 trigger_irq, u32 event_code, u64 info);
+ *	Send a control event into the event queue of a remote gid (rgid)
+ *	with (1) or without (0) triggering an interrupt at the remote gid.
+ */
+
+struct ism_ops {
+	int (*query_remote_gid)(struct ism_dev *dev, struct smcd_gid *rgid,
+				u32 vid_valid, u32 vid);
+	int (*register_dmb)(struct ism_dev *dev, struct ism_dmb *dmb,
+			    struct ism_client *client);
+	int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
+	int (*move_data)(struct ism_dev *dev, u64 dmb_tok, unsigned int idx,
+			 bool sf, unsigned int offset, void *data,
+			 unsigned int size);
+	int (*supports_v2)(void);
+	u16 (*get_chid)(struct ism_dev *dev);
+	struct device* (*get_dev)(struct ism_dev *dev);
+
+	/* optional operations */
+	int (*add_vlan_id)(struct ism_dev *dev, u64 vlan_id);
+	int (*del_vlan_id)(struct ism_dev *dev, u64 vlan_id);
+	int (*set_vlan_required)(struct ism_dev *dev);
+	int (*reset_vlan_required)(struct ism_dev *dev);
+	int (*signal_event)(struct ism_dev *dev, struct smcd_gid *rgid,
+			    u32 trigger_irq, u32 event_code, u64 info);
+};
+
+/* Unless we gain unexpected popularity, this limit should hold for a while */
+#define MAX_CLIENTS		8
+#define NO_CLIENT		0xff		/* must be >= MAX_CLIENTS */
+#define ISM_NR_DMBS		1920
+
+struct ism_dev {
+	const struct ism_ops *ops;
+	spinlock_t lock; /* protects the ism device */
+	struct list_head list;
+	struct pci_dev *pdev;
+
+	struct ism_sba *sba;
+	dma_addr_t sba_dma_addr;
+	DECLARE_BITMAP(sba_bitmap, ISM_NR_DMBS);
+	u8 *sba_client_arr;	/* entries are indices into 'clients' array */
+	void *priv[MAX_CLIENTS];
+
+	struct ism_eq *ieq;
+	dma_addr_t ieq_dma_addr;
+
+	struct device dev;
+	struct smcd_gid gid;
+	int ieq_idx;
+
+	struct ism_client *subs[MAX_CLIENTS];
+};
 
 int ism_dev_register(struct ism_dev *ism);
 void ism_dev_unregister(struct ism_dev *ism);
 
+static inline void *ism_get_priv(struct ism_dev *dev,
+				 struct ism_client *client) {
+	return dev->priv[client->id];
+}
 static inline void ism_set_priv(struct ism_dev *dev, struct ism_client *client,
 				void *priv) {
 	dev->priv[client->id] = priv;
 }
 
-int  ism_register_dmb(struct ism_dev *dev, struct ism_dmb *dmb,
-		      struct ism_client *client);
-int  ism_unregister_dmb(struct ism_dev *dev, struct ism_dmb *dmb);
-int  ism_move(struct ism_dev *dev, u64 dmb_tok, unsigned int idx, bool sf,
-	      unsigned int offset, void *data, unsigned int size);
-
 #define ISM_RESERVED_VLANID	0x1FFF
 #define ISM_ERROR	0xFFFF
 
-const struct smcd_ops *ism_get_smcd_ops(void);
-
 #endif	/* _ISM_H */
diff --git a/include/net/smc.h b/include/net/smc.h
index ab732b286f91..3d20c6c05056 100644
--- a/include/net/smc.h
+++ b/include/net/smc.h
@@ -27,35 +27,20 @@ struct smc_hashinfo {
 };
 
 /* SMCD/ISM device driver interface */
-struct smcd_dmb {
-	u64 dmb_tok;
-	u64 rgid;
-	u32 dmb_len;
-	u32 sba_idx;
-	u32 vlan_valid;
-	u32 vlan_id;
-	void *cpu_addr;
-	dma_addr_t dma_addr;
-};
-
-#define ISM_EVENT_DMB	0
-#define ISM_EVENT_GID	1
-#define ISM_EVENT_SWR	2
-
 
 struct smcd_dev;
 
-struct smcd_gid {
-	u64	gid;
-	u64	gid_ext;
-};
-
+//struct smcd_gid {
+//	u64	gid;
+//	u64	gid_ext;
+//};
+//
 struct smcd_ops {
 	int (*query_remote_gid)(struct smcd_dev *dev, struct smcd_gid *rgid,
 				u32 vid_valid, u32 vid);
-	int (*register_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb,
+	int (*register_dmb)(struct smcd_dev *dev, struct ism_dmb *dmb,
 			    void *client);
-	int (*unregister_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb);
+	int (*unregister_dmb)(struct smcd_dev *dev, struct ism_dmb *dmb);
 	int (*move_data)(struct smcd_dev *dev, u64 dmb_tok, unsigned int idx,
 			 bool sf, unsigned int offset, void *data,
 			 unsigned int size);
@@ -72,7 +57,7 @@ struct smcd_ops {
 	int (*signal_event)(struct smcd_dev *dev, struct smcd_gid *rgid,
 			    u32 trigger_irq, u32 event_code, u64 info);
 	int (*support_dmb_nocopy)(struct smcd_dev *dev);
-	int (*attach_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb);
+	int (*attach_dmb)(struct smcd_dev *dev, struct ism_dmb *dmb);
 	int (*detach_dmb)(struct smcd_dev *dev, u64 token);
 };
 
diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c
index 84f98e18c7db..6fbacad02f23 100644
--- a/net/smc/smc_ism.c
+++ b/net/smc/smc_ism.c
@@ -207,7 +207,7 @@ int smc_ism_put_vlan(struct smcd_dev *smcd, unsigned short vlanid)
 
 int smc_ism_unregister_dmb(struct smcd_dev *smcd, struct smc_buf_desc *dmb_desc)
 {
-	struct smcd_dmb dmb;
+	struct ism_dmb dmb;
 	int rc = 0;
 
 	if (!dmb_desc->dma_addr)
@@ -231,7 +231,7 @@ int smc_ism_unregister_dmb(struct smcd_dev *smcd, struct smc_buf_desc *dmb_desc)
 int smc_ism_register_dmb(struct smc_link_group *lgr, int dmb_len,
 			 struct smc_buf_desc *dmb_desc)
 {
-	struct smcd_dmb dmb;
+	struct ism_dmb dmb;
 	int rc;
 
 	memset(&dmb, 0, sizeof(dmb));
@@ -263,7 +263,7 @@ bool smc_ism_support_dmb_nocopy(struct smcd_dev *smcd)
 int smc_ism_attach_dmb(struct smcd_dev *dev, u64 token,
 		       struct smc_buf_desc *dmb_desc)
 {
-	struct smcd_dmb dmb;
+	struct ism_dmb dmb;
 	int rc = 0;
 
 	if (!dev->ops->attach_dmb)
@@ -481,9 +481,122 @@ static struct smcd_dev *smcd_alloc_dev(struct device *parent, const char *name,
 	return smcd;
 }
 
+static int smcd_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
+			   u32 vid_valid, u32 vid)
+{
+	struct ism_dev *ism = smcd->priv;
+
+	return ism->ops->query_remote_gid(ism, rgid, vid_valid, vid);
+}
+
+static int smcd_register_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb,
+			     void *client)
+{
+	struct ism_dev *ism = smcd->priv;
+
+	return ism->ops->register_dmb(ism, dmb, (struct ism_client *)client);
+}
+
+static int smcd_unregister_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb)
+{
+	struct ism_dev *ism = smcd->priv;
+
+	return ism->ops->unregister_dmb(ism, dmb);
+}
+
+static int smcd_add_vlan_id(struct smcd_dev *smcd, u64 vlan_id)
+{
+	struct ism_dev *ism = smcd->priv;
+
+	return ism->ops->add_vlan_id(ism, vlan_id);
+}
+
+static int smcd_del_vlan_id(struct smcd_dev *smcd, u64 vlan_id)
+{
+	struct ism_dev *ism = smcd->priv;
+
+	return ism->ops->del_vlan_id(ism, vlan_id);
+}
+
+static int smcd_set_vlan_required(struct smcd_dev *smcd)
+{
+	struct ism_dev *ism = smcd->priv;
+
+	return ism->ops->set_vlan_required(ism);
+}
+
+static int smcd_reset_vlan_required(struct smcd_dev *smcd)
+{
+	struct ism_dev *ism = smcd->priv;
+
+	return ism->ops->reset_vlan_required(ism);
+}
+
+static int smcd_signal_ieq(struct smcd_dev *smcd, struct smcd_gid *rgid,
+			   u32 trigger_irq, u32 event_code, u64 info)
+{
+	struct ism_dev *ism = smcd->priv;
+
+	return ism->ops->signal_event(ism, rgid,
+			      trigger_irq, event_code, info);
+}
+
+static int smcd_move(struct smcd_dev *smcd, u64 dmb_tok, unsigned int idx,
+		     bool sf, unsigned int offset, void *data,
+		     unsigned int size)
+{
+	struct ism_dev *ism = smcd->priv;
+
+	return ism->ops->move_data(ism, dmb_tok, idx, sf, offset, data, size);
+}
+
+static int smcd_supports_v2(void)
+{
+	return smc_ism_v2_capable;
+}
+
+static void smcd_get_local_gid(struct smcd_dev *smcd,
+			       struct smcd_gid *smcd_gid)
+{
+	struct ism_dev *ism = smcd->priv;
+
+	smcd_gid->gid = ism->gid.gid;
+	smcd_gid->gid_ext = ism->gid.gid_ext;
+}
+
+static u16 smcd_get_chid(struct smcd_dev *smcd)
+{
+	struct ism_dev *ism = smcd->priv;
+
+	return ism->ops->get_chid(ism);
+}
+
+static inline struct device *smcd_get_dev(struct smcd_dev *dev)
+{
+	struct ism_dev *ism = dev->priv;
+
+	return &ism->dev;
+}
+
+static const struct smcd_ops ism_smcd_ops = {
+	.query_remote_gid = smcd_query_rgid,
+	.register_dmb = smcd_register_dmb,
+	.unregister_dmb = smcd_unregister_dmb,
+	.add_vlan_id = smcd_add_vlan_id,
+	.del_vlan_id = smcd_del_vlan_id,
+	.set_vlan_required = smcd_set_vlan_required,
+	.reset_vlan_required = smcd_reset_vlan_required,
+	.signal_event = smcd_signal_ieq,
+	.move_data = smcd_move,
+	.supports_v2 = smcd_supports_v2,
+	.get_local_gid = smcd_get_local_gid,
+	.get_chid = smcd_get_chid,
+	.get_dev = smcd_get_dev,
+};
+
 static void smcd_register_dev(struct ism_dev *ism)
 {
-	const struct smcd_ops *ops = ism_get_smcd_ops();
+	const struct smcd_ops *ops = &ism_smcd_ops;
 	struct smcd_dev *smcd, *fentry;
 
 	if (!ops)
@@ -499,7 +612,7 @@ static void smcd_register_dev(struct ism_dev *ism)
 	if (smc_pnetid_by_dev_port(&ism->pdev->dev, 0, smcd->pnetid))
 		smc_pnetid_by_table_smcd(smcd);
 
-	if (smcd->ops->supports_v2())
+	if (ism->ops->supports_v2())
 		smc_ism_set_v2_capable();
 	mutex_lock(&smcd_dev_list.mutex);
 	/* sort list:
diff --git a/net/smc/smc_loopback.c b/net/smc/smc_loopback.c
index 3c5f64ca4115..c4020653ae20 100644
--- a/net/smc/smc_loopback.c
+++ b/net/smc/smc_loopback.c
@@ -51,7 +51,7 @@ static int smc_lo_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
 	return 0;
 }
 
-static int smc_lo_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb,
+static int smc_lo_register_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb,
 			       void *client_priv)
 {
 	struct smc_lo_dmb_node *dmb_node, *tmp_node;
@@ -129,7 +129,7 @@ static void __smc_lo_unregister_dmb(struct smc_lo_dev *ldev,
 		wake_up(&ldev->ldev_release);
 }
 
-static int smc_lo_unregister_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)
+static int smc_lo_unregister_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb)
 {
 	struct smc_lo_dmb_node *dmb_node = NULL, *tmp_node;
 	struct smc_lo_dev *ldev = smcd->priv;
@@ -158,7 +158,7 @@ static int smc_lo_support_dmb_nocopy(struct smcd_dev *smcd)
 	return SMC_LO_SUPPORT_NOCOPY;
 }
 
-static int smc_lo_attach_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)
+static int smc_lo_attach_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb)
 {
 	struct smc_lo_dmb_node *dmb_node = NULL, *tmp_node;
 	struct smc_lo_dev *ldev = smcd->priv;
-- 
2.45.2

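The `ism_ops` comment in `include/linux/ism.h` above makes two points: some operations are mandatory while others are optional, and `get_chid()` gives only a fast negative answer, so a matching chid must be confirmed with `query_remote_gid()`. The userspace model below illustrates both points. It is a sketch under stated assumptions, not kernel code. `struct sketch_ops`, `struct sketch_dev`, the `sketch_*` helpers and the mock loopback device are hypothetical. The GID is simplified to a `u64`, and only three operations are modelled. The sketch rejects a device that lacks mandatory ops, returns `-EOPNOTSUPP` when an optional op is left `NULL`, and treats a chid mismatch as final but confirms a match through `query_remote_gid()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LOOPBACK_CHID 0xFFFF

struct sketch_dev;

/* Hypothetical subset of struct ism_ops; the real table has more members. */
struct sketch_ops {
	/* mandatory */
	int (*query_remote_gid)(struct sketch_dev *dev, uint64_t rgid);
	uint16_t (*get_chid)(struct sketch_dev *dev);
	/* optional: may be left NULL by a device driver */
	int (*add_vlan_id)(struct sketch_dev *dev, uint64_t vlan_id);
};

struct sketch_dev {
	const struct sketch_ops *ops;
	uint64_t gid;	/* simplified: real GIDs are smcd_gid / uuid_t */
};

/* Refuse to register a device whose mandatory operations are missing. */
static int sketch_dev_register(struct sketch_dev *dev)
{
	assert(dev != NULL);
	if (!dev->ops || !dev->ops->query_remote_gid || !dev->ops->get_chid)
		return -EINVAL;
	return 0;
}

/* Optional operations are checked for NULL before dispatching. */
static int sketch_add_vlan_id(struct sketch_dev *dev, uint64_t vlan_id)
{
	if (!dev->ops->add_vlan_id)
		return -EOPNOTSUPP;
	return dev->ops->add_vlan_id(dev, vlan_id);
}

/* A chid mismatch is a reliable "no"; a match still needs query_remote_gid(). */
static int sketch_peer_reachable(struct sketch_dev *dev, uint16_t peer_chid,
				 uint64_t peer_gid)
{
	if (dev->ops->get_chid(dev) != peer_chid)
		return 0;
	return dev->ops->query_remote_gid(dev, peer_gid) == 0;
}

/* Mock loopback device: chid 0xFFFF, reaches only its own GID. */
static uint16_t lo_get_chid(struct sketch_dev *dev)
{
	(void)dev;
	return SKETCH_LOOPBACK_CHID;
}

static int lo_query_rgid(struct sketch_dev *dev, uint64_t rgid)
{
	return rgid == dev->gid ? 0 : -ENOENT;
}

static const struct sketch_ops lo_ops = {
	.query_remote_gid = lo_query_rgid,
	.get_chid = lo_get_chid,
	/* .add_vlan_id intentionally NULL: it is optional */
};
```

A wrapper such as `smcd_add_vlan_id()` in `net/smc/smc_ism.c` above dereferences `ism->ops->add_vlan_id` without a check. A NULL test along these lines is one possible way to honour the "optional" contract.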

* [RFC net-next 3/7] net/ism: Use uuid_t for ISM GID
  2025-01-15 19:55 [RFC net-next 0/7] Provide an ism layer Alexandra Winter
  2025-01-15 19:55 ` [RFC net-next 1/7] net/ism: Create net/ism Alexandra Winter
  2025-01-15 19:55 ` [RFC net-next 2/7] net/ism: Remove dependencies between ISM_VPCI and SMC Alexandra Winter
@ 2025-01-15 19:55 ` Alexandra Winter
  2025-01-20 17:18   ` Simon Horman
  2025-01-15 19:55 ` [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions Alexandra Winter
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 61+ messages in thread
From: Alexandra Winter @ 2025-01-15 19:55 UTC (permalink / raw)
  To: Wenjia Zhang, Jan Karcher, Gerd Bayer, Alexandra Winter,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

SMC uses 64 Bit and 128 Bit Global Identifiers (GIDs)
that need to be sent via the SMC protocol.
When integers are used, network endianness and host endianness
need to be considered.

Avoid this in the ISM layer by using uuid_t byte arrays.
Follow-on patches could make the same change for SMC; for now,
conversion helper functions are introduced.

ISM-vPCI devices provide 64 Bit GIDs. Map them to ISM uuid_t GIDs
like this:
 _________________________________________
| 64 Bit ISM-vPCI GID | 00000000_00000000 |
 -----------------------------------------
Interpreted as a UUID, this value has the variant that is reserved
for NCS backward compatibility (the most significant bit of octet 8,
the variant field, is 0). So it will not collide with UUIDs that were
generated according to the standard.
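The following userspace sketch shows the mapping described above. It assumes the 64-bit GID is stored in big-endian (network) byte order, which is what a plain `memcpy()` of the host value produces on big-endian s390. `gid_uuid_t` is a hypothetical stand-in for the kernel's `uuid_t`, and the helper names are illustrative only. The last helper tests the RFC 4122 variant bit in octet 8, which is zero for every mapped GID.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GID_UUID_SIZE 16

/* Hypothetical userspace stand-in for the kernel's uuid_t (16 raw bytes). */
typedef struct { uint8_t b[GID_UUID_SIZE]; } gid_uuid_t;

/* Store a 64-bit ISM-vPCI GID big-endian in bytes 0..7; bytes 8..15 stay 0. */
static void vpci_gid_to_uuid(gid_uuid_t *u, uint64_t gid)
{
	memset(u, 0, sizeof(*u));
	for (int i = 0; i < 8; i++)
		u->b[i] = (uint8_t)(gid >> (56 - 8 * i));
}

static uint64_t uuid_to_vpci_gid(const gid_uuid_t *u)
{
	uint64_t gid = 0;

	/* A GID produced by vpci_gid_to_uuid() has an all-zero upper half */
	for (int i = 8; i < GID_UUID_SIZE; i++)
		assert(u->b[i] == 0);
	for (int i = 0; i < 8; i++)
		gid = (gid << 8) | u->b[i];
	return gid;
}

/* RFC 4122: variant bits "0xx" in octet 8 denote NCS backward compatibility. */
static int uuid_is_ncs_variant(const gid_uuid_t *u)
{
	return (u->b[8] & 0x80) == 0;
}
```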

Future ISM devices shall use real UUIDs as 128 Bit GIDs.

Note:
- In this RFC patch, smcd_gid is moved back to smc.h;
  a future patchset should avoid that.
- ism_dmb and ism_event structs still contain 64 Bit rgid and info
  fields. A future patch could change them to uuid_t gids. This
  does not break anything, because ism_loopback does not use them.

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
---
 drivers/s390/net/ism.h     |  9 +++++++++
 drivers/s390/net/ism_drv.c | 16 ++++++++--------
 include/linux/ism.h        | 16 ++++++----------
 include/net/smc.h          | 12 ++++++------
 net/smc/smc_ism.c          | 13 ++++++++-----
 net/smc/smc_ism.h          | 21 +++++++++++++++++++++
 6 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/drivers/s390/net/ism.h b/drivers/s390/net/ism.h
index 8b56e1d82e6b..61cf10334170 100644
--- a/drivers/s390/net/ism.h
+++ b/drivers/s390/net/ism.h
@@ -64,6 +64,15 @@ union ism_reg_ieq {
 	} response;
 } __aligned(16);
 
+/* ISM-vPCI devices provide 64 Bit GIDs
+ * Map them to ISM UUID GIDs like this:
+ *  _________________________________________
+ * | 64 Bit ISM-vPCI GID | 00000000_00000000 |
+ *  -----------------------------------------
+ * This will be interpreted as the UUID variant that is reserved
+ * for NCS backward compatibility, so it will not collide with
+ * proper UUIDs.
+ */
 union ism_read_gid {
 	struct {
 		struct ism_req_hdr hdr;
diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
index 112e0d67cdd6..ab70debbdd9d 100644
--- a/drivers/s390/net/ism_drv.c
+++ b/drivers/s390/net/ism_drv.c
@@ -191,14 +191,14 @@ static int ism_read_local_gid(struct ism_dev *ism)
 	if (ret)
 		goto out;
 
-	ism->gid.gid = cmd.response.gid;
-	ism->gid.gid_ext = 0;
+	memset(&ism->gid, 0, sizeof(ism->gid));
+	memcpy(&ism->gid, &cmd.response.gid, sizeof(cmd.response.gid));
 out:
 	return ret;
 }
 
-static int ism_query_rgid(struct ism_dev *ism, struct smcd_gid *rgid,
-			  u32 vid_valid, u32 vid)
+static int ism_query_rgid(struct ism_dev *ism, uuid_t *rgid, u32 vid_valid,
+			  u32 vid)
 {
 	union ism_query_rgid cmd;
 
@@ -206,7 +206,7 @@ static int ism_query_rgid(struct ism_dev *ism, struct smcd_gid *rgid,
 	cmd.request.hdr.cmd = ISM_QUERY_RGID;
 	cmd.request.hdr.len = sizeof(cmd.request);
 
-	cmd.request.rgid = rgid->gid;
+	memcpy(&cmd.request.rgid, rgid, sizeof(cmd.request.rgid));
 	cmd.request.vlan_valid = vid_valid;
 	cmd.request.vlan_id = vid;
 
@@ -364,8 +364,8 @@ static int ism_reset_vlan_required(struct ism_dev *ism)
 	return ism_cmd_simple(ism, ISM_RESET_VLAN);
 }
 
-static int ism_signal_ieq(struct ism_dev *ism, struct smcd_gid *rgid,
-			  u32 trigger_irq, u32 event_code, u64 info)
+static int ism_signal_ieq(struct ism_dev *ism, uuid_t *rgid, u32 trigger_irq,
+			  u32 event_code, u64 info)
 {
 	union ism_sig_ieq cmd;
 
@@ -373,7 +373,7 @@ static int ism_signal_ieq(struct ism_dev *ism, struct smcd_gid *rgid,
 	cmd.request.hdr.cmd = ISM_SIGNAL_IEQ;
 	cmd.request.hdr.len = sizeof(cmd.request);
 
-	cmd.request.rgid = rgid->gid;
+	memcpy(&cmd.request.rgid, rgid, sizeof(cmd.request.rgid));
 	cmd.request.trigger_irq = trigger_irq;
 	cmd.request.event_code = event_code;
 	cmd.request.info = info;
diff --git a/include/linux/ism.h b/include/linux/ism.h
index ede1a40b408e..50975847248f 100644
--- a/include/linux/ism.h
+++ b/include/linux/ism.h
@@ -11,6 +11,7 @@
 
 #include <linux/device.h>
 #include <linux/workqueue.h>
+#include <linux/uuid.h>
 
 /* The remote peer rgid can use dmb_tok to write into this buffer. */
 struct ism_dmb {
@@ -24,11 +25,6 @@ struct ism_dmb {
 	dma_addr_t dma_addr;
 };
 
-struct smcd_gid {
-	u64	gid;
-	u64	gid_ext;
-};
-
 struct ism_event {
 	u32 type;
 	u32 code;
@@ -61,7 +57,7 @@ int ism_register_client(struct ism_client *client);
 int  ism_unregister_client(struct ism_client *client);
 
 /* Mandatory operations for all ism devices:
- * int (*query_remote_gid)(struct ism_dev *dev, struct smcd_gid *rgid,
+ * int (*query_remote_gid)(struct ism_dev *dev, uuid_t *rgid,
  *	                   u32 vid_valid, u32 vid);
  *	Query whether remote GID rgid is reachable via this device and this
  *	vlan id. Vlan id is only checked if vid_valid != 0.
@@ -101,14 +97,14 @@ int  ism_unregister_client(struct ism_client *client);
  *	Ability to assign dmbs to VLANs is missing
  *	- do we really want / need this?
  *
- * int (*signal_event)(struct ism_dev *dev, struct smcd_gid *rgid,
+ * int (*signal_event)(struct ism_dev *dev, uuid_t *rgid,
  *			    u32 trigger_irq, u32 event_code, u64 info);
  *	Send a control event into the event queue of a remote gid (rgid)
  *	with (1) or without (0) triggering an interrupt at the remote gid.
  */
 
 struct ism_ops {
-	int (*query_remote_gid)(struct ism_dev *dev, struct smcd_gid *rgid,
+	int (*query_remote_gid)(struct ism_dev *dev, uuid_t *rgid,
 				u32 vid_valid, u32 vid);
 	int (*register_dmb)(struct ism_dev *dev, struct ism_dmb *dmb,
 			    struct ism_client *client);
@@ -125,7 +121,7 @@ struct ism_ops {
 	int (*del_vlan_id)(struct ism_dev *dev, u64 vlan_id);
 	int (*set_vlan_required)(struct ism_dev *dev);
 	int (*reset_vlan_required)(struct ism_dev *dev);
-	int (*signal_event)(struct ism_dev *dev, struct smcd_gid *rgid,
+	int (*signal_event)(struct ism_dev *dev, uuid_t *rgid,
 			    u32 trigger_irq, u32 event_code, u64 info);
 };
 
@@ -150,7 +146,7 @@ struct ism_dev {
 	dma_addr_t ieq_dma_addr;
 
 	struct device dev;
-	struct smcd_gid gid;
+	uuid_t gid;
 	int ieq_idx;
 
 	struct ism_client *subs[MAX_CLIENTS];
diff --git a/include/net/smc.h b/include/net/smc.h
index 3d20c6c05056..91aab1d44166 100644
--- a/include/net/smc.h
+++ b/include/net/smc.h
@@ -15,7 +15,7 @@
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/wait.h>
-#include "linux/ism.h"
+#include <linux/ism.h>
 
 struct sock;
 
@@ -30,11 +30,11 @@ struct smc_hashinfo {
 
 struct smcd_dev;
 
-//struct smcd_gid {
-//	u64	gid;
-//	u64	gid_ext;
-//};
-//
+struct smcd_gid {
+	u64	gid;
+	u64	gid_ext;
+};
+
 struct smcd_ops {
 	int (*query_remote_gid)(struct smcd_dev *dev, struct smcd_gid *rgid,
 				u32 vid_valid, u32 vid);
diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c
index 6fbacad02f23..a49da16bafd5 100644
--- a/net/smc/smc_ism.c
+++ b/net/smc/smc_ism.c
@@ -485,8 +485,10 @@ static int smcd_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
 			   u32 vid_valid, u32 vid)
 {
 	struct ism_dev *ism = smcd->priv;
+	uuid_t ism_rgid;
 
-	return ism->ops->query_remote_gid(ism, rgid, vid_valid, vid);
+	copy_to_ismgid(&ism_rgid, rgid);
+	return ism->ops->query_remote_gid(ism, &ism_rgid, vid_valid, vid);
 }
 
 static int smcd_register_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb,
@@ -536,9 +538,11 @@ static int smcd_signal_ieq(struct smcd_dev *smcd, struct smcd_gid *rgid,
 			   u32 trigger_irq, u32 event_code, u64 info)
 {
 	struct ism_dev *ism = smcd->priv;
+	uuid_t ism_rgid;
 
-	return ism->ops->signal_event(ism, rgid,
-			      trigger_irq, event_code, info);
+	copy_to_ismgid(&ism_rgid, rgid);
+	return ism->ops->signal_event(ism, &ism_rgid, trigger_irq,
+				      event_code, info);
 }
 
 static int smcd_move(struct smcd_dev *smcd, u64 dmb_tok, unsigned int idx,
@@ -560,8 +564,7 @@ static void smcd_get_local_gid(struct smcd_dev *smcd,
 {
 	struct ism_dev *ism = smcd->priv;
 
-	smcd_gid->gid = ism->gid.gid;
-	smcd_gid->gid_ext = ism->gid.gid_ext;
+	copy_to_smcdgid(smcd_gid, &ism->gid);
 }
 
 static u16 smcd_get_chid(struct smcd_dev *smcd)
diff --git a/net/smc/smc_ism.h b/net/smc/smc_ism.h
index 6763133dd8d0..d041e5a7c459 100644
--- a/net/smc/smc_ism.h
+++ b/net/smc/smc_ism.h
@@ -12,6 +12,7 @@
 #include <linux/uio.h>
 #include <linux/types.h>
 #include <linux/mutex.h>
+#include <linux/ism.h>
 
 #include "smc.h"
 
@@ -94,4 +95,24 @@ static inline bool smc_ism_is_loopback(struct smcd_dev *smcd)
 	return (smcd->ops->get_chid(smcd) == 0xFFFF);
 }
 
+static inline void copy_to_smcdgid(struct smcd_gid *sgid, uuid_t *igid)
+{
+	__be64 temp;
+
+	memcpy(&temp, igid, sizeof(sgid->gid));
+	sgid->gid = ntohll(temp);
+	memcpy(&temp, igid->b + sizeof(sgid->gid), sizeof(sgid->gid_ext));
+	sgid->gid_ext = ntohll(temp);
+}
+
+static inline void copy_to_ismgid(uuid_t *igid, struct smcd_gid *sgid)
+{
+	__be64 temp;
+
+	temp = htonll(sgid->gid);
+	memcpy(igid, &temp, sizeof(sgid->gid));
+	temp = htonll(sgid->gid_ext);
+	memcpy(igid->b + sizeof(sgid->gid), &temp, sizeof(sgid->gid_ext));
+}
+
 #endif
-- 
2.45.2

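The userspace sketch below models the smcd_gid <-> uuid_t round trip performed by copy_to_ismgid() and copy_to_smcdgid(). One detail matters here. The second 64-bit half lives at byte offset 8 of the 16-byte array, so the offset has to be applied to the byte array (`igid->b + 8`). Plain pointer arithmetic on a `uuid_t *` (`igid + 8`) would advance by 8 * sizeof(uuid_t) bytes instead. `gid_uuid_t`, `struct smcd_gid_sketch` and the helpers are hypothetical stand-ins. Explicit big-endian byte stores replace the kernel's `htonll()`/`ntohll()` so the code is portable.

```c
#include <assert.h>
#include <stdint.h>

#define GID_UUID_SIZE 16

typedef struct { uint8_t b[GID_UUID_SIZE]; } gid_uuid_t;	/* stand-in for uuid_t */
struct smcd_gid_sketch { uint64_t gid; uint64_t gid_ext; };	/* mirrors smcd_gid */

static_assert(sizeof(gid_uuid_t) == 2 * sizeof(uint64_t), "uuid must be 16 bytes");

/* Write v to p[0..7] in network (big-endian) byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Read a big-endian 64-bit value from p[0..7]. */
static uint64_t get_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

static void sketch_to_ismgid(gid_uuid_t *igid, const struct smcd_gid_sketch *sgid)
{
	/* Offsets are taken on the byte array, not on the gid_uuid_t pointer */
	put_be64(igid->b, sgid->gid);
	put_be64(igid->b + sizeof(sgid->gid), sgid->gid_ext);
}

static void sketch_to_smcdgid(struct smcd_gid_sketch *sgid, const gid_uuid_t *igid)
{
	sgid->gid = get_be64(igid->b);
	sgid->gid_ext = get_be64(igid->b + sizeof(sgid->gid));
}
```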

* [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions
  2025-01-15 19:55 [RFC net-next 0/7] Provide an ism layer Alexandra Winter
                   ` (2 preceding siblings ...)
  2025-01-15 19:55 ` [RFC net-next 3/7] net/ism: Use uuid_t for ISM GID Alexandra Winter
@ 2025-01-15 19:55 ` Alexandra Winter
  2025-01-15 22:06   ` Halil Pasic
  2025-01-20  6:32   ` Dust Li
  2025-01-15 19:55 ` [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism Alexandra Winter
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-15 19:55 UTC (permalink / raw)
  To: Wenjia Zhang, Jan Karcher, Gerd Bayer, Alexandra Winter,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

Note that in this RFC this patch is not complete, future versions
of this patch need to contain comments for all ism_ops.
Especially signal_event() and handle_event() need a good generic
description.

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
---
 include/linux/ism.h | 115 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 105 insertions(+), 10 deletions(-)

diff --git a/include/linux/ism.h b/include/linux/ism.h
index 50975847248f..bc165d077071 100644
--- a/include/linux/ism.h
+++ b/include/linux/ism.h
@@ -13,11 +13,26 @@
 #include <linux/workqueue.h>
 #include <linux/uuid.h>
 
-/* The remote peer rgid can use dmb_tok to write into this buffer. */
+/*
+ * DMB - Direct Memory Buffer
+ * ==========================
+ * An ism client provides a DMB as input buffer for a local receiving
+ * ism device for exactly one (remote) sending ism device. Only this
+ * sending device can send data into this DMB using move_data(). Sender
+ * and receiver can be the same device.
+ * TODO: Alignment and length rules (CPU and DMA). Device specific?
+ */
 struct ism_dmb {
+	/* dmb_tok - Token for this dmb
+	 * Used by remote sender to address this dmb.
+	 * Provided by ism fabric in register_dmb().
+	 * Unique per ism fabric.
+	 */
 	u64 dmb_tok;
+	/* rgid - GID of designated remote sending device */
 	u64 rgid;
 	u32 dmb_len;
+	/* sba_idx - Index of this DMB on this receiving device */
 	u32 sba_idx;
 	u32 vlan_valid;
 	u32 vlan_id;
@@ -25,6 +40,8 @@ struct ism_dmb {
 	dma_addr_t dma_addr;
 };
 
+/* ISM event structure (currently device type specific) */
+// TODO: Define and describe generic event properties
 struct ism_event {
 	u32 type;
 	u32 code;
@@ -33,38 +50,89 @@ struct ism_event {
 	u64 info;
 };
 
+//TODO: use enum typedef
 #define ISM_EVENT_DMB	0
 #define ISM_EVENT_GID	1
 #define ISM_EVENT_SWR	2
 
 struct ism_dev;
 
+/*
+ * ISM clients
+ * ===========
+ * All ism clients have access to all ism devices
+ * and must provide the following functions to be called by
+ * ism device drivers:
+ */
 struct ism_client {
+	/* client name for logging and debugging purposes */
 	const char *name;
+	/**
+	 *  add() - add an ism device
+	 *  @dev: device that was added
+	 *
+	 * Will be called during ism_register_client() for all existing
+	 * ism devices and whenever a new ism device is registered.
+	 * *dev is valid until ism_client->remove() is called.
+	 */
 	void (*add)(struct ism_dev *dev);
+	/**
+	 * remove() - remove an ism device
+	 * @dev: device to be removed
+	 *
+	 * Will be called whenever an ism device is unregistered.
+	 * Before this call the device is already inactive: It will
+	 * no longer call client handlers.
+	 * The client must not access *dev after this call.
+	 */
 	void (*remove)(struct ism_dev *dev);
+	/**
+	 * handle_event() - Handle control information sent by device
+	 * @dev: device reporting the event
+	 * @event: ism event structure
+	 */
 	void (*handle_event)(struct ism_dev *dev, struct ism_event *event);
-	/* Parameter dmbemask contains a bit vector with updated DMBEs, if sent
-	 * via ism_move_data(). Callback function must handle all active bits
-	 * indicated by dmbemask.
+	/**
+	 * handle_irq() - Handle signalling of a DMB
+	 * @dev: device that owns the dmb
+	 * @bit: sba_idx=idx of the ism_dmb that got signalled
+	 *	TODO: Pass a priv pointer to ism_dmb instead of 'bit'(?)
+	 * @dmbemask: ism signalling mask of the dmb
+	 *
+	 * Handle signalling of a dmb that was registered by this client
+	 * for this device.
+	 * The ism device can coalesce multiple signalling triggers into a
+	 * single call of handle_irq(). dmbemask can be used to indicate
+	 * different kinds of triggers.
 	 */
 	void (*handle_irq)(struct ism_dev *dev, unsigned int bit, u16 dmbemask);
-	/* Private area - don't touch! */
+	/* client index - provided by ism layer */
 	u8 id;
 };
 
 int ism_register_client(struct ism_client *client);
 int  ism_unregister_client(struct ism_client *client);
 
+//TODO: Pair descriptions with functions
+/*
+ * ISM devices
+ * ===========
+ */
 /* Mandatory operations for all ism devices:
  * int (*query_remote_gid)(struct ism_dev *dev, uuid_t *rgid,
  *	                   u32 vid_valid, u32 vid);
  *	Query whether remote GID rgid is reachable via this device and this
  *	vlan id. Vlan id is only checked if vid_valid != 0.
+ *	Returns 0 if remote gid is reachable.
  *
  * int (*register_dmb)(struct ism_dev *dev, struct ism_dmb *dmb,
  *			    void *client);
- *	Register an ism_dmb buffer for this device and this client.
+ *	Allocate and register an ism_dmb buffer for this device and this client.
+ *	The following fields of ism_dmb must be valid:
+ *	rgid, dmb_len, vlan_*; optionally: requested sba_idx (non-zero).
+ *	Upon return, the following fields will be valid: dmb_tok, sba_idx,
+ *		cpu_addr, dma_addr (if applicable).
+ *	Returns zero on success.
  *
  * int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
  *	Unregister an ism_dmb buffer
@@ -81,10 +149,15 @@ int  ism_unregister_client(struct ism_client *client);
  * u16 (*get_chid)(struct ism_dev *dev);
  *	Returns ism fabric identifier (channel id) of this device.
  *	Only devices on the same ism fabric can communicate.
- *	chid is unique per HW system, except for 0xFFFF, which denotes
- *	an ism_loopback device that can only communicate with itself.
- *	Use chid for fast negative checks, but only query_remote_gid()
- *	can give a reliable positive answer.
+ *	chid is unique per HW system. Use chid for fast negative checks,
+ *	but only query_remote_gid() can give a reliable positive answer:
+ *	Different chid: ism is not possible
+ *	Same chid: ism traffic may be possible or not
+ *		   (e.g. different HW systems)
+ *	EXCEPTION: A value of 0xFFFF denotes an ism_loopback device
+ *		that can only communicate with itself. Use GID or
+ *		query_remote_gid() to determine whether sender and
+ *		receiver use the same ism_loopback device.
  *
  * struct device* (*get_dev)(struct ism_dev *dev);
  *
@@ -109,6 +182,28 @@ struct ism_ops {
 	int (*register_dmb)(struct ism_dev *dev, struct ism_dmb *dmb,
 			    struct ism_client *client);
 	int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
+	/**
+	 * move_data() - write into a remote dmb
+	 * @dev: Local sending ism device
+	 * @dmb_tok: Token of the remote dmb
+	 * @idx: signalling index
+	 * @sf: signalling flag;
+	 *      if true, idx will be turned on at target ism interrupt mask
+	 *      and target device will be signalled, if required.
+	 * @offset: offset within target dmb
+	 * @data: pointer to data to be sent
+	 * @size: length of data to be sent
+	 *
+	 * Use @dev to write @size bytes of @data at @offset into the remote
+	 * dmb identified by @dmb_tok. Data is moved synchronously, so *data
+	 * can be freed when this function returns.
+	 *
+	 * If the signalling flag (sf) is true, bit number idx will be
+	 * turned on in the ism signalling mask that belongs to the
+	 * target dmb, and handle_irq() of the ism client that owns this
+	 * dmb will be called, if required. The target device may choose to
+	 * coalesce multiple signalling triggers.
+	 */
 	int (*move_data)(struct ism_dev *dev, u64 dmb_tok, unsigned int idx,
 			 bool sf, unsigned int offset, void *data,
 			 unsigned int size);
-- 
2.45.2

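The get_chid() rules documented in this patch boil down to a cheap pre-filter in front of query_remote_gid(). The following is a minimal userspace sketch of that filter, assuming a hypothetical struct peer_id that carries the chid and a 16-byte GID of each side; the enum, struct and function names are illustrative only and are not part of include/linux/ism.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value taken from the ISM_LO_RESERVED_CHID define in include/linux/ism.h. */
#define ISM_LO_RESERVED_CHID 0xFFFF

/* Hypothetical verdicts of a cheap pre-check before query_remote_gid(). */
enum ism_reach {
	ISM_REACH_NO,     /* different fabric: skip this device */
	ISM_REACH_MAYBE,  /* same fabric: confirm with query_remote_gid() */
};

/* Hypothetical peer description: fabric id plus a 16-byte GID. */
struct peer_id {
	uint16_t chid;
	uint8_t gid[16];
};

static enum ism_reach ism_chid_precheck(const struct peer_id *local,
					const struct peer_id *remote)
{
	/* Different chid: ism traffic is not possible. */
	if (local->chid != remote->chid)
		return ISM_REACH_NO;

	/* Loopback: only the very same device (same GID) can be reached. */
	if (local->chid == ISM_LO_RESERVED_CHID &&
	    memcmp(local->gid, remote->gid, sizeof(local->gid)) != 0)
		return ISM_REACH_NO;

	/* Same chid: possible, but only query_remote_gid() can confirm. */
	return ISM_REACH_MAYBE;
}
```

A client would then call query_remote_gid() only for devices that yield ISM_REACH_MAYBE. Even for loopback that call is still needed, because ism_lo_query_rgid() also rejects requests with a valid VLAN id.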

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism
  2025-01-15 19:55 [RFC net-next 0/7] Provide an ism layer Alexandra Winter
                   ` (3 preceding siblings ...)
  2025-01-15 19:55 ` [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions Alexandra Winter
@ 2025-01-15 19:55 ` Alexandra Winter
  2025-01-20  3:55   ` Dust Li
  2025-02-06 17:36   ` Julian Ruess
  2025-01-15 19:55 ` [RFC net-next 6/7] s390/ism: Define ismvp_dev Alexandra Winter
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-15 19:55 UTC (permalink / raw)
  To: Wenjia Zhang, Jan Karcher, Gerd Bayer, Alexandra Winter,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

The first stage of ism_loopback was implemented as part of the
SMC module [1]. Now that we have the ism layer, provide
access to the ism_loopback device to all ism clients.

Move ism_loopback.* from net/smc to net/ism.
The following changes are required to ism_loopback.c:
- Change ism_lo_move_data() to no longer schedule an smcd receive tasklet,
  but instead call ism_client->handle_irq().
Note: In this RFC patch ism_loopback is not fully generic.
  The smc-d client uses attached buffers for moves without signalling
  and not-attached buffers for moves with signalling.
  ism_lo_move_data() must not rely on that assumption, and it must be
  able to handle more than one ism client.

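With this change, signalling in ism_lo_move_data() reduces to three steps: look up the client that owns the target DMB in sba_client_arr, build the signalling mask with ror16(0x1000, idx), and call that client's handle_irq(). Below is a single-threaded userspace sketch of just that dispatch step. ror16() is re-implemented here, the handler signature is a simplified hypothetical one (the real handle_irq() also receives the ism_dev), SKETCH_DMBS is an illustrative small limit, and both ism->lock and the data copy are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CLIENTS  8     /* same values as include/linux/ism.h */
#define NO_CLIENT    0xff
#define SKETCH_DMBS  64    /* hypothetical; the real limit is ISM_LO_MAX_DMBS */

/* Userspace copy of the kernel's ror16() from <linux/bitops.h>. */
static inline uint16_t ror16(uint16_t word, unsigned int shift)
{
	return (uint16_t)((word >> (shift & 15)) | (word << (-shift & 15)));
}

/* Hypothetical, simplified handler type. */
typedef void (*sketch_irq_fn)(unsigned int sba_idx, uint16_t mask);

struct lo_sketch {
	uint8_t sba_client_arr[SKETCH_DMBS];   /* owning client id per DMB */
	sketch_irq_fn handle_irq[MAX_CLIENTS]; /* registered clients */
};

static void lo_sketch_init(struct lo_sketch *lo)
{
	for (unsigned int i = 0; i < SKETCH_DMBS; i++)
		lo->sba_client_arr[i] = NO_CLIENT;
	for (unsigned int i = 0; i < MAX_CLIENTS; i++)
		lo->handle_irq[i] = NULL;
}

/* Deliver a signal for DMB sba_idx, as ism_lo_move_data() does when sf is set. */
static int lo_sketch_signal(struct lo_sketch *lo, unsigned int sba_idx,
			    unsigned int idx)
{
	uint8_t client_id;

	assert(sba_idx < SKETCH_DMBS);
	client_id = lo->sba_client_arr[sba_idx];
	if (client_id == NO_CLIENT || client_id >= MAX_CLIENTS ||
	    !lo->handle_irq[client_id])
		return -1;  /* nobody owns this DMB: drop the signal */
	lo->handle_irq[client_id](sba_idx, ror16(0x1000, idx));
	return 0;
}

/* Example handler that just records what it was given. */
static unsigned int last_sba_idx;
static uint16_t last_mask;

static void record_irq(unsigned int sba_idx, uint16_t mask)
{
	last_sba_idx = sba_idx;
	last_mask = mask;
}
```

Because the DMB owner is looked up per DMB index rather than assumed to be smc-d, the same dispatch path would serve any number of registered ism clients.
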
In addition, the following changes are required to unify ism_loopback and
ism_vp:

In the ism layer and ism_vpci:
ism_loopback is not backed by a PCI device, so use dev instead of pdev in
ism_dev. ism_vpci now reaches its PCI device via ism->dev.parent.

In smc-d (smcd_register_dev() and smcd_alloc_dev()):
- use plain kernel memory (kzalloc()/kcalloc()) instead of device-managed
  (devm_*) memory for smcd_dev and smcd->conn, as ism_loopback has no
  parent device. An alternative would be to ask the device to allocate
  the memory.
- use different smcd_ops and max_dmbs for ism_vp and ism_loopback.
  A future patch can change smc-d to directly use ism_ops instead of
  smcd_ops.
- use the ism dev_name instead of the pci dev name for the ism_evt_wq name.
- allocate an event workqueue for ism_loopback, although it currently does
  not generate events.

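The first two smc-d changes can be sketched in userspace as follows, assuming a hypothetical reduced struct smcd_sketch: the DMB limit is chosen from the chid, and because calloc()/free() (standing in for kzalloc()/kcalloc()/kfree()) are not device-managed, every failure path has to unwind what was already allocated. The constants are the ones defined in include/linux/ism.h; all other names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Values from include/linux/ism.h. */
#define ISM_NR_DMBS           1920
#define ISM_LO_MAX_DMBS       5000
#define ISM_LO_RESERVED_CHID  0xFFFF

struct smc_conn_sketch;              /* opaque stand-in for smc_connection */

/* Hypothetical reduced smcd_dev: only what this sketch needs. */
struct smcd_sketch {
	int is_loopback;
	int max_dmbs;
	struct smc_conn_sketch **conn;
};

static struct smcd_sketch *smcd_sketch_alloc(uint16_t chid)
{
	struct smcd_sketch *smcd;

	smcd = calloc(1, sizeof(*smcd));
	if (!smcd)
		return NULL;

	/* Pick the DMB limit from the fabric id, as smcd_register_dev() does. */
	smcd->is_loopback = (chid == ISM_LO_RESERVED_CHID);
	smcd->max_dmbs = smcd->is_loopback ? ISM_LO_MAX_DMBS : ISM_NR_DMBS;

	smcd->conn = calloc((size_t)smcd->max_dmbs, sizeof(*smcd->conn));
	if (!smcd->conn)
		goto free_smcd;      /* unwind explicitly: no devm_* cleanup */

	return smcd;

free_smcd:
	free(smcd);
	return NULL;
}

/* Without devm_*, whoever frees the device must release both allocations. */
static void smcd_sketch_free(struct smcd_sketch *smcd)
{
	if (!smcd)
		return;
	free(smcd->conn);
	free(smcd);
}
```

The trade-off is that the lifetime of smcd_dev is no longer tied to a parent device, so the unregister path has to free these allocations itself.
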
Link: https://lore.kernel.org/linux-kernel//20240428060738.60843-1-guwen@linux.alibaba.com/ [1]

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
---
 drivers/s390/net/ism.h     |   6 +-
 drivers/s390/net/ism_drv.c |  31 ++-
 include/linux/ism.h        |  59 +++++
 include/net/smc.h          |   4 +-
 net/ism/Kconfig            |  13 ++
 net/ism/Makefile           |   1 +
 net/ism/ism_loopback.c     | 366 +++++++++++++++++++++++++++++++
 net/ism/ism_loopback.h     |  59 +++++
 net/ism/ism_main.c         |  11 +-
 net/smc/Kconfig            |  13 --
 net/smc/Makefile           |   1 -
 net/smc/af_smc.c           |  12 +-
 net/smc/smc_ism.c          | 108 +++++++---
 net/smc/smc_loopback.c     | 427 -------------------------------------
 net/smc/smc_loopback.h     |  60 ------
 15 files changed, 606 insertions(+), 565 deletions(-)
 create mode 100644 net/ism/ism_loopback.c
 create mode 100644 net/ism/ism_loopback.h
 delete mode 100644 net/smc/smc_loopback.c
 delete mode 100644 net/smc/smc_loopback.h

diff --git a/drivers/s390/net/ism.h b/drivers/s390/net/ism.h
index 61cf10334170..0deca6d0e328 100644
--- a/drivers/s390/net/ism.h
+++ b/drivers/s390/net/ism.h
@@ -202,7 +202,7 @@ struct ism_sba {
 static inline void __ism_read_cmd(struct ism_dev *ism, void *data,
 				  unsigned long offset, unsigned long len)
 {
-	struct zpci_dev *zdev = to_zpci(ism->pdev);
+	struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
 	u64 req = ZPCI_CREATE_REQ(zdev->fh, 2, 8);
 
 	while (len > 0) {
@@ -216,7 +216,7 @@ static inline void __ism_read_cmd(struct ism_dev *ism, void *data,
 static inline void __ism_write_cmd(struct ism_dev *ism, void *data,
 				   unsigned long offset, unsigned long len)
 {
-	struct zpci_dev *zdev = to_zpci(ism->pdev);
+	struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
 	u64 req = ZPCI_CREATE_REQ(zdev->fh, 2, len);
 
 	if (len)
@@ -226,7 +226,7 @@ static inline void __ism_write_cmd(struct ism_dev *ism, void *data,
 static inline int __ism_move(struct ism_dev *ism, u64 dmb_req, void *data,
 			     unsigned int size)
 {
-	struct zpci_dev *zdev = to_zpci(ism->pdev);
+	struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
 	u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, size);
 
 	return __zpci_store_block(data, req, dmb_req);
diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
index ab70debbdd9d..c0954d6dd9f5 100644
--- a/drivers/s390/net/ism_drv.c
+++ b/drivers/s390/net/ism_drv.c
@@ -88,7 +88,7 @@ static int register_sba(struct ism_dev *ism)
 	dma_addr_t dma_handle;
 	struct ism_sba *sba;
 
-	sba = dma_alloc_coherent(&ism->pdev->dev, PAGE_SIZE, &dma_handle,
+	sba = dma_alloc_coherent(ism->dev.parent, PAGE_SIZE, &dma_handle,
 				 GFP_KERNEL);
 	if (!sba)
 		return -ENOMEM;
@@ -99,7 +99,7 @@ static int register_sba(struct ism_dev *ism)
 	cmd.request.sba = dma_handle;
 
 	if (ism_cmd(ism, &cmd)) {
-		dma_free_coherent(&ism->pdev->dev, PAGE_SIZE, sba, dma_handle);
+		dma_free_coherent(ism->dev.parent, PAGE_SIZE, sba, dma_handle);
 		return -EIO;
 	}
 
@@ -115,7 +115,7 @@ static int register_ieq(struct ism_dev *ism)
 	dma_addr_t dma_handle;
 	struct ism_eq *ieq;
 
-	ieq = dma_alloc_coherent(&ism->pdev->dev, PAGE_SIZE, &dma_handle,
+	ieq = dma_alloc_coherent(ism->dev.parent, PAGE_SIZE, &dma_handle,
 				 GFP_KERNEL);
 	if (!ieq)
 		return -ENOMEM;
@@ -127,7 +127,7 @@ static int register_ieq(struct ism_dev *ism)
 	cmd.request.len = sizeof(*ieq);
 
 	if (ism_cmd(ism, &cmd)) {
-		dma_free_coherent(&ism->pdev->dev, PAGE_SIZE, ieq, dma_handle);
+		dma_free_coherent(ism->dev.parent, PAGE_SIZE, ieq, dma_handle);
 		return -EIO;
 	}
 
@@ -149,7 +149,7 @@ static int unregister_sba(struct ism_dev *ism)
 	if (ret && ret != ISM_ERROR)
 		return -EIO;
 
-	dma_free_coherent(&ism->pdev->dev, PAGE_SIZE,
+	dma_free_coherent(ism->dev.parent, PAGE_SIZE,
 			  ism->sba, ism->sba_dma_addr);
 
 	ism->sba = NULL;
@@ -169,7 +169,7 @@ static int unregister_ieq(struct ism_dev *ism)
 	if (ret && ret != ISM_ERROR)
 		return -EIO;
 
-	dma_free_coherent(&ism->pdev->dev, PAGE_SIZE,
+	dma_free_coherent(ism->dev.parent, PAGE_SIZE,
 			  ism->ieq, ism->ieq_dma_addr);
 
 	ism->ieq = NULL;
@@ -216,7 +216,7 @@ static int ism_query_rgid(struct ism_dev *ism, uuid_t *rgid, u32 vid_valid,
 static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 {
 	clear_bit(dmb->sba_idx, ism->sba_bitmap);
-	dma_unmap_page(&ism->pdev->dev, dmb->dma_addr, dmb->dmb_len,
+	dma_unmap_page(ism->dev.parent, dmb->dma_addr, dmb->dmb_len,
 		       DMA_FROM_DEVICE);
 	folio_put(virt_to_folio(dmb->cpu_addr));
 }
@@ -227,7 +227,7 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 	unsigned long bit;
 	int rc;
 
-	if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(&ism->pdev->dev))
+	if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(ism->dev.parent))
 		return -EINVAL;
 
 	if (!dmb->sba_idx) {
@@ -251,10 +251,10 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 	}
 
 	dmb->cpu_addr = folio_address(folio);
-	dmb->dma_addr = dma_map_page(&ism->pdev->dev,
+	dmb->dma_addr = dma_map_page(ism->dev.parent,
 				     virt_to_page(dmb->cpu_addr), 0,
 				     dmb->dmb_len, DMA_FROM_DEVICE);
-	if (dma_mapping_error(&ism->pdev->dev, dmb->dma_addr)) {
+	if (dma_mapping_error(ism->dev.parent, dmb->dma_addr)) {
 		rc = -ENOMEM;
 		goto out_free;
 	}
@@ -419,10 +419,7 @@ static int ism_supports_v2(void)
 
 static u16 ism_get_chid(struct ism_dev *ism)
 {
-	if (!ism || !ism->pdev)
-		return 0;
-
-	return to_zpci(ism->pdev)->pchid;
+	return to_zpci(to_pci_dev(ism->dev.parent))->pchid;
 }
 
 static void ism_handle_event(struct ism_dev *ism)
@@ -499,7 +496,7 @@ static const struct ism_ops ism_vp_ops = {
 
 static int ism_dev_init(struct ism_dev *ism)
 {
-	struct pci_dev *pdev = ism->pdev;
+	struct pci_dev *pdev = to_pci_dev(ism->dev.parent);
 	int ret;
 
 	ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI);
@@ -565,7 +562,6 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	spin_lock_init(&ism->lock);
 	dev_set_drvdata(&pdev->dev, ism);
-	ism->pdev = pdev;
 	ism->dev.parent = &pdev->dev;
 	device_initialize(&ism->dev);
 	dev_set_name(&ism->dev, dev_name(&pdev->dev));
@@ -603,14 +599,13 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	device_del(&ism->dev);
 err_dev:
 	dev_set_drvdata(&pdev->dev, NULL);
-	kfree(ism);
 
 	return ret;
 }
 
 static void ism_dev_exit(struct ism_dev *ism)
 {
-	struct pci_dev *pdev = ism->pdev;
+	struct pci_dev *pdev = to_pci_dev(ism->dev.parent);
 	unsigned long flags;
 	int i;
 
diff --git a/include/linux/ism.h b/include/linux/ism.h
index bc165d077071..929a1f275419 100644
--- a/include/linux/ism.h
+++ b/include/linux/ism.h
@@ -144,6 +144,9 @@ int  ism_unregister_client(struct ism_client *client);
  *	identified by dmb_tok and idx. If signal flag (sf) then signal
  *	the remote peer that data has arrived in this dmb.
  *
+ * int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
+ *	Unregister an ism_dmb buffer
+ *
  * int (*supports_v2)(void);
  *
  * u16 (*get_chid)(struct ism_dev *dev);
@@ -218,12 +221,63 @@ struct ism_ops {
 	int (*reset_vlan_required)(struct ism_dev *dev);
 	int (*signal_event)(struct ism_dev *dev, uuid_t *rgid,
 			    u32 trigger_irq, u32 event_code, u64 info);
+/* no copy option
+ * --------------
+ */
+	/**
+	 * support_dmb_nocopy() - does this device provide no-copy option?
+	 * @dev: ism device
+	 *
+	 * In addition to using move_data(), a sender device can provide a
+	 * kernel address and length that represent a target dmb
+	 * (like MMIO). If a sender writes into such a ghost-send-buffer
+	 * (i.e. at this kernel address), the data appears in the target
+	 * dmb automatically and immediately, without calling
+	 * move_data().
+	 * Note that this is NOT related to the MSG_ZEROCOPY socket flag.
+	 *
+	 * Either all 3 function pointers for support_dmb_nocopy(),
+	 * attach_dmb() and detach_dmb() are defined, or all of them must
+	 * be NULL.
+	 *
+	 * Return: non-zero, if no-copy is supported.
+	 */
+	int (*support_dmb_nocopy)(struct ism_dev *dev);
+	/**
+	 * attach_dmb() - attach local memory to a remote dmb
+	 * @dev: Local sending ism device
+	 * @dmb: all other parameters are passed in the form of a
+	 *	 dmb struct
+	 *	 TODO: (THIS IS CONFUSING, should be changed)
+	 *  dmb_tok: (in) Token of the remote dmb to attach to
+	 *  cpu_addr: (out) MMIO address
+	 *  dma_addr: (out) MMIO address (if applicable, invalid otherwise)
+	 *  dmb_len: (out) length of local MMIO region,
+	 *           equal to length of remote DMB.
+	 *  sba_idx: (out) index of remote dmb (NOT HELPFUL, should be removed)
+	 *
+	 * Provides a memory address to the sender that can be used to
+	 * directly write into the remote dmb.
+	 *
+	 * Return: Zero upon success, error code otherwise
+	 */
+	int (*attach_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
+	/**
+	 * detach_dmb() - Detach the ghost buffer from a remote dmb
+	 * @dev: ism device
+	 * @token: dmb token of the remote dmb
+	 * Return: Zero upon success, error code otherwise
+	 */
+	int (*detach_dmb)(struct ism_dev *dev, u64 token);
 };
 
 /* Unless we gain unexpected popularity, this limit should hold for a while */
 #define MAX_CLIENTS		8
 #define NO_CLIENT		0xff		/* must be >= MAX_CLIENTS */
 #define ISM_NR_DMBS		1920
+/* Defined fabric id / CHID for all loopback devices: */
+#define ISM_LO_RESERVED_CHID	0xFFFF
+#define ISM_LO_MAX_DMBS		5000
 
 struct ism_dev {
 	const struct ism_ops *ops;
@@ -259,6 +313,11 @@ static inline void ism_set_priv(struct ism_dev *dev, struct ism_client *client,
 	dev->priv[client->id] = priv;
 }
 
+static inline struct device *ism_get_dev(struct ism_dev *ism)
+{
+	return &ism->dev;
+}
+
 #define ISM_RESERVED_VLANID	0x1FFF
 #define ISM_ERROR	0xFFFF
 
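
For ism_loopback the no-copy option is simple: attach_dmb() hands the sender the very same memory that backs the target DMB, and a refcount keeps that memory alive until both the owner (unregister_dmb()) and every attached sender (detach_dmb()) have dropped their reference. Below is a single-threaded userspace sketch of that lifetime rule, assuming a hypothetical struct dmb_sketch with a plain int instead of refcount_t and no hash table lookup:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of one loopback DMB node, not the kernel struct. */
struct dmb_sketch {
	char buf[64];   /* memory behind the DMB (cpu_addr in the kernel) */
	int refcnt;     /* plain int here; the kernel uses refcount_t */
	int released;   /* set once the last reference is gone */
};

/* register_dmb(): the owner holds the first reference. */
static void dmb_sketch_register(struct dmb_sketch *d)
{
	memset(d, 0, sizeof(*d));
	d->refcnt = 1;
}

/* attach_dmb(): like refcount_inc_not_zero(), refuse a dying node. */
static char *dmb_sketch_attach(struct dmb_sketch *d)
{
	if (d->refcnt == 0)
		return NULL;
	d->refcnt++;
	return d->buf;  /* sender writes here: same memory as the DMB */
}

/* unregister_dmb() and detach_dmb() both drop one reference. */
static void dmb_sketch_put(struct dmb_sketch *d)
{
	if (d->refcnt > 0 && --d->refcnt == 0)
		d->released = 1;  /* kernel: __ism_lo_unregister_dmb() */
}
```

Because the sender writes straight into the target memory, a move_data() call without signalling has nothing left to copy, which is why ism_lo_move_data() returns early when sf is false.
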
diff --git a/include/net/smc.h b/include/net/smc.h
index 91aab1d44166..7a96ed2ae20c 100644
--- a/include/net/smc.h
+++ b/include/net/smc.h
@@ -63,8 +63,8 @@ struct smcd_ops {
 
 struct smcd_dev {
 	const struct smcd_ops *ops;
-	void *priv;
-	void *client;
+	struct ism_dev *ism;
+	struct ism_client *client;
 	struct list_head list;
 	spinlock_t lock;
 	struct smc_connection **conn;
diff --git a/net/ism/Kconfig b/net/ism/Kconfig
index 4329489cc1e9..ac7a9ba7c792 100644
--- a/net/ism/Kconfig
+++ b/net/ism/Kconfig
@@ -12,3 +12,16 @@ config ISM
 
 	  To compile as a module choose M. The module name is ism.
 	  If unsure, choose N.
+
+config ISM_LO
+	bool "intra-OS shortcut with loopback-ism"
+	depends on ISM
+	default n
+	help
+	  ISM_LO enables the creation of an Emulated-ISM device named
+	  loopback-ism, which can be used for transferring data
+	  when communication occurs within the same OS. This makes it
+	  convenient to test ISM clients, since loopback-ism is
+	  independent of architecture and hardware.
+
+	  If unsure, say N.
diff --git a/net/ism/Makefile b/net/ism/Makefile
index b752baf72003..5e7c51845862 100644
--- a/net/ism/Makefile
+++ b/net/ism/Makefile
@@ -5,3 +5,4 @@
 
 ism-y += ism_main.o
 obj-$(CONFIG_ISM) += ism.o
+ism-$(CONFIG_ISM_LO) += ism_loopback.o
\ No newline at end of file
diff --git a/net/ism/ism_loopback.c b/net/ism/ism_loopback.c
new file mode 100644
index 000000000000..47e5ef355dd7
--- /dev/null
+++ b/net/ism/ism_loopback.c
@@ -0,0 +1,366 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  Functions for loopback-ism device.
+ *
+ *  Copyright (c) 2024, Alibaba Inc.
+ *
+ *  Author: Wen Gu <guwen@linux.alibaba.com>
+ *          Tony Lu <tonylu@linux.alibaba.com>
+ *
+ */
+
+#include <linux/bitops.h>
+#include <linux/device.h>
+#include <linux/ism.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+#include "ism_loopback.h"
+
+#define ISM_LO_V2_CAPABLE	0x1 /* loopback-ism acts as ISMv2 */
+#define ISM_LO_SUPPORT_NOCOPY	0x1
+#define ISM_DMA_ADDR_INVALID	(~(dma_addr_t)0)
+
+static const char ism_lo_dev_name[] = "loopback-ism";
+/* global loopback device */
+static struct ism_lo_dev *lo_dev;
+
+static int ism_lo_query_rgid(struct ism_dev *ism, uuid_t *rgid,
+			     u32 vid_valid, u32 vid)
+{
+	/* rgid should be the same as lgid; vlan is not supported */
+	if (!vid_valid && uuid_equal(rgid, &ism->gid))
+		return 0;
+	return -ENETUNREACH;
+}
+
+static int ism_lo_register_dmb(struct ism_dev *ism, struct ism_dmb *dmb,
+			       struct ism_client *client)
+{
+	struct ism_lo_dmb_node *dmb_node, *tmp_node;
+	struct ism_lo_dev *ldev;
+	unsigned long flags;
+	int sba_idx, rc;
+
+	ldev = container_of(ism, struct ism_lo_dev, ism);
+	sba_idx = dmb->sba_idx;
+	/* check space for new dmb */
+	for_each_clear_bit(sba_idx, ldev->sba_idx_mask, ISM_LO_MAX_DMBS) {
+		if (!test_and_set_bit(sba_idx, ldev->sba_idx_mask))
+			break;
+	}
+	if (sba_idx == ISM_LO_MAX_DMBS)
+		return -ENOSPC;
+
+	dmb_node = kzalloc(sizeof(*dmb_node), GFP_KERNEL);
+	if (!dmb_node) {
+		rc = -ENOMEM;
+		goto err_bit;
+	}
+
+	dmb_node->sba_idx = sba_idx;
+	dmb_node->len = dmb->dmb_len;
+	dmb_node->cpu_addr = kzalloc(dmb_node->len, GFP_KERNEL |
+				     __GFP_NOWARN | __GFP_NORETRY |
+				     __GFP_NOMEMALLOC);
+	if (!dmb_node->cpu_addr) {
+		rc = -ENOMEM;
+		goto err_node;
+	}
+	dmb_node->dma_addr = ISM_DMA_ADDR_INVALID;
+	refcount_set(&dmb_node->refcnt, 1);
+
+again:
+	/* add new dmb into hash table */
+	get_random_bytes(&dmb_node->token, sizeof(dmb_node->token));
+	write_lock_bh(&ldev->dmb_ht_lock);
+	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_node->token) {
+		if (tmp_node->token == dmb_node->token) {
+			write_unlock_bh(&ldev->dmb_ht_lock);
+			goto again;
+		}
+	}
+	hash_add(ldev->dmb_ht, &dmb_node->list, dmb_node->token);
+	write_unlock_bh(&ldev->dmb_ht_lock);
+	atomic_inc(&ldev->dmb_cnt);
+
+	dmb->sba_idx = dmb_node->sba_idx;
+	dmb->dmb_tok = dmb_node->token;
+	dmb->cpu_addr = dmb_node->cpu_addr;
+	dmb->dma_addr = dmb_node->dma_addr;
+	dmb->dmb_len = dmb_node->len;
+
+	spin_lock_irqsave(&ism->lock, flags);
+	ism->sba_client_arr[sba_idx] = client->id;
+	spin_unlock_irqrestore(&ism->lock, flags);
+
+	return 0;
+
+err_node:
+	kfree(dmb_node);
+err_bit:
+	clear_bit(sba_idx, ldev->sba_idx_mask);
+	return rc;
+}
+
+static void __ism_lo_unregister_dmb(struct ism_lo_dev *ldev,
+				    struct ism_lo_dmb_node *dmb_node)
+{
+	/* remove dmb from hash table */
+	write_lock_bh(&ldev->dmb_ht_lock);
+	hash_del(&dmb_node->list);
+	write_unlock_bh(&ldev->dmb_ht_lock);
+
+	clear_bit(dmb_node->sba_idx, ldev->sba_idx_mask);
+	kvfree(dmb_node->cpu_addr);
+	kfree(dmb_node);
+
+	if (atomic_dec_and_test(&ldev->dmb_cnt))
+		wake_up(&ldev->ldev_release);
+}
+
+static int ism_lo_unregister_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
+{
+	struct ism_lo_dmb_node *dmb_node = NULL, *tmp_node;
+	struct ism_lo_dev *ldev;
+	unsigned long flags;
+
+	ldev = container_of(ism, struct ism_lo_dev, ism);
+
+	/* find dmb from hash table */
+	read_lock_bh(&ldev->dmb_ht_lock);
+	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb->dmb_tok) {
+		if (tmp_node->token == dmb->dmb_tok) {
+			dmb_node = tmp_node;
+			break;
+		}
+	}
+	read_unlock_bh(&ldev->dmb_ht_lock);
+	if (!dmb_node)
+		return -EINVAL;
+
+	if (refcount_dec_and_test(&dmb_node->refcnt)) {
+		spin_lock_irqsave(&ism->lock, flags);
+		ism->sba_client_arr[dmb_node->sba_idx] = NO_CLIENT;
+		spin_unlock_irqrestore(&ism->lock, flags);
+
+		__ism_lo_unregister_dmb(ldev, dmb_node);
+	}
+	return 0;
+}
+
+static int ism_lo_support_dmb_nocopy(struct ism_dev *ism)
+{
+	return ISM_LO_SUPPORT_NOCOPY;
+}
+
+static int ism_lo_attach_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
+{
+	struct ism_lo_dmb_node *dmb_node = NULL, *tmp_node;
+	struct ism_lo_dev *ldev;
+
+	ldev = container_of(ism, struct ism_lo_dev, ism);
+
+	/* find dmb_node according to dmb->dmb_tok */
+	read_lock_bh(&ldev->dmb_ht_lock);
+	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb->dmb_tok) {
+		if (tmp_node->token == dmb->dmb_tok) {
+			dmb_node = tmp_node;
+			break;
+		}
+	}
+	if (!dmb_node) {
+		read_unlock_bh(&ldev->dmb_ht_lock);
+		return -EINVAL;
+	}
+	read_unlock_bh(&ldev->dmb_ht_lock);
+
+	if (!refcount_inc_not_zero(&dmb_node->refcnt))
+		/* the dmb is being unregistered, but has
+		 * not been removed from the hash table.
+		 */
+		return -EINVAL;
+
+	/* provide dmb information */
+	dmb->sba_idx = dmb_node->sba_idx;
+	dmb->dmb_tok = dmb_node->token;
+	dmb->cpu_addr = dmb_node->cpu_addr;
+	dmb->dma_addr = dmb_node->dma_addr;
+	dmb->dmb_len = dmb_node->len;
+	return 0;
+}
+
+static int ism_lo_detach_dmb(struct ism_dev *ism, u64 token)
+{
+	struct ism_lo_dmb_node *dmb_node = NULL, *tmp_node;
+	struct ism_lo_dev *ldev;
+
+	ldev = container_of(ism, struct ism_lo_dev, ism);
+
+	/* find dmb_node according to dmb->dmb_tok */
+	read_lock_bh(&ldev->dmb_ht_lock);
+	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, token) {
+		if (tmp_node->token == token) {
+			dmb_node = tmp_node;
+			break;
+		}
+	}
+	if (!dmb_node) {
+		read_unlock_bh(&ldev->dmb_ht_lock);
+		return -EINVAL;
+	}
+	read_unlock_bh(&ldev->dmb_ht_lock);
+
+	if (refcount_dec_and_test(&dmb_node->refcnt))
+		__ism_lo_unregister_dmb(ldev, dmb_node);
+	return 0;
+}
+
+static int ism_lo_move_data(struct ism_dev *ism, u64 dmb_tok,
+			    unsigned int idx, bool sf, unsigned int offset,
+			    void *data, unsigned int size)
+{
+	struct ism_lo_dmb_node *rmb_node = NULL, *tmp_node;
+	struct ism_lo_dev *ldev;
+	u16 s_mask;
+	u8 client_id;
+	u32 sba_idx;
+
+	ldev = container_of(ism, struct ism_lo_dev, ism);
+
+	if (!sf)
+		/* since sndbuf is merged with peer DMB, there is
+		 * no need to copy data from sndbuf to peer DMB.
+		 */
+		return 0;
+
+	read_lock_bh(&ldev->dmb_ht_lock);
+	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_tok) {
+		if (tmp_node->token == dmb_tok) {
+			rmb_node = tmp_node;
+			break;
+		}
+	}
+	if (!rmb_node) {
+		read_unlock_bh(&ldev->dmb_ht_lock);
+		return -EINVAL;
+	}
+	// Why copy the data here? SMC use case: the data buffer is attached,
+	// but the rw-pointers are not?
+	memcpy((char *)rmb_node->cpu_addr + offset, data, size);
+	sba_idx = rmb_node->sba_idx;
+	read_unlock_bh(&ldev->dmb_ht_lock);
+
+	spin_lock(&ism->lock);
+	client_id = ism->sba_client_arr[sba_idx];
+	s_mask = ror16(0x1000, idx);
+	if (likely(client_id != NO_CLIENT && ism->subs[client_id]))
+		ism->subs[client_id]->handle_irq(ism, sba_idx, s_mask);
+	spin_unlock(&ism->lock);
+
+	return 0;
+}
+
+static int ism_lo_supports_v2(void)
+{
+	return ISM_LO_V2_CAPABLE;
+}
+
+static u16 ism_lo_get_chid(struct ism_dev *ism)
+{
+	return ISM_LO_RESERVED_CHID;
+}
+
+static const struct ism_ops ism_lo_ops = {
+	.query_remote_gid = ism_lo_query_rgid,
+	.register_dmb = ism_lo_register_dmb,
+	.unregister_dmb = ism_lo_unregister_dmb,
+	.support_dmb_nocopy = ism_lo_support_dmb_nocopy,
+	.attach_dmb = ism_lo_attach_dmb,
+	.detach_dmb = ism_lo_detach_dmb,
+	.add_vlan_id = NULL,
+	.del_vlan_id = NULL,
+	.set_vlan_required = NULL,
+	.reset_vlan_required = NULL,
+	.signal_event = NULL,
+	.move_data = ism_lo_move_data,
+	.supports_v2 = ism_lo_supports_v2,
+	.get_chid = ism_lo_get_chid,
+};
+
+static void ism_lo_dev_init(struct ism_lo_dev *ldev)
+{
+	rwlock_init(&ldev->dmb_ht_lock);
+	hash_init(ldev->dmb_ht);
+	atomic_set(&ldev->dmb_cnt, 0);
+	init_waitqueue_head(&ldev->ldev_release);
+}
+
+static void ism_lo_dev_exit(struct ism_lo_dev *ldev)
+{
+	ism_dev_unregister(&ldev->ism);
+	if (atomic_read(&ldev->dmb_cnt))
+		wait_event(ldev->ldev_release, !atomic_read(&ldev->dmb_cnt));
+}
+
+static void ism_lo_dev_release(struct device *dev)
+{
+	struct ism_dev *ism;
+	struct ism_lo_dev *ldev;
+
+	ism = container_of(dev, struct ism_dev, dev);
+	ldev = container_of(ism, struct ism_lo_dev, ism);
+
+	kfree(ldev);
+}
+
+static int ism_lo_dev_probe(void)
+{
+	struct ism_lo_dev *ldev;
+	struct ism_dev *ism;
+
+	ldev = kzalloc(sizeof(*ldev), GFP_KERNEL);
+	if (!ldev)
+		return -ENOMEM;
+
+	ism_lo_dev_init(ldev);
+	ism = &ldev->ism;
+	uuid_gen(&ism->gid);
+	ism->ops = &ism_lo_ops;
+
+	ism->sba_client_arr = kzalloc(ISM_LO_MAX_DMBS, GFP_KERNEL);
+	if (!ism->sba_client_arr)
+		return -ENOMEM;
+	memset(ism->sba_client_arr, NO_CLIENT, ISM_LO_MAX_DMBS);
+
+	ism->dev.parent = NULL;
+	ism->dev.release = ism_lo_dev_release;
+	device_initialize(&ism->dev);
+	dev_set_name(&ism->dev, ism_lo_dev_name);
+	// No device_add() for loopback?
+
+	ism_dev_register(ism);
+	lo_dev = ldev;
+	return 0;
+}
+
+static void ism_lo_dev_remove(void)
+{
+	if (!lo_dev)
+		return;
+
+	ism_lo_dev_exit(lo_dev);
+	put_device(&lo_dev->ism.dev); /* device_initialize in ism_lo_dev_probe */
+	//Missing anyhow?:
+	lo_dev = NULL;
+}
+
+int ism_loopback_init(void)
+{
+	return ism_lo_dev_probe();
+}
+
+void ism_loopback_exit(void)
+{
+	ism_lo_dev_remove();
+}
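
ism_lo_register_dmb() hands out DMB indexes by claiming the first clear bit in sba_idx_mask and fails with -ENOSPC once all ISM_LO_MAX_DMBS bits are taken; __ism_lo_unregister_dmb() clears the bit again. Below is a minimal single-threaded userspace sketch of that allocator with hypothetical names. The kernel code combines for_each_clear_bit() with the atomic test_and_set_bit() so that concurrent registrations cannot claim the same index, which this sketch does not attempt.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define SKETCH_MAX_DMBS       5000  /* mirrors ISM_LO_MAX_DMBS */
#define SKETCH_BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MASK_LONGS \
	((SKETCH_MAX_DMBS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

static unsigned long sba_idx_mask[SKETCH_MASK_LONGS];

/* Claim the lowest free index, or return -ENOSPC if all are in use. */
static int sba_idx_alloc(void)
{
	for (unsigned int i = 0; i < SKETCH_MAX_DMBS; i++) {
		unsigned long *word = &sba_idx_mask[i / SKETCH_BITS_PER_LONG];
		unsigned long bit = 1UL << (i % SKETCH_BITS_PER_LONG);

		if (!(*word & bit)) {
			*word |= bit;
			return (int)i;
		}
	}
	return -ENOSPC;
}

/* Give an index back, as __ism_lo_unregister_dmb() does with clear_bit(). */
static void sba_idx_free(int idx)
{
	assert(idx >= 0 && idx < SKETCH_MAX_DMBS);
	sba_idx_mask[idx / SKETCH_BITS_PER_LONG] &=
		~(1UL << (idx % SKETCH_BITS_PER_LONG));
}
```

Since the scan always starts at bit 0, freed low indexes are reused before higher ones are handed out.
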
diff --git a/net/ism/ism_loopback.h b/net/ism/ism_loopback.h
new file mode 100644
index 000000000000..b1484b032d11
--- /dev/null
+++ b/net/ism/ism_loopback.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ *  loopback-ism device structure definitions.
+ *
+ *  Copyright (c) 2024, Alibaba Inc.
+ *
+ *  Author: Wen Gu <guwen@linux.alibaba.com>
+ *          Tony Lu <tonylu@linux.alibaba.com>
+ *
+ */
+
+#ifndef _ISM_LOOPBACK_H
+#define _ISM_LOOPBACK_H
+
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/hashtable.h>
+#include <linux/ism.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include <linux/wait.h>
+
+#if IS_ENABLED(CONFIG_ISM_LO)
+#define ISM_LO_DMBS_HASH_BITS	12
+
+struct ism_lo_dmb_node {
+	struct hlist_node list;
+	u64 token;
+	u32 len;
+	u32 sba_idx;
+	void *cpu_addr;
+	dma_addr_t dma_addr;
+	refcount_t refcnt;
+};
+
+struct ism_lo_dev {
+	struct ism_dev ism;
+	struct device dev;
+	atomic_t dmb_cnt;
+	rwlock_t dmb_ht_lock;
+	DECLARE_BITMAP(sba_idx_mask, ISM_LO_MAX_DMBS);
+	DECLARE_HASHTABLE(dmb_ht, ISM_LO_DMBS_HASH_BITS);
+	wait_queue_head_t ldev_release;
+};
+
+int ism_loopback_init(void);
+void ism_loopback_exit(void);
+#else
+static inline int ism_loopback_init(void)
+{
+	return 0;
+}
+
+static inline void ism_loopback_exit(void)
+{
+}
+#endif
+
+#endif /* _ISM_LOOPBACK_H */
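
ism_lo_dev_release() only gets a struct device pointer and recovers the enclosing ism_lo_dev with two container_of() steps. The userspace sketch below uses hypothetical stand-in structs that keep only the nesting, and shows why this works: container_of() subtracts the member offset, so the pointer passed in must be the embedded ism.dev, which is the struct device that ism_lo_dev_probe() initializes with device_initialize().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of the kernel's container_of(), minus its type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins that keep only the nesting of the real structs. */
struct device_sketch { int refs; };
struct ism_dev_sketch { unsigned char gid[16]; struct device_sketch dev; };
struct ism_lo_dev_sketch { long dmb_cnt; struct ism_dev_sketch ism; };

/* Same two steps as ism_lo_dev_release(): device -> ism_dev -> ism_lo_dev. */
static struct ism_lo_dev_sketch *to_lo_dev(struct device_sketch *dev)
{
	struct ism_dev_sketch *ism =
		container_of(dev, struct ism_dev_sketch, dev);

	return container_of(ism, struct ism_lo_dev_sketch, ism);
}
```

The members are deliberately not placed first, so the sketch only returns the right address if the offsets are actually subtracted.
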
diff --git a/net/ism/ism_main.c b/net/ism/ism_main.c
index 268408dbd691..13edccff45ea 100644
--- a/net/ism/ism_main.c
+++ b/net/ism/ism_main.c
@@ -14,6 +14,8 @@
 #include <linux/err.h>
 #include <linux/ism.h>
 
+#include "ism_loopback.h"
+
 MODULE_DESCRIPTION("Internal Shared Memory class");
 MODULE_LICENSE("GPL");
 
@@ -148,14 +150,21 @@ EXPORT_SYMBOL_GPL(ism_dev_unregister);
 
 static int __init ism_init(void)
 {
+	int rc;
+
 	memset(clients, 0, sizeof(clients));
 	max_client = 0;
 
-	return 0;
+	rc = ism_loopback_init();
+	if (rc)
+		pr_err("%s: ism_loopback_init fails with %d\n", __func__, rc);
+
+	return rc;
 }
 
 static void __exit ism_exit(void)
 {
+	ism_loopback_exit();
 }
 
 module_init(ism_init);
diff --git a/net/smc/Kconfig b/net/smc/Kconfig
index ba5e6a2dd2fd..746be3996768 100644
--- a/net/smc/Kconfig
+++ b/net/smc/Kconfig
@@ -20,16 +20,3 @@ config SMC_DIAG
 	  smcss.
 
 	  if unsure, say Y.
-
-config SMC_LO
-	bool "SMC intra-OS shortcut with loopback-ism"
-	depends on SMC
-	default n
-	help
-	  SMC_LO enables the creation of an Emulated-ISM device named
-	  loopback-ism in SMC and makes use of it for transferring data
-	  when communication occurs within the same OS. This helps in
-	  convenient testing of SMC-D since loopback-ism is independent
-	  of architecture or hardware.
-
-	  if unsure, say N.
diff --git a/net/smc/Makefile b/net/smc/Makefile
index 60f1c87d5212..0e754cbc38f9 100644
--- a/net/smc/Makefile
+++ b/net/smc/Makefile
@@ -6,4 +6,3 @@ smc-y := af_smc.o smc_pnet.o smc_ib.o smc_clc.o smc_core.o smc_wr.o smc_llc.o
 smc-y += smc_cdc.o smc_tx.o smc_rx.o smc_close.o smc_ism.o smc_netlink.o smc_stats.o
 smc-y += smc_tracepoint.o smc_inet.o
 smc-$(CONFIG_SYSCTL) += smc_sysctl.o
-smc-$(CONFIG_SMC_LO) += smc_loopback.o
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 9e6c69d18581..b80cae1940e1 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -53,7 +53,6 @@
 #include "smc_stats.h"
 #include "smc_tracepoint.h"
 #include "smc_sysctl.h"
-#include "smc_loopback.h"
 #include "smc_inet.h"
 
 static DEFINE_MUTEX(smc_server_lgr_pending);	/* serialize link group
@@ -3560,16 +3559,10 @@ static int __init smc_init(void)
 		goto out_sock;
 	}
 
-	rc = smc_loopback_init();
-	if (rc) {
-		pr_err("%s: smc_loopback_init fails with %d\n", __func__, rc);
-		goto out_ib;
-	}
-
 	rc = tcp_register_ulp(&smc_ulp_ops);
 	if (rc) {
 		pr_err("%s: tcp_ulp_register fails with %d\n", __func__, rc);
-		goto out_lo;
+		goto out_ib;
 	}
 	rc = smc_inet_init();
 	if (rc) {
@@ -3580,8 +3573,6 @@ static int __init smc_init(void)
 	return 0;
 out_ulp:
 	tcp_unregister_ulp(&smc_ulp_ops);
-out_lo:
-	smc_loopback_exit();
 out_ib:
 	smc_ib_unregister_client();
 out_sock:
@@ -3620,7 +3611,6 @@ static void __exit smc_exit(void)
 	tcp_unregister_ulp(&smc_ulp_ops);
 	sock_unregister(PF_SMC);
 	smc_core_exit();
-	smc_loopback_exit();
 	smc_ib_unregister_client();
 	smc_ism_exit();
 	destroy_workqueue(smc_close_wq);
diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c
index a49da16bafd5..22c1cfb2ad09 100644
--- a/net/smc/smc_ism.c
+++ b/net/smc/smc_ism.c
@@ -302,7 +302,7 @@ static int smc_nl_handle_smcd_dev(struct smcd_dev *smcd,
 	int use_cnt = 0;
 	void *nlh;
 
-	ism = smcd->priv;
+	ism = smcd->ism;
 	nlh = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
 			  &smc_gen_nl_family, NLM_F_MULTI,
 			  SMC_NETLINK_GET_DEV_SMCD);
@@ -453,23 +453,24 @@ static void smc_ism_event_work(struct work_struct *work)
 	kfree(wrk);
 }
 
-static struct smcd_dev *smcd_alloc_dev(struct device *parent, const char *name,
-				       const struct smcd_ops *ops, int max_dmbs)
+static struct smcd_dev *smcd_alloc_dev(const char *name,
+				       const struct smcd_ops *ops,
+				       int max_dmbs)
 {
 	struct smcd_dev *smcd;
 
-	smcd = devm_kzalloc(parent, sizeof(*smcd), GFP_KERNEL);
+	smcd = kzalloc(sizeof(*smcd), GFP_KERNEL);
 	if (!smcd)
 		return NULL;
-	smcd->conn = devm_kcalloc(parent, max_dmbs,
-				  sizeof(struct smc_connection *), GFP_KERNEL);
+	smcd->conn = kcalloc(max_dmbs, sizeof(struct smc_connection *),
+			     GFP_KERNEL);
 	if (!smcd->conn)
-		return NULL;
+		goto free_smcd;
 
 	smcd->event_wq = alloc_ordered_workqueue("ism_evt_wq-%s)",
 						 WQ_MEM_RECLAIM, name);
 	if (!smcd->event_wq)
-		return NULL;
+		goto free_conn;
 
 	smcd->ops = ops;
 
@@ -479,12 +480,18 @@ static struct smcd_dev *smcd_alloc_dev(struct device *parent, const char *name,
 	INIT_LIST_HEAD(&smcd->lgr_list);
 	init_waitqueue_head(&smcd->lgrs_deleted);
 	return smcd;
+
+free_conn:
+	kfree(smcd->conn);
+free_smcd:
+	kfree(smcd);
+	return NULL;
 }
 
 static int smcd_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
 			   u32 vid_valid, u32 vid)
 {
-	struct ism_dev *ism = smcd->priv;
+	struct ism_dev *ism = smcd->ism;
 	uuid_t ism_rgid;
 
 	copy_to_ismgid(&ism_rgid, rgid);
@@ -494,42 +501,42 @@ static int smcd_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
 static int smcd_register_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb,
 			     void *client)
 {
-	struct ism_dev *ism = smcd->priv;
+	struct ism_dev *ism = smcd->ism;
 
 	return ism->ops->register_dmb(ism, dmb, (struct ism_client *)client);
 }
 
 static int smcd_unregister_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb)
 {
-	struct ism_dev *ism = smcd->priv;
+	struct ism_dev *ism = smcd->ism;
 
 	return ism->ops->unregister_dmb(ism, dmb);
 }
 
 static int smcd_add_vlan_id(struct smcd_dev *smcd, u64 vlan_id)
 {
-	struct ism_dev *ism = smcd->priv;
+	struct ism_dev *ism = smcd->ism;
 
 	return ism->ops->add_vlan_id(ism, vlan_id);
 }
 
 static int smcd_del_vlan_id(struct smcd_dev *smcd, u64 vlan_id)
 {
-	struct ism_dev *ism = smcd->priv;
+	struct ism_dev *ism = smcd->ism;
 
 	return ism->ops->del_vlan_id(ism, vlan_id);
 }
 
 static int smcd_set_vlan_required(struct smcd_dev *smcd)
 {
-	struct ism_dev *ism = smcd->priv;
+	struct ism_dev *ism = smcd->ism;
 
 	return ism->ops->set_vlan_required(ism);
 }
 
 static int smcd_reset_vlan_required(struct smcd_dev *smcd)
 {
-	struct ism_dev *ism = smcd->priv;
+	struct ism_dev *ism = smcd->ism;
 
 	return ism->ops->reset_vlan_required(ism);
 }
@@ -537,7 +544,7 @@ static int smcd_reset_vlan_required(struct smcd_dev *smcd)
 static int smcd_signal_ieq(struct smcd_dev *smcd, struct smcd_gid *rgid,
 			   u32 trigger_irq, u32 event_code, u64 info)
 {
-	struct ism_dev *ism = smcd->priv;
+	struct ism_dev *ism = smcd->ism;
 	uuid_t ism_rgid;
 
 	copy_to_ismgid(&ism_rgid, rgid);
@@ -549,7 +556,7 @@ static int smcd_move(struct smcd_dev *smcd, u64 dmb_tok, unsigned int idx,
 		     bool sf, unsigned int offset, void *data,
 		     unsigned int size)
 {
-	struct ism_dev *ism = smcd->priv;
+	struct ism_dev *ism = smcd->ism;
 
 	return ism->ops->move_data(ism, dmb_tok, idx, sf, offset, data, size);
 }
@@ -562,23 +569,21 @@ static int smcd_supports_v2(void)
 static void smcd_get_local_gid(struct smcd_dev *smcd,
 			       struct smcd_gid *smcd_gid)
 {
-	struct ism_dev *ism = smcd->priv;
+	struct ism_dev *ism = smcd->ism;
 
 	copy_to_smcdgid(smcd_gid, &ism->gid);
 }
 
 static u16 smcd_get_chid(struct smcd_dev *smcd)
 {
-	struct ism_dev *ism = smcd->priv;
+	struct ism_dev *ism = smcd->ism;
 
 	return ism->ops->get_chid(ism);
 }
 
 static inline struct device *smcd_get_dev(struct smcd_dev *dev)
 {
-	struct ism_dev *ism = dev->priv;
-
-	return &ism->dev;
+	return ism_get_dev(dev->ism);
 }
 
 static const struct smcd_ops ism_smcd_ops = {
@@ -597,22 +602,65 @@ static const struct smcd_ops ism_smcd_ops = {
 	.get_dev = smcd_get_dev,
 };
 
+static inline int smcd_support_dmb_nocopy(struct smcd_dev *smcd)
+{
+	struct ism_dev *ism = smcd->ism;
+
+	return ism->ops->support_dmb_nocopy(ism);
+}
+
+static inline int smcd_attach_dmb(struct smcd_dev *smcd,
+				  struct ism_dmb *dmb)
+{
+	struct ism_dev *ism = smcd->ism;
+
+	return ism->ops->attach_dmb(ism, dmb);
+}
+
+static inline int smcd_detach_dmb(struct smcd_dev *smcd, u64 token)
+{
+	struct ism_dev *ism = smcd->ism;
+
+	return ism->ops->detach_dmb(ism, token);
+}
+
+static const struct smcd_ops lo_ops = {
+	.query_remote_gid = smcd_query_rgid,
+	.register_dmb = smcd_register_dmb,
+	.unregister_dmb = smcd_unregister_dmb,
+	.support_dmb_nocopy = smcd_support_dmb_nocopy,
+	.attach_dmb = smcd_attach_dmb,
+	.detach_dmb = smcd_detach_dmb,
+	.move_data = smcd_move,
+	.supports_v2 = smcd_supports_v2,
+	.get_local_gid = smcd_get_local_gid,
+	.get_chid = smcd_get_chid,
+	.get_dev = smcd_get_dev,
+};
+
 static void smcd_register_dev(struct ism_dev *ism)
 {
-	const struct smcd_ops *ops = &ism_smcd_ops;
+	const struct smcd_ops *ops;
 	struct smcd_dev *smcd, *fentry;
+	int max_dmbs;
 
-	if (!ops)
-		return;
+	if (ism->ops->get_chid(ism) == ISM_LO_RESERVED_CHID) {
+		max_dmbs = ISM_LO_MAX_DMBS;
+		ops = &lo_ops;
+	} else {
+		max_dmbs = ISM_NR_DMBS;
+		ops = &ism_smcd_ops;
+	}
 
-	smcd = smcd_alloc_dev(&ism->pdev->dev, dev_name(&ism->pdev->dev), ops,
-			      ISM_NR_DMBS);
+	smcd = smcd_alloc_dev(dev_name(&ism->dev), ops, max_dmbs);
 	if (!smcd)
 		return;
-	smcd->priv = ism;
+
+	smcd->ism = ism;
 	smcd->client = &smc_ism_client;
 	ism_set_priv(ism, &smc_ism_client, smcd);
-	if (smc_pnetid_by_dev_port(&ism->pdev->dev, 0, smcd->pnetid))
+
+	if (smc_pnetid_by_dev_port(ism->dev.parent, 0, smcd->pnetid))
 		smc_pnetid_by_table_smcd(smcd);
 
 	if (ism->ops->supports_v2())
@@ -653,6 +701,8 @@ static void smcd_unregister_dev(struct ism_dev *ism)
 	list_del_init(&smcd->list);
 	mutex_unlock(&smcd_dev_list.mutex);
 	destroy_workqueue(smcd->event_wq);
+	kfree(smcd->conn);
+	kfree(smcd);
 }
 
 /* SMCD Device event handler. Called from ISM device interrupt handler.
diff --git a/net/smc/smc_loopback.c b/net/smc/smc_loopback.c
deleted file mode 100644
index c4020653ae20..000000000000
--- a/net/smc/smc_loopback.c
+++ /dev/null
@@ -1,427 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- *  Shared Memory Communications Direct over loopback-ism device.
- *
- *  Functions for loopback-ism device.
- *
- *  Copyright (c) 2024, Alibaba Inc.
- *
- *  Author: Wen Gu <guwen@linux.alibaba.com>
- *          Tony Lu <tonylu@linux.alibaba.com>
- *
- */
-
-#include <linux/device.h>
-#include <linux/types.h>
-#include <net/smc.h>
-
-#include "smc_cdc.h"
-#include "smc_ism.h"
-#include "smc_loopback.h"
-
-#define SMC_LO_V2_CAPABLE	0x1 /* loopback-ism acts as ISMv2 */
-#define SMC_LO_SUPPORT_NOCOPY	0x1
-#define SMC_DMA_ADDR_INVALID	(~(dma_addr_t)0)
-
-static const char smc_lo_dev_name[] = "loopback-ism";
-static struct smc_lo_dev *lo_dev;
-
-static void smc_lo_generate_ids(struct smc_lo_dev *ldev)
-{
-	struct smcd_gid *lgid = &ldev->local_gid;
-	uuid_t uuid;
-
-	uuid_gen(&uuid);
-	memcpy(&lgid->gid, &uuid, sizeof(lgid->gid));
-	memcpy(&lgid->gid_ext, (u8 *)&uuid + sizeof(lgid->gid),
-	       sizeof(lgid->gid_ext));
-
-	ldev->chid = SMC_LO_RESERVED_CHID;
-}
-
-static int smc_lo_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
-			     u32 vid_valid, u32 vid)
-{
-	struct smc_lo_dev *ldev = smcd->priv;
-
-	/* rgid should be the same as lgid */
-	if (!ldev || rgid->gid != ldev->local_gid.gid ||
-	    rgid->gid_ext != ldev->local_gid.gid_ext)
-		return -ENETUNREACH;
-	return 0;
-}
-
-static int smc_lo_register_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb,
-			       void *client_priv)
-{
-	struct smc_lo_dmb_node *dmb_node, *tmp_node;
-	struct smc_lo_dev *ldev = smcd->priv;
-	int sba_idx, rc;
-
-	/* check space for new dmb */
-	for_each_clear_bit(sba_idx, ldev->sba_idx_mask, SMC_LO_MAX_DMBS) {
-		if (!test_and_set_bit(sba_idx, ldev->sba_idx_mask))
-			break;
-	}
-	if (sba_idx == SMC_LO_MAX_DMBS)
-		return -ENOSPC;
-
-	dmb_node = kzalloc(sizeof(*dmb_node), GFP_KERNEL);
-	if (!dmb_node) {
-		rc = -ENOMEM;
-		goto err_bit;
-	}
-
-	dmb_node->sba_idx = sba_idx;
-	dmb_node->len = dmb->dmb_len;
-	dmb_node->cpu_addr = kzalloc(dmb_node->len, GFP_KERNEL |
-				     __GFP_NOWARN | __GFP_NORETRY |
-				     __GFP_NOMEMALLOC);
-	if (!dmb_node->cpu_addr) {
-		rc = -ENOMEM;
-		goto err_node;
-	}
-	dmb_node->dma_addr = SMC_DMA_ADDR_INVALID;
-	refcount_set(&dmb_node->refcnt, 1);
-
-again:
-	/* add new dmb into hash table */
-	get_random_bytes(&dmb_node->token, sizeof(dmb_node->token));
-	write_lock_bh(&ldev->dmb_ht_lock);
-	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_node->token) {
-		if (tmp_node->token == dmb_node->token) {
-			write_unlock_bh(&ldev->dmb_ht_lock);
-			goto again;
-		}
-	}
-	hash_add(ldev->dmb_ht, &dmb_node->list, dmb_node->token);
-	write_unlock_bh(&ldev->dmb_ht_lock);
-	atomic_inc(&ldev->dmb_cnt);
-
-	dmb->sba_idx = dmb_node->sba_idx;
-	dmb->dmb_tok = dmb_node->token;
-	dmb->cpu_addr = dmb_node->cpu_addr;
-	dmb->dma_addr = dmb_node->dma_addr;
-	dmb->dmb_len = dmb_node->len;
-
-	return 0;
-
-err_node:
-	kfree(dmb_node);
-err_bit:
-	clear_bit(sba_idx, ldev->sba_idx_mask);
-	return rc;
-}
-
-static void __smc_lo_unregister_dmb(struct smc_lo_dev *ldev,
-				    struct smc_lo_dmb_node *dmb_node)
-{
-	/* remove dmb from hash table */
-	write_lock_bh(&ldev->dmb_ht_lock);
-	hash_del(&dmb_node->list);
-	write_unlock_bh(&ldev->dmb_ht_lock);
-
-	clear_bit(dmb_node->sba_idx, ldev->sba_idx_mask);
-	kvfree(dmb_node->cpu_addr);
-	kfree(dmb_node);
-
-	if (atomic_dec_and_test(&ldev->dmb_cnt))
-		wake_up(&ldev->ldev_release);
-}
-
-static int smc_lo_unregister_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb)
-{
-	struct smc_lo_dmb_node *dmb_node = NULL, *tmp_node;
-	struct smc_lo_dev *ldev = smcd->priv;
-
-	/* find dmb from hash table */
-	read_lock_bh(&ldev->dmb_ht_lock);
-	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb->dmb_tok) {
-		if (tmp_node->token == dmb->dmb_tok) {
-			dmb_node = tmp_node;
-			break;
-		}
-	}
-	if (!dmb_node) {
-		read_unlock_bh(&ldev->dmb_ht_lock);
-		return -EINVAL;
-	}
-	read_unlock_bh(&ldev->dmb_ht_lock);
-
-	if (refcount_dec_and_test(&dmb_node->refcnt))
-		__smc_lo_unregister_dmb(ldev, dmb_node);
-	return 0;
-}
-
-static int smc_lo_support_dmb_nocopy(struct smcd_dev *smcd)
-{
-	return SMC_LO_SUPPORT_NOCOPY;
-}
-
-static int smc_lo_attach_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb)
-{
-	struct smc_lo_dmb_node *dmb_node = NULL, *tmp_node;
-	struct smc_lo_dev *ldev = smcd->priv;
-
-	/* find dmb_node according to dmb->dmb_tok */
-	read_lock_bh(&ldev->dmb_ht_lock);
-	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb->dmb_tok) {
-		if (tmp_node->token == dmb->dmb_tok) {
-			dmb_node = tmp_node;
-			break;
-		}
-	}
-	if (!dmb_node) {
-		read_unlock_bh(&ldev->dmb_ht_lock);
-		return -EINVAL;
-	}
-	read_unlock_bh(&ldev->dmb_ht_lock);
-
-	if (!refcount_inc_not_zero(&dmb_node->refcnt))
-		/* the dmb is being unregistered, but has
-		 * not been removed from the hash table.
-		 */
-		return -EINVAL;
-
-	/* provide dmb information */
-	dmb->sba_idx = dmb_node->sba_idx;
-	dmb->dmb_tok = dmb_node->token;
-	dmb->cpu_addr = dmb_node->cpu_addr;
-	dmb->dma_addr = dmb_node->dma_addr;
-	dmb->dmb_len = dmb_node->len;
-	return 0;
-}
-
-static int smc_lo_detach_dmb(struct smcd_dev *smcd, u64 token)
-{
-	struct smc_lo_dmb_node *dmb_node = NULL, *tmp_node;
-	struct smc_lo_dev *ldev = smcd->priv;
-
-	/* find dmb_node according to dmb->dmb_tok */
-	read_lock_bh(&ldev->dmb_ht_lock);
-	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, token) {
-		if (tmp_node->token == token) {
-			dmb_node = tmp_node;
-			break;
-		}
-	}
-	if (!dmb_node) {
-		read_unlock_bh(&ldev->dmb_ht_lock);
-		return -EINVAL;
-	}
-	read_unlock_bh(&ldev->dmb_ht_lock);
-
-	if (refcount_dec_and_test(&dmb_node->refcnt))
-		__smc_lo_unregister_dmb(ldev, dmb_node);
-	return 0;
-}
-
-static int smc_lo_move_data(struct smcd_dev *smcd, u64 dmb_tok,
-			    unsigned int idx, bool sf, unsigned int offset,
-			    void *data, unsigned int size)
-{
-	struct smc_lo_dmb_node *rmb_node = NULL, *tmp_node;
-	struct smc_lo_dev *ldev = smcd->priv;
-	struct smc_connection *conn;
-
-	if (!sf)
-		/* since sndbuf is merged with peer DMB, there is
-		 * no need to copy data from sndbuf to peer DMB.
-		 */
-		return 0;
-
-	read_lock_bh(&ldev->dmb_ht_lock);
-	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_tok) {
-		if (tmp_node->token == dmb_tok) {
-			rmb_node = tmp_node;
-			break;
-		}
-	}
-	if (!rmb_node) {
-		read_unlock_bh(&ldev->dmb_ht_lock);
-		return -EINVAL;
-	}
-	memcpy((char *)rmb_node->cpu_addr + offset, data, size);
-	read_unlock_bh(&ldev->dmb_ht_lock);
-
-	conn = smcd->conn[rmb_node->sba_idx];
-	if (!conn || conn->killed)
-		return -EPIPE;
-	tasklet_schedule(&conn->rx_tsklet);
-	return 0;
-}
-
-static int smc_lo_supports_v2(void)
-{
-	return SMC_LO_V2_CAPABLE;
-}
-
-static void smc_lo_get_local_gid(struct smcd_dev *smcd,
-				 struct smcd_gid *smcd_gid)
-{
-	struct smc_lo_dev *ldev = smcd->priv;
-
-	smcd_gid->gid = ldev->local_gid.gid;
-	smcd_gid->gid_ext = ldev->local_gid.gid_ext;
-}
-
-static u16 smc_lo_get_chid(struct smcd_dev *smcd)
-{
-	return ((struct smc_lo_dev *)smcd->priv)->chid;
-}
-
-static struct device *smc_lo_get_dev(struct smcd_dev *smcd)
-{
-	return &((struct smc_lo_dev *)smcd->priv)->dev;
-}
-
-static const struct smcd_ops lo_ops = {
-	.query_remote_gid = smc_lo_query_rgid,
-	.register_dmb = smc_lo_register_dmb,
-	.unregister_dmb = smc_lo_unregister_dmb,
-	.support_dmb_nocopy = smc_lo_support_dmb_nocopy,
-	.attach_dmb = smc_lo_attach_dmb,
-	.detach_dmb = smc_lo_detach_dmb,
-	.add_vlan_id		= NULL,
-	.del_vlan_id		= NULL,
-	.set_vlan_required	= NULL,
-	.reset_vlan_required	= NULL,
-	.signal_event		= NULL,
-	.move_data = smc_lo_move_data,
-	.supports_v2 = smc_lo_supports_v2,
-	.get_local_gid = smc_lo_get_local_gid,
-	.get_chid = smc_lo_get_chid,
-	.get_dev = smc_lo_get_dev,
-};
-
-static struct smcd_dev *smcd_lo_alloc_dev(const struct smcd_ops *ops,
-					  int max_dmbs)
-{
-	struct smcd_dev *smcd;
-
-	smcd = kzalloc(sizeof(*smcd), GFP_KERNEL);
-	if (!smcd)
-		return NULL;
-
-	smcd->conn = kcalloc(max_dmbs, sizeof(struct smc_connection *),
-			     GFP_KERNEL);
-	if (!smcd->conn)
-		goto out_smcd;
-
-	smcd->ops = ops;
-
-	spin_lock_init(&smcd->lock);
-	spin_lock_init(&smcd->lgr_lock);
-	INIT_LIST_HEAD(&smcd->vlan);
-	INIT_LIST_HEAD(&smcd->lgr_list);
-	init_waitqueue_head(&smcd->lgrs_deleted);
-	return smcd;
-
-out_smcd:
-	kfree(smcd);
-	return NULL;
-}
-
-static int smcd_lo_register_dev(struct smc_lo_dev *ldev)
-{
-	struct smcd_dev *smcd;
-
-	smcd = smcd_lo_alloc_dev(&lo_ops, SMC_LO_MAX_DMBS);
-	if (!smcd)
-		return -ENOMEM;
-	ldev->smcd = smcd;
-	smcd->priv = ldev;
-	smc_ism_set_v2_capable();
-	mutex_lock(&smcd_dev_list.mutex);
-	list_add(&smcd->list, &smcd_dev_list.list);
-	mutex_unlock(&smcd_dev_list.mutex);
-	pr_warn_ratelimited("smc: adding smcd device %s\n",
-			    dev_name(&ldev->dev));
-	return 0;
-}
-
-static void smcd_lo_unregister_dev(struct smc_lo_dev *ldev)
-{
-	struct smcd_dev *smcd = ldev->smcd;
-
-	pr_warn_ratelimited("smc: removing smcd device %s\n",
-			    dev_name(&ldev->dev));
-	smcd->going_away = 1;
-	smc_smcd_terminate_all(smcd);
-	mutex_lock(&smcd_dev_list.mutex);
-	list_del_init(&smcd->list);
-	mutex_unlock(&smcd_dev_list.mutex);
-	kfree(smcd->conn);
-	kfree(smcd);
-}
-
-static int smc_lo_dev_init(struct smc_lo_dev *ldev)
-{
-	smc_lo_generate_ids(ldev);
-	rwlock_init(&ldev->dmb_ht_lock);
-	hash_init(ldev->dmb_ht);
-	atomic_set(&ldev->dmb_cnt, 0);
-	init_waitqueue_head(&ldev->ldev_release);
-
-	return smcd_lo_register_dev(ldev);
-}
-
-static void smc_lo_dev_exit(struct smc_lo_dev *ldev)
-{
-	smcd_lo_unregister_dev(ldev);
-	if (atomic_read(&ldev->dmb_cnt))
-		wait_event(ldev->ldev_release, !atomic_read(&ldev->dmb_cnt));
-}
-
-static void smc_lo_dev_release(struct device *dev)
-{
-	struct smc_lo_dev *ldev =
-		container_of(dev, struct smc_lo_dev, dev);
-
-	kfree(ldev);
-}
-
-static int smc_lo_dev_probe(void)
-{
-	struct smc_lo_dev *ldev;
-	int ret;
-
-	ldev = kzalloc(sizeof(*ldev), GFP_KERNEL);
-	if (!ldev)
-		return -ENOMEM;
-
-	ldev->dev.parent = NULL;
-	ldev->dev.release = smc_lo_dev_release;
-	device_initialize(&ldev->dev);
-	dev_set_name(&ldev->dev, smc_lo_dev_name);
-
-	ret = smc_lo_dev_init(ldev);
-	if (ret)
-		goto free_dev;
-
-	lo_dev = ldev; /* global loopback device */
-	return 0;
-
-free_dev:
-	put_device(&ldev->dev);
-	return ret;
-}
-
-static void smc_lo_dev_remove(void)
-{
-	if (!lo_dev)
-		return;
-
-	smc_lo_dev_exit(lo_dev);
-	put_device(&lo_dev->dev); /* device_initialize in smc_lo_dev_probe */
-}
-
-int smc_loopback_init(void)
-{
-	return smc_lo_dev_probe();
-}
-
-void smc_loopback_exit(void)
-{
-	smc_lo_dev_remove();
-}
diff --git a/net/smc/smc_loopback.h b/net/smc/smc_loopback.h
deleted file mode 100644
index 04dc6808d2e1..000000000000
--- a/net/smc/smc_loopback.h
+++ /dev/null
@@ -1,60 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- *  Shared Memory Communications Direct over loopback-ism device.
- *
- *  SMC-D loopback-ism device structure definitions.
- *
- *  Copyright (c) 2024, Alibaba Inc.
- *
- *  Author: Wen Gu <guwen@linux.alibaba.com>
- *          Tony Lu <tonylu@linux.alibaba.com>
- *
- */
-
-#ifndef _SMC_LOOPBACK_H
-#define _SMC_LOOPBACK_H
-
-#include <linux/device.h>
-#include <net/smc.h>
-
-#if IS_ENABLED(CONFIG_SMC_LO)
-#define SMC_LO_MAX_DMBS		5000
-#define SMC_LO_DMBS_HASH_BITS	12
-#define SMC_LO_RESERVED_CHID	0xFFFF
-
-struct smc_lo_dmb_node {
-	struct hlist_node list;
-	u64 token;
-	u32 len;
-	u32 sba_idx;
-	void *cpu_addr;
-	dma_addr_t dma_addr;
-	refcount_t refcnt;
-};
-
-struct smc_lo_dev {
-	struct smcd_dev *smcd;
-	struct device dev;
-	u16 chid;
-	struct smcd_gid local_gid;
-	atomic_t dmb_cnt;
-	rwlock_t dmb_ht_lock;
-	DECLARE_BITMAP(sba_idx_mask, SMC_LO_MAX_DMBS);
-	DECLARE_HASHTABLE(dmb_ht, SMC_LO_DMBS_HASH_BITS);
-	wait_queue_head_t ldev_release;
-};
-
-int smc_loopback_init(void);
-void smc_loopback_exit(void);
-#else
-static inline int smc_loopback_init(void)
-{
-	return 0;
-}
-
-static inline void smc_loopback_exit(void)
-{
-}
-#endif
-
-#endif /* _SMC_LOOPBACK_H */
-- 
2.45.2



* [RFC net-next 6/7] s390/ism: Define ismvp_dev
  2025-01-15 19:55 [RFC net-next 0/7] Provide an ism layer Alexandra Winter
                   ` (4 preceding siblings ...)
  2025-01-15 19:55 ` [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism Alexandra Winter
@ 2025-01-15 19:55 ` Alexandra Winter
  2025-01-15 19:55 ` [RFC net-next 7/7] net/smc: Use only ism_ops Alexandra Winter
  2025-01-16  9:32 ` [RFC net-next 0/7] Provide an ism layer Dust Li
  7 siblings, 0 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-15 19:55 UTC (permalink / raw)
  To: Wenjia Zhang, Jan Karcher, Gerd Bayer, Alexandra Winter,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

Move the fields that are specific to the s390 ism_vpci driver
out of the generic ism_dev into a local ismvp_dev structure.

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
---
 drivers/s390/net/ism.h     | 11 +++++
 drivers/s390/net/ism_drv.c | 87 +++++++++++++++++++++++---------------
 include/linux/ism.h        | 20 +++------
 3 files changed, 71 insertions(+), 47 deletions(-)

diff --git a/drivers/s390/net/ism.h b/drivers/s390/net/ism.h
index 0deca6d0e328..720a783ebf90 100644
--- a/drivers/s390/net/ism.h
+++ b/drivers/s390/net/ism.h
@@ -196,6 +196,17 @@ struct ism_sba {
 	u16 dmbe_mask[ISM_NR_DMBS];
 };
 
+struct ismvp_dev {
+	struct ism_dev ism;
+	struct ism_sba *sba;
+	dma_addr_t sba_dma_addr;
+	DECLARE_BITMAP(sba_bitmap, ISM_NR_DMBS);
+
+	struct ism_eq *ieq;
+	dma_addr_t ieq_dma_addr;
+	int ieq_idx;
+};
+
 #define ISM_CREATE_REQ(dmb, idx, sf, offset)		\
 	((dmb) | (idx) << 24 | (sf) << 23 | (offset))
 
diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
index c0954d6dd9f5..c1fb65db504c 100644
--- a/drivers/s390/net/ism_drv.c
+++ b/drivers/s390/net/ism_drv.c
@@ -84,6 +84,7 @@ static int query_info(struct ism_dev *ism)
 
 static int register_sba(struct ism_dev *ism)
 {
+	struct ismvp_dev *ismvp;
 	union ism_reg_sba cmd;
 	dma_addr_t dma_handle;
 	struct ism_sba *sba;
@@ -103,14 +104,16 @@ static int register_sba(struct ism_dev *ism)
 		return -EIO;
 	}
 
-	ism->sba = sba;
-	ism->sba_dma_addr = dma_handle;
+	ismvp = container_of(ism, struct ismvp_dev, ism);
+	ismvp->sba = sba;
+	ismvp->sba_dma_addr = dma_handle;
 
 	return 0;
 }
 
 static int register_ieq(struct ism_dev *ism)
 {
+	struct ismvp_dev *ismvp = container_of(ism, struct ismvp_dev, ism);
 	union ism_reg_ieq cmd;
 	dma_addr_t dma_handle;
 	struct ism_eq *ieq;
@@ -131,18 +134,19 @@ static int register_ieq(struct ism_dev *ism)
 		return -EIO;
 	}
 
-	ism->ieq = ieq;
-	ism->ieq_idx = -1;
-	ism->ieq_dma_addr = dma_handle;
+	ismvp->ieq = ieq;
+	ismvp->ieq_idx = -1;
+	ismvp->ieq_dma_addr = dma_handle;
 
 	return 0;
 }
 
 static int unregister_sba(struct ism_dev *ism)
 {
+	struct ismvp_dev *ismvp = container_of(ism, struct ismvp_dev, ism);
 	int ret;
 
-	if (!ism->sba)
+	if (!ismvp->sba)
 		return 0;
 
 	ret = ism_cmd_simple(ism, ISM_UNREG_SBA);
@@ -150,19 +154,20 @@ static int unregister_sba(struct ism_dev *ism)
 		return -EIO;
 
 	dma_free_coherent(ism->dev.parent, PAGE_SIZE,
-			  ism->sba, ism->sba_dma_addr);
+			  ismvp->sba, ismvp->sba_dma_addr);
 
-	ism->sba = NULL;
-	ism->sba_dma_addr = 0;
+	ismvp->sba = NULL;
+	ismvp->sba_dma_addr = 0;
 
 	return 0;
 }
 
 static int unregister_ieq(struct ism_dev *ism)
 {
+	struct ismvp_dev *ismvp = container_of(ism, struct ismvp_dev, ism);
 	int ret;
 
-	if (!ism->ieq)
+	if (!ismvp->ieq)
 		return 0;
 
 	ret = ism_cmd_simple(ism, ISM_UNREG_IEQ);
@@ -170,10 +175,10 @@ static int unregister_ieq(struct ism_dev *ism)
 		return -EIO;
 
 	dma_free_coherent(ism->dev.parent, PAGE_SIZE,
-			  ism->ieq, ism->ieq_dma_addr);
+			  ismvp->ieq, ismvp->ieq_dma_addr);
 
-	ism->ieq = NULL;
-	ism->ieq_dma_addr = 0;
+	ismvp->ieq = NULL;
+	ismvp->ieq_dma_addr = 0;
 
 	return 0;
 }
@@ -215,7 +220,9 @@ static int ism_query_rgid(struct ism_dev *ism, uuid_t *rgid, u32 vid_valid,
 
 static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 {
-	clear_bit(dmb->sba_idx, ism->sba_bitmap);
+	struct ismvp_dev *ismvp = container_of(ism, struct ismvp_dev, ism);
+
+	clear_bit(dmb->sba_idx, ismvp->sba_bitmap);
 	dma_unmap_page(ism->dev.parent, dmb->dma_addr, dmb->dmb_len,
 		       DMA_FROM_DEVICE);
 	folio_put(virt_to_folio(dmb->cpu_addr));
@@ -223,6 +230,7 @@ static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 
 static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 {
+	struct ismvp_dev *ismvp = container_of(ism, struct ismvp_dev, ism);
 	struct folio *folio;
 	unsigned long bit;
 	int rc;
@@ -231,7 +239,7 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 		return -EINVAL;
 
 	if (!dmb->sba_idx) {
-		bit = find_next_zero_bit(ism->sba_bitmap, ISM_NR_DMBS,
+		bit = find_next_zero_bit(ismvp->sba_bitmap, ISM_NR_DMBS,
 					 ISM_DMB_BIT_OFFSET);
 		if (bit == ISM_NR_DMBS)
 			return -ENOSPC;
@@ -239,7 +247,7 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 		dmb->sba_idx = bit;
 	}
 	if (dmb->sba_idx < ISM_DMB_BIT_OFFSET ||
-	    test_and_set_bit(dmb->sba_idx, ism->sba_bitmap))
+	    test_and_set_bit(dmb->sba_idx, ismvp->sba_bitmap))
 		return -EINVAL;
 
 	folio = folio_alloc(GFP_KERNEL | __GFP_NOWARN | __GFP_NOMEMALLOC |
@@ -264,7 +272,7 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 out_free:
 	kfree(dmb->cpu_addr);
 out_bit:
-	clear_bit(dmb->sba_idx, ism->sba_bitmap);
+	clear_bit(dmb->sba_idx, ismvp->sba_bitmap);
 	return rc;
 }
 
@@ -424,15 +432,16 @@ static u16 ism_get_chid(struct ism_dev *ism)
 
 static void ism_handle_event(struct ism_dev *ism)
 {
+	struct ismvp_dev *ismvp = container_of(ism, struct ismvp_dev, ism);
 	struct ism_event *entry;
 	struct ism_client *clt;
 	int i;
 
-	while ((ism->ieq_idx + 1) != READ_ONCE(ism->ieq->header.idx)) {
-		if (++(ism->ieq_idx) == ARRAY_SIZE(ism->ieq->entry))
-			ism->ieq_idx = 0;
+	while ((ismvp->ieq_idx + 1) != READ_ONCE(ismvp->ieq->header.idx)) {
+		if (++ismvp->ieq_idx == ARRAY_SIZE(ismvp->ieq->entry))
+			ismvp->ieq_idx = 0;
 
-		entry = &ism->ieq->entry[ism->ieq_idx];
+		entry = &ismvp->ieq->entry[ismvp->ieq_idx];
 		debug_event(ism_debug_info, 2, entry, sizeof(*entry));
 		for (i = 0; i < MAX_CLIENTS; ++i) {
 			clt = ism->subs[i];
@@ -445,16 +454,19 @@ static void ism_handle_event(struct ism_dev *ism)
 static irqreturn_t ism_handle_irq(int irq, void *data)
 {
 	struct ism_dev *ism = data;
+	struct ismvp_dev *ismvp;
 	unsigned long bit, end;
 	unsigned long *bv;
 	u16 dmbemask;
 	u8 client_id;
 
-	bv = (void *) &ism->sba->dmb_bits[ISM_DMB_WORD_OFFSET];
-	end = sizeof(ism->sba->dmb_bits) * BITS_PER_BYTE - ISM_DMB_BIT_OFFSET;
+	ismvp = container_of(ism, struct ismvp_dev, ism);
+
+	bv = (void *)&ismvp->sba->dmb_bits[ISM_DMB_WORD_OFFSET];
+	end = sizeof(ismvp->sba->dmb_bits) * BITS_PER_BYTE - ISM_DMB_BIT_OFFSET;
 
 	spin_lock(&ism->lock);
-	ism->sba->s = 0;
+	ismvp->sba->s = 0;
 	barrier();
 	for (bit = 0;;) {
 		bit = find_next_bit_inv(bv, end, bit);
@@ -462,8 +474,8 @@ static irqreturn_t ism_handle_irq(int irq, void *data)
 			break;
 
 		clear_bit_inv(bit, bv);
-		dmbemask = ism->sba->dmbe_mask[bit + ISM_DMB_BIT_OFFSET];
-		ism->sba->dmbe_mask[bit + ISM_DMB_BIT_OFFSET] = 0;
+		dmbemask = ismvp->sba->dmbe_mask[bit + ISM_DMB_BIT_OFFSET];
+		ismvp->sba->dmbe_mask[bit + ISM_DMB_BIT_OFFSET] = 0;
 		barrier();
 		client_id = ism->sba_client_arr[bit];
 		if (unlikely(client_id == NO_CLIENT || !ism->subs[client_id]))
@@ -471,8 +483,8 @@ static irqreturn_t ism_handle_irq(int irq, void *data)
 		ism->subs[client_id]->handle_irq(ism, bit + ISM_DMB_BIT_OFFSET, dmbemask);
 	}
 
-	if (ism->sba->e) {
-		ism->sba->e = 0;
+	if (ismvp->sba->e) {
+		ismvp->sba->e = 0;
 		barrier();
 		ism_handle_event(ism);
 	}
@@ -480,7 +492,7 @@ static irqreturn_t ism_handle_irq(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
-static const struct ism_ops ism_vp_ops = {
+static const struct ism_ops ismvp_ops = {
 	.query_remote_gid = ism_query_rgid,
 	.register_dmb = ism_register_dmb,
 	.unregister_dmb = ism_unregister_dmb,
@@ -531,7 +543,7 @@ static int ism_dev_init(struct ism_dev *ism)
 	else
 		ism_v2_capable = false;
 
-	ism->ops = &ism_vp_ops;
+	ism->ops = &ismvp_ops;
 
 	ism_dev_register(ism);
 	query_info(ism);
@@ -553,12 +565,14 @@ static int ism_dev_init(struct ism_dev *ism)
 
 static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
+	struct ismvp_dev *ismvp;
 	struct ism_dev *ism;
 	int ret;
 
-	ism = kzalloc(sizeof(*ism), GFP_KERNEL);
-	if (!ism)
+	ismvp = kzalloc(sizeof(*ismvp), GFP_KERNEL);
+	if (!ismvp)
 		return -ENOMEM;
+	ism = &ismvp->ism;
 
 	spin_lock_init(&ism->lock);
 	dev_set_drvdata(&pdev->dev, ism);
@@ -599,6 +613,7 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	device_del(&ism->dev);
 err_dev:
 	dev_set_drvdata(&pdev->dev, NULL);
+	kfree(ismvp);
 
 	return ret;
 }
@@ -627,7 +642,11 @@ static void ism_dev_exit(struct ism_dev *ism)
 
 static void ism_remove(struct pci_dev *pdev)
 {
-	struct ism_dev *ism = dev_get_drvdata(&pdev->dev);
+	struct ismvp_dev *ismvp;
+	struct ism_dev *ism;
+
+	ism = dev_get_drvdata(&pdev->dev);
+	ismvp = container_of(ism, struct ismvp_dev, ism);
 
 	ism_dev_exit(ism);
 
@@ -635,7 +654,7 @@ static void ism_remove(struct pci_dev *pdev)
 	pci_disable_device(pdev);
 	device_del(&ism->dev);
 	dev_set_drvdata(&pdev->dev, NULL);
-	kfree(ism);
+	kfree(ismvp);
 }
 
 static struct pci_driver ism_driver = {
diff --git a/include/linux/ism.h b/include/linux/ism.h
index 929a1f275419..f28238fb5d74 100644
--- a/include/linux/ism.h
+++ b/include/linux/ism.h
@@ -281,24 +281,18 @@ struct ism_ops {
 
 struct ism_dev {
 	const struct ism_ops *ops;
-	spinlock_t lock; /* protects the ism device */
 	struct list_head list;
-	struct pci_dev *pdev;
-
-	struct ism_sba *sba;
-	dma_addr_t sba_dma_addr;
-	DECLARE_BITMAP(sba_bitmap, ISM_NR_DMBS);
-	u8 *sba_client_arr;	/* entries are indices into 'clients' array */
-	void *priv[MAX_CLIENTS];
-
-	struct ism_eq *ieq;
-	dma_addr_t ieq_dma_addr;
-
 	struct device dev;
 	uuid_t gid;
-	int ieq_idx;
 
+	/* get this lock before accessing any of the fields below */
+	spinlock_t lock;
+	/* indexed by dmb idx; entries are indices into priv and subs arrays: */
+	u8 *sba_client_arr;
+	/* Sparse array of all ISM clients */
 	struct ism_client *subs[MAX_CLIENTS];
+	/* priv pointer per client; for client usage only */
+	void *priv[MAX_CLIENTS];
 };
 
 int ism_dev_register(struct ism_dev *ism);
-- 
2.45.2



* [RFC net-next 7/7] net/smc: Use only ism_ops
  2025-01-15 19:55 [RFC net-next 0/7] Provide an ism layer Alexandra Winter
                   ` (5 preceding siblings ...)
  2025-01-15 19:55 ` [RFC net-next 6/7] s390/ism: Define ismvp_dev Alexandra Winter
@ 2025-01-15 19:55 ` Alexandra Winter
  2025-01-16  9:32 ` [RFC net-next 0/7] Provide an ism layer Dust Li
  7 siblings, 0 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-15 19:55 UTC (permalink / raw)
  To: Wenjia Zhang, Jan Karcher, Gerd Bayer, Alexandra Winter,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

Remove struct smcd_ops and let SMC-D call the ism_ops of its ism_dev directly.

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
---
 include/linux/ism.h |   1 +
 include/net/smc.h   |  30 ------
 net/smc/smc_clc.c   |   6 +-
 net/smc/smc_core.c  |   6 +-
 net/smc/smc_diag.c  |   2 +-
 net/smc/smc_ism.c   | 222 ++++++++------------------------------------
 net/smc/smc_ism.h   |   8 +-
 net/smc/smc_pnet.c  |   8 +-
 8 files changed, 55 insertions(+), 228 deletions(-)

diff --git a/include/linux/ism.h b/include/linux/ism.h
index f28238fb5d74..c11de3931722 100644
--- a/include/linux/ism.h
+++ b/include/linux/ism.h
@@ -30,6 +30,7 @@ struct ism_dmb {
 	 */
 	u64 dmb_tok;
 	/* rgid - GID of designated remote sending device */
+	//TODO: Change to uuid_t GID. Ok for now, because loopback ignores it.
 	u64 rgid;
 	u32 dmb_len;
 	/* sba_idx - Index of this DMB on this receiving device */
diff --git a/include/net/smc.h b/include/net/smc.h
index 7a96ed2ae20c..a8235de6cf0a 100644
--- a/include/net/smc.h
+++ b/include/net/smc.h
@@ -28,43 +28,13 @@ struct smc_hashinfo {
 
 /* SMCD/ISM device driver interface */
 
-struct smcd_dev;
-
 struct smcd_gid {
 	u64	gid;
 	u64	gid_ext;
 };
 
-struct smcd_ops {
-	int (*query_remote_gid)(struct smcd_dev *dev, struct smcd_gid *rgid,
-				u32 vid_valid, u32 vid);
-	int (*register_dmb)(struct smcd_dev *dev, struct ism_dmb *dmb,
-			    void *client);
-	int (*unregister_dmb)(struct smcd_dev *dev, struct ism_dmb *dmb);
-	int (*move_data)(struct smcd_dev *dev, u64 dmb_tok, unsigned int idx,
-			 bool sf, unsigned int offset, void *data,
-			 unsigned int size);
-	int (*supports_v2)(void);
-	void (*get_local_gid)(struct smcd_dev *dev, struct smcd_gid *gid);
-	u16 (*get_chid)(struct smcd_dev *dev);
-	struct device* (*get_dev)(struct smcd_dev *dev);
-
-	/* optional operations */
-	int (*add_vlan_id)(struct smcd_dev *dev, u64 vlan_id);
-	int (*del_vlan_id)(struct smcd_dev *dev, u64 vlan_id);
-	int (*set_vlan_required)(struct smcd_dev *dev);
-	int (*reset_vlan_required)(struct smcd_dev *dev);
-	int (*signal_event)(struct smcd_dev *dev, struct smcd_gid *rgid,
-			    u32 trigger_irq, u32 event_code, u64 info);
-	int (*support_dmb_nocopy)(struct smcd_dev *dev);
-	int (*attach_dmb)(struct smcd_dev *dev, struct ism_dmb *dmb);
-	int (*detach_dmb)(struct smcd_dev *dev, u64 token);
-};
-
 struct smcd_dev {
-	const struct smcd_ops *ops;
 	struct ism_dev *ism;
-	struct ism_client *client;
 	struct list_head list;
 	spinlock_t lock;
 	struct smc_connection **conn;
diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c
index 33fa787c28eb..b546999f83a4 100644
--- a/net/smc/smc_clc.c
+++ b/net/smc/smc_clc.c
@@ -900,7 +900,7 @@ int smc_clc_send_proposal(struct smc_sock *smc, struct smc_init_info *ini)
 		/* add SMC-D specifics */
 		if (ini->ism_dev[0]) {
 			smcd = ini->ism_dev[0];
-			smcd->ops->get_local_gid(smcd, &smcd_gid);
+			copy_to_smcdgid(&smcd_gid, &smcd->ism->gid);
 			pclc_smcd->ism.gid = htonll(smcd_gid.gid);
 			pclc_smcd->ism.chid =
 				htons(smc_ism_get_chid(ini->ism_dev[0]));
@@ -950,7 +950,7 @@ int smc_clc_send_proposal(struct smc_sock *smc, struct smc_init_info *ini)
 		if (ini->ism_offered_cnt) {
 			for (i = 1; i <= ini->ism_offered_cnt; i++) {
 				smcd = ini->ism_dev[i];
-				smcd->ops->get_local_gid(smcd, &smcd_gid);
+				copy_to_smcdgid(&smcd_gid, &smcd->ism->gid);
 				gidchids[entry].chid =
 					htons(smc_ism_get_chid(ini->ism_dev[i]));
 				gidchids[entry].gid = htonll(smcd_gid.gid);
@@ -1043,7 +1043,7 @@ smcd_clc_prep_confirm_accept(struct smc_connection *conn,
 	/* SMC-D specific settings */
 	memcpy(clc->hdr.eyecatcher, SMCD_EYECATCHER,
 	       sizeof(SMCD_EYECATCHER));
-	smcd->ops->get_local_gid(smcd, &smcd_gid);
+	copy_to_smcdgid(&smcd_gid, &smcd->ism->gid);
 	clc->hdr.typev1 = SMC_TYPE_D;
 	clc->d0.gid = htonll(smcd_gid.gid);
 	clc->d0.token = htonll(conn->rmb_desc->token);
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index d489b80a4503..dca43edfc6be 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -555,7 +555,7 @@ static int smc_nl_fill_smcd_lgr(struct smc_link_group *lgr,
 
 	if (nla_put_u32(skb, SMC_NLA_LGR_D_ID, *((u32 *)&lgr->id)))
 		goto errattr;
-	smcd->ops->get_local_gid(smcd, &smcd_gid);
+	copy_to_smcdgid(&smcd_gid, &smcd->ism->gid);
 	if (nla_put_u64_64bit(skb, SMC_NLA_LGR_D_GID,
 			      smcd_gid.gid, SMC_NLA_LGR_D_PAD))
 		goto errattr;
@@ -919,7 +919,7 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini)
 	if (ini->is_smcd) {
 		/* SMC-D specific settings */
 		smcd = ini->ism_dev[ini->ism_selected];
-		get_device(smcd->ops->get_dev(smcd));
+		get_device(ism_get_dev(smcd->ism));
 		lgr->peer_gid.gid =
 			ini->ism_peer_gid[ini->ism_selected].gid;
 		lgr->peer_gid.gid_ext =
@@ -1469,7 +1469,7 @@ static void smc_lgr_free(struct smc_link_group *lgr)
 	destroy_workqueue(lgr->tx_wq);
 	if (lgr->is_smcd) {
 		smc_ism_put_vlan(lgr->smcd, lgr->vlan_id);
-		put_device(lgr->smcd->ops->get_dev(lgr->smcd));
+		put_device(ism_get_dev(lgr->smcd->ism));
 	}
 	smc_lgr_put(lgr); /* theoretically last lgr_put */
 }
diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
index 6fdb2d96777a..5e79345108d4 100644
--- a/net/smc/smc_diag.c
+++ b/net/smc/smc_diag.c
@@ -175,7 +175,7 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
 		dinfo.linkid = *((u32 *)conn->lgr->id);
 		dinfo.peer_gid = conn->lgr->peer_gid.gid;
 		dinfo.peer_gid_ext = conn->lgr->peer_gid.gid_ext;
-		smcd->ops->get_local_gid(smcd, &smcd_gid);
+		copy_to_smcdgid(&smcd_gid, &smcd->ism->gid);
 		dinfo.my_gid = smcd_gid.gid;
 		dinfo.my_gid_ext = smcd_gid.gid_ext;
 		dinfo.token = conn->rmb_desc->token;
diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c
index 22c1cfb2ad09..9d14aef52283 100644
--- a/net/smc/smc_ism.c
+++ b/net/smc/smc_ism.c
@@ -68,8 +68,12 @@ static void smc_ism_create_system_eid(void)
 int smc_ism_cantalk(struct smcd_gid *peer_gid, unsigned short vlan_id,
 		    struct smcd_dev *smcd)
 {
-	return smcd->ops->query_remote_gid(smcd, peer_gid, vlan_id ? 1 : 0,
-					   vlan_id);
+	struct ism_dev *ism = smcd->ism;
+	uuid_t ism_rgid;
+
+	copy_to_ismgid(&ism_rgid, peer_gid);
+	return ism->ops->query_remote_gid(ism, &ism_rgid, vlan_id ? 1 : 0,
+					  vlan_id);
 }
 
 void smc_ism_get_system_eid(u8 **eid)
@@ -82,7 +86,7 @@ void smc_ism_get_system_eid(u8 **eid)
 
 u16 smc_ism_get_chid(struct smcd_dev *smcd)
 {
-	return smcd->ops->get_chid(smcd);
+	return smcd->ism->ops->get_chid(smcd->ism);
 }
 
 /* HW supports ISM V2 and thus System EID is defined */
@@ -131,7 +135,7 @@ int smc_ism_get_vlan(struct smcd_dev *smcd, unsigned short vlanid)
 
 	if (!vlanid)			/* No valid vlan id */
 		return -EINVAL;
-	if (!smcd->ops->add_vlan_id)
+	if (!smcd->ism->ops->add_vlan_id)
 		return -EOPNOTSUPP;
 
 	/* create new vlan entry, in case we need it */
@@ -154,7 +158,7 @@ int smc_ism_get_vlan(struct smcd_dev *smcd, unsigned short vlanid)
 	/* no existing entry found.
 	 * add new entry to device; might fail, e.g., if HW limit reached
 	 */
-	if (smcd->ops->add_vlan_id(smcd, vlanid)) {
+	if (smcd->ism->ops->add_vlan_id(smcd->ism, vlanid)) {
 		kfree(new_vlan);
 		rc = -EIO;
 		goto out;
@@ -178,7 +182,7 @@ int smc_ism_put_vlan(struct smcd_dev *smcd, unsigned short vlanid)
 
 	if (!vlanid)			/* No valid vlan id */
 		return -EINVAL;
-	if (!smcd->ops->del_vlan_id)
+	if (!smcd->ism->ops->del_vlan_id)
 		return -EOPNOTSUPP;
 
 	spin_lock_irqsave(&smcd->lock, flags);
@@ -196,7 +200,7 @@ int smc_ism_put_vlan(struct smcd_dev *smcd, unsigned short vlanid)
 	}
 
 	/* Found and the last reference just gone */
-	if (smcd->ops->del_vlan_id(smcd, vlanid))
+	if (smcd->ism->ops->del_vlan_id(smcd->ism, vlanid))
 		rc = -EIO;
 	list_del(&vlan->list);
 	kfree(vlan);
@@ -219,7 +223,8 @@ int smc_ism_unregister_dmb(struct smcd_dev *smcd, struct smc_buf_desc *dmb_desc)
 	dmb.cpu_addr = dmb_desc->cpu_addr;
 	dmb.dma_addr = dmb_desc->dma_addr;
 	dmb.dmb_len = dmb_desc->len;
-	rc = smcd->ops->unregister_dmb(smcd, &dmb);
+
+	rc = smcd->ism->ops->unregister_dmb(smcd->ism, &dmb);
 	if (!rc || rc == ISM_ERROR) {
 		dmb_desc->cpu_addr = NULL;
 		dmb_desc->dma_addr = 0;
@@ -231,6 +236,7 @@ int smc_ism_unregister_dmb(struct smcd_dev *smcd, struct smc_buf_desc *dmb_desc)
 int smc_ism_register_dmb(struct smc_link_group *lgr, int dmb_len,
 			 struct smc_buf_desc *dmb_desc)
 {
+	struct ism_dev *ism;
 	struct ism_dmb dmb;
 	int rc;
 
@@ -239,7 +245,9 @@ int smc_ism_register_dmb(struct smc_link_group *lgr, int dmb_len,
 	dmb.sba_idx = dmb_desc->sba_idx;
 	dmb.vlan_id = lgr->vlan_id;
 	dmb.rgid = lgr->peer_gid.gid;
-	rc = lgr->smcd->ops->register_dmb(lgr->smcd, &dmb, lgr->smcd->client);
+
+	ism = lgr->smcd->ism;
+	rc = ism->ops->register_dmb(ism, &dmb, &smc_ism_client);
 	if (!rc) {
 		dmb_desc->sba_idx = dmb.sba_idx;
 		dmb_desc->token = dmb.dmb_tok;
@@ -256,8 +264,8 @@ bool smc_ism_support_dmb_nocopy(struct smcd_dev *smcd)
 	 * merging sndbuf with peer DMB to avoid
 	 * data copies between them.
 	 */
-	return (smcd->ops->support_dmb_nocopy &&
-		smcd->ops->support_dmb_nocopy(smcd));
+	return (smcd->ism->ops->support_dmb_nocopy &&
+		smcd->ism->ops->support_dmb_nocopy(smcd->ism));
 }
 
 int smc_ism_attach_dmb(struct smcd_dev *dev, u64 token,
@@ -266,12 +274,12 @@ int smc_ism_attach_dmb(struct smcd_dev *dev, u64 token,
 	struct ism_dmb dmb;
 	int rc = 0;
 
-	if (!dev->ops->attach_dmb)
+	if (!dev->ism->ops->attach_dmb)
 		return -EINVAL;
 
 	memset(&dmb, 0, sizeof(dmb));
 	dmb.dmb_tok = token;
-	rc = dev->ops->attach_dmb(dev, &dmb);
+	rc = dev->ism->ops->attach_dmb(dev->ism, &dmb);
 	if (!rc) {
 		dmb_desc->sba_idx = dmb.sba_idx;
 		dmb_desc->token = dmb.dmb_tok;
@@ -284,10 +292,10 @@ int smc_ism_attach_dmb(struct smcd_dev *dev, u64 token,
 
 int smc_ism_detach_dmb(struct smcd_dev *dev, u64 token)
 {
-	if (!dev->ops->detach_dmb)
+	if (!dev->ism->ops->detach_dmb)
 		return -EINVAL;
 
-	return dev->ops->detach_dmb(dev, token);
+	return dev->ism->ops->detach_dmb(dev->ism, token);
 }
 
 static int smc_nl_handle_smcd_dev(struct smcd_dev *smcd,
@@ -412,6 +420,8 @@ static void smcd_handle_sw_event(struct smc_ism_event_work *wrk)
 	struct smcd_gid peer_gid = { .gid = wrk->event.tok,
 				     .gid_ext = 0 };
 	union smcd_sw_event_info ev_info;
+	struct ism_dev *ism = wrk->smcd->ism;
+	uuid_t ism_rgid;
 
 	ev_info.info = wrk->event.info;
 	switch (wrk->event.code) {
@@ -420,14 +430,14 @@ static void smcd_handle_sw_event(struct smc_ism_event_work *wrk)
 		break;
 	case ISM_EVENT_CODE_TESTLINK:	/* Activity timer */
 		if (ev_info.code == ISM_EVENT_REQUEST &&
-		    wrk->smcd->ops->signal_event) {
+		    ism->ops->signal_event) {
 			ev_info.code = ISM_EVENT_RESPONSE;
-			wrk->smcd->ops->signal_event(wrk->smcd,
-						     &peer_gid,
-						     ISM_EVENT_REQUEST_IR,
-						     ISM_EVENT_CODE_TESTLINK,
-						     ev_info.info);
-			}
+			copy_to_ismgid(&ism_rgid, &peer_gid);
+			ism->ops->signal_event(ism, &ism_rgid,
+					       ISM_EVENT_REQUEST_IR,
+					       ISM_EVENT_CODE_TESTLINK,
+					       ev_info.info);
+		}
 		break;
 	}
 }
@@ -453,9 +463,7 @@ static void smc_ism_event_work(struct work_struct *work)
 	kfree(wrk);
 }
 
-static struct smcd_dev *smcd_alloc_dev(const char *name,
-				       const struct smcd_ops *ops,
-				       int max_dmbs)
+static struct smcd_dev *smcd_alloc_dev(const char *name, int max_dmbs)
 {
 	struct smcd_dev *smcd;
 
@@ -472,8 +480,6 @@ static struct smcd_dev *smcd_alloc_dev(const char *name,
 	if (!smcd->event_wq)
 		goto free_conn;
 
-	smcd->ops = ops;
-
 	spin_lock_init(&smcd->lock);
 	spin_lock_init(&smcd->lgr_lock);
 	INIT_LIST_HEAD(&smcd->vlan);
@@ -488,176 +494,22 @@ static struct smcd_dev *smcd_alloc_dev(const char *name,
 	return NULL;
 }
 
-static int smcd_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
-			   u32 vid_valid, u32 vid)
-{
-	struct ism_dev *ism = smcd->ism;
-	uuid_t ism_rgid;
-
-	copy_to_ismgid(&ism_rgid, rgid);
-	return ism->ops->query_remote_gid(ism, &ism_rgid, vid_valid, vid);
-}
-
-static int smcd_register_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb,
-			     void *client)
-{
-	struct ism_dev *ism = smcd->ism;
-
-	return ism->ops->register_dmb(ism, dmb, (struct ism_client *)client);
-}
-
-static int smcd_unregister_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb)
-{
-	struct ism_dev *ism = smcd->ism;
-
-	return ism->ops->unregister_dmb(ism, dmb);
-}
-
-static int smcd_add_vlan_id(struct smcd_dev *smcd, u64 vlan_id)
-{
-	struct ism_dev *ism = smcd->ism;
-
-	return ism->ops->add_vlan_id(ism, vlan_id);
-}
-
-static int smcd_del_vlan_id(struct smcd_dev *smcd, u64 vlan_id)
-{
-	struct ism_dev *ism = smcd->ism;
-
-	return ism->ops->del_vlan_id(ism, vlan_id);
-}
-
-static int smcd_set_vlan_required(struct smcd_dev *smcd)
-{
-	struct ism_dev *ism = smcd->ism;
-
-	return ism->ops->set_vlan_required(ism);
-}
-
-static int smcd_reset_vlan_required(struct smcd_dev *smcd)
-{
-	struct ism_dev *ism = smcd->ism;
-
-	return ism->ops->reset_vlan_required(ism);
-}
-
-static int smcd_signal_ieq(struct smcd_dev *smcd, struct smcd_gid *rgid,
-			   u32 trigger_irq, u32 event_code, u64 info)
-{
-	struct ism_dev *ism = smcd->ism;
-	uuid_t ism_rgid;
-
-	copy_to_ismgid(&ism_rgid, rgid);
-	return ism->ops->signal_event(ism, &ism_rgid, trigger_irq,
-				      event_code, info);
-}
-
-static int smcd_move(struct smcd_dev *smcd, u64 dmb_tok, unsigned int idx,
-		     bool sf, unsigned int offset, void *data,
-		     unsigned int size)
-{
-	struct ism_dev *ism = smcd->ism;
-
-	return ism->ops->move_data(ism, dmb_tok, idx, sf, offset, data, size);
-}
-
-static int smcd_supports_v2(void)
-{
-	return smc_ism_v2_capable;
-}
-
-static void smcd_get_local_gid(struct smcd_dev *smcd,
-			       struct smcd_gid *smcd_gid)
-{
-	struct ism_dev *ism = smcd->ism;
-
-	copy_to_smcdgid(smcd_gid, &ism->gid);
-}
-
-static u16 smcd_get_chid(struct smcd_dev *smcd)
-{
-	struct ism_dev *ism = smcd->ism;
-
-	return ism->ops->get_chid(ism);
-}
-
-static inline struct device *smcd_get_dev(struct smcd_dev *dev)
-{
-	return ism_get_dev(dev->ism);
-}
-
-static const struct smcd_ops ism_smcd_ops = {
-	.query_remote_gid = smcd_query_rgid,
-	.register_dmb = smcd_register_dmb,
-	.unregister_dmb = smcd_unregister_dmb,
-	.add_vlan_id = smcd_add_vlan_id,
-	.del_vlan_id = smcd_del_vlan_id,
-	.set_vlan_required = smcd_set_vlan_required,
-	.reset_vlan_required = smcd_reset_vlan_required,
-	.signal_event = smcd_signal_ieq,
-	.move_data = smcd_move,
-	.supports_v2 = smcd_supports_v2,
-	.get_local_gid = smcd_get_local_gid,
-	.get_chid = smcd_get_chid,
-	.get_dev = smcd_get_dev,
-};
-
-static inline int smcd_support_dmb_nocopy(struct smcd_dev *smcd)
-{
-	struct ism_dev *ism = smcd->ism;
-
-	return ism->ops->support_dmb_nocopy(ism);
-}
-
-static inline int smcd_attach_dmb(struct smcd_dev *smcd,
-				  struct ism_dmb *dmb)
-{
-	struct ism_dev *ism = smcd->ism;
-
-	return ism->ops->attach_dmb(ism, dmb);
-}
-
-static inline int smcd_detach_dmb(struct smcd_dev *smcd, u64 token)
-{
-	struct ism_dev *ism = smcd->ism;
-
-	return ism->ops->detach_dmb(ism, token);
-}
-
-static const struct smcd_ops lo_ops = {
-	.query_remote_gid = smcd_query_rgid,
-	.register_dmb = smcd_register_dmb,
-	.unregister_dmb = smcd_unregister_dmb,
-	.support_dmb_nocopy = smcd_support_dmb_nocopy,
-	.attach_dmb = smcd_attach_dmb,
-	.detach_dmb = smcd_detach_dmb,
-	.move_data = smcd_move,
-	.supports_v2 = smcd_supports_v2,
-	.get_local_gid = smcd_get_local_gid,
-	.get_chid = smcd_get_chid,
-	.get_dev = smcd_get_dev,
-};
-
 static void smcd_register_dev(struct ism_dev *ism)
 {
-	const struct smcd_ops *ops;
 	struct smcd_dev *smcd, *fentry;
 	int max_dmbs;
 
 	if (ism->ops->get_chid(ism) == ISM_LO_RESERVED_CHID) {
 		max_dmbs = ISM_LO_MAX_DMBS;
-		ops = &lo_ops;
 	} else {
 		max_dmbs = ISM_NR_DMBS;
-		ops = &ism_smcd_ops;
 	}
 
-	smcd = smcd_alloc_dev(dev_name(&ism->dev), ops, max_dmbs);
+	smcd = smcd_alloc_dev(dev_name(&ism->dev), max_dmbs);
 	if (!smcd)
 		return;
 
 	smcd->ism = ism;
-	smcd->client = &smc_ism_client;
 	ism_set_priv(ism, &smc_ism_client, smcd);
 
 	if (smc_pnetid_by_dev_port(ism->dev.parent, 0, smcd->pnetid))
@@ -760,16 +612,18 @@ int smc_ism_signal_shutdown(struct smc_link_group *lgr)
 	int rc = 0;
 #if IS_ENABLED(CONFIG_ISM)
 	union smcd_sw_event_info ev_info;
+	uuid_t ism_rgid;
 
 	if (lgr->peer_shutdown)
 		return 0;
-	if (!lgr->smcd->ops->signal_event)
+	if (!lgr->smcd->ism->ops->signal_event)
 		return 0;
 
 	memcpy(ev_info.uid, lgr->id, SMC_LGR_ID_SIZE);
 	ev_info.vlan_id = lgr->vlan_id;
 	ev_info.code = ISM_EVENT_REQUEST;
-	rc = lgr->smcd->ops->signal_event(lgr->smcd, &lgr->peer_gid,
+	copy_to_ismgid(&ism_rgid, &lgr->peer_gid);
+	rc = lgr->smcd->ism->ops->signal_event(lgr->smcd->ism, &ism_rgid,
 					  ISM_EVENT_REQUEST_IR,
 					  ISM_EVENT_CODE_SHUTDOWN,
 					  ev_info.info);
diff --git a/net/smc/smc_ism.h b/net/smc/smc_ism.h
index d041e5a7c459..e2e8cfba2575 100644
--- a/net/smc/smc_ism.h
+++ b/net/smc/smc_ism.h
@@ -68,7 +68,9 @@ static inline int smc_ism_write(struct smcd_dev *smcd, u64 dmb_tok,
 {
 	int rc;
 
-	rc = smcd->ops->move_data(smcd, dmb_tok, idx, sf, offset, data, len);
+	rc = smcd->ism->ops->move_data(smcd->ism, dmb_tok, idx, sf, offset,
+				       data, len);
+
 	return rc < 0 ? rc : 0;
 }
 
@@ -85,14 +87,14 @@ static inline bool __smc_ism_is_emulated(u16 chid)
 
 static inline bool smc_ism_is_emulated(struct smcd_dev *smcd)
 {
-	u16 chid = smcd->ops->get_chid(smcd);
+	u16 chid = smcd->ism->ops->get_chid(smcd->ism);
 
 	return __smc_ism_is_emulated(chid);
 }
 
 static inline bool smc_ism_is_loopback(struct smcd_dev *smcd)
 {
-	return (smcd->ops->get_chid(smcd) == 0xFFFF);
+	return (smcd->ism->ops->get_chid(smcd->ism) == ISM_LO_RESERVED_CHID);
 }
 
 static inline void copy_to_smcdgid(struct smcd_gid *sgid, uuid_t *igid)
diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c
index 716808f374a8..397557f4b7d4 100644
--- a/net/smc/smc_pnet.c
+++ b/net/smc/smc_pnet.c
@@ -169,7 +169,7 @@ static int smc_pnet_remove_by_pnetid(struct net *net, char *pnet_name)
 			pr_warn_ratelimited("smc: smcd device %s "
 					    "erased user defined pnetid "
 					    "%.16s\n",
-					    dev_name(smcd->ops->get_dev(smcd)),
+					    dev_name(ism_get_dev(smcd->ism)),
 					    smcd->pnetid);
 			memset(smcd->pnetid, 0, SMC_MAX_PNETID_LEN);
 			smcd->pnetid_by_user = false;
@@ -332,7 +332,7 @@ static struct smcd_dev *smc_pnet_find_smcd(char *smcd_name)
 
 	mutex_lock(&smcd_dev_list.mutex);
 	list_for_each_entry(smcd_dev, &smcd_dev_list.list, list) {
-		if (!strncmp(dev_name(smcd_dev->ops->get_dev(smcd_dev)),
+		if (!strncmp(dev_name(ism_get_dev(smcd_dev->ism)),
 			     smcd_name, IB_DEVICE_NAME_MAX - 1))
 			goto out;
 	}
@@ -431,7 +431,7 @@ static int smc_pnet_add_ib(struct smc_pnettable *pnettable, char *ib_name,
 	if (smcd) {
 		smcddev_applied = smc_pnet_apply_smcd(smcd, pnet_name);
 		if (smcddev_applied) {
-			dev = smcd->ops->get_dev(smcd);
+			dev = ism_get_dev(smcd->ism);
 			pr_warn_ratelimited("smc: smcd device %s "
 					    "applied user defined pnetid "
 					    "%.16s\n", dev_name(dev),
@@ -1190,7 +1190,7 @@ int smc_pnetid_by_table_ib(struct smc_ib_device *smcibdev, u8 ib_port)
  */
 int smc_pnetid_by_table_smcd(struct smcd_dev *smcddev)
 {
-	const char *ib_name = dev_name(smcddev->ops->get_dev(smcddev));
+	const char *ib_name = dev_name(ism_get_dev(smcddev->ism));
 	struct smc_pnettable *pnettable;
 	struct smc_pnetentry *tmp_pe;
 	struct smc_net *sn;
-- 
2.45.2
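
The patch above drops the smcd_ops wrappers and converts GIDs at the call sites with copy_to_ismgid()/copy_to_smcdgid(). Their bodies are not part of this excerpt. Below is a minimal userspace sketch of one plausible conversion. It assumes that the 16-byte ISM GID holds gid followed by gid_ext, both in big-endian order. The struct layout and the uuid_bytes_t stand-in for the kernel's uuid_t are assumptions, not the series' code.

```c
#include <stdint.h>

/* Assumed layout of the SMC-D GID: two 64-bit halves. */
struct smcd_gid {
	uint64_t gid;
	uint64_t gid_ext;
};

/* Hypothetical stand-in for the kernel's 16-byte uuid_t. */
typedef struct {
	uint8_t b[16];
} uuid_bytes_t;

/* Store v at p in big-endian (network) byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Load a big-endian 64-bit value from p. */
static uint64_t get_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* smcd_gid -> ISM GID (destination first, as in the patch). */
static void copy_to_ismgid(uuid_bytes_t *igid, const struct smcd_gid *sgid)
{
	put_be64(&igid->b[0], sgid->gid);
	put_be64(&igid->b[8], sgid->gid_ext);
}

/* ISM GID -> smcd_gid. */
static void copy_to_smcdgid(struct smcd_gid *sgid, const uuid_bytes_t *igid)
{
	sgid->gid = get_be64(&igid->b[0]);
	sgid->gid_ext = get_be64(&igid->b[8]);
}
```

The byte-wise packing makes the result independent of host endianness, so a round trip through both helpers returns the original gid and gid_ext.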


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions
  2025-01-15 19:55 ` [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions Alexandra Winter
@ 2025-01-15 22:06   ` Halil Pasic
  2025-01-20  6:32   ` Dust Li
  1 sibling, 0 replies; 61+ messages in thread
From: Halil Pasic @ 2025-01-15 22:06 UTC (permalink / raw)
  To: Alexandra Winter
  Cc: Wenjia Zhang, Jan Karcher, Gerd Bayer, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Julian Ruess, Niklas Schnelle,
	Thorsten Winkler, netdev, linux-s390, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Simon Horman, Halil Pasic

On Wed, 15 Jan 2025 20:55:24 +0100
Alexandra Winter <wintera@linux.ibm.com> wrote:

> Note that in this RFC this patch is not complete, future versions
> of this patch need to contain comments for all ism_ops.
> Especially signal_event() and handle_event() need a good generic
> description.

Such notes belong in the inline cover letter rather than in the commit
message, IMHO.

> 
> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
> ---

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-15 19:55 [RFC net-next 0/7] Provide an ism layer Alexandra Winter
                   ` (6 preceding siblings ...)
  2025-01-15 19:55 ` [RFC net-next 7/7] net/smc: Use only ism_ops Alexandra Winter
@ 2025-01-16  9:32 ` Dust Li
  2025-01-16 11:55   ` Julian Ruess
  2025-01-17 11:04   ` Alexandra Winter
  7 siblings, 2 replies; 61+ messages in thread
From: Dust Li @ 2025-01-16  9:32 UTC (permalink / raw)
  To: Alexandra Winter, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On 2025-01-15 20:55:20, Alexandra Winter wrote:

Hi Winter,

I'm fully supportive of the refactor!

Interestingly, I developed a similar RFC code about a month ago while
working on enhancing internal communication between guest and host
systems. Here are some of my thoughts on the matter:

Naming and Structure: I suggest we refer to it as SHD (Shared Memory
Device) instead of ISM (Internal Shared Memory). To my knowledge, a
"Shared Memory Device" better encapsulates the functionality we're
aiming to implement. It might be beneficial to place it under
drivers/shd/ and register it as a new class under /sys/class/shd/. That
said, my initial draft also adopted the ISM terminology for simplicity.

Modular Approach: I've made the ism_loopback an independent kernel
module since dynamic enable/disable functionality is not yet supported
in SMC. Using insmod and rmmod for module management could provide the
flexibility needed in practical scenarios.

Abstraction of ISM Device Details: I propose we abstract the ISM device
details by providing SMC with helper functions. These functions could
encapsulate ism->ops, making the implementation cleaner and more
intuitive. This way, the struct ism_device would mainly serve its
implementers, while the upper helper functions offer a streamlined
interface for SMC.
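
A minimal userspace sketch of such helpers follows. It assumes a trimmed-down, hypothetical ism_ops with one mandatory op (get_chid) and one optional op (add_vlan_id). The helper names are made up for illustration. As in smc_ism_get_vlan() above, the helper turns a missing optional op into -EOPNOTSUPP, so callers no longer test ism->ops->... for NULL themselves.

```c
#include <errno.h>
#include <stdint.h>

struct ism_dev;

/* Hypothetical, trimmed-down ops table; the real one has many more ops. */
struct ism_ops {
	uint16_t (*get_chid)(struct ism_dev *ism);                 /* mandatory */
	int (*add_vlan_id)(struct ism_dev *ism, uint64_t vlan_id); /* optional */
};

struct ism_dev {
	const struct ism_ops *ops;
};

/* Helpers a client such as SMC-D would call instead of ism->ops->... */
static inline uint16_t ism_dev_get_chid(struct ism_dev *ism)
{
	return ism->ops->get_chid(ism);
}

static inline int ism_dev_add_vlan_id(struct ism_dev *ism, uint64_t vlan_id)
{
	if (!ism->ops->add_vlan_id)
		return -EOPNOTSUPP; /* device does not implement this op */
	return ism->ops->add_vlan_id(ism, vlan_id);
}

/* Example loopback-like device without VLAN support (hypothetical). */
static uint16_t lo_get_chid(struct ism_dev *ism)
{
	(void)ism;
	return 0xFFFF; /* the CHID that smc_ism_is_loopback() checks for */
}

static const struct ism_ops lo_ops = {
	.get_chid = lo_get_chid,
	/* .add_vlan_id intentionally left NULL */
};
```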

Structuring and Naming: I recommend embedding the structure of ism_ops
directly within ism_dev rather than using a pointer. Additionally,
renaming it to ism_device_ops could enhance clarity and consistency.
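
To illustrate the embedding suggestion, here is a minimal userspace sketch. The layout is hypothetical: only the ism_device_ops name comes from the proposal, and each device receives a by-value copy of a driver-wide template at init time. Calls then take the form ism->ops.get_chid(ism) instead of ism->ops->get_chid(ism).

```c
#include <stdint.h>

struct ism_dev;

/* Proposed name for the ops table (hypothetical layout). */
struct ism_device_ops {
	uint16_t (*get_chid)(struct ism_dev *ism);
};

/* Ops embedded by value instead of a pointer to a shared table. */
struct ism_dev {
	struct ism_device_ops ops;
	uint16_t chid;
};

static uint16_t vp_get_chid(struct ism_dev *ism)
{
	return ism->chid;
}

/* Driver-wide template; each device gets its own copy. */
static const struct ism_device_ops vp_ops_template = {
	.get_chid = vp_get_chid,
};

static void ism_dev_init(struct ism_dev *ism, uint16_t chid)
{
	ism->ops = vp_ops_template; /* struct copy, not a pointer */
	ism->chid = chid;
}
```

Embedding saves one pointer dereference per call and allows per-device adjustments, but it costs a writable copy in every device. A pointer to a shared, typically const, table (the ism->ops->... form used in the series) keeps the callbacks in read-only memory.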


>This RFC is about providing a generic shim layer between all kinds of
>ism devices and all kinds of ism users.
>
>Benefits:
>- Cleaner separation of ISM and SMC-D functionality
>- simpler and fewer module dependencies
>- Clear interface definition.
>- Extendable for future devices and clients.

Fully agree.

>
>Request for comments:
>---------------------
>Any comments are welcome, but I am aware that this series needs more work.
>It may not be worth your time to do an in-depth review of the details, I am
>looking for feedback on the general idea.
>I am mostly interested in your thoughts and recommendations about the general
>concept, the location of net/ism, the structure of include/linux/ism.h, the
>KConfig and makefiles.
>
>Status of this RFC:
>-------------------
>This is a very early RFC to ask you for comments on this general idea.
>The RFC does not fulfill all criteria required for a patchset.
>The whole set compiles and runs, but I did not try all combinations of
>module and built-in yet. I did not check for checkpatch or any other checkers.
>Also I have only done very rudimentary quick tests of SMC-D. More testing is
>required.
>
>Background / Status quo:
>------------------------
>Currently s390 hardware provides virtual PCI ISM devices (ism_vpci). Their
>driver is in drivers/s390/net/ism_drv.c. The main user is SMC-D (net/smc).
>ism_vpci driver offers a client interface so other users/protocols
>can also use them, but it is still heavily intermingled with the smc code.
>Namely, the ISM vPCI module cannot be used without the SMC module, which
>feels artificial.
>
>The ISM concept is being extended:
>[1] proposed an ISM loopback interface (ism_lo), that can be used on non-s390
>architectures (e.g. between containers or to test SMC-D). A minimal implementation
>went upstream with [2]: ism_lo currently is a part of the smc protocol and rather
>hidden.
>
>[3] proposed a virtio definition of ISM (ism_virtio) that can be used between
>kvm guests.
>
>We will shortly send an RFC for an ISM client that uses ISM as transport for TTY.
>
>Concept:
>--------
>Create a shim layer in net/ism that contains common definitions and code for
>all ism devices and all ism clients.
>Any device or client module only needs to depend on this ism layer module and
>any device or client code only needs to include the definitions in
>include/linux/ism.h
>
>Ideas for next steps:
>---------------------
>- sysfs representation? e.g. as /sys/class/ism ?
>- provide a full-fledged ism loopback interface
>    (runtime enable/disable, sysfs device, ..)

I think it's better if we can make this common for all ISM devices,
but yeah, that should be the next step.

Best regards,
Dust

>- additional clients (tty over ism)
>- additional devices (virtio-ism, ...)
>
>Link: [1] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/
>Link: [2] https://lore.kernel.org/linux-kernel//20240428060738.60843-1-guwen@linux.alibaba.com/
>Link: [3] https://groups.oasis-open.org/communities/community-home/digestviewer/viewthread?GroupId=3973&MessageKey=c060ecf9-ea1a-49a2-9827-c92f0e6447b2&CommunityKey=2f26be99-3aa1-48f6-93a5-018dce262226&hlmlt=VT
>
>Alexandra Winter (7):
>  net/ism: Create net/ism
>  net/ism: Remove dependencies between ISM_VPCI and SMC
>  net/ism: Use uuid_t for ISM GID
>  net/ism: Add kernel-doc comments for ism functions
>  net/ism: Move ism_loopback to net/ism
>  s390/ism: Define ismvp_dev
>  net/smc: Use only ism_ops
>
> MAINTAINERS                |   7 +
> drivers/s390/net/Kconfig   |  10 +-
> drivers/s390/net/Makefile  |   4 +-
> drivers/s390/net/ism.h     |  27 ++-
> drivers/s390/net/ism_drv.c | 467 ++++++++++++-------------------------
> include/linux/ism.h        | 299 +++++++++++++++++++++---
> include/net/smc.h          |  52 +----
> net/Kconfig                |   1 +
> net/Makefile               |   1 +
> net/ism/Kconfig            |  27 +++
> net/ism/Makefile           |   8 +
> net/ism/ism_loopback.c     | 366 +++++++++++++++++++++++++++++
> net/ism/ism_loopback.h     |  59 +++++
> net/ism/ism_main.c         | 171 ++++++++++++++
> net/smc/Kconfig            |  13 --
> net/smc/Makefile           |   1 -
> net/smc/af_smc.c           |  12 +-
> net/smc/smc_clc.c          |   6 +-
> net/smc/smc_core.c         |   6 +-
> net/smc/smc_diag.c         |   2 +-
> net/smc/smc_ism.c          | 112 +++++----
> net/smc/smc_ism.h          |  29 ++-
> net/smc/smc_loopback.c     | 427 ---------------------------------
> net/smc/smc_loopback.h     |  60 -----
> net/smc/smc_pnet.c         |   8 +-
> 25 files changed, 1183 insertions(+), 992 deletions(-)
> create mode 100644 net/ism/Kconfig
> create mode 100644 net/ism/Makefile
> create mode 100644 net/ism/ism_loopback.c
> create mode 100644 net/ism/ism_loopback.h
> create mode 100644 net/ism/ism_main.c
> delete mode 100644 net/smc/smc_loopback.c
> delete mode 100644 net/smc/smc_loopback.h
>
>-- 
>2.45.2
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-16  9:32 ` [RFC net-next 0/7] Provide an ism layer Dust Li
@ 2025-01-16 11:55   ` Julian Ruess
  2025-01-16 16:17     ` Alexandra Winter
  2025-01-17 11:04   ` Alexandra Winter
  1 sibling, 1 reply; 61+ messages in thread
From: Julian Ruess @ 2025-01-16 11:55 UTC (permalink / raw)
  To: dust.li, Alexandra Winter, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On Thu Jan 16, 2025 at 10:32 AM CET, Dust Li wrote:
> On 2025-01-15 20:55:20, Alexandra Winter wrote:
>
> Hi Winter,
>
> I'm fully supportive of the refactor!
>
> Interestingly, I developed a similar RFC code about a month ago while
> working on enhancing internal communication between guest and host
> systems. Here are some of my thoughts on the matter:
>
> Naming and Structure: I suggest we refer to it as SHD (Shared Memory
> Device) instead of ISM (Internal Shared Memory). To my knowledge, a
> "Shared Memory Device" better encapsulates the functionality we're
> aiming to implement. It might be beneficial to place it under
> drivers/shd/ and register it as a new class under /sys/class/shd/. That
> said, my initial draft also adopted the ISM terminology for simplicity.

I'm not sure if we really want to introduce a new name for
the already existing ISM device. For me, having two names
for the same thing just adds additional complexity.

I would go for /sys/class/ism

>
> Modular Approach: I've made the ism_loopback an independent kernel
> module since dynamic enable/disable functionality is not yet supported
> in SMC. Using insmod and rmmod for module management could provide the
> flexibility needed in practical scenarios.
>
> Abstraction of ISM Device Details: I propose we abstract the ISM device
> details by providing SMC with helper functions. These functions could
> encapsulate ism->ops, making the implementation cleaner and more
> intuitive. This way, the struct ism_device would mainly serve its
> implementers, while the upper helper functions offer a streamlined
> interface for SMC.
>
> Structuring and Naming: I recommend embedding the structure of ism_ops
> directly within ism_dev rather than using a pointer. Additionally,
> renaming it to ism_device_ops could enhance clarity and consistency.
>
>
> >This RFC is about providing a generic shim layer between all kinds of
> >ism devices and all kinds of ism users.
> >
> >Benefits:
> >- Cleaner separation of ISM and SMC-D functionality
> >- simpler and fewer module dependencies
> >- Clear interface definition.
> >- Extendable for future devices and clients.
>
> Fully agree.
>
> >
> >Request for comments:
> >---------------------
> >Any comments are welcome, but I am aware that this series needs more work.
> >It may not be worth your time to do an in-depth review of the details, I am
> >looking for feedback on the general idea.
> >I am mostly interested in your thoughts and recommendations about the general
> >concept, the location of net/ism, the structure of include/linux/ism.h, the
> >KConfig and makefiles.
> >
> >Status of this RFC:
> >-------------------
> >This is a very early RFC to ask you for comments on this general idea.
> >The RFC does not fulfill all criteria required for a patchset.
> >The whole set compiles and runs, but I did not try all combinations of
> >module and built-in yet. I did not check for checkpatch or any other checkers.
> >Also I have only done very rudimentary quick tests of SMC-D. More testing is
> >required.
> >
> >Background / Status quo:
> >------------------------
> >Currently s390 hardware provides virtual PCI ISM devices (ism_vpci). Their
> >driver is in drivers/s390/net/ism_drv.c. The main user is SMC-D (net/smc).
> >ism_vpci driver offers a client interface so other users/protocols
> >can also use them, but it is still heavily intermingled with the smc code.
> >Namely, the ISM vPCI module cannot be used without the SMC module, which
> >feels artificial.
> >
> >The ISM concept is being extended:
> >[1] proposed an ISM loopback interface (ism_lo), that can be used on non-s390
> >architectures (e.g. between containers or to test SMC-D). A minimal implementation
> >went upstream with [2]: ism_lo currently is a part of the smc protocol and rather
> >hidden.
> >
> >[3] proposed a virtio definition of ISM (ism_virtio) that can be used between
> >kvm guests.
> >
> >We will shortly send an RFC for an ISM client that uses ISM as transport for TTY.
> >
> >Concept:
> >--------
> >Create a shim layer in net/ism that contains common definitions and code for
> >all ism devices and all ism clients.
> >Any device or client module only needs to depend on this ism layer module and
> >any device or client code only needs to include the definitions in
> >include/linux/ism.h
> >
> >Ideas for next steps:
> >---------------------
> >- sysfs representation? e.g. as /sys/class/ism ?
> >- provide a full-fledged ism loopback interface
> >    (runtime enable/disable, sysfs device, ..)
>
> I think it's better if we can make this common for all ISM devices.
> but yeah, that should be the next step.

I already have patches based on this series that introduce
/sys/class/ism and show ism-loopback as well as
s390/ism devices. I can send this soon.


Julian

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-16 11:55   ` Julian Ruess
@ 2025-01-16 16:17     ` Alexandra Winter
  2025-01-16 17:08       ` Julian Ruess
                         ` (3 more replies)
  0 siblings, 4 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-16 16:17 UTC (permalink / raw)
  To: Julian Ruess, dust.li, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 16.01.25 12:55, Julian Ruess wrote:
> On Thu Jan 16, 2025 at 10:32 AM CET, Dust Li wrote:
>> On 2025-01-15 20:55:20, Alexandra Winter wrote:
>>
>> Hi Winter,
>>
>> I'm fully supportive of the refactor!


Thank you very much Dust Li for joining the discussion.


>> Interestingly, I developed a similar RFC code about a month ago while
>> working on enhancing internal communication between guest and host
>> systems. 


But you did not send that out, did you?
I hope I did not overlook an earlier proposal by you.


>> Here are some of my thoughts on the matter:
>>
>> Naming and Structure: I suggest we refer to it as SHD (Shared Memory
>> Device) instead of ISM (Internal Shared Memory). 


So where does the 'H' come from, if you want to call it Shared Memory _D_evice?


>> To my knowledge, a
>> "Shared Memory Device" better encapsulates the functionality we're
>> aiming to implement. 


Could you explain why that would be better?
The 'Internal' in 'Internal Shared Memory' is meant as a counterpart to the
'Remote' ('R') in RoCE. Not the greatest name, but it is already used by our ISM
devices and by ism_loopback. So what is the benefit of changing it?


>> It might be beneficial to place it under
>> drivers/shd/ and register it as a new class under /sys/class/shd/. That
>> said, my initial draft also adopted the ISM terminology for simplicity.
> 
> I'm not sure if we really want to introduce a new name for
> the already existing ISM device. For me, having two names
> for the same thing just adds additional complexity.
> 
> I would go for /sys/class/ism
> 
>>
>> Modular Approach: I've made the ism_loopback an independent kernel
>> module since dynamic enable/disable functionality is not yet supported
>> in SMC. Using insmod and rmmod for module management could provide the
>> flexibility needed in practical scenarios.


With this proposal, ism_loopback is just another ism device, and SMC-D will
handle its removal via ism_client.remove(ism_dev), just like for any other
ism device.

But I understand that net/smc/smc_loopback.c today does not provide enable/disable,
which is a big disadvantage, I agree. The ism layer is prepared for dynamic
removal via ism_dev_unregister(). In this RFC, however, that only happens
on 'rmmod ism', which should be improved.
One way to do that would be a separate ism_loopback kernel module, like you say.
Today ism_loopback is small (366 lines in net/ism/ism_loopback.c in this series),
so I'd be fine with leaving it in the ism module.
I also think it is a great way to test any ISM client, so it benefits
anybody using the ism module.
Another way would be, e.g., an 'enable' entry in the sysfs of the loopback device
(once we agree whether and how to represent ism devices in sysfs in general).
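
The add/remove flow described above can be sketched in userspace as follows. This is a sketch under stated assumptions: the ism_client add/remove callbacks are trimmed down, and the registry functions (including this ism_dev_unregister()) are hypothetical stand-ins, not the series' implementation. The point it shows is that, when a device goes away, the layer calls every registered client's remove(), so SMC-D does not need to know whether the device is loopback or vPCI.

```c
#include <stddef.h>

#define MAX_CLIENTS 4

struct ism_dev {
	const char *name;
};

/* Trimmed-down client interface (hypothetical). */
struct ism_client {
	const char *name;
	void (*add)(struct ism_dev *ism);
	void (*remove)(struct ism_dev *ism);
};

static struct ism_client *clients[MAX_CLIENTS];
static size_t nr_clients;

static int ism_register_client(struct ism_client *client)
{
	if (nr_clients == MAX_CLIENTS)
		return -1; /* registry full */
	clients[nr_clients++] = client;
	return 0;
}

/* A new device (vPCI, loopback, ...) is offered to every client. */
static void ism_dev_register(struct ism_dev *ism)
{
	for (size_t i = 0; i < nr_clients; i++)
		clients[i]->add(ism);
}

/* Device removal, e.g. on rmmod or a future 'enable' = 0. */
static void ism_dev_unregister(struct ism_dev *ism)
{
	for (size_t i = 0; i < nr_clients; i++)
		clients[i]->remove(ism);
}

/* Example client that counts the devices it currently uses. */
static int demo_devs;

static void demo_add(struct ism_dev *ism)
{
	(void)ism;
	demo_devs++;
}

static void demo_remove(struct ism_dev *ism)
{
	(void)ism;
	demo_devs--;
}

static struct ism_client demo_client = { "demo", demo_add, demo_remove };
```

In the kernel this would use list_head and proper locking. The fixed arrays only keep the sketch short.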

>>
>> Abstraction of ISM Device Details: I propose we abstract the ISM device
>> details by providing SMC with helper functions. These functions could
>> encapsulate ism->ops, making the implementation cleaner and more
>> intuitive. This way, the struct ism_device would mainly serve its
>> implementers, while the upper helper functions offer a streamlined
>> interface for SMC.
>>
>> Structuring and Naming: I recommend embedding the structure of ism_ops
>> directly within ism_dev rather than using a pointer. Additionally,
>> renaming it to ism_device_ops could enhance clarity and consistency.
>>
>>
>>> This RFC is about providing a generic shim layer between all kinds of
>>> ism devices and all kinds of ism users.
>>>
>>> Benefits:
>>> - Cleaner separation of ISM and SMC-D functionality
>>> - simpler and less module dependencies
>>> - Clear interface definition.
>>> - Extendable for future devices and clients.
>>
>> Fully agree.
>>
>>>
[...]
>>>
>>> Ideas for next steps:
>>> ---------------------
>>> - sysfs representation? e.g. as /sys/class/ism ?
>>> - provide a full-fledged ism loopback interface
>>>    (runtime enable/disable, sysfs device, ..)
>>
>> I think it's better if we can make this common for all ISM devices.
>> but yeah, that should be the next step.


The s390 ism_vpci devices are already backed by struct pci_dev. 
And I assume that would be represented in sysfs somehow like:
/sys/class/ism/ism_vp0/device -> /sys/devices/<pci bus no>/<pci dev no>
so there is an 
/sys/class/ism/<ism dev name>/device/enable entry already, 
because there is /sys/devices/<pci bus no>/<pci dev no>/enable today.

I remember Wen Gu's first proposal for ism_loopback had a device
in /sys/devices/virtual/ and had an 'active' entry to enable/disable.
Something like that could be linked to /sys/class/ism/ism_lo/device.


> 
> I already have patches based on this series that introduce
> /sys/class/ism and show ism-loopback as well as
> s390/ism devices. I can send this soon.
> 
> 
> Julian


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-16 16:17     ` Alexandra Winter
@ 2025-01-16 17:08       ` Julian Ruess
  2025-01-17  2:13       ` Dust Li
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 61+ messages in thread
From: Julian Ruess @ 2025-01-16 17:08 UTC (permalink / raw)
  To: Alexandra Winter, dust.li, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On Thu Jan 16, 2025 at 5:17 PM CET, Alexandra Winter wrote:
>
>
> On 16.01.25 12:55, Julian Ruess wrote:
> > On Thu Jan 16, 2025 at 10:32 AM CET, Dust Li wrote:
> >> On 2025-01-15 20:55:20, Alexandra Winter wrote:
> >>
> >> Hi Winter,
> >>
> >> I'm fully supportive of the refactor!
>
>
> Thank you very much Dust Li for joining the discussion.
>
>
> >> Interestingly, I developed a similar RFC code about a month ago while
> >> working on enhancing internal communication between guest and host
> >> systems. 
>
>
> But you did not send that out, did you?
> I hope I did not overlook an earlier proposal by you.
>
>
> Here are some of my thoughts on the matter:
> >>
> >> Naming and Structure: I suggest we refer to it as SHD (Shared Memory
> >> Device) instead of ISM (Internal Shared Memory). 
>
>
> So where does the 'H' come from? If you want to call it Shared Memory _D_evice?
>
>
> To my knowledge, a
> >> "Shared Memory Device" better encapsulates the functionality we're
> >> aiming to implement. 
>
>
> Could you explain why that would be better?
> 'Internal Shared Memory' is supposed to be a bit of a counterpart to the
> Remote 'R' in RoCE. Not the greatest name, but it is used already by our ISM
> devices and by ism_loopback. So what is the benefit in changing it?
>
>
> It might be beneficial to place it under
> >> drivers/shd/ and register it as a new class under /sys/class/shd/. That
> >> said, my initial draft also adopted the ISM terminology for simplicity.
> > 
> > I'm not sure if we really want to introduce a new name for
> > the already existing ISM device. For me, having two names
> > for the same thing just adds additional complexity.
> > 
> > I would go for /sys/class/ism
> > 
> >>
> >> Modular Approach: I've made the ism_loopback an independent kernel
> >> module since dynamic enable/disable functionality is not yet supported
> >> in SMC. Using insmod and rmmod for module management could provide the
> >> flexibility needed in practical scenarios.
>
>
> With this proposal ism_loopback is just another ism device and SMC-D will
> handle removal just like ism_client.remove(ism_dev) of other ism devices.
>
> But I understand that net/smc/ism_loopback.c today does not provide enable/disable,
> which is a big disadvantage, I agree. The ism layer is prepared for dynamic
> removal by ism_dev_unregister(). In case of this RFC that would only happen
> in case of rmmod ism. Which should be improved.
> One way to do that would be a separate ism_loopback kernel module, like you say.
> Today ism_loopback is only 10k LOC, so I'd be fine with leaving it in the ism module.
> I also think it is a great way for testing any ISM client, so it has benefit for
> anybody using the ism module.
> Another way would be e.g. an 'enable' entry in the sysfs of the loopback device.
> (Once we agree if and how to represent ism devices in general in sysfs).
>
> >>
> >> Abstraction of ISM Device Details: I propose we abstract the ISM device
> >> details by providing SMC with helper functions. These functions could
> >> encapsulate ism->ops, making the implementation cleaner and more
> >> intuitive. This way, the struct ism_device would mainly serve its
> >> implementers, while the upper helper functions offer a streamlined
> >> interface for SMC.
> >>
> >> Structuring and Naming: I recommend embedding the structure of ism_ops
> >> directly within ism_dev rather than using a pointer. Additionally,
> >> renaming it to ism_device_ops could enhance clarity and consistency.
> >>
> >>
> >>> This RFC is about providing a generic shim layer between all kinds of
> >>> ism devices and all kinds of ism users.
> >>>
> >>> Benefits:
> >>> - Cleaner separation of ISM and SMC-D functionality
> >>> - simpler and less module dependencies
> >>> - Clear interface definition.
> >>> - Extendable for future devices and clients.
> >>
> >> Fully agree.
> >>
> >>>
> [...]
> >>>
> >>> Ideas for next steps:
> >>> ---------------------
> >>> - sysfs representation? e.g. as /sys/class/ism ?
> >>> - provide a full-fledged ism loopback interface
> >>>    (runtime enable/disable, sysfs device, ..)
> >>
> >> I think it's better if we can make this common for all ISM devices.
> >> but yeah, that should be the next step.
>
>
> The s390 ism_vpci devices are already backed by struct pci_dev. 
> And I assume that would be represented in sysfs somehow like:
> /sys/class/ism/ism_vp0/device -> /sys/devices/<pci bus no>/<pci dev no>
> so there is an 
> /sys/class/ism/<ism dev name>/device/enable entry already, 
> because there is /sys/devices/<pci bus no>/<pci dev no>/enable today.
>
> I remember Wen Gu's first proposal for ism_loopback had a device
> in /sys/devices/virtual/ and had an 'active' entry to enable/disable.
> Something like that could be linked to /sys/class/ism/ism_lo/device.

My current implementation represents the devices as following
in '/sys/class/ism':

ism_lo -> ../../devices/virtual/ism/ism_lo
ismvpci0 -> ../../devices/pci0124:00/0124:00:00.0/ism/ismvpci0

The driver is responsible for the naming of its devices.

And yes, because the s390 ism_vpci is backed by a PCI device,
'/sys/class/ism/ismvpci0/device/enable' exists.

I think we could implement a device attribute for ism_lo
to implement this functionality. I already have a 
device attribute implemented in ism_main
to access the gid of each ISM device. This leads
to the following sysfs entries:

'/sys/class/ism/ism_lo/gid'
'/sys/class/ism/ismvpci0/gid'
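A sysfs show() callback for such an attribute typically formats one value into the page-sized buffer it is handed and returns the number of bytes written. A minimal userspace model of that, assuming the gid is a 64-bit value printed in hex (the real attribute's format is not stated in this thread; demo_gid_show() and the output format are hypothetical):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>

#define DEMO_PAGE_SIZE 4096 /* sysfs hands show() a single page */

struct demo_dev {
	const char *name; /* chosen by the driver, e.g. "ism_lo" */
	uint64_t gid;
};

/* Hypothetical show() for a read-only "gid" attribute:
 * one value per file, newline-terminated. */
static ssize_t demo_gid_show(const struct demo_dev *d, char *buf)
{
	int n;

	assert(d && buf);
	n = snprintf(buf, DEMO_PAGE_SIZE, "0x%016" PRIx64 "\n", d->gid);
	return n < 0 ? -1 : n;
}
```

Because the attribute is registered by the common layer, ism_lo and ismvpci0 would expose the same file regardless of which driver named the device.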

Julian

>
>
> > 
> > I already have patches based on this series that introduce
> > /sys/class/ism and show ism-loopback as well as
> > s390/ism devices. I can send this soon.
> > 
> > 
> > Julian



* Re: [RFC net-next 1/7] net/ism: Create net/ism
  2025-01-15 19:55 ` [RFC net-next 1/7] net/ism: Create net/ism Alexandra Winter
@ 2025-01-16 20:08   ` Andrew Lunn
  2025-01-17 12:06     ` Alexandra Winter
  0 siblings, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2025-01-16 20:08 UTC (permalink / raw)
  To: Alexandra Winter
  Cc: Wenjia Zhang, Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe,
	Tony Lu, Wen Gu, Peter Oberparleiter, David Miller,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

> +ISM (INTERNAL SHARED MEMORY)
> +M:	Alexandra Winter <wintera@linux.ibm.com>
> +L:	netdev@vger.kernel.org
> +S:	Supported
> +F:	include/linux/ism.h
> +F:	net/ism/

Is there any high level documentation about this?

A while back, TI was trying to upstream something for one of their
SoCs. It was a multi CPU system, with not all CPUs used for SMP, but
one or two kept for management and real time tasks, not even running
Linux. They had a block of shared memory used for communication
between the CPUs/OSes, along with rproc. They layered an ethernet
driver on top of this, with buffers for frames in the shared memory.

Could ISM be used for something like this?

	Andrew


* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-16 16:17     ` Alexandra Winter
  2025-01-16 17:08       ` Julian Ruess
@ 2025-01-17  2:13       ` Dust Li
  2025-01-17 10:38         ` Niklas Schnelle
  2025-01-17 13:00         ` [RFC net-next 0/7] Provide an ism layer Alexandra Winter
  2025-01-17 15:06       ` Andrew Lunn
  2025-02-16 15:38       ` Wen Gu
  3 siblings, 2 replies; 61+ messages in thread
From: Dust Li @ 2025-01-17  2:13 UTC (permalink / raw)
  To: Alexandra Winter, Julian Ruess, Wenjia Zhang, Jan Karcher,
	Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On 2025-01-16 17:17:33, Alexandra Winter wrote:
>
>
>On 16.01.25 12:55, Julian Ruess wrote:
>> On Thu Jan 16, 2025 at 10:32 AM CET, Dust Li wrote:
>>> On 2025-01-15 20:55:20, Alexandra Winter wrote:
>>>
>>> Hi Winter,
>>>
>>> I'm fully supportive of the refactor!
>
>
>Thank you very much Dust Li for joining the discussion.
>
>
>>> Interestingly, I developed a similar RFC code about a month ago while
>>> working on enhancing internal communication between guest and host
>>> systems. 
>
>
>But you did not send that out, did you?
>I hope I did not overlook an earlier proposal by you.

No, I just did a POC and didn't find the time to improve it.
So I think we can go on with your version.

>
>
>Here are some of my thoughts on the matter:
>>>
>>> Naming and Structure: I suggest we refer to it as SHD (Shared Memory
>>> Device) instead of ISM (Internal Shared Memory). 
>
>
>So where does the 'H' come from? If you want to call it Shared Memory _D_evice?

Oh, I was trying to refer to SHM (shared memory files in userspace, see man
shm_open(3)). SMD is also OK.

>
>
>To my knowledge, a
>>> "Shared Memory Device" better encapsulates the functionality we're
>>> aiming to implement. 
>
>
>Could you explain why that would be better?
>'Internal Shared Memory' is supposed to be a bit of a counterpart to the
>Remote 'R' in RoCE. Not the greatest name, but it is used already by our ISM
>devices and by ism_loopback. So what is the benefit in changing it?

I believe that if we are going to separate and refine the code, and add
a common subsystem, we should choose the most appropriate name.

In my opinion, "ISM" doesn’t quite capture what the device provides.
Since we’re adding a "Device" that enables different entities (such as
processes or VMs) to perform shared memory communication, I think a more
fitting name would be better. If you have any alternative suggestions,
I’m open to them.


>
>
>It might be beneficial to place it under
>>> drivers/shd/ and register it as a new class under /sys/class/shd/. That
>>> said, my initial draft also adopted the ISM terminology for simplicity.
>> 
>> I'm not sure if we really want to introduce a new name for
>> the already existing ISM device. For me, having two names
>> for the same thing just adds additional complexity.

I believe that if we are going to rename it, there should be no
reference to "ISM" in this subsystem. IBM's PCI ISM can retain that
name, as it is an implementation of the Shared Memory device (assuming
we adopt that name).

>> 
>> I would go for /sys/class/ism
>> 
>>>
>>> Modular Approach: I've made the ism_loopback an independent kernel
>>> module since dynamic enable/disable functionality is not yet supported
>>> in SMC. Using insmod and rmmod for module management could provide the
>>> flexibility needed in practical scenarios.
>
>
>With this proposal ism_loopback is just another ism device and SMC-D will
>handle removal just like ism_client.remove(ism_dev) of other ism devices.
>
>But I understand that net/smc/ism_loopback.c today does not provide enable/disable,
>which is a big disadvantage, I agree. The ism layer is prepared for dynamic
>removal by ism_dev_unregister(). In case of this RFC that would only happen
>in case of rmmod ism. Which should be improved.
>One way to do that would be a separate ism_loopback kernel module, like you say.
>Today ism_loopback is only 10k LOC, so I'd be fine with leaving it in the ism module.
>I also think it is a great way for testing any ISM client, so it has benefit for
>anybody using the ism module.
>Another way would be e.g. an 'enable' entry in the sysfs of the loopback device.
>(Once we agree if and how to represent ism devices in general in sysfs).

This works for me as well. I think it would be better to implement this
within the common ISM layer, rather than duplicating the code in each
device. Similar to how it's done in netdevice.
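One way to read this suggestion: the parsing and state tracking for an 'enable' attribute live once in the common layer, and each driver only supplies enable/disable ops. A hypothetical userspace sketch (demo_enable_store(), demo_dev_ops and the toy loopback ops are illustrative names, not existing kernel APIs); the input handling loosely mirrors what kstrtobool() accepts for "0"/"1", but is deliberately narrower:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

struct demo_dev;

struct demo_dev_ops {
	int (*enable)(struct demo_dev *d);
	void (*disable)(struct demo_dev *d);
};

struct demo_dev {
	const struct demo_dev_ops *ops;
	bool enabled;
};

/* Accept "1" or "0", optionally followed by '\n' (as from `echo 1 > enable`). */
static int demo_parse_bool(const char *buf, size_t count, bool *val)
{
	if (count == 0 || count > 2 || (count == 2 && buf[1] != '\n'))
		return -EINVAL;
	if (buf[0] == '1') { *val = true; return 0; }
	if (buf[0] == '0') { *val = false; return 0; }
	return -EINVAL;
}

/* Written once in the common layer; drivers only provide ops. */
static ssize_t demo_enable_store(struct demo_dev *d, const char *buf, size_t count)
{
	bool want;
	int rc;

	assert(d && d->ops);
	rc = demo_parse_bool(buf, count, &want);
	if (rc)
		return rc;
	if (want != d->enabled) {
		if (want) {
			rc = d->ops->enable(d);
			if (rc)
				return rc;
		} else {
			/* A real layer would first call each client's remove(). */
			d->ops->disable(d);
		}
		d->enabled = want;
	}
	return (ssize_t)count;
}

/* Toy loopback driver: counts how often its ops are invoked. */
static int lo_calls;
static int lo_enable(struct demo_dev *d) { (void)d; lo_calls++; return 0; }
static void lo_disable(struct demo_dev *d) { (void)d; lo_calls++; }
static const struct demo_dev_ops lo_ops = { .enable = lo_enable, .disable = lo_disable };
```

Writing the current state again is a no-op, so a driver's ops are only called on real transitions.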

Best regards,
Dust



* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17  2:13       ` Dust Li
@ 2025-01-17 10:38         ` Niklas Schnelle
  2025-01-17 15:02           ` Andrew Lunn
  2025-01-18 15:31           ` Dust Li
  2025-01-17 13:00         ` [RFC net-next 0/7] Provide an ism layer Alexandra Winter
  1 sibling, 2 replies; 61+ messages in thread
From: Niklas Schnelle @ 2025-01-17 10:38 UTC (permalink / raw)
  To: dust.li, Alexandra Winter, Julian Ruess, Wenjia Zhang,
	Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Thorsten Winkler, netdev, linux-s390, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Simon Horman

On Fri, 2025-01-17 at 10:13 +0800, Dust Li wrote:
> > 
---8<---
> > Here are some of my thoughts on the matter:
> > > > 
> > > > Naming and Structure: I suggest we refer to it as SHD (Shared Memory
> > > > Device) instead of ISM (Internal Shared Memory). 
> > 
> > 
> > So where does the 'H' come from? If you want to call it Shared Memory _D_evice?
> 
> Oh, I was trying to refer to SHM (shared memory files in userspace, see man
> shm_open(3)). SMD is also OK.
> 
> > 
> > 
> > To my knowledge, a
> > > > "Shared Memory Device" better encapsulates the functionality we're
> > > > aiming to implement. 
> > 
> > 
> > Could you explain why that would be better?
> > 'Internal Shared Memory' is supposed to be a bit of a counterpart to the
> > Remote 'R' in RoCE. Not the greatest name, but it is used already by our ISM
> > devices and by ism_loopback. So what is the benefit in changing it?
> 
> I believe that if we are going to separate and refine the code, and add
> a common subsystem, we should choose the most appropriate name.
> 
> In my opinion, "ISM" doesn’t quite capture what the device provides.
> Since we’re adding a "Device" that enables different entities (such as
> processes or VMs) to perform shared memory communication, I think a more
> fitting name would be better. If you have any alternative suggestions,
> I’m open to them.

I kept thinking about this a bit and I'd like to propose yet another
name for this group of devices: Memory Communication Devices (MCD)

One important point I see is that there is a bit of a misnomer in the
existing ISM name in that our ISM device does in fact *not* share
memory in the common sense of the "shared memory" wording. Instead it
copies data between partitions of memory that share a common
cache/memory hierarchy while not sharing the memory itself. loopback-
ism and a possibly future virtio-ism on the other hand would share
memory in the "shared memory" sense. Though I'd very much hope they
will retain a copy mode to allow use in partition scenarios.

With that background I think the common denominator between them and
the main idea behind ISM is that they facilitate communication via
memory buffers and very simple and reliable copy/share operations. I
think this would also capture our planned use-case of devices (TTYs,
block devices, framebuffers + HID etc) provided by a peer on top of
such a memory communication device.
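To make the copy vs. share distinction concrete, here is a toy userspace model. It is only an illustration under stated assumptions: demo_move_data() stands for a copy-mode device that writes into the receiver's buffer (a DMB in SMC-D terms) without the sender ever mapping it, and demo_attach_dmb() stands for a share-mode device such as loopback-ism, where the sender obtains an alias of the same memory. Neither function is a real ISM interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_DMB_LEN 64

/* Copy mode: the device copies from the sender's memory into the
 * receiver's buffer; no memory is mapped by both sides. */
static int demo_move_data(char *rx_dmb, size_t off, const void *src, size_t len)
{
	assert(rx_dmb && src);
	if (off > DEMO_DMB_LEN || len > DEMO_DMB_LEN - off)
		return -1; /* would overrun the receiver's buffer */
	memcpy(rx_dmb + off, src, len);
	return 0;
}

/* Share mode: the sender gets an alias of the receiver's buffer and
 * writes into it directly; nothing is copied. */
static char *demo_attach_dmb(char *rx_dmb)
{
	return rx_dmb;
}
```

In copy mode every transfer is an explicit device operation; in share mode a plain store through the alias is already visible to the receiver, which is why keeping a copy mode matters for partitioned setups.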

Thanks,
Niklas




* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-16  9:32 ` [RFC net-next 0/7] Provide an ism layer Dust Li
  2025-01-16 11:55   ` Julian Ruess
@ 2025-01-17 11:04   ` Alexandra Winter
  2025-01-18 15:24     ` Dust Li
  1 sibling, 1 reply; 61+ messages in thread
From: Alexandra Winter @ 2025-01-17 11:04 UTC (permalink / raw)
  To: dust.li, Wenjia Zhang, Jan Karcher, Gerd Bayer, Halil Pasic,
	D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter, David Miller,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

I hit the send button too early, sorry about that.
Let me comment on the other proposals from Dust Li as well.

On 16.01.25 10:32, Dust Li wrote:
> Abstraction of ISM Device Details: I propose we abstract the ISM device
> details by providing SMC with helper functions. These functions could
> encapsulate ism->ops, making the implementation cleaner and more
> intuitive. 


Maybe I misunderstand what you mean by helper functions.
Why would you encapsulate ism->ops functions in another set of wrappers?
I was happy to remove the helper functions in 2/7 and 7/7.


> This way, the struct ism_device would mainly serve its
> implementers, while the upper helper functions offer a streamlined
> interface for SMC.


I was actually also wondering, whether the clients should access ism_device
at all. Or whether they should only use the ism_ops.
I can give that a try in the next version. I think this RFC is almost there already.
The clients would still need to pass a pointer to ism_dev as a parameter.


> Structuring and Naming: I recommend embedding the structure of ism_ops
> directly within ism_dev rather than using a pointer. 


I think it is a common pattern to have the const struct xy_ops in the device driver code
and then use a pointer to it when registering the device with an upper layer.
What would be the benefit of duplicating that struct in every ism_dev?
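A minimal sketch of the pattern described above, the same one net_device uses with its const netdev_ops pointer: the driver defines one read-only ops table, every device points at it, and a client calls through dev->ops while passing the device pointer. The demo_* types, the lo_* functions and the chosen subset of ops are assumptions for illustration, not the RFC's include/linux/ism.h:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_dev;

/* Hypothetical subset of a device ops table. */
struct demo_dev_ops {
	int (*register_dmb)(struct demo_dev *d, size_t len);
	int (*move_data)(struct demo_dev *d, uint64_t dmb_tok, const void *src, size_t len);
	uint64_t (*get_gid)(struct demo_dev *d);
};

struct demo_dev {
	const struct demo_dev_ops *ops; /* one shared, read-only table per driver */
	uint64_t gid;
};

/* The driver defines its table once; as const data it can live in rodata. */
static int lo_register_dmb(struct demo_dev *d, size_t len) { (void)d; return len ? 0 : -1; }
static int lo_move_data(struct demo_dev *d, uint64_t tok, const void *src, size_t len)
{
	(void)d; (void)tok; (void)src; (void)len;
	return 0;
}
static uint64_t lo_get_gid(struct demo_dev *d) { return d->gid; }

static const struct demo_dev_ops lo_ops = {
	.register_dmb = lo_register_dmb,
	.move_data = lo_move_data,
	.get_gid = lo_get_gid,
};

/* A client only needs the device pointer and calls through dev->ops. */
static uint64_t client_local_gid(struct demo_dev *d)
{
	assert(d && d->ops && d->ops->get_gid);
	return d->ops->get_gid(d);
}
```

Embedding the table instead would give each device its own writable copy of the same function pointers; the pointer keeps a single copy that cannot be modified at runtime.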


> Additionally,
> renaming it to ism_device_ops could enhance clarity and consistency.


Yes, that would help to distinguish them from the ism_client functions. 
I'll rename them to ism_dev_ops in the next version.



* Re: [RFC net-next 1/7] net/ism: Create net/ism
  2025-01-16 20:08   ` Andrew Lunn
@ 2025-01-17 12:06     ` Alexandra Winter
  0 siblings, 0 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-17 12:06 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Wenjia Zhang, Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe,
	Tony Lu, Wen Gu, Peter Oberparleiter, David Miller,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 16.01.25 21:08, Andrew Lunn wrote:
>> +ISM (INTERNAL SHARED MEMORY)
>> +M:	Alexandra Winter <wintera@linux.ibm.com>
>> +L:	netdev@vger.kernel.org
>> +S:	Supported
>> +F:	include/linux/ism.h
>> +F:	net/ism/
> 
> Is there any high level documentation about this?


As the ISM devices were developed for SMC-D, the only documentation
is about their usage for SMC-D.
e.g.:
https://www.ibm.com/support/pages/system/files/inline-files/IBM%20Shared%20Memory%20Communications%20Version%202.1%20Emulated-ISM_0.pdf
(page 33)
https://community.ibm.com/community/user/ibmz-and-linuxone/viewdocument/2021-07-15-boosting-tcp-networking?CommunityKey=c1293167-6d93-448e-8854-3068846d3dfe&tab=librarydocuments
But those do not go into much detail.

We now want to provide interfaces for other usecases in Linux.

ism.h would be the place to explicitly state the assumptions, restrictions and
requirements in a single place, so future devices and clients know about them.


> 
> A while back, TI was trying to upstream something for one of their
> SoCs. It was a multi CPU system, with not all CPUs used for SMP, but
> one or two kept for management and real time tasks, not even running
> Linux. They had a block of shared memory used for communication
> between the CPUs/OSes, along with rproc. They layered an ethernet
> driver on top of this, with buffers for frames in the shared memory.
> 
> Could ISM be used for something like this?
> 
> 	Andrew


If the communication endpoints were represented as devices, that sounds like a similar concept.

I think you could implement a client that provides network devices on top of ism devices.
(mapping MACs to GIDs)
As the memory buffers are set up for 1 sender and 1 receiver, it would either add latency,
if you set up buffers for each message, or add memory consumption, if you try to keep and
re-use the buffers.
I'm not sure what benefit ISM would provide as an Ethernet device. A shared network card would probably
outperform such a use case.
SMC exploits ISM for TCP traffic. There the buffers are kept per socket connection, and a lot of the
TCP/IP mechanisms are not necessary, because transport is reliable, synchronous and in-order.
Thus latency is minimal.
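A client that exposes a network device on top of ism devices would need, at minimum, a way to turn an outgoing frame's destination MAC into a peer GID. A hypothetical sketch of such a table (fixed size, linear search, demo_* names invented here; nothing like this exists in the RFC):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_PEERS 8

struct demo_peer_map {
	uint8_t mac[6];
	uint64_t gid;
	bool used;
};

static struct demo_peer_map peers[DEMO_MAX_PEERS];

static int demo_map_add(const uint8_t mac[6], uint64_t gid)
{
	for (int i = 0; i < DEMO_MAX_PEERS; i++) {
		if (peers[i].used)
			continue;
		memcpy(peers[i].mac, mac, 6);
		peers[i].gid = gid;
		peers[i].used = true;
		return 0;
	}
	return -1; /* table full */
}

/* Resolve the destination MAC of an outgoing frame to the peer's GID. */
static bool demo_map_lookup(const uint8_t mac[6], uint64_t *gid)
{
	assert(mac && gid);
	for (int i = 0; i < DEMO_MAX_PEERS; i++) {
		if (peers[i].used && memcmp(peers[i].mac, mac, 6) == 0) {
			*gid = peers[i].gid;
			return true;
		}
	}
	return false;
}
```

Whether a buffer is then registered per message or kept per peer is exactly the latency vs. memory trade-off mentioned above.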

 



* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17  2:13       ` Dust Li
  2025-01-17 10:38         ` Niklas Schnelle
@ 2025-01-17 13:00         ` Alexandra Winter
  2025-01-17 15:10           ` Andrew Lunn
  2025-01-20 10:28           ` Alexandra Winter
  1 sibling, 2 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-17 13:00 UTC (permalink / raw)
  To: dust.li, Julian Ruess, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 17.01.25 03:13, Dust Li wrote:
>>>> Modular Approach: I've made the ism_loopback an independent kernel
>>>> module since dynamic enable/disable functionality is not yet supported
>>>> in SMC. Using insmod and rmmod for module management could provide the
>>>> flexibility needed in practical scenarios.
>>
>> With this proposal ism_loopback is just another ism device and SMC-D will
>> handle removal just like ism_client.remove(ism_dev) of other ism devices.
>>
>> But I understand that net/smc/ism_loopback.c today does not provide enable/disable,
>> which is a big disadvantage, I agree. The ism layer is prepared for dynamic
>> removal by ism_dev_unregister(). In case of this RFC that would only happen
>> in case of rmmod ism. Which should be improved.
>> One way to do that would be a separate ism_loopback kernel module, like you say.
>> Today ism_loopback is only 10k LOC, so I'd be fine with leaving it in the ism module.
>> I also think it is a great way for testing any ISM client, so it has benefit for
>> anybody using the ism module.
>> Another way would be e.g. an 'enable' entry in the sysfs of the loopback device.
>> (Once we agree if and how to represent ism devices in general in sysfs).
> This works for me as well. I think it would be better to implement this
> within the common ISM layer, rather than duplicating the code in each
> device. Similar to how it's done in netdevice.
> 
> Best regards,
> Dust


Is there a specific example for enable/disable in the netdevice code, you have in mind?
Or do you mean in general how netdevice provides a common layer?
Yes, everything that is common for all devices should be provided by the network layer.


* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17 10:38         ` Niklas Schnelle
@ 2025-01-17 15:02           ` Andrew Lunn
  2025-01-17 16:00             ` Niklas Schnelle
  2025-01-18 15:31           ` Dust Li
  1 sibling, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2025-01-17 15:02 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: dust.li, Alexandra Winter, Julian Ruess, Wenjia Zhang,
	Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

> One important point I see is that there is a bit of a misnomer in the
> existing ISM name in that our ISM device does in fact *not* share
> memory in the common sense of the "shared memory" wording.

Maybe this is the trap I fell into. So are you saying it is not a dual
port memory mapped into two CPUs physical address space? In another
email there was reference to shm. That would be a VMM equivalent, a
bunch of pages mapped into two processes address space.

This comes back to the lack of top level architecture documentation.
Outside reviewers such as I will have difficulty making useful
contributions, and seeing potential overlap and reuse with other
systems, without having a basic understanding of what you are talking
about.

	Andrew


* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-16 16:17     ` Alexandra Winter
  2025-01-16 17:08       ` Julian Ruess
  2025-01-17  2:13       ` Dust Li
@ 2025-01-17 15:06       ` Andrew Lunn
  2025-01-17 15:38         ` Alexandra Winter
  2025-02-16 15:38       ` Wen Gu
  3 siblings, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2025-01-17 15:06 UTC (permalink / raw)
  To: Alexandra Winter
  Cc: Julian Ruess, dust.li, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

> With this proposal ism_loopback is just another ism device and SMC-D will
> handle removal just like ism_client.remove(ism_dev) of other ism devices.

In Linux terminology, a device is something which has a struct device,
and a device lives on some sort of bus, even if it is a virtual
bus. Will ISM devices properly fit into the Linux device driver model?

	Andrew


* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17 13:00         ` [RFC net-next 0/7] Provide an ism layer Alexandra Winter
@ 2025-01-17 15:10           ` Andrew Lunn
  2025-01-17 16:20             ` Alexandra Winter
  2025-01-20 10:28           ` Alexandra Winter
  1 sibling, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2025-01-17 15:10 UTC (permalink / raw)
  To: Alexandra Winter
  Cc: dust.li, Julian Ruess, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On Fri, Jan 17, 2025 at 02:00:55PM +0100, Alexandra Winter wrote:
> 
> 
> On 17.01.25 03:13, Dust Li wrote:
> >>>> Modular Approach: I've made the ism_loopback an independent kernel
> >>>> module since dynamic enable/disable functionality is not yet supported
> >>>> in SMC. Using insmod and rmmod for module management could provide the
> >>>> flexibility needed in practical scenarios.
> >>
> >> With this proposal ism_loopback is just another ism device and SMC-D will
> >> handle removal just like ism_client.remove(ism_dev) of other ism devices.
> >>
> >> But I understand that net/smc/ism_loopback.c today does not provide enable/disable,
> >> which is a big disadvantage, I agree. The ism layer is prepared for dynamic
> >> removal by ism_dev_unregister(). In case of this RFC that would only happen
> >> in case of rmmod ism. Which should be improved.
> >> One way to do that would be a separate ism_loopback kernel module, like you say.
> >> Today ism_loopback is only 10k LOC, so I'd be fine with leaving it in the ism module.
> >> I also think it is a great way for testing any ISM client, so it has benefit for
> >> anybody using the ism module.
> >> Another way would be e.g. an 'enable' entry in the sysfs of the loopback device.
> >> (Once we agree if and how to represent ism devices in general in sysfs).
> > This works for me as well. I think it would be better to implement this
> > within the common ISM layer, rather than duplicating the code in each
> > device. Similar to how it's done in netdevice.
> > 
> > Best regards,
> > Dust
> 
> 
> Is there a specific example for enable/disable in the netdevice code, you have in mind?
> Or do you mean in general how netdevice provides a common layer?
> Yes, everything that is common for all devices should be provided by the network layer.

Again, lack of basic understanding.... but why is it not a network
device? Network devices are not just Ethernet. We also have CAN, SLIP,
FDDI, etc.

	Andrew


* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17 15:06       ` Andrew Lunn
@ 2025-01-17 15:38         ` Alexandra Winter
  0 siblings, 0 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-17 15:38 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Julian Ruess, dust.li, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 17.01.25 16:06, Andrew Lunn wrote:
>> With this proposal ism_loopback is just another ism device and SMC-D will
>> handle removal just like ism_client.remove(ism_dev) of other ism devices.
> 
> In Linux terminology, a device is something which has a struct device,
> and a device lives on some sort of bus, even if it is a virtual
> bus. Will ISM devices properly fit into the Linux device driver model?
> 
> 	Andrew
> 

ism_vpci lives on a pci bus (zpci flavor) today. The fact that it is not
backed by a real hardware PCI slot, but emulated by s390 firmware is not
visible to Linux.

In the first proposal, ism_lo lived on a virtual bus, afaiu. I liked
that. In the stage 1 implementation that is currently upstream,
it is not visible in sysfs :-(


ism_dev is modeled a bit after net_device. So it contains a pointer
to a struct device, but it is not the device itself.

I have to admit that the sysfs details are a bit confusing to me,
so I wanted to discuss them first before adding them to the RFC.
But I tried to put all the prerequisites in place.

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17 15:02           ` Andrew Lunn
@ 2025-01-17 16:00             ` Niklas Schnelle
  2025-01-17 16:33               ` Andrew Lunn
  0 siblings, 1 reply; 61+ messages in thread
From: Niklas Schnelle @ 2025-01-17 16:00 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: dust.li, Alexandra Winter, Julian Ruess, Wenjia Zhang,
	Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On Fri, 2025-01-17 at 16:02 +0100, Andrew Lunn wrote:
> > One important point I see is that there is a bit of a misnomer in the
> > existing ISM name in that our ISM device does in fact *not* share
> > memory in the common sense of the "shared memory" wording.
> 
> Maybe this is the trap i fell into. So are you saying it is not a dual
> port memory mapped into two CPUs physical address space?
> 

Conceptually, kind of, but the existing s390-specific ISM device is a bit
special. But let me start with some background. On s390 (aka mainframes),
OSs, including Linux, run in so-called logical partitions (LPARs), which
are machine-hypervisor VMs that use partitioned, non-paging memory. The
fact that memory is partitioned is important because it means LPARs
cannot share physical memory by mapping it.

Now, at a high level, an ISM device allows communication between two such
Linux LPARs on the same machine. The device is discovered as a PCI
device and allows Linux to take a buffer called a DMB, map it in the
IOMMU, and generate a token specific to another LPAR that also sees an
ISM device sharing the same virtual channel identifier (VCHID). This
token can then be transferred out of band (e.g. as part of an extended
TCP handshake in SMC-D) to that other system. With the token, the other
system can use its ISM device to securely (authenticated by the token,
LPAR identity and the IOMMU mapping) write into the original system's
DMB at throughput and latency similar to doing a memcpy() via a
syscall.

On the implementation level, the ISM device is actually a piece of
firmware, and the write to a remote DMB is a special case of our PCI
Store Block instruction (no real MMIO on s390; instead there are
special instructions). Sadly, there are a few more quirks, but in
principle you can think of it as redirecting writes to a part of the
ISM PCI device's BAR to the DMB in the peer system, if that makes sense.
There's of course also a mechanism to cause an interrupt on the
receiver as the write completes.
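
A minimal user-space model of this flow may help readers without s390
background. It assumes copy semantics (as with IBM's ISM device) and uses
hypothetical names (toy_register_dmb(), toy_move_data(), integer partition
IDs); it sketches only the token and bounds checks, not the ism_drv
implementation:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of the DMB/token flow; all names are hypothetical and this
 * is not the s390 ism_drv API. Partitions are plain integer IDs. */
#define TOY_MAX_DMBS 8

struct toy_dmb {
	uint64_t token;    /* 0 means the slot is unused */
	int owner;         /* partition that registered the buffer */
	int peer;          /* only this partition may write with the token */
	unsigned char *buf;
	size_t len;
	int irq_pending;   /* stands in for the interrupt on the receiver */
};

static struct toy_dmb toy_dmbs[TOY_MAX_DMBS];
static uint64_t toy_next_token = 0x1000;

/* Receiver side: register a local buffer for one peer and return the token
 * that is then sent out of band (e.g. in the SMC-D handshake). */
static uint64_t toy_register_dmb(int owner, int peer, unsigned char *buf,
				 size_t len)
{
	for (int i = 0; i < TOY_MAX_DMBS; i++) {
		if (toy_dmbs[i].token == 0) {
			toy_dmbs[i] = (struct toy_dmb){ toy_next_token++, owner,
							peer, buf, len, 0 };
			return toy_dmbs[i].token;
		}
	}
	return 0; /* no free slot */
}

/* Sender side: the "device" checks token, sender identity and bounds,
 * then copies into the receiver's buffer; memory is never mapped. */
static int toy_move_data(int sender, uint64_t token, size_t offset,
			 const void *data, size_t len, int signal)
{
	if (token == 0)
		return -3;
	for (int i = 0; i < TOY_MAX_DMBS; i++) {
		struct toy_dmb *d = &toy_dmbs[i];

		if (d->token != token)
			continue;
		if (d->peer != sender)
			return -1; /* token presented by the wrong partition */
		if (offset > d->len || len > d->len - offset)
			return -2; /* write would leave the DMB */
		memcpy(d->buf + offset, data, len);
		if (signal)
			d->irq_pending = 1;
		return 0;
	}
	return -3; /* unknown token */
}
```

The point of the model is that the receiver never exposes a mapping of
its memory; the sender only ever holds a token.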

>  In another
> email there was reference to shm. That would be a VMM equivalent, a
> bunch of pages mapped into two processes address space.

Yes, as on a hypervisor that backs VMs with pages, one can simply map
the DMB in both guests (one mapping potentially being write-only), with
writes being literally a memcpy().

> 
> This comes back to the lack of top level architecture documentation.
> Outside reviewers such as i will have difficultly making useful
> contributions, and seeing potential overlap and reuse with other
> systems, without having a basic understanding of what you are talking
> about.
> 
> 	Andrew

I understand your frustration. I do think we're making progress here,
though. For one, loopback ISM makes SMC-D usable/testable even on bare
metal, and Alibaba is working on a virtio-based ISM device whose
draft is public[0]. There is also some information on SMC's use of ISM
in a whitepaper[1].

[0] https://lore.kernel.org/all/Y1IqX2uVpcD7cvRF@TonyMac-Alibaba/T/
[1] https://www.ibm.com/support/pages/system/files/inline-files/IBM%20Shared%20Memory%20Communications%20Version%202.1%20Emulated-ISM_0.pdf

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17 15:10           ` Andrew Lunn
@ 2025-01-17 16:20             ` Alexandra Winter
  0 siblings, 0 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-17 16:20 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: dust.li, Julian Ruess, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 17.01.25 16:10, Andrew Lunn wrote:
> On Fri, Jan 17, 2025 at 02:00:55PM +0100, Alexandra Winter wrote:
>>
>>
>> On 17.01.25 03:13, Dust Li wrote:
>>>>>> Modular Approach: I've made the ism_loopback an independent kernel
>>>>>> module since dynamic enable/disable functionality is not yet supported
>>>>>> in SMC. Using insmod and rmmod for module management could provide the
>>>>>> flexibility needed in practical scenarios.
>>>>
>>>> With this proposal ism_loopback is just another ism device and SMC-D will
>>>> handle removal just like ism_client.remove(ism_dev) of other ism devices.
>>>>
>>>> But I understand that net/smc/ism_loopback.c today does not provide enable/disable,
>>>> which is a big disadvantage, I agree. The ism layer is prepared for dynamic
>>>> removal by ism_dev_unregister(). In case of this RFC that would only happen
>>>> in case of rmmod ism. Which should be improved.
>>>> One way to do that would be a separate ism_loopback kernel module, like you say.
>>>> Today ism_loopback is only 10k LOC, so I'd be fine with leaving it in the ism module.
>>>> I also think it is a great way for testing any ISM client, so it has benefit for
>>>> anybody using the ism module.
>>>> Another way would be e.g. an 'enable' entry in the sysfs of the loopback device.
>>>> (Once we agree if and how to represent ism devices in general in sysfs).
>>> This works for me as well. I think it would be better to implement this
>>> within the common ISM layer, rather than duplicating the code in each
>>> device. Similar to how it's done in netdevice.
>>>
>>> Best regards,
>>> Dust
>>
>>
>> Is there a specific example of enable/disable in the netdevice code that you have in mind?
>> Or do you mean in general how netdevice provides a common layer?
>> Yes, everything that is common for all devices should be provided by the network layer.
> 
> Again, lack of basic understanding.... but why is it not a network
> device? Network devices are not just Ethernet. We also have CAN, SLIP,
> FDDI, etc.
> 
> 	Andrew
> 

Thank you very much, Andrew, for spending the time discussing this.

At the moment there is no use case for attaching an ism interface via
struct net_device to the Linux network layer.
The current client is SMC-D, and it bypasses the network stack.
The next client is a tty-over-ism console driver.
And Niklas Schnelle envisions TTYs, block devices and framebuffers over ISM.
None of them need queues, headers or the other things the network stack
offers, as long as the source can write directly into the target buffer.
As mentioned earlier, one could probably write a network-over-ism
driver, but nobody is asking for it at the moment.
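
One way to picture several clients (SMC-D, tty-ism, ...) on one ism layer
is a small registry that routes a signalled write to the client owning
the target DMB, and tells every client when a device goes away. The
following is a user-space sketch under that assumption; toy_client,
toy_claim_dmb() and the other toy_* names are hypothetical, and only the
remove/handle_irq callback names echo the ones mentioned in this thread:

```c
#include <stddef.h>

/* Hypothetical sketch: one ism layer, several clients. Single device and
 * global tables for brevity; not the net/ism implementation. */
struct toy_dev {
	const char *name;
};

struct toy_client {
	const char *name;
	void (*remove)(struct toy_dev *dev);
	void (*handle_irq)(struct toy_dev *dev, unsigned int dmb_idx);
};

#define TOY_MAX_CLIENTS 4
#define TOY_MAX_DMBS 16

static struct toy_client *toy_clients[TOY_MAX_CLIENTS];
static struct toy_client *toy_dmb_owner[TOY_MAX_DMBS]; /* owner of DMB i */

static int toy_register_client(struct toy_client *c)
{
	for (int i = 0; i < TOY_MAX_CLIENTS; i++) {
		if (!toy_clients[i]) {
			toy_clients[i] = c;
			return 0;
		}
	}
	return -1;
}

/* A client records that it owns a DMB slot. */
static int toy_claim_dmb(struct toy_client *c, unsigned int idx)
{
	if (idx >= TOY_MAX_DMBS || toy_dmb_owner[idx])
		return -1;
	toy_dmb_owner[idx] = c;
	return 0;
}

/* A signalled write lands in DMB idx: only its owner is notified, so the
 * device (loopback or otherwise) need not know who the clients are. */
static void toy_dev_irq(struct toy_dev *dev, unsigned int idx)
{
	if (idx < TOY_MAX_DMBS && toy_dmb_owner[idx] &&
	    toy_dmb_owner[idx]->handle_irq)
		toy_dmb_owner[idx]->handle_irq(dev, idx);
}

/* The device goes away (e.g. module unload): every client is told. */
static void toy_dev_unregister(struct toy_dev *dev)
{
	for (int i = 0; i < TOY_MAX_CLIENTS; i++)
		if (toy_clients[i] && toy_clients[i]->remove)
			toy_clients[i]->remove(dev);
}
```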

I have read about the Linux device model with devices and buses, but I
have a bit of a hard time mapping that onto an existing machine with
classes, subsystems, slots, subtypes, buses over buses, virtual
things, etc., which often use the same names for different things.
So any advice on alternative placements of ism is welcome.

This is the kind of picture I have in my head:

                 SMC-sockets                  console
ISM clients:          |                          |
             +-----------------+        +-----------------+
             |     SMC-D       |        |     tty-ism     |
             +-----------------+        +-----------------+
                      |                          |
+----------------------------------------------------------------------+
|  ism layer                                                           |
|  ism interfaces:                                                     |
|  ism_vp0        ism_vp1,        ism_lo,         ism_virtio0, ..      |
+----------------------------------------------------------------------+
       |                 |                  |
+--------------+  +--------------+  +----------------+
| 0000:00:00.0 |  | 0000:00:00.1 |  | virtual/ism/lo |
+--------------+  +--------------+  +----------------+
        \                /
    +------------------------+
    |        pci bus         |
    +------------------------+


ls /sys/class/ism
ism_vp0 -> ../../devices/pci0124:00/0124:00:00.0/ism/ism_vp0
ism_vp1 -> ../../devices/pci0125:00/0125:00:00.0/ism/ism_vp1
ism_lo -> ../../devices/virtual/ism/ism_lo


Maybe all that is overkill for ISM?
I think it would be very helpful if they showed up in sysfs, but it is not
absolutely required.

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17 16:00             ` Niklas Schnelle
@ 2025-01-17 16:33               ` Andrew Lunn
  2025-01-17 16:57                 ` Niklas Schnelle
  0 siblings, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2025-01-17 16:33 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: dust.li, Alexandra Winter, Julian Ruess, Wenjia Zhang,
	Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

> Conceptually, kind of, but the existing s390-specific ISM device is a bit
> special. But let me start with some background. On s390 (aka mainframes),
> OSs, including Linux, run in so-called logical partitions (LPARs), which
> are machine-hypervisor VMs that use partitioned, non-paging memory. The
> fact that memory is partitioned is important because it means LPARs
> cannot share physical memory by mapping it.
>
> Now, at a high level, an ISM device allows communication between two such
> Linux LPARs on the same machine. The device is discovered as a PCI
> device and allows Linux to take a buffer called a DMB, map it in the
> IOMMU, and generate a token specific to another LPAR that also sees an
> ISM device sharing the same virtual channel identifier (VCHID). This
> token can then be transferred out of band (e.g. as part of an extended
> TCP handshake in SMC-D) to that other system. With the token, the other
> system can use its ISM device to securely (authenticated by the token,
> LPAR identity and the IOMMU mapping) write into the original system's
> DMB at throughput and latency similar to doing a memcpy() via a
> syscall.
>
> On the implementation level, the ISM device is actually a piece of
> firmware, and the write to a remote DMB is a special case of our PCI
> Store Block instruction (no real MMIO on s390; instead there are
> special instructions). Sadly, there are a few more quirks, but in
> principle you can think of it as redirecting writes to a part of the
> ISM PCI device's BAR to the DMB in the peer system, if that makes sense.
> There's of course also a mechanism to cause an interrupt on the
> receiver as the write completes.

So the s390 details are interesting, but as you say, it is
special. Ideally, all the special should be hidden away inside the
driver.

So please take a step back. What is the abstract model?

Can the abstract model be mapped onto CXL? Could it be used with GPU
vRAM? Or on an SoC with real shared memory between a pool of CPUs?

	Andrew

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17 16:33               ` Andrew Lunn
@ 2025-01-17 16:57                 ` Niklas Schnelle
  2025-01-17 20:29                   ` Andrew Lunn
  0 siblings, 1 reply; 61+ messages in thread
From: Niklas Schnelle @ 2025-01-17 16:57 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: dust.li, Alexandra Winter, Julian Ruess, Wenjia Zhang,
	Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On Fri, 2025-01-17 at 17:33 +0100, Andrew Lunn wrote:
> > Conceptually, kind of, but the existing s390-specific ISM device is a bit
> > special. But let me start with some background. On s390 (aka mainframes),
> > OSs, including Linux, run in so-called logical partitions (LPARs), which
> > are machine-hypervisor VMs that use partitioned, non-paging memory. The
> > fact that memory is partitioned is important because it means LPARs
> > cannot share physical memory by mapping it.
> > 
> > Now, at a high level, an ISM device allows communication between two such
> > Linux LPARs on the same machine. The device is discovered as a PCI
> > device and allows Linux to take a buffer called a DMB, map it in the
> > IOMMU, and generate a token specific to another LPAR that also sees an
> > ISM device sharing the same virtual channel identifier (VCHID). This
> > token can then be transferred out of band (e.g. as part of an extended
> > TCP handshake in SMC-D) to that other system. With the token, the other
> > system can use its ISM device to securely (authenticated by the token,
> > LPAR identity and the IOMMU mapping) write into the original system's
> > DMB at throughput and latency similar to doing a memcpy() via a
> > syscall.
> > 
> > On the implementation level, the ISM device is actually a piece of
> > firmware, and the write to a remote DMB is a special case of our PCI
> > Store Block instruction (no real MMIO on s390; instead there are
> > special instructions). Sadly, there are a few more quirks, but in
> > principle you can think of it as redirecting writes to a part of the
> > ISM PCI device's BAR to the DMB in the peer system, if that makes sense.
> > There's of course also a mechanism to cause an interrupt on the
> > receiver as the write completes.
> 
> So the s390 details are interesting, but as you say, it is
> special. Ideally, all the special should be hidden away inside the
> driver.

Yes, and it will be. There are some exceptions, e.g. for vfio-pci
pass-through, but that's not unusual, and it is why there is already the
concept of a vfio-pci extension module.

> 
> So please take a step back. What is the abstract model?

I think my high-level description may be a good start. The abstract
model is the ability to share a memory buffer (DMB) for writing by a
communication partner, authenticated by a DMB token. Plus things like
triggering an interrupt on write or by explicit trigger. Then Alibaba
added optional support for what they called attaching the buffer, which
means it becomes truly shared between the peers, but which IBM's ISM
can't support. Plus a few more optional pieces such as VLANs and PNETIDs
(don't ask). The idea for the new layer, then, is to define this interface
with operations and documentation.
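
The abstract model above could be written down as an ops table with
mandatory and optional operations. This is a sketch with hypothetical
names and signatures (not taken from include/linux/ism.h); it shows how
a common layer might report an unsupported optional op such as attach,
so that a client can fall back to copying:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table for the abstract model; illustrative only. */
struct toy_dev;

struct toy_dev_ops {
	/* mandatory: expose a local buffer, return a token for the peer */
	int (*register_dmb)(struct toy_dev *dev, size_t len, uint64_t *token);
	/* mandatory: write into the peer's buffer, optionally signalling */
	int (*move_data)(struct toy_dev *dev, uint64_t token, size_t offset,
			 const void *data, size_t len, int signal);
	/* optional: map the peer's buffer directly (NULL if unsupported,
	 * e.g. on a device that can only copy) */
	void *(*attach_dmb)(struct toy_dev *dev, uint64_t token);
};

struct toy_dev {
	const struct toy_dev_ops *ops;
	const char *name;
};

/* The common layer turns a missing optional op into a clean error, so
 * clients can fall back to move_data() instead of crashing. */
static void *toy_attach_dmb(struct toy_dev *dev, uint64_t token, int *err)
{
	if (!dev->ops->attach_dmb) {
		*err = -EOPNOTSUPP;
		return NULL;
	}
	*err = 0;
	return dev->ops->attach_dmb(dev, token);
}
```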

> 
> Can the abstract model be mapped onto CXL? Could it be used with GPU
> vRAM? Or on an SoC with real shared memory between a pool of CPUs?
> 
> 	Andrew

I'd think that, yes, one could implement such a mechanism on top of CXL
as well as on an SoC. Or even with no special hardware between a host and
a DPU (e.g. via the PCIe endpoint framework). Basically anything that can
do DMA and raise IRQs between two OS instances.

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17 16:57                 ` Niklas Schnelle
@ 2025-01-17 20:29                   ` Andrew Lunn
  2025-01-20  6:21                     ` Dust Li
  0 siblings, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2025-01-17 20:29 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: dust.li, Alexandra Winter, Julian Ruess, Wenjia Zhang,
	Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On Fri, Jan 17, 2025 at 05:57:10PM +0100, Niklas Schnelle wrote:
> On Fri, 2025-01-17 at 17:33 +0100, Andrew Lunn wrote:
> > > Conceptually, kind of, but the existing s390-specific ISM device is a bit
> > > special. But let me start with some background. On s390 (aka mainframes),
> > > OSs, including Linux, run in so-called logical partitions (LPARs), which
> > > are machine-hypervisor VMs that use partitioned, non-paging memory. The
> > > fact that memory is partitioned is important because it means LPARs
> > > cannot share physical memory by mapping it.
> > > 
> > > Now, at a high level, an ISM device allows communication between two such
> > > Linux LPARs on the same machine. The device is discovered as a PCI
> > > device and allows Linux to take a buffer called a DMB, map it in the
> > > IOMMU, and generate a token specific to another LPAR that also sees an
> > > ISM device sharing the same virtual channel identifier (VCHID). This
> > > token can then be transferred out of band (e.g. as part of an extended
> > > TCP handshake in SMC-D) to that other system. With the token, the other
> > > system can use its ISM device to securely (authenticated by the token,
> > > LPAR identity and the IOMMU mapping) write into the original system's
> > > DMB at throughput and latency similar to doing a memcpy() via a
> > > syscall.
> > > 
> > > On the implementation level, the ISM device is actually a piece of
> > > firmware, and the write to a remote DMB is a special case of our PCI
> > > Store Block instruction (no real MMIO on s390; instead there are
> > > special instructions). Sadly, there are a few more quirks, but in
> > > principle you can think of it as redirecting writes to a part of the
> > > ISM PCI device's BAR to the DMB in the peer system, if that makes sense.
> > > There's of course also a mechanism to cause an interrupt on the
> > > receiver as the write completes.
> > 
> > So the s390 details are interesting, but as you say, it is
> > special. Ideally, all the special should be hidden away inside the
> > driver.
> 
> Yes, and it will be. There are some exceptions, e.g. for vfio-pci
> pass-through, but that's not unusual, and it is why there is already the
> concept of a vfio-pci extension module.
> 
> > 
> > So please take a step back. What is the abstract model?
> 
> I think my high-level description may be a good start. The abstract
> model is the ability to share a memory buffer (DMB) for writing by a
> communication partner, authenticated by a DMB token. Plus things like
> triggering an interrupt on write or by explicit trigger. Then Alibaba
> added optional support for what they called attaching the buffer, which
> means it becomes truly shared between the peers, but which IBM's ISM
> can't support. Plus a few more optional pieces such as VLANs and PNETIDs
> (don't ask). The idea for the new layer, then, is to define this interface
> with operations and documentation.
> 
> > 
> > Can the abstract model be mapped onto CXL? Could it be used with GPU
> > vRAM? Or on an SoC with real shared memory between a pool of CPUs?
> > 
> > 	Andrew
> 
> I'd think that, yes, one could implement such a mechanism on top of CXL
> as well as on an SoC. Or even with no special hardware between a host and
> a DPU (e.g. via the PCIe endpoint framework). Basically anything that can
> do DMA and raise IRQs between two OS instances.

Is DMA part of the abstract model? That would suggest a true shared
memory system is excluded, since that would not require DMA.

Maybe take a look at subsystems like USB, I2C.

usb_submit_urb(struct urb *urb, gfp_t mem_flags)

A URB is a data structure with a block of memory associated with it;
it contains the details to pass to the USB device.

i2c_transfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)

*msgs points to num of messages which get transferred to/from the I2C
device.

Could the high-level API look like this? No DMA, no IRQ, no concept of
somewhat shared memory. Just an API that asks for a message to be
sent to the other end? struct urb has some USB concepts in it, struct
i2c_msg has some I2C concepts in it. A struct ism_msg would follow the
same pattern, but does it need to care about the DMA, the IRQ, the
memory which is semi-shared?

	Andrew
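
A message-oriented API in the style of i2c_transfer() might look roughly
like the following. struct toy_ism_msg and toy_ism_transfer() are
hypothetical and do not exist in the kernel; the backend is a plain
memcpy() into a receive buffer standing in for whatever a driver does,
and the return convention (number of messages, or a negative errno) is
borrowed from i2c_transfer():

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message-style API; none of these names exist in Linux. */
#define TOY_ISM_MSG_SIGNAL 0x1 /* raise an interrupt on the receiver */

struct toy_ism_msg {
	uint64_t token;     /* identifies the peer's buffer */
	size_t offset;      /* where in that buffer to write */
	const void *buf;    /* payload */
	size_t len;
	unsigned int flags;
};

/* Stand-in for the driver side: the peer's buffer and a signal counter. */
struct toy_ism_peer {
	uint64_t token;
	unsigned char *dmb;
	size_t dmb_len;
	unsigned int signals;
};

/* Returns the number of messages transferred, or a negative errno at the
 * first failing message. */
static int toy_ism_transfer(struct toy_ism_peer *peer,
			    const struct toy_ism_msg *msgs, int num)
{
	for (int i = 0; i < num; i++) {
		const struct toy_ism_msg *m = &msgs[i];

		if (m->token != peer->token)
			return -ENOENT;
		if (m->offset > peer->dmb_len ||
		    m->len > peer->dmb_len - m->offset)
			return -EINVAL;
		memcpy(peer->dmb + m->offset, m->buf, m->len);
		if (m->flags & TOY_ISM_MSG_SIGNAL)
			peer->signals++;
	}
	return num;
}
```

In this shape the caller never sees DMA or IRQs; whether attach-style
sharing can be expressed the same way is left open.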

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17 11:04   ` Alexandra Winter
@ 2025-01-18 15:24     ` Dust Li
  2025-01-20 11:45       ` Alexandra Winter
  0 siblings, 1 reply; 61+ messages in thread
From: Dust Li @ 2025-01-18 15:24 UTC (permalink / raw)
  To: Alexandra Winter, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On 2025-01-17 12:04:06, Alexandra Winter wrote:
>I hit the send button too early, sorry about that.
>Let me comment on the other proposals from Dust Li as well.
>
>On 16.01.25 10:32, Dust Li wrote:
>> Abstraction of ISM Device Details: I propose we abstract the ISM device
>> details by providing SMC with helper functions. These functions could
>> encapsulate ism->ops, making the implementation cleaner and more
>> intuitive. 
>
>
>Maybe I misunderstand what you mean by helper functions...
>Why would you encapsulate ism->ops functions in another set of wrappers?
>I was happy to remove the helper functions in 2/7 and 7/7.

What I mean is similar to how IB handles it in include/rdma/ib_verbs.h.
A good example is ib_post_send or ibv_post_send in user space:

```c
static inline int ib_post_send(struct ib_qp *qp,
                               const struct ib_send_wr *send_wr,
                               const struct ib_send_wr **bad_send_wr)
{
        const struct ib_send_wr *dummy;

        return qp->device->ops.post_send(qp, send_wr, bad_send_wr ? : &dummy);
}
```

By following this approach, we can "hide" all the implementations behind
ism_xxx. Our users (SMC) should only interact with these APIs. The ism->ops
would then be used by our device implementers (vISM, loopback, etc.). This
would help make the layers clearer, which is the same approach IB takes.

The layout would look roughly like this:

| -------------------- |-----------------------------|
|  ism_register_dmb()  |                             |
|  ism_move_data()     | <---  API for our users     |
|  ism_xxx() ...       |                             |
| -------------------- |-----------------------------|
|   ism_device_ops     | <---for our implementers    |
|                      |    (PCI-ISM/loopback, etc)  |
|----------------------|-----------------------------|
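
The split in the table above can be illustrated with a single helper
modelled on ib_post_send(). The struct layouts and the sketch_ism_* names
are hypothetical; the move_data() parameters simply mirror the
smcd->ism->ops->move_data() call quoted later in this mail, with assumed
types:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical split: clients call helpers, only drivers fill in ops. */
struct sketch_ism_dev;

struct sketch_ism_ops {
	int (*move_data)(struct sketch_ism_dev *dev, uint64_t dmb_tok,
			 unsigned int idx, int sf, unsigned int offset,
			 const void *data, unsigned int len);
};

struct sketch_ism_dev {
	const struct sketch_ism_ops *ops; /* private to the ism layer */
	void *priv;                       /* driver data */
};

/* Client-facing helper: SMC-D would call this instead of reaching into
 * dev->ops itself, so the device layout can change without touching
 * clients. */
static inline int sketch_ism_move_data(struct sketch_ism_dev *dev,
				       uint64_t dmb_tok, unsigned int idx,
				       int sf, unsigned int offset,
				       const void *data, unsigned int len)
{
	return dev->ops->move_data(dev, dmb_tok, idx, sf, offset, data, len);
}
```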


>
>
>> This way, the struct ism_device would mainly serve its
>> implementers, while the upper helper functions offer a streamlined
>> interface for SMC.
>
>
>I was actually also wondering, whether the clients should access ism_device
>at all. Or whether they should only use the ism_ops.

I believe the client should only pass an ism_dev pointer to the ism_xxx()
helper functions. They should never directly access any of the fields inside
the ism_dev.


>I can give that a try in the next version. I think this RFC is almost there already.
>The clients would still need to pass a pointer to ism_dev as a parameter.
>
>
>> Structuring and Naming: I recommend embedding the structure of ism_ops
>> directly within ism_dev rather than using a pointer. 
>
>
>I think it is a common method to have the const struct xy_ops in the device driver code
>and then use a pointer to register the device with an upper layer.

Right, if we have many ism_devs of the same ISM type, then using a pointer
saves us some memory.

>What would be the benefit of duplicating that struct in every ism_dev?

The main benefit of embedding ism_device_ops within ism_dev is that it
saves one pointer dereference. We already have too many dereferences
in the datapath, which is not good for performance :(

For example:

rc = smcd->ism->ops->move_data(smcd->ism, dmb_tok, idx, sf, offset,
                               data, len);
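
To make the trade-off concrete, here are the two layouts side by side,
with hypothetical names. With a const ops pointer, reaching the function
pointer needs one extra load (dev->ops, then ->move_data); with the table
embedded, the function pointer sits at a fixed offset inside the device
struct, at the cost of one copy of the table per device. A commonly cited
argument for the pointer form in the kernel is that a const table can
live in read-only memory:

```c
/* Hypothetical layouts; only the access path differs. */
struct dev_a;
struct dev_b;

struct ops_a {
	int (*move_data)(struct dev_a *dev, unsigned int len);
};

struct ops_b {
	int (*move_data)(struct dev_b *dev, unsigned int len);
};

/* Option 1: one const table per driver, referenced from each device.
 * dev->ops->move_data: load dev->ops, then load the function pointer. */
struct dev_a {
	const struct ops_a *ops;
};

/* Option 2: table embedded in every device.
 * dev->ops.move_data: one load at a fixed offset from dev. */
struct dev_b {
	struct ops_b ops;
};

static int call_a(struct dev_a *dev, unsigned int len)
{
	return dev->ops->move_data(dev, len);
}

static int call_b(struct dev_b *dev, unsigned int len)
{
	return dev->ops.move_data(dev, len);
}
```

Whether the saved load is measurable next to the data copy itself is
something only a benchmark of the data path can tell.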

Best regards,
Dust

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17 10:38         ` Niklas Schnelle
  2025-01-17 15:02           ` Andrew Lunn
@ 2025-01-18 15:31           ` Dust Li
  2025-01-28 16:04             ` Alexandra Winter
  1 sibling, 1 reply; 61+ messages in thread
From: Dust Li @ 2025-01-18 15:31 UTC (permalink / raw)
  To: Niklas Schnelle, Alexandra Winter, Julian Ruess, Wenjia Zhang,
	Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Thorsten Winkler, netdev, linux-s390, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Simon Horman

On 2025-01-17 11:38:39, Niklas Schnelle wrote:
>On Fri, 2025-01-17 at 10:13 +0800, Dust Li wrote:
>> > 
>---8<---
>> > Here are some of my thoughts on the matter:
>> > > > 
>> > > > Naming and Structure: I suggest we refer to it as SHD (Shared Memory
>> > > > Device) instead of ISM (Internal Shared Memory). 
>> > 
>> > 
>> > So where does the 'H' come from? If you want to call it Shared Memory _D_evice?
>> 
>> Oh, I was trying to refer to SHM (shared memory files in userspace, see man
>> shm_open(3)). SMD is also OK.
>> 
>> > 
>> > 
>> > To my knowledge, a
>> > > > "Shared Memory Device" better encapsulates the functionality we're
>> > > > aiming to implement. 
>> > 
>> > 
>> > Could you explain why that would be better?
>> > 'Internal Shared Memory' is supposed to be a bit of a counterpart to the
>> > Remote 'R' in RoCE. Not the greatest name, but it is used already by our ISM
>> > devices and by ism_loopback. So what is the benefit in changing it?
>> 
>> I believe that if we are going to separate and refine the code, and add
>> a common subsystem, we should choose the most appropriate name.
>> 
>> In my opinion, "ISM" doesn’t quite capture what the device provides.
>> Since we’re adding a "Device" that enables different entities (such as
>> processes or VMs) to perform shared memory communication, I think a more
>> fitting name would be better. If you have any alternative suggestions,
>> I’m open to them.
>
>I kept thinking about this a bit and I'd like to propose yet another
>name for this group of devices: Memory Communication Devices (MCD)
>
>One important point I see is that there is a bit of a misnomer in the
>existing ISM name in that our ISM device does in fact *not* share
>memory in the common sense of the "shared memory" wording. Instead it
>copies data between partitions of memory that share a common
>cache/memory hierarchy while not sharing the memory itself. loopback-
>ism and a possibly future virtio-ism on the other hand would share
>memory in the "shared memory" sense. Though I'd very much hope they
>will retain a copy mode to allow use in partition scenarios.
>
>With that background I think the common denominator between them and
>the main idea behind ISM is that they facilitate communication via
>memory buffers and very simple and reliable copy/share operations. I
>think this would also capture our planned use-case of devices (TTYs,
>block devices, framebuffers + HID etc) provided by a peer on top of
>such a memory communication device.

Makes sense, I agree with MCD.

Best regards,
Dust

* Re: [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism
  2025-01-15 19:55 ` [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism Alexandra Winter
@ 2025-01-20  3:55   ` Dust Li
  2025-01-20  9:31     ` Alexandra Winter
  2025-02-06 17:36   ` Julian Ruess
  1 sibling, 1 reply; 61+ messages in thread
From: Dust Li @ 2025-01-20  3:55 UTC (permalink / raw)
  To: Alexandra Winter, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On 2025-01-15 20:55:25, Alexandra Winter wrote:
>The first stage of ism_loopback was implemented as part of the
>SMC module [1]. Now that we have the ism layer, provide
>access to the ism_loopback device to all ism clients.
>
>Move ism_loopback.* from net/smc to net/ism.
>The following changes are required to ism_loopback.c:
>- Change ism_lo_move_data() to no longer schedule an smcd receive tasklet,
>but instead call ism_client->handle_irq().
>Note: In this RFC patch, ism_loopback is not fully generic.
>  The smc-d client uses attached buffers for moves without signalling,
>  and not-attached buffers for moves with signalling.
>  ism_lo_move_data() must not rely on that assumption.
>  ism_lo_move_data() must be able to handle more than one ism client.
>
>In addition the following changes are required to unify ism_loopback and
>ism_vp:
>
>In ism layer and ism_vpci:
>ism_loopback is not backed by a pci device, so use dev instead of pdev in
>ism_dev.
>
>In smc-d:
>in smcd_alloc_dev():
>- use kernel memory instead of device memory for smcd_dev and smcd->conn.
>        An alternative would be to ask device to alloc the memory.
>- use different smcd_ops and max_dmbs for ism_vp and ism_loopback.
>    A future patch can change smc-d to directly use ism_ops instead of
>    smcd_ops.
>- use ism dev_name instead of pci dev name for ism_evt_wq name
>- allocate an event workqueue for ism_loopback, although it currently does
>  not generate events.
>
>Link: https://lore.kernel.org/linux-kernel//20240428060738.60843-1-guwen@linux.alibaba.com/ [1]
>
>Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
>---
> drivers/s390/net/ism.h     |   6 +-
> drivers/s390/net/ism_drv.c |  31 ++-
> include/linux/ism.h        |  59 +++++
> include/net/smc.h          |   4 +-
> net/ism/Kconfig            |  13 ++
> net/ism/Makefile           |   1 +
> net/ism/ism_loopback.c     | 366 +++++++++++++++++++++++++++++++
> net/ism/ism_loopback.h     |  59 +++++
> net/ism/ism_main.c         |  11 +-
> net/smc/Kconfig            |  13 --
> net/smc/Makefile           |   1 -
> net/smc/af_smc.c           |  12 +-
> net/smc/smc_ism.c          | 108 +++++++---
> net/smc/smc_loopback.c     | 427 -------------------------------------
> net/smc/smc_loopback.h     |  60 ------
> 15 files changed, 606 insertions(+), 565 deletions(-)
> create mode 100644 net/ism/ism_loopback.c
> create mode 100644 net/ism/ism_loopback.h
> delete mode 100644 net/smc/smc_loopback.c
> delete mode 100644 net/smc/smc_loopback.h
>
>diff --git a/drivers/s390/net/ism.h b/drivers/s390/net/ism.h
>index 61cf10334170..0deca6d0e328 100644
>--- a/drivers/s390/net/ism.h
>+++ b/drivers/s390/net/ism.h
>@@ -202,7 +202,7 @@ struct ism_sba {
> static inline void __ism_read_cmd(struct ism_dev *ism, void *data,
> 				  unsigned long offset, unsigned long len)
> {
>-	struct zpci_dev *zdev = to_zpci(ism->pdev);
>+	struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> 	u64 req = ZPCI_CREATE_REQ(zdev->fh, 2, 8);
> 
> 	while (len > 0) {
>@@ -216,7 +216,7 @@ static inline void __ism_read_cmd(struct ism_dev *ism, void *data,
> static inline void __ism_write_cmd(struct ism_dev *ism, void *data,
> 				   unsigned long offset, unsigned long len)
> {
>-	struct zpci_dev *zdev = to_zpci(ism->pdev);
>+	struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> 	u64 req = ZPCI_CREATE_REQ(zdev->fh, 2, len);
> 
> 	if (len)
>@@ -226,7 +226,7 @@ static inline void __ism_write_cmd(struct ism_dev *ism, void *data,
> static inline int __ism_move(struct ism_dev *ism, u64 dmb_req, void *data,
> 			     unsigned int size)
> {
>-	struct zpci_dev *zdev = to_zpci(ism->pdev);
>+	struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> 	u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, size);
> 
> 	return __zpci_store_block(data, req, dmb_req);
>diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
>index ab70debbdd9d..c0954d6dd9f5 100644
>--- a/drivers/s390/net/ism_drv.c
>+++ b/drivers/s390/net/ism_drv.c
>@@ -88,7 +88,7 @@ static int register_sba(struct ism_dev *ism)
> 	dma_addr_t dma_handle;
> 	struct ism_sba *sba;
> 
>-	sba = dma_alloc_coherent(&ism->pdev->dev, PAGE_SIZE, &dma_handle,
>+	sba = dma_alloc_coherent(ism->dev.parent, PAGE_SIZE, &dma_handle,
> 				 GFP_KERNEL);
> 	if (!sba)
> 		return -ENOMEM;
>@@ -99,7 +99,7 @@ static int register_sba(struct ism_dev *ism)
> 	cmd.request.sba = dma_handle;
> 
> 	if (ism_cmd(ism, &cmd)) {
>-		dma_free_coherent(&ism->pdev->dev, PAGE_SIZE, sba, dma_handle);
>+		dma_free_coherent(ism->dev.parent, PAGE_SIZE, sba, dma_handle);
> 		return -EIO;
> 	}
> 
>@@ -115,7 +115,7 @@ static int register_ieq(struct ism_dev *ism)
> 	dma_addr_t dma_handle;
> 	struct ism_eq *ieq;
> 
>-	ieq = dma_alloc_coherent(&ism->pdev->dev, PAGE_SIZE, &dma_handle,
>+	ieq = dma_alloc_coherent(ism->dev.parent, PAGE_SIZE, &dma_handle,
> 				 GFP_KERNEL);
> 	if (!ieq)
> 		return -ENOMEM;
>@@ -127,7 +127,7 @@ static int register_ieq(struct ism_dev *ism)
> 	cmd.request.len = sizeof(*ieq);
> 
> 	if (ism_cmd(ism, &cmd)) {
>-		dma_free_coherent(&ism->pdev->dev, PAGE_SIZE, ieq, dma_handle);
>+		dma_free_coherent(ism->dev.parent, PAGE_SIZE, ieq, dma_handle);
> 		return -EIO;
> 	}
> 
>@@ -149,7 +149,7 @@ static int unregister_sba(struct ism_dev *ism)
> 	if (ret && ret != ISM_ERROR)
> 		return -EIO;
> 
>-	dma_free_coherent(&ism->pdev->dev, PAGE_SIZE,
>+	dma_free_coherent(ism->dev.parent, PAGE_SIZE,
> 			  ism->sba, ism->sba_dma_addr);
> 
> 	ism->sba = NULL;
>@@ -169,7 +169,7 @@ static int unregister_ieq(struct ism_dev *ism)
> 	if (ret && ret != ISM_ERROR)
> 		return -EIO;
> 
>-	dma_free_coherent(&ism->pdev->dev, PAGE_SIZE,
>+	dma_free_coherent(ism->dev.parent, PAGE_SIZE,
> 			  ism->ieq, ism->ieq_dma_addr);
> 
> 	ism->ieq = NULL;
>@@ -216,7 +216,7 @@ static int ism_query_rgid(struct ism_dev *ism, uuid_t *rgid, u32 vid_valid,
> static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> {
> 	clear_bit(dmb->sba_idx, ism->sba_bitmap);
>-	dma_unmap_page(&ism->pdev->dev, dmb->dma_addr, dmb->dmb_len,
>+	dma_unmap_page(ism->dev.parent, dmb->dma_addr, dmb->dmb_len,
> 		       DMA_FROM_DEVICE);
> 	folio_put(virt_to_folio(dmb->cpu_addr));
> }
>@@ -227,7 +227,7 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> 	unsigned long bit;
> 	int rc;
> 
>-	if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(&ism->pdev->dev))
>+	if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(ism->dev.parent))
> 		return -EINVAL;
> 
> 	if (!dmb->sba_idx) {
>@@ -251,10 +251,10 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> 	}
> 
> 	dmb->cpu_addr = folio_address(folio);
>-	dmb->dma_addr = dma_map_page(&ism->pdev->dev,
>+	dmb->dma_addr = dma_map_page(ism->dev.parent,
> 				     virt_to_page(dmb->cpu_addr), 0,
> 				     dmb->dmb_len, DMA_FROM_DEVICE);
>-	if (dma_mapping_error(&ism->pdev->dev, dmb->dma_addr)) {
>+	if (dma_mapping_error(ism->dev.parent, dmb->dma_addr)) {
> 		rc = -ENOMEM;
> 		goto out_free;
> 	}
>@@ -419,10 +419,7 @@ static int ism_supports_v2(void)
> 
> static u16 ism_get_chid(struct ism_dev *ism)
> {
>-	if (!ism || !ism->pdev)
>-		return 0;
>-
>-	return to_zpci(ism->pdev)->pchid;
>+	return to_zpci(to_pci_dev(ism->dev.parent))->pchid;
> }
> 
> static void ism_handle_event(struct ism_dev *ism)
>@@ -499,7 +496,7 @@ static const struct ism_ops ism_vp_ops = {
> 
> static int ism_dev_init(struct ism_dev *ism)
> {
>-	struct pci_dev *pdev = ism->pdev;
>+	struct pci_dev *pdev = to_pci_dev(ism->dev.parent);
> 	int ret;
> 
> 	ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI);
>@@ -565,7 +562,6 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> 
> 	spin_lock_init(&ism->lock);
> 	dev_set_drvdata(&pdev->dev, ism);
>-	ism->pdev = pdev;
> 	ism->dev.parent = &pdev->dev;
> 	device_initialize(&ism->dev);
> 	dev_set_name(&ism->dev, dev_name(&pdev->dev));
>@@ -603,14 +599,13 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> 	device_del(&ism->dev);
> err_dev:
> 	dev_set_drvdata(&pdev->dev, NULL);
>-	kfree(ism);
> 
> 	return ret;
> }
> 
> static void ism_dev_exit(struct ism_dev *ism)
> {
>-	struct pci_dev *pdev = ism->pdev;
>+	struct pci_dev *pdev = to_pci_dev(ism->dev.parent);
> 	unsigned long flags;
> 	int i;
> 
>diff --git a/include/linux/ism.h b/include/linux/ism.h
>index bc165d077071..929a1f275419 100644
>--- a/include/linux/ism.h
>+++ b/include/linux/ism.h
>@@ -144,6 +144,9 @@ int  ism_unregister_client(struct ism_client *client);
>  *	identified by dmb_tok and idx. If signal flag (sf) then signal
>  *	the remote peer that data has arrived in this dmb.
>  *
>+ * int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>+ *	Unregister an ism_dmb buffer
>+ *
>  * int (*supports_v2)(void);
>  *
>  * u16 (*get_chid)(struct ism_dev *dev);
>@@ -218,12 +221,63 @@ struct ism_ops {
> 	int (*reset_vlan_required)(struct ism_dev *dev);
> 	int (*signal_event)(struct ism_dev *dev, uuid_t *rgid,
> 			    u32 trigger_irq, u32 event_code, u64 info);
>+/* no copy option
>+ * --------------
>+ */
>+	/**
>+	 * support_dmb_nocopy() - does this device provide no-copy option?
>+	 * @dev: ism device
>+	 *
>+	 * In addition to using move_data(), a sender device can provide a
>+	 * kernel address + length, that represents a target dmb
>+	 * (like MMIO). If a sender writes into such a ghost-send-buffer
>+	 * (= at this kernel address) the data will automatically
>+	 * immediately appear in the target dmb, even without calling
>+	 * move_data().
>+	 * Note that this is NOT related to the MSG_ZEROCOPY socket flag.
>+	 *
>+	 * Either all 3 function pointers for support_dmb_nocopy(),
>+	 * attach_dmb() and detach_dmb() are defined, or all of them must
>+	 * be NULL.
>+	 *
>+	 * Return: non-zero, if no-copy is supported.
>+	 */
>+	int (*support_dmb_nocopy)(struct ism_dev *dev);
>+	/**
>+	 * attach_dmb() - attach local memory to a remote dmb
>+	 * @dev: Local sending ism device
>+	 * @dmb: all other parameters are passed in the form of a
>+	 *	 dmb struct
>+	 *	 TODO: (THIS IS CONFUSING, should be changed)

Agree.

>+	 *  dmb_tok: (in) Token of the remote dmb, we want to attach to
>+	 *  cpu_addr: (out) MMIO address
>+	 *  dma_addr: (out) MMIO address (if applicable, invalid otherwise)
>+	 *  dmb_len: (out) length of local MMIO region,
>+	 *           equal to length of remote DMB.
>+	 *  sba_idx: (out) index of remote dmb (NOT HELPFUL, should be removed)
>+	 *
>+	 * Provides a memory address to the sender that can be used to
>+	 * directly write into the remote dmb.
>+	 *
>+	 * Return: Zero upon success, Error code otherwise
>+	 */
>+	int (*attach_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>+	/**
>+	 * detach_dmb() - Detach the ghost buffer from a remote dmb
>+	 * @dev: ism device
>+	 * @token: dmb token of the remote dmb
>+	 * Return: Zero upon success, Error code otherwise
>+	 */
>+	int (*detach_dmb)(struct ism_dev *dev, u64 token);
> };
> 
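To make the "all three or none" rule for the no-copy ops concrete, here is a minimal userspace sketch. It assumes a trimmed stand-in struct (ism_nocopy_ops) that holds only support_dmb_nocopy(), attach_dmb() and detach_dmb(). The helpers ism_nocopy_ops_consistent() and ism_dev_can_attach() and the demo_* stubs are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ism_dev;		/* opaque in this sketch */
struct ism_dmb;		/* opaque in this sketch */

/* Hypothetical trimmed stand-in for struct ism_ops: no-copy members only. */
struct ism_nocopy_ops {
	int (*support_dmb_nocopy)(struct ism_dev *dev);
	int (*attach_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
	int (*detach_dmb)(struct ism_dev *dev, uint64_t token);
};

/* Hypothetical check: either all three pointers are set or none is. */
static bool ism_nocopy_ops_consistent(const struct ism_nocopy_ops *ops)
{
	int n = !!ops->support_dmb_nocopy + !!ops->attach_dmb +
		!!ops->detach_dmb;

	return n == 0 || n == 3;
}

/* Hypothetical: a client may call attach_dmb() only if this returns true. */
static bool ism_dev_can_attach(const struct ism_nocopy_ops *ops,
			       struct ism_dev *dev)
{
	assert(ism_nocopy_ops_consistent(ops));
	return ops->support_dmb_nocopy && ops->support_dmb_nocopy(dev) != 0;
}

/* Demo stubs for a hypothetical device that supports no-copy. */
static int demo_support_nocopy(struct ism_dev *dev)
{
	(void)dev;
	return 1;
}

static int demo_attach_dmb(struct ism_dev *dev, struct ism_dmb *dmb)
{
	(void)dev;
	(void)dmb;
	return 0;
}

static int demo_detach_dmb(struct ism_dev *dev, uint64_t token)
{
	(void)dev;
	(void)token;
	return 0;
}

static const struct ism_nocopy_ops demo_ops = {
	.support_dmb_nocopy = demo_support_nocopy,
	.attach_dmb = demo_attach_dmb,
	.detach_dmb = demo_detach_dmb,
};
```

A device that leaves all three pointers NULL simply falls back to move_data(); a partially filled set is a driver bug that such a check could catch at registration time.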

...

>+
>+static int ism_lo_move_data(struct ism_dev *ism, u64 dmb_tok,
>+			    unsigned int idx, bool sf, unsigned int offset,
>+			    void *data, unsigned int size)
>+{
>+	struct ism_lo_dmb_node *rmb_node = NULL, *tmp_node;
>+	struct ism_lo_dev *ldev;
>+	u16 s_mask;
>+	u8 client_id;
>+	u32 sba_idx;
>+
>+	ldev = container_of(ism, struct ism_lo_dev, ism);
>+
>+	if (!sf)
>+		/* since sndbuf is merged with peer DMB, there is
>+		 * no need to copy data from sndbuf to peer DMB.
>+		 */
>+		return 0;
>+
>+	read_lock_bh(&ldev->dmb_ht_lock);
>+	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_tok) {
>+		if (tmp_node->token == dmb_tok) {
>+			rmb_node = tmp_node;
>+			break;
>+		}
>+	}
>+	if (!rmb_node) {
>+		read_unlock_bh(&ldev->dmb_ht_lock);
>+		return -EINVAL;
>+	}
>+	// So why copy the data now?? SMC usecase? Data buffer is attached,
>+	// rw-pointer are not attached?

I understand the confusion here. I had the same confusion the first time
I saw this.

This is actually the tricky part: it assumes that the CDC write signals
(sf is set), while the data write does not. We need to copy the CDC, so
the copy here applies only to the CDC.

I think we should refine the move_data() API to make this clearer.
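A small userspace simulation may help illustrate the loopback behaviour described above. It is a sketch, not the ism_loopback code: the fixed-size DMB table (instead of the hash table), the 16-bit signalling mask and all names (lo_dmb, lo_dev, lo_move_data) are simplifications assumed for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LO_MAX_DMBS 4

/* Hypothetical, simplified loopback DMB (the real code uses a hash table). */
struct lo_dmb {
	uint64_t token;
	char *cpu_addr;
	unsigned int len;
	uint16_t sba_mask;	/* simplified signalling mask */
	unsigned int irq_count;	/* how often the owner would be signalled */
};

struct lo_dev {
	struct lo_dmb dmbs[LO_MAX_DMBS];
	int n;
};

static struct lo_dmb *lo_find(struct lo_dev *ldev, uint64_t tok)
{
	for (int i = 0; i < ldev->n; i++)
		if (ldev->dmbs[i].token == tok)
			return &ldev->dmbs[i];
	return NULL;
}

static int lo_move_data(struct lo_dev *ldev, uint64_t dmb_tok,
			unsigned int idx, bool sf, unsigned int offset,
			const void *data, unsigned int size)
{
	struct lo_dmb *rmb;

	assert(ldev);
	/* Data writes (sf == false): the sndbuf IS the peer DMB, no copy. */
	if (!sf)
		return 0;

	/* Signalling writes (the CDC): copy into the peer DMB, then signal. */
	rmb = lo_find(ldev, dmb_tok);
	if (!rmb || idx >= 16)
		return -EINVAL;
	if (offset > rmb->len || size > rmb->len - offset)
		return -EINVAL;
	memcpy(rmb->cpu_addr + offset, data, size);
	rmb->sba_mask |= (uint16_t)(1u << idx);
	rmb->irq_count++;
	return 0;
}
```

So a data write leaves the target untouched here (it already landed through the shared buffer), while only the CDC write is copied and raises a signal.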

Best regards,
Dust


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17 20:29                   ` Andrew Lunn
@ 2025-01-20  6:21                     ` Dust Li
  2025-01-20 12:03                       ` Alexandra Winter
  0 siblings, 1 reply; 61+ messages in thread
From: Dust Li @ 2025-01-20  6:21 UTC (permalink / raw)
  To: Andrew Lunn, Niklas Schnelle
  Cc: Alexandra Winter, Julian Ruess, Wenjia Zhang, Jan Karcher,
	Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On 2025-01-17 21:29:09, Andrew Lunn wrote:
>On Fri, Jan 17, 2025 at 05:57:10PM +0100, Niklas Schnelle wrote:
>> On Fri, 2025-01-17 at 17:33 +0100, Andrew Lunn wrote:
>> > > Conceptually kind of but the existing s390 specific ISM device is a bit
>> > > special. But let me start with some background. On s390 aka Mainframes
>> > > OSs including Linux runs in so called logical partitions (LPARs) which
>> > > are machine hypervisor VMs which use partitioned non-paging memory. The
>> > > fact that memory is partitioned is important because this means LPARs
>> > > can not share physical memory by mapping it.
>> > > 
>> > > Now at a high level an ISM device allows communication between two such
>> > > Linux LPARs on the same machine. The device is discovered as a PCI
>> > > device and allows Linux to take a buffer, called a DMB, map that in the
>> > > IOMMU, and generate a token specific to another LPAR which also sees an
>> > > ISM device sharing the same virtual channel identifier (VCHID). This
>> > > token can then be transferred out of band (e.g. as part of an extended
>> > > TCP handshake in SMC-D) to that other system. With the token the other
>> > > system can use its ISM device to securely (authenticated by the token,
>> > > LPAR identity and the IOMMU mapping) write into the original system's
>> > > DMB at throughput and latency similar to doing a memcpy() via a
>> > > syscall.
>> > > 
>> > > On the implementation level the ISM device is actually a piece of
>> > > firmware and the write to a remote DMB is a special case of our PCI
>> > > Store Block instruction (no real MMIO on s390, instead there are
>> > > special instructions). Sadly there are a few more quirks but in
>> > > principle you can think of it as redirecting writes to a part of the
>> > > ISM PCI device's BAR to the DMB in the peer system if that makes sense.
>> > > There's of course also a mechanism to cause an interrupt on the
>> > > receiver as the write completes.
>> > 
>> > So the s390 details are interesting, but as you say, it is
>> > special. Ideally, all the special should be hidden away inside the
>> > driver.
>> 
>> Yes and it will be. There are some exceptions e.g. for vfio-pci pass-
>> through but that's not unusual and why there is already the concept of
>> vfio-pci extension module.
>> 
>> > 
>> > So please take a step back. What is the abstract model?
>> 
>> I think my high level description may be a good start. The abstract
>> model is the ability to share a memory buffer (DMB) for writing by a
>> communication partner, authenticated by a DMB Token. Plus stuff like
>> triggering an interrupt on write or explicit trigger. Then Alibaba
>> added optional support for what they called attaching the buffer which
>> means it becomes truly shared between the peers but which IBM's ISM
>> can't support. Plus a few more optional pieces such as VLANs, PNETIDs
>> don't ask. The idea for the new layer then is to define this interface
>> with operations and documentation.
>> 
>> > 
>> > Can the abstract model be mapped onto CLX? Could it be used with a GPU
>> > vRAM? SoC with real shared memory between a pool of CPUs.
>> > 
>> > 	Andrew
>> 
>> I'd think that yes, one could implement such a mechanism on top of CXL
>> as well as on SoC. Or even with no special hardware between a host and
>> a DPU (e.g. via PCIe endpoint framework). Basically anything that can
>> DMA and IRQs between two OS instances.
>
>Is DMA part of the abstract model? That would suggest a true shared
>memory system is excluded, since that would not require DMA.
>
>Maybe take a look at subsystems like USB, I2C.
>
>usb_submit_urb(struct urb *urb, gfp_t mem_flags)
>
>An URB is a data structure with a block of memory associated with it; it
>contains the details to pass to the USB device.
>
>i2c_transfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
>
>*msgs points to num messages which get transferred to/from the I2C
>device.
>
>Could the high level API look like this? No DMA, no IRQ, no concept of
>a somewhat shared memory. Just an API which asks for a message to be
>sent to the other end? struct urb has some USB concepts in it, struct
>i2c_msg has some I2C concepts in it. A struct ism_msg would follow the
>same pattern, but does it need to care about the DMA, the IRQ, the
>memory which is semi shared?

I don’t have a clear picture of what the API should look like yet, but I
believe it’s possible to keep DMA and IRQ out of it. In fact, the current
data transfer API, ops->move_data() in include/linux/ism.h, already
abstracts away the DMA and IRQ details.

One thing we cannot hide, however, is whether the operation is zero-copy
or copy. This distinction matters because the point at which the sender
may reuse its buffer differs between copy mode and zero-copy mode.
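As a thought experiment for Andrew's question, here is a userspace sketch of an i2c_transfer()-style call layered on a move_data()-shaped hook. struct ism_msg, ism_transfer(), ism_buf_reusable() and the demo backend are hypothetical names, not a proposed API. The point is that DMA and IRQ details stay hidden, while copy vs. zero-copy still shows up as a buffer-reuse rule. The copy-mode rule assumes a synchronous move_data().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message descriptor, in the spirit of struct i2c_msg. */
struct ism_msg {
	uint64_t dmb_tok;	/* target DMB */
	unsigned int offset;	/* offset within the target DMB */
	const void *buf;	/* data to send */
	unsigned int len;
	bool signal;		/* maps to move_data()'s sf */
	unsigned int sig_idx;	/* maps to move_data()'s idx */
};

/* Backend hook shaped like ism_ops->move_data() (priv replaces ism_dev). */
typedef int (*ism_move_fn)(void *priv, uint64_t dmb_tok, unsigned int idx,
			   bool sf, unsigned int offset, const void *data,
			   unsigned int size);

/* i2c_transfer()-style: returns num on success, negative errno otherwise. */
static int ism_transfer(ism_move_fn move, void *priv,
			const struct ism_msg *msgs, int num)
{
	assert(num >= 0);
	for (int i = 0; i < num; i++) {
		int rc = move(priv, msgs[i].dmb_tok, msgs[i].sig_idx,
			      msgs[i].signal, msgs[i].offset, msgs[i].buf,
			      msgs[i].len);
		if (rc)
			return rc;
	}
	return num;
}

enum ism_xfer_mode { ISM_XFER_COPY, ISM_XFER_NOCOPY };

/* The property the caller cannot ignore: when may msg->buf be reused? */
static bool ism_buf_reusable(enum ism_xfer_mode mode, bool xfer_returned,
			     bool peer_consumed)
{
	if (mode == ISM_XFER_COPY)
		return xfer_returned;	/* assumes a synchronous move_data() */
	return peer_consumed;		/* buffer is shared with the peer */
}

/* Demo backend: a single 64-byte "remote" DMB in local memory. */
struct demo_peer {
	char dmb[64];
	unsigned int signals;
};

static int demo_move(void *priv, uint64_t dmb_tok, unsigned int idx, bool sf,
		     unsigned int offset, const void *data, unsigned int size)
{
	struct demo_peer *peer = priv;

	(void)dmb_tok;
	(void)idx;
	if (offset > sizeof(peer->dmb) || size > sizeof(peer->dmb) - offset)
		return -EINVAL;
	memcpy(peer->dmb + offset, data, size);
	if (sf)
		peer->signals++;
	return 0;
}
```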

Best regards,
Dust



* Re: [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions
  2025-01-15 19:55 ` [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions Alexandra Winter
  2025-01-15 22:06   ` Halil Pasic
@ 2025-01-20  6:32   ` Dust Li
  2025-01-20  9:56     ` Alexandra Winter
  2025-01-20 10:34     ` Niklas Schnelle
  1 sibling, 2 replies; 61+ messages in thread
From: Dust Li @ 2025-01-20  6:32 UTC (permalink / raw)
  To: Alexandra Winter, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On 2025-01-15 20:55:24, Alexandra Winter wrote:
>Note that in this RFC this patch is not complete, future versions
>of this patch need to contain comments for all ism_ops.
>Especially signal_event() and handle_event() need a good generic
>description.
>
>Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
>---
> include/linux/ism.h | 115 ++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 105 insertions(+), 10 deletions(-)
>
>diff --git a/include/linux/ism.h b/include/linux/ism.h
>index 50975847248f..bc165d077071 100644
>--- a/include/linux/ism.h
>+++ b/include/linux/ism.h
>@@ -13,11 +13,26 @@
> #include <linux/workqueue.h>
> #include <linux/uuid.h>
> 
>-/* The remote peer rgid can use dmb_tok to write into this buffer. */
>+/*
>+ * DMB - Direct Memory Buffer
>+ * ==========================
>+ * An ism client provides a DMB as input buffer for a local receiving
>+ * ism device for exactly one (remote) sending ism device. Only this
>+ * sending device can send data into this DMB using move_data(). Sender
>+ * and receiver can be the same device.
>+ * TODO: Alignment and length rules (CPU and DMA). Device specific?
>+ */
> struct ism_dmb {
>+	/* dmb_tok - Token for this dmb
>+	 * Used by remote sender to address this dmb.
>+	 * Provided by ism fabric in register_dmb().
>+	 * Unique per ism fabric.
>+	 */
> 	u64 dmb_tok;
>+	/* rgid - GID of designated remote sending device */
> 	u64 rgid;
> 	u32 dmb_len;
>+	/* sba_idx - Index of this DMB on this receiving device */
> 	u32 sba_idx;
> 	u32 vlan_valid;
> 	u32 vlan_id;
>@@ -25,6 +40,8 @@ struct ism_dmb {
> 	dma_addr_t dma_addr;
> };
> 
>+/* ISM event structure (currently device type specific) */
>+// TODO: Define and describe generic event properties
> struct ism_event {
> 	u32 type;
> 	u32 code;
>@@ -33,38 +50,89 @@ struct ism_event {
> 	u64 info;
> };
> 
>+//TODO: use enum typedef
> #define ISM_EVENT_DMB	0
> #define ISM_EVENT_GID	1
> #define ISM_EVENT_SWR	2
> 
> struct ism_dev;
> 
>+/*
>+ * ISM clients
>+ * ===========
>+ * All ism clients have access to all ism devices
>+ * and must provide the following functions to be called by
>+ * ism device drivers:
>+ */
> struct ism_client {
>+	/* client name for logging and debugging purposes */
> 	const char *name;
>+	/**
>+	 *  add() - add an ism device
>+	 *  @dev: device that was added
>+	 *
>+	 * Will be called during ism_register_client() for all existing
>+	 * ism devices and whenever a new ism device is registered.
>+	 * *dev is valid until ism_client->remove() is called.
>+	 */
> 	void (*add)(struct ism_dev *dev);
>+	/**
>+	 * remove() - remove an ism device
>+	 * @dev: device to be removed
>+	 *
>+	 * Will be called whenever an ism device is unregistered.
>+	 * Before this call the device is already inactive: It will
>+	 * no longer call client handlers.
>+	 * The client must not access *dev after this call.
>+	 */
> 	void (*remove)(struct ism_dev *dev);
>+	/**
>+	 * handle_event() - Handle control information sent by device
>+	 * @dev: device reporting the event
>+	 * @event: ism event structure
>+	 */
> 	void (*handle_event)(struct ism_dev *dev, struct ism_event *event);
>-	/* Parameter dmbemask contains a bit vector with updated DMBEs, if sent
>-	 * via ism_move_data(). Callback function must handle all active bits
>-	 * indicated by dmbemask.
>+	/**
>+	 * handle_irq() - Handle signalling of a DMB
>+	 * @dev: device that owns the dmb
>+	 * @bit: sba_idx=idx of the ism_dmb that got signalled
>+	 *	TODO: Pass a priv pointer to ism_dmb instead of 'bit'(?)
>+	 * @dmbemask: ism signalling mask of the dmb
>+	 *
>+	 * Handle signalling of a dmb that was registered by this client
>+	 * for this device.
>+	 * The ism device can coalesce multiple signalling triggers into a
>+	 * single call of handle_irq(). dmbemask can be used to indicate
>+	 * different kinds of triggers.
> 	 */
> 	void (*handle_irq)(struct ism_dev *dev, unsigned int bit, u16 dmbemask);
>-	/* Private area - don't touch! */
>+	/* client index - provided by ism layer */
> 	u8 id;
> };
> 
> int ism_register_client(struct ism_client *client);
> int  ism_unregister_client(struct ism_client *client);
> 
>+//TODO: Pair descriptions with functions
>+/*
>+ * ISM devices
>+ * ===========
>+ */
> /* Mandatory operations for all ism devices:
>  * int (*query_remote_gid)(struct ism_dev *dev, uuid_t *rgid,
>  *	                   u32 vid_valid, u32 vid);
>  *	Query whether remote GID rgid is reachable via this device and this
>  *	vlan id. Vlan id is only checked if vid_valid != 0.
>+ *	Returns 0 if remote gid is reachable.
>  *
>  * int (*register_dmb)(struct ism_dev *dev, struct ism_dmb *dmb,
>  *			    void *client);
>- *	Register an ism_dmb buffer for this device and this client.
>+ *	Allocate and register an ism_dmb buffer for this device and this client.
>+ *	The following fields of ism_dmb must be valid:
>+ *	rgid, dmb_len, vlan_*; Optionally: requested sba_idx (non-zero)
>+ *	Upon return the following fields will be valid: dmb_tok, sba_idx
>+ *		cpu_addr, dma_addr (if applicable)
>+ *	Returns zero on success
>  *
>  * int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>  *	Unregister an ism_dmb buffer
>@@ -81,10 +149,15 @@ int  ism_unregister_client(struct ism_client *client);
>  * u16 (*get_chid)(struct ism_dev *dev);
>  *	Returns ism fabric identifier (channel id) of this device.
>  *	Only devices on the same ism fabric can communicate.
>- *	chid is unique per HW system, except for 0xFFFF, which denotes
>- *	an ism_loopback device that can only communicate with itself.
>- *	Use chid for fast negative checks, but only query_remote_gid()
>- *	can give a reliable positive answer.
>+ *	chid is unique per HW system. Use chid for fast negative checks,
>+ *	but only query_remote_gid() can give a reliable positive answer:
>+ *	Different chid: ism is not possible
>+ *	Same chid: ism traffic may be possible or not
>+ *		   (e.g. different HW systems)
>+ *	EXCEPTION: A value of 0xFFFF denotes an ism_loopback device
>+ *		that can only communicate with itself. Use GID or
>+ *		query_remote_gid() to determine whether sender and
>+ *		receiver use the same ism_loopback device.
>  *
>  * struct device* (*get_dev)(struct ism_dev *dev);
>  *
>@@ -109,6 +182,28 @@ struct ism_ops {
> 	int (*register_dmb)(struct ism_dev *dev, struct ism_dmb *dmb,
> 			    struct ism_client *client);
> 	int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>+	/**
>+	 * move_data() - write into a remote dmb
>+	 * @dev: Local sending ism device
>+	 * @dmb_tok: Token of the remote dmb
>+	 * @idx: signalling index
>+	 * @sf: signalling flag;
>+	 *      if true, idx will be turned on at target ism interrupt mask
>+	 *      and target device will be signalled, if required.
>+	 * @offset: offset within target dmb
>+	 * @data: pointer to data to be sent
>+	 * @size: length of data to be sent
>+	 *
>+	 * Use dev to write data of size at offset into a remote dmb
>+	 * identified by dmb_tok. Data is moved synchronously, *data can
>+	 * be freed when this function returns.

While thinking about the API, I found that this comment may be incorrect.

IIUC, in copy mode for PCI ISM devices, the CPU only tells the
device to perform a DMA copy. As a result, when this function returns,
the device may not have completed the DMA copy.

In zero-copy mode for loopback, the source and destination share the
same buffer. If the source rewrites the buffer, the destination may
encounter corrupted data. The source should only reuse the data after
the destination has finished reading it.

Best regards,
Dust

>+	 *
>+	 * If signalling flag (sf) is true, bit number idx will be
>+	 * turned on in the ism signalling mask that belongs to the
>+	 * target dmb, and handle_irq() of the ism client that owns this
>+	 * dmb will be called, if required. The target device may choose to
>+	 * coalesce multiple signalling triggers.
>+	 */
> 	int (*move_data)(struct ism_dev *dev, u64 dmb_tok, unsigned int idx,
> 			 bool sf, unsigned int offset, void *data,
> 			 unsigned int size);
>-- 
>2.45.2
>
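The get_chid() rules in the documentation above can be condensed into one decision function. This is a hedged userspace sketch: struct ism_dev_view, ism_peer_reachable() and demo_query() are hypothetical, GIDs are simplified to u64 (the patch uses uuid_t), and the VLAN arguments of query_remote_gid() are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ISM_LOOPBACK_CHID 0xFFFF

/* Hypothetical trimmed device view; real GIDs are uuid_t, not u64. */
struct ism_dev_view {
	uint16_t chid;
	uint64_t gid;
	/* stand-in for ops->query_remote_gid(); returns 0 if reachable */
	int (*query_remote_gid)(const struct ism_dev_view *dev, uint64_t rgid);
};

static bool ism_peer_reachable(const struct ism_dev_view *local,
			       uint16_t peer_chid, uint64_t peer_gid)
{
	assert(local && local->query_remote_gid);
	/* Different chid: ISM is not possible (fast negative check). */
	if (local->chid != peer_chid)
		return false;
	/* Loopback only communicates with itself: compare GIDs. */
	if (local->chid == ISM_LOOPBACK_CHID)
		return local->gid == peer_gid;
	/* Same chid: may or may not work (e.g. other HW system), ask device. */
	return local->query_remote_gid(local, peer_gid) == 0;
}

/* Demo: pretend only GID 2 is reachable on this fabric. */
static int demo_query(const struct ism_dev_view *dev, uint64_t rgid)
{
	(void)dev;
	return rgid == 2 ? 0 : -1;
}
```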


* Re: [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism
  2025-01-20  3:55   ` Dust Li
@ 2025-01-20  9:31     ` Alexandra Winter
  0 siblings, 0 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-20  9:31 UTC (permalink / raw)
  To: dust.li, Wenjia Zhang, Jan Karcher, Gerd Bayer, Halil Pasic,
	D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter, David Miller,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 20.01.25 04:55, Dust Li wrote:
>> +static int ism_lo_move_data(struct ism_dev *ism, u64 dmb_tok,
>> +			    unsigned int idx, bool sf, unsigned int offset,
>> +			    void *data, unsigned int size)
>> +{
>> +	struct ism_lo_dmb_node *rmb_node = NULL, *tmp_node;
>> +	struct ism_lo_dev *ldev;
>> +	u16 s_mask;
>> +	u8 client_id;
>> +	u32 sba_idx;
>> +
>> +	ldev = container_of(ism, struct ism_lo_dev, ism);
>> +
>> +	if (!sf)
>> +		/* since sndbuf is merged with peer DMB, there is
>> +		 * no need to copy data from sndbuf to peer DMB.
>> +		 */
>> +		return 0;
>> +
>> +	read_lock_bh(&ldev->dmb_ht_lock);
>> +	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_tok) {
>> +		if (tmp_node->token == dmb_tok) {
>> +			rmb_node = tmp_node;
>> +			break;
>> +		}
>> +	}
>> +	if (!rmb_node) {
>> +		read_unlock_bh(&ldev->dmb_ht_lock);
>> +		return -EINVAL;
>> +	}
>> +	// So why copy the data now?? SMC usecase? Data buffer is attached,
>> +	// rw-pointer are not attached?
> I understand the confusion here. I have the same confusion the first time
> I saw this.
> 
> This is actually the tricky part: it assumes the CDC will signal, while
> the data will not. We need to copy the CDC, so the copy here only to the
> CDC.
> 
> I think we should refine the move_data() API to make this clearer.
> 
> Best regards,
> Dust

I agree. This will be refined in the next version.


* Re: [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions
  2025-01-20  6:32   ` Dust Li
@ 2025-01-20  9:56     ` Alexandra Winter
  2025-01-20 10:07       ` Julian Ruess
  2025-01-20 10:34     ` Niklas Schnelle
  1 sibling, 1 reply; 61+ messages in thread
From: Alexandra Winter @ 2025-01-20  9:56 UTC (permalink / raw)
  To: dust.li, Wenjia Zhang, Jan Karcher, Gerd Bayer, Halil Pasic,
	D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter, David Miller,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 20.01.25 07:32, Dust Li wrote:
>> +	/**
>> +	 * move_data() - write into a remote dmb
>> +	 * @dev: Local sending ism device
>> +	 * @dmb_tok: Token of the remote dmb
>> +	 * @idx: signalling index
>> +	 * @sf: signalling flag;
>> +	 *      if true, idx will be turned on at target ism interrupt mask
>> +	 *      and target device will be signalled, if required.
>> +	 * @offset: offset within target dmb
>> +	 * @data: pointer to data to be sent
>> +	 * @size: length of data to be sent
>> +	 *
>> +	 * Use dev to write data of size at offset into a remote dmb
>> +	 * identified by dmb_tok. Data is moved synchronously, *data can
>> +	 * be freed when this function returns.
> When considering the API, I found this comment may be incorrect.
> 
> IIUC, in copy mode for PCI ISM devices, the CPU only tells the
> device to perform a DMA copy. As a result, when this function returns,
> the device may not have completed the DMA copy.
> 

No, it is actually one of the properties of ISM vPCI that the data is
moved synchronously inside the move_data() function (at the PCI layer,
the data is moved inside __zpci_store_block()).
Obviously, move_data() is also synchronous for loopback.

SMC-D does not make use of this; instead, it reuses the same
conn->sndbuf_desc for the lifetime of a connection.


> In zero-copy mode for loopback, the source and destination share the
> same buffer. If the source rewrites the buffer, the destination may
> encounter corrupted data. The source should only reuse the data after
> the destination has finished reading it.
> 

That is true regardless of whether the move is synchronous or not.
It is the client's responsibility to make sure that a sender does not
overwrite unread data. SMC uses its write and read pointers for
that.
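How write and read pointers keep a sender from overwriting unread data can be sketched with a simple ring over the DMB. This is an illustration under assumptions: free-running 32-bit counters and a power-of-two DMB size are chosen for brevity (SMC's own cursor format differs), the memcpy() into dmb stands in for move_data() or for a direct write into an attached DMB, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical free-running cursors; SMC's own cursor format differs. */
struct ring_cursors {
	uint32_t prod;	/* total bytes written by the sender */
	uint32_t cons;	/* total bytes consumed by the receiver */
	uint32_t size;	/* DMB size, a power of two */
};

static uint32_t ring_free(const struct ring_cursors *c)
{
	/* Unsigned subtraction stays correct when the counters wrap. */
	return c->size - (c->prod - c->cons);
}

/* Sender side: never write more than the receiver has already consumed. */
static uint32_t ring_write(struct ring_cursors *c, char *dmb,
			   const char *data, uint32_t len)
{
	uint32_t free_bytes, n, pos, first;

	assert(c->size && (c->size & (c->size - 1)) == 0);
	free_bytes = ring_free(c);
	n = len < free_bytes ? len : free_bytes;
	pos = c->prod % c->size;
	first = n < c->size - pos ? n : c->size - pos;
	memcpy(dmb + pos, data, first);
	memcpy(dmb, data + first, n - first);	/* wrapped part, may be 0 */
	c->prod += n;
	return n;
}

/* Receiver side: read up to len unread bytes and advance the read cursor. */
static uint32_t ring_read(struct ring_cursors *c, const char *dmb,
			  char *out, uint32_t len)
{
	uint32_t avail = c->prod - c->cons;
	uint32_t n = len < avail ? len : avail;
	uint32_t pos = c->cons % c->size;
	uint32_t first = n < c->size - pos ? n : c->size - pos;

	memcpy(out, dmb + pos, first);
	memcpy(out + first, dmb, n - first);
	c->cons += n;
	return n;
}
```

A short count from ring_write() means the sender has to wait until the receiver advances its read cursor, independent of whether the underlying move is synchronous.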


> Best regards,
> Dust
> 
>> +	 *
>> +	 * If signalling flag (sf) is true, bit number idx will be
>> +	 * turned on in the ism signalling mask that belongs to the
>> +	 * target dmb, and handle_irq() of the ism client that owns this
>> +	 * dmb will be called, if required. The target device may choose to
>> +	 * coalesce multiple signalling triggers.
>> +	 */
>> 	int (*move_data)(struct ism_dev *dev, u64 dmb_tok, unsigned int idx,
>> 			 bool sf, unsigned int offset, void *data,
>> 			 unsigned int size);
>> -- 



* Re: [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions
  2025-01-20  9:56     ` Alexandra Winter
@ 2025-01-20 10:07       ` Julian Ruess
  2025-01-20 11:35         ` Alexandra Winter
  0 siblings, 1 reply; 61+ messages in thread
From: Julian Ruess @ 2025-01-20 10:07 UTC (permalink / raw)
  To: Alexandra Winter, dust.li, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On Mon Jan 20, 2025 at 10:56 AM CET, Alexandra Winter wrote:
>
>
> On 20.01.25 07:32, Dust Li wrote:
> >> +	/**
> >> +	 * move_data() - write into a remote dmb
> >> +	 * @dev: Local sending ism device
> >> +	 * @dmb_tok: Token of the remote dmb
> >> +	 * @idx: signalling index
> >> +	 * @sf: signalling flag;
> >> +	 *      if true, idx will be turned on at target ism interrupt mask
> >> +	 *      and target device will be signalled, if required.
> >> +	 * @offset: offset within target dmb
> >> +	 * @data: pointer to data to be sent
> >> +	 * @size: length of data to be sent
> >> +	 *
> >> +	 * Use dev to write data of size at offset into a remote dmb
> >> +	 * identified by dmb_tok. Data is moved synchronously, *data can
> >> +	 * be freed when this function returns.
> > When considering the API, I found this comment may be incorrect.
> > 
> > IIUC, in copy mode for PCI ISM devices, the CPU only tells the
> > device to perform a DMA copy. As a result, when this function returns,
> > the device may not have completed the DMA copy.
> > 
>
> No, it is actually one of the properties of ISM vPCI that the data is
> moved synchronously inside the move_data() function. (on PCI layer the
> data is moved inside the __zpci_store_block() command).
> Obviously for loopback move_data() is also synchronous.

That is true for the IBM ISM vPCI device, but maybe we should also
design the API for future PCI devices that do not move data
synchronously.

>
> SMC-D does not make use of it, instead they re-use the same
> conn->sndbuf_desc for the lifetime of a connection.
>
>
> > In zero-copy mode for loopback, the source and destination share the
> > same buffer. If the source rewrites the buffer, the destination may
> > encounter corrupted data. The source should only reuse the data after
> > the destination has finished reading it.
> > 
>
> That is true independent of the question, whether the move is
> synchronous or not.
> It is the clients' responsibility to make sure a sender does not
> overwrite unread data. SMC uses the write-pointers and read-pointer for
> that.
>
>
> > Best regards,
> > Dust
> > 
> >> +	 *
> >> +	 * If signalling flag (sf) is true, bit number idx will be
> >> +	 * turned on in the ism signalling mask that belongs to the
> >> +	 * target dmb, and handle_irq() of the ism client that owns this
> >> +	 * dmb will be called, if required. The target device may choose to
> >> +	 * coalesce multiple signalling triggers.
> >> +	 */
> >> 	int (*move_data)(struct ism_dev *dev, u64 dmb_tok, unsigned int idx,
> >> 			 bool sf, unsigned int offset, void *data,
> >> 			 unsigned int size);
> >> -- 



* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-17 13:00         ` [RFC net-next 0/7] Provide an ism layer Alexandra Winter
  2025-01-17 15:10           ` Andrew Lunn
@ 2025-01-20 10:28           ` Alexandra Winter
  2025-01-22  3:04             ` Dust Li
  1 sibling, 1 reply; 61+ messages in thread
From: Alexandra Winter @ 2025-01-20 10:28 UTC (permalink / raw)
  To: dust.li, Julian Ruess, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 17.01.25 14:00, Alexandra Winter wrote:
> 
> 
> On 17.01.25 03:13, Dust Li wrote:
>>>>> Modular Approach: I've made the ism_loopback an independent kernel
>>>>> module since dynamic enable/disable functionality is not yet supported
>>>>> in SMC. Using insmod and rmmod for module management could provide the
>>>>> flexibility needed in practical scenarios.
>>>
>>> With this proposal ism_loopback is just another ism device and SMC-D will
>>> handle removal just like ism_client.remove(ism_dev) of other ism devices.
>>>
>>> But I understand that net/smc/ism_loopback.c today does not provide enable/disable,
>>> which is a big disadvantage, I agree. The ism layer is prepared for dynamic
>>> removal by ism_dev_unregister(). In case of this RFC that would only happen
>>> in case of rmmod ism. Which should be improved.
>>> One way to do that would be a separate ism_loopback kernel module, like you say.
>>> Today ism_loopback is only 10k LOC, so I'd be fine with leaving it in the ism module.
>>> I also think it is a great way for testing any ISM client, so it has benefit for
>>> anybody using the ism module.
>>> Another way would be e.g. an 'enable' entry in the sysfs of the loopback device.
>>> (Once we agree if and how to represent ism devices in general in sysfs).
>> This works for me as well. I think it would be better to implement this
>> within the common ISM layer, rather than duplicating the code in each
>> device. Similar to how it's done in netdevice.
>>
>> Best regards,
>> Dust
> 
> 
> Is there a specific example for enable/disable in the netdevice code, you have in mind?
> Or do you mean in general how netdevice provides a common layer?
> Yes, everything that is common for all devices should be provided by the network layer.


Dust, for some reason you did not 'Reply-all':
Dust Li wrote:
> I think dev_close()/dev_open() are the high-level APIs, while
> ndo_stop()/ndo_open() are the underlying device operations that we
> can reference.


I hear you, it can be beneficial to have a way for upper layers to
enable/disable an ism device.
But all this is typically a tricky area. The device driver can also have
reasons to enable/disable a device, then hardware could do that or even
hotplug a device. Error recovery on different levels may want to run a
disable/enable sequence as a reset, etc. And all this has potential for
deadlocks.
All this is rather trivial for ism-loopback, as there is not much of a
lower layer.
ism-vpci already has 'HW' / device driver configure on/off and device
add/remove.
For a future ism-virtio, the hypervisor may want to add/remove devices.

I wonder what could be the simplest definition of an enable/disable for
the ism layer, that we can start with? More sophisticated functionality
can always be added later.
Maybe support for adding/removing an ism device by the device driver is
sufficient as a starting point?
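A minimal sketch of that starting point, assuming enable/disable is expressed only as driver-initiated register/unregister that the ism layer fans out to every client's add()/remove(). All sketch_* and demo_* names and the fixed-size client table are hypothetical simplifications. Locking, the replay of already existing devices during ism_register_client(), and error recovery are deliberately left out.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the structs in include/linux/ism.h. */
struct ism_dev {
	const char *name;
	int active;	/* 1 while clients may use the device */
};

struct ism_client {
	const char *name;
	void (*add)(struct ism_dev *dev);
	void (*remove)(struct ism_dev *dev);
};

#define SKETCH_MAX_CLIENTS 8
static struct ism_client *sketch_clients[SKETCH_MAX_CLIENTS];
static size_t sketch_nr_clients;

static int sketch_register_client(struct ism_client *client)
{
	if (sketch_nr_clients == SKETCH_MAX_CLIENTS)
		return -1;
	sketch_clients[sketch_nr_clients++] = client;
	return 0;
}

/* Driver "enables" a device: register it and announce it via add(). */
static void sketch_dev_register(struct ism_dev *dev)
{
	dev->active = 1;
	for (size_t i = 0; i < sketch_nr_clients; i++)
		sketch_clients[i]->add(dev);
}

/* Driver "disables" a device: mark it inactive first (no more client
 * handlers), then call remove(); clients must not touch *dev afterwards. */
static void sketch_dev_unregister(struct ism_dev *dev)
{
	dev->active = 0;
	for (size_t i = 0; i < sketch_nr_clients; i++)
		sketch_clients[i]->remove(dev);
}

/* Demo client that only counts the devices it currently holds. */
static int demo_nr_devs;
static void demo_add(struct ism_dev *dev) { (void)dev; demo_nr_devs++; }
static void demo_remove(struct ism_dev *dev) { (void)dev; demo_nr_devs--; }
static struct ism_client demo_client = { "demo", demo_add, demo_remove };
```

A sysfs 'enable' knob or an error-recovery reset could later reuse the same two paths, which is where real locking would be needed.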


* Re: [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions
  2025-01-20  6:32   ` Dust Li
  2025-01-20  9:56     ` Alexandra Winter
@ 2025-01-20 10:34     ` Niklas Schnelle
  2025-01-22 15:02       ` Dust Li
  1 sibling, 1 reply; 61+ messages in thread
From: Niklas Schnelle @ 2025-01-20 10:34 UTC (permalink / raw)
  To: dust.li, Alexandra Winter, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Julian Ruess, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On Mon, 2025-01-20 at 14:32 +0800, Dust Li wrote:
> On 2025-01-15 20:55:24, Alexandra Winter wrote:
> > Note that in this RFC this patch is not complete, future versions
> > of this patch need to contain comments for all ism_ops.
> > Especially signal_event() and handle_event() need a good generic
> > description.
> > 
> > Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
> > ---
> > include/linux/ism.h | 115 ++++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 105 insertions(+), 10 deletions(-)
> > 
> > diff --git a/include/linux/ism.h b/include/linux/ism.h
> > index 50975847248f..bc165d077071 100644
> > --- a/include/linux/ism.h
> > +++ b/include/linux/ism.h
> > @@ -13,11 +13,26 @@
> > #include <linux/workqueue.h>
> > #include <linux/uuid.h>
> > 
> > -/* The remote peer rgid can use dmb_tok to write into this buffer. */
> > +/*
> > + * DMB - Direct Memory Buffer
> > + * ==========================
> > + * An ism client provides a DMB as input buffer for a local receiving
> > + * ism device for exactly one (remote) sending ism device. Only this
> > + * sending device can send data into this DMB using move_data(). Sender
> > + * and receiver can be the same device.
> > + * TODO: Alignment and length rules (CPU and DMA). Device specific?
> > + */
> > struct ism_dmb {
> > +	/* dmb_tok - Token for this dmb
> > +	 * Used by remote sender to address this dmb.
> > +	 * Provided by ism fabric in register_dmb().
> > +	 * Unique per ism fabric.
> > +	 */
> > 	u64 dmb_tok;
> > +	/* rgid - GID of designated remote sending device */
> > 	u64 rgid;
> > 	u32 dmb_len;
> > +	/* sba_idx - Index of this DMB on this receiving device */
> > 	u32 sba_idx;
> > 	u32 vlan_valid;
> > 	u32 vlan_id;
> > @@ -25,6 +40,8 @@ struct ism_dmb {
> > 	dma_addr_t dma_addr;
> > };
> > 
> > +/* ISM event structure (currently device type specific) */
> > +// TODO: Define and describe generic event properties
> > struct ism_event {
> > 	u32 type;
> > 	u32 code;
> > @@ -33,38 +50,89 @@ struct ism_event {
> > 	u64 info;
> > };
> > 
> > +//TODO: use enum typedef
> > #define ISM_EVENT_DMB	0
> > #define ISM_EVENT_GID	1
> > #define ISM_EVENT_SWR	2
> > 
> > struct ism_dev;
> > 
> > +/*
> > + * ISM clients
> > + * ===========
> > + * All ism clients have access to all ism devices
> > + * and must provide the following functions to be called by
> > + * ism device drivers:
> > + */
> > struct ism_client {
> > +	/* client name for logging and debugging purposes */
> > 	const char *name;
> > +	/**
> > +	 *  add() - add an ism device
> > +	 *  @dev: device that was added
> > +	 *
> > +	 * Will be called during ism_register_client() for all existing
> > +	 * ism devices and whenever a new ism device is registered.
> > +	 * *dev is valid until ism_client->remove() is called.
> > +	 */
> > 	void (*add)(struct ism_dev *dev);
> > +	/**
> > +	 * remove() - remove an ism device
> > +	 * @dev: device to be removed
> > +	 *
> > +	 * Will be called whenever an ism device is unregistered.
> > +	 * Before this call the device is already inactive: It will
> > +	 * no longer call client handlers.
> > +	 * The client must not access *dev after this call.
> > +	 */
> > 	void (*remove)(struct ism_dev *dev);
> > +	/**
> > +	 * handle_event() - Handle control information sent by device
> > +	 * @dev: device reporting the event
> > +	 * @event: ism event structure
> > +	 */
> > 	void (*handle_event)(struct ism_dev *dev, struct ism_event *event);
> > -	/* Parameter dmbemask contains a bit vector with updated DMBEs, if sent
> > -	 * via ism_move_data(). Callback function must handle all active bits
> > -	 * indicated by dmbemask.
> > +	/**
> > +	 * handle_irq() - Handle signalling of a DMB
> > +	 * @dev: device owns the dmb
> > +	 * @bit: sba_idx=idx of the ism_dmb that got signalled
> > +	 *	TODO: Pass a priv pointer to ism_dmb instead of 'bit'(?)
> > +	 * @dmbemask: ism signalling mask of the dmb
> > +	 *
> > +	 * Handle signalling of a dmb that was registered by this client
> > +	 * for this device.
> > +	 * The ism device can coalesce multiple signalling triggers into a
> > +	 * single call of handle_irq(). dmbemask can be used to indicate
> > +	 * different kinds of triggers.
> > 	 */
> > 	void (*handle_irq)(struct ism_dev *dev, unsigned int bit, u16 dmbemask);
> > -	/* Private area - don't touch! */
> > +	/* client index - provided by ism layer */
> > 	u8 id;
> > };
> > 
> > int ism_register_client(struct ism_client *client);
> > int  ism_unregister_client(struct ism_client *client);
> > 
> > +//TODO: Pair descriptions with functions
> > +/*
> > + * ISM devices
> > + * ===========
> > + */
> > /* Mandatory operations for all ism devices:
> >  * int (*query_remote_gid)(struct ism_dev *dev, uuid_t *rgid,
> >  *	                   u32 vid_valid, u32 vid);
> >  *	Query whether remote GID rgid is reachable via this device and this
> >  *	vlan id. Vlan id is only checked if vid_valid != 0.
> > + *	Returns 0 if remote gid is reachable.
> >  *
> >  * int (*register_dmb)(struct ism_dev *dev, struct ism_dmb *dmb,
> >  *			    void *client);
> > - *	Register an ism_dmb buffer for this device and this client.
> > + *	Allocate and register an ism_dmb buffer for this device and this client.
> > + *	The following fields of ism_dmb must be valid:
> > + *	rgid, dmb_len, vlan_*; Optionally:requested sba_idx (non-zero)
> > + *	Upon return the following fields will be valid: dmb_tok, sba_idx
> > + *		cpu_addr, dma_addr (if applicable)
> > + *	Returns zero on success
> >  *
> >  * int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
> >  *	Unregister an ism_dmb buffer
> > @@ -81,10 +149,15 @@ int  ism_unregister_client(struct ism_client *client);
> >  * u16 (*get_chid)(struct ism_dev *dev);
> >  *	Returns ism fabric identifier (channel id) of this device.
> >  *	Only devices on the same ism fabric can communicate.
> > - *	chid is unique per HW system, except for 0xFFFF, which denotes
> > - *	an ism_loopback device that can only communicate with itself.
> > - *	Use chid for fast negative checks, but only query_remote_gid()
> > - *	can give a reliable positive answer.
> > + *	chid is unique per HW system. Use chid for fast negative checks,
> > + *	but only query_remote_gid() can give a reliable positive answer:
> > + *	Different chid: ism is not possible
> > + *	Same chid: ism traffic may be possible or not
> > + *		   (e.g. different HW systems)
> > + *	EXCEPTION: A value of 0xFFFF denotes an ism_loopback device
> > + *		that can only communicate with itself. Use GID or
> > + *		query_remote_gid() to determine whether sender and
> > + *		receiver use the same ism_loopback device.
> >  *
> >  * struct device* (*get_dev)(struct ism_dev *dev);
> >  *
> > @@ -109,6 +182,28 @@ struct ism_ops {
> > 	int (*register_dmb)(struct ism_dev *dev, struct ism_dmb *dmb,
> > 			    struct ism_client *client);
> > 	int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
> > +	/**
> > +	 * move_data() - write into a remote dmb
> > +	 * @dev: Local sending ism device
> > +	 * @dmb_tok: Token of the remote dmb
> > +	 * @idx: signalling index
> > +	 * @sf: signalling flag;
> > +	 *      if true, idx will be turned on at target ism interrupt mask
> > +	 *      and target device will be signalled, if required.
> > +	 * @offset: offset within target dmb
> > +	 * @data: pointer to data to be sent
> > +	 * @size: length of data to be sent
> > +	 *
> > +	 * Use dev to write data of size at offset into a remote dmb
> > +	 * identified by dmb_tok. Data is moved synchronously, *data can
> > +	 * be freed when this function returns.
> 
> When considering the API, I found this comment may be incorrect.
> 
> IIUC, in copy mode for PCI ISM devices, the CPU only tells the
> device to perform a DMA copy. As a result, when this function returns,
> the device may not have completed the DMA copy.

For the s390 ISM device the statement is true. The move_data() function
issues a PCI Store Block instruction, which is the write on the sender
side and at the same time synchronously acts as the device's DMA write
on the receiver side. So when the PCI Store Block instruction completes,
the data has been cache-coherently written to the receiver DMB. And yes,
full synchronicity would be impossible with the posted writes of real
PCIe.

That said, when it comes to API design I think you have a great point
here: we need to decide whether this synchronicity should be baked
into the move_data() API. I think we instead want to guarantee only a
weaker rule, namely that the source buffer can be re-used after the
move. To me this is also aligned with the word "move", in that the data
has been moved after the call, not merely registered to be moved. A
real PCIe device could achieve this by copying the data or by waiting
on completion. If we ever get devices which need to wait on
completion, it may indeed be better to have a separate completion step
in the API too. Then again, I think the concept of having a single
"move data" step is somewhat central to ISM and I'd hate to lose that
simplicity.
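To illustrate the weaker rule with a device whose DMA completes only after move_data() returns: a minimal sketch, assuming a driver-owned bounce buffer and a single outstanding transfer. All sketch_* names are hypothetical, and plain memcpy() stands in both for the DMA engine and for the dmb_tok lookup.

```c
#include <errno.h>
#include <string.h>

#define SKETCH_BOUNCE_SIZE 4096

/* Hypothetical device whose "DMA" runs after move_data() has returned. */
struct sketch_async_dev {
	unsigned char bounce[SKETCH_BOUNCE_SIZE]; /* driver-owned staging copy */
	unsigned int pending_off;
	unsigned int pending_len;                 /* 0: no transfer in flight */
	unsigned char *target_dmb;                /* stands in for dmb_tok lookup */
};

/* Honours "the source buffer can be re-used after the move": the payload
 * is copied out of *data before returning, even though the transfer to
 * the target DMB has not happened yet. Bounds checks against the target
 * DMB are omitted for brevity. */
static int sketch_async_move_data(struct sketch_async_dev *dev,
				  unsigned int offset, const void *data,
				  unsigned int size)
{
	if (size > SKETCH_BOUNCE_SIZE)
		return -EINVAL;
	if (dev->pending_len)
		return -EBUSY;
	memcpy(dev->bounce, data, size);
	dev->pending_off = offset;
	dev->pending_len = size;
	/* A real driver would start its DMA engine here. */
	return 0;
}

/* Completion path: the staged bytes finally land in the target DMB. */
static void sketch_dma_complete(struct sketch_async_dev *dev)
{
	memcpy(dev->target_dmb + dev->pending_off, dev->bounce,
	       dev->pending_len);
	dev->pending_len = 0;
}
```

The price is an extra copy per call; avoiding it would require the separate completion step discussed in the paragraph before this sketch.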

I've been thinking also about a possible copy mode in a virtio-ism.
That could be useful if we wanted to use virtio-ism between memory
partitioned guests, or if one wanted to transparently proxy virtio-ism
over s390 ISM to span multiple KVM hosts. And I think such a mode could
still work with a single "move data" step and I'd love to have that in
any future virtio-ism spec.

> 
> In zero-copy mode for loopback, the source and destination share the
> same buffer. If the source rewrites the buffer, the destination may
> encounter corrupted data. The source should only reuse the data after
> the destination has finished reading it.

I think there are two potential overwrite scenarios here.

1. The sender re-uses the source data buffer i.e. the @data buffer of
the move_data() call. On s390 ISM this is fine because the data was
copied out and into the destination DMB during the call. This could
typically become an issue if the device DMA reads directly from @data
after the move_data() call completed.

2. The sender does subsequent move_data() overwriting data in the
destination DMB before the receiver has read the data. This can happen
on s390 ISM too and needs to be prevented by DMB access rules.

For the move_data() call I think that even in a "shared i.e. same page
DMB" scenario move_data() must still do a copy out of the @data buffer
into the shared DMB. Otherwise it really wouldn't "move" data and it
would be a very weird API since @data is just a buffer not some kind of
descriptor. In other words I think scenario 1 shouldn't be possible in
either copy or shared DMB mode by the semantics of move_data().
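Scenario 2 is governed by the DMB access rules. A minimal sketch of a sender-side check in the spirit of the write/read pointers SMC uses (not SMC's actual code), assuming free-running byte counters, a power-of-two DMB length, and that the reader's consumer cursor is reported back to the sender by some separate update. All sketch_* names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical sender-side bookkeeping for one DMB used as a ring.
 * prod and cons are free-running byte counters; len must be a power of
 * two so that prod % len stays correct when the counters wrap. */
struct sketch_dmb_ring {
	unsigned int len;	/* DMB size in bytes */
	unsigned int prod;	/* bytes written by the sender so far */
	unsigned int cons;	/* bytes the reader reported as consumed */
};

/* Bytes the sender may write without overwriting unread data. */
static unsigned int sketch_ring_free(const struct sketch_dmb_ring *r)
{
	return r->len - (r->prod - r->cons);
}

static bool sketch_may_write(const struct sketch_dmb_ring *r,
			     unsigned int size)
{
	return size <= sketch_ring_free(r);
}

/* Offset in the target DMB for the next move_data(). A write that
 * crosses the end of the DMB must be split into two calls. */
static unsigned int sketch_next_offset(const struct sketch_dmb_ring *r)
{
	return r->prod % r->len;
}
```

Note that this check is the same whether move_data() copies or the DMB is shared; it only depends on the reader's progress.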

> 
> > +	 *
> > +	 * If signalling flag (sf) is true, bit number idx bit will be
> > +	 * turned on in the ism signalling mask, that belongs to the
> > +	 * target dmb, and handle_irq() of the ism client that owns this
> > +	 * dmb will be called, if required. The target device may choose to
> > +	 * coalesce multiple signalling triggers.
> > +	 */
> > 	int (*move_data)(struct ism_dev *dev, u64 dmb_tok, unsigned int idx,
> > 			 bool sf, unsigned int offset, void *data,
> > 			 unsigned int size);
> > -- 
> > 2.45.2
> > 
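Read as code, the get_chid() rules documented in this patch amount to a cheap pre-check before query_remote_gid(). A minimal sketch, assuming GIDs are compared as plain 64-bit values (patch 3/7 of this series moves to uuid_t); sketch_may_reach() is a hypothetical helper, not part of the RFC.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CHID_LOOPBACK 0xFFFF

/* Hypothetical pre-check. false: ISM is definitely not possible.
 * true: ISM may be possible; only query_remote_gid() can confirm it. */
static bool sketch_may_reach(uint16_t local_chid, uint16_t remote_chid,
			     uint64_t local_gid, uint64_t remote_gid)
{
	if (local_chid != remote_chid)
		return false;	/* different fabric */
	if (local_chid == SKETCH_CHID_LOOPBACK)
		return local_gid == remote_gid;	/* loopback talks only to itself */
	return true;	/* same chid, but possibly a different HW system */
}
```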



* Re: [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions
  2025-01-20 10:07       ` Julian Ruess
@ 2025-01-20 11:35         ` Alexandra Winter
  0 siblings, 0 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-20 11:35 UTC (permalink / raw)
  To: Julian Ruess, dust.li, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 20.01.25 11:07, Julian Ruess wrote:
> On Mon Jan 20, 2025 at 10:56 AM CET, Alexandra Winter wrote:
>>
>>
>> On 20.01.25 07:32, Dust Li wrote:
>>>> +	/**
>>>> +	 * move_data() - write into a remote dmb
>>>> +	 * @dev: Local sending ism device
>>>> +	 * @dmb_tok: Token of the remote dmb
>>>> +	 * @idx: signalling index
>>>> +	 * @sf: signalling flag;
>>>> +	 *      if true, idx will be turned on at target ism interrupt mask
>>>> +	 *      and target device will be signalled, if required.
>>>> +	 * @offset: offset within target dmb
>>>> +	 * @data: pointer to data to be sent
>>>> +	 * @size: length of data to be sent
>>>> +	 *
>>>> +	 * Use dev to write data of size at offset into a remote dmb
>>>> +	 * identified by dmb_tok. Data is moved synchronously, *data can
>>>> +	 * be freed when this function returns.
>>> When considering the API, I found this comment may be incorrect.
>>>
>>> IIUC, in copy mode for PCI ISM devices, the CPU only tells the
>>> device to perform a DMA copy. As a result, when this function returns,
>>> the device may not have completed the DMA copy.
>>>
>>
>> No, it is actually one of the properties of ISM vPCI that the data is
>> moved synchronously inside the move_data() function. (on PCI layer the
>> data is moved inside the __zpci_store_block() command).
>> Obviously for loopback move_data() is also synchronous.
> 
> That is true for the IBM ISM vPCI device but maybe we
> should design the API also for future PCI devices
> that do not move data synchronously.
>

An API should always be extendable.

>>
>> SMC-D does not make use of it, instead they re-use the same
>> conn->sndbuf_desc for the lifetime of a connection.
>>
>>
>>> In zero-copy mode for loopback, the source and destination share the
>>> same buffer. If the source rewrites the buffer, the destination may
>>> encounter corrupted data. The source should only reuse the data after
>>> the destination has finished reading it.
>>>
>>
>> That is true independent of the question, whether the move is
>> synchronous or not.
>> It is the clients' responsibility to make sure a sender does not
>> overwrite unread data. SMC uses the write-pointers and read-pointer for
>> that.
>>
>>
>>> Best regards,
>>> Dust
>>>
>>>> +	 *
>>>> +	 * If signalling flag (sf) is true, bit number idx bit will be
>>>> +	 * turned on in the ism signalling mask, that belongs to the
>>>> +	 * target dmb, and handle_irq() of the ism client that owns this
>>>> +	 * dmb will be called, if required. The target device may choose to
>>>> +	 * coalesce multiple signalling triggers.
>>>> +	 */
>>>> 	int (*move_data)(struct ism_dev *dev, u64 dmb_tok, unsigned int idx,
>>>> 			 bool sf, unsigned int offset, void *data,
>>>> 			 unsigned int size);
>>>> -- 
> 



* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-18 15:24     ` Dust Li
@ 2025-01-20 11:45       ` Alexandra Winter
  0 siblings, 0 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-20 11:45 UTC (permalink / raw)
  To: dust.li, Wenjia Zhang, Jan Karcher, Gerd Bayer, Halil Pasic,
	D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter, David Miller,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn
  Cc: Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 18.01.25 16:24, Dust Li wrote:
> On 2025-01-17 12:04:06, Alexandra Winter wrote:
>> I hit the send button too early, sorry about that.
>> Let me comment on the other proposals from Dust Li as well.
>>
>> On 16.01.25 10:32, Dust Li wrote:
>>> Abstraction of ISM Device Details: I propose we abstract the ISM device
>>> details by providing SMC with helper functions. These functions could
>>> encapsulate ism->ops, making the implementation cleaner and more
>>> intuitive. 
>>
>>
>> Maybe I misunderstand what you mean by helper functions..
>> Why would you encapsulate ism->ops functions in another set of wrappers?
>> I was happy to remove the helper functions in 2/7 and 7/7.
> 
> What I mean is similar to how IB handles it in include/rdma/ib_verbs.h.
> A good example is ib_post_send or ibv_post_send in user space:
> 
> ```c
> static inline int ib_post_send(struct ib_qp *qp,
>                                const struct ib_send_wr *send_wr,
>                                const struct ib_send_wr **bad_send_wr)
> {
>         const struct ib_send_wr *dummy;
> 
>         return qp->device->ops.post_send(qp, send_wr, bad_send_wr ? : &dummy);
> }
> ```
> 
> By following this approach, we can "hide" all the implementations behind
> ism_xxx. Our users (SMC) should only interact with these APIs. The ism->ops
> would then be used by our device implementers (vISM, loopback, etc.). This
> would help make the layers clearer, which is the same approach IB takes.
> 
> The layout would somehow like this:
> 
> | -------------------- |-----------------------------|
> |  ism_register_dmb()  |                             |
> |  ism_move_data()     | <---  API for our users     |
> |  ism_xxx() ...       |                             |
> | -------------------- |-----------------------------|
> |   ism_device_ops     | <---for our implementers    |
> |                      |    (PCI-ISM/loopback, etc)  |
> |----------------------|-----------------------------|
> 
> 
>>
>>
>>> This way, the struct ism_device would mainly serve its
>>> implementers, while the upper helper functions offer a streamlined
>>> interface for SMC.
>>
>>
Thanks for the explanations.
Yes, probably makes sense to further decouple the client API from the
device API. I'll give that a try in the next version.
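A minimal sketch of that split, modelled on the ib_post_send() pattern quoted above: clients call an inline helper and never dereference dev->ops themselves. The helper name ism_move_data() follows Dust's diagram; its body, the -EOPNOTSUPP fallback for a missing op, and the demo device are assumptions for illustration, not code from this series.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ism_dev;

/* Device-facing ops, filled in by implementers (ISM vPCI, loopback, ...). */
struct ism_ops {
	int (*move_data)(struct ism_dev *dev, uint64_t dmb_tok,
			 unsigned int idx, bool sf, unsigned int offset,
			 void *data, unsigned int size);
};

struct ism_dev {
	const struct ism_ops *ops;
};

/* Client-facing helper: SMC-D would call this instead of dev->ops.
 * The NULL check would only matter for optional ops. */
static inline int ism_move_data(struct ism_dev *dev, uint64_t dmb_tok,
				unsigned int idx, bool sf, unsigned int offset,
				void *data, unsigned int size)
{
	if (!dev->ops->move_data)
		return -EOPNOTSUPP;
	return dev->ops->move_data(dev, dmb_tok, idx, sf, offset, data, size);
}

/* Demo implementer that only records the size it was asked to move. */
static unsigned int demo_last_size;

static int demo_move_data(struct ism_dev *dev, uint64_t dmb_tok,
			  unsigned int idx, bool sf, unsigned int offset,
			  void *data, unsigned int size)
{
	(void)dev; (void)dmb_tok; (void)idx; (void)sf; (void)offset; (void)data;
	demo_last_size = size;
	return 0;
}

static const struct ism_ops demo_ops = { .move_data = demo_move_data };
```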


>> I was actually also wondering, whether the clients should access ism_device
>> at all. Or whether they should only use the ism_ops.
> 
> I believe the client should only pass an ism_dev pointer to the ism_xxx()
> helper functions. They should never directly access any of the fields inside
> the ism_dev.
> 
> 
>> I can give that a try in the next version. I think this RFC is almost there already.
>> The clients would still need to pass a pointer to ism_dev as a parameter.
>>
>>
>>> Structuring and Naming: I recommend embedding the structure of ism_ops
>>> directly within ism_dev rather than using a pointer. 
>>
>>
>> I think it is a common method to have the const struct xy_ops in the device driver code
>> and then use pointer to register the device with an upper layer.
> 
> Right, if we have many ism_devs of one ISM type, then using a pointer
> should save us some memory.
> 
>> What would be the benefit of duplicating that struct in every ism_dev?
> 
> The main benefit of embedding ism_device_ops within ism_dev is that it
> reduces the dereferencing of an extra pointer. We already have too many
> dereference in the datapath, it is not good for performance :(
> 
> For example:
> 
> rc = smcd->ism->ops->move_data(smcd->ism, dmb_tok, idx, sf, offset,
>                                data, len);
> 
> Best regards,
> Dust
> 

I see your point. I'm not yet convinced. I'll think more about it.
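For comparison, the two layouts under discussion, reduced to a single op. A minimal sketch with hypothetical sketch_* and demo_* names: with a shared const ops table the call conceptually loads dev->ops and then ->move_data, while an embedded copy needs one load but grows every device by the size of the whole ops struct (with one op the sizes happen to match; the real ism_ops has many). Whether the saved load is measurable in the SMC-D datapath is not shown here.

```c
#include <stddef.h>

struct sketch_ops {
	int (*move_data)(void *dev, unsigned int size);
};

/* Layout in this RFC: one const table per driver, a pointer per device. */
struct sketch_dev_ptr {
	const struct sketch_ops *ops;
};

/* Proposed alternative: ops copied into every device. */
struct sketch_dev_emb {
	struct sketch_ops ops;
};

static int sketch_call_ptr(struct sketch_dev_ptr *dev, unsigned int size)
{
	/* load dev->ops, then load ops->move_data, then call */
	return dev->ops->move_data(dev, size);
}

static int sketch_call_emb(struct sketch_dev_emb *dev, unsigned int size)
{
	/* load dev->ops.move_data directly, then call */
	return dev->ops.move_data(dev, size);
}

/* Demo op shared by both layouts; returns the size it was given. */
static int demo_move(void *dev, unsigned int size)
{
	(void)dev;
	return (int)size;
}

static const struct sketch_ops demo_ops = { .move_data = demo_move };
```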





* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-20  6:21                     ` Dust Li
@ 2025-01-20 12:03                       ` Alexandra Winter
  2025-01-20 16:01                         ` Andrew Lunn
  0 siblings, 1 reply; 61+ messages in thread
From: Alexandra Winter @ 2025-01-20 12:03 UTC (permalink / raw)
  To: dust.li, Andrew Lunn, Niklas Schnelle
  Cc: Julian Ruess, Wenjia Zhang, Jan Karcher, Gerd Bayer, Halil Pasic,
	D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter, David Miller,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Thorsten Winkler, netdev, linux-s390, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Simon Horman



On 20.01.25 07:21, Dust Li wrote:
> On 2025-01-17 21:29:09, Andrew Lunn wrote:
>> On Fri, Jan 17, 2025 at 05:57:10PM +0100, Niklas Schnelle wrote:
>>> On Fri, 2025-01-17 at 17:33 +0100, Andrew Lunn wrote:
>>>>> Conceptually kind of but the existing s390 specific ISM device is a bit
>>>>> special. But let me start with some background. On s390 aka Mainframes
>>>>> OSs including Linux runs in so called logical partitions (LPARs) which
>>>>> are machine hypervisor VMs which use partitioned non-paging memory. The
>>>>> fact that memory is partitioned is important because this means LPARs
>>>>> can not share physical memory by mapping it.
>>>>>
>>>>> Now at a high level an ISM device allows communication between two such
>>>>> Linux LPARs on the same machine. The device is discovered as a PCI
>>>>> device and allows Linux to take a buffer called a DMB, map that in the
>>>>> IOMMU and generate a token specific to another LPAR which also sees an
>>>>> ISM device sharing the same virtual channel identifier (VCHID). This
>>>>> token can then be transferred out of band (e.g. as part of an extended
>>>>> TCP handshake in SMC-D) to that other system. With the token the other
>>>>> system can use its ISM device to securely (authenticated by the token,
>>>>> LPAR identity and the IOMMU mapping) write into the original system's
>>>>> DMB at throughput and latency similar to doing a memcpy() via a
>>>>> syscall.
>>>>>
>>>>> On the implementation level the ISM device is actually a piece of
>>>>> firmware and the write to a remote DMB is a special case of our PCI
>>>>> Store Block instruction (no real MMIO on s390, instead there are
>>>>> special instructions). Sadly there are a few more quirks but in
>>>>> principle you can think of it as redirecting writes to a part of the
>>>>> ISM PCI device's BAR to the DMB in the peer system if that makes sense.
>>>>> There's of course also a mechanism to cause an interrupt on the
>>>>> receiver as the write completes.
>>>>
>>>> So the s390 details are interesting, but as you say, it is
>>>> special. Ideally, all the special should be hidden away inside the
>>>> driver.
>>>
>>> Yes and it will be. There are some exceptions e.g. for vfio-pci pass-
>>> through but that's not unusual and why there is already the concept of
>>> vfio-pci extension module.
>>>
>>>>
>>>> So please take a step back. What is the abstract model?
>>>
>>> I think my high level description may be a good start. The abstract
>>> model is the ability to share a memory buffer (DMB) for writing by a
>>> communication partner, authenticated by a DMB Token. Plus stuff like
>>> triggering an interrupt on write or explicit trigger. Then Alibaba
>>> added optional support for what they called attaching the buffer which
>>> means it becomes truly shared between the peers but which IBM's ISM
>>> can't support. Plus a few more optional pieces such as VLANs, PNETIDs
>>> don't ask. The idea for the new layer then is to define this interface
>>> with operations and documentation.
>>>
>>>>
>>>> Can the abstract model be mapped onto CLX? Could it be used with a GPU
>>>> vRAM? SoC with real shared memory between a pool of CPUs.
>>>>
>>>> 	Andrew
>>>
>>> I'd think that yes, one could implement such a mechanism on top of CXL
>>> as well as on SoC. Or even with no special hardware between a host and
>>> a DPU (e.g. via PCIe endpoint framework). Basically anything that can
>>> DMA and IRQs between two OS instances.
>>
>> Is DMA part of the abstract model? That would suggest a true shared
>> memory system is excluded, since that would not require DMA.
>>
>> Maybe take a look at subsystems like USB, I2C.
>>
>> usb_submit_urb(struct urb *urb, gfp_t mem_flags)
>>
>> An URB is a data structure with a block of memory associated with it,
>> contains the detail to pass to the USB device.
>>
>> i2c_transfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
>>
>> *msgs points to num of messages which get transferred to/from the I2C
>> device.
>>
>> Could the high level API look like this? No DMA, no IRQ, no concept of
>> a somewhat shared memory. Just an API which asks for a message to be
>> sent to the other end? struct urb has some USB concepts in it, struct
>> i2c_msg has some I2C concepts in it. A struct ism_msg would follow the
>> same pattern, but does it need to care about the DMA, the IRQ, the
>> memory which is semi shared?
> 
> I don’t have a clear picture of what the API should look like yet, but I
> believe it’s possible to avoid DMA and IRQ. In fact, the current data
> transfer API, ops->move_data() in include/linux/ism.h, already abstracts
> away the DMA and IRQ details.
> 

What is central to ISM is the DMB (Direct Memory Buffer), i.e. the concept
that a DMB is dedicated to one writer and one reader. It is owned by the
reader, and only this writer can write at any offset into the DMB
(controlled by the fabric). (Technically, the reader can read and write
as well.)

So for the client API I think the core functions are
- move_data(*data, target_dmb_token, offset) - called by the sending
client, to move data at some offset into a DMB.
- receive_signal(dmb_token, some_signal_info) - called by the ism layer
to signal the client, that this DMB needs handling. (currently called
handle_irq)

I would not want to abstract that to a message-based API, because then
we need queues etc., and we are almost at a net_device. None of that is
needed for ism, because DMBs are dedicated to a single writer (who has
the responsibility).


> One thing we cannot hide, however, is whether the operation is zero-copy
> or copy. This distinction is important because we can reuse the data at
> different times in copy mode and zero-copy mode.
> 
> Best regards,
> Dust
> 

See my reply on 4/7, as well as Niklas' reply. Currently you can always
re-use the send buffer once move_data() returns. So zero-copy can be a
property of the DMB (attach() function, etc.).


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-20 12:03                       ` Alexandra Winter
@ 2025-01-20 16:01                         ` Andrew Lunn
  2025-01-20 17:25                           ` Alexandra Winter
  0 siblings, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2025-01-20 16:01 UTC (permalink / raw)
  To: Alexandra Winter
  Cc: dust.li, Niklas Schnelle, Julian Ruess, Wenjia Zhang, Jan Karcher,
	Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

> What is central to ISM is the DMB (Direct Memory Buffer). The concept
> that there is a DMB dedicated to one writer and one reader. It is owned
> by the reader and only this writer can write at any offset into the DMB
> (Fabric controlled). (Reader can technically read/write as well).
> 
> So for the client API I think the core functions are
> - move_data(*data, target_dmb_token, offset) - called by the sending
> client, to move data at some offset into a DMB.

Missing a length, but otherwise this looks O.K.
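Spelled out with the length included, the two core client functions could look roughly like this. A minimal sketch, assuming a toy in-memory fabric that maps DMB tokens to buffers; sketch_move_data(), the receive_signal callback field and the fixed DMB table are all hypothetical and only mirror the names used in this thread. The single-writer rule is not enforced here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical receiver-owned DMB as seen by a toy fabric. */
struct sketch_dmb {
	uint64_t token;
	unsigned char *buf;
	unsigned int len;
	/* receive_signal(): called by the layer when this DMB needs handling */
	void (*receive_signal)(uint64_t token, unsigned int signal_info);
};

#define SKETCH_MAX_DMBS 4
static struct sketch_dmb *sketch_fabric[SKETCH_MAX_DMBS];

static struct sketch_dmb *sketch_lookup(uint64_t token)
{
	for (int i = 0; i < SKETCH_MAX_DMBS; i++)
		if (sketch_fabric[i] && sketch_fabric[i]->token == token)
			return sketch_fabric[i];
	return NULL;
}

/* Sender side: move size bytes from data to offset in the target DMB,
 * optionally signalling the owner afterwards. */
static int sketch_move_data(uint64_t target_dmb_token, unsigned int offset,
			    const void *data, unsigned int size,
			    bool signal, unsigned int signal_info)
{
	struct sketch_dmb *dmb = sketch_lookup(target_dmb_token);

	if (!dmb)
		return -ENOENT;
	if (offset > dmb->len || size > dmb->len - offset)
		return -EINVAL;
	memcpy(dmb->buf + offset, data, size);
	if (signal && dmb->receive_signal)
		dmb->receive_signal(dmb->token, signal_info);
	return 0;
}

/* Demo receiver callback: remember which DMB was signalled. */
static uint64_t demo_signalled_token;

static void demo_receive_signal(uint64_t token, unsigned int signal_info)
{
	(void)signal_info;
	demo_signalled_token = token;
}
```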

> - receive_signal(dmb_token, some_signal_info) - called by the ism layer
> to signal the client, that this DMB needs handling. (currently called
> handle_irq)

So there is no indication where in the DMB there is new content?

And when you say "This DMB" does that imply there are multiple DMB
shared between two peers?

Maybe I have the wrong idea about a DMB. I was thinking of maybe 64K
to a few megabytes of memory, in a memory which could truly be shared
by CPUs. But maybe a DMB is just a 4K page, and you have lots of them?
If you are 'faking' a shared memory with DMA, they can be anywhere in
the address space where the DMA engine can access them.

> I would not want to abstract that to a message based API, because then
> we need queues etc and are almost at a net_device. All that is not
> needed for ism, because DMBs are dedicated to a single writer (who has
> the responsibility).

But i assume there are "protocols" above this. You talked about
running a TTY over this. That should be standardized, so everybody
implements TTYs in exactly the same way. 

> > One thing we cannot hide, however, is whether the operation is zero-copy
> > or copy. This distinction is important because we can reuse the data at
> > different times in copy mode and zero-copy mode.

This needs more explanation. Are you talking about putting data into
the DMB, or moving the DMB to the peer?

If you have a DMA engine
moving stuff around, the data can be anywhere the DMA engine can
access. But if you have a true shared memory, ideally you want to
avoid copying into it.

Then you have the API used by your protocol drivers above. For a TTY
running at 9600 baud, a copy into the DMB does not matter. But if you
are talking about a network protocol stack on top, your copy from user
space to kernel space probably wants to go direct into the DMB. So
maybe your API also needs to include allocating/freeing DMBs in an
abstract way so it can hide the difference between true shared memory,
and kernel memory which can be DMAed?
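
A minimal userspace sketch of such an abstraction, assuming the device (not
the client) decides where the client builds its payload. All names
(struct demo_xfer_ops, tx_buf(), commit(), demo_send()) are hypothetical. The
"copy" backend stands in for a DMA-based device: the client fills a private
staging buffer and commit() moves it, with memcpy() standing in for the DMA
transfer. The "shared" backend hands out the DMB itself, so commit() has
nothing to do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Receiver-owned DMB memory (hypothetical model). */
struct demo_dmb {
	uint8_t *mem;
	size_t len;
};

/* Hypothetical device ops that hide how data gets into the DMB. */
struct demo_xfer_ops {
	/* Return the buffer the client should build its payload in. */
	uint8_t *(*tx_buf)(struct demo_dmb *dmb, uint8_t *staging);
	/* Make len bytes of the payload visible in the DMB. */
	void (*commit)(struct demo_dmb *dmb, const uint8_t *buf, size_t len);
};

/* Copy backend: payload is built in a private staging buffer ... */
static uint8_t *copy_tx_buf(struct demo_dmb *dmb, uint8_t *staging)
{
	(void)dmb;
	return staging;
}

/* ... and moved afterwards (memcpy() stands in for the DMA transfer). */
static void copy_commit(struct demo_dmb *dmb, const uint8_t *buf, size_t len)
{
	memcpy(dmb->mem, buf, len);
}

/* Shared-memory backend: payload is built directly in the DMB. */
static uint8_t *shared_tx_buf(struct demo_dmb *dmb, uint8_t *staging)
{
	(void)staging;
	return dmb->mem;
}

static void shared_commit(struct demo_dmb *dmb, const uint8_t *buf, size_t len)
{
	/* Nothing to move: the data is already in place. */
	(void)dmb;
	(void)buf;
	(void)len;
}

static const struct demo_xfer_ops demo_copy_ops = { copy_tx_buf, copy_commit };
static const struct demo_xfer_ops demo_shared_ops = { shared_tx_buf, shared_commit };

/* Client side, identical for both backends. Caller ensures len <= dmb->len. */
static void demo_send(const struct demo_xfer_ops *ops, struct demo_dmb *dmb,
		      uint8_t *staging, const void *msg, size_t len)
{
	uint8_t *buf = ops->tx_buf(dmb, staging);

	memcpy(buf, msg, len);	/* e.g. the copy from user space */
	ops->commit(dmb, buf, len);
}
```

The client code in demo_send() does not change between backends, so the copy
vs. zero-copy difference stays below the API. A real design would also have to
define when the sender may reuse its buffer, which this sketch ignores.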

	Andrew



* Re: [RFC net-next 3/7] net/ism: Use uuid_t for ISM GID
  2025-01-15 19:55 ` [RFC net-next 3/7] net/ism: Use uuid_t for ISM GID Alexandra Winter
@ 2025-01-20 17:18   ` Simon Horman
  2025-01-22 14:46     ` Alexandra Winter
  0 siblings, 1 reply; 61+ messages in thread
From: Simon Horman @ 2025-01-20 17:18 UTC (permalink / raw)
  To: Alexandra Winter
  Cc: Wenjia Zhang, Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe,
	Tony Lu, Wen Gu, Peter Oberparleiter, David Miller,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle

On Wed, Jan 15, 2025 at 08:55:23PM +0100, Alexandra Winter wrote:
> SMC uses 64 Bit and 128 Bit Global Identifiers (GIDs)
> that need to be sent via the SMC protocol.
> When integers are used network endianness and host endianness
> need to be considered.
> 
> Avoid this in the ISM layer by using uuid_t byte arrays.
> Follow on patches could do the same change for SMC, for now
> conversion helper functions are introduced.
> 
> ISM-vPCI devices provide 64 Bit GIDs. Map them to ISM uuid_t GIDs
> like this:
>  _________________________________________
> | 64 Bit ISM-vPCI GID | 00000000_00000000 |
>  -----------------------------------------
> If interpreted as a UUID, this would be the UUID variant
> that is reserved for NCS backward compatibility. So it will not collide
> with UUIDs that were generated according to the standard.
> 
> Future ISM devices shall use real UUIDs as 128 Bit GIDs.
> 
> Note:
> - In this RFC patch smcd_gid is now moved back to smc.h,
>   a future patchset should avoid that.
> - ism_dmb and ism_event structs still contain 64 Bit rgid and info
>   fields. A future patch could change them to uuid_t gids. This
>   does not break anything, because ism_loopback does not use them.
> 
> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>

...

> diff --git a/net/smc/smc_ism.h b/net/smc/smc_ism.h
> index 6763133dd8d0..d041e5a7c459 100644
> --- a/net/smc/smc_ism.h
> +++ b/net/smc/smc_ism.h
> @@ -12,6 +12,7 @@
>  #include <linux/uio.h>
>  #include <linux/types.h>
>  #include <linux/mutex.h>
> +#include <linux/ism.h>
>  
>  #include "smc.h"
>  
> @@ -94,4 +95,24 @@ static inline bool smc_ism_is_loopback(struct smcd_dev *smcd)
>  	return (smcd->ops->get_chid(smcd) == 0xFFFF);
>  }
>  
> +static inline void copy_to_smcdgid(struct smcd_gid *sgid, uuid_t *igid)
> +{
> +	__be64 temp;
> +
> +	memcpy(&temp, igid, sizeof(sgid->gid));
> +	sgid->gid = ntohll(temp);
> +	memcpy(&temp, igid + sizeof(sgid->gid), sizeof(sgid->gid_ext));

Hi Alexandra,

The stride of the pointer arithmetic is the width of *igid (16 bytes),
so this access will be at an offset of:

   sizeof(*igid) * sizeof(sgid->gid) = 16 * 8 = 128 bytes

which is beyond the end of *igid.

I think the desired operation is to access an offset of 8 bytes, so
perhaps this is a way to achieve that, as the b field is a
16 byte array of u8:

	memcpy(&temp, igid->b + sizeof(sgid->gid), sizeof(sgid->gid_ext));


Flagged by W=1 builds with gcc-14 and clang-19, and by Smatch.

> +	sgid->gid_ext = ntohll(temp);
> +}
> +
> +static inline void copy_to_ismgid(uuid_t *igid, struct smcd_gid *sgid)
> +{
> +	__be64 temp;
> +
> +	temp = htonll(sgid->gid);
> +	memcpy(igid, &temp, sizeof(sgid->gid));
> +	temp = htonll(sgid->gid_ext);
> +	memcpy(igid + sizeof(sgid->gid), &temp, sizeof(sgid->gid_ext));

I believe there is a similar problem here too.

> +}
> +
>  #endif
> -- 
> 2.45.2
> 
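
A minimal userspace sketch of the two helpers with byte-offset arithmetic,
for illustration only. struct demo_uuid stands in for uuid_t (a struct
wrapping a 16-byte array b[]), struct demo_smcd_gid stands in for
struct smcd_gid, and the byte-wise big-endian helpers replace the
__be64/ntohll() round trip; in kernel code get_unaligned_be64() and
put_unaligned_be64() could be used instead. Indexing through igid->b advances
in bytes, whereas igid + 8 advances by 8 * sizeof(uuid_t) = 128 bytes.

```c
#include <stdint.h>

/* Stand-in for uuid_t: a struct wrapping a 16-byte array. */
struct demo_uuid {
	uint8_t b[16];
};

/* Stand-in for struct smcd_gid: two host-order 64-bit halves. */
struct demo_smcd_gid {
	uint64_t gid;
	uint64_t gid_ext;
};

/* Read a big-endian 64-bit value from p (no alignment needed). */
static uint64_t demo_get_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Store v at p in big-endian byte order (no alignment needed). */
static void demo_put_be64(uint8_t *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)v;
		v >>= 8;
	}
}

/* igid->b + 8 is a byte offset; igid + 8 would be 8 * 16 = 128 bytes. */
static void demo_copy_to_smcdgid(struct demo_smcd_gid *sgid,
				 const struct demo_uuid *igid)
{
	sgid->gid = demo_get_be64(igid->b);
	sgid->gid_ext = demo_get_be64(igid->b + sizeof(sgid->gid));
}

static void demo_copy_to_ismgid(struct demo_uuid *igid,
				const struct demo_smcd_gid *sgid)
{
	demo_put_be64(igid->b, sgid->gid);
	demo_put_be64(igid->b + sizeof(sgid->gid), sgid->gid_ext);
}
```

A 64-bit ISM-vPCI GID with gid_ext == 0 ends up in bytes 0..7 with bytes
8..15 zeroed, matching the layout described in the commit message.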


* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-20 16:01                         ` Andrew Lunn
@ 2025-01-20 17:25                           ` Alexandra Winter
  0 siblings, 0 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-20 17:25 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: dust.li, Niklas Schnelle, Julian Ruess, Wenjia Zhang, Jan Karcher,
	Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 20.01.25 17:01, Andrew Lunn wrote:
>> What is central to ISM is the DMB (Direct Memory Buffer). The concept
>> that there is a DMB dedicated to one writer and one reader. It is owned
>> by the reader and only this writer can write at any offset into the DMB
>> (Fabric controlled). (Reader can technically read/write as well).
>>
>> So for the client API I think the core functions are
>> - move_data(*data, target_dmb_token, offset) - called by the sending
>> client, to move data at some offset into a DMB.
> 
> Missing a length, but otherwise this looks O.K.


Right, move_data() has a length field. My bad.

> 
>> - receive_signal(dmb_token, some_signal_info) - called by the ism layer
>> to signal the client, that this DMB needs handling. (currently called
>> handle_irq)
> 
> So there is no indication where in the DMB there is new content?
> 

The existing ism implementations pass a bit mask in 'some_signal_info'
that can be used to signal which parts of the DMB have data to look at.
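
A minimal sketch of how a client could walk such a mask, assuming
(hypothetically) a 16-bit mask where bit i marks the i-th of 16 equally sized
regions of the DMB. The real encoding of 'some_signal_info' is device
specific; DEMO_NR_REGIONS and demo_dirty_regions() are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_REGIONS 16

/*
 * Hypothetical signal decoding: bit i of mask means region i of the DMB
 * (one of DEMO_NR_REGIONS equally sized regions) has new data. Stores the
 * byte offset of every flagged region in offs[] and returns how many were
 * flagged. Any tail beyond DEMO_NR_REGIONS * region_len is ignored.
 */
static unsigned int demo_dirty_regions(size_t dmb_len, uint16_t mask,
				       size_t offs[DEMO_NR_REGIONS])
{
	size_t region_len = dmb_len / DEMO_NR_REGIONS;
	unsigned int n = 0;

	for (unsigned int i = 0; i < DEMO_NR_REGIONS; i++) {
		if (mask & (1u << i))
			offs[n++] = i * region_len;
	}
	return n;
}
```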


> And when you say "This DMB" does that imply there are multiple DMB
> shared between two peers?
> 

Yes, there can be multiple DMBs between the same two peers. And/or an
ism device can provide multiple DMBs that are shared with different peers.


> Maybe i have the wrong idea about a DMB. I was thinking of maybe 64K
> to a few megabytes of memory, in memory which could truly be shared
> by CPUs. But maybe a DMB is just a 4K page, and you have lots of them?
> If you are 'faking' shared memory with DMA, they can be anywhere in
> the address space where the DMA engine can access them.
> 

More the latter, although they can be large if the client or
application wants to spend that much memory.

Which brings us back to the other thread: ISM may not be the best
name for this concept. MCD ('Memory Communication Device') was one
proposal without 'Shared' in the name...


>> I would not want to abstract that to a message based API, because then
>> we need queues etc and are almost at a net_device. All that is not
>> needed for ism, because DMBs are dedicated to a single writer (who has
>> the responsibility).
> 
> But i assume there are "protocols" above this. You talked about
> running a TTY over this. That should be standardized, so everybody
> implements TTYs in exactly the same way. 
> 

Yes, the 'clients' are the protocols above this.


>>> One thing we cannot hide, however, is whether the operation is zero-copy
>>> or copy. This distinction is important because we can reuse the data at
>>> different times in copy mode and zero-copy mode.
> 
> This needs more explanation. Are you talking about putting data into
> the DMB, or moving the DMB to the peer?
> 

The former: putting data into the DMB.
But yes, the concept of attached, no-copy DMBs that was introduced by
ism-loopback needs a better description.


> If you have a DMA engine
> moving stuff around, the data can be anywhere the DMA engine can
> access. But if you have a true shared memory, ideally you want to
> avoid copying into it.
> 
> Then you have the API used by your protocol drivers above. For a TTY
> running at 9600 baud, a copy into the DMB does not matter. But if you
> are talking about a network protocol stack on top, your copy from user
> space to kernel space probably wants to go direct into the DMB. So
> maybe your API also needs to include allocating/freeing DMBs in an
> abstract way so it can hide the difference between true shared memory,
> and kernel memory which can be DMAed?
> 
> 	Andrew
> 
> 
The ism_ops register_dmb() and unregister_dmb() are meant to provide
that API.
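
A minimal userspace sketch of that split, assuming register_dmb() is where a
device decides where the DMB memory lives and assigns the token the sender
will use. struct demo_ism_ops and the demo_* functions are hypothetical and
much simpler than the real ism_ops; calloc() stands in for DMA-able memory.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_dmb {
	uint64_t dmb_tok;	/* assigned by register_dmb() */
	uint8_t *cpu_addr;	/* memory chosen by the device */
	size_t dmb_len;		/* requested by the client */
};

/* Hypothetical subset of device ops; the device owns DMB placement. */
struct demo_ism_ops {
	int (*register_dmb)(struct demo_dmb *dmb);
	void (*unregister_dmb)(struct demo_dmb *dmb);
};

static uint64_t demo_next_tok = 1;

/* Device backed by ordinary (DMA-able) memory: calloc() stands in. */
static int demo_register_dmb(struct demo_dmb *dmb)
{
	dmb->cpu_addr = calloc(1, dmb->dmb_len);
	if (!dmb->cpu_addr)
		return -ENOMEM;
	dmb->dmb_tok = demo_next_tok++;
	return 0;
}

static void demo_unregister_dmb(struct demo_dmb *dmb)
{
	free(dmb->cpu_addr);
	dmb->cpu_addr = NULL;
	dmb->dmb_tok = 0;
}

static const struct demo_ism_ops demo_ops = {
	.register_dmb = demo_register_dmb,
	.unregister_dmb = demo_unregister_dmb,
};
```

A device with true shared memory would instead hand out a window of that
memory in its register_dmb(); the client only ever sees cpu_addr and dmb_tok.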


* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-20 10:28           ` Alexandra Winter
@ 2025-01-22  3:04             ` Dust Li
  2025-01-22 12:02               ` Alexandra Winter
  0 siblings, 1 reply; 61+ messages in thread
From: Dust Li @ 2025-01-22  3:04 UTC (permalink / raw)
  To: Alexandra Winter, Julian Ruess, Wenjia Zhang, Jan Karcher,
	Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On 2025-01-20 11:28:41, Alexandra Winter wrote:
>
>
>On 17.01.25 14:00, Alexandra Winter wrote:
>> 
>> 
>> On 17.01.25 03:13, Dust Li wrote:
>>>>>> Modular Approach: I've made the ism_loopback an independent kernel
>>>>>> module since dynamic enable/disable functionality is not yet supported
>>>>>> in SMC. Using insmod and rmmod for module management could provide the
>>>>>> flexibility needed in practical scenarios.
>>>>
>>>> With this proposal ism_loopback is just another ism device and SMC-D will
>>>> handle removal just like ism_client.remove(ism_dev) of other ism devices.
>>>>
>>>> But I understand that net/smc/ism_loopback.c today does not provide enable/disable,
>>>> which is a big disadvantage, I agree. The ism layer is prepared for dynamic
>>>> removal by ism_dev_unregister(). In case of this RFC that would only happen
>>>> in case of rmmod ism. Which should be improved.
>>>> One way to do that would be a separate ism_loopback kernel module, like you say.
>>>> Today ism_loopback is only 10k LOC, so I'd be fine with leaving it in the ism module.
>>>> I also think it is a great way for testing any ISM client, so it has benefit for
>>>> anybody using the ism module.
>>>> Another way would be e.g. an 'enable' entry in the sysfs of the loopback device.
>>>> (Once we agree if and how to represent ism devices in general in sysfs).
>>> This works for me as well. I think it would be better to implement this
>>> within the common ISM layer, rather than duplicating the code in each
>>> device. Similar to how it's done in netdevice.
>>>
>>> Best regards,
>>> Dust
>> 
>> 
>> Is there a specific example for enable/disable in the netdevice code, you have in mind?
>> Or do you mean in general how netdevice provides a common layer?
>> Yes, everything that is common for all devices should be provided by the network layer.
>
>
>Dust, for some reason you did not 'Reply-all':

Oh, sorry I didn't notice that

>Dust Li wrote:
>> I think dev_close()/dev_open() are the high-level APIs, while
>> ndo_stop()/ndo_open() are the underlying device operations that we
>> can reference.
>
>
>I hear you, it can be beneficial to have a way for upper layers to
>enable/disable an ism device.
>But all this is typically a tricky area. The device driver can also have
>reasons to enable/disable a device, then hardware could do that or even
>hotplug a device. Error recovery on different levels may want to run a
>disable/enable sequence as a reset, etc. And all this has potential for
>deadlocks.
>All this is rather trivial for ism-loopback, as there is not much of a
>lower layer.
>ism-vpci already has 'HW' / device driver configure on/off and device
>add/remove.
>For a future ism-virtio, the hypervisor may want to add/remove devices.
>
>I wonder what could be the simplest definition of an enable/disable for
>the ism layer, that we can start with? More sophisticated functionality
>can always be added later.
>Maybe support for add/remove ism-device by the device driver is
>sufficient as a starting point?

I agree; this can be added later. For now, we can simply support
unregistering a device from the device driver. Which is already handled
by ism_dev_unregister() IIUC.

However, I believe we still need an API and the ability to enable or
disable ISM devices from the upper layer. For example, if we want to
disable a specific ISM device (such as the loopback device) in SMC, we
should not do so by disabling the loopback device at the device layer,
as it may also serve other clients beyond SMC.

Furthermore, I think removing the loopback from the loopback device
driver seems unnecessary, since we should support that from the upper
layer in the future.

Best regards,
Dust


* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-22  3:04             ` Dust Li
@ 2025-01-22 12:02               ` Alexandra Winter
  2025-01-22 12:05                 ` Alexandra Winter
  0 siblings, 1 reply; 61+ messages in thread
From: Alexandra Winter @ 2025-01-22 12:02 UTC (permalink / raw)
  To: dust.li, Julian Ruess, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 22.01.25 04:04, Dust Li wrote:
> On 2025-01-20 11:28:41, Alexandra Winter wrote:
>>
>>
>> On 17.01.25 14:00, Alexandra Winter wrote:
>>>
>>>
>>> On 17.01.25 03:13, Dust Li wrote:
>>>>>>> Modular Approach: I've made the ism_loopback an independent kernel
>>>>>>> module since dynamic enable/disable functionality is not yet supported
>>>>>>> in SMC. Using insmod and rmmod for module management could provide the
>>>>>>> flexibility needed in practical scenarios.
>>>>>
>>>>> With this proposal ism_loopback is just another ism device and SMC-D will
>>>>> handle removal just like ism_client.remove(ism_dev) of other ism devices.
>>>>>
>>>>> But I understand that net/smc/ism_loopback.c today does not provide enable/disable,
>>>>> which is a big disadvantage, I agree. The ism layer is prepared for dynamic
>>>>> removal by ism_dev_unregister(). In case of this RFC that would only happen
>>>>> in case of rmmod ism. Which should be improved.
>>>>> One way to do that would be a separate ism_loopback kernel module, like you say.
>>>>> Today ism_loopback is only 10k LOC, so I'd be fine with leaving it in the ism module.
>>>>> I also think it is a great way for testing any ISM client, so it has benefit for
>>>>> anybody using the ism module.
>>>>> Another way would be e.g. an 'enable' entry in the sysfs of the loopback device.
>>>>> (Once we agree if and how to represent ism devices in general in sysfs).
>>>> This works for me as well. I think it would be better to implement this
>>>> within the common ISM layer, rather than duplicating the code in each
>>>> device. Similar to how it's done in netdevice.
>>>>
>>>> Best regards,
>>>> Dust
>>>
>>>
>>> Is there a specific example for enable/disable in the netdevice code, you have in mind?
>>> Or do you mean in general how netdevice provides a common layer?
>>> Yes, everything that is common for all devices should be provided by the network layer.
>>
>>
>> Dust, for some reason you did not 'Reply-all':
> 
> Oh, sorry I didn't notice that
> 
>> Dust Li wrote:
>>> I think dev_close()/dev_open() are the high-level APIs, while
>>> ndo_stop()/ndo_open() are the underlying device operations that we
>>> can reference.
>>
>>
>> I hear you, it can be beneficial to have a way for upper layers to
>> enable/disable an ism device.
>> But all this is typically a tricky area. The device driver can also have
>> reasons to enable/disable a device, then hardware could do that or even
>> hotplug a device. Error recovery on different levels may want to run a
>> disable/enable sequence as a reset, etc. And all this has potential for
>> deadlocks.
>> All this is rather trivial for ism-loopback, as there is not much of a
>> lower layer.
>> ism-vpci already has 'HW' / device driver configure on/off and device
>> add/remove.
>> For a future ism-virtio, the hypervisor may want to add/remove devices.
>>
>> I wonder what could be the simplest definition of an enable/disable for
>> the ism layer, that we can start with? More sophisticated functionality
>> can always be added later.
>> Maybe support for add/remove ism-device by the device driver is
>> sufficient as a starting point?
> 
> I agree; this can be added later. For now, we can simply support
> unregistering a device from the device driver. Which is already handled
> by ism_dev_unregister() IIUC.
> 
> However, I believe we still need an API and the ability to enable or
> disable ISM devices from the upper layer. For example, if we want to
> disable a specific ISM device (such as the loopback device) in SMC, we
> should not do so by disabling the loopback device at the device layer,
> as it may also serve other clients beyond SMC.


Just a thought: not all clients have to use all available ism devices.
The client could opt out without removing the device.
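
A minimal sketch of such an opt-out, assuming the layer offers every
registered device to every client through an add() callback and a client
simply declines the devices it does not want. All names are hypothetical. The
device itself stays registered, so another client (here a test client) can
still use ism-loopback.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_DEVS 8

struct demo_dev {
	const char *name;
	bool is_loopback;
};

/* Hypothetical client: add() returns false to opt out of a device. */
struct demo_client {
	const char *name;
	bool (*add)(const struct demo_dev *dev);
	const struct demo_dev *used[DEMO_MAX_DEVS];
	size_t nr_used;
};

/* Layer side: offer every registered device to the client. */
static void demo_offer_devices(struct demo_client *c,
			       const struct demo_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (c->nr_used < DEMO_MAX_DEVS && c->add(&devs[i]))
			c->used[c->nr_used++] = &devs[i];
	}
}

/* Example policies: use everything, or skip loopback devices. */
static bool demo_accept_all(const struct demo_dev *dev)
{
	(void)dev;
	return true;
}

static bool demo_skip_loopback(const struct demo_dev *dev)
{
	return !dev->is_loopback;
}
```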

> 
> Furthermore, I think removing the loopback from the loopback device
> driver seems unnecessary, since we should support that from the upper
> layer in the future.
> 
> Best regards,
> Dust


All good points. But it also shows that there are many options for how to
extend ism device handling in the upper layers / clients.
E.g. I can imagine that a loop macro ism_for_each_dev() might be nice...
I'd prefer to take one step at a time: start with a minimal useful ism
layer and extend it use case by use case.
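
A minimal sketch of what an ism_for_each_dev() style iterator could look
like, using a hand-rolled singly linked list so it compiles in userspace. The
names are hypothetical; an in-kernel version would more likely wrap
list_for_each_entry() and need the layer's device-list lock held while
iterating.

```c
#include <stddef.h>

struct demo_ism_dev {
	const char *name;
	struct demo_ism_dev *next;
};

/* Hypothetical device list kept by the layer (no locking in this sketch). */
static struct demo_ism_dev *demo_ism_devs;

static void demo_ism_dev_register(struct demo_ism_dev *dev)
{
	dev->next = demo_ism_devs;
	demo_ism_devs = dev;
}

static void demo_ism_dev_unregister(struct demo_ism_dev *dev)
{
	struct demo_ism_dev **pp;

	for (pp = &demo_ism_devs; *pp; pp = &(*pp)->next) {
		if (*pp == dev) {
			*pp = dev->next;
			dev->next = NULL;
			return;
		}
	}
}

/* Hypothetical iterator over all registered devices. */
#define demo_ism_for_each_dev(pos) \
	for ((pos) = demo_ism_devs; (pos); (pos) = (pos)->next)
```

With a list like this, a simple remove of ism_loopback for test variations is
just an unregister call, and clients walking the list no longer see it.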


* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-22 12:02               ` Alexandra Winter
@ 2025-01-22 12:05                 ` Alexandra Winter
  2025-01-22 14:10                   ` Dust Li
  0 siblings, 1 reply; 61+ messages in thread
From: Alexandra Winter @ 2025-01-22 12:05 UTC (permalink / raw)
  To: dust.li, Julian Ruess, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 22.01.25 13:02, Alexandra Winter wrote:
> 
> 
> On 22.01.25 04:04, Dust Li wrote:
>> On 2025-01-20 11:28:41, Alexandra Winter wrote:
>>>
>>>
>>> On 17.01.25 14:00, Alexandra Winter wrote:
>>>>
>>>>
>>>> On 17.01.25 03:13, Dust Li wrote:
>>>>>>>> Modular Approach: I've made the ism_loopback an independent kernel
>>>>>>>> module since dynamic enable/disable functionality is not yet supported
>>>>>>>> in SMC. Using insmod and rmmod for module management could provide the
>>>>>>>> flexibility needed in practical scenarios.
>>>>>>
>>>>>> With this proposal ism_loopback is just another ism device and SMC-D will
>>>>>> handle removal just like ism_client.remove(ism_dev) of other ism devices.
>>>>>>
>>>>>> But I understand that net/smc/ism_loopback.c today does not provide enable/disable,
>>>>>> which is a big disadvantage, I agree. The ism layer is prepared for dynamic
>>>>>> removal by ism_dev_unregister(). In case of this RFC that would only happen
>>>>>> in case of rmmod ism. Which should be improved.
>>>>>> One way to do that would be a separate ism_loopback kernel module, like you say.
>>>>>> Today ism_loopback is only 10k LOC, so I'd be fine with leaving it in the ism module.
>>>>>> I also think it is a great way for testing any ISM client, so it has benefit for
>>>>>> anybody using the ism module.
>>>>>> Another way would be e.g. an 'enable' entry in the sysfs of the loopback device.
>>>>>> (Once we agree if and how to represent ism devices in general in sysfs).
>>>>> This works for me as well. I think it would be better to implement this
>>>>> within the common ISM layer, rather than duplicating the code in each
>>>>> device. Similar to how it's done in netdevice.
>>>>>
>>>>> Best regards,
>>>>> Dust
>>>>
>>>>
>>>> Is there a specific example for enable/disable in the netdevice code, you have in mind?
>>>> Or do you mean in general how netdevice provides a common layer?
>>>> Yes, everything that is common for all devices should be provided by the network layer.
>>>
>>>
>>> Dust, for some reason you did not 'Reply-all':
>>
>> Oh, sorry I didn't notice that
>>
>>> Dust Li wrote:
>>>> I think dev_close()/dev_open() are the high-level APIs, while
>>>> ndo_stop()/ndo_open() are the underlying device operations that we
>>>> can reference.
>>>
>>>
>>> I hear you, it can be beneficial to have a way for upper layers to
>>> enable/disable an ism device.
>>> But all this is typically a tricky area. The device driver can also have
>>> reasons to enable/disable a device, then hardware could do that or even
>>> hotplug a device. Error recovery on different levels may want to run a
>>> disable/enable sequence as a reset, etc. And all this has potential for
>>> deadlocks.
>>> All this is rather trivial for ism-loopback, as there is not much of a
>>> lower layer.
>>> ism-vpci already has 'HW' / device driver configure on/off and device
>>> add/remove.
>>> For a future ism-virtio, the hypervisor may want to add/remove devices.
>>>
>>> I wonder what could be the simplest definition of an enable/disable for
>>> the ism layer, that we can start with? More sophisticated functionality
>>> can always be added later.
>>> Maybe support for add/remove ism-device by the device driver is
>>> sufficient as a starting point?
>>
>> I agree; this can be added later. For now, we can simply support
>> unregistering a device from the device driver. Which is already handled
>> by ism_dev_unregister() IIUC.
>>
>> However, I believe we still need an API and the ability to enable or
>> disable ISM devices from the upper layer. For example, if we want to
>> disable a specific ISM device (such as the loopback device) in SMC, we
>> should not do so by disabling the loopback device at the device layer,
>> as it may also serve other clients beyond SMC.
> 
> 
> Just a thought: not all clients have to use all available ism devices.
> The client could opt out without removing the device.
> 
>>
>> Furthermore, I think removing the loopback from the loopback device
>> driver seems unnecessary, since we should support that from the upper
>> layer in the future.


If it is not too much effort, I would like to have a simple remove for
ism_loopback soon, as it would allow for simple variations of testcases.


>>
>> Best regards,
>> Dust
> 
> 
> All good points. But it also shows that there are many options for how to
> extend ism device handling in the upper layers / clients.
> E.g. I can imagine that a loop macro ism_for_each_dev() might be nice...
> I'd prefer to take one step at a time: start with a minimal useful ism
> layer and extend it use case by use case.
> 



* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-22 12:05                 ` Alexandra Winter
@ 2025-01-22 14:10                   ` Dust Li
  0 siblings, 0 replies; 61+ messages in thread
From: Dust Li @ 2025-01-22 14:10 UTC (permalink / raw)
  To: Alexandra Winter, Julian Ruess, Wenjia Zhang, Jan Karcher,
	Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On 2025-01-22 13:05:57, Alexandra Winter wrote:
>
>
>On 22.01.25 13:02, Alexandra Winter wrote:
>> 
>> 
>> On 22.01.25 04:04, Dust Li wrote:
>>> On 2025-01-20 11:28:41, Alexandra Winter wrote:
>>>>
>>>>
>>>> On 17.01.25 14:00, Alexandra Winter wrote:
>>>>>
>>>>>
>>>>> On 17.01.25 03:13, Dust Li wrote:
>>>>>>>>> Modular Approach: I've made the ism_loopback an independent kernel
>>>>>>>>> module since dynamic enable/disable functionality is not yet supported
>>>>>>>>> in SMC. Using insmod and rmmod for module management could provide the
>>>>>>>>> flexibility needed in practical scenarios.
>>>>>>>
>>>>>>> With this proposal ism_loopback is just another ism device and SMC-D will
>>>>>>> handle removal just like ism_client.remove(ism_dev) of other ism devices.
>>>>>>>
>>>>>>> But I understand that net/smc/ism_loopback.c today does not provide enable/disable,
>>>>>>> which is a big disadvantage, I agree. The ism layer is prepared for dynamic
>>>>>>> removal by ism_dev_unregister(). In case of this RFC that would only happen
>>>>>>> in case of rmmod ism. Which should be improved.
>>>>>>> One way to do that would be a separate ism_loopback kernel module, like you say.
>>>>>>> Today ism_loopback is only 10k LOC, so I'd be fine with leaving it in the ism module.
>>>>>>> I also think it is a great way for testing any ISM client, so it has benefit for
>>>>>>> anybody using the ism module.
>>>>>>> Another way would be e.g. an 'enable' entry in the sysfs of the loopback device.
>>>>>>> (Once we agree if and how to represent ism devices in general in sysfs).
>>>>>> This works for me as well. I think it would be better to implement this
>>>>>> within the common ISM layer, rather than duplicating the code in each
>>>>>> device. Similar to how it's done in netdevice.
>>>>>>
>>>>>> Best regards,
>>>>>> Dust
>>>>>
>>>>>
>>>>> Is there a specific example for enable/disable in the netdevice code, you have in mind?
>>>>> Or do you mean in general how netdevice provides a common layer?
>>>>> Yes, everything that is common for all devices should be provided by the network layer.
>>>>
>>>>
>>>> Dust, for some reason you did not 'Reply-all':
>>>
>>> Oh, sorry I didn't notice that
>>>
>>>> Dust Li wrote:
>>>>> I think dev_close()/dev_open() are the high-level APIs, while
>>>>> ndo_stop()/ndo_open() are the underlying device operations that we
>>>>> can reference.
>>>>
>>>>
>>>> I hear you, it can be beneficial to have a way for upper layers to
>>>> enable/disable an ism device.
>>>> But all this is typically a tricky area. The device driver can also have
>>>> reasons to enable/disable a device, then hardware could do that or even
>>>> hotplug a device. Error recovery on different levels may want to run a
>>>> disable/enable sequence as a reset, etc. And all this has potential for
>>>> deadlocks.
>>>> All this is rather trivial for ism-loopback, as there is not much of a
>>>> lower layer.
>>>> ism-vpci already has 'HW' / device driver configure on/off and device
>>>> add/remove.
>>>> For a future ism-virtio, the hypervisor may want to add/remove devices.
>>>>
>>>> I wonder what could be the simplest definition of an enable/disable for
>>>> the ism layer, that we can start with? More sophisticated functionality
>>>> can always be added later.
>>>> Maybe support for add/remove ism-device by the device driver is
>>>> sufficient as a starting point?
>>>
>>> I agree; this can be added later. For now, we can simply support
>>> unregistering a device from the device driver. Which is already handled
>>> by ism_dev_unregister() IIUC.
>>>
>>> However, I believe we still need an API and the ability to enable or
>>> disable ISM devices from the upper layer. For example, if we want to
>>> disable a specific ISM device (such as the loopback device) in SMC, we
>>> should not do so by disabling the loopback device at the device layer,
>>> as it may also serve other clients beyond SMC.
>> 
>> 
>> Just a thought: not all clients have to use all available ism devices.
>> The client could opt out without removing the device.
>> 
>>>
>>> Furthermore, I think removing the loopback from the loopback device
>>> driver seems unnecessary, since we should support that from the upper
>>> layer in the future.
>
>
>If it is not too much effort, I would like to have a simple remove for
>ism_loopback soon, as it would allow for simple variations of testcases.

Yes, this is very useful for testing before we can do that from the
upper layer.

>
>
>>>
>>> Best regards,
>>> Dust
>> 
>> 
>> All good points. But it also shows that there are many options for how to
>> extend ism device handling in the upper layers / clients.
>> E.g. I can imagine that a loop macro ism_for_each_dev() might be nice...
>> I'd prefer to take one step at a time: start with a minimal useful ism
>> layer and extend it use case by use case.

That works for me.

Best regards,
Dust



* Re: [RFC net-next 3/7] net/ism: Use uuid_t for ISM GID
  2025-01-20 17:18   ` Simon Horman
@ 2025-01-22 14:46     ` Alexandra Winter
  0 siblings, 0 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-01-22 14:46 UTC (permalink / raw)
  To: Simon Horman
  Cc: Wenjia Zhang, Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe,
	Tony Lu, Wen Gu, Peter Oberparleiter, David Miller,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Julian Ruess, Niklas Schnelle, Thorsten Winkler, netdev,
	linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle



On 20.01.25 18:18, Simon Horman wrote:
>> +static inline void copy_to_smcdgid(struct smcd_gid *sgid, uuid_t *igid)
>> +{
>> +	__be64 temp;
>> +
>> +	memcpy(&temp, igid, sizeof(sgid->gid));
>> +	sgid->gid = ntohll(temp);
>> +	memcpy(&temp, igid + sizeof(sgid->gid), sizeof(sgid->gid_ext));
> Hi Alexandra,
> 
> The stride of the pointer arithmetic is the width of *igid (16 bytes),
> so this access will be at an offset of:
> 
>    sizeof(*igid) * sizeof(sgid->gid) = 16 * 8 = 128 bytes
> 
> which is beyond the end of *igid.


Duh, what a stupid mistake. Thank you.


> I think the desired operation is to access an offset of 8 bytes, so
> perhaps this is a way to achieve that, as the b field is a
> 16 byte array of u8:
> 
> 	memcpy(&temp, igid->b + sizeof(sgid->gid), sizeof(sgid->gid_ext));

I propose to keep the
memcpy(&temp, (u8 *)igid + sizeof(sgid->gid), sizeof(sgid->gid_ext));
like in the original net/smc/smc_loopback.c


> Flagged by W=1 builds with gcc-14 and clang-19, and by Smatch.
> 
>> +	sgid->gid_ext = ntohll(temp);
>> +}

I actually overlooked it in my smatch run (too many old warnings), but I
cannot get W=1 to flag it. I'll try to improve my setup.




* Re: [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions
  2025-01-20 10:34     ` Niklas Schnelle
@ 2025-01-22 15:02       ` Dust Li
  0 siblings, 0 replies; 61+ messages in thread
From: Dust Li @ 2025-01-22 15:02 UTC (permalink / raw)
  To: Niklas Schnelle, Alexandra Winter, Wenjia Zhang, Jan Karcher,
	Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Julian Ruess, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On 2025-01-20 11:34:21, Niklas Schnelle wrote:
>On Mon, 2025-01-20 at 14:32 +0800, Dust Li wrote:
>> On 2025-01-15 20:55:24, Alexandra Winter wrote:
>> > Note that in this RFC this patch is not complete, future versions
>> > of this patch need to contain comments for all ism_ops.
>> > Especially signal_event() and handle_event() need a good generic
>> > description.
>> > 
>> > Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
>> > ---
>> > include/linux/ism.h | 115 ++++++++++++++++++++++++++++++++++++++++----
>> > 1 file changed, 105 insertions(+), 10 deletions(-)
>> > 
>> > diff --git a/include/linux/ism.h b/include/linux/ism.h
>> > index 50975847248f..bc165d077071 100644
>> > --- a/include/linux/ism.h
>> > +++ b/include/linux/ism.h
>> > @@ -13,11 +13,26 @@
>> > #include <linux/workqueue.h>
>> > #include <linux/uuid.h>
>> > 
>> > -/* The remote peer rgid can use dmb_tok to write into this buffer. */
>> > +/*
>> > + * DMB - Direct Memory Buffer
>> > + * ==========================
>> > + * An ism client provides a DMB as input buffer for a local receiving
>> > + * ism device for exactly one (remote) sending ism device. Only this
>> > + * sending device can send data into this DMB using move_data(). Sender
>> > + * and receiver can be the same device.
>> > + * TODO: Alignment and length rules (CPU and DMA). Device specific?
>> > + */
>> > struct ism_dmb {
>> > +	/* dmb_tok - Token for this dmb
>> > +	 * Used by remote sender to address this dmb.
>> > +	 * Provided by ism fabric in register_dmb().
>> > +	 * Unique per ism fabric.
>> > +	 */
>> > 	u64 dmb_tok;
>> > +	/* rgid - GID of designated remote sending device */
>> > 	u64 rgid;
>> > 	u32 dmb_len;
>> > +	/* sba_idx - Index of this DMB on this receiving device */
>> > 	u32 sba_idx;
>> > 	u32 vlan_valid;
>> > 	u32 vlan_id;
>> > @@ -25,6 +40,8 @@ struct ism_dmb {
>> > 	dma_addr_t dma_addr;
>> > };
>> > 
>> > +/* ISM event structure (currently device type specific) */
>> > +// TODO: Define and describe generic event properties
>> > struct ism_event {
>> > 	u32 type;
>> > 	u32 code;
>> > @@ -33,38 +50,89 @@ struct ism_event {
>> > 	u64 info;
>> > };
>> > 
>> > +//TODO: use enum typedef
>> > #define ISM_EVENT_DMB	0
>> > #define ISM_EVENT_GID	1
>> > #define ISM_EVENT_SWR	2
>> > 
>> > struct ism_dev;
>> > 
>> > +/*
>> > + * ISM clients
>> > + * ===========
>> > + * All ism clients have access to all ism devices
>> > + * and must provide the following functions to be called by
>> > + * ism device drivers:
>> > + */
>> > struct ism_client {
>> > +	/* client name for logging and debugging purposes */
>> > 	const char *name;
>> > +	/**
>> > +	 *  add() - add an ism device
>> > +	 *  @dev: device that was added
>> > +	 *
>> > +	 * Will be called during ism_register_client() for all existing
>> > +	 * ism devices and whenever a new ism device is registered.
>> > +	 * *dev is valid until ism_client->remove() is called.
>> > +	 */
>> > 	void (*add)(struct ism_dev *dev);
>> > +	/**
>> > +	 * remove() - remove an ism device
>> > +	 * @dev: device to be removed
>> > +	 *
>> > +	 * Will be called whenever an ism device is unregistered.
>> > +	 * Before this call the device is already inactive: It will
>> > +	 * no longer call client handlers.
>> > +	 * The client must not access *dev after this call.
>> > +	 */
>> > 	void (*remove)(struct ism_dev *dev);
>> > +	/**
>> > +	 * handle_event() - Handle control information sent by device
>> > +	 * @dev: device reporting the event
>> > +	 * @event: ism event structure
>> > +	 */
>> > 	void (*handle_event)(struct ism_dev *dev, struct ism_event *event);
>> > -	/* Parameter dmbemask contains a bit vector with updated DMBEs, if sent
>> > -	 * via ism_move_data(). Callback function must handle all active bits
>> > -	 * indicated by dmbemask.
>> > +	/**
>> > +	 * handle_irq() - Handle signalling of a DMB
>> > +	 * @dev: device that owns the dmb
>> > +	 * @bit: sba_idx=idx of the ism_dmb that got signalled
>> > +	 *	TODO: Pass a priv pointer to ism_dmb instead of 'bit'(?)
>> > +	 * @dmbemask: ism signalling mask of the dmb
>> > +	 *
>> > +	 * Handle signalling of a dmb that was registered by this client
>> > +	 * for this device.
>> > +	 * The ism device can coalesce multiple signalling triggers into a
>> > +	 * single call of handle_irq(). dmbemask can be used to indicate
>> > +	 * different kinds of triggers.
>> > 	 */
>> > 	void (*handle_irq)(struct ism_dev *dev, unsigned int bit, u16 dmbemask);
>> > -	/* Private area - don't touch! */
>> > +	/* client index - provided by ism layer */
>> > 	u8 id;
>> > };
>> > 
>> > int ism_register_client(struct ism_client *client);
>> > int  ism_unregister_client(struct ism_client *client);
>> > 
>> > +//TODO: Pair descriptions with functions
>> > +/*
>> > + * ISM devices
>> > + * ===========
>> > + */
>> > /* Mandatory operations for all ism devices:
>> >  * int (*query_remote_gid)(struct ism_dev *dev, uuid_t *rgid,
>> >  *	                   u32 vid_valid, u32 vid);
>> >  *	Query whether remote GID rgid is reachable via this device and this
>> >  *	vlan id. Vlan id is only checked if vid_valid != 0.
>> > + *	Returns 0 if remote gid is reachable.
>> >  *
>> >  * int (*register_dmb)(struct ism_dev *dev, struct ism_dmb *dmb,
>> >  *			    void *client);
>> > - *	Register an ism_dmb buffer for this device and this client.
>> > + *	Allocate and register an ism_dmb buffer for this device and this client.
>> > + *	The following fields of ism_dmb must be valid:
>> > + *	rgid, dmb_len, vlan_*; Optionally: requested sba_idx (non-zero)
>> > + *	Upon return the following fields will be valid: dmb_tok, sba_idx
>> > + *		cpu_addr, dma_addr (if applicable)
>> > + *	Returns zero on success
>> >  *
>> >  * int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>> >  *	Unregister an ism_dmb buffer
>> > @@ -81,10 +149,15 @@ int  ism_unregister_client(struct ism_client *client);
>> >  * u16 (*get_chid)(struct ism_dev *dev);
>> >  *	Returns ism fabric identifier (channel id) of this device.
>> >  *	Only devices on the same ism fabric can communicate.
>> > - *	chid is unique per HW system, except for 0xFFFF, which denotes
>> > - *	an ism_loopback device that can only communicate with itself.
>> > - *	Use chid for fast negative checks, but only query_remote_gid()
>> > - *	can give a reliable positive answer.
>> > + *	chid is unique per HW system. Use chid for fast negative checks,
>> > + *	but only query_remote_gid() can give a reliable positive answer:
>> > + *	Different chid: ism is not possible
>> > + *	Same chid: ism traffic may be possible or not
>> > + *		   (e.g. different HW systems)
>> > + *	EXCEPTION: A value of 0xFFFF denotes an ism_loopback device
>> > + *		that can only communicate with itself. Use GID or
>> > + *		query_remote_gid() to determine whether sender and
>> > + *		receiver use the same ism_loopback device.
>> >  *
>> >  * struct device* (*get_dev)(struct ism_dev *dev);
>> >  *
>> > @@ -109,6 +182,28 @@ struct ism_ops {
>> > 	int (*register_dmb)(struct ism_dev *dev, struct ism_dmb *dmb,
>> > 			    struct ism_client *client);
>> > 	int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>> > +	/**
>> > +	 * move_data() - write into a remote dmb
>> > +	 * @dev: Local sending ism device
>> > +	 * @dmb_tok: Token of the remote dmb
>> > +	 * @idx: signalling index
>> > +	 * @sf: signalling flag;
>> > +	 *      if true, idx will be turned on at target ism interrupt mask
>> > +	 *      and target device will be signalled, if required.
>> > +	 * @offset: offset within target dmb
>> > +	 * @data: pointer to data to be sent
>> > +	 * @size: length of data to be sent
>> > +	 *
>> > +	 * Use dev to write data of size at offset into a remote dmb
>> > +	 * identified by dmb_tok. Data is moved synchronously, *data can
>> > +	 * be freed when this function returns.
>> 
>> When considering the API, I found this comment may be incorrect.
>> 
>> IIUC, in copy mode for PCI ISM devices, the CPU only tells the
>> device to perform a DMA copy. As a result, when this function returns,
>> the device may not have completed the DMA copy.
>
>For the s390 ISM device the statement is true. The move_data() function
>does a PCI Store Block instruction, which is both the write on the
>sender side and, synchronously, the device's DMA write on the
>receiver side. So when the PCI Store Block instruction completes, the
>data has been cache-coherently written to the receiver DMB. And yes,
>full synchronicity would be impossible with the posted writes of real
>PCIe.

Thank you guys for the detailed explanation of s390! I thought it was
like a normal PCI device. @Niklas, @Alexandra, @Julian


>
>That said when it comes to API design I think you have a great point
>here in that we need to decide if this synchronicity should be baked
>into the move_data() API. I think we instead want to only guarantee a
>weaker rule, that is, the source buffer can be re-used after the move.
>This to me is also aligned with the word "move" here, in that the data
>has been moved after the call, not merely registered to be moved. This
>could be achieved with a real PCIe device by copying the data or by
>waiting on completion. If we ever get devices which need to wait on
>completion it may indeed be better to have a separate completion step
>in the API too. Then again I think the concept of having a single "move
>data" step is somewhat central to ISM and I'd hate to lose that
>simplicity.

Agree.

For SMC, it already handles zero-copy and copy modes differently. In
copy mode, the current API seems appropriate, but for zero-copy mode, we
may need to extend it.

Taking the loopback device as an example, SMC requires the move_data()
function to copy the CDC message into the DMB. However, copying the payload
data from the source to the destination DMB is not necessary, since both
share the same buffer in zero-copy mode. Instead, SMC needs to signal the
peer that the data transfer is complete, so an API for notification is
required.

One solution might be to retain the move_data() function and introduce
something like a notify() API. Alternatively, if we don't want to extend
the API, we could use something like move_data(ismdev, data=NULL,
length=0, ..., signal=1).
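The two options can be sketched with a mock device. Everything here is hypothetical: struct mock_dmb, mock_move_data() and mock_notify(). Only the parameter roles (idx, sf, offset, data, size) follow the move_data() kernel-doc quoted above. The sketch shows that a notify() can be a thin wrapper around a zero-length, signalling move, so both variants end up on the same code path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical receive buffer; not the real struct ism_dmb. */
struct mock_dmb {
	uint8_t buf[64];
	uint16_t signal_mask;	/* one pending bit per signalling index */
};

/* Hypothetical move_data(): copy @size bytes to @offset, then, if @sf is
 * set, mark bit @idx as pending (standing in for signalling the receiver). */
static int mock_move_data(struct mock_dmb *dmb, unsigned int idx, bool sf,
			  size_t offset, const void *data, size_t size)
{
	assert(dmb);
	if (idx >= 16)
		return -EINVAL;
	if (offset > sizeof(dmb->buf) || size > sizeof(dmb->buf) - offset)
		return -EINVAL;
	/* memcpy() with a NULL source is undefined even for size 0. */
	if (size)
		memcpy(dmb->buf + offset, data, size);
	if (sf)
		dmb->signal_mask |= (uint16_t)(1u << idx);
	return 0;
}

/* Variant 1: a separate notify() op. Variant 2 is calling the
 * move_data() line below directly: data=NULL, size=0, sf=true. */
static int mock_notify(struct mock_dmb *dmb, unsigned int idx)
{
	return mock_move_data(dmb, idx, true, 0, NULL, 0);
}
```

A size-0 call must respect one detail: calling memcpy() with a NULL source is undefined behaviour in C even when the length is zero. That is why the copy is guarded by `if (size)`.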


>
>I've been thinking also about a possible copy mode in a virtio-ism.
>That could be useful if we wanted to use virtio-ism between memory
>partitioned guests, or if one wanted to transparently proxy virtio-ism
>over s390 ISM to span multiple KVM hosts. And I think such a mode could
>still work with a single "move data" step and I'd love to have that in
>any future virtio-ism spec.

I'm not opposed to that.

>
>> 
>> In zero-copy mode for loopback, the source and destination share the
>> same buffer. If the source rewrites the buffer, the destination may
>> encounter corrupted data. The source should only reuse the data after
>> the destination has finished reading it.
>
>I think there are two potential overwrite scenarios here.
>
>1. The sender re-uses the source data buffer i.e. the @data buffer of
>the move_data() call. On s390 ISM this is fine because the data was
>copied out and into the destination DMB during the call. This could
>typically become an issue if the device DMA reads directly from @data
>after the move_data() call completed.
>
>2. The sender does subsequent move_data() overwriting data in the
>destination DMB before the receiver has read the data. This can happen
>on s390 ISM too and needs to be prevented by DMB access rules.
>
>For the move_data() call I think that even in a "shared i.e. same page
>DMB" scenario move_data() must still do a copy out of the @data buffer
>into the shared DMB. Otherwise it really wouldn't "move" data and it
>would be a very weird API since @data is just a buffer not some kind of
>descriptor. In other words I think scenario 1 shouldn't be possible in
>either copy or shared DMB mode by the semantics of move_data().

For the shared DMB, I now agree with you. Perhaps we can keep the
move_data() behavior consistent with memcpy(3) or memmove(3) and leave
the usage to the upper layer.
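Here is a minimal sketch of that memcpy(3)-like contract. It uses a hypothetical struct copy_dmb and sketch_move_data() rather than any real ism_ops. The sketch captures scenario 1: once the call returns, the bytes live in the DMB, so the sender may overwrite its source buffer. It deliberately does nothing about scenario 2. A second move to the same offset silently replaces unread data, which is why that case needs DMB access rules in the upper layer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical DMB for illustration only. */
struct copy_dmb {
	char buf[32];
};

/* Copy-out semantics: when this returns, the bytes are in the DMB, so the
 * sender may reuse or free @data (scenario 1), as after memcpy(3).
 * Nothing here stops a later call from overwriting unread data. */
static void sketch_move_data(struct copy_dmb *dmb, size_t offset,
			     const void *data, size_t size)
{
	assert(offset <= sizeof(dmb->buf));
	assert(size <= sizeof(dmb->buf) - offset);
	memcpy(dmb->buf + offset, data, size);
}
```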

Best regards,
Dust


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-18 15:31           ` Dust Li
@ 2025-01-28 16:04             ` Alexandra Winter
  2025-02-10  5:08               ` Dust Li
  0 siblings, 1 reply; 61+ messages in thread
From: Alexandra Winter @ 2025-01-28 16:04 UTC (permalink / raw)
  To: dust.li, Niklas Schnelle, Julian Ruess, Wenjia Zhang, Jan Karcher,
	Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Thorsten Winkler, netdev, linux-s390, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Simon Horman



On 18.01.25 16:31, Dust Li wrote:
> On 2025-01-17 11:38:39, Niklas Schnelle wrote:
>> On Fri, 2025-01-17 at 10:13 +0800, Dust Li wrote:
>>>>
>> ---8<---
>>>> Here are some of my thoughts on the matter:
>>>>>>
>>>>>> Naming and Structure: I suggest we refer to it as SHD (Shared Memory
>>>>>> Device) instead of ISM (Internal Shared Memory). 
>>>>
>>>>
>>>> So where does the 'H' come from? If you want to call it Shared Memory _D_evice?
>>>
>>> Oh, I was trying to refer to SHM (shared memory file in userspace, see man
>>> shm_open(3)). SMD is also OK.
>>>
>>>>
>>>>
>>>> To my knowledge, a
>>>>>> "Shared Memory Device" better encapsulates the functionality we're
>>>>>> aiming to implement. 
>>>>
>>>>
>>>> Could you explain why that would be better?
>>>> 'Internal Shared Memory' is supposed to be a bit of a counterpart to the
>>>> Remote 'R' in RoCE. Not the greatest name, but it is used already by our ISM
>>>> devices and by ism_loopback. So what is the benefit in changing it?
>>>
>>> I believe that if we are going to separate and refine the code, and add
>>> a common subsystem, we should choose the most appropriate name.
>>>
>>> In my opinion, "ISM" doesn’t quite capture what the device provides.
>>> Since we’re adding a "Device" that enables different entities (such as
>>> processes or VMs) to perform shared memory communication, I think a more
>>> fitting name would be better. If you have any alternative suggestions,
>>> I’m open to them.
>>
>> I kept thinking about this a bit and I'd like to propose yet another
>> name for this group of devices: Memory Communication Devices (MCD)
>>
>> One important point I see is that there is a bit of a misnomer in the
>> existing ISM name in that our ISM device does in fact *not* share
>> memory in the common sense of the "shared memory" wording. Instead it
>> copies data between partitions of memory that share a common
>> cache/memory hierarchy while not sharing the memory itself. loopback-
>> ism and a possibly future virtio-ism on the other hand would share
>> memory in the "shared memory" sense. Though I'd very much hope they
>> will retain a copy mode to allow use in partition scenarios.
>>
>> With that background I think the common denominator between them and
>> the main idea behind ISM is that they facilitate communication via
>> memory buffers and very simple and reliable copy/share operations. I
>> think this would also capture our planned use-case of devices (TTYs,
>> block devices, framebuffers + HID etc) provided by a peer on top of
>> such a memory communication device.
> 
> Makes sense, I agree with MCD.
> 
> Best regards,
> Dust
> 



In the discussion with Andrew Lunn, it showed that
a) we need an abstract description of 'ISM' devices (noted)
b) DMBs (Direct Memory Buffers) are a critical differentiator.

So what do you think of Direct Memory Communication (DMC) as the class name for these devices?

I don't have a strong preference (we could also stay with ISM). But DMC may be a bit more
concrete than MCD or ISM.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism
  2025-01-15 19:55 ` [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism Alexandra Winter
  2025-01-20  3:55   ` Dust Li
@ 2025-02-06 17:36   ` Julian Ruess
  2025-02-10 10:39     ` Alexandra Winter
  1 sibling, 1 reply; 61+ messages in thread
From: Julian Ruess @ 2025-02-06 17:36 UTC (permalink / raw)
  To: Alexandra Winter, Wenjia Zhang, Jan Karcher, Gerd Bayer,
	Halil Pasic, D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman

On Wed Jan 15, 2025 at 8:55 PM CET, Alexandra Winter wrote:
> The first stage of ism_loopback was implemented as part of the
> SMC module [1]. Now that we have the ism layer, provide
> access to the ism_loopback device to all ism clients.
>
> Move ism_loopback.* from net/smc to net/ism.
> The following changes are required to ism_loopback.c:
> - Change ism_lo_move_data() to no longer schedule an smcd receive tasklet,
> but instead call ism_client->handle_irq().
> Note: In this RFC patch ism_loopback is not fully generic.
>   The smc-d client uses attached buffers for moves without signalling
>   and not-attached buffers for moves with signalling.
>   ism_lo_move_data() must not rely on that assumption.
>   ism_lo_move_data() must be able to handle more than one ism client.
>
> In addition the following changes are required to unify ism_loopback and
> ism_vp:
>
> In ism layer and ism_vpci:
> ism_loopback is not backed by a pci device, so use dev instead of pdev in
> ism_dev.
>
> In smc-d:
> in smcd_alloc_dev():
> - use kernel memory instead of device memory for smcd_dev and smcd->conn.
>         An alternative would be to ask device to alloc the memory.
> - use different smcd_ops and max_dmbs for ism_vp and ism_loopback.
>     A future patch can change smc-d to directly use ism_ops instead of
>     smcd_ops.
> - use ism dev_name instead of pci dev name for ism_evt_wq name
> - allocate an event workqueue for ism_loopback, although it currently does
>   not generate events.
>
> Link: https://lore.kernel.org/linux-kernel//20240428060738.60843-1-guwen@linux.alibaba.com/ [1]
>
> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
> ---
>  drivers/s390/net/ism.h     |   6 +-
>  drivers/s390/net/ism_drv.c |  31 ++-
>  include/linux/ism.h        |  59 +++++
>  include/net/smc.h          |   4 +-
>  net/ism/Kconfig            |  13 ++
>  net/ism/Makefile           |   1 +
>  net/ism/ism_loopback.c     | 366 +++++++++++++++++++++++++++++++
>  net/ism/ism_loopback.h     |  59 +++++
>  net/ism/ism_main.c         |  11 +-
>  net/smc/Kconfig            |  13 --
>  net/smc/Makefile           |   1 -
>  net/smc/af_smc.c           |  12 +-
>  net/smc/smc_ism.c          | 108 +++++++---
>  net/smc/smc_loopback.c     | 427 -------------------------------------
>  net/smc/smc_loopback.h     |  60 ------
>  15 files changed, 606 insertions(+), 565 deletions(-)
>  create mode 100644 net/ism/ism_loopback.c
>  create mode 100644 net/ism/ism_loopback.h
>  delete mode 100644 net/smc/smc_loopback.c
>  delete mode 100644 net/smc/smc_loopback.h
>

...

> diff --git a/net/ism/ism_loopback.c b/net/ism/ism_loopback.c
> new file mode 100644
> index 000000000000..47e5ef355dd7
> --- /dev/null
> +++ b/net/ism/ism_loopback.c
> @@ -0,0 +1,366 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + *  Functions for loopback-ism device.
> + *
> + *  Copyright (c) 2024, Alibaba Inc.
> + *
> + *  Author: Wen Gu <guwen@linux.alibaba.com>
> + *          Tony Lu <tonylu@linux.alibaba.com>
> + *
> + */
> +
> +#include <linux/bitops.h>
> +#include <linux/device.h>
> +#include <linux/ism.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +
> +#include "ism_loopback.h"
> +
> +#define ISM_LO_V2_CAPABLE	0x1 /* loopback-ism acts as ISMv2 */
> +#define ISM_LO_SUPPORT_NOCOPY	0x1
> +#define ISM_DMA_ADDR_INVALID	(~(dma_addr_t)0)
> +
> +static const char ism_lo_dev_name[] = "loopback-ism";
> +/* global loopback device */
> +static struct ism_lo_dev *lo_dev;
> +
> +static int ism_lo_query_rgid(struct ism_dev *ism, uuid_t *rgid,
> +			     u32 vid_valid, u32 vid)
> +{
> +	/* rgid should be the same as lgid; vlan is not supported */
> +	if (!vid_valid && uuid_equal(rgid, &ism->gid))
> +		return 0;
> +	return -ENETUNREACH;
> +}

This vid_valid check breaks ism-loopback for me.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-28 16:04             ` Alexandra Winter
@ 2025-02-10  5:08               ` Dust Li
  2025-02-10  9:38                 ` Alexandra Winter
  0 siblings, 1 reply; 61+ messages in thread
From: Dust Li @ 2025-02-10  5:08 UTC (permalink / raw)
  To: Alexandra Winter, Niklas Schnelle, Julian Ruess, Wenjia Zhang,
	Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Thorsten Winkler, netdev, linux-s390, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Simon Horman

On 2025-01-28 17:04:53, Alexandra Winter wrote:
>
>
>On 18.01.25 16:31, Dust Li wrote:
>> On 2025-01-17 11:38:39, Niklas Schnelle wrote:
>>> On Fri, 2025-01-17 at 10:13 +0800, Dust Li wrote:
>>>>>
>>> ---8<---
>>>>> Here are some of my thoughts on the matter:
>>>>>>>
>>>>>>> Naming and Structure: I suggest we refer to it as SHD (Shared Memory
>>>>>>> Device) instead of ISM (Internal Shared Memory). 
>>>>>
>>>>>
>>>>> So where does the 'H' come from? If you want to call it Shared Memory _D_evice?
>>>>
>>>> Oh, I was trying to refer to SHM (shared memory file in userspace, see man
>>>> shm_open(3)). SMD is also OK.
>>>>
>>>>>
>>>>>
>>>>> To my knowledge, a
>>>>>>> "Shared Memory Device" better encapsulates the functionality we're
>>>>>>> aiming to implement. 
>>>>>
>>>>>
>>>>> Could you explain why that would be better?
>>>>> 'Internal Shared Memory' is supposed to be a bit of a counterpart to the
>>>>> Remote 'R' in RoCE. Not the greatest name, but it is used already by our ISM
>>>>> devices and by ism_loopback. So what is the benefit in changing it?
>>>>
>>>> I believe that if we are going to separate and refine the code, and add
>>>> a common subsystem, we should choose the most appropriate name.
>>>>
>>>> In my opinion, "ISM" doesn’t quite capture what the device provides.
>>>> Since we’re adding a "Device" that enables different entities (such as
>>>> processes or VMs) to perform shared memory communication, I think a more
>>>> fitting name would be better. If you have any alternative suggestions,
>>>> I’m open to them.
>>>
>>> I kept thinking about this a bit and I'd like to propose yet another
>>> name for this group of devices: Memory Communication Devices (MCD)
>>>
>>> One important point I see is that there is a bit of a misnomer in the
>>> existing ISM name in that our ISM device does in fact *not* share
>>> memory in the common sense of the "shared memory" wording. Instead it
>>> copies data between partitions of memory that share a common
>>> cache/memory hierarchy while not sharing the memory itself. loopback-
>>> ism and a possibly future virtio-ism on the other hand would share
>>> memory in the "shared memory" sense. Though I'd very much hope they
>>> will retain a copy mode to allow use in partition scenarios.
>>>
>>> With that background I think the common denominator between them and
>>> the main idea behind ISM is that they facilitate communication via
>>> memory buffers and very simple and reliable copy/share operations. I
>>> think this would also capture our planned use-case of devices (TTYs,
>>> block devices, framebuffers + HID etc) provided by a peer on top of
>>> such a memory communication device.
>> 
>> Makes sense, I agree with MCD.
>> 
>> Best regards,
>> Dust
>> 
>
>

Hi Winter,

Sorry for the late reply; we were on break for the Chinese Spring
Festival.

>
>In the discussion with Andrew Lunn, it showed that
>a) we need an abstract description of 'ISM' devices (noted)
>b) DMBs (Direct Memory Buffers) are a critical differentiator.
>
>So what do you think of Direct Memory Communication (DMC) as the class name for these devices?
>
>I don't have a strong preference (we could also stay with ISM). But DMC may be a bit more
>concrete than MCD or ISM.

I personally prefer MCD over Direct Memory Communication (DMC).

For loopback or Virtio-ISM, DMC seems like a good choice. However, for
IBM ISM, since there's a DMA copy involved, it doesn’t seem truly "Direct,"
does it?

Additionally, since we are providing a device, MCD feels like a more
fitting choice, as it aligns better with the concept of a "device."

Best regards,
Dust

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-02-10  5:08               ` Dust Li
@ 2025-02-10  9:38                 ` Alexandra Winter
  2025-02-11  1:57                   ` Dust Li
  2025-02-16 15:40                   ` Wen Gu
  0 siblings, 2 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-02-10  9:38 UTC (permalink / raw)
  To: dust.li, Niklas Schnelle, Julian Ruess, Wenjia Zhang, Jan Karcher,
	Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Thorsten Winkler, netdev, linux-s390, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Simon Horman



On 10.02.25 06:08, Dust Li wrote:
> On 2025-01-28 17:04:53, Alexandra Winter wrote:
>>
>>
>> On 18.01.25 16:31, Dust Li wrote:
>>> On 2025-01-17 11:38:39, Niklas Schnelle wrote:
>>>> On Fri, 2025-01-17 at 10:13 +0800, Dust Li wrote:
>>>>>>
>>>> ---8<---
>>>>>> Here are some of my thoughts on the matter:
>>>>>>>>
>>>>>>>> Naming and Structure: I suggest we refer to it as SHD (Shared Memory
>>>>>>>> Device) instead of ISM (Internal Shared Memory). 
>>>>>>
>>>>>>
>>>>>> So where does the 'H' come from? If you want to call it Shared Memory _D_evice?
>>>>>
>>>>> Oh, I was trying to refer to SHM (shared memory file in userspace, see man
>>>>> shm_open(3)). SMD is also OK.
>>>>>
>>>>>>
>>>>>>
>>>>>> To my knowledge, a
>>>>>>>> "Shared Memory Device" better encapsulates the functionality we're
>>>>>>>> aiming to implement. 
>>>>>>
>>>>>>
>>>>>> Could you explain why that would be better?
>>>>>> 'Internal Shared Memory' is supposed to be a bit of a counterpart to the
>>>>>> Remote 'R' in RoCE. Not the greatest name, but it is used already by our ISM
>>>>>> devices and by ism_loopback. So what is the benefit in changing it?
>>>>>
>>>>> I believe that if we are going to separate and refine the code, and add
>>>>> a common subsystem, we should choose the most appropriate name.
>>>>>
>>>>> In my opinion, "ISM" doesn’t quite capture what the device provides.
>>>>> Since we’re adding a "Device" that enables different entities (such as
>>>>> processes or VMs) to perform shared memory communication, I think a more
>>>>> fitting name would be better. If you have any alternative suggestions,
>>>>> I’m open to them.
>>>>
>>>> I kept thinking about this a bit and I'd like to propose yet another
>>>> name for this group of devices: Memory Communication Devices (MCD)
>>>>
>>>> One important point I see is that there is a bit of a misnomer in the
>>>> existing ISM name in that our ISM device does in fact *not* share
>>>> memory in the common sense of the "shared memory" wording. Instead it
>>>> copies data between partitions of memory that share a common
>>>> cache/memory hierarchy while not sharing the memory itself. loopback-
>>>> ism and a possibly future virtio-ism on the other hand would share
>>>> memory in the "shared memory" sense. Though I'd very much hope they
>>>> will retain a copy mode to allow use in partition scenarios.
>>>>
>>>> With that background I think the common denominator between them and
>>>> the main idea behind ISM is that they facilitate communication via
>>>> memory buffers and very simple and reliable copy/share operations. I
>>>> think this would also capture our planned use-case of devices (TTYs,
>>>> block devices, framebuffers + HID etc) provided by a peer on top of
>>>> such a memory communication device.
>>>
>>> Makes sense, I agree with MCD.
>>>
>>> Best regards,
>>> Dust
>>>
>>
>>
> 
> Hi Winter,
> 
> Sorry for the late reply; we were on break for the Chinese Spring
> Festival.
> 
>>
>> In the discussion with Andrew Lunn, it showed that
>> a) we need an abstract description of 'ISM' devices (noted)
>> b) DMBs (Direct Memory Buffers) are a critical differentiator.
>>
>> So what do you think of Direct Memory Communication (DMC) as the class name for these devices?
>>
>> I don't have a strong preference (we could also stay with ISM). But DMC may be a bit more
>> concrete than MCD or ISM.
> 
> I personally prefer MCD over Direct Memory Communication (DMC).
> 
> For loopback or Virtio-ISM, DMC seems like a good choice. However, for
> IBM ISM, since there's a DMA copy involved, it doesn’t seem truly "Direct,"
> does it?
> 
> Additionally, since we are providing a device, MCD feels like a more
> fitting choice, as it aligns better with the concept of a "device."
> 
> Best regards,
> Dust

Thank you for your thoughts, Dust.
For me the 'D' as in 'direct' is not so much about the number of copies, but more about the
aspect that you can directly write at any offset into the buffer, i.e. no queues.
More like the D in DMA or RDMA.

I am preparing a talk for netdev in March about this subject, and the more I work on it,
the more it seems to me that the buffers ('B'), which are
a) only authorized for a single remote device and
b) accessible at any offset,
are the important differentiator compared to other virtual devices.
So maybe 'D' for Dedicated?
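The two buffer properties a) and b) above can be sketched as a single write check. struct dedicated_dmb and sketch_dmb_write() are hypothetical and much simpler than struct ism_dmb. The rgid field plays the role of the "designated remote sending device" from the kernel-doc in this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified DMB: only the fields the two properties need. */
struct dedicated_dmb {
	uint64_t rgid;	/* GID of the one remote device allowed to write */
	size_t len;
	uint8_t *buf;
};

/* a) writes from any device other than rgid are refused;
 * b) the designated sender may write at any in-bounds offset, in any
 *    order, since there is no queue. */
static int sketch_dmb_write(struct dedicated_dmb *dmb, uint64_t sender_gid,
			    size_t offset, const void *data, size_t size)
{
	assert(dmb && dmb->buf);
	if (sender_gid != dmb->rgid)
		return -EACCES;
	if (offset > dmb->len || size > dmb->len - offset)
		return -EINVAL;
	if (size)
		memcpy(dmb->buf + offset, data, size);
	return 0;
}
```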

I even came up with
dibs - Dedicated Internal Buffer Sharing or
dibc - Dedicated Internal Buffer Communication
(ok, I like the sound and look of the 'I'. But being on the same hardware as opposed
to RDMA is also an important aspect.)


MCD - 'memory communication device' sounds rather vague to me. But if it is the
smallest common denominator, i.e. the only thing we can all agree on, I could live with it.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism
  2025-02-06 17:36   ` Julian Ruess
@ 2025-02-10 10:39     ` Alexandra Winter
  0 siblings, 0 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-02-10 10:39 UTC (permalink / raw)
  To: Julian Ruess, Wenjia Zhang, Jan Karcher, Gerd Bayer, Halil Pasic,
	D. Wythe, Tony Lu, Wen Gu, Peter Oberparleiter, David Miller,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 06.02.25 18:36, Julian Ruess wrote:
>> +static int ism_lo_query_rgid(struct ism_dev *ism, uuid_t *rgid,
>> +			     u32 vid_valid, u32 vid)
>> +{
>> +	/* rgid should be the same as lgid; vlan is not supported */
>> +	if (!vid_valid && uuid_equal(rgid, &ism->gid))
>> +		return 0;
>> +	return -ENETUNREACH;
>> +}
> This vid_valid check breaks ism-loopback for me.


oops, I also get:
> smc_chk -C 10.44.30.50
[1] 967189
Test with target IP 10.44.30.50 and port 37373
  Live test (SMC-D and SMC-R)
Server started on port 37373
     Failed  (TCP fallback), reasons:
          Client:        0x05000000   Peer declined during handshake
          Server:        0x03030007   No SMC-Dv2 device found, but required

Sorry about that.
Current upstream smc_loopback just ignores vid_valid in smc_lo_query_rgid(),
but I'm not sure that is the best way to handle that.
I'll investigate and make sure it works in v2.
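Here is a minimal user-space sketch of the relaxed check. It assumes a stand-in uuid_t and a hypothetical struct lo_dev_sketch that holds only the local GID. It follows the behaviour described above for upstream smc_lo_query_rgid(): only the GID is compared, and vid_valid/vid are ignored. Whether that is the right long-term handling is still the open question here, so treat this as one option rather than the v2 fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } uuid_t;	/* stand-in for the kernel type */

/* Hypothetical loopback device state: just the local GID. */
struct lo_dev_sketch {
	uuid_t gid;
};

/* Loopback can only reach itself, so a remote GID is reachable iff it equals
 * the local GID. VLAN parameters are deliberately ignored here. */
static int lo_query_rgid_sketch(const struct lo_dev_sketch *lo,
				const uuid_t *rgid, uint32_t vid_valid,
				uint32_t vid)
{
	assert(lo && rgid);
	(void)vid_valid;
	(void)vid;
	if (memcmp(rgid->b, lo->gid.b, sizeof(rgid->b)) == 0)
		return 0;
	return -ENETUNREACH;
}
```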

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-02-10  9:38                 ` Alexandra Winter
@ 2025-02-11  1:57                   ` Dust Li
  2025-02-16 15:40                   ` Wen Gu
  1 sibling, 0 replies; 61+ messages in thread
From: Dust Li @ 2025-02-11  1:57 UTC (permalink / raw)
  To: Alexandra Winter, Niklas Schnelle, Julian Ruess, Wenjia Zhang,
	Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu, Wen Gu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Thorsten Winkler, netdev, linux-s390, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Simon Horman

On 2025-02-10 10:38:27, Alexandra Winter wrote:
>
>
>On 10.02.25 06:08, Dust Li wrote:
>> On 2025-01-28 17:04:53, Alexandra Winter wrote:
>>>
>>>
>>> On 18.01.25 16:31, Dust Li wrote:
>>>> On 2025-01-17 11:38:39, Niklas Schnelle wrote:
>>>>> On Fri, 2025-01-17 at 10:13 +0800, Dust Li wrote:
>>>>>>>
>>>>> ---8<---
>>>>>>> Here are some of my thoughts on the matter:
>>>>>>>>>
>>>>>>>>> Naming and Structure: I suggest we refer to it as SHD (Shared Memory
>>>>>>>>> Device) instead of ISM (Internal Shared Memory). 
>>>>>>>
>>>>>>>
>>>>>>> So where does the 'H' come from? If you want to call it Shared Memory _D_evice?
>>>>>>
>>>>>> Oh, I was trying to refer to SHM (shared memory file in userspace, see man
>>>>>> shm_open(3)). SMD is also OK.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> To my knowledge, a
>>>>>>>>> "Shared Memory Device" better encapsulates the functionality we're
>>>>>>>>> aiming to implement. 
>>>>>>>
>>>>>>>
>>>>>>> Could you explain why that would be better?
>>>>>>> 'Internal Shared Memory' is supposed to be a bit of a counterpart to the
>>>>>>> Remote 'R' in RoCE. Not the greatest name, but it is used already by our ISM
>>>>>>> devices and by ism_loopback. So what is the benefit in changing it?
>>>>>>
>>>>>> I believe that if we are going to separate and refine the code, and add
>>>>>> a common subsystem, we should choose the most appropriate name.
>>>>>>
>>>>>> In my opinion, "ISM" doesn’t quite capture what the device provides.
>>>>>> Since we’re adding a "Device" that enables different entities (such as
>>>>>> processes or VMs) to perform shared memory communication, I think a more
>>>>>> fitting name would be better. If you have any alternative suggestions,
>>>>>> I’m open to them.
>>>>>
>>>>> I kept thinking about this a bit and I'd like to propose yet another
>>>>> name for this group of devices: Memory Communication Devices (MCD)
>>>>>
>>>>> One important point I see is that there is a bit of a misnomer in the
>>>>> existing ISM name in that our ISM device does in fact *not* share
>>>>> memory in the common sense of the "shared memory" wording. Instead it
>>>>> copies data between partitions of memory that share a common
>>>>> cache/memory hierarchy while not sharing the memory itself. loopback-
>>>>> ism and a possibly future virtio-ism on the other hand would share
>>>>> memory in the "shared memory" sense. Though I'd very much hope they
>>>>> will retain a copy mode to allow use in partition scenarios.
>>>>>
>>>>> With that background I think the common denominator between them and
>>>>> the main idea behind ISM is that they facilitate communication via
>>>>> memory buffers and very simple and reliable copy/share operations. I
>>>>> think this would also capture our planned use-case of devices (TTYs,
>>>>> block devices, framebuffers + HID etc) provided by a peer on top of
>>>>> such a memory communication device.
>>>>
>>>> Makes sense, I agree with MCD.
>>>>
>>>> Best regards,
>>>> Dust
>>>>
>>>
>>>
>> 
>> Hi Winter,
>> 
>> Sorry for the late reply; we were on break for the Chinese Spring
>> Festival.
>> 
>>>
>>> In the discussion with Andrew Lunn, it showed that
>>> a) we need an abstract description of 'ISM' devices (noted)
>>> b) DMBs (Direct Memory Buffers) are a critical differentiator.
>>>
>>> So what do you think of Direct Memory Communication (DMC) as class name for these devices?
>>>
>>> I don't have a strong preference (we could also stay with ISM). But DMC may be a bit more
>>> concrete than MCD or ISM.
>> 
>> I personally prefer MCD over Direct Memory Communication (DMC).
>> 
>> For loopback or Virtio-ISM, DMC seems like a good choice. However, for
>> IBM ISM, since there's a DMA copy involved, it doesn’t seem truly "Direct,"
>> does it?
>> 
>> Additionally, since we are providing a device, MCD feels like a more
>> fitting choice, as it aligns better with the concept of a "device."
>> 
>> Best regards,
>> Dust
>
>Thank you for your thoughts, Dust.
>For me, the 'D' as in 'direct' is not so much about the number of copies, but more about the
>aspect, that you can directly write at any offset into the buffer. I.e. no queues.
>More like the D in DMA or RDMA.

Thanks for your explanation. I think I understand you now.
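The quoted reply describes the 'direct' property as being able to write at any offset into the buffer, with no queues. A hedged userspace sketch can illustrate this by contrasting an offset-addressed buffer with an append-only FIFO. The struct names, the 64-byte size and the error codes are assumptions for illustration only. They are not part of the proposed ism API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DMB_SIZE 64 /* hypothetical buffer size for the sketch */

/* Offset-addressed buffer: the writer chooses where the data lands. */
struct dmb {
	unsigned char data[DMB_SIZE];
};

/* Write len bytes at an arbitrary offset; fails only if out of bounds. */
int dmb_write(struct dmb *dmb, size_t offset, const void *src, size_t len)
{
	assert(dmb && src);
	if (offset > DMB_SIZE || len > DMB_SIZE - offset)
		return -EINVAL;
	memcpy(dmb->data + offset, src, len);
	return 0;
}

/* Queue-style alternative: data can only be appended at the tail,
 * and a reader would consume it in order. */
struct fifo {
	unsigned char data[DMB_SIZE];
	size_t tail;
};

int fifo_push(struct fifo *q, const void *src, size_t len)
{
	assert(q && src);
	if (len > DMB_SIZE - q->tail)
		return -ENOSPC;
	memcpy(q->data + q->tail, src, len);
	q->tail += len;
	return 0;
}
```

With the offset-addressed buffer, writes can land at offset 10 and then at offset 0, in any order. The FIFO has no way to say where the data should go.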

>
>I am preparing a talk for netdev in March about this subject, and the more I work on it,
>it seems to me that the buffers ('B'), that are
>a) only authorized for a single remote device and
>b) can be accessed at any offset
>are the important differentiator compared to other virtual devices.
>So maybe 'D' for Dedicated?
>
>I even came up with
>dibs - Dedicated Internal Buffer Sharing or
>dibc - Dedicated Internal Buffer Communication
>(ok, I like the sound and look of the 'I'. But being on the same hardware as opposed
>to RDMA is also an important aspect.)
>
>
>MCD - 'memory communication device' sounds rather vague to me. But if it is the
>smallest common denominator, i.e. the only thing we can all agree on, I could live with it.
>

I've thought about it a bit more. Since DMC is at the same level as RDMA
and fits well with the "D" in SMC-D, either DMC or MCD works for me.

Best regards,
Dust

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-01-16 16:17     ` Alexandra Winter
                         ` (2 preceding siblings ...)
  2025-01-17 15:06       ` Andrew Lunn
@ 2025-02-16 15:38       ` Wen Gu
  3 siblings, 0 replies; 61+ messages in thread
From: Wen Gu @ 2025-02-16 15:38 UTC (permalink / raw)
  To: Alexandra Winter, Julian Ruess, dust.li, Wenjia Zhang,
	Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Niklas Schnelle, Thorsten Winkler, netdev, linux-s390,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Simon Horman



On 2025/1/17 00:17, Alexandra Winter wrote:
> 
> 
> On 16.01.25 12:55, Julian Ruess wrote:
>> On Thu Jan 16, 2025 at 10:32 AM CET, Dust Li wrote:
>>> On 2025-01-15 20:55:20, Alexandra Winter wrote:
>>>
>>> Hi Winter,
>>>
>>> I'm fully supportive of the refactor!
> 
> 
> Thank you very much Dust Li for joining the discussion.
> 
> 
>>> Interestingly, I developed a similar RFC code about a month ago while
>>> working on enhancing internal communication between guest and host
>>> systems.
> 
> 
> But you did not send that out, did you?
> I hope I did not overlook an earlier proposal by you.
> 
> 
> Here are some of my thoughts on the matter:
>>>
>>> Naming and Structure: I suggest we refer to it as SHD (Shared Memory
>>> Device) instead of ISM (Internal Shared Memory).
> 
> 
> So where does the 'H' come from? If you want to call it Shared Memory _D_evice?
> 
> 
> To my knowledge, a
>>> "Shared Memory Device" better encapsulates the functionality we're
>>> aiming to implement.
> 
> 
> Could you explain why that would be better?
> 'Internal Shared Memory' is supposed to be a bit of a counterpart to the
> Remote 'R' in RoCE. Not the greatest name, but it is used already by our ISM
> devices and by ism_loopback. So what is the benefit in changing it?
> 
> 
> It might be beneficial to place it under
>>> drivers/shd/ and register it as a new class under /sys/class/shd/. That
>>> said, my initial draft also adopted the ISM terminology for simplicity.
>>
>> I'm not sure if we really want to introduce a new name for
>> the already existing ISM device. For me, having two names
>> for the same thing just adds additional complexity.
>>
>> I would go for /sys/class/ism
>>
>>>
>>> Modular Approach: I've made the ism_loopback an independent kernel
>>> module since dynamic enable/disable functionality is not yet supported
>>> in SMC. Using insmod and rmmod for module management could provide the
>>> flexibility needed in practical scenarios.
> 
> 
> With this proposal ism_loopback is just another ism device and SMC-D will
> handle removal just like ism_client.remove(ism_dev) of other ism devices.
> 
> But I understand that net/smc/ism_loopback.c today does not provide enable/disable,
> which is a big disadvantage, I agree. The ism layer is prepared for dynamic
> removal by ism_dev_unregister(). In case of this RFC that would only happen
> in case of rmmod ism. Which should be improved.
> One way to do that would be a separate ism_loopback kernel module, like you say.
> Today ism_loopback is only 10k LOC, so I'd be fine with leaving it in the ism module.
> I also think it is a great way for testing any ISM client, so it has benefit for
> anybody using the ism module.
> Another way would be e.g. an 'enable' entry in the sysfs of the loopback device.
> (Once we agree if and how to represent ism devices in general in sysfs).
> 
>>>
>>> Abstraction of ISM Device Details: I propose we abstract the ISM device
>>> details by providing SMC with helper functions. These functions could
>>> encapsulate ism->ops, making the implementation cleaner and more
>>> intuitive. This way, the struct ism_device would mainly serve its
>>> implementers, while the upper helper functions offer a streamlined
>>> interface for SMC.
>>>
>>> Structuring and Naming: I recommend embedding the structure of ism_ops
>>> directly within ism_dev rather than using a pointer. Additionally,
>>> renaming it to ism_device_ops could enhance clarity and consistency.
>>>
>>>
>>>> This RFC is about providing a generic shim layer between all kinds of
>>>> ism devices and all kinds of ism users.
>>>>
>>>> Benefits:
>>>> - Cleaner separation of ISM and SMC-D functionality
>>>> - simpler and less module dependencies
>>>> - Clear interface definition.
>>>> - Extendable for future devices and clients.
>>>
>>> Fully agree.
>>>
>>>>
> [...]
>>>>
>>>> Ideas for next steps:
>>>> ---------------------
>>>> - sysfs representation? e.g. as /sys/class/ism ?
>>>> - provide a full-fledged ism loopback interface
>>>>     (runtime enable/disable, sysfs device, ..)
>>>
>>> I think it's better if we can make this common for all ISM devices.
>>> but yeah, that should be the next step.
> 
> 
> The s390 ism_vpci devices are already backed by struct pci_dev.
> And I assume that would be represented in sysfs somehow like:
> /sys/class/ism/ism_vp0/device -> /sys/devices/<pci bus no>/<pci dev no>
> so there is an
> /sys/class/ism/<ism dev name>/device/enable entry already,
> because there is /sys/devices/<pci bus no>/<pci dev no>/enable today.
> 
> I remember Wen Gu's first proposal for ism_loopback had a device
> in /sys/devices/virtual/ and had an 'active' entry to enable/disable.
> Something like that could be linked to /sys/class/ism/ism_lo/device.
> 

Yes, the previous proposal can be found in [1]. And the hierarchy
you mentioned makes sense to me.

[1] https://lore.kernel.org/netdev/20240111120036.109903-1-guwen@linux.alibaba.com/
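The quoted part of this thread describes a removal flow: a device goes away via ism_dev_unregister(), and each registered client, such as SMC-D, is notified through its remove() callback. That flow can be modelled roughly as follows. This is a userspace sketch under stated assumptions. The model_* names, the fixed-size client array, the struct layouts and the example client are hypothetical, not the RFC's include/linux/ism.h.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_CLIENTS 4 /* hypothetical limit for the sketch */

struct model_dev {
	const char *name;
	int registered;
};

/* Simplified client (e.g. SMC-D): only add/remove notifications. */
struct model_client {
	const char *name;
	void (*add)(struct model_dev *dev);
	void (*remove)(struct model_dev *dev);
};

static struct model_client *clients[MODEL_MAX_CLIENTS];
static size_t nr_clients;

/* Simplification: devices registered before the client are not replayed. */
int model_register_client(struct model_client *client)
{
	if (nr_clients == MODEL_MAX_CLIENTS)
		return -1;
	clients[nr_clients++] = client;
	return 0;
}

void model_dev_register(struct model_dev *dev)
{
	dev->registered = 1;
	for (size_t i = 0; i < nr_clients; i++)
		if (clients[i]->add)
			clients[i]->add(dev);
}

/* Removing a device (on rmmod, or via a possible sysfs 'enable' knob)
 * notifies every client so it can drop its references. */
void model_dev_unregister(struct model_dev *dev)
{
	assert(dev->registered);
	for (size_t i = 0; i < nr_clients; i++)
		if (clients[i]->remove)
			clients[i]->remove(dev);
	dev->registered = 0;
}

/* Example client standing in for SMC-D: counts the devices it uses. */
static int smcd_devices;
static void smcd_add(struct model_dev *dev) { (void)dev; smcd_devices++; }
static void smcd_remove(struct model_dev *dev) { (void)dev; smcd_devices--; }
struct model_client smcd_client = { "smc-d", smcd_add, smcd_remove };
```

In this model, a loopback device and a PCI-backed device go through the same unregister path. Only the trigger differs: rmmod or a sysfs knob.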

> 
>>
>> I already have patches based on this series that introduce
>> /sys/class/ism and show ism-loopback as well as
>> s390/ism devices. I can send this soon.
>>
>>
>> Julian
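Dust Li's quoted suggestions might look roughly like the following. He proposed that SMC call helper functions wrapping ism->ops, and that an ism_device_ops structure be embedded in the device instead of referenced through a pointer. Only ism_device_ops and struct ism_device come from his wording. Everything else is hypothetical: the single query_rgid operation, the 64-bit GID, the helper name ism_dev_query_rgid() and the example loopback implementer.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct ism_device;

/* Device operations, embedded by value as suggested (one op modelled). */
struct ism_device_ops {
	int (*query_rgid)(struct ism_device *dev, uint64_t rgid);
};

struct ism_device {
	uint64_t gid;              /* simplified 64-bit GID */
	struct ism_device_ops ops; /* embedded, not a pointer */
};

/* Helper for clients such as SMC-D, so they never dereference ops. */
static inline int ism_dev_query_rgid(struct ism_device *dev, uint64_t rgid)
{
	assert(dev);
	if (!dev->ops.query_rgid)
		return -EOPNOTSUPP;
	return dev->ops.query_rgid(dev, rgid);
}

/* Example implementer: a loopback-like device that only reaches itself. */
static int lo_query_rgid(struct ism_device *dev, uint64_t rgid)
{
	return rgid == dev->gid ? 0 : -ENETUNREACH;
}

static const struct ism_device_ops lo_ops = { .query_rgid = lo_query_rgid };
```

Embedding the ops by value costs one copy of the table per device. A shared `const` ops pointer is the more common kernel idiom, so this is a trade-off rather than a clear win.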

* Re: [RFC net-next 0/7] Provide an ism layer
  2025-02-10  9:38                 ` Alexandra Winter
  2025-02-11  1:57                   ` Dust Li
@ 2025-02-16 15:40                   ` Wen Gu
  2025-02-19 11:25                     ` [RFC net-next 0/7] Provide an ism layer - naming Alexandra Winter
  1 sibling, 1 reply; 61+ messages in thread
From: Wen Gu @ 2025-02-16 15:40 UTC (permalink / raw)
  To: Alexandra Winter, dust.li, Niklas Schnelle, Julian Ruess,
	Wenjia Zhang, Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe,
	Tony Lu, Peter Oberparleiter, David Miller, Jakub Kicinski,
	Paolo Abeni, Eric Dumazet, Andrew Lunn
  Cc: Thorsten Winkler, netdev, linux-s390, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Simon Horman



On 2025/2/10 17:38, Alexandra Winter wrote:
> 
> 
> On 10.02.25 06:08, Dust Li wrote:
>> On 2025-01-28 17:04:53, Alexandra Winter wrote:
>>>
>>>
>>> On 18.01.25 16:31, Dust Li wrote:
>>>> On 2025-01-17 11:38:39, Niklas Schnelle wrote:
>>>>> On Fri, 2025-01-17 at 10:13 +0800, Dust Li wrote:
>>>>>>>
>>>>> ---8<---
>>>>>>> Here are some of my thoughts on the matter:
>>>>>>>>>
>>>>>>>>> Naming and Structure: I suggest we refer to it as SHD (Shared Memory
>>>>>>>>> Device) instead of ISM (Internal Shared Memory).
>>>>>>>
>>>>>>>
>>>>>>> So where does the 'H' come from? If you want to call it Shared Memory _D_evice?
>>>>>>
>>>>>> Oh, I was trying to refer to SHM (shared memory file in userspace, see man
>>>>>> shm_open(3)). SMD is also OK.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> To my knowledge, a
>>>>>>>>> "Shared Memory Device" better encapsulates the functionality we're
>>>>>>>>> aiming to implement.
>>>>>>>
>>>>>>>
>>>>>>> Could you explain why that would be better?
>>>>>>> 'Internal Shared Memory' is supposed to be a bit of a counterpart to the
>>>>>>> Remote 'R' in RoCE. Not the greatest name, but it is used already by our ISM
>>>>>>> devices and by ism_loopback. So what is the benefit in changing it?
>>>>>>
>>>>>> I believe that if we are going to separate and refine the code, and add
>>>>>> a common subsystem, we should choose the most appropriate name.
>>>>>>
>>>>>> In my opinion, "ISM" doesn’t quite capture what the device provides.
>>>>>> Since we’re adding a "Device" that enables different entities (such as
>>>>>> processes or VMs) to perform shared memory communication, I think a more
>>>>>> fitting name would be better. If you have any alternative suggestions,
>>>>>> I’m open to them.
>>>>>
>>>>> I kept thinking about this a bit and I'd like to propose yet another
>>>>> name for this group of devices: Memory Communication Devices (MCD)
>>>>>
>>>>> One important point I see is that there is a bit of a misnomer in the
>>>>> existing ISM name in that our ISM device does in fact *not* share
>>>>> memory in the common sense of the "shared memory" wording. Instead it
>>>>> copies data between partitions of memory that share a common
>>>>> cache/memory hierarchy while not sharing the memory itself. loopback-
>>>>> ism and a possibly future virtio-ism on the other hand would share
>>>>> memory in the "shared memory" sense. Though I'd very much hope they
>>>>> will retain a copy mode to allow use in partition scenarios.
>>>>>
>>>>> With that background I think the common denominator between them and
>>>>> the main idea behind ISM is that they facilitate communication via
>>>>> memory buffers and very simple and reliable copy/share operations. I
>>>>> think this would also capture our planned use-case of devices (TTYs,
>>>>> block devices, framebuffers + HID etc) provided by a peer on top of
>>>>> such a memory communication device.
>>>>
>>>> Makes sense, I agree with MCD.
>>>>
>>>> Best regards,
>>>> Dust
>>>>
>>>
>>>
>>
>> Hi Winter,
>>
>> Sorry for the late reply; we were on break for the Chinese Spring
>> Festival.
>>
>>>
>>> In the discussion with Andrew Lunn, it showed that
>>> a) we need an abstract description of 'ISM' devices (noted)
>>> b) DMBs (Direct Memory Buffers) are a critical differentiator.
>>>
>>> So what do you think of Direct Memory Communication (DMC) as class name for these devices?
>>>
>>> I don't have a strong preference (we could also stay with ISM). But DMC may be a bit more
>>> concrete than MCD or ISM.
>>
>> I personally prefer MCD over Direct Memory Communication (DMC).
>>
>> For loopback or Virtio-ISM, DMC seems like a good choice. However, for
>> IBM ISM, since there's a DMA copy involved, it doesn’t seem truly "Direct,"
>> does it?
>>
>> Additionally, since we are providing a device, MCD feels like a more
>> fitting choice, as it aligns better with the concept of a "device."
>>
>> Best regards,
>> Dust
> 
> Thank you for your thoughts, Dust.
> For me, the 'D' as in 'direct' is not so much about the number of copies, but more about the
> aspect, that you can directly write at any offset into the buffer. I.e. no queues.
> More like the D in DMA or RDMA.
> 

IMHO the 'D' means that no CPU copy needs to be involved, and memory access
only happens between memory and I/O devices. Under these semantics, I think 'DMC'
also applies to the s390 ism device, since IIUC the s390 ism device directly accesses the memory
that is passed down by move_data(). The exception is lo-ism, where the device
actually doesn't need to access the memory (DMB), since the data has already been put into the
shared memory once sendmsg() is called and no copy or move is needed. But this
is not a violation of the name, just a special kind of short-cut. So DMC makes sense
to me.
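As a rough model of the two cases described here, the sketch below treats a move_data()-like operation in one of two ways. In the s390 ISM case it is a copy into the peer's DMB; there the device performs the transfer, and memcpy() merely stands in for it. In the loopback short-cut, sender and receiver already share the same buffer, so it is a no-op. The struct, the shared flag and the function name model_move_data() are invented for this sketch. They do not match the real ism or smcd ops signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one transfer: in copy mode the sender has its own
 * buffer; in shared mode both pointers refer to the same memory. */
struct dmb_view {
	unsigned char *sndbuf; /* where the sender placed the data */
	unsigned char *rmb;    /* the receiver's DMB */
	bool shared;           /* true for a loopback-style shared mapping */
};

/* Returns the number of bytes that had to be moved. */
size_t model_move_data(struct dmb_view *v, size_t offset, size_t len)
{
	assert(v && v->sndbuf && v->rmb);
	if (v->shared)
		return 0; /* short-cut: data is already in the peer's DMB */
	/* Copy mode: memcpy() stands in for the device-side transfer. */
	memcpy(v->rmb + offset, v->sndbuf + offset, len);
	return len;
}
```

In both branches the receiver ends up reading the same bytes from its DMB. Only the amount of data that has to be moved differs.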

> I am preparing a talk for netdev in March about this subject, and the more I work on it,
> it seems to me that the buffers ('B'), that are
> a) only authorized for a single remote device and
> b) can be accessed at any offset
> are the important differentiator compared to other virtual devices.
> So maybe 'D' for Dedicated?
> 
> I even came up with
> dibs - Dedicated Internal Buffer Sharing or
> dibc - Dedicated Internal Buffer Communication
> (ok, I like the sound and look of the 'I'. But being on the same hardware as opposed
> to RDMA is also an important aspect.)
> 
> 
> MCD - 'memory communication device' sounds rather vague to me. But if it is the
> smallest common denominator, i.e. the only thing we can all agree on, I could live with it.
> 

* Re: [RFC net-next 0/7] Provide an ism layer - naming
  2025-02-16 15:40                   ` Wen Gu
@ 2025-02-19 11:25                     ` Alexandra Winter
  2025-02-25  1:36                       ` Dust Li
  0 siblings, 1 reply; 61+ messages in thread
From: Alexandra Winter @ 2025-02-19 11:25 UTC (permalink / raw)
  To: Wen Gu, dust.li, Niklas Schnelle, Julian Ruess, Wenjia Zhang,
	Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Thorsten Winkler, netdev, linux-s390, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Simon Horman



On 16.02.25 16:40, Wen Gu wrote:
> 
> 
> On 2025/2/10 17:38, Alexandra Winter wrote:
>>
>>
>> On 10.02.25 06:08, Dust Li wrote:
>>> On 2025-01-28 17:04:53, Alexandra Winter wrote:
>>>>
>>>>
>>>> On 18.01.25 16:31, Dust Li wrote:
>>>>> On 2025-01-17 11:38:39, Niklas Schnelle wrote:
>>>>>> On Fri, 2025-01-17 at 10:13 +0800, Dust Li wrote:
>>>>>>>>
>>>>>> ---8<---
>>>>>>>> Here are some of my thoughts on the matter:
>>>>>>>>>>
>>>>>>>>>> Naming and Structure: I suggest we refer to it as SHD (Shared Memory
>>>>>>>>>> Device) instead of ISM (Internal Shared Memory).
>>>>>>>>
>>>>>>>>
>>>>>>>> So where does the 'H' come from? If you want to call it Shared Memory _D_evice?
>>>>>>>
>>>>>>> Oh, I was trying to refer to SHM (shared memory file in userspace, see man
>>>>>>> shm_open(3)). SMD is also OK.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> To my knowledge, a
>>>>>>>>>> "Shared Memory Device" better encapsulates the functionality we're
>>>>>>>>>> aiming to implement.
>>>>>>>>
>>>>>>>>
>>>>>>>> Could you explain why that would be better?
>>>>>>>> 'Internal Shared Memory' is supposed to be a bit of a counterpart to the
>>>>>>>> Remote 'R' in RoCE. Not the greatest name, but it is used already by our ISM
>>>>>>>> devices and by ism_loopback. So what is the benefit in changing it?
>>>>>>>
>>>>>>> I believe that if we are going to separate and refine the code, and add
>>>>>>> a common subsystem, we should choose the most appropriate name.
>>>>>>>
>>>>>>> In my opinion, "ISM" doesn’t quite capture what the device provides.
>>>>>>> Since we’re adding a "Device" that enables different entities (such as
>>>>>>> processes or VMs) to perform shared memory communication, I think a more
>>>>>>> fitting name would be better. If you have any alternative suggestions,
>>>>>>> I’m open to them.
>>>>>>
>>>>>> I kept thinking about this a bit and I'd like to propose yet another
>>>>>> name for this group of devices: Memory Communication Devices (MCD)
>>>>>>
>>>>>> One important point I see is that there is a bit of a misnomer in the
>>>>>> existing ISM name in that our ISM device does in fact *not* share
>>>>>> memory in the common sense of the "shared memory" wording. Instead it
>>>>>> copies data between partitions of memory that share a common
>>>>>> cache/memory hierarchy while not sharing the memory itself. loopback-
>>>>>> ism and a possibly future virtio-ism on the other hand would share
>>>>>> memory in the "shared memory" sense. Though I'd very much hope they
>>>>>> will retain a copy mode to allow use in partition scenarios.
>>>>>>
>>>>>> With that background I think the common denominator between them and
>>>>>> the main idea behind ISM is that they facilitate communication via
>>>>>> memory buffers and very simple and reliable copy/share operations. I
>>>>>> think this would also capture our planned use-case of devices (TTYs,
>>>>>> block devices, framebuffers + HID etc) provided by a peer on top of
>>>>>> such a memory communication device.
>>>>>
>>>>> Makes sense, I agree with MCD.
>>>>>
>>>>> Best regards,
>>>>> Dust
>>>>>
>>>>
>>>>
>>>
>>> Hi Winter,
>>>
>>> Sorry for the late reply; we were on break for the Chinese Spring
>>> Festival.
>>>
>>>>
>>>> In the discussion with Andrew Lunn, it showed that
>>>> a) we need an abstract description of 'ISM' devices (noted)
>>>> b) DMBs (Direct Memory Buffers) are a critical differentiator.
>>>>
>>>> So what do you think of Direct Memory Communication (DMC) as class name for these devices?
>>>>
>>>> I don't have a strong preference (we could also stay with ISM). But DMC may be a bit more
>>>> concrete than MCD or ISM.
>>>
>>> I personally prefer MCD over Direct Memory Communication (DMC).
>>>
>>> For loopback or Virtio-ISM, DMC seems like a good choice. However, for
>>> IBM ISM, since there's a DMA copy involved, it doesn’t seem truly "Direct,"
>>> does it?
>>>
>>> Additionally, since we are providing a device, MCD feels like a more
>>> fitting choice, as it aligns better with the concept of a "device."
>>>
>>> Best regards,
>>> Dust
>>
>> Thank you for your thoughts, Dust.
>> For me, the 'D' as in 'direct' is not so much about the number of copies, but more about the
>> aspect, that you can directly write at any offset into the buffer. I.e. no queues.
>> More like the D in DMA or RDMA.
>>
> 
> IMHO the 'D' means that no CPU copy needs to be involved, and memory access
> only happens between memory and I/O devices. Under these semantics, I think 'DMC'
> also applies to the s390 ism device, since IIUC the s390 ism device directly accesses the memory
> that is passed down by move_data(). The exception is lo-ism, where the device
> actually doesn't need to access the memory (DMB), since the data has already been put into the
> shared memory once sendmsg() is called and no copy or move is needed. But this
> is not a violation of the name, just a special kind of short-cut. So DMC makes sense
> to me.
> 
>> I am preparing a talk for netdev in March about this subject, and the more I work on it,
>> it seems to me that the buffers ('B'), that are
>> a) only authorized for a single remote device and
>> b) can be accessed at any offset
>> are the important differentiator compared to other virtual devices.
>> So maybe 'D' for Dedicated?
>>
>> I even came up with
>> dibs - Dedicated Internal Buffer Sharing or
>> dibc - Dedicated Internal Buffer Communication
>> (ok, I like the sound and look of the 'I'. But being on the same hardware as opposed
>> to RDMA is also an important aspect.)
>>
>>
>> MCD - 'memory communication device' sounds rather vague to me. But if it is the
>> smallest common denominator, i.e. the only thing we can all agree on, I could live with it.
>>


Could you guys accept
'DIBS - Dedicated Internal Buffer Sharing'
as well?
-> dibs_layer, /class/dibs/, dibs_dev

That is currently my favourite.
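The 'dedicated' property behind the DIBS proposal means a buffer is authorized for a single remote device and can be accessed at any offset. Here is a hedged userspace sketch of that property. The dibs_ prefix follows the names floated in this message. The struct layout, the peer_id field, the buffer size and the -EACCES/-EINVAL policy are assumptions, not code from the RFC.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIBS_BUF_SIZE 32 /* hypothetical buffer size */

/* Hypothetical buffer dedicated to exactly one remote device. */
struct dibs_buf {
	uint64_t peer_id;                  /* the only peer allowed to write */
	unsigned char data[DIBS_BUF_SIZE];
};

/* The authorized peer may write at any offset; anyone else is refused. */
int dibs_buf_write(struct dibs_buf *buf, uint64_t from_peer, size_t offset,
		   const void *src, size_t len)
{
	assert(buf && src);
	if (from_peer != buf->peer_id)
		return -EACCES;
	if (offset > DIBS_BUF_SIZE || len > DIBS_BUF_SIZE - offset)
		return -EINVAL;
	memcpy(buf->data + offset, src, len);
	return 0;
}
```

The authorization check runs before the bounds check, so a foreign peer learns nothing about the buffer layout from the error code.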



* Re: [RFC net-next 0/7] Provide an ism layer - naming
  2025-02-19 11:25                     ` [RFC net-next 0/7] Provide an ism layer - naming Alexandra Winter
@ 2025-02-25  1:36                       ` Dust Li
  2025-02-25  8:40                         ` Alexandra Winter
  0 siblings, 1 reply; 61+ messages in thread
From: Dust Li @ 2025-02-25  1:36 UTC (permalink / raw)
  To: Alexandra Winter, Wen Gu, Niklas Schnelle, Julian Ruess,
	Wenjia Zhang, Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe,
	Tony Lu, Peter Oberparleiter, David Miller, Jakub Kicinski,
	Paolo Abeni, Eric Dumazet, Andrew Lunn
  Cc: Thorsten Winkler, netdev, linux-s390, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Simon Horman

On 2025-02-19 12:25:59, Alexandra Winter wrote:
>
>
>On 16.02.25 16:40, Wen Gu wrote:
>> 
>> 
>> On 2025/2/10 17:38, Alexandra Winter wrote:
>>>
>>>
>>> On 10.02.25 06:08, Dust Li wrote:
>>>> On 2025-01-28 17:04:53, Alexandra Winter wrote:
>>>>>
>>>>>
>>>>> On 18.01.25 16:31, Dust Li wrote:
>>>>>> On 2025-01-17 11:38:39, Niklas Schnelle wrote:
>>>>>>> On Fri, 2025-01-17 at 10:13 +0800, Dust Li wrote:
>>>>>>>>>
>>>>>>> ---8<---
>>>>>>>>> Here are some of my thoughts on the matter:
>>>>>>>>>>>
>>>>>>>>>>> Naming and Structure: I suggest we refer to it as SHD (Shared Memory
>>>>>>>>>>> Device) instead of ISM (Internal Shared Memory).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So where does the 'H' come from? If you want to call it Shared Memory _D_evice?
>>>>>>>>
>>>>>>>> Oh, I was trying to refer to SHM (shared memory file in userspace, see man
>>>>>>>> shm_open(3)). SMD is also OK.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> To my knowledge, a
>>>>>>>>>>> "Shared Memory Device" better encapsulates the functionality we're
>>>>>>>>>>> aiming to implement.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Could you explain why that would be better?
>>>>>>>>> 'Internal Shared Memory' is supposed to be a bit of a counterpart to the
>>>>>>>>> Remote 'R' in RoCE. Not the greatest name, but it is used already by our ISM
>>>>>>>>> devices and by ism_loopback. So what is the benefit in changing it?
>>>>>>>>
>>>>>>>> I believe that if we are going to separate and refine the code, and add
>>>>>>>> a common subsystem, we should choose the most appropriate name.
>>>>>>>>
>>>>>>>> In my opinion, "ISM" doesn’t quite capture what the device provides.
>>>>>>>> Since we’re adding a "Device" that enables different entities (such as
>>>>>>>> processes or VMs) to perform shared memory communication, I think a more
>>>>>>>> fitting name would be better. If you have any alternative suggestions,
>>>>>>>> I’m open to them.
>>>>>>>
>>>>>>> I kept thinking about this a bit and I'd like to propose yet another
>>>>>>> name for this group of devices: Memory Communication Devices (MCD)
>>>>>>>
>>>>>>> One important point I see is that there is a bit of a misnomer in the
>>>>>>> existing ISM name in that our ISM device does in fact *not* share
>>>>>>> memory in the common sense of the "shared memory" wording. Instead it
>>>>>>> copies data between partitions of memory that share a common
>>>>>>> cache/memory hierarchy while not sharing the memory itself. loopback-
>>>>>>> ism and a possibly future virtio-ism on the other hand would share
>>>>>>> memory in the "shared memory" sense. Though I'd very much hope they
>>>>>>> will retain a copy mode to allow use in partition scenarios.
>>>>>>>
>>>>>>> With that background I think the common denominator between them and
>>>>>>> the main idea behind ISM is that they facilitate communication via
>>>>>>> memory buffers and very simple and reliable copy/share operations. I
>>>>>>> think this would also capture our planned use-case of devices (TTYs,
>>>>>>> block devices, framebuffers + HID etc) provided by a peer on top of
>>>>>>> such a memory communication device.
>>>>>>
>>>>>> Makes sense, I agree with MCD.
>>>>>>
>>>>>> Best regards,
>>>>>> Dust
>>>>>>
>>>>>
>>>>>
>>>>
>>>> Hi Winter,
>>>>
>>>> Sorry for the late reply; we were on break for the Chinese Spring
>>>> Festival.
>>>>
>>>>>
>>>>> In the discussion with Andrew Lunn, it showed that
>>>>> a) we need an abstract description of 'ISM' devices (noted)
>>>>> b) DMBs (Direct Memory Buffers) are a critical differentiator.
>>>>>
>>>>> So what do you think of Direct Memory Communication (DMC) as class name for these devices?
>>>>>
>>>>> I don't have a strong preference (we could also stay with ISM). But DMC may be a bit more
>>>>> concrete than MCD or ISM.
>>>>
>>>> I personally prefer MCD over Direct Memory Communication (DMC).
>>>>
>>>> For loopback or Virtio-ISM, DMC seems like a good choice. However, for
>>>> IBM ISM, since there's a DMA copy involved, it doesn’t seem truly "Direct,"
>>>> does it?
>>>>
>>>> Additionally, since we are providing a device, MCD feels like a more
>>>> fitting choice, as it aligns better with the concept of a "device."
>>>>
>>>> Best regards,
>>>> Dust
>>>
>>> Thank you for your thoughts, Dust.
>>> For me, the 'D' as in 'direct' is not so much about the number of copies, but more about the
>>> aspect, that you can directly write at any offset into the buffer. I.e. no queues.
>>> More like the D in DMA or RDMA.
>>>
>> 
>> IMHO the 'D' means that no CPU copy needs to be involved, and memory access
>> only happens between memory and I/O devices. Under these semantics, I think 'DMC'
>> also applies to the s390 ism device, since IIUC the s390 ism device directly accesses the memory
>> that is passed down by move_data(). The exception is lo-ism, where the device
>> actually doesn't need to access the memory (DMB), since the data has already been put into the
>> shared memory once sendmsg() is called and no copy or move is needed. But this
>> is not a violation of the name, just a special kind of short-cut. So DMC makes sense
>> to me.
>> 
>>> I am preparing a talk for netdev in March about this subject, and the more I work on it,
>>> it seems to me that the buffers ('B'), that are
>>> a) only authorized for a single remote device and
>>> b) can be accessed at any offset
>>> are the important differentiator compared to other virtual devices.
>>> So maybe 'D' for Dedicated?
>>>
>>> I even came up with
>>> dibs - Dedicated Internal Buffer Sharing or
>>> dibc - Dedicated Internal Buffer Communication
>>> (ok, I like the sound and look of the 'I'. But being on the same hardware as opposed
>>> to RDMA is also an important aspect.)
>>>
>>>
>>> MCD - 'memory communication device' sounds rather vague to me. But if it is the
>>> smallest common denominator, i.e. the only thing we can all agree on, I could live with it.
>>>
>
>
>Could you guys accept
>'DIBS - Dedicated Internal Buffer Sharing'
>as well?
>-> dibs_layer, /class/dibs/, dibs_dev
>
>That is currently my favourite.
>

I think you might prefer a name that describes shared memory,
but I personally believe that something reflecting the device itself
would be more fitting.

To be honest, here’s my ranking:

MCD > DMC > DIBS

Best regards,
Dust


* Re: [RFC net-next 0/7] Provide an ism layer - naming
  2025-02-25  1:36                       ` Dust Li
@ 2025-02-25  8:40                         ` Alexandra Winter
  0 siblings, 0 replies; 61+ messages in thread
From: Alexandra Winter @ 2025-02-25  8:40 UTC (permalink / raw)
  To: dust.li, Wen Gu, Niklas Schnelle, Julian Ruess, Wenjia Zhang,
	Jan Karcher, Gerd Bayer, Halil Pasic, D. Wythe, Tony Lu,
	Peter Oberparleiter, David Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Thorsten Winkler, netdev, linux-s390, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Simon Horman



On 25.02.25 02:36, Dust Li wrote:
> On 2025-02-19 12:25:59, Alexandra Winter wrote:
>>
>>
>> On 16.02.25 16:40, Wen Gu wrote:
>>>
>>>
>>> On 2025/2/10 17:38, Alexandra Winter wrote:
>>>>
>>>>
>>>> On 10.02.25 06:08, Dust Li wrote:
>>>>> On 2025-01-28 17:04:53, Alexandra Winter wrote:
>>>>>>
>>>>>>
>>>>>> On 18.01.25 16:31, Dust Li wrote:
>>>>>>> On 2025-01-17 11:38:39, Niklas Schnelle wrote:
>>>>>>>> On Fri, 2025-01-17 at 10:13 +0800, Dust Li wrote:
>>>>>>>>>>
>>>>>>>> ---8<---

>>>>>>
>>>>>>
>>>>>
>>>>> Hi Winter,
>>>>>
>>>>> Sorry for the late reply; we were on break for the Chinese Spring
>>>>> Festival.
>>>>>
>>>>>>
>>>>>> In the discussion with Andrew Lunn, it showed that
>>>>>> a) we need an abstract description of 'ISM' devices (noted)
>>>>>> b) DMBs (Direct Memory Buffers) are a critical differentiator.
>>>>>>
>>>>>> So what do you think of Direct Memory Communication (DMC) as a class name for these devices?
>>>>>>
>>>>>> I don't have a strong preference (we could also stay with ISM). But DMC may be a bit more
>>>>>> concrete than MCD or ISM.
>>>>>
>>>>> I personally prefer MCD over Direct Memory Communication (DMC).
>>>>>
>>>>> For loopback or Virtio-ISM, DMC seems like a good choice. However, for
>>>>> IBM ISM, since there's a DMA copy involved, it doesn’t seem truly "Direct,"
>>>>> does it?
>>>>>
>>>>> Additionally, since we are providing a device, MCD feels like a more
>>>>> fitting choice, as it aligns better with the concept of a "device."
>>>>>
>>>>> Best regards,
>>>>> Dust
>>>>
>>>> Thank you for your thoughts, Dust.
>>>> For me the 'D' as 'direct' is not so much about the number of copies, but more about the
>>>> aspect that you can directly write at any offset into the buffer, i.e. no queues.
>>>> More like the D in DMA or RDMA.
>>>>
>>>
>>> IMHO the 'D' means that no CPU copy needs to be involved, and memory access
>>> happens only between memory and I/O devices. Under these semantics, I think 'DMC'
>>> also applies to the s390 ism device since, IIUC, the s390 ism directly accesses the memory
>>> which is passed down by move_data(). The exception is lo-ism, where the device
>>> actually doesn't need to access the memory (DMB), since the data has been put into the
>>> shared memory once sendmsg() is called and no copy or move is needed. But this
>>> is not a violation of the name, just a special kind of short-cut. So DMC makes sense
>>> to me.
>>>
>>>> I am preparing a talk for netdev in March about this subject, and the more I work on it,
>>>> it seems to me that the buffers ('B'), that are
>>>> a) only authorized for a single remote device and
>>>> b) can be accessed at any offset
>>>> are the important differentiator compared to other virtual devices.
>>>> So maybe 'D' for Dedicated?
>>>>
>>>> I even came up with
>>>> dibs - Dedicated Internal Buffer Sharing or
>>>> dibc - Dedicated Internal Buffer Communication
>>>> (ok, I like the sound and look of the 'I'. But being on the same hardware as opposed
>>>> to RDMA is also an important aspect.)
>>>>
>>>>
>>>> MCD - 'memory communication device' sounds rather vague to me. But if it is the
>>>> smallest common denominator, i.e. the only thing we can all agree on, I could live with it.
>>>>
>>
>>
>> Could you guys accept
>> 'DIBS - Dedicated Internal Buffer Sharing'
>> as well?
>> -> dibs_layer, /class/dibs/, dibs_dev
>>
>> That is currently my favourite.
>>
> 
> I think you might prefer a name that describes shared memory,
> but I personally believe that something reflecting the device itself
> would be more fitting.
> 
> To be honest, here’s my ranking:
> 
> MCD > DMC > DIBS
> 
> Best regards,
> Dust


Thank you for keeping the discussion going, Dust.
I know there is no perfect answer, but imo good names can make things easier to
understand.
For reasons described above, I would like to have a 'B for buffer' in the prefix.
There are many I/O concepts that share memory somehow, but the concept of a dmb dedicated to
exactly 2 devices is a differentiator, imo.
I thought 'D for device' was somewhat redundant and obvious. But now you are saying that, for
you, the device is a differentiator, as opposed to other memory sharing techniques that work
without devices? Maybe you have a point...

So maybe DMB - Direct Memory Buffers is a good term?
I know we use it for the buffers already, but they are actually a common aspect for all devices
and clients, right? So we could define a dmb layer with generic dmb_devices that can be used by dmb_clients
to communicate via dmb_bufs.
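To make the proposed layering a bit more concrete, here is a rough userspace sketch of the two buffer properties named earlier in the thread: a buffer is (a) authorized for a single remote device and (b) writable at any offset, with no queue in between. All names here (dmb_dev, dmb_client, dmb_buf, dmb_buf_init, dmb_write) are hypothetical, borrowed from the naming floated above, and are not an existing kernel API. The errno-style return codes and the way authorization is checked are likewise assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model of a "dmb layer". None of these types or
 * functions exist in the kernel; they only illustrate the buffer semantics. */

struct dmb_dev {
	uint64_t id;                     /* hypothetical device identifier */
};

struct dmb_client {
	const char *name;                /* e.g. "smc-d" */
};

struct dmb_buf {
	const struct dmb_client *client; /* client that registered the buffer */
	const struct dmb_dev *owner;     /* local device hosting the buffer */
	uint64_t remote_id;              /* the one remote device allowed to write */
	unsigned char *data;
	size_t len;
};

/* Register a buffer that is dedicated to exactly one remote device. */
static void dmb_buf_init(struct dmb_buf *buf, const struct dmb_client *client,
			 const struct dmb_dev *owner, uint64_t remote_id,
			 unsigned char *mem, size_t len)
{
	assert(buf && client && owner && mem);
	buf->client = client;
	buf->owner = owner;
	buf->remote_id = remote_id;
	buf->data = mem;
	buf->len = len;
}

/* Write 'size' bytes at an arbitrary 'offset'. There is no queue or ring:
 * only an authorization check and a bounds check before the copy. */
static int dmb_write(const struct dmb_dev *src, struct dmb_buf *buf,
		     size_t offset, const void *src_data, size_t size)
{
	if (src->id != buf->remote_id)
		return -EACCES;          /* only the dedicated peer may write */
	if (offset > buf->len || size > buf->len - offset)
		return -EINVAL;          /* the write must stay inside the buffer */
	memcpy(buf->data + offset, src_data, size);
	return 0;
}
```

In a real layer the write would be carried out by the device or its driver rather than by a memcpy in the caller; the sketch is only meant to show why the 'B for buffer' part carries the distinguishing semantics.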






end of thread, other threads:[~2025-02-25  9:01 UTC | newest]

Thread overview: 61+ messages
2025-01-15 19:55 [RFC net-next 0/7] Provide an ism layer Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 1/7] net/ism: Create net/ism Alexandra Winter
2025-01-16 20:08   ` Andrew Lunn
2025-01-17 12:06     ` Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 2/7] net/ism: Remove dependencies between ISM_VPCI and SMC Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 3/7] net/ism: Use uuid_t for ISM GID Alexandra Winter
2025-01-20 17:18   ` Simon Horman
2025-01-22 14:46     ` Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 4/7] net/ism: Add kernel-doc comments for ism functions Alexandra Winter
2025-01-15 22:06   ` Halil Pasic
2025-01-20  6:32   ` Dust Li
2025-01-20  9:56     ` Alexandra Winter
2025-01-20 10:07       ` Julian Ruess
2025-01-20 11:35         ` Alexandra Winter
2025-01-20 10:34     ` Niklas Schnelle
2025-01-22 15:02       ` Dust Li
2025-01-15 19:55 ` [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism Alexandra Winter
2025-01-20  3:55   ` Dust Li
2025-01-20  9:31     ` Alexandra Winter
2025-02-06 17:36   ` Julian Ruess
2025-02-10 10:39     ` Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 6/7] s390/ism: Define ismvp_dev Alexandra Winter
2025-01-15 19:55 ` [RFC net-next 7/7] net/smc: Use only ism_ops Alexandra Winter
2025-01-16  9:32 ` [RFC net-next 0/7] Provide an ism layer Dust Li
2025-01-16 11:55   ` Julian Ruess
2025-01-16 16:17     ` Alexandra Winter
2025-01-16 17:08       ` Julian Ruess
2025-01-17  2:13       ` Dust Li
2025-01-17 10:38         ` Niklas Schnelle
2025-01-17 15:02           ` Andrew Lunn
2025-01-17 16:00             ` Niklas Schnelle
2025-01-17 16:33               ` Andrew Lunn
2025-01-17 16:57                 ` Niklas Schnelle
2025-01-17 20:29                   ` Andrew Lunn
2025-01-20  6:21                     ` Dust Li
2025-01-20 12:03                       ` Alexandra Winter
2025-01-20 16:01                         ` Andrew Lunn
2025-01-20 17:25                           ` Alexandra Winter
2025-01-18 15:31           ` Dust Li
2025-01-28 16:04             ` Alexandra Winter
2025-02-10  5:08               ` Dust Li
2025-02-10  9:38                 ` Alexandra Winter
2025-02-11  1:57                   ` Dust Li
2025-02-16 15:40                   ` Wen Gu
2025-02-19 11:25                     ` [RFC net-next 0/7] Provide an ism layer - naming Alexandra Winter
2025-02-25  1:36                       ` Dust Li
2025-02-25  8:40                         ` Alexandra Winter
2025-01-17 13:00         ` [RFC net-next 0/7] Provide an ism layer Alexandra Winter
2025-01-17 15:10           ` Andrew Lunn
2025-01-17 16:20             ` Alexandra Winter
2025-01-20 10:28           ` Alexandra Winter
2025-01-22  3:04             ` Dust Li
2025-01-22 12:02               ` Alexandra Winter
2025-01-22 12:05                 ` Alexandra Winter
2025-01-22 14:10                   ` Dust Li
2025-01-17 15:06       ` Andrew Lunn
2025-01-17 15:38         ` Alexandra Winter
2025-02-16 15:38       ` Wen Gu
2025-01-17 11:04   ` Alexandra Winter
2025-01-18 15:24     ` Dust Li
2025-01-20 11:45       ` Alexandra Winter
