Linux-ARM-Kernel Archive on lore.kernel.org

Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH net-next v7 07/12] MAINTAINERS: add myself as PCS subsystem maintainer
From: Christian Marangi @ 2026-06-15 12:29 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Simon Horman, Jonathan Corbet, Shuah Khan, Christian Marangi,
	Lorenzo Bianconi, Heiner Kallweit, Russell King, Saravana Kannan,
	Philipp Zabel, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
	Justin Stitt, netdev, devicetree, linux-kernel, linux-doc,
	linux-arm-kernel, linux-mediatek, llvm
In-Reply-To: <20260615122950.22281-1-ansuelsmth@gmail.com>

List all the files of the Ethernet PCS subsystem and add myself as
maintainer.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
 MAINTAINERS | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index cc1dde0c9067..ef3ef5096d08 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9593,6 +9593,15 @@ F:	include/uapi/linux/if_bridge.h
 F:	include/linux/netfilter_bridge/
 F:	net/bridge/
 
+ETHERNET PCS SUBSYSTEM
+M:	Christian Marangi <ansuelsmth@gmail.com>
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	Documentation/networking/pcs.rst
+F:	drivers/net/pcs/pcs.c
+F:	include/linux/pcs/pcs-provider.h
+F:	include/linux/pcs/pcs.h
+
 ETHERNET PHY LIBRARY
 M:	Andrew Lunn <andrew@lunn.ch>
 M:	Heiner Kallweit <hkallweit1@gmail.com>
-- 
2.53.0



^ permalink raw reply related

* [PATCH net-next v7 04/12] net: pcs: implement Firmware node support for PCS driver
From: Christian Marangi @ 2026-06-15 12:29 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Simon Horman, Jonathan Corbet, Shuah Khan, Christian Marangi,
	Lorenzo Bianconi, Heiner Kallweit, Russell King, Saravana Kannan,
	Philipp Zabel, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
	Justin Stitt, netdev, devicetree, linux-kernel, linux-doc,
	linux-arm-kernel, linux-mediatek, llvm
  Cc: Daniel Golle
In-Reply-To: <20260615122950.22281-1-ansuelsmth@gmail.com>

Implement the foundation of Firmware node support for PCS driver.

To support this, implement a simple Provider API where a PCS driver can
expose multiple PCS with an xlate .get function.

PCS driver will have to call fwnode_pcs_add_provider() and pass the
firmware node pointer and a xlate function to return the correct PCS for
the passed #pcs-cells.

This will register the PCS in a global list of providers so that
consumer can access it.

The consumer will then use fwnode_pcs_get() to get the actual PCS by
passing the firmware node pointer and the index for #pcs-cells.

For a simple implementation where #pcs-cells is 0 and the PCS driver
expose a single PCS, the xlate function fwnode_pcs_simple_get() is
provided.

For an advanced implementation a custom xlate function is required.

On removal the PCS driver should first delete itself from the provider
list using fwnode_pcs_del_provider() and then call phylink_release_pcs()
on every PCS the driver provides.

Generic functions fwnode_phylink_pcs_count() and fwnode_phylink_pcs_parse()
are provided for MAC driver that will declare PCS in DT (or ACPI).

Function fwnode_phylink_pcs_count() will parse "pcs-handle" property and
will return the number of PCS entries described in the passed firmware
node.

Function fwnode_phylink_pcs_parse() will parse "pcs-handle" property and
fill the passed available_pcs array with the available PCS found up to passed
num_pcs value. It's worth to mention that this function will ignore PCS
that still needs to be probed (returning -ENODEV) and such PCS won't be
added to the available_pcs array.

Co-developed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
 drivers/net/pcs/Kconfig          |   6 +
 drivers/net/pcs/Makefile         |   1 +
 drivers/net/pcs/pcs.c            | 208 +++++++++++++++++++++++++++++++
 include/linux/pcs/pcs-provider.h |  41 ++++++
 include/linux/pcs/pcs.h          |  75 +++++++++++
 5 files changed, 331 insertions(+)
 create mode 100644 drivers/net/pcs/pcs.c
 create mode 100644 include/linux/pcs/pcs-provider.h
 create mode 100644 include/linux/pcs/pcs.h

diff --git a/drivers/net/pcs/Kconfig b/drivers/net/pcs/Kconfig
index e417fd66f660..2ce89d4bff6b 100644
--- a/drivers/net/pcs/Kconfig
+++ b/drivers/net/pcs/Kconfig
@@ -5,6 +5,12 @@
 
 menu "PCS device drivers"
 
+config FWNODE_PCS
+	bool "PCS Firmware Node"
+	depends on (ACPI || OF)
+	help
+		Firmware node PCS accessors
+
 config PCS_XPCS
 	tristate "Synopsys DesignWare Ethernet XPCS"
 	select PHYLINK
diff --git a/drivers/net/pcs/Makefile b/drivers/net/pcs/Makefile
index 4f7920618b90..3005cdd89ab7 100644
--- a/drivers/net/pcs/Makefile
+++ b/drivers/net/pcs/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 # Makefile for Linux PCS drivers
 
+obj-$(CONFIG_FWNODE_PCS)	+= pcs.o
 pcs_xpcs-$(CONFIG_PCS_XPCS)	:= pcs-xpcs.o pcs-xpcs-plat.o \
 				   pcs-xpcs-nxp.o pcs-xpcs-wx.o
 
diff --git a/drivers/net/pcs/pcs.c b/drivers/net/pcs/pcs.c
new file mode 100644
index 000000000000..67f9716f48fd
--- /dev/null
+++ b/drivers/net/pcs/pcs.c
@@ -0,0 +1,208 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/mutex.h>
+#include <linux/property.h>
+#include <linux/phylink.h>
+#include <linux/pcs/pcs.h>
+#include <linux/pcs/pcs-provider.h>
+
+MODULE_DESCRIPTION("PCS library");
+MODULE_AUTHOR("Christian Marangi <ansuelsmth@gmail.com>");
+MODULE_LICENSE("GPL");
+
+struct fwnode_pcs_provider {
+	struct list_head link;
+
+	struct fwnode_handle *fwnode;
+	struct phylink_pcs *(*get)(struct fwnode_reference_args *pcsspec,
+				   void *data);
+
+	void *data;
+};
+
+static LIST_HEAD(fwnode_pcs_providers);
+static DEFINE_MUTEX(fwnode_pcs_mutex);
+
+struct phylink_pcs *fwnode_pcs_simple_get(struct fwnode_reference_args *pcsspec,
+					  void *data)
+{
+	return data;
+}
+EXPORT_SYMBOL_GPL(fwnode_pcs_simple_get);
+
+int fwnode_pcs_add_provider(struct fwnode_handle *fwnode,
+			    struct phylink_pcs *(*get)(struct fwnode_reference_args *pcsspec,
+						       void *data),
+			    void *data)
+{
+	struct fwnode_pcs_provider *pp;
+
+	if (!fwnode)
+		return 0;
+
+	pp = kzalloc_obj(*pp);
+	if (!pp)
+		return -ENOMEM;
+
+	pp->fwnode = fwnode_handle_get(fwnode);
+	pp->data = data;
+	pp->get = get;
+
+	mutex_lock(&fwnode_pcs_mutex);
+	list_add(&pp->link, &fwnode_pcs_providers);
+	mutex_unlock(&fwnode_pcs_mutex);
+	pr_debug("Added pcs provider from %pfwf\n", fwnode);
+
+	fwnode_dev_initialized(fwnode, true);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fwnode_pcs_add_provider);
+
+void fwnode_pcs_del_provider(struct fwnode_handle *fwnode)
+{
+	struct fwnode_pcs_provider *pp;
+
+	if (!fwnode)
+		return;
+
+	mutex_lock(&fwnode_pcs_mutex);
+	list_for_each_entry(pp, &fwnode_pcs_providers, link) {
+		if (pp->fwnode == fwnode) {
+			list_del(&pp->link);
+			fwnode_dev_initialized(pp->fwnode, false);
+			fwnode_handle_put(pp->fwnode);
+			kfree(pp);
+			break;
+		}
+	}
+	mutex_unlock(&fwnode_pcs_mutex);
+}
+EXPORT_SYMBOL_GPL(fwnode_pcs_del_provider);
+
+static int fwnode_parse_pcsspec(const struct fwnode_handle *fwnode,
+				unsigned int index, const char *name,
+				struct fwnode_reference_args *out_args)
+{
+	int ret;
+
+	if (!fwnode)
+		return -EINVAL;
+
+	if (name) {
+		index = fwnode_property_match_string(fwnode, "pcs-names",
+						     name);
+		if (index < 0)
+			return index;
+	}
+
+	ret = fwnode_property_get_reference_args(fwnode, "pcs-handle",
+						 "#pcs-cells",
+						 -1, index, out_args);
+	if (ret || (name && index < 0))
+		return ret;
+
+	return 0;
+}
+
+static struct phylink_pcs *
+fwnode_pcs_get_from_pcsspec(struct fwnode_reference_args *pcsspec)
+{
+	struct fwnode_pcs_provider *provider;
+	struct phylink_pcs *pcs = ERR_PTR(-ENODEV);
+
+	if (!pcsspec)
+		return ERR_PTR(-EINVAL);
+
+	mutex_lock(&fwnode_pcs_mutex);
+	list_for_each_entry(provider, &fwnode_pcs_providers, link) {
+		if (provider->fwnode == pcsspec->fwnode) {
+			pcs = provider->get(pcsspec, provider->data);
+			if (!IS_ERR(pcs))
+				break;
+		}
+	}
+	mutex_unlock(&fwnode_pcs_mutex);
+
+	return pcs;
+}
+
+static struct phylink_pcs *__fwnode_pcs_get(struct fwnode_handle *fwnode,
+					    unsigned int index, const char *con_id)
+{
+	struct fwnode_reference_args pcsspec;
+	struct phylink_pcs *pcs;
+	int ret;
+
+	ret = fwnode_parse_pcsspec(fwnode, index, con_id, &pcsspec);
+	if (ret)
+		return ERR_PTR(ret);
+
+	pcs = fwnode_pcs_get_from_pcsspec(&pcsspec);
+	fwnode_handle_put(pcsspec.fwnode);
+
+	return pcs;
+}
+
+struct phylink_pcs *fwnode_pcs_get(struct fwnode_handle *fwnode, unsigned int index)
+{
+	return __fwnode_pcs_get(fwnode, index, NULL);
+}
+EXPORT_SYMBOL_GPL(fwnode_pcs_get);
+
+unsigned int fwnode_phylink_pcs_count(struct fwnode_handle *fwnode)
+{
+	struct fwnode_reference_args out_args;
+	int index = 0;
+	int ret;
+
+	while (true) {
+		ret = fwnode_property_get_reference_args(fwnode, "pcs-handle",
+							 "#pcs-cells",
+							 -1, index, &out_args);
+		/* We expect to reach an -ENOENT error while counting */
+		if (ret)
+			break;
+
+		fwnode_handle_put(out_args.fwnode);
+		index++;
+	}
+
+	return index;
+}
+EXPORT_SYMBOL_GPL(fwnode_phylink_pcs_count);
+
+int fwnode_phylink_pcs_parse(struct fwnode_handle *fwnode,
+			     struct phylink_pcs **available_pcs,
+			     unsigned int num_pcs)
+{
+	unsigned int i, found = 0;
+
+	if (!available_pcs)
+		return -EINVAL;
+
+	if (!fwnode_property_present(fwnode, "pcs-handle"))
+		return -ENODEV;
+
+	for (i = 0; i < num_pcs; i++) {
+		struct phylink_pcs *pcs;
+
+		pcs = fwnode_pcs_get(fwnode, i);
+		if (IS_ERR(pcs)) {
+			/*
+			 * Ignore -ENODEV error for PCS that still
+			 * needs to probe.
+			 */
+			if (PTR_ERR(pcs) == -ENODEV)
+				continue;
+
+			return PTR_ERR(pcs);
+		}
+
+		available_pcs[found] = pcs;
+		found++;
+	}
+
+	return found;
+}
+EXPORT_SYMBOL_GPL(fwnode_phylink_pcs_parse);
diff --git a/include/linux/pcs/pcs-provider.h b/include/linux/pcs/pcs-provider.h
new file mode 100644
index 000000000000..ae51c108147e
--- /dev/null
+++ b/include/linux/pcs/pcs-provider.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __LINUX_PCS_PROVIDER_H
+#define __LINUX_PCS_PROVIDER_H
+
+/**
+ * fwnode_pcs_simple_get - Simple xlate function to retrieve PCS
+ * @pcsspec: reference arguments
+ * @data: Context data (assumed assigned to the single PCS)
+ *
+ * Returns: the PCS pointed by data.
+ */
+struct phylink_pcs *fwnode_pcs_simple_get(struct fwnode_reference_args *pcsspec,
+					  void *data);
+
+/**
+ * fwnode_pcs_add_provider - Registers a new PCS provider
+ * @fwnode: Firmware node
+ * @get: xlate function to retrieve the PCS
+ * @data: Context data
+ *
+ * Register and add a new PCS to the global providers list
+ * for the firmware node. A function to get the PCS from
+ * firmware node with the use fwnode reference arguments.
+ * To the get function is also passed the interface type
+ * requested for the PHY. PCS driver will use the passed
+ * interface to understand if the PCS can support it or not.
+ *
+ * Returns: 0 on success or -ENOMEM on allocation failure.
+ */
+int fwnode_pcs_add_provider(struct fwnode_handle *fwnode,
+			    struct phylink_pcs *(*get)(struct fwnode_reference_args *pcsspec,
+						       void *data),
+			    void *data);
+
+/**
+ * fwnode_pcs_del_provider - Removes a PCS provider
+ * @fwnode: Firmware node
+ */
+void fwnode_pcs_del_provider(struct fwnode_handle *fwnode);
+
+#endif /* __LINUX_PCS_PROVIDER_H */
diff --git a/include/linux/pcs/pcs.h b/include/linux/pcs/pcs.h
new file mode 100644
index 000000000000..df1e6f32cfad
--- /dev/null
+++ b/include/linux/pcs/pcs.h
@@ -0,0 +1,75 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __LINUX_PCS_H
+#define __LINUX_PCS_H
+
+#include <linux/phylink.h>
+
+#if IS_ENABLED(CONFIG_FWNODE_PCS)
+/**
+ * fwnode_pcs_get - Retrieves a PCS from a firmware node
+ * @fwnode: firmware node
+ * @index: index fwnode PCS handle in firmware node
+ *
+ * Get a PCS from the firmware node at index.
+ *
+ * Returns: a pointer to the phylink_pcs or a negative
+ * error pointer. Can return -ENODEV if the PCS is not
+ * present in global providers list (either due to driver
+ * still needs to be probed or it failed to probe/removed).
+ */
+struct phylink_pcs *fwnode_pcs_get(struct fwnode_handle *fwnode,
+				   unsigned int index);
+
+/**
+ * fwnode_phylink_pcs_count - count PCS entries described in firmware node
+ * @fwnode: firmware node
+ *
+ * Helper function to count the number of PCS entries referenced by the
+ * "pcs-handle" property in a firmware node.
+ *
+ * Note that this function counts all PCS references in the firmware node,
+ * regardless of whether the corresponding PCS devices are already probed.
+ *
+ * Returns: number of PCS entries described in the firmware node.
+ */
+unsigned int fwnode_phylink_pcs_count(struct fwnode_handle *fwnode);
+
+/**
+ * fwnode_phylink_pcs_parse - parse available PCS from firmware node
+ * @fwnode: firmware node
+ * @available_pcs: pointer to preallocated array of PCS
+ * @num_pcs: maximum number of PCS entries to scan
+ *
+ * Helper function that parses PCS references from the "pcs-handle"
+ * property of a firmware node and fills @available_pcs with PCS that are
+ * currently available up to @num_pcs.
+ *
+ * Only PCS that are currently available are stored in @available_pcs.
+ * PCS that returns -ENODEV are skipped.
+ *
+ * Returns: number of PCS stored in @available_pcs, or negative error code.
+ */
+int fwnode_phylink_pcs_parse(struct fwnode_handle *fwnode,
+			     struct phylink_pcs **available_pcs,
+			     unsigned int num_pcs);
+#else
+static inline struct phylink_pcs *fwnode_pcs_get(struct fwnode_handle *fwnode,
+						 int index)
+{
+	return ERR_PTR(-ENOENT);
+}
+
+static inline unsigned int fwnode_phylink_pcs_count(struct fwnode_handle *fwnode)
+{
+	return 0;
+}
+
+static inline int fwnode_phylink_pcs_parse(struct fwnode_handle *fwnode,
+					   struct phylink_pcs **available_pcs,
+					   unsigned int num_pcs)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
+#endif /* __LINUX_PCS_H */
-- 
2.53.0



^ permalink raw reply related

* [PATCH net-next v7 02/12] net: phylink: introduce internal phylink PCS handling
From: Christian Marangi @ 2026-06-15 12:29 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Simon Horman, Jonathan Corbet, Shuah Khan, Christian Marangi,
	Lorenzo Bianconi, Heiner Kallweit, Russell King, Saravana Kannan,
	Philipp Zabel, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
	Justin Stitt, netdev, devicetree, linux-kernel, linux-doc,
	linux-arm-kernel, linux-mediatek, llvm
In-Reply-To: <20260615122950.22281-1-ansuelsmth@gmail.com>

Introduce internal handling of PCS for phylink. This is an alternative
way to .mac_select_pcs that moves the selection logic of the PCS entirely
to phylink with the usage of the supported_interface value in the PCS
struct.

MAC should now provide a callback to fill the available PCS in
phylink_config in .fill_available_pcs and fill the .num_possible_pcs with
the number of elements in the array. MAC should also define a new bitmap,
pcs_interfaces, in phylink_config to define for what interface mode a
dedicated PCS is required.

On phylink_create(), an array of PCS pointer is allocated of size
.num_possible_pcs from phylink_config and .fill_available_pcs from
phylink_config is called passing as args the just allocated array and
the number of possible element in it.

MAC will fill this passed array with all the available PCS.

This array is then parsed and a linked list of PCS is created based on
the allocated PCS array filled by MAC via .fill_available_pcs().

Every PCS in phylink PCS list gets then linked to the phylink instance
by setting the phylink value in phylink_pcs struct to the phylink instance.
Also the supported_interface value in phylink struct is updated with
the new supported_interface from the provided PCS.

On phylink_destroy(), every PCS in phylink PCS list is unlinked from the
phylink instance by setting the phylink value in phylink_pcs struct to NULL
and removed from the PCS list.

phylink_validate_mac_and_pcs(), phylink_major_config() and
phylink_inband_caps() are updated to support this new implementation
with the PCS list stored in phylink.

They will make use of phylink_validate_pcs_interface() that will loop
for every PCS in the phylink PCS available list and find one that supports
the passed interface.

phylink_validate_pcs_interface() applies the same logic of .mac_select_pcs
where if a supported_interface value is not set for the PCS struct, then
it's assumed every interface is supported.

A MAC is required to implement either a .mac_select_pcs or make use of
the PCS list implementation. Implementing both will result in a fail
on phylink_create().

A MAC defining .num_possible_pcs in phylink_config MUST also define a
.fill_available_pcs or phylink_create() will fail with an negative error.

phylink value in phylink_pcs struct with this implementation is used to
track from PCS side when it's attached to a phylink instance. PCS driver
will make use of this information to correctly detach from a phylink
instance if needed.

phylink_pcs_change() is also changed to verify that the PCS that triggered
a link change is the one that is currently used by the phylink instance.

The .mac_select_pcs implementation is not changed but it's expected that
every MAC driver migrates to the new implementation to later deprecate
and remove .mac_select_pcs.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
 drivers/net/phy/phylink.c | 196 ++++++++++++++++++++++++++++++++++----
 include/linux/phylink.h   |  16 ++++
 2 files changed, 192 insertions(+), 20 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 4d59c0dd78db..cb07184ce82f 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -60,6 +60,9 @@ struct phylink {
 	/* The link configuration settings */
 	struct phylink_link_state link_config;
 
+	/* List of available PCS */
+	struct list_head pcs_list;
+
 	/* What interface are supported by the current link.
 	 * Can change on removal or addition of new PCS.
 	 */
@@ -154,6 +157,8 @@ static const phy_interface_t phylink_sfp_interface_preference[] = {
 
 static DECLARE_PHY_INTERFACE_MASK(phylink_sfp_interfaces);
 
+static void phylink_run_resolve(struct phylink *pl);
+
 /**
  * phylink_set_port_modes() - set the port type modes in the ethtool mask
  * @mask: ethtool link mode mask
@@ -518,12 +523,29 @@ static void phylink_validate_mask_caps(unsigned long *supported,
 	linkmode_and(state->advertising, state->advertising, mask);
 }
 
+static int phylink_validate_pcs_interface(struct phylink_pcs *pcs,
+					  phy_interface_t interface)
+{
+	/* If PCS define an empty supported_interfaces value, assume
+	 * all interface are supported.
+	 */
+	if (phy_interface_empty(pcs->supported_interfaces))
+		return 0;
+
+	/* Ensure that this PCS supports the interface mode */
+	if (!test_bit(interface, pcs->supported_interfaces))
+		return -EINVAL;
+
+	return 0;
+}
+
 static int phylink_validate_mac_and_pcs(struct phylink *pl,
 					unsigned long *supported,
 					struct phylink_link_state *state)
 {
-	struct phylink_pcs *pcs = NULL;
 	unsigned long capabilities;
+	struct phylink_pcs *pcs;
+	bool pcs_found = false;
 	int ret;
 
 	/* Get the PCS for this interface mode */
@@ -531,9 +553,24 @@ static int phylink_validate_mac_and_pcs(struct phylink *pl,
 		pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
 		if (IS_ERR(pcs))
 			return PTR_ERR(pcs);
+
+		pcs_found = !!pcs;
+	/*
+	 * Find a PCS in available PCS list for the requested interface.
+	 *
+	 * Skip searching if the MAC doesn't require a dedicated PCS for
+	 * the requested interface.
+	 */
+	} else if (test_bit(state->interface, pl->config->pcs_interfaces)) {
+		list_for_each_entry(pcs, &pl->pcs_list, list) {
+			if (!phylink_validate_pcs_interface(pcs, state->interface)) {
+				pcs_found = true;
+				break;
+			}
+		}
 	}
 
-	if (pcs) {
+	if (pcs_found) {
 		/* The PCS, if present, must be setup before phylink_create()
 		 * has been called. If the ops is not initialised, print an
 		 * error and backtrace rather than oopsing the kernel.
@@ -545,13 +582,10 @@ static int phylink_validate_mac_and_pcs(struct phylink *pl,
 			return -EINVAL;
 		}
 
-		/* Ensure that this PCS supports the interface which the MAC
-		 * returned it for. It is an error for the MAC to return a PCS
-		 * that does not support the interface mode.
-		 */
-		if (!phy_interface_empty(pcs->supported_interfaces) &&
-		    !test_bit(state->interface, pcs->supported_interfaces)) {
-			phylink_err(pl, "MAC returned PCS which does not support %s\n",
+		/* Recheck PCS to handle legacy way for .mac_select_pcs */
+		ret = phylink_validate_pcs_interface(pcs, state->interface);
+		if (ret) {
+			phylink_err(pl, "selected PCS does not support %s\n",
 				    phy_modes(state->interface));
 			return -EINVAL;
 		}
@@ -965,12 +999,22 @@ static unsigned int phylink_inband_caps(struct phylink *pl,
 					 phy_interface_t interface)
 {
 	struct phylink_pcs *pcs;
+	bool pcs_found = false;
 
-	if (!pl->mac_ops->mac_select_pcs)
-		return 0;
+	if (pl->mac_ops->mac_select_pcs) {
+		pcs = pl->mac_ops->mac_select_pcs(pl->config,
+						  interface);
+		pcs_found = !!pcs;
+	} else if (test_bit(interface, pl->config->pcs_interfaces)) {
+		list_for_each_entry(pcs, &pl->pcs_list, list) {
+			if (!phylink_validate_pcs_interface(pcs, interface)) {
+				pcs_found = true;
+				break;
+			}
+		}
+	}
 
-	pcs = pl->mac_ops->mac_select_pcs(pl->config, interface);
-	if (!pcs)
+	if (!pcs_found)
 		return 0;
 
 	return phylink_pcs_inband_caps(pcs, interface);
@@ -1265,10 +1309,36 @@ static void phylink_major_config(struct phylink *pl, bool restart,
 			pl->major_config_failed = true;
 			return;
 		}
+	/* Find a PCS in available PCS list for the requested interface.
+	 * This doesn't overwrite the previous .mac_select_pcs as either
+	 * .mac_select_pcs or PCS list implementation are permitted.
+	 *
+	 * Skip searching if the MAC doesn't require a dedicated PCS for
+	 * the requested interface.
+	 */
+	} else if (test_bit(state->interface, pl->config->pcs_interfaces)) {
+		bool pcs_found = false;
+
+		list_for_each_entry(pcs, &pl->pcs_list, list) {
+			if (!phylink_validate_pcs_interface(pcs,
+							    state->interface)) {
+				pcs_found = true;
+				break;
+			}
+		}
 
-		pcs_changed = pl->pcs != pcs;
+		if (!pcs_found) {
+			phylink_err(pl,
+				    "couldn't find a PCS for %s\n",
+				    phy_modes(state->interface));
+
+			pl->major_config_failed = true;
+			return;
+		}
 	}
 
+	pcs_changed = pl->pcs != pcs;
+
 	phylink_pcs_neg_mode(pl, pcs, state->interface, state->advertising);
 
 	phylink_dbg(pl, "major config, active %s/%s/%s\n",
@@ -1295,11 +1365,13 @@ static void phylink_major_config(struct phylink *pl, bool restart,
 	if (pcs_changed) {
 		phylink_pcs_disable(pl->pcs);
 
-		if (pl->pcs)
-			pl->pcs->phylink = NULL;
+		if (pl->mac_ops->mac_select_pcs) {
+			if (pl->pcs)
+				pl->pcs->phylink = NULL;
 
-		if (pcs)
-			pcs->phylink = pl;
+			if (pcs)
+				pcs->phylink = pl;
+		}
 
 		pl->pcs = pcs;
 	}
@@ -1834,6 +1906,44 @@ int phylink_set_fixed_link(struct phylink *pl,
 }
 EXPORT_SYMBOL_GPL(phylink_set_fixed_link);
 
+static int phylink_fill_available_pcs(struct phylink *pl,
+				      struct phylink_config *config)
+{
+	struct phylink_pcs **pcss;
+	int i, ret;
+
+	if (!config->num_possible_pcs)
+		return 0;
+
+	if (!config->fill_available_pcs) {
+		dev_err(config->dev,
+			"phylink: error: num_possible_pcs defined but no fill_available_pcs\n");
+		return -EINVAL;
+	}
+
+	pcss = kzalloc_objs(*pcss, config->num_possible_pcs);
+	if (!pcss)
+		return -ENOMEM;
+
+	ret = config->fill_available_pcs(config, pcss, config->num_possible_pcs);
+	if (ret < 0)
+		goto out;
+
+	for (i = 0; i < config->num_possible_pcs; i++) {
+		struct phylink_pcs *pcs = pcss[i];
+
+		if (!pcs)
+			continue;
+
+		list_add(&pcs->list, &pl->pcs_list);
+	}
+
+out:
+	kfree(pcss);
+
+	return ret;
+}
+
 /**
  * phylink_create() - create a phylink instance
  * @config: a pointer to the target &struct phylink_config
@@ -1855,6 +1965,7 @@ struct phylink *phylink_create(struct phylink_config *config,
 			       phy_interface_t iface,
 			       const struct phylink_mac_ops *mac_ops)
 {
+	struct phylink_pcs *pcs;
 	struct phylink *pl;
 	int ret;
 
@@ -1865,6 +1976,16 @@ struct phylink *phylink_create(struct phylink_config *config,
 		return ERR_PTR(-EINVAL);
 	}
 
+	/*
+	 * Make sure either PCS internal validation or .mac_select_pcs
+	 * is used. Return error if both are defined.
+	 */
+	if (config->num_possible_pcs && pl->mac_ops->mac_select_pcs) {
+		dev_err(config->dev,
+			"phylink: error: either phylink_config .num_possible_pcs or .mac_select_pcs must be used\n");
+		return ERR_PTR(-EINVAL);
+	}
+
 	pl = kzalloc_obj(*pl);
 	if (!pl)
 		return ERR_PTR(-ENOMEM);
@@ -1872,10 +1993,28 @@ struct phylink *phylink_create(struct phylink_config *config,
 	mutex_init(&pl->phydev_mutex);
 	mutex_init(&pl->state_mutex);
 	INIT_WORK(&pl->resolve, phylink_resolve);
+	INIT_LIST_HEAD(&pl->pcs_list);
+
+	/* Fill the PCS list with available PCS from phylink config */
+	ret = phylink_fill_available_pcs(pl, config);
+	if (ret < 0) {
+		kfree(pl);
+		return ERR_PTR(ret);
+	}
+
+	/* Link available PCS to phylink */
+	list_for_each_entry(pcs, &pl->pcs_list, list)
+		pcs->phylink = pl;
 
 	phy_interface_copy(pl->supported_interfaces,
 			   config->supported_interfaces);
 
+	/* Update supported interfaces */
+	list_for_each_entry(pcs, &pl->pcs_list, list)
+		phy_interface_or(pl->supported_interfaces,
+				 pl->supported_interfaces,
+				 pcs->supported_interfaces);
+
 	pl->config = config;
 	if (config->type == PHYLINK_NETDEV) {
 		pl->netdev = to_net_dev(config->dev);
@@ -1953,10 +2092,20 @@ EXPORT_SYMBOL_GPL(phylink_create);
  */
 void phylink_destroy(struct phylink *pl)
 {
+	struct phylink_pcs *pcs, *tmp;
+
 	sfp_bus_del_upstream(pl->sfp_bus);
 	if (pl->link_gpio)
 		gpiod_put(pl->link_gpio);
 
+	/* Drop link between PCS and phylink */
+	list_for_each_entry(pcs, &pl->pcs_list, list)
+		pcs->phylink = NULL;
+
+	/* Remove every PCS from phylink PCS list */
+	list_for_each_entry_safe(pcs, tmp, &pl->pcs_list, list)
+		list_del(&pcs->list);
+
 	cancel_work_sync(&pl->resolve);
 	kfree(pl);
 }
@@ -2413,8 +2562,15 @@ void phylink_pcs_change(struct phylink_pcs *pcs, bool up)
 {
 	struct phylink *pl = pcs->phylink;
 
-	if (pl)
-		phylink_link_changed(pl, up, "pcs");
+	/*
+	 * Ignore PCS link state change if the PCS is not
+	 * attached to a phylink instance or the phylink
+	 * instance is not currently using this PCS.
+	 */
+	if (!pl || pl->pcs != pcs)
+		return;
+
+	phylink_link_changed(pl, up, "pcs");
 }
 EXPORT_SYMBOL_GPL(phylink_pcs_change);
 
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index 2bc0db3d52ac..ca9dfc142388 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -12,6 +12,7 @@ struct ethtool_cmd;
 struct fwnode_handle;
 struct net_device;
 struct phylink;
+struct phylink_pcs;
 
 enum {
 	MLO_PAUSE_NONE,
@@ -151,6 +152,8 @@ enum phylink_op_type {
  *		     if MAC link is at %MLO_AN_FIXED mode.
  * @supported_interfaces: bitmap describing which PHY_INTERFACE_MODE_xxx
  *                        are supported by the MAC/PCS.
+ * @pcs_interfaces: bitmap describing for which PHY_INTERFACE_MODE_xxx a
+ *		    dedicated PCS is required.
  * @lpi_interfaces: bitmap describing which PHY interface modes can support
  *		    LPI signalling.
  * @mac_capabilities: MAC pause/speed/duplex capabilities.
@@ -160,6 +163,10 @@ enum phylink_op_type {
  * @wol_phy_legacy: Use Wake-on-Lan with PHY even if phy_can_wakeup() is false
  * @wol_phy_speed_ctrl: Use phy speed control on suspend/resume
  * @wol_mac_support: Bitmask of MAC supported %WAKE_* options
+ * @num_possible_pcs: num of possible phylink_pcs PCS
+ * @fill_available_pcs: callback to fill the available PCS in the passed
+ *			array struct of phylink_pcs PCS available_pcs up to
+ *			num_possible_pcs.
  */
 struct phylink_config {
 	struct device *dev;
@@ -172,6 +179,7 @@ struct phylink_config {
 	void (*get_fixed_state)(struct phylink_config *config,
 				struct phylink_link_state *state);
 	DECLARE_PHY_INTERFACE_MASK(supported_interfaces);
+	DECLARE_PHY_INTERFACE_MASK(pcs_interfaces);
 	DECLARE_PHY_INTERFACE_MASK(lpi_interfaces);
 	unsigned long mac_capabilities;
 	unsigned long lpi_capabilities;
@@ -182,6 +190,11 @@ struct phylink_config {
 	bool wol_phy_legacy;
 	bool wol_phy_speed_ctrl;
 	u32 wol_mac_support;
+
+	unsigned int num_possible_pcs;
+	int (*fill_available_pcs)(struct phylink_config *config,
+				  struct phylink_pcs **available_pcs,
+				  unsigned int num_possible_pcs);
 };
 
 void phylink_limit_mac_speed(struct phylink_config *config, u32 max_speed);
@@ -497,6 +510,9 @@ struct phylink_pcs {
 	struct phylink *phylink;
 	bool poll;
 	bool rxc_always_on;
+
+	/* private: */
+	struct list_head list;
 };
 
 /**
-- 
2.53.0



^ permalink raw reply related

* [PATCH net-next v7 03/12] net: phylink: add phylink_release_pcs() to externally release a PCS
From: Christian Marangi @ 2026-06-15 12:29 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Simon Horman, Jonathan Corbet, Shuah Khan, Christian Marangi,
	Lorenzo Bianconi, Heiner Kallweit, Russell King, Saravana Kannan,
	Philipp Zabel, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
	Justin Stitt, netdev, devicetree, linux-kernel, linux-doc,
	linux-arm-kernel, linux-mediatek, llvm
In-Reply-To: <20260615122950.22281-1-ansuelsmth@gmail.com>

Add phylink_release_pcs() to externally release a PCS from a phylink
instance. This can be used to handle case when a single PCS needs to be
removed and the phylink instance needs to be refreshed.

On calling phylink_release_pcs(), the PCS will be removed from the
phylink internal PCS list and the phylink supported_interfaces value is
reparsed with the remaining PCS interfaces.

Also a phylink resolve is triggered to handle the PCS removal.

The flag force_major_config is set to make phylink resolve reconfigure
the interface (even if it didn't change).
This is needed to handle the special case when the current PCS used
by phylink is removed and a major_config is needed to propagae the
configuration change. With this option enabled we also force mac_config
even if the PHY link is not up for the in-band case.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
 drivers/net/phy/phylink.c | 53 +++++++++++++++++++++++++++++++++++++++
 include/linux/phylink.h   |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index cb07184ce82f..ca4dad4b140a 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -158,6 +158,7 @@ static const phy_interface_t phylink_sfp_interface_preference[] = {
 static DECLARE_PHY_INTERFACE_MASK(phylink_sfp_interfaces);
 
 static void phylink_run_resolve(struct phylink *pl);
+static void phylink_pcs_disable(struct phylink_pcs *pcs);
 
 /**
  * phylink_set_port_modes() - set the port type modes in the ethtool mask
@@ -918,6 +919,58 @@ static void phylink_resolve_an_pause(struct phylink_link_state *state)
 	}
 }
 
+/**
+ * phylink_release_pcs - Removes a PCS from the phylink PCS available list
+ * @pcs: a pointer to the phylink_pcs struct to be released
+ *
+ * This function release a PCS from the phylink PCS available list if
+ * actually in use. It also refreshes the supported interfaces of the
+ * phylink instance by copying the supported interfaces from the phylink
+ * conf and merging the supported interfaces of the remaining available PCS
+ * in the list and trigger a resolve.
+ */
+void phylink_release_pcs(struct phylink_pcs *pcs)
+{
+	struct phylink *pl;
+
+	ASSERT_RTNL();
+
+	pl = pcs->phylink;
+	if (!pl)
+		return;
+
+	list_del(&pcs->list);
+	pcs->phylink = NULL;
+
+	mutex_lock(&pl->state_mutex);
+
+	/* Check if we are removing the PCS currently
+	 * in use by phylink. If this is the case,
+	 * force phylink resolve to reconfigure the interface
+	 * mode, disable the current PCS and set the
+	 * phylink PCS to NULL.
+	 */
+	if (pl->pcs == pcs) {
+		phylink_pcs_disable(pl->pcs);
+
+		pl->force_major_config = true;
+		pl->pcs = NULL;
+	}
+
+	mutex_unlock(&pl->state_mutex);
+
+	/* Refresh supported interfaces */
+	phy_interface_copy(pl->supported_interfaces,
+			   pl->config->supported_interfaces);
+	list_for_each_entry(pcs, &pl->pcs_list, list)
+		phy_interface_or(pl->supported_interfaces,
+				 pl->supported_interfaces,
+				 pcs->supported_interfaces);
+
+	phylink_run_resolve(pl);
+}
+EXPORT_SYMBOL_GPL(phylink_release_pcs);
+
 static unsigned int phylink_pcs_inband_caps(struct phylink_pcs *pcs,
 				    phy_interface_t interface)
 {
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index ca9dfc142388..15e6b1a39dfe 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -751,6 +751,8 @@ void phylink_disconnect_phy(struct phylink *);
 int phylink_set_fixed_link(struct phylink *,
 			   const struct phylink_link_state *);
 
+void phylink_release_pcs(struct phylink_pcs *pcs);
+
 void phylink_mac_change(struct phylink *, bool up);
 void phylink_pcs_change(struct phylink_pcs *, bool up);
 
-- 
2.53.0



^ permalink raw reply related

* [PATCH net-next v7 00/12] net: pcs: Introduce support for fwnode PCS
From: Christian Marangi @ 2026-06-15 12:29 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Simon Horman, Jonathan Corbet, Shuah Khan, Christian Marangi,
	Lorenzo Bianconi, Heiner Kallweit, Russell King, Saravana Kannan,
	Philipp Zabel, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
	Justin Stitt, netdev, devicetree, linux-kernel, linux-doc,
	linux-arm-kernel, linux-mediatek, llvm

This series introduce a most awaited feature that is correctly
provide PCS with fwnode without having to use specific export symbol
and additional handling of PCS in phylink.

At times there were 2 different implementation (this and the one
from Sean) but Sean agreed that this can be picked and used in favor
of his implementation as long as his case with race condition is
correctly handled.

---
First the PCS fwnode:

The concept is to implement a producer-consumer API similar to other
subsystem like clock or PHY.

That seems to be the best solution to the problem as PCS driver needs
to be detached from phylink and implement a simple way to provide a
PCS while maintaining support for probe defer or driver removal.

To keep the implementation simple, the PCS driver devs needs some
collaboration to correctly implement this. This is O.K. as helper
to correctly implement this are provided hence it's really a matter
of following a pattern to correct follow removal of a PCS driver.

A PCS provider have to implement and call fwnode_pcs_add_provider() in
probe function and define an xlate function to define how the PCS
should be provided based on the requested interface and phandle spec
defined in fwnode (based on the #pcs-cells)

fwnode_pcs_get() is provided to provide a specific PCS declared in
fwnode at index.

A simple xlate function is provided for simple single PCS
implementation, fwnode_pcs_simple_get.

A PCS provider on driver removal should first call
fwnode_pcs_del_provider() to delete itself as a provider and then
release the PCS from phylink with phylink_release_pcs() under rtnl
lock.

---
Second PCS handling in phylink:

We have the PCS problem for the only reason that in initial
implementation, we permitted way too much flexibility to MAC driver
and things started to deviate. At times we couldn't think SoC
would start to put PCS outside the MAC hence it was OK to assume
they would live in the same driver. With the introduction of
10g in more consumer devices, we are observing a rapid growth
of this pattern with multiple PCS external to MAC.

To put a stop on this, the only solution is to give back to phylink
control on PCS handling and enforce more robust supported interface
definition from both MAC and PCS side.

It's suggested to read patch 0003 of this series for more info, here
a brief explaination of the idea:

This series introduce handling of PCS in phylink and try to deprecate
.mac_select_pcs.

Phylink now might contain a linked list of available PCS and
those will be used for PCS selection on phylink_major_config.

MAC driver needs to define pcs_interfaces mask in phylink_config
for every interface that needs a dedicated PCS.

These PCS needs to be provided to phylink at phylink_create time
by setting the .fill_available_pcs and .num_possible_pcs in phylink_config.
Helpers to parse PCS from fwnode are provided
fwnode_phylink_pcs_count() that will return the count of PCS entries
described in the firmware node and fwnode_phylink_pcs_parse() that will
fill a preallocated array of PCS pointer with the actual available PCS
(ignoring the one that still needs to be probed).

phylink_create() will fill the internal PCS list with the passed
array of PCS. phylink_major_config and other user of .mac_select_pcs
are adapted to make use of this new PCS list.

The supported interface value is also moved internally to phylink
struct. This is to handle late removal and addition of PCS.
(the bonus effect to this is giving phylink a clear idea of what
is actually supported by the MAC and his constraint with PCS)

The supported interface mask in phylink is done by OR the
supported_interfaces in phylink_config with every PCS in PCS list.

PCS removal is supported by forcing a mac_config, refresh the
supported interfaces and run a phy_resolve().

PCS late addition is supported by introducing a global notifier
for PCS provider. If a phylink have the pcs_interfaces mask not
zero, it's registered to this notifier.

PCS provider will emit a global PCS add event to signal any
interface that a new PCS might be available.

The function will then check if the PCS is related to the MAC
fwnode and add it accordingly.

A user for this new implementation is provided as an Airoha PCS
driver. This was also tested downstream with the IPQ95xx QCOM SoC
and with the help of Daniel also on the various Mediatek MT7988
SoC with both SFP cage implementation and DSA attached.

Lots of tests were done with driver unbind/bind and with interface
up/down also by adding print to make sure major_config_fail gets
correctly triggered and reset once the PCS comes back.

The dedicated commits have longer description on the implementation
so it's suggested to also check there for additional info.

It's worth to mention that OpenWrt is currently using this on
Mediatek SoC and QCOM ipq807x/ipq60xx/ipq50xx and Airoha are
already ported in staging tree for testing.

---

Changes v7:
- Address all the bug from the Sashiko bot
- Rename .num_available_pcs to .num_possible_pcs
- Link PCS in phylink_create()
- Correctly unregister the notifier on phylink_destroy()
- Introduce fwnode_phylink_pcs_count()
- Better handle locking in phylink for PCS handling
- Better handle unavailable PCS at phylink_create() time
- Improve Documentation file
- Other minor fixes to address suggestion from bot
- Rebase on top of net-next
Changes v6:
- Rebase on top of net-next
- Add Documentation files
- Add fw_devlink patch
- Fix some comments typo
- Rework the airoha_eth.c implementation with new multi serdes code
- Extend PCS code with PCIe and USB support
- Align schema to new property
Changes v5:
- Rebase on top of net-next
- Use the new force_major_config
- Reword some comments and commit description
- Return -ENODEV instead of -EPROBE_DEFER to perevent race condition
- Drop phy_interface_copy patch (Russell pushed an equivalent version)
Changes v4:
- Move patch 0002 phy_interface_copy to 0002 (fix bisectability
  problem)
- Address review from Lorenzo for Airoha ethernet driver
- Fix kdoc error with missing Return (actually missing : before Return)
- Fix UNMET dependency reported error for CONFIG_FWNODE_PCS
- Revert to pcs.c instead of core.c (due to name conflict with other kmod)
- Fix clang compilation error for Airoha PCS driver
- Add missing inline function to pcs.h function
Changes v3:
- Out of RFC
- Fix various spelling mistake
- Drop circular dependency patch
- Complete Airoha Ethernet phylink integration
- Introduce .pcs_link_down PCS OP
Changes v2:
- Switch to fwnode
- Implement PCS provider notifier
- Better split changes
- Move supported_interfaces to phylink
- Add circular dependency patch
- Rework handling with indirect addition/removal and
  trigger of phylink_resolve()

Christian Marangi (12):
  net: phylink: keep and use MAC supported_interfaces in phylink struct
  net: phylink: introduce internal phylink PCS handling
  net: phylink: add phylink_release_pcs() to externally release a PCS
  net: pcs: implement Firmware node support for PCS driver
  net: phylink: support late PCS provider attach
  net: Document PCS subsystem
  MAINTAINERS: add myself as PCS subsystem maintainer
  of: property: fw_devlink: Add support for "pcs-handle"
  net: phylink: add .pcs_link_down PCS OP
  dt-bindings: net: pcs: Document support for Airoha Ethernet PCS
  net: pcs: airoha: add PCS driver for Airoha AN7581 SoC
  net: airoha: add phylink support

 .../bindings/net/pcs/airoha,pcs.yaml          |  261 ++
 Documentation/networking/index.rst            |    1 +
 Documentation/networking/pcs.rst              |  229 ++
 MAINTAINERS                                   |    9 +
 drivers/net/ethernet/airoha/Kconfig           |    1 +
 drivers/net/ethernet/airoha/airoha_eth.c      |  161 +-
 drivers/net/ethernet/airoha/airoha_eth.h      |    3 +
 drivers/net/ethernet/airoha/airoha_regs.h     |   12 +
 drivers/net/pcs/Kconfig                       |    8 +
 drivers/net/pcs/Makefile                      |    3 +
 drivers/net/pcs/airoha/Kconfig                |   12 +
 drivers/net/pcs/airoha/Makefile               |    7 +
 drivers/net/pcs/airoha/pcs-airoha-common.c    | 1318 +++++++++++
 drivers/net/pcs/airoha/pcs-airoha.h           | 1309 +++++++++++
 drivers/net/pcs/airoha/pcs-an7581.c           | 2093 +++++++++++++++++
 drivers/net/pcs/pcs.c                         |  257 ++
 drivers/net/phy/phylink.c                     |  337 ++-
 drivers/of/property.c                         |    2 +
 include/linux/pcs/pcs-provider.h              |   41 +
 include/linux/pcs/pcs.h                       |  135 ++
 include/linux/phylink.h                       |   20 +
 21 files changed, 6191 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/pcs/airoha,pcs.yaml
 create mode 100644 Documentation/networking/pcs.rst
 create mode 100644 drivers/net/pcs/airoha/Kconfig
 create mode 100644 drivers/net/pcs/airoha/Makefile
 create mode 100644 drivers/net/pcs/airoha/pcs-airoha-common.c
 create mode 100644 drivers/net/pcs/airoha/pcs-airoha.h
 create mode 100644 drivers/net/pcs/airoha/pcs-an7581.c
 create mode 100644 drivers/net/pcs/pcs.c
 create mode 100644 include/linux/pcs/pcs-provider.h
 create mode 100644 include/linux/pcs/pcs.h

-- 
2.53.0

^ permalink raw reply

* Re: [PATCH v3 2/2] clk: amlogic: Add A9 peripherals clock controller driver
From: Jerome Brunet @ 2026-06-15 12:29 UTC (permalink / raw)
  To: Jian Hu
  Cc: Jian Hu via B4 Relay, Neil Armstrong, Michael Turquette,
	Stephen Boyd, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Xianwei Zhao, Kevin Hilman, Martin Blumenstingl, linux-amlogic,
	linux-clk, devicetree, linux-kernel, linux-arm-kernel
In-Reply-To: <bfe92bbe-5325-4497-b79f-10c7a6e1ed5b@amlogic.com>

On lun. 15 juin 2026 at 19:25, Jian Hu <jian.hu@amlogic.com> wrote:

> On 6/10/2026 8:49 PM, Jerome Brunet wrote:
>> [ EXTERNAL EMAIL ]
>>
>> On mer. 10 juin 2026 at 16:14, Jian Hu via B4 Relay <devnull+jian.hu.amlogic.com@kernel.org> wrote:
>>
>>> From: Jian Hu <jian.hu@amlogic.com>
>>>
>>> Add the peripherals clock controller driver for the Amlogic A9 SoC family.
>>>
>>> Signed-off-by: Jian Hu <jian.hu@amlogic.com>
>>> ---
>>>   drivers/clk/meson/Kconfig          |   15 +
>>>   drivers/clk/meson/Makefile         |    1 +
>>>   drivers/clk/meson/a9-peripherals.c | 1925 ++++++++++++++++++++++++++++++++++++
>>>   3 files changed, 1941 insertions(+)
>>>
>
> [ ... ]
>
>>> +
>>> +/* Channel 6 is unconnected. */
>>> +static u32 a9_glb_parents_val_table[] = { 0, 1, 2, 3, 4, 5, 7 };
>>> +static struct clk_regmap a9_dspa;
>> What is this ?
>
>
> The peripheral clock definitions are ordered by register offset.
>
> dspa is one of the parents of the glb clock, while the dsp clock registers
> are located after the GLB clock registers.
>
> Since glb references a9_dspa before its full definition appears, the
> declaration
>
> static struct clk_regmap a9_dspa;
>
> is added as a forward declaration to satisfy the compiler.
>
>
> Would it make sense to relax the register-offset ordering in this case?
>

I don't think we ever enforced such ordering (or any other ordering) in
the clock driver, so yes please.


> By defining the DSP clock before the GLB clock, we could remove the forward
> declaration of a9_dspa.

Unless it is absolutely necessary, please avoid forward declaration.

Declare what is needed first, keep related things together and use your
best judgement ... IOW, make it easy for me to review ;) 

>
>>> +
>>> +static const struct clk_parent_data a9_glb_parents[] = {
>>> +};

[...]

>>> +
>>> +static struct clk_regmap a9_vclk_div2_en = {
>>> +     .data = &(struct clk_regmap_gate_data){
>>> +             .offset = VID_CLK_CTRL,
>>> +             .bit_idx = 1,
>>> +     },
>>> +     .hw.init = CLK_HW_INIT_HW("vclk_div2_en", &a9_vclk.hw,
>>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>>> +};
>> Looks to me all this div_en / div repeating pattern would be easier to review
>> with tiny macro .
>
>
> Good point.
>
> I tried to reduce the repeated div_en/div pattern using a helper macro.
>
> It keeps the relationship between gate and fixed-factor clock more compact
> and easier to review.
>
> After using the helper macro, the div_en/div code can be simplified to the
> following:
>
> #define A9_VCLK(_name, _reg, _bit, _div, _parent)        \
> struct clk_regmap a9_##_name##_en = {      \
                       ^- not strictly necessary, a touch too agressive
                        

>         .data = &(struct clk_regmap_gate_data){          \
>                 .offset = _reg,      \
>                 .bit_idx = _bit,       \
>         },       \
>         .hw.init = &(struct clk_init_data) {           \
>                 .name = #_name "_en",      \
>                 .ops = &clk_regmap_gate_ops,           \
>                 .parent_hws = (const struct clk_hw *[]) { _parent },    \
>                 .num_parents = 1,      \
>                 .flags = CLK_SET_RATE_PARENT,      \
>         },       \
> };       \
>       \
> struct clk_fixed_factor a9_##_name = {       \
>         .mult = 1,       \
>         .div = _div,       \
>         .hw.init = &(struct clk_init_data){          \
>                 .name = #_name,      \
>                 .ops = &clk_fixed_factor_ops,          \
>                 .parent_hws = (const struct clk_hw *[]) {      \
>                         &a9_##_name##_en.hw          \
>                 },       \
>                 .num_parents = 1,      \
>                 .flags = CLK_SET_RATE_PARENT,      \
>         },       \
> };       \
>
> static A9_VCLK(vclk_div2, VID_CLK_CTRL, 1, 2, &a9_vclk.hw);
> static A9_VCLK(vclk_div4, VID_CLK_CTRL, 2, 4, &a9_vclk.hw);
> static A9_VCLK(vclk_div6, VID_CLK_CTRL, 3, 6, &a9_vclk.hw);
> static A9_VCLK(vclk_div6, VID_CLK_CTRL, 4, 12, &a9_vclk.hw);
> static A9_VCLK(vclk2_div2, VIID_CLK_CTRL, 1, 2, &a9_vclk2.hw);
> static A9_VCLK(vclk2_div4, VIID_CLK_CTRL, 2, 4, &a9_vclk2.hw);
> static A9_VCLK(vclk2_div6, VIID_CLK_CTRL, 3, 6, &a9_vclk2.hw);
> static A9_VCLK(vclk2_div6, VIID_CLK_CTRL, 4, 12, &a9_vclk2.hw);
>
>
> If you think splitting it further into separate helper macros would improve
> readability.

One clock per macro please. Hidding 2 declaration is recipe for
disaster. For ex, here the first one is static, the 2nd is not 

>
> I can do that as well.
>



^ permalink raw reply

* [PATCH] net: stmmac: loongson1: Use dev_err_probe()
From: Keguang Zhang via B4 Relay @ 2026-06-15 12:24 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue
  Cc: linux-mips, netdev, linux-stm32, linux-arm-kernel, linux-kernel,
	Keguang Zhang

From: Keguang Zhang <keguang.zhang@gmail.com>

Use dev_err_probe() for the missing match data case to simplify
error handling.

Signed-off-by: Keguang Zhang <keguang.zhang@gmail.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-loongson1.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson1.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson1.c
index de9aba756aac..ec34adb63f61 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson1.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson1.c
@@ -176,10 +176,8 @@ static int ls1x_dwmac_probe(struct platform_device *pdev)
 				     "Unable to find syscon\n");
 
 	data = of_device_get_match_data(&pdev->dev);
-	if (!data) {
-		dev_err(&pdev->dev, "No of match data provided\n");
-		return -EINVAL;
-	}
+	if (!data)
+		return dev_err_probe(&pdev->dev, -EINVAL, "No of match data provided\n");
 
 	dwmac = devm_kzalloc(&pdev->dev, sizeof(*dwmac), GFP_KERNEL);
 	if (!dwmac)

---
base-commit: ec039126b7fac4e3af35ebccaa7c6f9b6875ba81
change-id: 20260602-dwmac-loongson1-5e1b9dfc3c62

Best regards,
-- 
Keguang Zhang <keguang.zhang@gmail.com>




^ permalink raw reply related

* [PATCH v2] arm64: errata: Handle Apple WFI State Loss
From: Yureka Lilian @ 2026-06-15 12:21 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, linux-kernel, asahi, Sasha Finkelstein,
	Yureka Lilian

Apple Silicon CPUs can lose register state in WFI, leading to crashes
in the idle loop early in the boot process.
This applies to any previous Apple Silicon CPUs too, but is worked
around by configuring the WFI mode in SYS_IMP_APL_CYC_OVRD sysreg
during m1n1's chickens setup.
This workaround no longer exists since M4.

Add a workaround capability for replacing wfi and wfit with nop, and
an erratum to enable it on the affected CPUs if the workaround using the
sysreg is not already applied. Leave the decision whether the sysreg
workaround can be used up to the earlier parts of the boot chain which
already configure the Apple Silicon chicken bits.

This alternative has to be applied in early boot, since otherwise some
cores might enter the idle loop before apply_alternatives_all() is run.

Reviewed-by: Sasha Finkelstein <k@chaosmail.tech>
Signed-off-by: Yureka Lilian <yureka@cyberchaos.dev>
---
Changes since v1:
Restricted the erratum to EL2 only, since in EL1 we'd expect the
hypervisor to trap WFI and handle the erratum.

Tested on M4 and M4 Pro (which now sometimes nondeterministically
crash later during boot).
Successfully booted on M3 Max with the SYS_IMP_APL_CYC_OVRD
workaround disabled in the bootloader, as well as A18 Pro (which,
like M4 / M4 Pro, doesn't have SYS_IMP_APL_CYC_OVRD).

There is probably a better place for the SYS_IMP_APL_CYC_OVRD
defines, which I currently put in the middle of cpu_errata.c, but I
wouldn't know where.
---
 arch/arm64/Kconfig               | 12 ++++++++++++
 arch/arm64/include/asm/barrier.h | 19 ++++++++++++++++---
 arch/arm64/kernel/cpu_errata.c   | 21 +++++++++++++++++++++
 arch/arm64/tools/cpucaps         |  1 +
 4 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b3afe0688919..8c8ff069856f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -453,6 +453,18 @@ config AMPERE_ERRATUM_AC04_CPU_23
 
 	  If unsure, say Y.
 
+config APPLE_ERRATUM_WFI_STATE
+	bool "Apple Silicon: WFI loses state"
+	default y
+	help
+	  This option adds an alternative code sequence to work around some
+	  Apple Silicon CPUs losing register state during wfi and wfit
+	  instructions.
+
+	  As a workaround, the wfi and wfit instructions are replaced with nop
+	  operations via the alternative framework if an affected CPU is
+	  detected.
+
 config ARM64_WORKAROUND_CLEAN_CACHE
 	bool
 
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 9495c4441a46..f72eddc7c434 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -20,9 +20,22 @@
 #define wfe()		asm volatile("wfe" : : : "memory")
 #define wfet(val)	asm volatile("msr s0_3_c1_c0_0, %0"	\
 				     : : "r" (val) : "memory")
-#define wfi()		asm volatile("wfi" : : : "memory")
-#define wfit(val)	asm volatile("msr s0_3_c1_c0_1, %0"	\
-				     : : "r" (val) : "memory")
+#define wfi()							\
+	do {							\
+		asm volatile(					\
+		ALTERNATIVE("wfi",				\
+			    "nop",				\
+			    ARM64_WORKAROUND_WFI_STATE)		\
+		: : : "memory");				\
+	} while (0)
+#define wfit(val)						\
+	do {							\
+		asm volatile(					\
+		ALTERNATIVE("msr s0_3_c1_c0_1, %0",		\
+			    "nop",				\
+			    ARM64_WORKAROUND_WFI_STATE)		\
+		: : "r" (val) : "memory");			\
+	} while (0)
 
 #define isb()		asm volatile("isb" : : : "memory")
 #define dmb(opt)	asm volatile("dmb " #opt : : : "memory")
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 1995e1198648..8c9a194eddc4 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -309,6 +309,19 @@ static void cpu_enable_impdef_pmuv3_traps(const struct arm64_cpu_capabilities *_
 	sysreg_clear_set_s(SYS_HACR_EL2, 0, BIT(56));
 }
 
+#ifdef CONFIG_APPLE_ERRATUM_WFI_STATE
+static bool has_apple_erratum_wfi_state(const struct arm64_cpu_capabilities *entry, int scope)
+{
+#define SYS_IMP_APL_CYC_OVRD   sys_reg(3, 5, 15, 5, 0)
+#define CYC_OVRD_WFI_MODE_MASK GENMASK(26, 24)
+	if (read_cpuid_implementor() != ARM_CPU_IMP_APPLE)
+		return false;
+	if ((read_sysreg(CurrentEL) >> 2) != 2)
+		return false;
+	return FIELD_GET(CYC_OVRD_WFI_MODE_MASK, read_sysreg_s(SYS_IMP_APL_CYC_OVRD)) != 2;
+}
+#endif
+
 #ifdef CONFIG_ARM64_WORKAROUND_REPEAT_TLBI
 static const struct arm64_cpu_capabilities arm64_repeat_tlbi_list[] = {
 #ifdef CONFIG_QCOM_FALKOR_ERRATUM_1009
@@ -1009,6 +1022,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		.matches = has_impdef_pmuv3,
 		.cpu_enable = cpu_enable_impdef_pmuv3_traps,
 	},
+#ifdef CONFIG_APPLE_ERRATUM_WFI_STATE
+	{
+		.desc = "Apple WFI loses state",
+		.capability = ARM64_WORKAROUND_WFI_STATE,
+		.type = ARM64_CPUCAP_SCOPE_BOOT_CPU | ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU,
+		.matches = has_apple_erratum_wfi_state,
+	},
+#endif
 	{
 	}
 };
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 9b85a84f6fd4..bbf8c15d79b0 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -128,3 +128,4 @@ WORKAROUND_REPEAT_TLBI
 WORKAROUND_SPECULATIVE_AT
 WORKAROUND_SPECULATIVE_SSBS
 WORKAROUND_SPECULATIVE_UNPRIV_LOAD
+WORKAROUND_WFI_STATE

---
base-commit: c425609d6ac4012c8bbf01ec2e10e801b1923a7b
change-id: 20260614-wfi-erratum-7a9f305f601f

Best regards,
--  
Yureka Lilian <yureka@cyberchaos.dev>



^ permalink raw reply related

* Re: i.MX95: EdgeLock Enclave secure storage
From: Fabio Estevam @ 2026-06-15 12:13 UTC (permalink / raw)
  To: Frieder Schrempf
  Cc: Pankaj Gupta,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Peng Fan,
	Stefano Babic, Frank Li
In-Reply-To: <b7c92302-d675-4610-a815-b353ff365e36@kontron.de>

Hi Frieder,

On Mon, Jun 15, 2026 at 4:18 AM Frieder Schrempf
<frieder.schrempf@kontron.de> wrote:

> There is no upstream support for OCOTP access via ELE. The
> imx-ocotp-ele.c driver (despite its name) does not currently use the ELE
> but the FSB to access the fuses (and is therefore limited to read-only
> access).
>
> I have some local WIP to add ELE support for the OCOTP driver. I think I
> can post it soonish.

Thanks for the clarification, appreciate it.


^ permalink raw reply

* Re: [PATCH RFC 4/9] net: stmmac: qcom-ethqos: add per-platform NOC clock voting
From: Konrad Dybcio @ 2026-06-15 12:13 UTC (permalink / raw)
  To: Mohd Ayaan Anwar, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Richard Cochran, Bjorn Andersson, Konrad Dybcio,
	Maxime Coquelin, Alexandre Torgue, Russell King
  Cc: linux-arm-msm, netdev, devicetree, linux-kernel, linux-stm32,
	linux-arm-kernel
In-Reply-To: <20260612-shikra_ethernet-v1-4-f0f4a1d19929@oss.qualcomm.com>

On 6/11/26 8:37 PM, Mohd Ayaan Anwar wrote:
> Some SoCs gate the EMAC's path to the System NOC behind dedicated clocks
> that must be enabled before the DMA can reach memory.  Add
> ethqos_noc_clk_cfg and the corresponding fields in the driver-data and
> runtime structs so each compatible can declare its own set with per-clock
> rates.  The clocks are acquired during probe and enabled/disabled
> alongside the existing link clock in ethqos_clks_config().

Sounds like we should use an OPP table instead, we can't just do 
set_rate() on qcom, as that will not propagate the required perf
state to the clock controller's supplier power domain (i.e. VDDCX)

Konrad


^ permalink raw reply

* Re: [PATCH 1/8] mm: Add ptep_try_set() for lockless empty-slot installs
From: David Hildenbrand (Arm) @ 2026-06-15 11:49 UTC (permalink / raw)
  To: Tejun Heo, Will Deacon
  Cc: David Vernet, Andrea Righi, Changwoo Min, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Peter Zijlstra, Catalin Marinas,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andrew Morton, Mike Rapoport, Emil Tsalapatis, sched-ext, bpf,
	x86, linux-arm-kernel, linux-mm, linux-kernel
In-Reply-To: <ai8PO0NThJeuQYzy@slm.duckdns.org>

On 6/14/26 22:29, Tejun Heo wrote:
> Hello,
> 
> On Sun, Jun 14, 2026 at 10:28:02AM +0100, Will Deacon wrote:
>>> +/*
>>> + * Note: strictly-zero compare is narrower than pte_none(), but the gap is
>>> + * harmless: a fresh kernel PTE has no software bits set.
>>> + */
>>
>> This comment really confused me :/
>>
>> What is a "fresh" kernel PTE and why do you specifically call out "software
>> bits" if the CAS requires all 64 bits to be 0? Why is that narrower than
>> pte_none() given that pte_none() for arm64 is:
>>
>> #define pte_none(pte)           (!pte_val(pte))
> 
> Yeah, that's complete non-sense for arm. The comment is about x86's
> pte_none() excluding DIRTY and ACCESSED due to an erratum when testing none
> and how that doesn't matter here. This shouldn't have been copied to arm.
> I'll send a patch to remove that.

Is BPF maybe picking up patches from other subsystems up too early without
waiting for acks?

-- 
Cheers,

David


^ permalink raw reply

* Re: [RESEND PATCH v1] spi: uniphier: Fix completion initialization order before devm_request_irq()
From: Mark Brown @ 2026-06-15 11:45 UTC (permalink / raw)
  To: Kunihiko Hayashi
  Cc: linux-spi, linux-arm-kernel, linux-kernel, Sangyun Kim,
	Kyungwook Boo, Masami Hiramatsu
In-Reply-To: <20260615023415.190655-1-hayashi.kunihiko@socionext.com>

[-- Attachment #1: Type: text/plain, Size: 589 bytes --]

On Mon, Jun 15, 2026 at 11:34:15AM +0900, Kunihiko Hayashi wrote:
> The driver calls devm_request_irq() before initializing the completion
> used by the interrupt handler. Because the interrupt may occur immediately
> after devm_request_irq(), the handler may execute before init_completion().
> 
> This may result in calling complete() on an uninitialized completion,
> causing undefined behavior. This has been observed with KASAN.
> 
> Fix this by initializing the completion before registering the IRQ.

I thought you were going to rebase this, why resend the same version?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v14 10/44] arm64: RMI: Add support for SRO
From: Steven Price @ 2026-06-15 11:45 UTC (permalink / raw)
  To: Dan Williams (nvidia), Gavin Shan, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Shanker Donthineni,
	Alper Gun, Aneesh Kumar K . V, Emi Kisanuki, Vishal Annapurve,
	WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <6a2c91398fad5_a003b10027@djbw-dev.notmuch>

Hi Dan,

On 13/06/2026 00:07, Dan Williams (nvidia) wrote:
> Steven Price wrote:
> [..]
>>> alloc_pages_exact() will fail if the requested size exceeds the maximal
>>> allowed
>>> size (1 << MAX_PAGE_ORDER). The maximal size is usually smaller than
>>> PUD_SIZE
>>> but PUD_SIZE is allowed by the RMM.
>>
>> This is an area where to be honest I'm really not sure what to do.
>> Technically the RMM is allowed to ask for a contiguous range of 512GB
>> pages (on a 4K system - larger with larger page sizes) - but clearly no
>> real OS is going to be able to provide anything like that.
>>
>> In practise we don't expect the RMM to do anything so crazy. It's not
>> really clear to be whether even 2MB (PMD_SIZE) is needed. But the spec
>> is written to be generic.
>>
>> So my current approach is to calculate the required size and pass it
>> into alloc_pages_exact(). For "stupidly large" values this will fail and
>> Linux just doesn't support an RMM which attempts this. If there is ever
>> a usecase which needs this then we'd need to find a different method of
>> providing the memory (most likely some form of carveout to avoid
>> fragmentation). But my view is we should wait for that usecase to be
>> identified first.
> 
> Just some comparison comments as I am also going through the TDX patches
> which enable "Extension SEAMCALLs". These new SEAMCALLs are similar to
> the SRO mechanism [1].

Looks like at least at the moment it's much more one-way than the SRO
mechanism - there's no reclaim mechanism (yet).

> TDX asks for an upfront delegation of memory at init time using
> alloc_contig_pages() that is never returned until entire module is
> shutdown. alloc_contig_pages() is not subject to the MAX_ORDER limit,
> but not sure that alloc_contig_pages() is suitable for small+dynamic
> runtime memory add / release that SRO potentially wants to do?

Yeah I'm not sure quite what is best. I expect the RMM to only request
contiguous memory for very small allocations to use as hardware page
tables. It's an issue I'm trying to work through that the specification
doesn't provide any guidance for what sort of allocations the host
should expect to provide.

> Does SRO always balance the size of RMI_OP_MEM_REQ_DONATE with
> RMI_OP_MEM_REQ_RECLAIM, or might some donate requests be a one way
> donation like TDX? Just poking to see if there is a path to preallocate
> a pool vs the fine grained per-operation alloc/free.

The spec is unfortunately not prescriptive on this point. For an
operation which eventually fails, the expectation is that the RMM will
return all the memory that was provided (and exactly that memory). But
the specification doesn't actually require that.

The problem is that there are situations where a racing operation on
another CPU could trigger this to not happen. For example, a new page
table needs to be allocated to complete a map operation, but then a
racing operation on another CPU makes use of this page table (e.g due to
a map at a different address), the memory for the page table cannot be
returned even if the operation doesn't complete because it's in use from
the racing operation.

I don't believe the current RMM design will actually do this - but it's
not something we actually want to prevent in the spec.

Equally the expectation is that all the donated memory for a guest will
be returned when the guest is destroyed. But we don't have anything in
the spec to enforce this.

I don't particularly expect a pool to be that useful for the expected
memory allocation patterns as I expect SRO donations to be long lived.
We don't (yet at least) have a concept of donating memory just for
"scratch" memory during an operation. Although the SRO mechanism doesn't
rule that out.

Thanks,
Steve

^ permalink raw reply

* Re: [PATCH v3 2/2] clk: amlogic: Add A9 peripherals clock controller driver
From: Jian Hu @ 2026-06-15 11:25 UTC (permalink / raw)
  To: Jerome Brunet, Jian Hu via B4 Relay
  Cc: Neil Armstrong, Michael Turquette, Stephen Boyd, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Xianwei Zhao, Kevin Hilman,
	Martin Blumenstingl, linux-amlogic, linux-clk, devicetree,
	linux-kernel, linux-arm-kernel
In-Reply-To: <1jecieftme.fsf@starbuckisacylon.baylibre.com>


On 6/10/2026 8:49 PM, Jerome Brunet wrote:
> [ EXTERNAL EMAIL ]
>
> On mer. 10 juin 2026 at 16:14, Jian Hu via B4 Relay <devnull+jian.hu.amlogic.com@kernel.org> wrote:
>
>> From: Jian Hu <jian.hu@amlogic.com>
>>
>> Add the peripherals clock controller driver for the Amlogic A9 SoC family.
>>
>> Signed-off-by: Jian Hu <jian.hu@amlogic.com>
>> ---
>>   drivers/clk/meson/Kconfig          |   15 +
>>   drivers/clk/meson/Makefile         |    1 +
>>   drivers/clk/meson/a9-peripherals.c | 1925 ++++++++++++++++++++++++++++++++++++
>>   3 files changed, 1941 insertions(+)
>>

[ ... ]

>> +
>> +/* Channel 6 is unconnected. */
>> +static u32 a9_glb_parents_val_table[] = { 0, 1, 2, 3, 4, 5, 7 };
>> +static struct clk_regmap a9_dspa;
> What is this ?


The peripheral clock definitions are ordered by register offset.

dspa is one of the parents of the glb clock, while the dsp clock 
registers are located after the GLB clock registers.

Since glb references a9_dspa before its full definition appears, the 
declaration

static struct clk_regmap a9_dspa;

is added as a forward declaration to satisfy the compiler.


Would it make sense to relax the register-offset ordering in this case?

By defining the DSP clock before the GLB clock, we could remove the 
forward declaration of a9_dspa.

>> +
>> +static const struct clk_parent_data a9_glb_parents[] = {
>> +     { .fw_name = "xtal", },
>> +     { .hw = &a9_dspa.hw },
>> +     { .fw_name = "fdiv3", },
>> +     { .fw_name = "fdiv4", },
>> +     { .fw_name = "fdiv5", },
>> +     { .hw = &a9_isp.hw },
>> +     { .fw_name = "rtc", }
>> +};
>> +
>> +static A9_COMP_SEL(glb, GLB_CLK_CTRL, 9, 0x7, a9_glb_parents,
>> +                a9_glb_parents_val_table);
>> +static A9_COMP_DIV(glb, GLB_CLK_CTRL, 0, 7);
>> +static A9_COMP_GATE(glb, GLB_CLK_CTRL, 8, 0);
>> +
>> +static struct clk_regmap a9_usb_48m_dualdiv_in = {
>> +     .data = &(struct clk_regmap_gate_data) {
>> +             .offset = USB_CLK_CTRL,
>> +             .bit_idx = 31,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("usb_48m_dualdiv_in", &a9_usb_48m_pre.hw,
>> +                               &clk_regmap_gate_ops, 0),
> Same comment as on the AO controller


Ok, I will drop CLK_HW_INIT*, and apply the same change to the other 
clock definitions.


[ ... ]

>> +
>> +/* Channel 3 is unconnected. */
> You meant 3rd I guess but this is misleading and confusing with the
> table bellow. Channel 2 would be more appropriate I think, since those
> are 0-based.


You are right,  The 3rd channel is unconnected.  I will fix it in the 
next version.

>> +static u32 a9_can_pe_parents_val_table[] = { 0, 1, 3 };
>> +static const struct clk_parent_data a9_can_pe_parents[] = {
>> +     { .fw_name = "sys", },
>> +     { .fw_name = "xtal", },
>> +     { .fw_name = "fdiv5", }
>> +};
>> +


[ ... ]

>> +
>> +/*
>> + * Channel 3(ddr_dpll_pt_clk) is manged by the DDR module;
>> + * channel 12(msr_clk) is manged by clock measures module.
>> + * channel 16(audio_dac1_clk) is manged by audio module.
> Some why can't you expose those then ? gen clk is used for debugging
> AFAIK. The clock above are worth debugging I think
>
> Please be consistent with the CaSing.


For channel 3 (ddr_dpll_pt_clk), it is sourced from the DDR PLL clock 
controller rather than
this clock controller. I will expose it in the DT.

For channel 12 (msr_clk), it depends on the clock measurement 
configuration.
The measurement source must first be selected through freq_ctrl[20:26], and
only then does msr_clk become meaningful. In practice.
It is not a real clock but an output of the measurement logic, so it can 
not be exposed as a clock.

https://elixir.bootlin.com/linux/v7.1-rc7/source/drivers/soc/amlogic/meson-clk-measure.c#L809

For channel 16 (audio_dac1_clk), I confirmed with the clock hardware 
designer that
it is actually unconnected. It looks like there is an error in the 
documentation.

I'll also fix the casing to keep it consistent throughout the comments.
>> + * Channel 10, 11, 13, 14 are not connected.
>> + */
>> +static u32 a9_gen_parents_val_table[] = { 0, 1, 2, 4, 5, 6, 7, 8, 9, 15, 17, 18,
>> +                                       19, 20, 21, 22, 23, 24, 25, 26};
>> +static struct clk_regmap a9_vid_pll;
>> +
>> +static const struct clk_parent_data a9_gen_parents[] = {
>> +     { .fw_name = "xtal" },
>> +     { .fw_name = "rtc" },
>> +     { .fw_name = "sysplldiv16" },
>> +     { .hw = &a9_vid_pll.hw },
>> +     { .fw_name = "gp0" },
>> +     { .fw_name = "hifi1" },
>> +     { .fw_name = "hifi0" },
>> +     { .fw_name = "gp1" },
>> +     { .fw_name = "gp2" },
>> +     { .fw_name = "dsudiv16" },
>> +     { .fw_name = "cpudiv16" },
>> +     { .fw_name = "a78div16" },
>> +     { .fw_name = "fdiv2" },
>> +     { .fw_name = "fdiv2p5" },
>> +     { .fw_name = "fdiv3" },
>> +     { .fw_name = "fdiv4" },
>> +     { .fw_name = "fdiv5" },
>> +     { .fw_name = "fdiv7" },
>> +     { .fw_name = "mclk0" },
>> +     { .fw_name = "mclk1" }
>> +};
>> +
>> +static A9_COMP_SEL(gen, GEN_CLK_CTRL, 12, 0x1f, a9_gen_parents,
>> +                a9_gen_parents_val_table);
>> +static A9_COMP_DIV(gen, GEN_CLK_CTRL, 0, 11);
>> +static A9_COMP_GATE(gen, GEN_CLK_CTRL, 11, 0);
>> +


[ ... ]

>> +
>> +static struct clk_regmap a9_enc, a9_enc1;
> What is this again ?? and please come up with better names.


Same as the previous dspa clock declaration.

enc stands for encoder. I'll rename a9_enc to a9_encoder0 and a9_enc1 to 
a9_encoder1 in the next version.

>> +
>> +static const struct clk_parent_data a9_vid_lock_parents[] = {
>> +     { .fw_name = "xtal", },
>> +     { .hw = &a9_enc.hw },
>> +     { .hw = &a9_enc1.hw }
>> +};
>> +
>> +static A9_COMP_SEL(vid_lock, VID_LOCK_CLK_CTRL, 9, 0x7, a9_vid_lock_parents,
>> +                NULL);
>> +static A9_COMP_DIV(vid_lock, VID_LOCK_CLK_CTRL, 0, 7);
>> +static A9_COMP_GATE(vid_lock, VID_LOCK_CLK_CTRL, 8, 0);
>> +
>> +static const struct clk_parent_data a9_vdin_meas_parents[] = {
>> +     { .fw_name = "xtal", },
>> +     { .fw_name = "fdiv4", },
>> +     { .fw_name = "fdiv3", },
>> +     { .fw_name = "fdiv5", }
>> +};
>> +
>> +static A9_COMP_SEL(vdin_meas, VDIN_MEAS_CLK_CTRL, 9, 0x7, a9_vdin_meas_parents,
>> +                NULL);
>> +static A9_COMP_DIV(vdin_meas, VDIN_MEAS_CLK_CTRL, 0, 7);
>> +static A9_COMP_GATE(vdin_meas, VDIN_MEAS_CLK_CTRL, 8, 0);
>> +
>> +static struct clk_regmap a9_vid_pll_div = {
>> +     .data = &(struct meson_vid_pll_div_data){
>> +             .val = {
>> +                     .reg_off = VID_PLL_CLK_DIV,
>> +                     .shift   = 0,
>> +                     .width   = 15,
>> +             },
>> +             .sel = {
>> +                     .reg_off = VID_PLL_CLK_DIV,
>> +                     .shift   = 16,
>> +                     .width   = 2,
>> +             },
>> +     },
>> +     .hw.init = CLK_HW_INIT_FW_NAME("vid_pll_div", "hdmiout2",
>> +                                    &meson_vid_pll_div_ro_ops, 0),
>> +};
>> +
>> +static struct clk_regmap a9_vid_pll_sel = {
>> +     .data = &(struct clk_regmap_mux_data){
>> +             .offset = VID_PLL_CLK_DIV,
>> +             .mask = 0x1,
>> +             .shift = 18,
>> +     },
>> +     .hw.init = CLK_HW_INIT_PARENTS_DATA("vid_pll_sel",
>> +                     ((const struct clk_parent_data []) {
>> +                             { .hw = &a9_vid_pll_div.hw },
>> +                             { .fw_name = "hdmiout2" }
>> +                     }), &clk_regmap_mux_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vid_pll = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VID_PLL_CLK_DIV,
>> +             .bit_idx = 19,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vid_pll", &a9_vid_pll_sel.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vid_pll_vclk = {
>> +     .data = &(struct clk_regmap_mux_data){
>> +             .offset = HDMI_CLK_CTRL,
>> +             .mask = 0x1,
>> +             .shift = 15,
>> +     },
>> +     .hw.init = CLK_HW_INIT_PARENTS_DATA("vid_pll_vclk",
>> +                     ((const struct clk_parent_data []) {
>> +                             { .hw = &a9_vid_pll.hw },
>> +                             { .fw_name = "hdmipix" }
>> +                     }), &clk_regmap_mux_ops, 0),
>> +};
>> +
>> +static const struct clk_parent_data a9_vclk_parents[] = {
>> +     { .hw = &a9_vid_pll_vclk.hw },
>> +     { .fw_name = "pix0", },
>> +     { .fw_name = "vid1", },
>> +     { .fw_name = "pix1", },
>> +     { .fw_name = "fdiv3", },
>> +     { .fw_name = "fdiv4", },
>> +     { .fw_name = "fdiv5", },
>> +     { .fw_name = "vid2", }
>> +};
>> +
>> +static struct clk_regmap a9_vclk_sel = {
>> +     .data = &(struct clk_regmap_mux_data){
>> +             .offset = VID_CLK_CTRL,
>> +             .mask = 0x7,
>> +             .shift = 16,
>> +     },
>> +     .hw.init = CLK_HW_INIT_PARENTS_DATA("vclk_sel", a9_vclk_parents,
>> +                     &clk_regmap_mux_ops, 0),
>> +};
>> +
>> +static struct clk_regmap a9_vclk_in = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VID_CLK_DIV,
>> +             .bit_idx = 16,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk_in", &a9_vclk_sel.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk_div = {
>> +     .data = &(struct clk_regmap_div_data){
>> +             .offset = VID_CLK_DIV,
>> +             .shift = 0,
>> +             .width = 8,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk_div", &a9_vclk_in.hw,
>> +                               &clk_regmap_divider_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VID_CLK_CTRL,
>> +             .bit_idx = 19,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk", &a9_vclk_div.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk_div1_en = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VID_CLK_CTRL,
>> +             .bit_idx = 0,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk_div1_en", &a9_vclk.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk_div2_en = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VID_CLK_CTRL,
>> +             .bit_idx = 1,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk_div2_en", &a9_vclk.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
> Looks to me all this div_en / div repeating pattern would be easier to review
> with tiny macro .


Good point.

I tried to reduce the repeated div_en/div pattern using a helper macro.

It keeps the relationship between gate and fixed-factor clock more 
compact and easier to review.

After using the helper macro, the div_en/div code can be simplified to 
the following:

#define A9_VCLK(_name, _reg, _bit, _div, _parent)        \
struct clk_regmap a9_##_name##_en = {      \
         .data = &(struct clk_regmap_gate_data){          \
                 .offset = _reg,      \
                 .bit_idx = _bit,       \
         },       \
         .hw.init = &(struct clk_init_data) {           \
                 .name = #_name "_en",      \
                 .ops = &clk_regmap_gate_ops,           \
                 .parent_hws = (const struct clk_hw *[]) { _parent },    \
                 .num_parents = 1,      \
                 .flags = CLK_SET_RATE_PARENT,      \
         },       \
};       \
       \
struct clk_fixed_factor a9_##_name = {       \
         .mult = 1,       \
         .div = _div,       \
         .hw.init = &(struct clk_init_data){          \
                 .name = #_name,      \
                 .ops = &clk_fixed_factor_ops,          \
                 .parent_hws = (const struct clk_hw *[]) {      \
                         &a9_##_name##_en.hw          \
                 },       \
                 .num_parents = 1,      \
                 .flags = CLK_SET_RATE_PARENT,      \
         },       \
};       \

static A9_VCLK(vclk_div2, VID_CLK_CTRL, 1, 2, &a9_vclk.hw);
static A9_VCLK(vclk_div4, VID_CLK_CTRL, 2, 4, &a9_vclk.hw);
static A9_VCLK(vclk_div6, VID_CLK_CTRL, 3, 6, &a9_vclk.hw);
static A9_VCLK(vclk_div6, VID_CLK_CTRL, 4, 12, &a9_vclk.hw);
static A9_VCLK(vclk2_div2, VIID_CLK_CTRL, 1, 2, &a9_vclk2.hw);
static A9_VCLK(vclk2_div4, VIID_CLK_CTRL, 2, 4, &a9_vclk2.hw);
static A9_VCLK(vclk2_div6, VIID_CLK_CTRL, 3, 6, &a9_vclk2.hw);
static A9_VCLK(vclk2_div6, VIID_CLK_CTRL, 4, 12, &a9_vclk2.hw);


If you think splitting it further into separate helper macros would 
improve readability.

I can do that as well.

>> +
>> +static struct clk_fixed_factor a9_vclk_div2 = {
>> +     .mult = 1,
>> +     .div = 2,
>> +     .hw.init = CLK_HW_INIT_HW("vclk_div2", &a9_vclk_div2_en.hw,
>> +                               &clk_fixed_factor_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk_div4_en = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VID_CLK_CTRL,
>> +             .bit_idx = 2,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk_div4_en", &a9_vclk.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_fixed_factor a9_vclk_div4 = {
>> +     .mult = 1,
>> +     .div = 4,
>> +     .hw.init = CLK_HW_INIT_HW("vclk_div4", &a9_vclk_div4_en.hw,
>> +                               &clk_fixed_factor_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk_div6_en = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VID_CLK_CTRL,
>> +             .bit_idx = 3,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk_div6_en", &a9_vclk.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_fixed_factor a9_vclk_div6 = {
>> +     .mult = 1,
>> +     .div = 6,
>> +     .hw.init = CLK_HW_INIT_HW("vclk_div6", &a9_vclk_div6_en.hw,
>> +                               &clk_fixed_factor_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk_div12_en = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VID_CLK_CTRL,
>> +             .bit_idx = 4,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk_div12_en", &a9_vclk.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_fixed_factor a9_vclk_div12 = {
>> +     .mult = 1,
>> +     .div = 12,
>> +     .hw.init = CLK_HW_INIT_HW("vclk_div12", &a9_vclk_div12_en.hw,
>> +                               &clk_fixed_factor_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk2_sel = {
>> +     .data = &(struct clk_regmap_mux_data){
>> +             .offset = VIID_CLK_CTRL,
>> +             .mask = 0x7,
>> +             .shift = 16,
>> +     },
>> +     .hw.init = CLK_HW_INIT_PARENTS_DATA("vclk2_sel", a9_vclk_parents,
>> +                     &clk_regmap_mux_ops, 0),
>> +};
>> +
>> +static struct clk_regmap a9_vclk2_in = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VIID_CLK_DIV,
>> +             .bit_idx = 16,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk2_in", &a9_vclk2_sel.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk2_div = {
>> +     .data = &(struct clk_regmap_div_data){
>> +             .offset = VIID_CLK_DIV,
>> +             .shift = 0,
>> +             .width = 8,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk2_div", &a9_vclk2_in.hw,
>> +                               &clk_regmap_divider_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk2 = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VIID_CLK_CTRL,
>> +             .bit_idx = 19,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk2", &a9_vclk2_div.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk2_div1_en = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VIID_CLK_CTRL,
>> +             .bit_idx = 0,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk2_div1_en", &a9_vclk2.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk2_div2_en = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VIID_CLK_CTRL,
>> +             .bit_idx = 1,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk2_div2_en", &a9_vclk2.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_fixed_factor a9_vclk2_div2 = {
>> +     .mult = 1,
>> +     .div = 2,
>> +     .hw.init = CLK_HW_INIT_HW("vclk2_div2", &a9_vclk2_div2_en.hw,
>> +                               &clk_fixed_factor_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk2_div4_en = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VIID_CLK_CTRL,
>> +             .bit_idx = 2,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk2_div4_en", &a9_vclk2.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_fixed_factor a9_vclk2_div4 = {
>> +     .mult = 1,
>> +     .div = 4,
>> +     .hw.init = CLK_HW_INIT_HW("vclk2_div4", &a9_vclk2_div4_en.hw,
>> +                               &clk_fixed_factor_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk2_div6_en = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VIID_CLK_CTRL,
>> +             .bit_idx = 3,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk2_div6_en", &a9_vclk2.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_fixed_factor a9_vclk2_div6 = {
>> +     .mult = 1,
>> +     .div = 6,
>> +     .hw.init = CLK_HW_INIT_HW("vclk2_div6", &a9_vclk2_div6_en.hw,
>> +                               &clk_fixed_factor_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_vclk2_div12_en = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VIID_CLK_CTRL,
>> +             .bit_idx = 4,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vclk2_div12_en", &a9_vclk2.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_fixed_factor a9_vclk2_div12 = {
>> +     .mult = 1,
>> +     .div = 12,
>> +     .hw.init = CLK_HW_INIT_HW("vclk2_div12", &a9_vclk2_div12_en.hw,
>> +                               &clk_fixed_factor_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +/* Channel 5, 6 and 7 are unconnected */
>> +static u32 a9_vid_parents_val_table[] = { 0, 1, 2, 3, 4, 8, 9, 10, 11, 12 };
>> +static const struct clk_hw *a9_vid_parents[] = {
>> +     &a9_vclk_div1_en.hw,
>> +     &a9_vclk_div2.hw,
>> +     &a9_vclk_div4.hw,
>> +     &a9_vclk_div6.hw,
>> +     &a9_vclk_div12.hw,
>> +     &a9_vclk2_div1_en.hw,
>> +     &a9_vclk2_div2.hw,
>> +     &a9_vclk2_div4.hw,
>> +     &a9_vclk2_div6.hw,
>> +     &a9_vclk2_div12.hw
>> +};
>> +
>> +static struct clk_regmap a9_vdac_sel = {
>> +     .data = &(struct clk_regmap_mux_data){
>> +             .offset = VIID_CLK_DIV,
>> +             .mask = 0xf,
>> +             .shift = 28,
>> +             .table = a9_vid_parents_val_table,
>> +     },
>> +     .hw.init = CLK_HW_INIT_PARENTS_HW("vdac_sel", a9_vid_parents
>> +                     , &clk_regmap_mux_ops, 0),
>> +};
>> +
>> +static struct clk_regmap a9_vdac = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VID_CLK_CTRL2,
>> +             .bit_idx = 4,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("vdac", &a9_vdac_sel.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +
>> +static struct clk_regmap a9_enc_sel = {
> Should this be enc0 then ? for consistency ?
> Same applies to similar instance (it is the same discussion we already
> had on the T7 I believe)


Yes, that makes sense.

I'll rename them to a9_encoder0 and a9_encoder1 for consistency, and 
I'll check for other similar instances as well.

>> +     .data = &(struct clk_regmap_mux_data){
>> +             .offset = VIID_CLK_DIV,
>> +             .mask = 0xf,
>> +             .shift = 12,
>> +             .table = a9_vid_parents_val_table,
>> +     },
>> +     .hw.init = CLK_HW_INIT_PARENTS_HW("enc_sel", a9_vid_parents
>> +                     , &clk_regmap_mux_ops, 0),
>> +};
>> +
>> +static struct clk_regmap a9_enc = {
>> +     .data = &(struct clk_regmap_gate_data){
>> +             .offset = VID_CLK_CTRL2,
>> +             .bit_idx = 10,
>> +     },
>> +     .hw.init = CLK_HW_INIT_HW("enc", &a9_enc_sel.hw,
>> +                               &clk_regmap_gate_ops, CLK_SET_RATE_PARENT),
>> +};
>> +


[ ... ]

> --
> Jerome


--

Jian



^ permalink raw reply

* Re: [PATCH v2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu
From: Ryan Roberts @ 2026-06-15 11:21 UTC (permalink / raw)
  To: Will Deacon, Linu Cherian
  Cc: Catalin Marinas, Kevin Brodsky, Anshuman Khandual, Yang Shi,
	Mark Rutland, Huang Ying, linux-arm-kernel, linux-kernel
In-Reply-To: <ai6KzFgfMAxqplcr@willie-the-truck>

On 14/06/2026 12:04, Will Deacon wrote:
> On Sat, May 23, 2026 at 07:17:10PM +0530, Linu Cherian wrote:
>> From: Ryan Roberts <ryan.roberts@arm.com>
>>
>> There are 3 variants of tlb flush that invalidate user mappings:
>> flush_tlb_mm(), flush_tlb_page() and __flush_tlb_range(). All of these
>> would previously unconditionally broadcast their tlbis to all cpus in
>> the inner shareable domain.
>>
>> But this is a waste of effort if we can prove that the mm for which we
>> are flushing the mappings has only ever been active on the local cpu. In
>> that case, it is safe to avoid the broadcast and simply invalidate the
>> current cpu.
>>
>> So let's track in mm_context_t::active_cpu either the mm has never been
>> active on any cpu, has been active on more than 1 cpu, or has been
>> active on precisely 1 cpu - and in that case, which one. We update this
>> when switching context, being careful to ensure that it gets updated
>> *before* installing the mm's pgtables. On the reader side, we ensure we
>> read *after* the previous write(s) to the pgtable(s) that necessitated
>> the tlb flush have completed. This guarrantees that if a cpu that is
>> doing a tlb flush sees it's own id in active_cpu, then the old pgtable
>> entry cannot have been seen by any other cpu and we can flush only the
>> local cpu.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>> Tested-by: Huang Ying <ying.huang@linux.alibaba.com>
>> [linu.cherian@arm.com: Adapted for v7.1 flush tlb API changes]
>> Signed-off-by: Linu Cherian <linu.cherian@arm.com>
>> ---
>> Changelog from RFC v1:
>> - Adapted for v7.1 flush tlb API changes
>>   No changes in core logic
>> - Collected Rb and Tb tags
>> - lat_mmap benchmark showed dsb(ishst) performs better than dsb(ish),
>>   hence retained dsb(ishst) in flush_tlb_user_pre	
>>
>>
>> Testing with 7.1-rc4 :
>> +-----------------------+---------------------------------------------------+-------------+
>> | Benchmark             | Result Class                                      |  Improvement|  
>> +=======================+===================================================+=============+
>> | perf/syscall          | fork (ops/sec)                                    |   (I) 3.25% |
>> +-----------------------+---------------------------------------------------+-------------+
>> | pts/memtier-benchmark | Protocol: Redis Clients: 100 Ratio: 1:5 (Ops/sec) |   (I) 2.70% |
>> | 			| Protocol: Redis Clients: 100 Ratio: 5:1 (Ops/sec) |   (I) 2.13% |
>> +-----------------------+---------------------------------------------------+-------------+
> 
> I think we need a much more comprehensive set of benchmarks before we can
> begin to consider a change like this.

I believe that Linu ran a wider set of benchmarks and didn't find any
regressions. These are just the ones that show improvement (Linu, please correct
me and/or provide details).

Additionally Huang Ying did some testing against the RFC and reported 4.5%
improvement with Redis:

https://lore.kernel.org/linux-arm-kernel/87segumv6w.fsf@DESKTOP-5N7EMDA/

> 
>>  arch/arm64/include/asm/mmu.h         |  12 +++
>>  arch/arm64/include/asm/mmu_context.h |   2 +
>>  arch/arm64/include/asm/tlbflush.h    | 127 +++++++++++++++++++++------
>>  arch/arm64/mm/context.c              |  30 ++++++-
>>  4 files changed, 141 insertions(+), 30 deletions(-)
> 
> Doesn't this break BTM/SVM with the SMMU? I think that's a non-starter
> even if you can provide some more compelling numbers.

AIUI, we don't support BTM upstream - the SMMU uses private ASIDs and implements
MMU notifiers to forward the TLBIs via its command queue interface.

I was also under the impression that supporting BTM upsteam was not desired;
Please correct me if that's not accurate or if you're aware of plans to add
support. I've been (coincidentlly) looking at some other stuff that could
benefit from BTM but had concluded it wouldn't be an acceptable approach upstream.

If we did ever want to add SMMU BTM support though, I think it would be simple
enough to add an interface to allow the SMMU to disable the optimization (i.e.
force active_cpu to ACTIVE_CPU_MULTIPLE)?

> 
>> +static inline bool flush_tlb_user_pre(struct mm_struct *mm, tlbf_t flags)
>> +{
>> +	unsigned int self, active;
>> +	bool local;
>> +
>> +	migrate_disable();
>> +
>> +	if (flags & TLBF_NOBROADCAST) {
>> +		dsb(nshst);
>> +		return true;
>> +	}
> 
> Why does the NOBROADCAST case need migration disabled? It didn't before...

The existing semantic for TLBF_BOBROADCAST is that it emits a local TLBI on
whatever CPU we happen to be executing on. It's used for lazily fixing up
spurious faults (i.e. hitting RO TLB entries when the PTE has been relaxed to
RW). So it's still functionally correct if the thread migrates CPU between
taking the fault and issuing the local TLBI - in the worst case it just leads to
another spurious fault.

For this new case, we need to ensure we don't get migrated between reading
active_cpu and issuing the local TLBI, otherwise we would only issue a local
TLBI when a broadcast was required.

> 
>> +
>> +	self = smp_processor_id();
>> +
>> +	/*
>> +	 * The load of mm->context.active_cpu must not be reordered before the
>> +	 * store to the pgtable that necessitated this flush. This ensures that
>> +	 * if the value read is our cpu id, then no other cpu can have seen the
>> +	 * old pgtable value and therefore does not need this old value to be
>> +	 * flushed from its tlb. But we don't want to upgrade the dsb(ishst),
>> +	 * needed to make the pgtable updates visible to the walker, to a
>> +	 * dsb(ish) by default. So speculatively load without a barrier and if
>> +	 * it indicates our cpu id, then upgrade the barrier and re-load.
>> +	 */
>> +	active = READ_ONCE(mm->context.active_cpu);
>> +	if (active == self) {
>> +		dsb(ish);
>> +		active = READ_ONCE(mm->context.active_cpu);
>> +	} else {
>> +		dsb(ishst);
>> +	}
> 
> Why can't you just do:
> 
> 	dsb(ishst);
> 	active = READ_ONCE(mm->context.active_cpu);
> 
> ?

Prior to this optimization, we always issued a dsb(ishst) here. Catalin
suggested the same simplification against the RFC. I believe Linu tried it but
saw regressions; Hopefully Linu can provide the details.

> 
>> +
>> +	local = active == self;
>> +	if (!local)
>> +		migrate_enable();
>> +
>> +	return local;
>> +}
>> +
>> +static inline void flush_tlb_user_post(bool local)
>> +{
>> +	if (local)
>> +		migrate_enable();
>> +}
> 
> I was under the impression that disabling/enabling migration was an
> expensive thing to do, so I'd really want to see some more numbers to
> justify this (including from inside a VM) and allow us to consider the
> trade-offs properly. It's also not at all clear to me that it's safe
> from such a low-level TLB invalidation helper.

I had assumed it wasn't very expensive, but perhaps I'm wrong. I know
preempt_enable() can be expensive because it has to test to see if it needs to
reschedule. But I assumed for disabling/enabling migration, it would just be a
counter and the scheduler would check that it's zero before considing moving the
task to another run queue. (But I have practically zero understanding of the
scheduler so I'll assume I'm wrong...).

Instead of disabling migration, perhaps we could re-check active_cpu after
issuing the local tlbi - if it's now reporting "multiple" we must have been
migrated and we need to upgrade to a braodcast TLBI?

> 
>> +
>>  /*
>>   *	TLB Invalidation
>>   *	================
>> @@ -408,12 +482,20 @@ static inline void flush_tlb_all(void)
>>  static inline void flush_tlb_mm(struct mm_struct *mm)
>>  {
>>  	unsigned long asid;
>> +	bool local;
>>  
>> -	dsb(ishst);
>> +	local = flush_tlb_user_pre(mm, TLBF_NONE);
>>  	asid = __TLBI_VADDR(0, ASID(mm));
>> -	__tlbi(aside1is, asid);
>> -	__tlbi_user(aside1is, asid);
>> -	__tlbi_sync_s1ish(mm);
>> +	if (local) {
>> +		__tlbi(aside1, asid);
>> +		__tlbi_user(aside1, asid);
>> +		dsb(nsh);
>> +	} else {
>> +		__tlbi(aside1is, asid);
>> +		__tlbi_user(aside1is, asid);
>> +		__tlbi_sync_s1ish(mm);
>> +	}
>> +	flush_tlb_user_post(local);
> 
> I think you've changed this since Ryan's original patch, but why are you
> only calling __tlbi_sync_s1ish() for the !local case? Doesn't that break
> the erratum workaround when running as a VM if the vCPU is migrated?

Hmm. So from the guest kernel's perspective, it has concluded that it only needs
to target the local (v)CPU. Since the errata only affect boardcast TLBIs, it
concludes there is no need to issue the workarounds. But since it's a VM, the HW
will upgrade the local TLBIs to broadcast TLBIs, but will not magically
re-instate the workarounds. I guess the simplest solution would be to disable
the optimization when either workaround is enabled.

Perhaps this is all getting a bit too complex for not enough benefit...

Thanks,
Ryan

> 
>> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
>> index 0f4a28b87469..f34ed78393e0 100644
>> --- a/arch/arm64/mm/context.c
>> +++ b/arch/arm64/mm/context.c
>> @@ -214,9 +214,10 @@ static u64 new_context(struct mm_struct *mm)
>>  
>>  void check_and_switch_context(struct mm_struct *mm)
>>  {
>> -	unsigned long flags;
>> -	unsigned int cpu;
>> +	unsigned int cpu = smp_processor_id();
>>  	u64 asid, old_active_asid;
>> +	unsigned int active;
>> +	unsigned long flags;
>>  
>>  	if (system_supports_cnp())
>>  		cpu_set_reserved_ttbr0();
>> @@ -251,7 +252,6 @@ void check_and_switch_context(struct mm_struct *mm)
>>  		atomic64_set(&mm->context.id, asid);
>>  	}
>>  
>> -	cpu = smp_processor_id();
>>  	if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending))
>>  		local_flush_tlb_all();
>>  
>> @@ -262,6 +262,30 @@ void check_and_switch_context(struct mm_struct *mm)
>>  
>>  	arm64_apply_bp_hardening();
>>  
>> +	/*
>> +	 * Update mm->context.active_cpu in such a manner that we avoid cmpxchg
>> +	 * and dsb unless we definitely need it. If initially ACTIVE_CPU_NONE
>> +	 * then we are the first cpu to run so set it to our id. If initially
>> +	 * any id other than ours, we are the second cpu to run so set it to
>> +	 * ACTIVE_CPU_MULTIPLE. If we update the value then we must issue
>> +	 * dsb(ishst) to ensure stores to mm->context.active_cpu are ordered
>> +	 * against the TTBR0 write in cpu_switch_mm()/uaccess_enable(); the
>> +	 * store must be visible to another cpu before this cpu could have
>> +	 * populated any TLB entries based on the pgtables that will be
>> +	 * installed.
>> +	 */
>> +	active = READ_ONCE(mm->context.active_cpu);
>> +	if (active != cpu && active != ACTIVE_CPU_MULTIPLE) {
>> +		if (active == ACTIVE_CPU_NONE)
>> +			active = cmpxchg_relaxed(&mm->context.active_cpu,
>> +						 ACTIVE_CPU_NONE, cpu);
>> +
>> +		if (active != ACTIVE_CPU_NONE)
>> +			WRITE_ONCE(mm->context.active_cpu, ACTIVE_CPU_MULTIPLE);
>> +
>> +		dsb(ishst);
>> +	}
>> +
> 
> Can you simplify the 'if' condition here?
> 
> 	if (active == ACTIVE_CPU_NONE) {
> 		if (!try_cmpxchg_relaxed(...))
> 			WRITE_ONCE(...);
> 
> 		dsb(ishst);
> 	}
> 
> (as an aside, maybe we should implement arch_try_cmpxchg{,_relaxed} so
>  we could drop the READ_ONCE() here as well?)
> 
> Will



^ permalink raw reply

* Re: [PATCH v5 09/10] iio: dac: add mcf54415 DAC
From: Angelo Dureghello @ 2026-06-15 11:20 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Angelo Dureghello, Greg Ungerer, Geert Uytterhoeven, Steven King,
	Arnd Bergmann, Maxime Coquelin, Alexandre Torgue, David Lechner,
	Nuno Sá, Andy Shevchenko, Greg Ungerer, linux-m68k,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio,
	Michael Turquette, Stephen Boyd, Brian Masney
In-Reply-To: <20260611114800.009d9797@jic23-huawei>

Hi Jonathan,

On Thu, Jun 11, 2026 at 11:48:00AM +0100, Jonathan Cameron wrote:
> On Wed, 10 Jun 2026 22:35:14 +0200
> Angelo Dureghello <adureghello@baylibre.com> wrote:
>
> > From: Angelo Dureghello <adureghello@baylibre.com>
> >
> > Add basic version of mcf54415 DAC driver. DAC is embedded in the SoC and
> > DAC configuration registers are mapped in the internal IO address space.
> >
> > The DAC accepts a 12-bit digital signal and creates a monotonic 12-bit
> > analog output varying from DAC_VREFL to DAC_VREFH. The DAC module
> > consists of a conversion unit, an output amplifier, and the associated
> > digital control blocks. Default register values for DAC_VREFL and DAC_VREFH
> > are respectively 0 and 0xfff, left untouched in this initial version.
> >
> > This initial version of the driver is minimalistic, "output raw" only, to
> > be extended in the future. DMA and external sync are disabled, default mode
> > is high speed, default format is right-justified 12-bit on 16-bit word.
> >
> > Signed-off-by: Angelo Dureghello <adureghello@baylibre.com>
> I'm lazy so didn't check earlier versions but assume the two bits
> of feedback from Sashiko are false positives:
> https://sashiko.dev/#/patchset/20260610-wip-stmark2-dac-v5-0-b76b83366d5c%40baylibre.com
>
> The one about clock underflow if resume fails, and then devm cleanup happens later
> is a bit nasty.
>
> I did a bit of digging and maybe it is better to just leave the clock on?
> The status dev.power.is_suspended is set to false whether or not resume succeeded
> and I believe a following suspend will not take into account that resume failed.
>
> I'm not set up to poke the combinations but it might be worth trying that.
> +CC common clk people who may immediately know what the right answer is.
>

was about testing this, there are no bus faults on read/write of registers
with clock disabled, nor warnings in dmesg on double disable.

Anyway, i see now from arch Kconfig that is not possible to have CONFIG_PM
for this specifig CPU (with MMU, PM is force-disabled), so would remove pm_ops.
Ok ?

Will fix all other things you pointyed out in v6.

> Otherwise just a few minor style comments inline.
>
> Thanks,
>
> Jonathan
>

thanks,
regards,
angelo


^ permalink raw reply

* Re: [PATCH] PCI: meson: Fix PERST# timing by asserting reset before LTSSM enable
From: Ronald Claveau @ 2026-06-15 10:34 UTC (permalink / raw)
  To: Gowtham Kudupudi
  Cc: robh, bhelgaas, neil.armstrong, khilman, jbrunet,
	martin.blumenstingl, linux-pci, linux-amlogic, linux-arm-kernel,
	linux-kernel, yue.wang, lpieralisi, kwilczynski, mani
In-Reply-To: <20260614015620.20432-1-gowtham@ferryfair.com>

On 6/14/26 3:56 AM, Gowtham Kudupudi wrote:
> On warm reboot, the PCIe controller's LTSSM starts link training
> immediately if PERST# is already deasserted from the previous boot.
> The driver then pulses PERST# for only 500us, which is too short to
> properly reset the endpoint device that has already started training.
> 
> Fix by moving the PERST# assert/deassert pulse BEFORE enabling LTSSM,
> so the endpoint gets a clean reset cycle before link training begins.
> 
> This was found on Amlogic G12B (A311D) with NVMe on an M.2 slot.
> Cold boot worked because POR held PERST# low; warm reboot did not.
> The fix was confirmed on a Banana Pi CM4 with Waveshare IO base board.
> 
> Signed-off-by: Gowtham Kudupudi <gowtham@ferryfair.com>
> ---
>  drivers/pci/controller/dwc/pci-meson.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-meson.c b/drivers/pci/controller/dwc/pci-meson.c
> index 5f8e2f4b3c12..3a7e9f1d5b8c 100644
> --- a/drivers/pci/controller/dwc/pci-meson.c
> +++ b/drivers/pci/controller/dwc/pci-meson.c
> @@ -310,8 +310,8 @@ static int meson_pcie_start_link(struct dw_pcie *pci)
>  {
>  	struct meson_pcie *mp = to_meson_pcie(pci);
>  
> +	meson_pcie_assert_reset(mp);
>  	meson_pcie_ltssm_enable(mp);
> -	meson_pcie_assert_reset(mp);
>  
>  	return 0;
>  }

Hi Gowtham,

I have a patch [1] that I haven't submitted yet.
This might be related to your issue, what do you think ?

[1] https://github.com/rclaveau-tech/linux-khadas/commit/bee0a02d9756

-- 
Best regards,
Ronald


^ permalink raw reply

* Re: [PATCH] KVM: arm64: Sync SPSR_EL1 when injecting an exception into a pVM
From: Fuad Tabba @ 2026-06-15 10:52 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, Oliver Upton, linux-arm-kernel, kvmarm,
	linux-kernel, Joey Gouly, Steffen Eiden, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Sascha Bischoff, Andrew Jones
In-Reply-To: <ai_OevUNS-OACPm7@willie-the-truck>

Hi Will,


On Mon, 15 Jun 2026 at 11:05, Will Deacon <will@kernel.org> wrote:
>
> On Fri, Jun 12, 2026 at 12:34:14PM +0100, Fuad Tabba wrote:
> > When pKVM injects a synchronous exception into a protected guest, it
> > re-enters without restoring the guest's EL1 sysregs and writes the EL1
> > exception registers to hardware by hand: ESR_EL1 and ELR_EL1, but not
> > SPSR_EL1. enter_exception64() sets SPSR_EL1 (the interrupted PSTATE)
> > only in memory, so the guest's handler reads a stale SPSR_EL1 and
> > restores the wrong PSTATE on eret.
> >
> > Write SPSR_EL1 alongside the other exception registers.
> >
> > Fixes: 6c30bfb18d0b ("KVM: arm64: Add handlers for protected VM System Registers")
> > Reported-by: sashiko <sashiko@sashiko.dev>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/arm64/kvm/hyp/nvhe/sys_regs.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/sys_regs.c b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
> > index 8c3fbb413a06..1a7d5cd16d72 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/sys_regs.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
> > @@ -268,6 +268,7 @@ static void inject_sync64(struct kvm_vcpu *vcpu, u64 esr)
> >
> >       write_sysreg_el1(esr, SYS_ESR);
> >       write_sysreg_el1(read_sysreg_el2(SYS_ELR), SYS_ELR);
> > +     write_sysreg_el1(read_sysreg_el2(SYS_SPSR), SYS_SPSR);
> >       write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
> >       write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
> >  }
>
> Is SPSR_EL1 not set in enter_exception64() using vcpu_cpsr(vcpu), which
> *is* set here? I'm just a bit wary of the report, as I'd have expected
> fireworks if we weren't initialising the guest's SPSR on the exception
> injection path.

Yes, enter_exception64() sets SPSR_EL1, but only in memory:
__vcpu_write_spsr() takes the nVHE path and calls
__vcpu_assign_sys_reg(vcpu, SPSR_EL1, val), which writes
vcpu->arch.ctxt.sys_regs[SPSR_EL1] without touching the hardware
register.

In the normal nVHE entry path that memory value reaches hardware via
__sysreg_restore_state_nvhe() before __guest_enter(). But
inject_sync64() runs inside the fixup_guest_exit() loop, which
re-enters the guest directly without a sysreg restore pass. ESR_EL1
and ELR_EL1 are already written to hardware by hand for this reason
but SPSR_EL1 was missed.

I think we don't see fireworks because inject_sync64() only fires on
error paths, e.g., accesses to restricted or undefined sysregs. A
well-behaved guest shouldn't trigger these, so the stale SPSR_EL1
isn't observed.

Cheers,
/fuad

>
> Will


^ permalink raw reply

* Re: [PATCH 05/10] [v2] mips: select legacy gpiolib interfaces where used
From: Thomas Bogendoerfer @ 2026-06-15 10:33 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-gpio, linux-kernel, Arnd Bergmann, Christian Lamparter,
	Johannes Berg, Aaro Koskinen, Andreas Kemnade, Kevin Hilman,
	Roger Quadros, Tony Lindgren, John Paul Adrian Glaubitz,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Linus Walleij, Bartosz Golaszewski,
	Dmitry Torokhov, Lee Jones, Pavel Machek, Matti Vaittinen,
	Florian Fainelli, Jonas Gorski, Andrew Lunn, Vladimir Oltean,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-wireless, linux-omap, linux-arm-kernel, linux-mips,
	linux-sh, linux-input, linux-leds, netdev, Bartosz Golaszewski
In-Reply-To: <20260520183815.2510387-6-arnd@kernel.org>

On Wed, May 20, 2026 at 08:38:10PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> A few old machines have not been converted away from the old-style
> gpiolib interfaces. Make these select the new CONFIG_GPIOLIB_LEGACY
> symbol so the code still works where it is needed but can be left
> out otherwise.
> 
> This is the list of all gpio_request() calls in mips:
> 
>   arch/mips/alchemy/devboards/db1000.c:           gpio_request(19, "sd0_cd");
>   arch/mips/alchemy/devboards/db1000.c:           gpio_request(20, "sd1_cd");
>   arch/mips/alchemy/devboards/db1200.c:   gpio_request(215, "otg-vbus");
>   arch/mips/bcm47xx/workarounds.c:        err = gpio_request_one(usb_power, GPIOF_OUT_INIT_HIGH, "usb_power");
>   arch/mips/bcm63xx/boards/board_bcm963xx.c:              gpio_request_one(board.ephy_reset_gpio,
>   arch/mips/txx9/rbtx4927/setup.c:        gpio_request(15, "sio-dtr");
> 
> Most of these should be easy enough to change to modern gpio descriptors
> or remove if they are no longer in use.
> 
> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
> Reviewed-by: Linus Walleij <linusw@kernel.org>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> v2: no changes. There was no discussion on this, but the patch
>     has so far not made it into the linux-mips tree, so I'm including
>     it for completeness.
> ---
>  arch/mips/Kconfig         | 5 +++++
>  arch/mips/alchemy/Kconfig | 1 -
>  arch/mips/txx9/Kconfig    | 1 +
>  3 files changed, 6 insertions(+), 1 deletion(-)

applied to mips-next

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]


^ permalink raw reply

* Re: [PATCH v5 2/3] pwm: rp1: Add RP1 PWM controller driver
From: Julian Braha @ 2026-06-15 10:29 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Andrea della Porta, linux-pwm, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Florian Fainelli,
	Broadcom internal kernel review list, devicetree,
	linux-rpi-kernel, linux-arm-kernel, linux-kernel, Naushir Patuck,
	Stanimir Varbanov, mbrugger
In-Reply-To: <ai-dNlC1_nbQTy5Z@monoceros>

Hi Uwe,

On 6/15/26 07:37, Uwe Kleine-König wrote:
> IMHO selecting REGMAP_MMIO explicitly here is fine because at least to
> me it's not obvious that MFD_SYSCON enforces REGMAP_MMIO.

I think it's better to use comments to document non-obvious behavior,
rather than dead code.
E.g.:
'select MFD_SYSCON # selects REGMAP_MMIO'

But I guess this is not really worth bikeshedding over.

- Julian Braha


^ permalink raw reply

* Re: [PATCH v3 3/3] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort
From: Puranjay Mohan @ 2026-06-15 10:25 UTC (permalink / raw)
  To: Kiryl Shutsemau
  Cc: Catalin Marinas, Will Deacon, James Morse, Mark Rutland,
	Marc Zyngier, Doug Anderson, Petr Mladek, Thomas Gleixner,
	Andrew Morton, Baoquan He, Usama Arif, Breno Leitao,
	Julien Thierry, Lecopzer Chen, Sumit Garg, kernel-team, kexec,
	linux-arm-kernel, linux-kernel, Kiryl Shutsemau (Meta)
In-Reply-To: <167493d30ef6c99a44291de14cddd41ced8149c4.1781490440.git.kas@kernel.org>

On Mon, Jun 15, 2026 at 4:36 AM Kiryl Shutsemau <kirill@shutemov.name> wrote:
>
> From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
>
> A CPU wedged with interrupts masked ignores the stop IPI, and without
> pseudo-NMI there is no NMI IPI to escalate to: a reboot proceeds with
> the CPU still running, and a kdump misses its registers.
>
> Add a third rung to smp_send_stop(): once the IPI (and pseudo-NMI IPI,
> if enabled) rungs have run, signal SDEI event 0 at whatever stayed
> online. Firmware delivers it regardless of the target's DAIF, so it
> reaches a CPU a plain IPI cannot; the target acks by going offline,
> which the caller already polls for.
>
> Fold the stop bookkeeping into one arm64_nmi_cpu_stop(regs,
> die_on_crash), shared by the stop IPI handlers, panic_smp_self_stop()
> and the SDEI handler, replacing the near-duplicate local_cpu_stop() and
> ipi_cpu_crash_stop(). @die_on_crash is the only difference: the IPI
> handlers pass true and PSCI CPU_OFF the CPU on a crash stop so a capture
> kernel can reclaim it; the SDEI handler and self-stop pass false and
> park. The SDEI park is required, not conservative -- its handler runs
> inside an SDEI event that is never completed (completing it resumes the
> wedged context), and a CPU_OFF from that unfinished-event context wedges
> EL3 on some firmware (left as a follow-up). The dump is unaffected; only
> re-onlining the CPU in an SMP capture kernel is lost.
>
> Suggested-by: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> ---
>  arch/arm64/include/asm/nmi.h    |  24 +++++++
>  arch/arm64/kernel/smp.c         | 109 +++++++++++++++++++++-----------
>  drivers/firmware/Kconfig        |   2 +
>  drivers/firmware/arm_sdei_nmi.c |  75 ++++++++++++++++++++++
>  4 files changed, 172 insertions(+), 38 deletions(-)
>
> diff --git a/arch/arm64/include/asm/nmi.h b/arch/arm64/include/asm/nmi.h
> index 9366be419d18..2e8974ff8d63 100644
> --- a/arch/arm64/include/asm/nmi.h
> +++ b/arch/arm64/include/asm/nmi.h
> @@ -4,21 +4,45 @@
>
>  #include <linux/cpumask.h>
>
> +struct pt_regs;
> +
>  /*
>   * Cross-CPU NMI provider hooks, consulted by the arm64 arch code before
>   * its regular-IRQ / pseudo-NMI IPI paths. The SDEI provider in
>   * drivers/firmware/arm_sdei_nmi.c implements them when active; a future
>   * FEAT_NMI provider could slot in here too. The stubs let callers stay
>   * unconditional when ARM_SDEI_NMI is off.
> + *
> + * sdei_nmi_active() lets a caller test for the service before committing
> + * to (and waiting on) the SDEI stop rung; sdei_nmi_stop_cpus() then signals
> + * the targets, which ack by going offline.
>   */
>  #ifdef CONFIG_ARM_SDEI_NMI
>  bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu);
> +bool sdei_nmi_active(void);
> +void sdei_nmi_stop_cpus(const cpumask_t *mask);
>  #else
>  static inline bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
>                                                       int exclude_cpu)
>  {
>         return false;
>  }
> +
> +static inline bool sdei_nmi_active(void)
> +{
> +       return false;
> +}
> +
> +static inline void sdei_nmi_stop_cpus(const cpumask_t *mask) { }
>  #endif
>
> +/*
> + * The common "stop this CPU" entry every arm64 stop path funnels through:
> + * the regular/pseudo-NMI stop IPI handlers, panic_smp_self_stop(), and the
> + * SDEI cross-CPU NMI handler. @die_on_crash powers the CPU off on the kdump
> + * crash path (IPI handlers) instead of parking it (SDEI / self-stop).
> + * Defined in arch/arm64/kernel/smp.c.
> + */
> +void __noreturn arm64_nmi_cpu_stop(struct pt_regs *regs, bool die_on_crash);
> +
>  #endif /* __ASM_NMI_H */
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index a670434a8cae..e85a4ba18d5c 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -33,6 +33,7 @@
>  #include <linux/kernel_stat.h>
>  #include <linux/kexec.h>
>  #include <linux/kgdb.h>
> +#include <linux/kprobes.h>
>  #include <linux/kvm_host.h>
>  #include <linux/nmi.h>
>
> @@ -862,14 +863,58 @@ void arch_irq_work_raise(void)
>  }
>  #endif
>
> -static void __noreturn local_cpu_stop(unsigned int cpu)
> +/*
> + * Bring the local CPU to a stop, saving its register state into the vmcore
> + * on the kdump crash path first. The single point every arm64 stop path
> + * funnels through, so the bookkeeping (mask interrupts, mark offline, mask
> + * SDEI, optionally power off) lives in one place:
> + *
> + *   - the regular IPI_CPU_STOP and pseudo-NMI IPI_CPU_STOP_NMI handlers;
> + *   - panic_smp_self_stop(), a CPU parking itself on a parallel panic();
> + *   - the SDEI cross-CPU NMI handler (drivers/firmware/arm_sdei_nmi.c),
> + *     which reaches CPUs the stop IPIs could not.
> + *
> + * @regs is the register state to record in the vmcore on a crash stop; NULL
> + * means "capture the current context". @die_on_crash decides the kdump crash
> + * path: the IPI stop handlers pass true and power the CPU off (PSCI CPU_OFF,
> + * via __cpu_try_die()) so a capture kernel can reclaim it. The SDEI handler
> + * and panic_smp_self_stop() pass false and only park. For SDEI that is
> + * required, not just conservative: it runs inside an SDEI event that is
> + * deliberately never completed (completing it has firmware resume the wedged
> + * context), and a CPU_OFF from that not-yet-completed context wedges EL3 on
> + * some firmware -- a documented follow-up. Parking also matches this path's
> + * own fallback when CPU_OFF is unavailable.
> + */
> +void __noreturn arm64_nmi_cpu_stop(struct pt_regs *regs, bool die_on_crash)
>  {
> +       unsigned int cpu = smp_processor_id();
> +       bool crash = IS_ENABLED(CONFIG_KEXEC_CORE) && crash_stop;
> +
> +       /*
> +        * Use local_daif_mask() instead of local_irq_disable() to make sure
> +        * that pseudo-NMIs are disabled. The "stop" code starts with an IRQ
> +        * and falls back to NMI (which might be pseudo). If the IRQ finally
> +        * goes through right as we're timing out then the NMI could interrupt
> +        * us. It's better to prevent the NMI and let the IRQ finish since the
> +        * pt_regs will be better.
> +        */
> +       local_daif_mask();
> +
> +       if (crash)
> +               crash_save_cpu(regs, cpu);
> +
> +       /* the ack a stop requester (e.g. smp_send_stop()) polls for */
>         set_cpu_online(cpu, false);
>
> -       local_daif_mask();
>         sdei_mask_local_cpu();
> +
> +       if (crash && die_on_crash)
> +               __cpu_try_die(cpu);
> +
> +       /* just in case */
>         cpu_park_loop();
>  }
> +NOKPROBE_SYMBOL(arm64_nmi_cpu_stop);
>
>  /*
>   * We need to implement panic_smp_self_stop() for parallel panic() calls, so
> @@ -878,36 +923,7 @@ static void __noreturn local_cpu_stop(unsigned int cpu)
>   */
>  void __noreturn panic_smp_self_stop(void)
>  {
> -       local_cpu_stop(smp_processor_id());
> -}
> -
> -static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
> -{
> -#ifdef CONFIG_KEXEC_CORE
> -       /*
> -        * Use local_daif_mask() instead of local_irq_disable() to make sure
> -        * that pseudo-NMIs are disabled. The "crash stop" code starts with
> -        * an IRQ and falls back to NMI (which might be pseudo). If the IRQ
> -        * finally goes through right as we're timing out then the NMI could
> -        * interrupt us. It's better to prevent the NMI and let the IRQ
> -        * finish since the pt_regs will be better.
> -        */
> -       local_daif_mask();
> -
> -       crash_save_cpu(regs, cpu);
> -
> -       set_cpu_online(cpu, false);
> -
> -       sdei_mask_local_cpu();
> -
> -       if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
> -               __cpu_try_die(cpu);
> -
> -       /* just in case */
> -       cpu_park_loop();
> -#else
> -       BUG();
> -#endif
> +       arm64_nmi_cpu_stop(NULL, false);
>  }

panic_smp_self_stop() passes regs == NULL. If a second CPU panics
after the primary has already set crash_stop, it loses the panic_cpu
cmpxchg and calls panic_smp_self_stop(); if it was running with
interrupts masked it never took the stop IPI, so it gets here with
crash_stop == 1. crash is then true and we do crash_save_cpu(NULL,
cpu), which ends up in elf_core_copy_regs(), and on arm64 that is just

        *(struct user_pt_regs *)&(dest) = (regs)->user_regs;

so a straight NULL deref -> synchronous abort while we're in the
middle of crashing. The old local_cpu_stop() never called
crash_save_cpu(), so this is a new regression from the unification.

The comment above the function says NULL means "capture the current
context", but crash_save_cpu() doesn't do that, it just dereferences
regs. If that's the intent, materialise it:

        if (crash) {
                struct pt_regs local;

                if (!regs) {
                        crash_setup_regs(&local, NULL);
                        regs = &local;
                }
                crash_save_cpu(regs, cpu);
        }

  crash_setup_regs(..., NULL) is the existing "capture current" helper. Or just
  skip the save when regs is NULL if the self-stop registers aren't
worth having.


^ permalink raw reply

* Re: [PATCH v3 2/3] drivers/firmware: add SDEI cross-CPU NMI service for arm64
From: Puranjay Mohan @ 2026-06-15 10:18 UTC (permalink / raw)
  To: Kiryl Shutsemau
  Cc: Catalin Marinas, Will Deacon, James Morse, Mark Rutland,
	Marc Zyngier, Doug Anderson, Petr Mladek, Thomas Gleixner,
	Andrew Morton, Baoquan He, Usama Arif, Breno Leitao,
	Julien Thierry, Lecopzer Chen, Sumit Garg, kernel-team, kexec,
	linux-arm-kernel, linux-kernel, Kiryl Shutsemau (Meta)
In-Reply-To: <704b467d5b320da9cf49fc9bb4a6814063986f3b.1781490440.git.kas@kernel.org>

On Mon, Jun 15, 2026 at 4:35 AM Kiryl Shutsemau <kirill@shutemov.name> wrote:
>
> From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
>
> Deliver an NMI-like event to an interrupt-masked arm64 CPU via the
> standard SDEI software-signalled event (event 0), without the pseudo-NMI
> hot-path cost: register a handler for event 0 and poke a target with
> sdei_event_signal(0, mpidr).
>
> First user is arch_trigger_cpumask_backtrace() (sysrq-l, RCU stalls,
> hung-task/soft-lockup dumps), which otherwise rides an IPI that can't
> reach a masked CPU. Falls back to the IPI path when SDEI is absent; no
> watchdog backend yet, so the stock detector is untouched.
>
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> Reviewed-by: Douglas Anderson <dianders@chromium.org>
> ---
>  MAINTAINERS                     |   2 +-
>  arch/arm64/include/asm/nmi.h    |  24 +++++
>  arch/arm64/kernel/smp.c         |  11 +++
>  drivers/firmware/Kconfig        |  19 ++++
>  drivers/firmware/Makefile       |   1 +
>  drivers/firmware/arm_sdei_nmi.c | 149 ++++++++++++++++++++++++++++++++
>  6 files changed, 205 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/include/asm/nmi.h
>  create mode 100644 drivers/firmware/arm_sdei_nmi.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c8d4b913f26c..b5ddfb85dce9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -24797,7 +24797,7 @@ M:      James Morse <james.morse@arm.com>
>  L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
>  S:     Maintained
>  F:     Documentation/devicetree/bindings/arm/firmware/sdei.txt
> -F:     drivers/firmware/arm_sdei.c
> +F:     drivers/firmware/arm_sdei*
>  F:     include/linux/arm_sdei.h
>  F:     include/uapi/linux/arm_sdei.h
>
> diff --git a/arch/arm64/include/asm/nmi.h b/arch/arm64/include/asm/nmi.h
> new file mode 100644
> index 000000000000..9366be419d18
> --- /dev/null
> +++ b/arch/arm64/include/asm/nmi.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_NMI_H
> +#define __ASM_NMI_H
> +
> +#include <linux/cpumask.h>
> +
> +/*
> + * Cross-CPU NMI provider hooks, consulted by the arm64 arch code before
> + * its regular-IRQ / pseudo-NMI IPI paths. The SDEI provider in
> + * drivers/firmware/arm_sdei_nmi.c implements them when active; a future
> + * FEAT_NMI provider could slot in here too. The stubs let callers stay
> + * unconditional when ARM_SDEI_NMI is off.
> + */
> +#ifdef CONFIG_ARM_SDEI_NMI
> +bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu);
> +#else
> +static inline bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
> +                                                     int exclude_cpu)
> +{
> +       return false;
> +}
> +#endif
> +
> +#endif /* __ASM_NMI_H */
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 1aa324104afb..a670434a8cae 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -45,6 +45,7 @@
>  #include <asm/daifflags.h>
>  #include <asm/kvm_mmu.h>
>  #include <asm/mmu_context.h>
> +#include <asm/nmi.h>
>  #include <asm/numa.h>
>  #include <asm/processor.h>
>  #include <asm/smp_plat.h>
> @@ -927,6 +928,16 @@ static void arm64_backtrace_ipi(cpumask_t *mask)
>
>  void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
>  {
> +       /*
> +        * Prefer the SDEI cross-CPU NMI provider when active: firmware
> +        * dispatches the event out of EL3 and reaches CPUs that have
> +        * interrupts locally masked, without the per-IRQ-mask cost that
> +        * pseudo-NMI pays for the same reach. The plain IPI path below
> +        * can't reach such a CPU unless pseudo-NMI is enabled.
> +        */
> +       if (sdei_nmi_trigger_cpumask_backtrace(mask, exclude_cpu))
> +               return;
> +
>         /*
>          * NOTE: though nmi_trigger_cpumask_backtrace() has "nmi_" in the name,
>          * nothing about it truly needs to be implemented using an NMI, it's
> diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
> index bbd2155d8483..6501087ff90d 100644
> --- a/drivers/firmware/Kconfig
> +++ b/drivers/firmware/Kconfig
> @@ -36,6 +36,25 @@ config ARM_SDE_INTERFACE
>           standard for registering callbacks from the platform firmware
>           into the OS. This is typically used to implement RAS notifications.
>
> +config ARM_SDEI_NMI
> +       bool "SDEI-based cross-CPU NMI service (arm64)"
> +       depends on ARM64 && ARM_SDE_INTERFACE
> +       help
> +         Provides SDEI-based cross-CPU NMI delivery for hooks that need
> +         to reach interrupt-masked CPUs on silicon that lacks FEAT_NMI:
> +
> +           - arch_trigger_cpumask_backtrace()  (sysrq-l, RCU stalls,
> +             hardlockup_all_cpu_backtrace, soft-lockup secondary dumps,
> +             hung-task auxiliary dumps)
> +
> +         The driver registers a handler for the SDEI software-signalled
> +         event (event 0) and reaches a target CPU by signalling it with
> +         SDEI_EVENT_SIGNAL. Firmware delivers the event out of EL3
> +         regardless of the target's PSTATE.DAIF -- forced delivery into a
> +         CPU wedged with interrupts locally masked.
> +
> +         If unsure, say N.
> +
>  config EDD
>         tristate "BIOS Enhanced Disk Drive calls determine boot disk"
>         depends on X86
> diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
> index 4ddec2820c96..be46f1e1dc77 100644
> --- a/drivers/firmware/Makefile
> +++ b/drivers/firmware/Makefile
> @@ -4,6 +4,7 @@
>  #
>  obj-$(CONFIG_ARM_SCPI_PROTOCOL)        += arm_scpi.o
>  obj-$(CONFIG_ARM_SDE_INTERFACE)        += arm_sdei.o
> +obj-$(CONFIG_ARM_SDEI_NMI)     += arm_sdei_nmi.o
>  obj-$(CONFIG_DMI)              += dmi_scan.o
>  obj-$(CONFIG_DMI_SYSFS)                += dmi-sysfs.o
>  obj-$(CONFIG_EDD)              += edd.o
> diff --git a/drivers/firmware/arm_sdei_nmi.c b/drivers/firmware/arm_sdei_nmi.c
> new file mode 100644
> index 000000000000..a82776e7b55a
> --- /dev/null
> +++ b/drivers/firmware/arm_sdei_nmi.c
> @@ -0,0 +1,149 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * arm64 SDEI-based cross-CPU NMI service.
> + *
> + * Delivering an "NMI-shaped" event to an EL1 context that has locally
> + * masked interrupts, on silicon without FEAT_NMI, can be done two ways:
> + *
> + *   - pseudo-NMI: mask "interrupts" via the GIC priority register
> + *     (ICC_PMR_EL1) instead of PSTATE.DAIF, leaving a high-priority band
> + *     deliverable. Functionally this works -- but it reimplements every
> + *     local_irq_disable()/enable() and exception entry/exit as a PMR
> + *     write plus synchronisation, a cost paid on that hot path forever,
> + *     whether or not an NMI is ever delivered.
> + *
> + *   - SDEI: leave interrupt masking as the cheap PSTATE.DAIF operation
> + *     and have the firmware bounce an EL3-routed Group-0 SGI back to
> + *     NS-EL1 as an event callback. The cost is a firmware round-trip,
> + *     but only at the rare moment delivery is actually needed.
> + *
> + * This driver takes the second path: it keeps the IRQ-mask hot path
> + * free and pays only when it fires, which is what makes cross-CPU NMI
> + * affordable on hardware where the pseudo-NMI tax isn't, until FEAT_NMI
> + * makes NMI masking cheap in the architecture itself.
> + *
> + * Capabilities provided:
> + *
> + *   - sdei_nmi_trigger_cpumask_backtrace() — override for arm64's
> + *     arch_trigger_cpumask_backtrace(), so sysrq-l, RCU stall dumps,
> + *     hardlockup_all_cpu_backtrace, soft-lockup/hung-task secondary
> + *     dumps all reach interrupt-masked CPUs.
> + *
> + * Delivery uses the standard SDEI software-signalled event (event 0) and
> + * SDEI_EVENT_SIGNAL. We register a handler for event 0, enable it, and
> + * poke a target CPU with sdei_event_signal(0, mpidr): firmware makes
> + * event 0 pending on that PE and dispatches the handler NMI-like,
> + * regardless of the target's DAIF.
> + * Availability is simply whether event 0 registers and enables -- if SDEI
> + * and its software-signalled event are present we use it, otherwise the
> + * driver stays inert.
> + */
> +
> +#define pr_fmt(fmt) "sdei_nmi: " fmt
> +
> +#include <linux/arm_sdei.h>
> +#include <linux/cpumask.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/kprobes.h>
> +#include <linux/nmi.h>
> +#include <linux/printk.h>
> +#include <linux/ptrace.h>
> +#include <linux/smp.h>
> +#include <linux/types.h>
> +
> +#include <asm/nmi.h>
> +#include <asm/smp_plat.h>
> +
> +static bool sdei_nmi_available;
> +
> +#define SDEI_NMI_EVENT                 0
> +
> +static int sdei_nmi_handler(u32 event, struct pt_regs *regs, void *arg)
> +{
> +       /*
> +        * nmi_cpu_backtrace() no-ops unless this CPU's bit is set in the
> +        * global backtrace mask (driven by nmi_trigger_cpumask_backtrace()),
> +        * so a fire that reaches a CPU not being backtraced is harmless.
> +        */
> +       nmi_cpu_backtrace(regs);
> +       return SDEI_EV_HANDLED;
> +}
> +NOKPROBE_SYMBOL(sdei_nmi_handler);
> +
> +static void sdei_nmi_fire(unsigned int target_cpu)
> +{
> +       int err = sdei_event_signal(SDEI_NMI_EVENT, cpu_logical_map(target_cpu));
> +
> +       if (err)
> +               pr_warn("SDEI_EVENT_SIGNAL to CPU %u failed: %d\n",
> +                       target_cpu, err);
> +}
> +
> +/*
> + * Raise callback for nmi_trigger_cpumask_backtrace(): signal event 0
> + * at every CPU still pending in @mask. The framework excludes the local
> + * CPU from @mask before calling us.
> + */
> +static void sdei_nmi_raise_backtrace(cpumask_t *mask)
> +{
> +       unsigned int cpu;
> +
> +       for_each_cpu(cpu, mask)
> +               sdei_nmi_fire(cpu);
> +}
> +
> +/*
> + * Override hook for arch_trigger_cpumask_backtrace() (see
> + * arch/arm64/kernel/smp.c). Returns true when SDEI handled the request,
> + * which is the case whenever SDEI is active; on a false return the arch
> + * falls back to its regular-IRQ (or pseudo-NMI, if enabled) IPI.
> + *
> + * On a kernel built without paying the pseudo-NMI hot-path cost (the
> + * usual case for this driver's target), the IPI can't reach a CPU that
> + * has interrupts masked -- so the backtrace of the one CPU you care
> + * about comes back empty. SDEI is dispatched out of EL3 and lands
> + * regardless of the target's DAIF, without taxing the IRQ-mask path.
> + */
> +bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
> +{
> +       if (!sdei_nmi_available)
> +               return false;
> +
> +       nmi_trigger_cpumask_backtrace(mask, exclude_cpu,
> +                                     sdei_nmi_raise_backtrace);
> +       return true;
> +}
> +
> +/*
> + * device_initcall (after arch_initcall(sdei_init), so the SDEI subsystem
> + * is up): probe the firmware, register the event, and turn on the
> + * cross-CPU service. If the probe fails the driver stays inert and the
> + * override hooks decline, leaving the arch's own paths in place.
> + */
> +static int __init sdei_nmi_init(void)
> +{
> +       int err;
> +
> +       err = sdei_event_register(SDEI_NMI_EVENT, sdei_nmi_handler, NULL);
> +       if (err) {
> +               pr_err("sdei_event_register(%u) failed: %d\n",
> +                      SDEI_NMI_EVENT, err);
> +               return 0;
> +       }

This initcall runs unconditionally whenever ARM_SDEI_NMI is built in,
which includes the many arm64 systems that have no SDEI at all. On
those, sdei_event_register() -> sdei_event_create() ->
invoke_sdei_fn() returns -EIO, and the core already complains:
    pr_warn("Failed to create event %u: %d\n", event_num, err);

(that one isn't gated on err != -EIO, unlike sdei_mask_local_cpu() & friends).
We then add a second pr_err() on top, so every boot on a non-SDEI box
with this config gets two alarming lines for what is just "no firmware
support". At minimum, don't shout for -EIO/-EOPNOTSUPP here. Better,
skip the probe when SDEI isn't present  there's no exported predicate
today, but -EIO is the de-facto one.

> +       err = sdei_event_enable(SDEI_NMI_EVENT);
> +       if (err) {
> +               pr_err("sdei_event_enable(%u) failed: %d\n",
> +                      SDEI_NMI_EVENT, err);
> +               sdei_event_unregister(SDEI_NMI_EVENT);
> +               return 0;
> +       }
> +
> +       sdei_nmi_available = true;
> +       pr_info("using SDEI cross-CPU NMI (SDEI_EVENT_SIGNAL, event %u)\n",
> +               SDEI_NMI_EVENT);
> +
> +       return 0;
> +}
> +device_initcall(sdei_nmi_init);
> --
> 2.54.0
>


^ permalink raw reply

* [GIT PULL] arm64 updates for 7.2
From: Will Deacon @ 2026-06-15 10:08 UTC (permalink / raw)
  To: torvalds
  Cc: catalin.marinas, linux-arm-kernel, linux-kernel, kernel-team, maz

Hi Linus,

Please pull these arm64 updates for 7.2. The usual summary is in the
tag, although it feels like the new world of AI tooling has slowed us
down a little on the feature side when compared to the fixes side. The
extra rounds of Sashiko review have also pushed a few things out until
next time.

Still, there's some good foundational stuff here for the fpsimd code and
hardening work towards removing the predictable linear alias of the
kernel image.

There's a small conflict with an upstream fix that landed post -rc1 via
the KVM/arm64 tree (837263307489 ("KVM: arm64: Correctly cap ZCR_EL2
provided by a guest hypervisor")). My resolution is below.

Cheers,

Will

diff --cc arch/arm64/kvm/hyp/include/hyp/switch.h
index e9b36a3b27bb,1f12c4ba295a..000000000000
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@@ -470,10 -466,9 +470,9 @@@ static inline void __hyp_sve_restore_gu
  	 * The vCPU's saved SVE state layout always matches the max VL of the
  	 * vCPU. Start off with the max VL so we can load the SVE state.
  	 */
 -	sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
 +	sve_cond_update_zcr_vq(zcr_el2, SYS_ZCR_EL2);
- 	__sve_restore_state(vcpu_sve_pffr(vcpu),
- 			    &vcpu->arch.ctxt.fp_regs.fpsr,
- 			    true);
+ 	sve_load_state(kern_hyp_va(vcpu->arch.sve_state), true);
+ 	fpsimd_load_common(&vcpu->arch.ctxt.fp_regs);
  
  	/*
  	 * The effective VL for a VM could differ from the max VL when running a

--->8

The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:

  Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/arm64-upstream

for you to fetch changes up to 61c19a9feb1d87156e46e38d7759f3ad23710e24:

  Merge branch 'for-next/sysregs' into for-next/core (2026-06-14 12:18:27 +0100)

----------------------------------------------------------------
arm64 updates for 7.2

CPU errata handling:
- Extend CnP disabling workaround to HiSilicon HIP09 hardware.
- Work around eternally broken broadcast TLB invalidation on more CPUs.
- Documentation and code cleanups.

CPU features:
- Add new hwcaps for the 2025 dpISA extensions.

Floating point / SVE / SME:
- Significant cleanup to the low-level state management code in the core
  architecture code and KVM.
- Use correct register widths during SVE/SME save/restore assembly.
- Expose SVE/SME save/restore memory accesses to sanitisers.

Memory management:
- Preparatory work for unmapping the kernel data and bss sections from
  the linear map.

Miscellaneous:
- Inline DAIF manipulation helpers so they can be used safely from
  non-instrumentable code.
- Fix handling of the 'nosmp' cmdline option to avoid marking secondary
  cores as "possible".

MPAM:
- Add support for v0.1 of the MPAM architecture.

Perf:
- Update HiSilicon PMU MAINTAINERS entry.
- Fix event encodings for the DVM node in the CMN driver.

Selftests:
- Extend sigframe tests to cover POE context.
- Add coverage for the newly added 2025 dpISA hwcaps.

System registers:
- Add new registers and ESR encodings for the HDBSS feature.

Plus minor fixes and cleanups across the board.

----------------------------------------------------------------
Anshuman Khandual (2):
      arm64/mm: Replace BUG_ON() with VM_WARN_ON_ONCE()
      arm64/mm: Rename ptdesc_t

Ard Biesheuvel (20):
      arm64: mm: Remove bogus stop condition from map_mem() loop
      arm64: mm: Drop redundant pgd_t* argument from map_mem()
      arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings
      arm64: mm: Preserve existing table mappings when mapping DRAM
      arm64: mm: Preserve non-contiguous descriptors when mapping DRAM
      arm64: mm: Permit contiguous descriptors to be manipulated
      arm64: kfence: Avoid NOMAP tricks when mapping the early pool
      arm64: mm: Permit contiguous attribute for preliminary mappings
      arm64: Move fixmap and kasan page tables to end of kernel image
      arm64: mm: Don't abuse memblock NOMAP to check for overlaps
      powerpc/code-patching: Avoid r/w mapping of the zero page
      sh: Drop cache flush of the zero page at boot
      mm: Make empty_zero_page[] const
      arm64: mm: Map the kernel data/bss read-only in the linear map
      arm64: mm: Unmap kernel data/bss entirely from the linear map
      arm64: Rename page table BSS section to .bss..pgtbl
      kasan: Move generic KASAN page tables out of BSS too
      arm64: Avoid double evaluation of __ptep_get()
      KVM: arm64: Omit tag sync on stage-2 mappings of the zero page
      arm64: mm: Defer remap of linear alias of data/bss

Breno Leitao (1):
      arm64: arch_timer: reuse arch_timer_read_cnt{p,v}ct_el0() helpers

Ethan Nelson-Moore (1):
      ARM64: remove unnecessary architecture-specific <asm/device.h>

Jonathan Cameron (1):
      MAINTAINERS: Update HiSilicon PMU driver maintainer to Yushan Wang

Kevin Brodsky (4):
      selftests/mm: Fix resv_sz when parsing arm64 signal frame
      kselftest/arm64: Add POE as a feature in the signal tests
      kselftest/arm64: Move/add POE helpers to test_signals_utils.h
      kselftest/arm64: Add tests for POR_EL0 save/reset/restore

Krzysztof Kozlowski (1):
      perf: qcom: Unify user-visible "Qualcomm" name

Leonardo Bras (1):
      arm64/daifflags: Make local_daif_*() helpers __always_inline

Marco Elver (1):
      arm64: Implement _THIS_IP_ using inline asm

Mark Brown (3):
      arm64/cpufeature: Define hwcaps for 2025 dpISA features
      kselftest/arm64: Add 2025 dpISA coverage to hwcaps
      arm64: Document SVE constraints on new hwcaps

Mark Rutland (23):
      arm64: fpsimd: Fix type mismatch in sve_{save,load}_state()
      arm64: fpsimd: Fix type mismatch in sme_{save,load}_state()
      KVM: arm64: Don't include <asm/fpsimdmacros.h>
      KVM: arm64: Don't override FFR save/restore argument
      KVM: arm64: pkvm: Save host FPMR in host cpu context
      KVM: arm64: pkvm: Remove struct cpu_sve_state
      arm64: fpsimd: Fold sve_init_regs() into do_sve_acc()
      arm64: fpsimd: Remove sve_set_vq() and sme_set_vq()
      arm64: fpsimd: Use assembler for SVE instructions
      arm64: fpsimd: Use assembler for baseline SME instructions
      arm64: fpsimd: Move sve_get_vl() and sme_get_vl() inline
      arm64: sysreg: Add FPCR and FPSR
      arm64: fpsimd: Split FPSR/FPCR from SVE save/restore
      arm64: fpsimd: Move fpsimd save/restore inline
      arm64: fpsimd: Use opaque type for SVE state
      arm64: fpsimd: Use opaque type for SME state
      arm64: fpsimd: Move SVE save/restore inline
      arm64: fpsimd: Move sve_flush_live() inline
      arm64: fpsimd: Move SME save/restore inline
      arm64: fpsimd: Remove <asm/fpsimdmacros.h>
      arm64: cputype: Add C1-Ultra definitions
      arm64: cputype: Add C1-Premium definitions
      arm64: errata: Mitigate TLBI errata on various Arm CPUs

Osama Abdelkader (1):
      arm64: panic from init_IRQ if IRQ handler stacks cannot be allocated

Pengjie Zhang (1):
      arm64: smp: Do not mark secondary CPUs possible under nosmp

Robin Murphy (2):
      arm64: errata: Reformat table for IDs
      perf/arm-cmn: Fix DVM node events

Shanker Donthineni (1):
      arm64: errata: Mitigate TLBI errata on NVIDIA Olympus CPU

Thorsten Blum (2):
      arm64: proton-pack: use sysfs_emit in sysfs show functions
      arm64: patching: replace min_t with min in __text_poke

Will Deacon (12):
      Revert "arm64: mm: Defer remap of linear alias of data/bss"
      Revert "arm64: mm: Unmap kernel data/bss entirely from the linear map"
      arm64: errata: Mitigate TLBI errata on Microsoft Azure Cobalt 100 CPU
      Merge branch 'for-next/cpufeature' into for-next/core
      Merge branch 'for-next/errata' into for-next/core
      Merge branch 'for-next/fpsimd-cleanups' into for-next/core
      Merge branch 'for-next/misc' into for-next/core
      Merge branch 'for-next/mm' into for-next/core
      Merge branch 'for-next/mpam' into for-next/core
      Merge branch 'for-next/perf' into for-next/core
      Merge branch 'for-next/selftests' into for-next/core
      Merge branch 'for-next/sysregs' into for-next/core

Zeng Heng (4):
      arm64: cpufeature: Add support for the MPAM v0.1 architecture version
      arm_mpam: Update architecture version check for MPAM MSC
      arm64: cpufeature: Add WORKAROUND_DISABLE_CNP capability
      arm64: kernel: Disable CNP on HiSilicon HIP09

eillon (1):
      arm64/sysreg: Add HDBSS related register information

 Documentation/arch/arm64/elf_hwcaps.rst            |  27 ++
 Documentation/arch/arm64/silicon-errata.rst        |  93 +++--
 MAINTAINERS                                        |   2 +-
 arch/arm64/Kconfig                                 |  63 ++++
 arch/arm64/include/asm/arch_timer.h                |  12 +-
 arch/arm64/include/asm/cpucaps.h                   |   4 +-
 arch/arm64/include/asm/cpufeature.h                |   7 +
 arch/arm64/include/asm/cputype.h                   |   4 +
 arch/arm64/include/asm/daifflags.h                 |  10 +-
 arch/arm64/include/asm/device.h                    |  14 -
 arch/arm64/include/asm/el2_setup.h                 |   4 +-
 arch/arm64/include/asm/esr.h                       |   2 +
 arch/arm64/include/asm/fpsimd.h                    | 386 +++++++++++++++++++--
 arch/arm64/include/asm/fpsimdmacros.h              | 357 -------------------
 arch/arm64/include/asm/io.h                        |   2 +-
 arch/arm64/include/asm/kvm_host.h                  |  27 +-
 arch/arm64/include/asm/kvm_hyp.h                   |   5 -
 arch/arm64/include/asm/kvm_pkvm.h                  |   3 +-
 arch/arm64/include/asm/linkage.h                   |   4 +
 arch/arm64/include/asm/pgtable-types.h             |  14 +-
 arch/arm64/include/asm/pgtable.h                   |   4 +-
 arch/arm64/include/asm/processor.h                 |   7 +-
 arch/arm64/include/asm/ptdump.h                    |   8 +-
 arch/arm64/include/asm/tlbflush.h                  |   4 +-
 arch/arm64/include/uapi/asm/hwcap.h                |   8 +
 arch/arm64/kernel/Makefile                         |   2 +-
 arch/arm64/kernel/cpu_errata.c                     |  55 ++-
 arch/arm64/kernel/cpufeature.c                     |  28 +-
 arch/arm64/kernel/cpuinfo.c                        |   8 +
 arch/arm64/kernel/efi.c                            |   4 +-
 arch/arm64/kernel/entry-common.c                   |   8 +-
 arch/arm64/kernel/entry-fpsimd.S                   | 134 -------
 arch/arm64/kernel/fpsimd.c                         |  90 +++--
 arch/arm64/kernel/irq.c                            |  29 +-
 arch/arm64/kernel/patching.c                       |   3 +-
 arch/arm64/kernel/pi/map_kernel.c                  |   2 +-
 arch/arm64/kernel/pi/map_range.c                   |   4 +-
 arch/arm64/kernel/pi/pi.h                          |   2 +-
 arch/arm64/kernel/proton-pack.c                    |  17 +-
 arch/arm64/kernel/smp.c                            |  14 +-
 arch/arm64/kernel/vmlinux.lds.S                    |   8 +-
 arch/arm64/kvm/arm.c                               |  16 +-
 arch/arm64/kvm/guest.c                             |   4 +-
 arch/arm64/kvm/hyp/entry.S                         |   1 -
 arch/arm64/kvm/hyp/fpsimd.S                        |  33 --
 arch/arm64/kvm/hyp/include/hyp/switch.h            |  23 +-
 arch/arm64/kvm/hyp/nvhe/Makefile                   |   2 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c                 |  20 +-
 arch/arm64/kvm/hyp/nvhe/setup.c                    |   4 +-
 arch/arm64/kvm/hyp/vhe/Makefile                    |   2 +-
 arch/arm64/kvm/mmu.c                               |   5 +
 arch/arm64/mm/fixmap.c                             |   6 +-
 arch/arm64/mm/kasan_init.c                         |   2 +-
 arch/arm64/mm/mmap.c                               |   4 +-
 arch/arm64/mm/mmu.c                                | 145 ++++----
 arch/arm64/mm/pageattr.c                           |   2 +-
 arch/arm64/mm/ptdump.c                             |   2 +-
 arch/arm64/tools/cpucaps                           |   2 +-
 arch/arm64/tools/sysreg                            |  74 ++++
 arch/powerpc/lib/code-patching.c                   |  52 +--
 arch/sh/mm/init.c                                  |   3 -
 drivers/perf/Kconfig                               |   4 +-
 drivers/perf/arm-cmn.c                             |  23 +-
 drivers/resctrl/mpam_devices.c                     |  23 +-
 include/linux/linkage.h                            |   4 +
 include/linux/pgtable.h                            |   2 +-
 mm/kasan/init.c                                    |  10 +-
 mm/mm_init.c                                       |   2 +-
 tools/testing/selftests/arm64/abi/hwcap.c          | 116 +++++++
 .../testing/selftests/arm64/signal/test_signals.h  |   2 +
 .../selftests/arm64/signal/test_signals_utils.c    |   3 +
 .../selftests/arm64/signal/test_signals_utils.h    |  16 +
 .../signal/testcases/poe_missing_poe_context.c     |  73 ++++
 .../selftests/arm64/signal/testcases/poe_restore.c |  64 ++++
 .../selftests/arm64/signal/testcases/poe_siginfo.c |  15 -
 tools/testing/selftests/mm/pkey-arm64.h            |   3 +-
 76 files changed, 1275 insertions(+), 966 deletions(-)
 delete mode 100644 arch/arm64/include/asm/device.h
 delete mode 100644 arch/arm64/include/asm/fpsimdmacros.h
 delete mode 100644 arch/arm64/kernel/entry-fpsimd.S
 delete mode 100644 arch/arm64/kvm/hyp/fpsimd.S
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/poe_missing_poe_context.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/poe_restore.c


^ permalink raw reply

* Re: [PATCH v2] arm64/irqflags: __always_inline the arch_local_irq_*() helpers
From: Breno Leitao @ 2026-06-15 10:07 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, Will Deacon, leo.bras, leo.yan, linux-arm-kernel,
	linux-kernel, palmer, paulmck, puranjay, usama.arif, rmikey,
	kernel-team
In-Reply-To: <ae9f6_MSCMfgG8VT@J2N7QTR9R3.cambridge.arm.com>

On Mon, Apr 27, 2026 at 02:08:59PM +0100, Mark Rutland wrote:
> On Mon, Apr 27, 2026 at 01:26:18PM +0100, Catalin Marinas wrote:
> > On Tue, Apr 21, 2026 at 08:58:57AM -0700, Breno Leitao wrote:
> > > Force-inline all of the arch_local_irq_*() wrappers so they cannot be
> > > emitted out-of-line:
> > > 
> > >   - arch_local_irq_enable()
> > >   - arch_local_irq_disable()
> > >   - arch_local_save_flags()
> > >   - arch_irqs_disabled_flags()
> > >   - arch_irqs_disabled()
> > >   - arch_local_irq_save()
> > >   - arch_local_irq_restore()
> > 
> > I'll queue this, thanks!
> > 
> > I think we should also do local_daif_{mask,restore,inherit} as they seem
> > to be called from noinstr locations in entry-common.c.
> 
> I agree we probably should mark those as __always_inline, but I beleive
> they're safe as-is.

Returning to this topic, I've also observed arch_counter_get_cntpct()
appearing in backtraces during lock spinning, which obscures the actual
code location and makes debugging less straightforward.

This scenario is common at Meta when a remote function call targets a
stuck CPU, resulting in an infinite loop, and dying at arch_counter_get_cntpc().

Is it worth making arch_counter_get_cntpc() always inline as well?


 [     T18] Sending NMI from CPU 1 to CPUs 18:
 [     C18] NMI backtrace for cpu 18
 [     C18] CPU: 18 UID: 0 PID: 7467 Comm: dynoKernelMon Kdump: loaded Not tainted 6.16.1-0_fbk3_0_gd6c130b80483 #1 NONE
 [     C18] pstate: 63401009 (nZCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
 [     C18] pc : arch_counter_get_cntpct+0x14/0x18
 [     C18] lr : ktime_get_mono_fast_ns+0x5c/0x1a8
 [     C18] sp : ffff8000d43afa30
 [     C18] x29: ffff8000d43afa30 x28: 00000000000f4240 x27: 0000000000000000
 [     C18] x26: 0000028ac8fe4369 x25: 0000000000001388 x24: 000000000054fb54
 [     C18] x23: ffff800082fbd000 x22: 0000028f22c480dd x21: ffff800081b60000
 [     C18] x20: ffff800080c71058 x19: ffff800082fb8848 x18: 0000000000000018
 [     C18] x17: 0000000000000058 x16: 00000000ffffffff x15: 00000000ffff0000
 [     C18] x14: 0000000000000002 x13: 0000000000000003 x12: 000000000000ffd8
 [     C18] x11: ffff8000d43affd8 x10: 0000000000000038 x9 : 0000000000000000
 [     C18] x8 : ffff8000d43afa30 x7 : ff7fff7f7f7f7f7f x6 : 000000000000000a
 [     C18] x5 : ffff8000832772ba x4 : 0000000000000000 x3 : 0000000000000000
 [     C18] x2 : 0000000000000000 x1 : 000000000014b4c0 x0 : 000002a87573b7a0
 [     C18] Call trace:
 [     C18]  arch_counter_get_cntpct+0x14/0x18 (P)
 [     C18]  smp_call_function_single+0x2d0/0xd90
 [     C18]  event_function_call+0x80/0x2e0
 [     C18]  _perf_event_disable+0x5c/0xf0
 [     C18]  perf_ioctl+0x170/0x890
 [     C18]  __arm64_sys_ioctl+0xbb0/0xd28
 [     C18]  invoke_syscall+0x4c/0xd0
 [     C18]  do_el0_svc+0x80/0xb8
 [     C18]  el0_svc+0x30/0xf0
 [     C18]  el0t_64_sync_handler+0x70/0x100
 [     C18]  el0t_64_sync+0x17c/0x180


Thanks,
--breno


^ permalink raw reply

* Re: [PATCH] KVM: arm64: Sync SPSR_EL1 when injecting an exception into a pVM
From: Will Deacon @ 2026-06-15 10:05 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, linux-arm-kernel, kvmarm,
	linux-kernel, Joey Gouly, Steffen Eiden, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Sascha Bischoff, Andrew Jones
In-Reply-To: <20260612113414.1022901-1-tabba@google.com>

On Fri, Jun 12, 2026 at 12:34:14PM +0100, Fuad Tabba wrote:
> When pKVM injects a synchronous exception into a protected guest, it
> re-enters without restoring the guest's EL1 sysregs and writes the EL1
> exception registers to hardware by hand: ESR_EL1 and ELR_EL1, but not
> SPSR_EL1. enter_exception64() sets SPSR_EL1 (the interrupted PSTATE)
> only in memory, so the guest's handler reads a stale SPSR_EL1 and
> restores the wrong PSTATE on eret.
> 
> Write SPSR_EL1 alongside the other exception registers.
> 
> Fixes: 6c30bfb18d0b ("KVM: arm64: Add handlers for protected VM System Registers")
> Reported-by: sashiko <sashiko@sashiko.dev>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/arm64/kvm/hyp/nvhe/sys_regs.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/sys_regs.c b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
> index 8c3fbb413a06..1a7d5cd16d72 100644
> --- a/arch/arm64/kvm/hyp/nvhe/sys_regs.c
> +++ b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
> @@ -268,6 +268,7 @@ static void inject_sync64(struct kvm_vcpu *vcpu, u64 esr)
>  
>  	write_sysreg_el1(esr, SYS_ESR);
>  	write_sysreg_el1(read_sysreg_el2(SYS_ELR), SYS_ELR);
> +	write_sysreg_el1(read_sysreg_el2(SYS_SPSR), SYS_SPSR);
>  	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
>  	write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
>  }

Is SPSR_EL1 not set in enter_exception64() using vcpu_cpsr(vcpu), which
*is* set here? I'm just a bit wary of the report, as I'd have expected
fireworks if we weren't initialising the guest's SPSR on the exception
injection path.

Will


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox