* [net-next RFC PATCH v5 08/10] dt-bindings: net: pcs: Document support for Airoha Ethernet PCS
From: Christian Marangi @ 2026-05-05 18:27 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Lorenzo Bianconi, Heiner Kallweit, Russell King, Philipp Zabel,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
Christian Marangi, Daniel Golle, netdev, devicetree, linux-kernel,
linux-arm-kernel, linux-mediatek, llvm
In-Reply-To: <20260505182713.27644-1-ansuelsmth@gmail.com>
Document support for the Airoha Ethernet PCS found on the AN7581 SoC.
The Airoha AN7581 SoC exposes multiple Physical Coding Sublayers (PCS) for
its Serdes ports, supporting different Media Independent Interfaces
(10GBASE-R, USXGMII, 2500BASE-X, 1000BASE-X, SGMII).
This follows the new PCS provider model and uses the #pcs-cells property.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
.../bindings/net/pcs/airoha,pcs.yaml | 112 ++++++++++++++++++
1 file changed, 112 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/pcs/airoha,pcs.yaml
diff --git a/Documentation/devicetree/bindings/net/pcs/airoha,pcs.yaml b/Documentation/devicetree/bindings/net/pcs/airoha,pcs.yaml
new file mode 100644
index 000000000000..8bcf7757c728
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/pcs/airoha,pcs.yaml
@@ -0,0 +1,112 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/pcs/airoha,pcs.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Airoha Ethernet PCS and Serdes
+
+maintainers:
+ - Christian Marangi <ansuelsmth@gmail.com>
+
+description:
+ The Airoha AN7581 SoC exposes multiple Physical Coding Sublayers (PCS) for
+ its Serdes ports, supporting different Media Independent Interfaces
+ (10GBASE-R, USXGMII, 2500BASE-X, 1000BASE-X, SGMII).
+
+properties:
+ compatible:
+ enum:
+ - airoha,an7581-pcs-eth
+ - airoha,an7581-pcs-pon
+
+ reg:
+ items:
+ - description: XFI MAC reg
+ - description: HSGMII AN reg
+ - description: HSGMII PCS reg
+ - description: MULTI SGMII reg
+ - description: USXGMII reg
+ - description: HSGMII rate adaption reg
+ - description: XFI Analog register
+ - description: XFI PMA (Physical Medium Attachment) register
+
+ reg-names:
+ items:
+ - const: xfi_mac
+ - const: hsgmii_an
+ - const: hsgmii_pcs
+ - const: multi_sgmii
+ - const: usxgmii
+ - const: hsgmii_rate_adp
+ - const: xfi_ana
+ - const: xfi_pma
+
+ resets:
+ items:
+ - description: MAC reset
+ - description: PHY reset
+
+ reset-names:
+ items:
+ - const: mac
+ - const: phy
+
+ "#pcs-cells":
+ const: 0
+
+required:
+ - compatible
+ - reg
+ - reg-names
+ - resets
+ - reset-names
+ - "#pcs-cells"
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/reset/airoha,en7581-reset.h>
+
+ pcs@1fa08000 {
+ compatible = "airoha,an7581-pcs-pon";
+ reg = <0x1fa08000 0x1000>,
+ <0x1fa80000 0x60>,
+ <0x1fa80a00 0x164>,
+ <0x1fa84000 0x450>,
+ <0x1fa85900 0x338>,
+ <0x1fa86000 0x300>,
+ <0x1fa8a000 0x1000>,
+ <0x1fa8b000 0x1000>;
+ reg-names = "xfi_mac", "hsgmii_an", "hsgmii_pcs",
+ "multi_sgmii", "usxgmii",
+ "hsgmii_rate_adp", "xfi_ana", "xfi_pma";
+
+ resets = <&scuclk EN7581_XPON_MAC_RST>,
+ <&scuclk EN7581_XPON_PHY_RST>;
+ reset-names = "mac", "phy";
+
+ #pcs-cells = <0>;
+ };
+
+ pcs@1fa09000 {
+ compatible = "airoha,an7581-pcs-eth";
+ reg = <0x1fa09000 0x1000>,
+ <0x1fa70000 0x60>,
+ <0x1fa70a00 0x164>,
+ <0x1fa74000 0x450>,
+ <0x1fa75900 0x338>,
+ <0x1fa76000 0x300>,
+ <0x1fa7a000 0x1000>,
+ <0x1fa7b000 0x1000>;
+ reg-names = "xfi_mac", "hsgmii_an", "hsgmii_pcs",
+ "multi_sgmii", "usxgmii",
+ "hsgmii_rate_adp", "xfi_ana", "xfi_pma";
+
+ resets = <&scuclk EN7581_XSI_MAC_RST>,
+ <&scuclk EN7581_XSI_PHY_RST>;
+ reset-names = "mac", "phy";
+
+ #pcs-cells = <0>;
+ };
--
2.53.0
* [net-next RFC PATCH v5 07/10] net: phylink: add .pcs_link_down PCS OP
From: Christian Marangi @ 2026-05-05 18:27 UTC
In-Reply-To: <20260505182713.27644-1-ansuelsmth@gmail.com>
Allow a PCS driver to define a specific operation to tear down the link
between the MAC and the PCS.
This might be needed for PCS that reset counters or require a special
reset to work correctly if the link needs to be restored later.
On a phylink_link_down() call, the additional phylink_pcs_link_down() is
called before .mac_link_down to tear down the link.
A PCS driver needs to define .pcs_link_down to make use of this.
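The call ordering described above can be sketched with a small userspace model. This is only an illustration, not kernel code: struct pcs, struct pcs_ops, the trace buffer and every model_*/demo_* name are hypothetical stand-ins, and only the NULL checks and the "PCS before MAC" ordering are taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Simplified userspace stand-ins for the kernel structures (assumptions). */
struct pcs;

struct pcs_ops {
	/* Optional: may be left NULL by drivers that need no teardown. */
	void (*pcs_link_down)(struct pcs *pcs);
};

struct pcs {
	const struct pcs_ops *ops;
};

/* Call trace, so the ordering can be inspected. */
static char trace[64];

static void trace_add(const char *ev)
{
	strncat(trace, ev, sizeof(trace) - strlen(trace) - 1);
}

/* Hypothetical driver hook, e.g. resetting counters before the link returns. */
static void demo_pcs_link_down(struct pcs *pcs)
{
	(void)pcs;
	trace_add("pcs_down;");
}

static const struct pcs_ops demo_ops = {
	.pcs_link_down = demo_pcs_link_down,
};

/* Mirrors the NULL checks of phylink_pcs_link_down() in the patch. */
static void model_pcs_link_down(struct pcs *pcs)
{
	if (pcs && pcs->ops->pcs_link_down)
		pcs->ops->pcs_link_down(pcs);
}

/* Stand-in for the MAC .mac_link_down callback. */
static void model_mac_link_down(void)
{
	trace_add("mac_down;");
}

/* Mirrors the ordering in phylink_link_down(): PCS first, then MAC. */
static void model_link_down(struct pcs *pcs)
{
	model_pcs_link_down(pcs);
	model_mac_link_down();
}
```

In this model a PCS without the hook, or a missing PCS, simply skips the PCS step, so existing drivers would be unaffected.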
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
drivers/net/phy/phylink.c | 8 ++++++++
include/linux/phylink.h | 2 ++
2 files changed, 10 insertions(+)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 4ef31bb4eb05..ef2306840c5b 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1036,6 +1036,12 @@ static void phylink_pcs_link_up(struct phylink_pcs *pcs, unsigned int neg_mode,
pcs->ops->pcs_link_up(pcs, neg_mode, interface, speed, duplex);
}
+static void phylink_pcs_link_down(struct phylink_pcs *pcs)
+{
+ if (pcs && pcs->ops->pcs_link_down)
+ pcs->ops->pcs_link_down(pcs);
+}
+
static void phylink_pcs_disable_eee(struct phylink_pcs *pcs)
{
if (pcs && pcs->ops->pcs_disable_eee)
@@ -1735,6 +1741,8 @@ static void phylink_link_down(struct phylink *pl)
phylink_deactivate_lpi(pl);
+ phylink_pcs_link_down(pl->pcs);
+
pl->mac_ops->mac_link_down(pl->config, pl->act_link_an_mode,
pl->cur_interface);
phylink_info(pl, "Link is Down\n");
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index aef4a4fcf6e5..aefdca01b77b 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -523,6 +523,7 @@ struct phylink_pcs {
* @pcs_an_restart: restart 802.3z BaseX autonegotiation.
* @pcs_link_up: program the PCS for the resolved link configuration
* (where necessary).
+ * @pcs_link_down: optional; tear down the link between MAC and PCS.
* @pcs_disable_eee: optional notification to PCS that EEE has been disabled
* at the MAC.
* @pcs_enable_eee: optional notification to PCS that EEE will be enabled at
@@ -550,6 +551,7 @@ struct phylink_pcs_ops {
void (*pcs_an_restart)(struct phylink_pcs *pcs);
void (*pcs_link_up)(struct phylink_pcs *pcs, unsigned int neg_mode,
phy_interface_t interface, int speed, int duplex);
+ void (*pcs_link_down)(struct phylink_pcs *pcs);
void (*pcs_disable_eee)(struct phylink_pcs *pcs);
void (*pcs_enable_eee)(struct phylink_pcs *pcs);
int (*pcs_pre_init)(struct phylink_pcs *pcs);
--
2.53.0
* [net-next RFC PATCH v5 06/10] dt-bindings: net: ethernet-controller: permit to define multiple PCS
From: Christian Marangi @ 2026-05-05 18:27 UTC
In-Reply-To: <20260505182713.27644-1-ansuelsmth@gmail.com>
Drop the limitation of a single PCS in the pcs-handle property. Multiple PCS
can be defined for an ethernet-controller node to support various PHY
interface modes.
It's very common for SoCs to have two or more dedicated PCS for Base-X
(for example SGMII, 1000base-x, 2500base-x, ...) and Base-R (for example
USXGMII, 10gbase-r, ...), with the MAC selecting one or the other based on
the attached PHY.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
Documentation/devicetree/bindings/net/ethernet-controller.yaml | 2 --
1 file changed, 2 deletions(-)
diff --git a/Documentation/devicetree/bindings/net/ethernet-controller.yaml b/Documentation/devicetree/bindings/net/ethernet-controller.yaml
index 1bafd687dcb1..51a8a418955d 100644
--- a/Documentation/devicetree/bindings/net/ethernet-controller.yaml
+++ b/Documentation/devicetree/bindings/net/ethernet-controller.yaml
@@ -85,8 +85,6 @@ properties:
pcs-handle:
$ref: /schemas/types.yaml#/definitions/phandle-array
- items:
- maxItems: 1
description:
Specifies a reference to a node representing a PCS PHY device on a MDIO
bus to link with an external PHY (phy-handle) if exists.
--
2.53.0
* [net-next RFC PATCH v5 05/10] net: phylink: support late PCS provider attach
From: Christian Marangi @ 2026-05-05 18:27 UTC
In-Reply-To: <20260505182713.27644-1-ansuelsmth@gmail.com>
Add support for late PCS provider attachment to a phylink instance.
This works by creating a global notifier for PCS providers and
making each phylink instance that makes use of fwnode subscribe to
this notifier.
The PCS notifier emits the FWNODE_PCS_PROVIDER_ADD event every time
a new PCS provider is added.
phylink then reacts to this event and calls the new function
fwnode_phylink_pcs_get_from_fwnode(), which checks whether the PCS fwnode
provided by the event is present in the pcs-handle property of the
phylink instance.
If a related PCS is found, it is added to the phylink instance PCS list.
The PCS is then linked to the phylink instance, if the instance is not
disabled, and the supported interfaces of the phylink instance are
refreshed.
Finally, if we are in a major_config_failed scenario, an interface
reconfiguration is triggered in the next phylink resolve.
In the example scenario where the link was previously torn down due to
the removal of a PCS, the link is established again once the PCS comes
back and is available to phylink.
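The flow above can be sketched as a small userspace model. It is an illustration under stated assumptions, not the kernel implementation: integer node ids stand in for fwnode handles, a fixed array stands in for the blocking notifier chain, and all model_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model; a node id stands in for a struct fwnode_handle pointer. */
#define MAX_HANDLES 4
#define MAX_SUBSCRIBERS 4

struct model_phylink {
	int pcs_handle[MAX_HANDLES];	/* nodes listed in "pcs-handle" */
	size_t num_handles;
	int attached[MAX_HANDLES];	/* PCS nodes attached so far */
	size_t num_attached;
	bool major_config_failed;
	bool force_major_config;
};

static struct model_phylink *subscribers[MAX_SUBSCRIBERS];
static size_t num_subscribers;

/* Stand-in for register_fwnode_pcs_notifier(). */
static int model_register(struct model_phylink *pl)
{
	if (num_subscribers == MAX_SUBSCRIBERS)
		return -1;
	subscribers[num_subscribers++] = pl;
	return 0;
}

/* Stand-in for pcs_provider_notify(): react only to referenced providers. */
static bool model_notify(struct model_phylink *pl, int pcs_node)
{
	for (size_t i = 0; i < pl->num_handles; i++) {
		if (pl->pcs_handle[i] != pcs_node)
			continue;
		if (pl->num_attached == MAX_HANDLES)
			return false;
		pl->attached[pl->num_attached++] = pcs_node;
		/* A previously failed major config is retried on next resolve. */
		if (pl->major_config_failed)
			pl->force_major_config = true;
		return true;
	}
	return false;	/* not ours: the kernel code returns NOTIFY_DONE */
}

/* Stand-in for the FWNODE_PCS_PROVIDER_ADD event emitted on provider add. */
static size_t model_provider_add(int pcs_node)
{
	size_t handled = 0;

	for (size_t i = 0; i < num_subscribers; i++)
		if (model_notify(subscribers[i], pcs_node))
			handled++;
	return handled;
}
```

Every subscriber sees every event and filters by its own pcs-handle list, which is why a provider that no instance references is ignored in this model.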
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
drivers/net/pcs/pcs.c | 40 ++++++++++++++++++++++++++++++
drivers/net/phy/phylink.c | 52 +++++++++++++++++++++++++++++++++++++++
include/linux/pcs/pcs.h | 48 ++++++++++++++++++++++++++++++++++++
3 files changed, 140 insertions(+)
diff --git a/drivers/net/pcs/pcs.c b/drivers/net/pcs/pcs.c
index 72f56f55d198..911d95cf1b09 100644
--- a/drivers/net/pcs/pcs.c
+++ b/drivers/net/pcs/pcs.c
@@ -22,6 +22,13 @@ struct fwnode_pcs_provider {
static LIST_HEAD(fwnode_pcs_providers);
static DEFINE_MUTEX(fwnode_pcs_mutex);
+static BLOCKING_NOTIFIER_HEAD(fwnode_pcs_notify_list);
+
+int register_fwnode_pcs_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&fwnode_pcs_notify_list, nb);
+}
+EXPORT_SYMBOL_GPL(register_fwnode_pcs_notifier);
struct phylink_pcs *fwnode_pcs_simple_get(struct fwnode_reference_args *pcsspec,
void *data)
@@ -55,6 +62,10 @@ int fwnode_pcs_add_provider(struct fwnode_handle *fwnode,
fwnode_dev_initialized(fwnode, true);
+ blocking_notifier_call_chain(&fwnode_pcs_notify_list,
+ FWNODE_PCS_PROVIDER_ADD,
+ fwnode);
+
return 0;
}
EXPORT_SYMBOL_GPL(fwnode_pcs_add_provider);
@@ -147,6 +158,35 @@ struct phylink_pcs *fwnode_pcs_get(struct fwnode_handle *fwnode, int index)
}
EXPORT_SYMBOL_GPL(fwnode_pcs_get);
+struct phylink_pcs *
+fwnode_phylink_pcs_get_from_fwnode(struct fwnode_handle *fwnode,
+ struct fwnode_handle *pcs_fwnode)
+{
+ struct fwnode_reference_args pcsspec;
+ int index = 0;
+ int ret;
+
+ /* Loop until we find a matching PCS node or
+ * fwnode_parse_pcsspec() returns error
+ * if we don't have any other PCS reference to check.
+ */
+ while (true) {
+ ret = fwnode_parse_pcsspec(fwnode, index, NULL, &pcsspec);
+ if (ret)
+ return ERR_PTR(ret);
+
+ /* Exit loop if we found the matching PCS node */
+ if (pcsspec.fwnode == pcs_fwnode)
+ break;
+
+ /* Check the next PCS reference */
+ index++;
+ }
+
+ return fwnode_pcs_get(fwnode, index);
+}
+EXPORT_SYMBOL_GPL(fwnode_phylink_pcs_get_from_fwnode);
+
static int fwnode_phylink_pcs_count(struct fwnode_handle *fwnode,
unsigned int *num_pcs)
{
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index ca4f4f655a31..4ef31bb4eb05 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -12,6 +12,7 @@
#include <linux/netdevice.h>
#include <linux/of.h>
#include <linux/of_mdio.h>
+#include <linux/pcs/pcs.h>
#include <linux/phy.h>
#include <linux/phy_fixed.h>
#include <linux/phylink.h>
@@ -62,6 +63,7 @@ struct phylink {
/* List of available PCS */
struct list_head pcs_list;
+ struct notifier_block fwnode_pcs_nb;
/* What interface are supported by the current link.
* Can change on removal or addition of new PCS.
@@ -1960,6 +1962,51 @@ int phylink_set_fixed_link(struct phylink *pl,
}
EXPORT_SYMBOL_GPL(phylink_set_fixed_link);
+static int pcs_provider_notify(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct phylink *pl = container_of(self, struct phylink, fwnode_pcs_nb);
+ struct fwnode_handle *pcs_fwnode = data;
+ struct phylink_pcs *pcs;
+
+ /* Check if the just added PCS provider is
+ * in the phylink instance pcs-handle property
+ */
+ pcs = fwnode_phylink_pcs_get_from_fwnode(dev_fwnode(pl->config->dev),
+ pcs_fwnode);
+ if (IS_ERR(pcs))
+ return NOTIFY_DONE;
+
+ /* Add the PCS */
+ rtnl_lock();
+
+ list_add(&pcs->list, &pl->pcs_list);
+
+ /* Link phylink if we are started */
+ if (!pl->phylink_disable_state)
+ pcs->phylink = pl;
+
+ /* Refresh supported interfaces */
+ phy_interface_copy(pl->supported_interfaces,
+ pl->config->supported_interfaces);
+ list_for_each_entry(pcs, &pl->pcs_list, list)
+ phy_interface_or(pl->supported_interfaces,
+ pl->supported_interfaces,
+ pcs->supported_interfaces);
+
+ mutex_lock(&pl->state_mutex);
+ /* Force an interface reconfig if major config fail */
+ if (pl->major_config_failed)
+ pl->force_major_config = true;
+ mutex_unlock(&pl->state_mutex);
+
+ rtnl_unlock();
+
+ phylink_run_resolve(pl);
+
+ return NOTIFY_OK;
+}
+
/**
* phylink_create() - create a phylink instance
* @config: a pointer to the target &struct phylink_config
@@ -2015,6 +2062,11 @@ struct phylink *phylink_create(struct phylink_config *config,
pl->supported_interfaces,
pcs->supported_interfaces);
+ if (!phy_interface_empty(config->pcs_interfaces)) {
+ pl->fwnode_pcs_nb.notifier_call = pcs_provider_notify;
+ register_fwnode_pcs_notifier(&pl->fwnode_pcs_nb);
+ }
+
pl->config = config;
if (config->type == PHYLINK_NETDEV) {
pl->netdev = to_net_dev(config->dev);
diff --git a/include/linux/pcs/pcs.h b/include/linux/pcs/pcs.h
index 33244e3a442b..dfd3dc0f86f6 100644
--- a/include/linux/pcs/pcs.h
+++ b/include/linux/pcs/pcs.h
@@ -4,7 +4,24 @@
#include <linux/phylink.h>
+enum fwnode_pcs_notify_event {
+ FWNODE_PCS_PROVIDER_ADD,
+};
+
#if IS_ENABLED(CONFIG_FWNODE_PCS)
+/**
+ * register_fwnode_pcs_notifier - Register a notifier block for fwnode
+ * PCS events
+ * @nb: pointer to the notifier block
+ *
+ * Registers a notifier block to the fwnode_pcs_notify_list blocking
+ * notifier chain. This allows phylink instance to subscribe for
+ * PCS provider events.
+ *
+ * Returns: 0 or a negative error.
+ */
+int register_fwnode_pcs_notifier(struct notifier_block *nb);
+
/**
* fwnode_pcs_get - Retrieves a PCS from a firmware node
* @fwnode: firmware node
@@ -20,6 +37,25 @@
struct phylink_pcs *fwnode_pcs_get(struct fwnode_handle *fwnode,
int index);
+/**
+ * fwnode_phylink_pcs_get_from_fwnode - Retrieves the PCS provided
+ * by the firmware node from a
+ * firmware node
+ * @fwnode: firmware node
+ * @pcs_fwnode: PCS firmware node
+ *
+ * Parse 'pcs-handle' in 'fwnode' and get the PCS that match
+ * 'pcs_fwnode' firmware node.
+ *
+ * Returns: a pointer to the phylink_pcs or a negative
+ * error pointer. Can return -EPROBE_DEFER if the PCS is not
+ * present in global providers list (either due to driver
+ * still needs to be probed or it failed to probe/removed)
+ */
+struct phylink_pcs *
+fwnode_phylink_pcs_get_from_fwnode(struct fwnode_handle *fwnode,
+ struct fwnode_handle *pcs_fwnode);
+
/**
* fwnode_phylink_pcs_parse - generic PCS parse for fwnode PCS provider
* @fwnode: firmware node
@@ -39,12 +75,24 @@ int fwnode_phylink_pcs_parse(struct fwnode_handle *fwnode,
struct phylink_pcs **available_pcs,
unsigned int *num_pcs);
#else
+static inline int register_fwnode_pcs_notifier(struct notifier_block *nb)
+{
+ return -EOPNOTSUPP;
+}
+
static inline struct phylink_pcs *fwnode_pcs_get(struct fwnode_handle *fwnode,
int index)
{
return ERR_PTR(-ENOENT);
}
+static inline struct phylink_pcs *
+fwnode_phylink_pcs_get_from_fwnode(struct fwnode_handle *fwnode,
+ struct fwnode_handle *pcs_fwnode)
+{
+ return ERR_PTR(-ENOENT);
+}
+
static inline int fwnode_phylink_pcs_parse(struct fwnode_handle *fwnode,
struct phylink_pcs **available_pcs,
unsigned int *num_pcs)
--
2.53.0
* [net-next RFC PATCH v5 04/10] net: pcs: implement Firmware node support for PCS driver
From: Christian Marangi @ 2026-05-05 18:27 UTC
In-Reply-To: <20260505182713.27644-1-ansuelsmth@gmail.com>
Implement the foundation of firmware node support for PCS drivers.
To support this, implement a simple provider API where a PCS driver can
expose multiple PCS through an xlate .get function.
A PCS driver has to call fwnode_pcs_add_provider() and pass the
firmware node pointer and an xlate function that returns the correct PCS
for the passed #pcs-cells.
This registers the PCS in a global list of providers so that
consumers can access it.
The consumer then uses fwnode_pcs_get() to get the actual PCS by
passing the firmware node pointer and the index of the pcs-handle entry.
For a simple implementation where #pcs-cells is 0 and the PCS driver
exposes a single PCS, the xlate function fwnode_pcs_simple_get() is
provided.
For an advanced implementation, a custom xlate function is required.
On removal, the PCS driver should first delete itself from the provider
list using fwnode_pcs_del_provider() and then call phylink_release_pcs()
on every PCS the driver provides.
A generic function, fwnode_phylink_pcs_parse(), is provided for MAC drivers
that declare PCS in DT (or ACPI).
This function parses the "pcs-handle" property and fills the passed
available_pcs array with the parsed PCS, up to the passed num_pcs value.
It's also possible to pass NULL as the array to only count the PCS and
update num_pcs with the number of PCS found.
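The provider/consumer split and the count-then-fill usage can be sketched with a userspace model. This is a simplification under stated assumptions, not the kernel API: integer node ids stand in for fwnode handles, -1 stands in for the errno values, there is no locking, and struct model_pcs, add_provider(), pcs_get() and pcs_parse() are hypothetical names.

```c
#include <stddef.h>

/* Userspace model; integer ids stand in for fwnode handles (assumption). */
struct model_pcs { const char *name; };

typedef struct model_pcs *(*xlate_fn)(int node, void *data);

struct provider {
	int node;
	xlate_fn get;
	void *data;
};

#define MAX_PROVIDERS 8
static struct provider providers[MAX_PROVIDERS];
static size_t num_providers;

/* Like fwnode_pcs_simple_get(): one PCS per provider, returned as-is. */
static struct model_pcs *simple_get(int node, void *data)
{
	(void)node;
	return data;
}

static int add_provider(int node, xlate_fn get, void *data)
{
	if (num_providers == MAX_PROVIDERS)
		return -1;
	providers[num_providers++] = (struct provider){ node, get, data };
	return 0;
}

/* Resolve entry 'index' of a consumer "pcs-handle" list to a PCS. */
static struct model_pcs *pcs_get(const int *handles, size_t n, size_t index)
{
	if (index >= n)
		return NULL;
	for (size_t i = 0; i < num_providers; i++)
		if (providers[i].node == handles[index])
			return providers[i].get(handles[index], providers[i].data);
	return NULL;	/* provider not registered (yet) */
}

/* Two-pass helper: with out == NULL only count, otherwise fill *num slots. */
static int pcs_parse(const int *handles, size_t n,
		     struct model_pcs **out, size_t *num)
{
	if (!out) {
		*num = n;
		return n ? 0 : -1;
	}
	for (size_t i = 0; i < *num; i++) {
		out[i] = pcs_get(handles, n, i);
		if (!out[i])
			return -1;
	}
	return 0;
}
```

A consumer would call pcs_parse() once with a NULL array to learn the count, allocate the array, and call it again to fill it. In this model the second call fails as long as any referenced provider is not yet registered.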
Co-developed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
drivers/net/pcs/Kconfig | 6 +
drivers/net/pcs/Makefile | 1 +
drivers/net/pcs/pcs.c | 201 +++++++++++++++++++++++++++++++
include/linux/pcs/pcs-provider.h | 41 +++++++
include/linux/pcs/pcs.h | 56 +++++++++
5 files changed, 305 insertions(+)
create mode 100644 drivers/net/pcs/pcs.c
create mode 100644 include/linux/pcs/pcs-provider.h
create mode 100644 include/linux/pcs/pcs.h
diff --git a/drivers/net/pcs/Kconfig b/drivers/net/pcs/Kconfig
index e417fd66f660..874de743b22c 100644
--- a/drivers/net/pcs/Kconfig
+++ b/drivers/net/pcs/Kconfig
@@ -5,6 +5,12 @@
menu "PCS device drivers"
+config FWNODE_PCS
+ tristate
+ depends on (ACPI || OF)
+ help
+ Firmware node PCS accessors
+
config PCS_XPCS
tristate "Synopsys DesignWare Ethernet XPCS"
select PHYLINK
diff --git a/drivers/net/pcs/Makefile b/drivers/net/pcs/Makefile
index 4f7920618b90..3005cdd89ab7 100644
--- a/drivers/net/pcs/Makefile
+++ b/drivers/net/pcs/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
# Makefile for Linux PCS drivers
+obj-$(CONFIG_FWNODE_PCS) += pcs.o
pcs_xpcs-$(CONFIG_PCS_XPCS) := pcs-xpcs.o pcs-xpcs-plat.o \
pcs-xpcs-nxp.o pcs-xpcs-wx.o
diff --git a/drivers/net/pcs/pcs.c b/drivers/net/pcs/pcs.c
new file mode 100644
index 000000000000..72f56f55d198
--- /dev/null
+++ b/drivers/net/pcs/pcs.c
@@ -0,0 +1,201 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/mutex.h>
+#include <linux/property.h>
+#include <linux/phylink.h>
+#include <linux/pcs/pcs.h>
+#include <linux/pcs/pcs-provider.h>
+
+MODULE_DESCRIPTION("PCS library");
+MODULE_AUTHOR("Christian Marangi <ansuelsmth@gmail.com>");
+MODULE_LICENSE("GPL");
+
+struct fwnode_pcs_provider {
+ struct list_head link;
+
+ struct fwnode_handle *fwnode;
+ struct phylink_pcs *(*get)(struct fwnode_reference_args *pcsspec,
+ void *data);
+
+ void *data;
+};
+
+static LIST_HEAD(fwnode_pcs_providers);
+static DEFINE_MUTEX(fwnode_pcs_mutex);
+
+struct phylink_pcs *fwnode_pcs_simple_get(struct fwnode_reference_args *pcsspec,
+ void *data)
+{
+ return data;
+}
+EXPORT_SYMBOL_GPL(fwnode_pcs_simple_get);
+
+int fwnode_pcs_add_provider(struct fwnode_handle *fwnode,
+ struct phylink_pcs *(*get)(struct fwnode_reference_args *pcsspec,
+ void *data),
+ void *data)
+{
+ struct fwnode_pcs_provider *pp;
+
+ if (!fwnode)
+ return 0;
+
+ pp = kzalloc_obj(*pp);
+ if (!pp)
+ return -ENOMEM;
+
+ pp->fwnode = fwnode_handle_get(fwnode);
+ pp->data = data;
+ pp->get = get;
+
+ mutex_lock(&fwnode_pcs_mutex);
+ list_add(&pp->link, &fwnode_pcs_providers);
+ mutex_unlock(&fwnode_pcs_mutex);
+ pr_debug("Added pcs provider from %pfwf\n", fwnode);
+
+ fwnode_dev_initialized(fwnode, true);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fwnode_pcs_add_provider);
+
+void fwnode_pcs_del_provider(struct fwnode_handle *fwnode)
+{
+ struct fwnode_pcs_provider *pp;
+
+ if (!fwnode)
+ return;
+
+ mutex_lock(&fwnode_pcs_mutex);
+ list_for_each_entry(pp, &fwnode_pcs_providers, link) {
+ if (pp->fwnode == fwnode) {
+ list_del(&pp->link);
+ fwnode_dev_initialized(pp->fwnode, false);
+ fwnode_handle_put(pp->fwnode);
+ kfree(pp);
+ break;
+ }
+ }
+ mutex_unlock(&fwnode_pcs_mutex);
+}
+EXPORT_SYMBOL_GPL(fwnode_pcs_del_provider);
+
+static int fwnode_parse_pcsspec(const struct fwnode_handle *fwnode, int index,
+ const char *name,
+ struct fwnode_reference_args *out_args)
+{
+ int ret;
+
+ if (!fwnode)
+ return -ENOENT;
+
+ if (name)
+ index = fwnode_property_match_string(fwnode, "pcs-names",
+ name);
+
+ ret = fwnode_property_get_reference_args(fwnode, "pcs-handle",
+ "#pcs-cells",
+ -1, index, out_args);
+ if (ret || (name && index < 0))
+ return ret;
+
+ return 0;
+}
+
+static struct phylink_pcs *
+fwnode_pcs_get_from_pcsspec(struct fwnode_reference_args *pcsspec)
+{
+ struct fwnode_pcs_provider *provider;
+ struct phylink_pcs *pcs = ERR_PTR(-ENODEV);
+
+ if (!pcsspec)
+ return ERR_PTR(-EINVAL);
+
+ mutex_lock(&fwnode_pcs_mutex);
+ list_for_each_entry(provider, &fwnode_pcs_providers, link) {
+ if (provider->fwnode == pcsspec->fwnode) {
+ pcs = provider->get(pcsspec, provider->data);
+ if (!IS_ERR(pcs))
+ break;
+ }
+ }
+ mutex_unlock(&fwnode_pcs_mutex);
+
+ return pcs;
+}
+
+static struct phylink_pcs *__fwnode_pcs_get(struct fwnode_handle *fwnode,
+ int index, const char *con_id)
+{
+ struct fwnode_reference_args pcsspec;
+ struct phylink_pcs *pcs;
+ int ret;
+
+ ret = fwnode_parse_pcsspec(fwnode, index, con_id, &pcsspec);
+ if (ret)
+ return ERR_PTR(ret);
+
+ pcs = fwnode_pcs_get_from_pcsspec(&pcsspec);
+ fwnode_handle_put(pcsspec.fwnode);
+
+ return pcs;
+}
+
+struct phylink_pcs *fwnode_pcs_get(struct fwnode_handle *fwnode, int index)
+{
+ return __fwnode_pcs_get(fwnode, index, NULL);
+}
+EXPORT_SYMBOL_GPL(fwnode_pcs_get);
+
+static int fwnode_phylink_pcs_count(struct fwnode_handle *fwnode,
+ unsigned int *num_pcs)
+{
+ struct fwnode_reference_args out_args;
+ int index = 0;
+ int ret;
+
+ while (true) {
+ ret = fwnode_property_get_reference_args(fwnode, "pcs-handle",
+ "#pcs-cells",
+ -1, index, &out_args);
+ /* We expect to reach an -ENOENT error while counting */
+ if (ret)
+ break;
+
+ fwnode_handle_put(out_args.fwnode);
+ index++;
+ }
+
+ /* Update num_pcs with parsed PCS */
+ *num_pcs = index;
+
+ /* Return error if we didn't find any PCS */
+ return index > 0 ? 0 : -ENOENT;
+}
+
+int fwnode_phylink_pcs_parse(struct fwnode_handle *fwnode,
+ struct phylink_pcs **available_pcs,
+ unsigned int *num_pcs)
+{
+ int i;
+
+ if (!fwnode_property_present(fwnode, "pcs-handle"))
+ return -ENODEV;
+
+ /* With available_pcs NULL, only count the PCS */
+ if (!available_pcs)
+ return fwnode_phylink_pcs_count(fwnode, num_pcs);
+
+ for (i = 0; i < *num_pcs; i++) {
+ struct phylink_pcs *pcs;
+
+ pcs = fwnode_pcs_get(fwnode, i);
+ if (IS_ERR(pcs))
+ return PTR_ERR(pcs);
+
+ available_pcs[i] = pcs;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fwnode_phylink_pcs_parse);
diff --git a/include/linux/pcs/pcs-provider.h b/include/linux/pcs/pcs-provider.h
new file mode 100644
index 000000000000..ae51c108147e
--- /dev/null
+++ b/include/linux/pcs/pcs-provider.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __LINUX_PCS_PROVIDER_H
+#define __LINUX_PCS_PROVIDER_H
+
+/**
+ * fwnode_pcs_simple_get - Simple xlate function to retrieve PCS
+ * @pcsspec: reference arguments
+ * @data: Context data (assumed assigned to the single PCS)
+ *
+ * Returns: the PCS pointed by data.
+ */
+struct phylink_pcs *fwnode_pcs_simple_get(struct fwnode_reference_args *pcsspec,
+ void *data);
+
+/**
+ * fwnode_pcs_add_provider - Registers a new PCS provider
+ * @fwnode: Firmware node
+ * @get: xlate function to retrieve the PCS
+ * @data: Context data
+ *
+ * Register a new PCS provider for the firmware node and add it
+ * to the global providers list. The get (xlate) function is
+ * called with the fwnode reference arguments parsed from the
+ * consumer "pcs-handle" property and with the context data,
+ * and returns the matching PCS. A NULL fwnode is ignored and
+ * reported as success.
+ *
+ * Returns: 0 on success or -ENOMEM on allocation failure.
+ */
+int fwnode_pcs_add_provider(struct fwnode_handle *fwnode,
+ struct phylink_pcs *(*get)(struct fwnode_reference_args *pcsspec,
+ void *data),
+ void *data);
+
+/**
+ * fwnode_pcs_del_provider - Removes a PCS provider
+ * @fwnode: Firmware node
+ */
+void fwnode_pcs_del_provider(struct fwnode_handle *fwnode);
+
+#endif /* __LINUX_PCS_PROVIDER_H */
diff --git a/include/linux/pcs/pcs.h b/include/linux/pcs/pcs.h
new file mode 100644
index 000000000000..33244e3a442b
--- /dev/null
+++ b/include/linux/pcs/pcs.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __LINUX_PCS_H
+#define __LINUX_PCS_H
+
+#include <linux/phylink.h>
+
+#if IS_ENABLED(CONFIG_FWNODE_PCS)
+/**
+ * fwnode_pcs_get - Retrieves a PCS from a firmware node
+ * @fwnode: firmware node
+ * @index: index fwnode PCS handle in firmware node
+ *
+ * Get a PCS from the firmware node at index.
+ *
+ * Returns: a pointer to the phylink_pcs or a negative
+ * error pointer. Can return -EPROBE_DEFER if the PCS is not
+ * present in global providers list (either due to driver
+ * still needs to be probed or it failed to probe/removed)
+ */
+struct phylink_pcs *fwnode_pcs_get(struct fwnode_handle *fwnode,
+ int index);
+
+/**
+ * fwnode_phylink_pcs_parse - generic PCS parse for fwnode PCS provider
+ * @fwnode: firmware node
+ * @available_pcs: pointer to preallocated array of PCS
+ * @num_pcs: where to store count of parsed PCS
+ *
+ * Generic helper function to fill available_pcs array with PCS parsed
+ * from a "pcs-handle" fwnode property defined in firmware node up to
+ * passed num_pcs.
+ *
+ * If available_pcs is NULL, num_pcs is updated with the count of the
+ * parsed PCS.
+ *
+ * Returns: 0 or a negative error.
+ */
+int fwnode_phylink_pcs_parse(struct fwnode_handle *fwnode,
+ struct phylink_pcs **available_pcs,
+ unsigned int *num_pcs);
+#else
+static inline struct phylink_pcs *fwnode_pcs_get(struct fwnode_handle *fwnode,
+ int index)
+{
+ return ERR_PTR(-ENOENT);
+}
+
+static inline int fwnode_phylink_pcs_parse(struct fwnode_handle *fwnode,
+ struct phylink_pcs **available_pcs,
+ unsigned int *num_pcs)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
+#endif /* __LINUX_PCS_H */
--
2.53.0
* [net-next RFC PATCH v5 03/10] net: phylink: add phylink_release_pcs() to externally release a PCS
From: Christian Marangi @ 2026-05-05 18:27 UTC
In-Reply-To: <20260505182713.27644-1-ansuelsmth@gmail.com>
Add phylink_release_pcs() to externally release a PCS from a phylink
instance. This can be used to handle the case where a single PCS needs to
be removed and the phylink instance needs to be refreshed.
On calling phylink_release_pcs(), the PCS is removed from the
phylink internal PCS list and the phylink supported_interfaces value is
recomputed from the remaining PCS interfaces.
A phylink resolve is also triggered to handle the PCS removal.
The force_major_config flag is set to make the phylink resolve reconfigure
the interface even if it didn't change.
This is needed to handle the special case where the PCS currently used
by phylink is removed and a major_config is needed to propagate the
configuration change. With this flag set we also force mac_config
even if the PHY link is not up for the in-band case.
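The release logic can be sketched with a userspace model. It is an illustration only: plain bitmasks stand in for the PHY interface bitmaps, an array stands in for the PCS linked list, locking and the resolve trigger are left out, and the IF_* constants and model_* types are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model; bit positions stand in for PHY interface modes. */
#define IF_SGMII	(1u << 0)
#define IF_2500BASEX	(1u << 1)
#define IF_USXGMII	(1u << 2)
#define MAX_PCS 4

struct model_pcs { unsigned int supported; };

struct model_phylink {
	unsigned int config_supported;	/* interfaces the MAC handles itself */
	unsigned int supported;		/* MAC interfaces OR-ed with all PCS */
	struct model_pcs *list[MAX_PCS];
	size_t num_pcs;
	struct model_pcs *cur;		/* PCS currently in use */
	bool force_major_config;
};

/* Recompute the supported set from the MAC config and the PCS still listed. */
static void refresh_supported(struct model_phylink *pl)
{
	pl->supported = pl->config_supported;
	for (size_t i = 0; i < pl->num_pcs; i++)
		pl->supported |= pl->list[i]->supported;
}

/* Stand-in for phylink_release_pcs(). */
static void release_pcs(struct model_phylink *pl, struct model_pcs *pcs)
{
	for (size_t i = 0; i < pl->num_pcs; i++) {
		if (pl->list[i] != pcs)
			continue;
		/* Remove the entry by moving the last one into its slot. */
		pl->list[i] = pl->list[--pl->num_pcs];
		break;
	}
	/* Dropping the active PCS forces a reconfiguration on next resolve. */
	if (pl->cur == pcs) {
		pl->cur = NULL;
		pl->force_major_config = true;
	}
	refresh_supported(pl);
}
```

Recomputing the set from scratch, rather than clearing the bits of the removed PCS, keeps an interface available when another PCS or the MAC itself still supports it.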
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
drivers/net/phy/phylink.c | 49 +++++++++++++++++++++++++++++++++++++++
include/linux/phylink.h | 2 ++
2 files changed, 51 insertions(+)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 1a261060d78e..ca4f4f655a31 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -923,6 +923,55 @@ static void phylink_resolve_an_pause(struct phylink_link_state *state)
}
}
+/**
+ * phylink_release_pcs - Removes a PCS from the phylink PCS available list
+ * @pcs: a pointer to the phylink_pcs struct to be released
+ *
+ * This function release a PCS from the phylink PCS available list if
+ * actually in use. It also refreshes the supported interfaces of the
+ * phylink instance by copying the supported interfaces from the phylink
+ * conf and merging the supported interfaces of the remaining available PCS
+ * in the list and trigger a resolve.
+ */
+void phylink_release_pcs(struct phylink_pcs *pcs)
+{
+ struct phylink *pl;
+
+ ASSERT_RTNL();
+
+ pl = pcs->phylink;
+ if (!pl)
+ return;
+
+ list_del(&pcs->list);
+ pcs->phylink = NULL;
+
+ /* Check if we are removing the PCS currently
+ * in use by phylink. If this is the case,
+ * force phylink resolve to reconfigure the interface
+ * mode and set the phylink PCS to NULL.
+ */
+ if (pl->pcs == pcs) {
+ mutex_lock(&pl->state_mutex);
+
+ pl->force_major_config = true;
+ pl->pcs = NULL;
+
+ mutex_unlock(&pl->state_mutex);
+ }
+
+ /* Refresh supported interfaces */
+ phy_interface_copy(pl->supported_interfaces,
+ pl->config->supported_interfaces);
+ list_for_each_entry(pcs, &pl->pcs_list, list)
+ phy_interface_or(pl->supported_interfaces,
+ pl->supported_interfaces,
+ pcs->supported_interfaces);
+
+ phylink_run_resolve(pl);
+}
+EXPORT_SYMBOL_GPL(phylink_release_pcs);
+
static unsigned int phylink_pcs_inband_caps(struct phylink_pcs *pcs,
phy_interface_t interface)
{
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index 9c5a43febde1..aef4a4fcf6e5 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -746,6 +746,8 @@ void phylink_disconnect_phy(struct phylink *);
int phylink_set_fixed_link(struct phylink *,
const struct phylink_link_state *);
+void phylink_release_pcs(struct phylink_pcs *pcs);
+
void phylink_mac_change(struct phylink *, bool up);
void phylink_pcs_change(struct phylink_pcs *, bool up);
--
2.53.0
* [net-next RFC PATCH v5 02/10] net: phylink: introduce internal phylink PCS handling
From: Christian Marangi @ 2026-05-05 18:27 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Lorenzo Bianconi, Heiner Kallweit, Russell King, Philipp Zabel,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
Christian Marangi, Daniel Golle, netdev, devicetree, linux-kernel,
linux-arm-kernel, linux-mediatek, llvm
In-Reply-To: <20260505182713.27644-1-ansuelsmth@gmail.com>
Introduce internal handling of PCS for phylink. This is an alternative
to .mac_select_pcs that moves the PCS selection logic entirely into
phylink by using the supported_interfaces value in the PCS struct.
A MAC should now provide an array of available PCS in phylink_config
via .available_pcs and set .num_available_pcs to the number of
elements in the array. The MAC should also define a new bitmap,
pcs_interfaces, in phylink_config to declare the interface modes for
which a dedicated PCS is required.
On phylink_create() this array is parsed and a linked list of PCS is
created from the PCS passed in phylink_config.
The supported_interfaces value in the phylink struct is also updated
with the supported_interfaces of the provided PCS.
On phylink_start() every PCS in phylink PCS list gets attached to the
phylink instance. This is done by setting the phylink value in
phylink_pcs struct to the phylink instance.
On phylink_stop(), every PCS in phylink PCS list is detached from the
phylink instance. This is done by setting the phylink value in
phylink_pcs struct to NULL.
phylink_validate_mac_and_pcs(), phylink_major_config() and
phylink_inband_caps() are updated to support this new implementation
with the PCS list stored in phylink.
They call phylink_validate_pcs_interface() on each PCS in the phylink
PCS available list until one that supports the passed interface is
found.
phylink_validate_pcs_interface() applies the same logic as
.mac_select_pcs: if no supported_interfaces value is set in the PCS
struct, every interface is assumed to be supported.
A MAC is required to either implement .mac_select_pcs or make use of
the PCS list implementation. Implementing both results in a failure
on MAC/PCS validation.
With this implementation, the phylink value in the phylink_pcs struct
is used to track, from the PCS side, when the PCS is attached to a
phylink instance. A PCS driver makes use of this information to
correctly detach from a phylink instance if needed.
The .mac_select_pcs implementation is not changed but it's expected that
every MAC driver migrates to the new implementation to later deprecate
and remove .mac_select_pcs.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
drivers/net/phy/phylink.c | 149 +++++++++++++++++++++++++++++++++-----
include/linux/phylink.h | 11 +++
2 files changed, 141 insertions(+), 19 deletions(-)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 4d59c0dd78db..1a261060d78e 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -60,6 +60,9 @@ struct phylink {
/* The link configuration settings */
struct phylink_link_state link_config;
+ /* List of available PCS */
+ struct list_head pcs_list;
+
/* What interface are supported by the current link.
* Can change on removal or addition of new PCS.
*/
@@ -154,6 +157,8 @@ static const phy_interface_t phylink_sfp_interface_preference[] = {
static DECLARE_PHY_INTERFACE_MASK(phylink_sfp_interfaces);
+static void phylink_run_resolve(struct phylink *pl);
+
/**
* phylink_set_port_modes() - set the port type modes in the ethtool mask
* @mask: ethtool link mode mask
@@ -518,22 +523,59 @@ static void phylink_validate_mask_caps(unsigned long *supported,
linkmode_and(state->advertising, state->advertising, mask);
}
+static int phylink_validate_pcs_interface(struct phylink_pcs *pcs,
+ phy_interface_t interface)
+{
+ /* If PCS define an empty supported_interfaces value, assume
+ * all interface are supported.
+ */
+ if (phy_interface_empty(pcs->supported_interfaces))
+ return 0;
+
+ /* Ensure that this PCS supports the interface mode */
+ if (!test_bit(interface, pcs->supported_interfaces))
+ return -EINVAL;
+
+ return 0;
+}
+
static int phylink_validate_mac_and_pcs(struct phylink *pl,
unsigned long *supported,
struct phylink_link_state *state)
{
- struct phylink_pcs *pcs = NULL;
unsigned long capabilities;
+ struct phylink_pcs *pcs;
+ bool pcs_found = false;
int ret;
/* Get the PCS for this interface mode */
if (pl->mac_ops->mac_select_pcs) {
+ /* Make sure either PCS internal validation or .mac_select_pcs
+ * is used. Return error if both are defined.
+ */
+ if (!list_empty(&pl->pcs_list)) {
+ phylink_err(pl, "either phylink_pcs_add() or .mac_select_pcs must be used\n");
+ return -EINVAL;
+ }
+
pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
if (IS_ERR(pcs))
return PTR_ERR(pcs);
+
+ pcs_found = !!pcs;
+ } else {
+ /* Check every assigned PCS and search for one that supports
+ * the interface.
+ */
+ list_for_each_entry(pcs, &pl->pcs_list, list) {
+ if (!phylink_validate_pcs_interface(pcs, state->interface)) {
+ pcs_found = true;
+ break;
+ }
+ }
}
- if (pcs) {
+ if (pcs_found) {
/* The PCS, if present, must be setup before phylink_create()
* has been called. If the ops is not initialised, print an
* error and backtrace rather than oopsing the kernel.
@@ -545,13 +587,10 @@ static int phylink_validate_mac_and_pcs(struct phylink *pl,
return -EINVAL;
}
- /* Ensure that this PCS supports the interface which the MAC
- * returned it for. It is an error for the MAC to return a PCS
- * that does not support the interface mode.
- */
- if (!phy_interface_empty(pcs->supported_interfaces) &&
- !test_bit(state->interface, pcs->supported_interfaces)) {
- phylink_err(pl, "MAC returned PCS which does not support %s\n",
+ /* Recheck PCS to handle legacy way for .mac_select_pcs */
+ ret = phylink_validate_pcs_interface(pcs, state->interface);
+ if (ret) {
+ phylink_err(pl, "selected PCS does not support %s\n",
phy_modes(state->interface));
return -EINVAL;
}
@@ -965,12 +1004,22 @@ static unsigned int phylink_inband_caps(struct phylink *pl,
phy_interface_t interface)
{
struct phylink_pcs *pcs;
+ bool pcs_found = false;
- if (!pl->mac_ops->mac_select_pcs)
- return 0;
+ if (pl->mac_ops->mac_select_pcs) {
+ pcs = pl->mac_ops->mac_select_pcs(pl->config,
+ interface);
+ pcs_found = !!pcs;
+ } else {
+ list_for_each_entry(pcs, &pl->pcs_list, list) {
+ if (!phylink_validate_pcs_interface(pcs, interface)) {
+ pcs_found = true;
+ break;
+ }
+ }
+ }
- pcs = pl->mac_ops->mac_select_pcs(pl->config, interface);
- if (!pcs)
+ if (!pcs_found)
return 0;
return phylink_pcs_inband_caps(pcs, interface);
@@ -1265,10 +1314,36 @@ static void phylink_major_config(struct phylink *pl, bool restart,
pl->major_config_failed = true;
return;
}
+ /* Find a PCS in available PCS list for the requested interface.
+ * This doesn't overwrite the previous .mac_select_pcs as either
+ * .mac_select_pcs or PCS list implementation are permitted.
+ *
+ * Skip searching if the MAC doesn't require a dedicated PCS for
+ * the requested interface.
+ */
+ } else if (test_bit(state->interface, pl->config->pcs_interfaces)) {
+ bool pcs_found = false;
+
+ list_for_each_entry(pcs, &pl->pcs_list, list) {
+ if (!phylink_validate_pcs_interface(pcs,
+ state->interface)) {
+ pcs_found = true;
+ break;
+ }
+ }
+
+ if (!pcs_found) {
+ phylink_err(pl,
+ "couldn't find a PCS for %s\n",
+ phy_modes(state->interface));
- pcs_changed = pl->pcs != pcs;
+ pl->major_config_failed = true;
+ return;
+ }
}
+ pcs_changed = pl->pcs != pcs;
+
phylink_pcs_neg_mode(pl, pcs, state->interface, state->advertising);
phylink_dbg(pl, "major config, active %s/%s/%s\n",
@@ -1295,11 +1370,13 @@ static void phylink_major_config(struct phylink *pl, bool restart,
if (pcs_changed) {
phylink_pcs_disable(pl->pcs);
- if (pl->pcs)
- pl->pcs->phylink = NULL;
+ if (pl->mac_ops->mac_select_pcs) {
+ if (pl->pcs)
+ pl->pcs->phylink = NULL;
- if (pcs)
- pcs->phylink = pl;
+ if (pcs)
+ pcs->phylink = pl;
+ }
pl->pcs = pcs;
}
@@ -1855,8 +1932,9 @@ struct phylink *phylink_create(struct phylink_config *config,
phy_interface_t iface,
const struct phylink_mac_ops *mac_ops)
{
+ struct phylink_pcs *pcs;
struct phylink *pl;
- int ret;
+ int i, ret;
/* Validate the supplied configuration */
if (phy_interface_empty(config->supported_interfaces)) {
@@ -1872,9 +1950,21 @@ struct phylink *phylink_create(struct phylink_config *config,
mutex_init(&pl->phydev_mutex);
mutex_init(&pl->state_mutex);
INIT_WORK(&pl->resolve, phylink_resolve);
+ INIT_LIST_HEAD(&pl->pcs_list);
+
+ /* Fill the PCS list with available PCS from phylink config */
+ for (i = 0; i < config->num_available_pcs; i++) {
+ pcs = config->available_pcs[i];
+
+ list_add(&pcs->list, &pl->pcs_list);
+ }
phy_interface_copy(pl->supported_interfaces,
config->supported_interfaces);
+ list_for_each_entry(pcs, &pl->pcs_list, list)
+ phy_interface_or(pl->supported_interfaces,
+ pl->supported_interfaces,
+ pcs->supported_interfaces);
pl->config = config;
if (config->type == PHYLINK_NETDEV) {
@@ -1953,10 +2043,16 @@ EXPORT_SYMBOL_GPL(phylink_create);
*/
void phylink_destroy(struct phylink *pl)
{
+ struct phylink_pcs *pcs, *tmp;
+
sfp_bus_del_upstream(pl->sfp_bus);
if (pl->link_gpio)
gpiod_put(pl->link_gpio);
+ /* Remove every PCS from phylink PCS list */
+ list_for_each_entry_safe(pcs, tmp, &pl->pcs_list, list)
+ list_del(&pcs->list);
+
cancel_work_sync(&pl->resolve);
kfree(pl);
}
@@ -2437,6 +2533,7 @@ static irqreturn_t phylink_link_handler(int irq, void *data)
*/
void phylink_start(struct phylink *pl)
{
+ struct phylink_pcs *pcs;
bool poll = false;
ASSERT_RTNL();
@@ -2463,6 +2560,10 @@ void phylink_start(struct phylink *pl)
pl->pcs_state = PCS_STATE_STARTED;
+ /* link available PCS to phylink struct */
+ list_for_each_entry(pcs, &pl->pcs_list, list)
+ pcs->phylink = pl;
+
phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_STOPPED);
if (pl->cfg_link_an_mode == MLO_AN_FIXED && pl->link_gpio) {
@@ -2507,6 +2608,8 @@ EXPORT_SYMBOL_GPL(phylink_start);
*/
void phylink_stop(struct phylink *pl)
{
+ struct phylink_pcs *pcs;
+
ASSERT_RTNL();
if (pl->sfp_bus)
@@ -2524,6 +2627,14 @@ void phylink_stop(struct phylink *pl)
pl->pcs_state = PCS_STATE_DOWN;
phylink_pcs_disable(pl->pcs);
+
+ /* Drop link between phylink and PCS */
+ list_for_each_entry(pcs, &pl->pcs_list, list)
+ pcs->phylink = NULL;
+
+ /* Restore original supported interfaces */
+ phy_interface_copy(pl->supported_interfaces,
+ pl->config->supported_interfaces);
}
EXPORT_SYMBOL_GPL(phylink_stop);
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index 2bc0db3d52ac..9c5a43febde1 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -151,6 +151,8 @@ enum phylink_op_type {
* if MAC link is at %MLO_AN_FIXED mode.
* @supported_interfaces: bitmap describing which PHY_INTERFACE_MODE_xxx
* are supported by the MAC/PCS.
+ * @pcs_interfaces: bitmap describing for which PHY_INTERFACE_MODE_xxx a
+ * dedicated PCS is required.
* @lpi_interfaces: bitmap describing which PHY interface modes can support
* LPI signalling.
* @mac_capabilities: MAC pause/speed/duplex capabilities.
@@ -160,6 +162,8 @@ enum phylink_op_type {
* @wol_phy_legacy: Use Wake-on-Lan with PHY even if phy_can_wakeup() is false
* @wol_phy_speed_ctrl: Use phy speed control on suspend/resume
* @wol_mac_support: Bitmask of MAC supported %WAKE_* options
+ * @available_pcs: array of available phylink_pcs PCS
+ * @num_available_pcs: num of available phylink_pcs PCS
*/
struct phylink_config {
struct device *dev;
@@ -172,6 +176,7 @@ struct phylink_config {
void (*get_fixed_state)(struct phylink_config *config,
struct phylink_link_state *state);
DECLARE_PHY_INTERFACE_MASK(supported_interfaces);
+ DECLARE_PHY_INTERFACE_MASK(pcs_interfaces);
DECLARE_PHY_INTERFACE_MASK(lpi_interfaces);
unsigned long mac_capabilities;
unsigned long lpi_capabilities;
@@ -182,6 +187,9 @@ struct phylink_config {
bool wol_phy_legacy;
bool wol_phy_speed_ctrl;
u32 wol_mac_support;
+
+ struct phylink_pcs **available_pcs;
+ unsigned int num_available_pcs;
};
void phylink_limit_mac_speed(struct phylink_config *config, u32 max_speed);
@@ -497,6 +505,9 @@ struct phylink_pcs {
struct phylink *phylink;
bool poll;
bool rxc_always_on;
+
+ /* private: */
+ struct list_head list;
};
/**
--
2.53.0
* [net-next RFC PATCH v5 01/10] net: phylink: keep and use MAC supported_interfaces in phylink struct
From: Christian Marangi @ 2026-05-05 18:27 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Lorenzo Bianconi, Heiner Kallweit, Russell King, Philipp Zabel,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
Christian Marangi, Daniel Golle, netdev, devicetree, linux-kernel,
linux-arm-kernel, linux-mediatek, llvm
In-Reply-To: <20260505182713.27644-1-ansuelsmth@gmail.com>
Add a copy of supported_interfaces from phylink_config to the phylink
struct and use it instead of relying on the phylink_config value.
This is in preparation for handling PCS internally in phylink, where a
PCS can be removed or added after the phylink is created. We need both
a reference to the supported_interfaces value from phylink_config and
an internal value that can be updated with the new PCS info.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
drivers/net/phy/phylink.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 087ac63f9193..4d59c0dd78db 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -60,6 +60,11 @@ struct phylink {
/* The link configuration settings */
struct phylink_link_state link_config;
+ /* What interface are supported by the current link.
+ * Can change on removal or addition of new PCS.
+ */
+ DECLARE_PHY_INTERFACE_MASK(supported_interfaces);
+
/* The current settings */
phy_interface_t cur_interface;
@@ -629,7 +634,7 @@ static int phylink_validate_mask(struct phylink *pl, struct phy_device *phy,
static int phylink_validate(struct phylink *pl, unsigned long *supported,
struct phylink_link_state *state)
{
- const unsigned long *interfaces = pl->config->supported_interfaces;
+ const unsigned long *interfaces = pl->supported_interfaces;
if (state->interface == PHY_INTERFACE_MODE_NA)
return phylink_validate_mask(pl, NULL, supported, state,
@@ -1868,6 +1873,9 @@ struct phylink *phylink_create(struct phylink_config *config,
mutex_init(&pl->state_mutex);
INIT_WORK(&pl->resolve, phylink_resolve);
+ phy_interface_copy(pl->supported_interfaces,
+ config->supported_interfaces);
+
pl->config = config;
if (config->type == PHYLINK_NETDEV) {
pl->netdev = to_net_dev(config->dev);
@@ -2026,7 +2034,7 @@ static int phylink_validate_phy(struct phylink *pl, struct phy_device *phy,
* those which the host supports.
*/
phy_interface_and(interfaces, phy->possible_interfaces,
- pl->config->supported_interfaces);
+ pl->supported_interfaces);
if (phy_interface_empty(interfaces)) {
phylink_err(pl, "PHY has no common interfaces\n");
@@ -2828,12 +2836,12 @@ static phy_interface_t phylink_sfp_select_interface(struct phylink *pl,
return interface;
}
- if (!test_bit(interface, pl->config->supported_interfaces)) {
+ if (!test_bit(interface, pl->supported_interfaces)) {
phylink_err(pl,
"selection of interface failed, SFP selected %s (%u) but MAC supports %*pbl\n",
phy_modes(interface), interface,
(int)PHY_INTERFACE_MODE_MAX,
- pl->config->supported_interfaces);
+ pl->supported_interfaces);
return PHY_INTERFACE_MODE_NA;
}
@@ -3761,14 +3769,14 @@ static int phylink_sfp_config_optical(struct phylink *pl)
phylink_dbg(pl, "optical SFP: interfaces=[mac=%*pbl, sfp=%*pbl]\n",
(int)PHY_INTERFACE_MODE_MAX,
- pl->config->supported_interfaces,
+ pl->supported_interfaces,
(int)PHY_INTERFACE_MODE_MAX,
pl->sfp_interfaces);
/* Find the union of the supported interfaces by the PCS/MAC and
* the SFP module.
*/
- phy_interface_and(pl->sfp_interfaces, pl->config->supported_interfaces,
+ phy_interface_and(pl->sfp_interfaces, pl->supported_interfaces,
pl->sfp_interfaces);
if (phy_interface_empty(pl->sfp_interfaces)) {
phylink_err(pl, "unsupported SFP module: no common interface modes\n");
@@ -3939,7 +3947,7 @@ static int phylink_sfp_connect_phy(void *upstream, struct phy_device *phy)
/* Set the PHY's host supported interfaces */
phy_interface_and(phy->host_interfaces, phylink_sfp_interfaces,
- pl->config->supported_interfaces);
+ pl->supported_interfaces);
/* Do the initial configuration */
return phylink_sfp_config_phy(pl, phy);
--
2.53.0
* [net-next RFC PATCH v5 00/10] net: pcs: Introduce support for fwnode PCS
From: Christian Marangi @ 2026-05-05 18:27 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Lorenzo Bianconi, Heiner Kallweit, Russell King, Philipp Zabel,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
Christian Marangi, Daniel Golle, netdev, devicetree, linux-kernel,
linux-arm-kernel, linux-mediatek, llvm
This series introduces a long-awaited feature: correctly providing a
PCS through fwnode without having to use driver-specific exported
symbols, plus additional handling of PCS in phylink.
For a while there were 2 different implementations (this one and the
one from Sean), but Sean agreed that this one can be picked in favor
of his as long as his race condition case is correctly handled.
---
First the PCS fwnode:
The concept is to implement a producer-consumer API similar to other
subsystems like clock or PHY.
That seems to be the best solution to the problem, as the PCS driver
needs to be detached from phylink and needs a simple way to provide a
PCS while maintaining support for probe defer and driver removal.
To keep the implementation simple, PCS driver developers need to
cooperate to implement this correctly. This is O.K. as helpers are
provided, hence it's really a matter of following a pattern to
correctly handle removal of a PCS driver.
A PCS provider has to call fwnode_pcs_add_provider() in its probe
function and define an xlate function that defines how the PCS
should be provided based on the requested interface and the phandle
spec defined in fwnode (based on #pcs-cells).
fwnode_pcs_get() is provided to get a specific PCS declared in
fwnode at a given index.
A simple xlate function, fwnode_pcs_simple_get, is provided for
simple single-PCS implementations.
On driver removal, a PCS provider should first call
fwnode_pcs_del_provider() to delete itself as a provider and then
release the PCS from phylink with phylink_release_pcs() under the
rtnl lock.
---
Second PCS handling in phylink:
We have the PCS problem only because the initial implementation
permitted way too much flexibility to MAC drivers and things started
to deviate. At the time nobody expected SoCs to start putting the PCS
outside the MAC, hence it was OK to assume they would live in the
same driver. With the introduction of 10G in more consumer devices,
we are observing a rapid growth of this pattern with multiple PCS
external to the MAC.
To put a stop to this, the only solution is to give control of PCS
handling back to phylink and enforce a more robust supported
interface definition from both the MAC and PCS side.
It's suggested to read patch 0003 of this series for more info; here
is a brief explanation of the idea:
This series introduces handling of PCS in phylink and tries to
deprecate .mac_select_pcs.
Phylink may now contain a linked list of available PCS, which is
used for PCS selection in phylink_major_config.
The MAC driver needs to define the pcs_interfaces mask in
phylink_config for every interface that needs a dedicated PCS.
These PCS need to be provided to phylink at phylink_create time
by setting available_pcs and num_available_pcs in phylink_config.
A helper to parse PCS from fwnode, fwnode_phylink_pcs_parse(), is
provided; it fills a preallocated array of PCS. (The same function
can be used to get the number of PCS defined in DT, more info in
patch 0005.)
phylink_create() fills the internal PCS list with the passed array
of PCS. phylink_major_config and the other users of .mac_select_pcs
are adapted to make use of this new PCS list.
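The selection rule described for the PCS list can be modelled outside
the kernel. The following is only a userspace sketch, not code from
the series: the demo_* names, the enum values and the plain 64-bit
mask are assumptions made for illustration, standing in for struct
phylink_pcs, PHY_INTERFACE_MODE_* and the interface bitmap. It shows
the two rules stated in patch 02: an empty supported_interfaces mask
matches every interface, and the first PCS in the list that supports
the requested interface is picked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PHY_INTERFACE_MODE_* values. */
enum demo_iface {
	DEMO_SGMII,
	DEMO_1000BASEX,
	DEMO_2500BASEX,
	DEMO_USXGMII,
	DEMO_10GBASER,
};

/* Toy PCS: one bit per supported interface mode, 0 means "no mask set". */
struct demo_pcs {
	const char *name;
	uint64_t supported_interfaces;
};

/* An empty mask is treated as "every interface is supported". */
static bool demo_pcs_supports(const struct demo_pcs *pcs,
			      enum demo_iface iface)
{
	if (pcs->supported_interfaces == 0)
		return true;

	return (pcs->supported_interfaces & (UINT64_C(1) << iface)) != 0;
}

/* Return the first PCS in the array that supports iface, or NULL. */
static const struct demo_pcs *demo_select_pcs(const struct demo_pcs *list,
					      size_t n,
					      enum demo_iface iface)
{
	for (size_t i = 0; i < n; i++) {
		if (demo_pcs_supports(&list[i], iface))
			return &list[i];
	}

	return NULL;
}
```

In this model a NULL return corresponds to the "couldn't find a PCS"
error path; the real code additionally only searches when the
interface is set in pcs_interfaces.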
The supported interface value is also moved internally to the phylink
struct. This is to handle late removal and addition of PCS.
(A bonus effect is that phylink gets a clear idea of what is actually
supported by the MAC and its constraints with PCS.)
The supported interface mask in phylink is built by ORing the
supported_interfaces in phylink_config with those of every PCS in
the PCS list.
PCS removal is supported by forcing a major config, refreshing the
supported interfaces and running a phylink resolve.
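The mask handling on PCS removal can be illustrated with a small
userspace model. This is a sketch under stated assumptions, not the
kernel implementation: the demo_* names and the fixed-size array are
hypothetical and replace the linked list and interface bitmaps, and
the forced major config and resolve steps are left out. It only shows
that the merged mask is rebuilt from the MAC mask plus every
remaining PCS each time a PCS goes away.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PCS 4

/* Toy phylink instance, hypothetical layout for illustration only. */
struct demo_phylink {
	/* models supported_interfaces in phylink_config */
	uint64_t config_interfaces;
	/* models supported_interfaces of each PCS in the list */
	uint64_t pcs_interfaces[DEMO_MAX_PCS];
	size_t num_pcs;
	/* models the merged copy kept in the phylink struct */
	uint64_t supported_interfaces;
};

/* Rebuild the merged mask: start from the MAC mask, OR in every PCS. */
static void demo_refresh_supported(struct demo_phylink *pl)
{
	pl->supported_interfaces = pl->config_interfaces;
	for (size_t i = 0; i < pl->num_pcs; i++)
		pl->supported_interfaces |= pl->pcs_interfaces[i];
}

/* Drop the PCS at idx and refresh; 0 on success, -1 on a bad index. */
static int demo_release_pcs(struct demo_phylink *pl, size_t idx)
{
	if (idx >= pl->num_pcs)
		return -1;

	/* Close the gap left by the removed entry. */
	for (size_t i = idx + 1; i < pl->num_pcs; i++)
		pl->pcs_interfaces[i - 1] = pl->pcs_interfaces[i];
	pl->num_pcs--;

	demo_refresh_supported(pl);
	return 0;
}
```

Rebuilding from the config mask instead of clearing bits keeps the
interfaces the MAC supports on its own, even when the removed PCS
advertised the same modes.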
PCS late addition is supported by introducing a global notifier
for PCS providers. If a phylink has a non-zero pcs_interfaces mask,
it is registered to this notifier.
A PCS provider emits a global PCS add event to signal to any
interface that a new PCS might be available.
The notified phylink then checks whether the PCS is related to the
MAC fwnode and adds it accordingly.
A user for this new implementation is provided as an Airoha PCS
driver. This was also tested downstream on the QCOM IPQ95xx SoC
and, with the help of Daniel, on the various Mediatek MT7988
SoCs with both SFP cage implementations and DSA attached.
Lots of tests were done with driver unbind/bind and interface
up/down, also adding prints to make sure major_config_failed gets
correctly triggered and reset once the PCS comes back.
The dedicated commits have longer descriptions of the implementation,
so it's suggested to also check there for additional info.
It's worth mentioning that OpenWrt is currently using this on
Mediatek SoCs, and QCOM ipq807x/ipq60xx/ipq50xx and Airoha are
already ported in the staging tree for testing.
---
Changes v5:
- Rebase on top of net-next
- Use the new force_major_config
- Reword some comments and commit description
- Return -ENODEV instead of -EPROBE_DEFER to prevent race condition
- Drop phy_interface_copy patch (Russell pushed an equivalent version)
Changes v4:
- Move patch 0002 phy_interface_copy to 0002 (fix bisectability
problem)
- Address review from Lorenzo for Airoha ethernet driver
- Fix kdoc error with missing Return (actually missing : before Return)
- Fix UNMET dependency reported error for CONFIG_FWNODE_PCS
- Revert to pcs.c instead of core.c (due to name conflict with other kmod)
- Fix clang compilation error for Airoha PCS driver
- Add missing inline function to pcs.h function
Changes v3:
- Out of RFC
- Fix various spelling mistakes
- Drop circular dependency patch
- Complete Airoha Ethernet phylink integration
- Introduce .pcs_link_down PCS OP
Changes v2:
- Switch to fwnode
- Implement PCS provider notifier
- Better split changes
- Move supported_interfaces to phylink
- Add circular dependency patch
- Rework handling with indirect addition/removal and
trigger of phylink_resolve()
Christian Marangi (10):
net: phylink: keep and use MAC supported_interfaces in phylink struct
net: phylink: introduce internal phylink PCS handling
net: phylink: add phylink_release_pcs() to externally release a PCS
net: pcs: implement Firmware node support for PCS driver
net: phylink: support late PCS provider attach
dt-bindings: net: ethernet-controller: permit to define multiple PCS
net: phylink: add .pcs_link_down PCS OP
dt-bindings: net: pcs: Document support for Airoha Ethernet PCS
net: pcs: airoha: add PCS driver for Airoha SoC
net: airoha: add phylink support for GDM2/3/4
.../bindings/net/ethernet-controller.yaml | 2 -
.../bindings/net/pcs/airoha,pcs.yaml | 112 +
drivers/net/ethernet/airoha/Kconfig | 1 +
drivers/net/ethernet/airoha/airoha_eth.c | 144 +-
drivers/net/ethernet/airoha/airoha_eth.h | 3 +
drivers/net/ethernet/airoha/airoha_regs.h | 12 +
drivers/net/pcs/Kconfig | 13 +
drivers/net/pcs/Makefile | 2 +
drivers/net/pcs/pcs-airoha.c | 2921 +++++++++++++++++
drivers/net/pcs/pcs.c | 241 ++
drivers/net/phy/phylink.c | 280 +-
include/linux/pcs/pcs-provider.h | 41 +
include/linux/pcs/pcs.h | 104 +
include/linux/phylink.h | 15 +
14 files changed, 3862 insertions(+), 29 deletions(-)
create mode 100644 Documentation/devicetree/bindings/net/pcs/airoha,pcs.yaml
create mode 100644 drivers/net/pcs/pcs-airoha.c
create mode 100644 drivers/net/pcs/pcs.c
create mode 100644 include/linux/pcs/pcs-provider.h
create mode 100644 include/linux/pcs/pcs.h
--
2.53.0
* Re: [PATCH net v2 2/2] selftests/net: add packetdrill test for locked SO_RCVBUF SWS
From: Ankit Jain @ 2026-05-05 18:23 UTC (permalink / raw)
To: edumazet
Cc: kuba, netdev, davem, pabeni, ncardwell, kuniyu, horms, shuah,
quic_subashab, quic_stranche, linux-kselftest, linux-kernel,
karen.badiryan, ajay.kaher, alexey.makhalov,
vamsi-krishna.brahmajosyula, yin.ding, tapas.kundu
In-Reply-To: <CANn89iK_SrGx4UuN-_h8LY0DFBwf8+hStba7vdjFnEogvo=N7A@mail.gmail.com>
Thanks for the review.
> I do not see the SWS effect you want to avoid in the first place.
>
> This test is an ad-hoc test about your change, but I still do not see
> why recomputing tp->ratio every time the rcvmss is increased is an
> issue on loopback interface.
For the packetdrill script, it is taking more time. To actually show
the window dropping to 0, I have to write a long script with many
packets and application reads. This is because TCP does not shrink the
right edge of an already open window.
Since the C-code fix in Patch 1 is tested and working fine, should I
send v3 with just the code fix first? I can work on the packetdrill
script and send it later in a separate patch. Or should I wait and
send both together?
Kindly suggest how I should proceed.
Thanks,
Ankit
* Re: [PATCH net v2 1/2] tcp: protect locked SO_RCVBUF from Silly Window Syndrome
From: Ankit Jain @ 2026-05-05 18:19 UTC (permalink / raw)
To: edumazet
Cc: kuba, netdev, davem, pabeni, ncardwell, kuniyu, horms, shuah,
quic_subashab, quic_stranche, linux-kselftest, linux-kernel,
karen.badiryan, ajay.kaher, alexey.makhalov,
vamsi-krishna.brahmajosyula, yin.ding, tapas.kundu
In-Reply-To: <CANn89iKcaOK6rrY=m0SVfXdx3jTRXCE3T+jNQpw434UZ8oxtpw@mail.gmail.com>
Hi Eric,
Thanks for the review and suggestion.
> Testing tp->advmss is not doing what you want I think.
>
> A remote peer can send GRO packets with tiny segments, regardless of
> tp->advmss
>
> If GRO is what you are looking for, why not testing (skb->len > len) ?
I tested your suggested `skb->len > len` logic on our reproduction
setup. It works perfectly and the 504 timeouts are completely resolved.
Thanks,
Ankit
* Re: [PATCH net-next v5 3/3] gve: implement PTP gettimex64
From: Jordan Rhee @ 2026-05-05 18:22 UTC (permalink / raw)
To: Paolo Abeni
Cc: Harshitha Ramamurthy, netdev, joshwash, andrew+netdev, davem,
edumazet, kuba, richardcochran, jstultz, tglx, sboyd, willemb,
nktgrg, jfraker, ziweixiao, maolson, thostet, alok.a.tiwari,
pkaligineedi, horms, dwmw2, jacob.e.keller, yyd, linux-kernel,
Naman Gulati
In-Reply-To: <6259cf59-b054-42bb-9dad-5f62a54a2bc0@redhat.com>
On Tue, May 5, 2026 at 1:08 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 4/29/26 3:28 AM, Harshitha Ramamurthy wrote:
> > From: Jordan Rhee <jordanrhee@google.com>
> >
> > Enable chrony and phc2sys to synchronize system clock to NIC clock.
> >
> > The system cycle counters are sampled by the device to minimize the
> > uncertainty window. If the system times are sampled in the host, the
> > delta between pre and post readings is 100us or more due to AQ command
> > latency. The system times returned by the device have a delta of ~1us,
> > which enables significantly more accurate clock synchronization.
> >
> > Reviewed-by: Willem de Bruijn <willemb@google.com>
> > Reviewed-by: Kevin Yang <yyd@google.com>
> > Reviewed-by: Naman Gulati <namangulati@google.com>
> > Signed-off-by: Jordan Rhee <jordanrhee@google.com>
> > Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
> > ---
> > Changes in v5:
> > - Reformulate retry loop in terms of total timeout (Jakub Kicinski)
> >
> > Changes in v3:
> > - Take system time snapshot inside the mutex
> > - Return -EOPNOTSUPP if cross-timestamp is requested on an arch other
> > than x86 or arm64
> >
> > Changes in v2:
> > - fix compilation warning on ARM by casting cycles_t to u64
> > ---
> > drivers/net/ethernet/google/gve/gve_adminq.h | 4 +-
> > drivers/net/ethernet/google/gve/gve_ptp.c | 196 ++++++++++++++++++-
> > 2 files changed, 191 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
> > index 22a74b6aa17e..e6dcf6da9091 100644
> > --- a/drivers/net/ethernet/google/gve/gve_adminq.h
> > +++ b/drivers/net/ethernet/google/gve/gve_adminq.h
> > @@ -411,8 +411,8 @@ static_assert(sizeof(struct gve_adminq_report_nic_ts) == 16);
> >
> > struct gve_nic_ts_report {
> > __be64 nic_timestamp; /* NIC clock in nanoseconds */
> > - __be64 reserved1;
> > - __be64 reserved2;
> > + __be64 pre_cycles; /* System cycle counter before NIC clock read */
> > + __be64 post_cycles; /* System cycle counter after NIC clock read */
> > __be64 reserved3;
> > __be64 reserved4;
> > };
> > diff --git a/drivers/net/ethernet/google/gve/gve_ptp.c b/drivers/net/ethernet/google/gve/gve_ptp.c
> > index ad15f1209a83..c6c98ef825aa 100644
> > --- a/drivers/net/ethernet/google/gve/gve_ptp.c
> > +++ b/drivers/net/ethernet/google/gve/gve_ptp.c
> > @@ -10,28 +10,210 @@
> > /* Interval to schedule a nic timestamp calibration, 250ms. */
> > #define GVE_NIC_TS_SYNC_INTERVAL_MS 250
> >
> > +/*
> > + * Stores cycle counter samples in get_cycles() units from a
> > + * sandwiched NIC clock read
> > + */
> > +struct gve_sysclock_sample {
> > + /* system time snapshot taken just before issuing AdminQ command */
> > + struct system_time_snapshot snapshot;
> > + /* Cycle counter from NIC before clock read */
> > + u64 nic_pre_cycles;
> > + /* Cycle counter from NIC after clock read */
> > + u64 nic_post_cycles;
> > + /* Cycle counter from host before issuing AQ command */
> > + cycles_t host_pre_cycles;
> > + /* Cycle counter from host after AQ command returns */
> > + cycles_t host_post_cycles;
> > +};
> > +
> > +/*
> > + * Read NIC clock by issuing the AQ command. The command is subject to
> > + * rate limiting and may need to be retried. Requires nic_ts_read_lock
> > + * to be held.
> > + */
> > +static int gve_ptp_read_timestamp(struct gve_ptp *ptp, cycles_t *pre_cycles,
> > + cycles_t *post_cycles,
> > + struct system_time_snapshot *snap)
> > +{
> > + unsigned long deadline = jiffies + msecs_to_jiffies(100);
> > + unsigned long delay_us = 1000;
> > + int err;
> > +
> > + lockdep_assert_held(&ptp->nic_ts_read_lock);
> > +
> > + do {
> > + if (snap)
> > + ktime_get_snapshot(snap);
> > +
> > + *pre_cycles = get_cycles();
> > + err = gve_adminq_report_nic_ts(ptp->priv,
> > + ptp->nic_ts_report_bus);
> > +
> > + /* Prevent get_cycles() from being speculatively executed
> > + * before the AdminQ command
> > + */
> > + rmb();
> > + *post_cycles = get_cycles();
> > + if (likely(err != -EAGAIN))
> > + return err;
> > +
> > + fsleep(delay_us);
> > +
> > + /* Exponential backoff */
> > + delay_us *= 2;
> > + } while (time_before(jiffies, deadline));
> > +
> > + return -ETIMEDOUT;
> > +}
> > +
> > /* Read the nic timestamp from hardware via the admin queue. */
> > -static int gve_clock_nic_ts_read(struct gve_ptp *ptp, u64 *nic_raw)
> > +static int gve_clock_nic_ts_read(struct gve_ptp *ptp, u64 *nic_raw,
> > + struct gve_sysclock_sample *sysclock)
> > {
> > + cycles_t host_pre_cycles, host_post_cycles;
> > + struct gve_nic_ts_report *ts_report;
> > int err;
> >
> > mutex_lock(&ptp->nic_ts_read_lock);
> > - err = gve_adminq_report_nic_ts(ptp->priv, ptp->nic_ts_report_bus);
> > - if (err)
> > + err = gve_ptp_read_timestamp(ptp, &host_pre_cycles, &host_post_cycles,
> > + sysclock ? &sysclock->snapshot : NULL);
> > + if (err) {
> > + dev_err_ratelimited(&ptp->priv->pdev->dev,
> > + "AdminQ timestamp read failed: %d\n", err);
> > goto out;
> > + }
> >
> > - *nic_raw = be64_to_cpu(ptp->nic_ts_report->nic_timestamp);
> > + ts_report = ptp->nic_ts_report;
> > + *nic_raw = be64_to_cpu(ts_report->nic_timestamp);
> > +
> > + if (sysclock) {
> > + sysclock->nic_pre_cycles = be64_to_cpu(ts_report->pre_cycles);
> > + sysclock->nic_post_cycles = be64_to_cpu(ts_report->post_cycles);
> > + sysclock->host_pre_cycles = host_pre_cycles;
> > + sysclock->host_post_cycles = host_post_cycles;
> > + }
> >
> > out:
> > mutex_unlock(&ptp->nic_ts_read_lock);
> > return err;
> > }
> >
> > +struct gve_cycles_to_clock_callback_ctx {
> > + u64 cycles;
> > +};
> > +
> > +static int gve_cycles_to_clock_fn(ktime_t *device_time,
> > + struct system_counterval_t *system_counterval,
> > + void *ctx)
> > +{
> > + struct gve_cycles_to_clock_callback_ctx *context = ctx;
> > +
> > + *device_time = 0;
> > +
> > + system_counterval->cycles = context->cycles;
> > + system_counterval->use_nsecs = false;
> > +
> > + if (IS_ENABLED(CONFIG_X86))
> > + system_counterval->cs_id = CSID_X86_TSC;
> > + else if (IS_ENABLED(CONFIG_ARM64))
> > + system_counterval->cs_id = CSID_ARM_ARCH_COUNTER;
> > + else
> > + return -EOPNOTSUPP;
> > +
> > + return 0;
> > +}
> > +
> > +/*
> > + * Convert a raw cycle count (e.g. from get_cycles()) to the system clock
> > + * type specified by clockid. The system_time_snapshot must be taken before
> > + * the cycle counter is sampled.
> > + */
> > +static int gve_cycles_to_timespec64(struct gve_priv *priv, clockid_t clockid,
> > + struct system_time_snapshot *snap,
> > + u64 cycles, struct timespec64 *ts)
> > +{
> > + struct gve_cycles_to_clock_callback_ctx ctx = {0};
> > + struct system_device_crosststamp xtstamp;
> > + int err;
> > +
> > + ctx.cycles = cycles;
> > + err = get_device_system_crosststamp(gve_cycles_to_clock_fn, &ctx, snap,
> > + &xtstamp);
> > + if (err) {
> > + dev_err_ratelimited(&priv->pdev->dev,
> > + "get_device_system_crosststamp() failed to convert %lld cycles to system time: %d\n",
> > + cycles,
> > + err);
> > + return err;
> > + }
> > +
> > + switch (clockid) {
> > + case CLOCK_REALTIME:
> > + *ts = ktime_to_timespec64(xtstamp.sys_realtime);
> > + break;
> > + case CLOCK_MONOTONIC_RAW:
> > + *ts = ktime_to_timespec64(xtstamp.sys_monoraw);
> > + break;
> > + default:
> > + dev_err_ratelimited(&priv->pdev->dev,
> > + "Cycle count conversion to clockid %d not supported\n",
> > + clockid);
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static int gve_ptp_gettimex64(struct ptp_clock_info *info,
> > struct timespec64 *ts,
> > struct ptp_system_timestamp *sts)
> > {
> > - return -EOPNOTSUPP;
> > + struct gve_ptp *ptp = container_of(info, struct gve_ptp, info);
> > + struct gve_sysclock_sample sysclock = {0};
> > + struct gve_priv *priv = ptp->priv;
> > + u64 nic_ts;
> > + int err;
> > +
> > + if (sts && !(IS_ENABLED(CONFIG_X86) || IS_ENABLED(CONFIG_ARM64)))
> > + return -EOPNOTSUPP;
> > +
> > + err = gve_clock_nic_ts_read(ptp, &nic_ts, sts ? &sysclock : NULL);
> > + if (err)
> > + return err;
> > +
> > + if (sts) {
> > + /* Reject samples with out of order system clock values */
> > + if (!(sysclock.host_pre_cycles <= sysclock.nic_pre_cycles &&
> > + sysclock.nic_pre_cycles <= sysclock.nic_post_cycles &&
> > + sysclock.nic_post_cycles <= sysclock.host_post_cycles)) {
> > + dev_err_ratelimited(&priv->pdev->dev,
> > + "AdminQ system clock cycle counts out of order. Expecting %llu <= %llu <= %llu <= %llu\n",
> > + (u64)sysclock.host_pre_cycles,
> > + sysclock.nic_pre_cycles,
> > + sysclock.nic_post_cycles,
> > + (u64)sysclock.host_post_cycles);
> > + return -EBADMSG;
>
> Sashiko/gemini is reporting the following:
>
> ---
> If older firmware does not support this feature and returns 0 for the
> pre_cycles and post_cycles fields, won't host_pre_cycles <=
> nic_pre_cycles evaluate to false since get_cycles() will be greater than
> 0? If this occurs, the driver returns -EBADMSG instead of -EOPNOTSUPP.
> Does returning -EBADMSG prevent userspace tools from gracefully falling
> back to legacy PTP ioctls like PTP_SYS_OFFSET, causing PTP
> synchronization to fail completely on older firmwares?
> ---
>
> which looks legit to me, or am I missing something? Note that
> proactively triaging sashiko comments would help maintainers a lot.
>
> Thanks,
>
> Paolo
>
Hi Paolo, my apologies for not addressing this proactively.
Firmware is not allowed to return 0 for pre/post cycles. This would
violate the API contract, so returning failure is the appropriate
response.
The kernel PTP stack has no special handling for EBADMSG vs
EOPNOTSUPP, and neither does Chrony
(https://github.com/mlichvar/chrony/blob/866bedef0b648fdd6cc68f8cb4d75e4303244c76/sys_linux.c#L959).
Returning EOPNOTSUPP instead of EBADMSG would not change the behavior
of upper layers.
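For illustration, the ordering check under discussion can be reproduced in a
standalone user-space form. This is a minimal sketch, not the driver code:
the names `struct sample` and `sample_in_order` are hypothetical, and plain
`uint64_t` stands in for the kernel's `cycles_t` and `u64`. It shows why a
report with zeroed pre/post cycle fields fails the check whenever the host
counter is non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct gve_sysclock_sample. */
struct sample {
	uint64_t host_pre;	/* host counter before issuing the command */
	uint64_t nic_pre;	/* counter reported by the NIC before its clock read */
	uint64_t nic_post;	/* counter reported by the NIC after its clock read */
	uint64_t host_post;	/* host counter after the command returns */
};

/*
 * A sandwiched read is accepted only if
 * host_pre <= nic_pre <= nic_post <= host_post.
 */
bool sample_in_order(const struct sample *s)
{
	return s->host_pre <= s->nic_pre &&
	       s->nic_pre <= s->nic_post &&
	       s->nic_post <= s->host_post;
}
```

With these assumptions, a sample such as { 100, 0, 0, 130 } is rejected by
the first comparison, which is the case the review comment describes.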
> > +
> > + err = gve_cycles_to_timespec64(priv, sts->clockid,
> > + &sysclock.snapshot,
> > + sysclock.nic_pre_cycles,
> > + &sts->pre_ts);
> > + if (err)
> > + return err;
>
> Sashiko writes:
>
> Looking at gve_cycles_to_timespec64(), it relies on
> get_device_system_crosststamp() which hardcodes the clocksource to
> CSID_X86_TSC or CSID_ARM_ARCH_COUNTER.
>
> If the guest timekeeper clocksource is something else, such as kvm-clock,
> get_device_system_crosststamp() will return -ENODEV which is propagated here.
>
> Since userspace generally only falls back to legacy PTP ioctls when receiving
> -EOPNOTSUPP or -ENOTTY, will returning -ENODEV cause the extended ioctl to
> fatally fail and break synchronization? Should we intercept these errors and
> return -EOPNOTSUPP instead to preserve the fallback behavior?
The system clocksource returned by the callback does not need to match
the current system clocksource. get_device_system_crosststamp()
converts the supplied value to the system clocksource using
convert_base_to_cs(). I've validated that cross-timestamping still works
when the system clocksource is kvm-clock:
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc kvm-clock acpi_pm
$ echo "kvm-clock" | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock
$ sudo systemctl start chrony
$ chronyc tracking
Reference ID : 50484330 (PHC0)
Stratum : 1
Ref time (UTC) : Tue May 05 18:18:20 2026
System time : 0.000000703 seconds fast of NTP time
Last offset : +0.000001757 seconds
RMS offset : 0.000004796 seconds
Frequency : 0.164 ppm fast
Residual freq : +0.046 ppm
Skew : 3.106 ppm
Root delay : 0.000000001 seconds
Root dispersion : 0.000013117 seconds
Update interval : 0.5 seconds
Leap status : Normal
* [PATCH iproute2] rdma: sync kernel headers and fix build
From: Stephen Hemminger @ 2026-05-05 18:10 UTC
To: netdev; +Cc: Stephen Hemminger
The upstream kernel removed the 'RES_' prefix from RDMA netlink
command and attribute names, but the rdma tool was still using
the old names, causing build failures.
Update all references to match the new kernel header names:
- RDMA_NLDEV_CMD_RES_FRMR_POOLS_* → RDMA_NLDEV_CMD_FRMR_POOLS_*
- RDMA_NLDEV_ATTR_RES_FRMR_POOL* → RDMA_NLDEV_ATTR_FRMR_POOL*
- RDMA_NLDEV_ATTR_FRMR_POOL_PINNED → RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES
- RDMA_NLDEV_ATTR_FRMR_POOL_AGING_PERIOD → RDMA_NLDEV_ATTR_FRMR_POOLS_AGING_PERIOD
Fixes build error:
res.h:203:26: error: 'RDMA_NLDEV_CMD_RES_FRMR_POOLS_GET' undeclared
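The rename rule listed above can be summarized in a small standalone helper.
This is only an illustrative sketch: `rename_frmr_symbol` is a hypothetical
function, not part of iproute2 or the kernel headers. It drops the `RES_`
component from FRMR pool symbol names and then applies the two special cases
from the list (PINNED and AGING_PERIOD).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map an old FRMR pool symbol name to its new name.
 * Writes the result to out (of size outsz) and returns 0, or -1 if the
 * result does not fit.
 */
int rename_frmr_symbol(const char *old, char *out, size_t outsz)
{
	static const char *const special[][2] = {
		{ "RDMA_NLDEV_ATTR_FRMR_POOL_PINNED",
		  "RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES" },
		{ "RDMA_NLDEV_ATTR_FRMR_POOL_AGING_PERIOD",
		  "RDMA_NLDEV_ATTR_FRMR_POOLS_AGING_PERIOD" },
	};
	const char *hit = strstr(old, "_RES_FRMR_POOL");
	char tmp[128];
	size_t i;
	int n;

	if (hit)
		/* Copy everything before "_RES", then resume at "_FRMR_POOL". */
		n = snprintf(tmp, sizeof(tmp), "%.*s%s",
			     (int)(hit - old), old, hit + strlen("_RES"));
	else
		n = snprintf(tmp, sizeof(tmp), "%s", old);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;

	/* Two names changed beyond the dropped "RES_" component. */
	for (i = 0; i < sizeof(special) / sizeof(special[0]); i++) {
		if (strcmp(tmp, special[i][0]) == 0) {
			n = snprintf(out, outsz, "%s", special[i][1]);
			return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
		}
	}

	n = snprintf(out, outsz, "%s", tmp);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Names that do not contain the FRMR pool marker, such as
RDMA_NLDEV_ATTR_DEV_INDEX, pass through unchanged.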
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
rdma/include/uapi/rdma/rdma_netlink.h | 30 ++++----
rdma/res-frmr-pools.c | 98 +++++++++++++--------------
rdma/res.h | 2 +-
3 files changed, 65 insertions(+), 65 deletions(-)
diff --git a/rdma/include/uapi/rdma/rdma_netlink.h b/rdma/include/uapi/rdma/rdma_netlink.h
index 8709e558..4356ec4a 100644
--- a/rdma/include/uapi/rdma/rdma_netlink.h
+++ b/rdma/include/uapi/rdma/rdma_netlink.h
@@ -308,9 +308,9 @@ enum rdma_nldev_command {
RDMA_NLDEV_CMD_MONITOR,
- RDMA_NLDEV_CMD_RES_FRMR_POOLS_GET, /* can dump */
+ RDMA_NLDEV_CMD_FRMR_POOLS_GET, /* can dump */
- RDMA_NLDEV_CMD_RES_FRMR_POOLS_SET,
+ RDMA_NLDEV_CMD_FRMR_POOLS_SET,
RDMA_NLDEV_NUM_OPS
};
@@ -590,19 +590,19 @@ enum rdma_nldev_attr {
/*
* FRMR Pools attributes
*/
- RDMA_NLDEV_ATTR_RES_FRMR_POOLS, /* nested table */
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_ENTRY, /* nested table */
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY, /* nested table */
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ATS, /* u8 */
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ACCESS_FLAGS, /* u32 */
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_VENDOR_KEY, /* u64 */
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_NUM_DMA_BLOCKS, /* u64 */
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_QUEUE_HANDLES, /* u32 */
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_MAX_IN_USE, /* u64 */
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_IN_USE, /* u64 */
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_AGING_PERIOD, /* u32 */
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_PINNED, /* u32 */
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_KERNEL_VENDOR_KEY, /* u64 */
+ RDMA_NLDEV_ATTR_FRMR_POOLS, /* nested table */
+ RDMA_NLDEV_ATTR_FRMR_POOL_ENTRY, /* nested table */
+ RDMA_NLDEV_ATTR_FRMR_POOL_KEY, /* nested table */
+ RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS, /* u8 */
+ RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS, /* u32 */
+ RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY, /* u64 */
+ RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS, /* u64 */
+ RDMA_NLDEV_ATTR_FRMR_POOL_QUEUE_HANDLES, /* u32 */
+ RDMA_NLDEV_ATTR_FRMR_POOL_MAX_IN_USE, /* u64 */
+ RDMA_NLDEV_ATTR_FRMR_POOL_IN_USE, /* u64 */
+ RDMA_NLDEV_ATTR_FRMR_POOLS_AGING_PERIOD, /* u32 */
+ RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES, /* u32 */
+ RDMA_NLDEV_ATTR_FRMR_POOL_KEY_KERNEL_VENDOR_KEY, /* u64 */
/*
* Always the end
diff --git a/rdma/res-frmr-pools.c b/rdma/res-frmr-pools.c
index abcd2188..d5faa5c1 100644
--- a/rdma/res-frmr-pools.c
+++ b/rdma/res-frmr-pools.c
@@ -80,83 +80,83 @@ static int res_frmr_pools_line(struct rd *rd, const char *name, int idx,
char key_str[FRMR_POOL_KEY_MAX_LEN];
struct frmr_pool_key key = { 0 };
- if (nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY]) {
+ if (nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_KEY]) {
if (mnl_attr_parse_nested(
- nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY],
+ nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_KEY],
rd_attr_cb, key_tb) != MNL_CB_OK)
return MNL_CB_ERROR;
- if (key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ATS])
+ if (key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS])
key.ats = mnl_attr_get_u8(
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ATS]);
- if (key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ACCESS_FLAGS])
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS]);
+ if (key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS])
key.access_flags = mnl_attr_get_u32(
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ACCESS_FLAGS]);
- if (key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_VENDOR_KEY])
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS]);
+ if (key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY])
key.vendor_key = mnl_attr_get_u64(
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_VENDOR_KEY]);
- if (key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_NUM_DMA_BLOCKS])
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY]);
+ if (key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS])
key.num_dma_blocks = mnl_attr_get_u64(
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_NUM_DMA_BLOCKS]);
- if (key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_KERNEL_VENDOR_KEY])
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS]);
+ if (key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_KERNEL_VENDOR_KEY])
kernel_vendor_key = mnl_attr_get_u64(
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_KERNEL_VENDOR_KEY]);
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_KERNEL_VENDOR_KEY]);
if (rd_is_filtered_attr(
rd, "ats", key.ats,
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ATS]))
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS]))
goto out;
if (rd_is_filtered_attr(
rd, "access_flags", key.access_flags,
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ACCESS_FLAGS]))
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS]))
goto out;
if (rd_is_filtered_attr(
rd, "vendor_key", key.vendor_key,
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_VENDOR_KEY]))
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY]))
goto out;
if (rd_is_filtered_attr(
rd, "num_dma_blocks", key.num_dma_blocks,
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_NUM_DMA_BLOCKS]))
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS]))
goto out;
}
- if (nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_QUEUE_HANDLES])
+ if (nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_QUEUE_HANDLES])
queue_handles = mnl_attr_get_u32(
- nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_QUEUE_HANDLES]);
+ nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_QUEUE_HANDLES]);
if (rd_is_filtered_attr(
rd, "queue", queue_handles,
- nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_QUEUE_HANDLES]))
+ nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_QUEUE_HANDLES]))
goto out;
- if (nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_IN_USE])
+ if (nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_IN_USE])
in_use = mnl_attr_get_u64(
- nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_IN_USE]);
+ nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_IN_USE]);
if (rd_is_filtered_attr(rd, "in_use", in_use,
- nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_IN_USE]))
+ nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_IN_USE]))
goto out;
- if (nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_MAX_IN_USE])
+ if (nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_MAX_IN_USE])
max_in_use = mnl_attr_get_u64(
- nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_MAX_IN_USE]);
+ nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_MAX_IN_USE]);
if (rd_is_filtered_attr(
rd, "max_in_use", max_in_use,
- nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_MAX_IN_USE]))
+ nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_MAX_IN_USE]))
goto out;
- if (nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_PINNED])
+ if (nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES])
pinned_handles = mnl_attr_get_u32(
- nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_PINNED]);
+ nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES]);
if (rd_is_filtered_attr(rd, "pinned", pinned_handles,
- nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_PINNED]))
+ nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES]))
goto out;
open_json_object(NULL);
print_dev(idx, name);
- if (nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY]) {
+ if (nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_KEY]) {
snprintf(key_str, sizeof(key_str),
"%" PRIx64 ":%" PRIx64 ":%x:%s",
key.vendor_key, key.num_dma_blocks,
@@ -166,30 +166,30 @@ static int res_frmr_pools_line(struct rd *rd, const char *name, int idx,
if (rd->show_details) {
res_print_u32(
"ats", key.ats,
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ATS]);
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS]);
res_print_u32(
"access_flags", key.access_flags,
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ACCESS_FLAGS]);
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS]);
res_print_u64(
"vendor_key", key.vendor_key,
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_VENDOR_KEY]);
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY]);
res_print_u64(
"num_dma_blocks", key.num_dma_blocks,
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_NUM_DMA_BLOCKS]);
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS]);
res_print_u64(
"kernel_vendor_key", kernel_vendor_key,
- key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_KERNEL_VENDOR_KEY]);
+ key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_KERNEL_VENDOR_KEY]);
}
}
res_print_u32("queue", queue_handles,
- nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_QUEUE_HANDLES]);
+ nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_QUEUE_HANDLES]);
res_print_u64("in_use", in_use,
- nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_IN_USE]);
+ nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_IN_USE]);
res_print_u64("max_in_use", max_in_use,
- nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_MAX_IN_USE]);
+ nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_MAX_IN_USE]);
res_print_u32("pinned", pinned_handles,
- nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_PINNED]);
+ nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES]);
print_driver_table(rd, nla_line[RDMA_NLDEV_ATTR_DRIVER]);
close_json_object();
@@ -215,12 +215,12 @@ int res_frmr_pools_parse_cb(const struct nlmsghdr *nlh, void *data)
mnl_attr_parse(nlh, 0, rd_attr_cb, tb);
if (!tb[RDMA_NLDEV_ATTR_DEV_INDEX] || !tb[RDMA_NLDEV_ATTR_DEV_NAME] ||
- !tb[RDMA_NLDEV_ATTR_RES_FRMR_POOLS])
+ !tb[RDMA_NLDEV_ATTR_FRMR_POOLS])
return MNL_CB_ERROR;
name = mnl_attr_get_str(tb[RDMA_NLDEV_ATTR_DEV_NAME]);
idx = mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
- nla_table = tb[RDMA_NLDEV_ATTR_RES_FRMR_POOLS];
+ nla_table = tb[RDMA_NLDEV_ATTR_FRMR_POOLS];
mnl_attr_for_each_nested(nla_entry, nla_table) {
struct nlattr *nla_line[RDMA_NLDEV_ATTR_MAX] = {};
@@ -256,10 +256,10 @@ static int res_frmr_pools_one_set_aging(struct rd *rd)
return -EINVAL;
}
- rd_prepare_msg(rd, RDMA_NLDEV_CMD_RES_FRMR_POOLS_SET, &seq,
+ rd_prepare_msg(rd, RDMA_NLDEV_CMD_FRMR_POOLS_SET, &seq,
(NLM_F_REQUEST | NLM_F_ACK));
mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_DEV_INDEX, rd->dev_idx);
- mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_RES_FRMR_POOL_AGING_PERIOD,
+ mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_FRMR_POOLS_AGING_PERIOD,
aging_period);
return rd_sendrecv_msg(rd, seq);
@@ -294,24 +294,24 @@ static int res_frmr_pools_one_set_pinned(struct rd *rd)
return -EINVAL;
}
- rd_prepare_msg(rd, RDMA_NLDEV_CMD_RES_FRMR_POOLS_SET, &seq,
+ rd_prepare_msg(rd, RDMA_NLDEV_CMD_FRMR_POOLS_SET, &seq,
(NLM_F_REQUEST | NLM_F_ACK));
mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_DEV_INDEX, rd->dev_idx);
- mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_RES_FRMR_POOL_PINNED,
+ mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES,
pinned_value);
key_attr =
- mnl_attr_nest_start(rd->nlh, RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY);
- mnl_attr_put_u8(rd->nlh, RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ATS,
+ mnl_attr_nest_start(rd->nlh, RDMA_NLDEV_ATTR_FRMR_POOL_KEY);
+ mnl_attr_put_u8(rd->nlh, RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS,
pool_key.ats);
mnl_attr_put_u32(rd->nlh,
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ACCESS_FLAGS,
+ RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS,
pool_key.access_flags);
- mnl_attr_put_u64(rd->nlh, RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_VENDOR_KEY,
+ mnl_attr_put_u64(rd->nlh, RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY,
pool_key.vendor_key);
mnl_attr_put_u64(rd->nlh,
- RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_NUM_DMA_BLOCKS,
+ RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS,
pool_key.num_dma_blocks);
mnl_attr_nest_end(rd->nlh, key_attr);
diff --git a/rdma/res.h b/rdma/res.h
index 8d7b4a0b..1f71115b 100644
--- a/rdma/res.h
+++ b/rdma/res.h
@@ -200,7 +200,7 @@ struct filters frmr_pools_valid_filters[MAX_NUMBER_OF_FILTERS] = {
{ .name = "pinned", .is_number = true },
};
-RES_FUNC(res_frmr_pools, RDMA_NLDEV_CMD_RES_FRMR_POOLS_GET,
+RES_FUNC(res_frmr_pools, RDMA_NLDEV_CMD_FRMR_POOLS_GET,
frmr_pools_valid_filters, true, 0);
int res_frmr_pools_set(struct rd *rd);
--
2.53.0
* [PATCH 3/3] [net-next v5] w5100: remove unused gpio link detection
From: Arnd Bergmann @ 2026-05-05 18:04 UTC
To: netdev; +Cc: Rob Herring, Linus Walleij, Bartosz Golaszewski, Arnd Bergmann
In-Reply-To: <20260505180459.1247690-1-arnd@kernel.org>
From: Arnd Bergmann <arnd@arndb.de>
Since the platform_device support is now gone, nothing ever passes a
valid gpio number, and all the link state handling can go away.
An earlier version of my patch changed this to look up the GPIO descriptor
from devicetree and convert it all to the modern interface, but there
are no users of that binding at the moment.
Remove the gpio handling, which is now one of the last users of the
legacy gpio interface in platform-independent code.
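To see why the link handling was dead code, consider a simplified user-space
model of the removed w5100_get_link(). This is a sketch under stated
assumptions, not kernel code: `model_gpio_is_valid` and `model_get_link` are
hypothetical names, the validity test is assumed to be "non-negative means
valid", and `gpio_level` stands in for whatever gpio_get_value() would have
returned.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumption: models the legacy gpio_is_valid() as "non-negative is valid". */
bool model_gpio_is_valid(int number)
{
	return number >= 0;
}

/*
 * Simplified model of the removed w5100_get_link(): the GPIO level is
 * consulted only when the GPIO number is valid, otherwise link is
 * reported as up.
 */
unsigned int model_get_link(int link_gpio, int gpio_level)
{
	if (model_gpio_is_valid(link_gpio))
		return !!gpio_level;

	return 1;
}
```

Since the only remaining caller passed -EINVAL as the GPIO number, the model
returns 1 regardless of the level argument, so the GPIO branch was never
taken.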
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/20230127095839.3266452-1-arnd@kernel.org/
---
v5: rewrite to just remove gpio support
v4: rebase to 7.1
v3: include linux/gpio/consumer.h to avoid build failure without GPIOLIB
v2: replace CONFIG_WIZNET_BUS_SHIFT with a constant
---
drivers/net/ethernet/wiznet/w5100-spi.c | 2 +-
drivers/net/ethernet/wiznet/w5100.c | 62 +------------------------
drivers/net/ethernet/wiznet/w5100.h | 3 +-
3 files changed, 3 insertions(+), 64 deletions(-)
diff --git a/drivers/net/ethernet/wiznet/w5100-spi.c b/drivers/net/ethernet/wiznet/w5100-spi.c
index 990a3cce8c0f..d2c5e99ab9c7 100644
--- a/drivers/net/ethernet/wiznet/w5100-spi.c
+++ b/drivers/net/ethernet/wiznet/w5100-spi.c
@@ -450,7 +450,7 @@ static int w5100_spi_probe(struct spi_device *spi)
return -EINVAL;
}
- return w5100_probe(&spi->dev, ops, priv_size, mac, spi->irq, -EINVAL);
+ return w5100_probe(&spi->dev, ops, priv_size, mac, spi->irq);
}
static void w5100_spi_remove(struct spi_device *spi)
diff --git a/drivers/net/ethernet/wiznet/w5100.c b/drivers/net/ethernet/wiznet/w5100.c
index cfe6813ce805..53d8dc642fbd 100644
--- a/drivers/net/ethernet/wiznet/w5100.c
+++ b/drivers/net/ethernet/wiznet/w5100.c
@@ -22,7 +22,6 @@
#include <linux/ioport.h>
#include <linux/interrupt.h>
#include <linux/irq.h>
-#include <linux/gpio.h>
#include "w5100.h"
@@ -155,8 +154,6 @@ struct w5100_priv {
u16 s0_rx_buf_size;
int irq;
- int link_irq;
- int link_gpio;
struct napi_struct napi;
struct net_device *ndev;
@@ -417,16 +414,6 @@ static void w5100_get_drvinfo(struct net_device *ndev,
sizeof(info->bus_info));
}
-static u32 w5100_get_link(struct net_device *ndev)
-{
- struct w5100_priv *priv = netdev_priv(ndev);
-
- if (gpio_is_valid(priv->link_gpio))
- return !!gpio_get_value(priv->link_gpio);
-
- return 1;
-}
-
static u32 w5100_get_msglevel(struct net_device *ndev)
{
struct w5100_priv *priv = netdev_priv(ndev);
@@ -629,24 +616,6 @@ static irqreturn_t w5100_interrupt(int irq, void *ndev_instance)
return IRQ_HANDLED;
}
-static irqreturn_t w5100_detect_link(int irq, void *ndev_instance)
-{
- struct net_device *ndev = ndev_instance;
- struct w5100_priv *priv = netdev_priv(ndev);
-
- if (netif_running(ndev)) {
- if (gpio_get_value(priv->link_gpio) != 0) {
- netif_info(priv, link, ndev, "link is up\n");
- netif_carrier_on(ndev);
- } else {
- netif_info(priv, link, ndev, "link is down\n");
- netif_carrier_off(ndev);
- }
- }
-
- return IRQ_HANDLED;
-}
-
static void w5100_setrx_work(struct work_struct *work)
{
struct w5100_priv *priv = container_of(work, struct w5100_priv,
@@ -690,9 +659,6 @@ static int w5100_open(struct net_device *ndev)
w5100_hw_start(priv);
napi_enable(&priv->napi);
netif_start_queue(ndev);
- if (!gpio_is_valid(priv->link_gpio) ||
- gpio_get_value(priv->link_gpio) != 0)
- netif_carrier_on(ndev);
return 0;
}
@@ -712,7 +678,6 @@ static const struct ethtool_ops w5100_ethtool_ops = {
.get_drvinfo = w5100_get_drvinfo,
.get_msglevel = w5100_get_msglevel,
.set_msglevel = w5100_set_msglevel,
- .get_link = w5100_get_link,
.get_regs_len = w5100_get_regs_len,
.get_regs = w5100_get_regs,
};
@@ -735,8 +700,7 @@ void *w5100_ops_priv(const struct net_device *ndev)
EXPORT_SYMBOL_GPL(w5100_ops_priv);
int w5100_probe(struct device *dev, const struct w5100_ops *ops,
- int sizeof_ops_priv, const void *mac_addr, int irq,
- int link_gpio)
+ int sizeof_ops_priv, const void *mac_addr, int irq)
{
struct w5100_priv *priv;
struct net_device *ndev;
@@ -787,7 +751,6 @@ int w5100_probe(struct device *dev, const struct w5100_ops *ops,
priv->ndev = ndev;
priv->ops = ops;
priv->irq = irq;
- priv->link_gpio = link_gpio;
ndev->netdev_ops = &w5100_netdev_ops;
ndev->ethtool_ops = &w5100_ethtool_ops;
@@ -840,26 +803,8 @@ int w5100_probe(struct device *dev, const struct w5100_ops *ops,
if (err)
goto err_hw;
- if (gpio_is_valid(priv->link_gpio)) {
- char *link_name = devm_kzalloc(dev, 16, GFP_KERNEL);
-
- if (!link_name) {
- err = -ENOMEM;
- goto err_gpio;
- }
- snprintf(link_name, 16, "%s-link", netdev_name(ndev));
- priv->link_irq = gpio_to_irq(priv->link_gpio);
- if (request_any_context_irq(priv->link_irq, w5100_detect_link,
- IRQF_TRIGGER_RISING |
- IRQF_TRIGGER_FALLING,
- link_name, priv->ndev) < 0)
- priv->link_gpio = -EINVAL;
- }
-
return 0;
-err_gpio:
- free_irq(priv->irq, ndev);
err_hw:
destroy_workqueue(priv->xfer_wq);
err_wq:
@@ -877,8 +822,6 @@ void w5100_remove(struct device *dev)
w5100_hw_reset(priv);
free_irq(priv->irq, ndev);
- if (gpio_is_valid(priv->link_gpio))
- free_irq(priv->link_irq, ndev);
flush_work(&priv->setrx_work);
flush_work(&priv->restart_work);
@@ -914,9 +857,6 @@ static int w5100_resume(struct device *dev)
w5100_hw_start(priv);
netif_device_attach(ndev);
- if (!gpio_is_valid(priv->link_gpio) ||
- gpio_get_value(priv->link_gpio) != 0)
- netif_carrier_on(ndev);
}
return 0;
}
diff --git a/drivers/net/ethernet/wiznet/w5100.h b/drivers/net/ethernet/wiznet/w5100.h
index 481af3b6d9e8..76e1b8149041 100644
--- a/drivers/net/ethernet/wiznet/w5100.h
+++ b/drivers/net/ethernet/wiznet/w5100.h
@@ -29,8 +29,7 @@ struct w5100_ops {
void *w5100_ops_priv(const struct net_device *ndev);
int w5100_probe(struct device *dev, const struct w5100_ops *ops,
- int sizeof_ops_priv, const void *mac_addr, int irq,
- int link_gpio);
+ int sizeof_ops_priv, const void *mac_addr, int irq);
void w5100_remove(struct device *dev);
extern const struct dev_pm_ops w5100_pm_ops;
--
2.39.5
* [PATCH 2/3] [net-next]: w5300: remove unused driver
From: Arnd Bergmann @ 2026-05-05 18:04 UTC
To: netdev; +Cc: Rob Herring, Linus Walleij, Bartosz Golaszewski, Arnd Bergmann
In-Reply-To: <20260505180459.1247690-1-arnd@kernel.org>
From: Arnd Bergmann <arnd@arndb.de>
Unlike w5100, this driver does not support SPI mode or devicetree
bindings, and is hence entirely unusable without third-party board
support patches that likely haven't existed for any recent kernel
version.
Remove the entire driver.
If anyone is in fact using it with their custom board files, they
can bring it back and include an earlier patch I sent to add
DT-based probing for the GPIO lines.
Link: https://lore.kernel.org/all/20260427142924.2702598-1-arnd@kernel.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/net/ethernet/wiznet/Kconfig | 40 --
drivers/net/ethernet/wiznet/Makefile | 1 -
drivers/net/ethernet/wiznet/w5300.c | 687 ---------------------------
3 files changed, 728 deletions(-)
delete mode 100644 drivers/net/ethernet/wiznet/w5300.c
diff --git a/drivers/net/ethernet/wiznet/Kconfig b/drivers/net/ethernet/wiznet/Kconfig
index 67b3376b39c7..bc83f6d389b1 100644
--- a/drivers/net/ethernet/wiznet/Kconfig
+++ b/drivers/net/ethernet/wiznet/Kconfig
@@ -29,44 +29,4 @@ config WIZNET_W5100
To compile this driver as a module, choose M here: the module
will be called w5100.
-config WIZNET_W5300
- tristate "WIZnet W5300 Ethernet support"
- depends on HAS_IOMEM
- help
- Support for WIZnet W5300 chips.
-
- W5300 is a single chip with integrated 10/100 Ethernet MAC,
- PHY and hardware TCP/IP stack, but this driver is limited to
- the MAC and PHY functions only, onchip TCP/IP is unused.
-
- To compile this driver as a module, choose M here: the module
- will be called w5300.
-
-choice
- prompt "WIZnet interface mode"
- depends on WIZNET_W5100 || WIZNET_W5300
- default WIZNET_BUS_ANY
-
-config WIZNET_BUS_DIRECT
- bool "Direct address bus mode"
- help
- In direct address mode host system can directly access all registers
- after mapping to Memory-Mapped I/O space.
-
-config WIZNET_BUS_INDIRECT
- bool "Indirect address bus mode"
- help
- In indirect address mode host system indirectly accesses registers
- using Indirect Mode Address Register and Indirect Mode Data Register,
- which are directly mapped to Memory-Mapped I/O space.
-
-config WIZNET_BUS_ANY
- bool "Select interface mode in runtime"
- help
- If interface mode is unknown in compile time, it can be selected
- in runtime from board/platform resources configuration.
-
- Performance may decrease compared to explicitly selected bus mode.
-endchoice
-
endif # NET_VENDOR_WIZNET
diff --git a/drivers/net/ethernet/wiznet/Makefile b/drivers/net/ethernet/wiznet/Makefile
index a97fdcdf4632..502150d6ead2 100644
--- a/drivers/net/ethernet/wiznet/Makefile
+++ b/drivers/net/ethernet/wiznet/Makefile
@@ -1,3 +1,2 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_WIZNET_W5100) += w5100.o w5100-spi.o
-obj-$(CONFIG_WIZNET_W5300) += w5300.o
diff --git a/drivers/net/ethernet/wiznet/w5300.c b/drivers/net/ethernet/wiznet/w5300.c
deleted file mode 100644
index 3e711dea3b2c..000000000000
--- a/drivers/net/ethernet/wiznet/w5300.c
+++ /dev/null
@@ -1,687 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * Ethernet driver for the WIZnet W5300 chip.
- *
- * Copyright (C) 2008-2009 WIZnet Co.,Ltd.
- * Copyright (C) 2011 Taehun Kim <kth3321 <at> gmail.com>
- * Copyright (C) 2012 Mike Sinkovsky <msink@permonline.ru>
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/netdevice.h>
-#include <linux/etherdevice.h>
-#include <linux/platform_device.h>
-#include <linux/platform_data/wiznet.h>
-#include <linux/ethtool.h>
-#include <linux/skbuff.h>
-#include <linux/types.h>
-#include <linux/errno.h>
-#include <linux/delay.h>
-#include <linux/slab.h>
-#include <linux/spinlock.h>
-#include <linux/io.h>
-#include <linux/ioport.h>
-#include <linux/interrupt.h>
-#include <linux/irq.h>
-#include <linux/gpio.h>
-
-#define DRV_NAME "w5300"
-#define DRV_VERSION "2012-04-04"
-
-MODULE_DESCRIPTION("WIZnet W5300 Ethernet driver v"DRV_VERSION);
-MODULE_AUTHOR("Mike Sinkovsky <msink@permonline.ru>");
-MODULE_ALIAS("platform:"DRV_NAME);
-MODULE_LICENSE("GPL");
-
-/*
- * Registers
- */
-#define W5300_MR 0x0000 /* Mode Register */
-#define MR_DBW (1 << 15) /* Data bus width */
-#define MR_MPF (1 << 14) /* Mac layer pause frame */
-#define MR_WDF(n) (((n)&7)<<11) /* Write data fetch time */
-#define MR_RDH (1 << 10) /* Read data hold time */
-#define MR_FS (1 << 8) /* FIFO swap */
-#define MR_RST (1 << 7) /* S/W reset */
-#define MR_PB (1 << 4) /* Ping block */
-#define MR_DBS (1 << 2) /* Data bus swap */
-#define MR_IND (1 << 0) /* Indirect mode */
-#define W5300_IR 0x0002 /* Interrupt Register */
-#define W5300_IMR 0x0004 /* Interrupt Mask Register */
-#define IR_S0 0x0001 /* S0 interrupt */
-#define W5300_SHARL 0x0008 /* Source MAC address (0123) */
-#define W5300_SHARH 0x000c /* Source MAC address (45) */
-#define W5300_TMSRL 0x0020 /* Transmit Memory Size (0123) */
-#define W5300_TMSRH 0x0024 /* Transmit Memory Size (4567) */
-#define W5300_RMSRL 0x0028 /* Receive Memory Size (0123) */
-#define W5300_RMSRH 0x002c /* Receive Memory Size (4567) */
-#define W5300_MTYPE 0x0030 /* Memory Type */
-#define W5300_IDR 0x00fe /* Chip ID register */
-#define IDR_W5300 0x5300 /* =0x5300 for WIZnet W5300 */
-#define W5300_S0_MR 0x0200 /* S0 Mode Register */
-#define S0_MR_CLOSED 0x0000 /* Close mode */
-#define S0_MR_MACRAW 0x0004 /* MAC RAW mode (promiscuous) */
-#define S0_MR_MACRAW_MF 0x0044 /* MAC RAW mode (filtered) */
-#define W5300_S0_CR 0x0202 /* S0 Command Register */
-#define S0_CR_OPEN 0x0001 /* OPEN command */
-#define S0_CR_CLOSE 0x0010 /* CLOSE command */
-#define S0_CR_SEND 0x0020 /* SEND command */
-#define S0_CR_RECV 0x0040 /* RECV command */
-#define W5300_S0_IMR 0x0204 /* S0 Interrupt Mask Register */
-#define W5300_S0_IR 0x0206 /* S0 Interrupt Register */
-#define S0_IR_RECV 0x0004 /* Receive interrupt */
-#define S0_IR_SENDOK 0x0010 /* Send OK interrupt */
-#define W5300_S0_SSR 0x0208 /* S0 Socket Status Register */
-#define W5300_S0_TX_WRSR 0x0220 /* S0 TX Write Size Register */
-#define W5300_S0_TX_FSR 0x0224 /* S0 TX Free Size Register */
-#define W5300_S0_RX_RSR 0x0228 /* S0 Received data Size */
-#define W5300_S0_TX_FIFO 0x022e /* S0 Transmit FIFO */
-#define W5300_S0_RX_FIFO 0x0230 /* S0 Receive FIFO */
-#define W5300_REGS_LEN 0x0400
-
-/*
- * Device driver private data structure
- */
-struct w5300_priv {
- void __iomem *base;
- spinlock_t reg_lock;
- bool indirect;
- u16 (*read) (struct w5300_priv *priv, u16 addr);
- void (*write)(struct w5300_priv *priv, u16 addr, u16 data);
- int irq;
- int link_irq;
- int link_gpio;
-
- struct napi_struct napi;
- struct net_device *ndev;
- bool promisc;
- u32 msg_enable;
-};
-
-/************************************************************************
- *
- * Lowlevel I/O functions
- *
- ***********************************************************************/
-
-/*
- * In direct address mode host system can directly access W5300 registers
- * after mapping to Memory-Mapped I/O space.
- *
- * 0x400 bytes are required for memory space.
- */
-static inline u16 w5300_read_direct(struct w5300_priv *priv, u16 addr)
-{
- return ioread16(priv->base + (addr << CONFIG_WIZNET_BUS_SHIFT));
-}
-
-static inline void w5300_write_direct(struct w5300_priv *priv,
- u16 addr, u16 data)
-{
- iowrite16(data, priv->base + (addr << CONFIG_WIZNET_BUS_SHIFT));
-}
-
-/*
- * In indirect address mode host system indirectly accesses registers by
- * using Indirect Mode Address Register (IDM_AR) and Indirect Mode Data
- * Register (IDM_DR), which are directly mapped to Memory-Mapped I/O space.
- * Mode Register (MR) is directly accessible.
- *
- * Only 0x06 bytes are required for memory space.
- */
-#define W5300_IDM_AR 0x0002 /* Indirect Mode Address */
-#define W5300_IDM_DR 0x0004 /* Indirect Mode Data */
-
-static u16 w5300_read_indirect(struct w5300_priv *priv, u16 addr)
-{
- unsigned long flags;
- u16 data;
-
- spin_lock_irqsave(&priv->reg_lock, flags);
- w5300_write_direct(priv, W5300_IDM_AR, addr);
- data = w5300_read_direct(priv, W5300_IDM_DR);
- spin_unlock_irqrestore(&priv->reg_lock, flags);
-
- return data;
-}
-
-static void w5300_write_indirect(struct w5300_priv *priv, u16 addr, u16 data)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&priv->reg_lock, flags);
- w5300_write_direct(priv, W5300_IDM_AR, addr);
- w5300_write_direct(priv, W5300_IDM_DR, data);
- spin_unlock_irqrestore(&priv->reg_lock, flags);
-}
-
-#if defined(CONFIG_WIZNET_BUS_DIRECT)
-#define w5300_read w5300_read_direct
-#define w5300_write w5300_write_direct
-
-#elif defined(CONFIG_WIZNET_BUS_INDIRECT)
-#define w5300_read w5300_read_indirect
-#define w5300_write w5300_write_indirect
-
-#else /* CONFIG_WIZNET_BUS_ANY */
-#define w5300_read priv->read
-#define w5300_write priv->write
-#endif
-
-static u32 w5300_read32(struct w5300_priv *priv, u16 addr)
-{
- u32 data;
- data = w5300_read(priv, addr) << 16;
- data |= w5300_read(priv, addr + 2);
- return data;
-}
-
-static void w5300_write32(struct w5300_priv *priv, u16 addr, u32 data)
-{
- w5300_write(priv, addr, data >> 16);
- w5300_write(priv, addr + 2, data);
-}
-
-static int w5300_command(struct w5300_priv *priv, u16 cmd)
-{
- unsigned long timeout = jiffies + msecs_to_jiffies(100);
-
- w5300_write(priv, W5300_S0_CR, cmd);
-
- while (w5300_read(priv, W5300_S0_CR) != 0) {
- if (time_after(jiffies, timeout))
- return -EIO;
- cpu_relax();
- }
-
- return 0;
-}
-
-static void w5300_read_frame(struct w5300_priv *priv, u8 *buf, int len)
-{
- u16 fifo;
- int i;
-
- for (i = 0; i < len; i += 2) {
- fifo = w5300_read(priv, W5300_S0_RX_FIFO);
- *buf++ = fifo >> 8;
- *buf++ = fifo;
- }
- fifo = w5300_read(priv, W5300_S0_RX_FIFO);
- fifo = w5300_read(priv, W5300_S0_RX_FIFO);
-}
-
-static void w5300_write_frame(struct w5300_priv *priv, u8 *buf, int len)
-{
- u16 fifo;
- int i;
-
- for (i = 0; i < len; i += 2) {
- fifo = *buf++ << 8;
- fifo |= *buf++;
- w5300_write(priv, W5300_S0_TX_FIFO, fifo);
- }
- w5300_write32(priv, W5300_S0_TX_WRSR, len);
-}
-
-static void w5300_write_macaddr(struct w5300_priv *priv)
-{
- struct net_device *ndev = priv->ndev;
- w5300_write32(priv, W5300_SHARL,
- ndev->dev_addr[0] << 24 |
- ndev->dev_addr[1] << 16 |
- ndev->dev_addr[2] << 8 |
- ndev->dev_addr[3]);
- w5300_write(priv, W5300_SHARH,
- ndev->dev_addr[4] << 8 |
- ndev->dev_addr[5]);
-}
-
-static void w5300_hw_reset(struct w5300_priv *priv)
-{
- w5300_write_direct(priv, W5300_MR, MR_RST);
- mdelay(5);
- w5300_write_direct(priv, W5300_MR, priv->indirect ?
- MR_WDF(7) | MR_PB | MR_IND :
- MR_WDF(7) | MR_PB);
- w5300_write(priv, W5300_IMR, 0);
- w5300_write_macaddr(priv);
-
- /* Configure 128K of internal memory
- * as 64K RX fifo and 64K TX fifo
- */
- w5300_write32(priv, W5300_RMSRL, 64 << 24);
- w5300_write32(priv, W5300_RMSRH, 0);
- w5300_write32(priv, W5300_TMSRL, 64 << 24);
- w5300_write32(priv, W5300_TMSRH, 0);
- w5300_write(priv, W5300_MTYPE, 0x00ff);
-}
-
-static void w5300_hw_start(struct w5300_priv *priv)
-{
- w5300_write(priv, W5300_S0_MR, priv->promisc ?
- S0_MR_MACRAW : S0_MR_MACRAW_MF);
- w5300_command(priv, S0_CR_OPEN);
- w5300_write(priv, W5300_S0_IMR, S0_IR_RECV | S0_IR_SENDOK);
- w5300_write(priv, W5300_IMR, IR_S0);
-}
-
-static void w5300_hw_close(struct w5300_priv *priv)
-{
- w5300_write(priv, W5300_IMR, 0);
- w5300_command(priv, S0_CR_CLOSE);
-}
-
-/***********************************************************************
- *
- * Device driver functions / callbacks
- *
- ***********************************************************************/
-
-static void w5300_get_drvinfo(struct net_device *ndev,
- struct ethtool_drvinfo *info)
-{
- strscpy(info->driver, DRV_NAME, sizeof(info->driver));
- strscpy(info->version, DRV_VERSION, sizeof(info->version));
- strscpy(info->bus_info, dev_name(ndev->dev.parent),
- sizeof(info->bus_info));
-}
-
-static u32 w5300_get_link(struct net_device *ndev)
-{
- struct w5300_priv *priv = netdev_priv(ndev);
-
- if (gpio_is_valid(priv->link_gpio))
- return !!gpio_get_value(priv->link_gpio);
-
- return 1;
-}
-
-static u32 w5300_get_msglevel(struct net_device *ndev)
-{
- struct w5300_priv *priv = netdev_priv(ndev);
-
- return priv->msg_enable;
-}
-
-static void w5300_set_msglevel(struct net_device *ndev, u32 value)
-{
- struct w5300_priv *priv = netdev_priv(ndev);
-
- priv->msg_enable = value;
-}
-
-static int w5300_get_regs_len(struct net_device *ndev)
-{
- return W5300_REGS_LEN;
-}
-
-static void w5300_get_regs(struct net_device *ndev,
- struct ethtool_regs *regs, void *_buf)
-{
- struct w5300_priv *priv = netdev_priv(ndev);
- u8 *buf = _buf;
- u16 addr;
- u16 data;
-
- regs->version = 1;
- for (addr = 0; addr < W5300_REGS_LEN; addr += 2) {
- switch (addr & 0x23f) {
- case W5300_S0_TX_FIFO: /* cannot read TX_FIFO */
- case W5300_S0_RX_FIFO: /* cannot read RX_FIFO */
- data = 0xffff;
- break;
- default:
- data = w5300_read(priv, addr);
- break;
- }
- *buf++ = data >> 8;
- *buf++ = data;
- }
-}
-
-static void w5300_tx_timeout(struct net_device *ndev, unsigned int txqueue)
-{
- struct w5300_priv *priv = netdev_priv(ndev);
-
- netif_stop_queue(ndev);
- w5300_hw_reset(priv);
- w5300_hw_start(priv);
- ndev->stats.tx_errors++;
- netif_trans_update(ndev);
- netif_wake_queue(ndev);
-}
-
-static netdev_tx_t w5300_start_tx(struct sk_buff *skb, struct net_device *ndev)
-{
- struct w5300_priv *priv = netdev_priv(ndev);
-
- netif_stop_queue(ndev);
-
- w5300_write_frame(priv, skb->data, skb->len);
- ndev->stats.tx_packets++;
- ndev->stats.tx_bytes += skb->len;
- dev_kfree_skb(skb);
- netif_dbg(priv, tx_queued, ndev, "tx queued\n");
-
- w5300_command(priv, S0_CR_SEND);
-
- return NETDEV_TX_OK;
-}
-
-static int w5300_napi_poll(struct napi_struct *napi, int budget)
-{
- struct w5300_priv *priv = container_of(napi, struct w5300_priv, napi);
- struct net_device *ndev = priv->ndev;
- struct sk_buff *skb;
- int rx_count;
- u16 rx_len;
-
- for (rx_count = 0; rx_count < budget; rx_count++) {
- u32 rx_fifo_len = w5300_read32(priv, W5300_S0_RX_RSR);
- if (rx_fifo_len == 0)
- break;
-
- rx_len = w5300_read(priv, W5300_S0_RX_FIFO);
-
- skb = netdev_alloc_skb_ip_align(ndev, roundup(rx_len, 2));
- if (unlikely(!skb)) {
- u32 i;
- for (i = 0; i < rx_fifo_len; i += 2)
- w5300_read(priv, W5300_S0_RX_FIFO);
- ndev->stats.rx_dropped++;
- return -ENOMEM;
- }
-
- skb_put(skb, rx_len);
- w5300_read_frame(priv, skb->data, rx_len);
- skb->protocol = eth_type_trans(skb, ndev);
-
- netif_receive_skb(skb);
- ndev->stats.rx_packets++;
- ndev->stats.rx_bytes += rx_len;
- }
-
- if (rx_count < budget) {
- napi_complete_done(napi, rx_count);
- w5300_write(priv, W5300_IMR, IR_S0);
- }
-
- return rx_count;
-}
-
-static irqreturn_t w5300_interrupt(int irq, void *ndev_instance)
-{
- struct net_device *ndev = ndev_instance;
- struct w5300_priv *priv = netdev_priv(ndev);
-
- int ir = w5300_read(priv, W5300_S0_IR);
- if (!ir)
- return IRQ_NONE;
- w5300_write(priv, W5300_S0_IR, ir);
-
- if (ir & S0_IR_SENDOK) {
- netif_dbg(priv, tx_done, ndev, "tx done\n");
- netif_wake_queue(ndev);
- }
-
- if (ir & S0_IR_RECV) {
- if (napi_schedule_prep(&priv->napi)) {
- w5300_write(priv, W5300_IMR, 0);
- __napi_schedule(&priv->napi);
- }
- }
-
- return IRQ_HANDLED;
-}
-
-static irqreturn_t w5300_detect_link(int irq, void *ndev_instance)
-{
- struct net_device *ndev = ndev_instance;
- struct w5300_priv *priv = netdev_priv(ndev);
-
- if (netif_running(ndev)) {
- if (gpio_get_value(priv->link_gpio) != 0) {
- netif_info(priv, link, ndev, "link is up\n");
- netif_carrier_on(ndev);
- } else {
- netif_info(priv, link, ndev, "link is down\n");
- netif_carrier_off(ndev);
- }
- }
-
- return IRQ_HANDLED;
-}
-
-static void w5300_set_rx_mode(struct net_device *ndev)
-{
- struct w5300_priv *priv = netdev_priv(ndev);
- bool set_promisc = (ndev->flags & IFF_PROMISC) != 0;
-
- if (priv->promisc != set_promisc) {
- priv->promisc = set_promisc;
- w5300_hw_start(priv);
- }
-}
-
-static int w5300_set_macaddr(struct net_device *ndev, void *addr)
-{
- struct w5300_priv *priv = netdev_priv(ndev);
- struct sockaddr *sock_addr = addr;
-
- if (!is_valid_ether_addr(sock_addr->sa_data))
- return -EADDRNOTAVAIL;
- eth_hw_addr_set(ndev, sock_addr->sa_data);
- w5300_write_macaddr(priv);
- return 0;
-}
-
-static int w5300_open(struct net_device *ndev)
-{
- struct w5300_priv *priv = netdev_priv(ndev);
-
- netif_info(priv, ifup, ndev, "enabling\n");
- w5300_hw_start(priv);
- napi_enable(&priv->napi);
- netif_start_queue(ndev);
- if (!gpio_is_valid(priv->link_gpio) ||
- gpio_get_value(priv->link_gpio) != 0)
- netif_carrier_on(ndev);
- return 0;
-}
-
-static int w5300_stop(struct net_device *ndev)
-{
- struct w5300_priv *priv = netdev_priv(ndev);
-
- netif_info(priv, ifdown, ndev, "shutting down\n");
- w5300_hw_close(priv);
- netif_carrier_off(ndev);
- netif_stop_queue(ndev);
- napi_disable(&priv->napi);
- return 0;
-}
-
-static const struct ethtool_ops w5300_ethtool_ops = {
- .get_drvinfo = w5300_get_drvinfo,
- .get_msglevel = w5300_get_msglevel,
- .set_msglevel = w5300_set_msglevel,
- .get_link = w5300_get_link,
- .get_regs_len = w5300_get_regs_len,
- .get_regs = w5300_get_regs,
-};
-
-static const struct net_device_ops w5300_netdev_ops = {
- .ndo_open = w5300_open,
- .ndo_stop = w5300_stop,
- .ndo_start_xmit = w5300_start_tx,
- .ndo_tx_timeout = w5300_tx_timeout,
- .ndo_set_rx_mode = w5300_set_rx_mode,
- .ndo_set_mac_address = w5300_set_macaddr,
- .ndo_validate_addr = eth_validate_addr,
-};
-
-static int w5300_hw_probe(struct platform_device *pdev)
-{
- struct wiznet_platform_data *data = dev_get_platdata(&pdev->dev);
- struct net_device *ndev = platform_get_drvdata(pdev);
- struct w5300_priv *priv = netdev_priv(ndev);
- const char *name = netdev_name(ndev);
- struct resource *mem;
- int mem_size;
- int irq;
- int ret;
-
- if (data && is_valid_ether_addr(data->mac_addr)) {
- eth_hw_addr_set(ndev, data->mac_addr);
- } else {
- eth_hw_addr_random(ndev);
- }
-
- priv->base = devm_platform_get_and_ioremap_resource(pdev, 0, &mem);
- if (IS_ERR(priv->base))
- return PTR_ERR(priv->base);
-
- mem_size = resource_size(mem);
-
- spin_lock_init(&priv->reg_lock);
- priv->indirect = mem_size < W5300_BUS_DIRECT_SIZE;
- if (priv->indirect) {
- priv->read = w5300_read_indirect;
- priv->write = w5300_write_indirect;
- } else {
- priv->read = w5300_read_direct;
- priv->write = w5300_write_direct;
- }
-
- w5300_hw_reset(priv);
- if (w5300_read(priv, W5300_IDR) != IDR_W5300)
- return -ENODEV;
-
- irq = platform_get_irq(pdev, 0);
- if (irq < 0)
- return irq;
- ret = request_irq(irq, w5300_interrupt,
- IRQ_TYPE_LEVEL_LOW, name, ndev);
- if (ret < 0)
- return ret;
- priv->irq = irq;
-
- priv->link_gpio = data ? data->link_gpio : -EINVAL;
- if (gpio_is_valid(priv->link_gpio)) {
- char *link_name = devm_kzalloc(&pdev->dev, 16, GFP_KERNEL);
- if (!link_name)
- return -ENOMEM;
- snprintf(link_name, 16, "%s-link", name);
- priv->link_irq = gpio_to_irq(priv->link_gpio);
- if (request_any_context_irq(priv->link_irq, w5300_detect_link,
- IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING,
- link_name, priv->ndev) < 0)
- priv->link_gpio = -EINVAL;
- }
-
- netdev_info(ndev, "at 0x%llx irq %d\n", (u64)mem->start, irq);
- return 0;
-}
-
-static int w5300_probe(struct platform_device *pdev)
-{
- struct w5300_priv *priv;
- struct net_device *ndev;
- int err;
-
- ndev = alloc_etherdev(sizeof(*priv));
- if (!ndev)
- return -ENOMEM;
- SET_NETDEV_DEV(ndev, &pdev->dev);
- platform_set_drvdata(pdev, ndev);
- priv = netdev_priv(ndev);
- priv->ndev = ndev;
-
- ndev->netdev_ops = &w5300_netdev_ops;
- ndev->ethtool_ops = &w5300_ethtool_ops;
- ndev->watchdog_timeo = HZ;
- netif_napi_add_weight(ndev, &priv->napi, w5300_napi_poll, 16);
-
- /* This chip doesn't support VLAN packets with normal MTU,
- * so disable VLAN for this device.
- */
- ndev->features |= NETIF_F_VLAN_CHALLENGED;
-
- err = register_netdev(ndev);
- if (err < 0)
- goto err_register;
-
- err = w5300_hw_probe(pdev);
- if (err < 0)
- goto err_hw_probe;
-
- return 0;
-
-err_hw_probe:
- unregister_netdev(ndev);
-err_register:
- free_netdev(ndev);
- return err;
-}
-
-static void w5300_remove(struct platform_device *pdev)
-{
- struct net_device *ndev = platform_get_drvdata(pdev);
- struct w5300_priv *priv = netdev_priv(ndev);
-
- w5300_hw_reset(priv);
- free_irq(priv->irq, ndev);
- if (gpio_is_valid(priv->link_gpio))
- free_irq(priv->link_irq, ndev);
-
- unregister_netdev(ndev);
- free_netdev(ndev);
-}
-
-#ifdef CONFIG_PM_SLEEP
-static int w5300_suspend(struct device *dev)
-{
- struct net_device *ndev = dev_get_drvdata(dev);
- struct w5300_priv *priv = netdev_priv(ndev);
-
- if (netif_running(ndev)) {
- netif_carrier_off(ndev);
- netif_device_detach(ndev);
-
- w5300_hw_close(priv);
- }
- return 0;
-}
-
-static int w5300_resume(struct device *dev)
-{
- struct net_device *ndev = dev_get_drvdata(dev);
- struct w5300_priv *priv = netdev_priv(ndev);
-
- if (!netif_running(ndev)) {
- w5300_hw_reset(priv);
- w5300_hw_start(priv);
-
- netif_device_attach(ndev);
- if (!gpio_is_valid(priv->link_gpio) ||
- gpio_get_value(priv->link_gpio) != 0)
- netif_carrier_on(ndev);
- }
- return 0;
-}
-#endif /* CONFIG_PM_SLEEP */
-
-static SIMPLE_DEV_PM_OPS(w5300_pm_ops, w5300_suspend, w5300_resume);
-
-static struct platform_driver w5300_driver = {
- .driver = {
- .name = DRV_NAME,
- .pm = &w5300_pm_ops,
- },
- .probe = w5300_probe,
- .remove = w5300_remove,
-};
-
-module_platform_driver(w5300_driver);
--
2.39.5
^ permalink raw reply related
* [PATCH 1/3] [net-next] w5100: remove MMIO support
From: Arnd Bergmann @ 2026-05-05 18:04 UTC (permalink / raw)
To: netdev; +Cc: Rob Herring, Linus Walleij, Bartosz Golaszewski, Arnd Bergmann
From: Arnd Bergmann <arnd@arndb.de>
This driver supports both SPI- and MMIO-based register access, but only
the former has devicetree support. While MMIO mode would have worked
with old-style board files, those have never defined such a device
upstream.
Remove the MMIO mode, leaving SPI as the only way to use this driver,
but keep the driver split into two loadable modules. Further cleanup is
possible by combining the two into one file.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
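Illustration only, not part of the patch: a minimal userspace sketch of
the ops-table dispatch that remains once the compile-time direct/indirect
variants are gone. The diff shows w5100_read() calling
priv->ops->read(priv->ndev, addr); every demo_* name below is
hypothetical, and the in-memory register file merely stands in for an SPI
backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus-ops table, loosely modelled on the driver's ops struct. */
struct demo_ops {
	int (*read)(void *bus, uint32_t addr);
	int (*write)(void *bus, uint32_t addr, uint8_t data);
};

/* Hypothetical backend: a small register file standing in for SPI. */
struct demo_spi_bus {
	uint8_t regs[256];
};

struct demo_priv {
	const struct demo_ops *ops;
	void *bus;
};

static int demo_spi_read(void *bus, uint32_t addr)
{
	struct demo_spi_bus *spi = bus;

	if (addr >= sizeof(spi->regs))
		return -1;
	return spi->regs[addr];
}

static int demo_spi_write(void *bus, uint32_t addr, uint8_t data)
{
	struct demo_spi_bus *spi = bus;

	if (addr >= sizeof(spi->regs))
		return -1;
	spi->regs[addr] = data;
	return 0;
}

static const struct demo_ops demo_spi_ops = {
	.read = demo_spi_read,
	.write = demo_spi_write,
};

/* Single accessor pair: every access goes through the ops table. */
static int demo_read(struct demo_priv *priv, uint32_t addr)
{
	assert(priv->ops != NULL && priv->ops->read != NULL);
	return priv->ops->read(priv->bus, addr);
}

static int demo_write(struct demo_priv *priv, uint32_t addr, uint8_t data)
{
	assert(priv->ops != NULL && priv->ops->write != NULL);
	return priv->ops->write(priv->bus, addr, data);
}
```

With only one backend left, the accessors no longer need #if/#elif
variants; a caller fills in demo_priv once and uses demo_read() and
demo_write() everywhere.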
drivers/net/ethernet/wiznet/Kconfig | 19 +-
drivers/net/ethernet/wiznet/Makefile | 3 +-
drivers/net/ethernet/wiznet/w5100.c | 350 ---------------------------
include/linux/platform_data/wiznet.h | 23 --
4 files changed, 3 insertions(+), 392 deletions(-)
delete mode 100644 include/linux/platform_data/wiznet.h
diff --git a/drivers/net/ethernet/wiznet/Kconfig b/drivers/net/ethernet/wiznet/Kconfig
index 4bac2ad2d6a1..67b3376b39c7 100644
--- a/drivers/net/ethernet/wiznet/Kconfig
+++ b/drivers/net/ethernet/wiznet/Kconfig
@@ -5,7 +5,6 @@
config NET_VENDOR_WIZNET
bool "WIZnet devices"
- depends on HAS_IOMEM
default y
help
If you have a network (Ethernet) card belonging to this class, say Y.
@@ -18,8 +17,8 @@ config NET_VENDOR_WIZNET
if NET_VENDOR_WIZNET
config WIZNET_W5100
- tristate "WIZnet W5100 Ethernet support"
- depends on HAS_IOMEM
+ tristate "WIZnet W5100/W5200/W5500 Ethernet support for SPI mode"
+ depends on SPI
help
Support for WIZnet W5100 chips.
@@ -70,18 +69,4 @@ config WIZNET_BUS_ANY
Performance may decrease compared to explicitly selected bus mode.
endchoice
-config WIZNET_W5100_SPI
- tristate "WIZnet W5100/W5200/W5500 Ethernet support for SPI mode"
- depends on WIZNET_BUS_ANY && WIZNET_W5100
- depends on SPI
- help
- In SPI mode host system accesses registers using SPI protocol
- (mode 0) on the SPI bus.
-
- Performance decreases compared to other bus interface mode.
- In W5100 SPI mode, burst READ/WRITE processing are not provided.
-
- To compile this driver as a module, choose M here: the module
- will be called w5100-spi.
-
endif # NET_VENDOR_WIZNET
diff --git a/drivers/net/ethernet/wiznet/Makefile b/drivers/net/ethernet/wiznet/Makefile
index 78104f0bf415..a97fdcdf4632 100644
--- a/drivers/net/ethernet/wiznet/Makefile
+++ b/drivers/net/ethernet/wiznet/Makefile
@@ -1,4 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_WIZNET_W5100) += w5100.o
-obj-$(CONFIG_WIZNET_W5100_SPI) += w5100-spi.o
+obj-$(CONFIG_WIZNET_W5100) += w5100.o w5100-spi.o
obj-$(CONFIG_WIZNET_W5300) += w5300.o
diff --git a/drivers/net/ethernet/wiznet/w5100.c b/drivers/net/ethernet/wiznet/w5100.c
index c5424d882135..cfe6813ce805 100644
--- a/drivers/net/ethernet/wiznet/w5100.c
+++ b/drivers/net/ethernet/wiznet/w5100.c
@@ -11,7 +11,6 @@
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/platform_device.h>
-#include <linux/platform_data/wiznet.h>
#include <linux/ethtool.h>
#include <linux/skbuff.h>
#include <linux/types.h>
@@ -172,311 +171,6 @@ struct w5100_priv {
struct work_struct restart_work;
};
-/************************************************************************
- *
- * Lowlevel I/O functions
- *
- ***********************************************************************/
-
-struct w5100_mmio_priv {
- void __iomem *base;
- /* Serialize access in indirect address mode */
- spinlock_t reg_lock;
-};
-
-static inline struct w5100_mmio_priv *w5100_mmio_priv(struct net_device *dev)
-{
- return w5100_ops_priv(dev);
-}
-
-static inline void __iomem *w5100_mmio(struct net_device *ndev)
-{
- struct w5100_mmio_priv *mmio_priv = w5100_mmio_priv(ndev);
-
- return mmio_priv->base;
-}
-
-/*
- * In direct address mode host system can directly access W5100 registers
- * after mapping to Memory-Mapped I/O space.
- *
- * 0x8000 bytes are required for memory space.
- */
-static inline int w5100_read_direct(struct net_device *ndev, u32 addr)
-{
- return ioread8(w5100_mmio(ndev) + (addr << CONFIG_WIZNET_BUS_SHIFT));
-}
-
-static inline int __w5100_write_direct(struct net_device *ndev, u32 addr,
- u8 data)
-{
- iowrite8(data, w5100_mmio(ndev) + (addr << CONFIG_WIZNET_BUS_SHIFT));
-
- return 0;
-}
-
-static inline int w5100_write_direct(struct net_device *ndev, u32 addr, u8 data)
-{
- __w5100_write_direct(ndev, addr, data);
-
- return 0;
-}
-
-static int w5100_read16_direct(struct net_device *ndev, u32 addr)
-{
- u16 data;
- data = w5100_read_direct(ndev, addr) << 8;
- data |= w5100_read_direct(ndev, addr + 1);
- return data;
-}
-
-static int w5100_write16_direct(struct net_device *ndev, u32 addr, u16 data)
-{
- __w5100_write_direct(ndev, addr, data >> 8);
- __w5100_write_direct(ndev, addr + 1, data);
-
- return 0;
-}
-
-static int w5100_readbulk_direct(struct net_device *ndev, u32 addr, u8 *buf,
- int len)
-{
- int i;
-
- for (i = 0; i < len; i++, addr++)
- *buf++ = w5100_read_direct(ndev, addr);
-
- return 0;
-}
-
-static int w5100_writebulk_direct(struct net_device *ndev, u32 addr,
- const u8 *buf, int len)
-{
- int i;
-
- for (i = 0; i < len; i++, addr++)
- __w5100_write_direct(ndev, addr, *buf++);
-
- return 0;
-}
-
-static int w5100_mmio_init(struct net_device *ndev)
-{
- struct platform_device *pdev = to_platform_device(ndev->dev.parent);
- struct w5100_mmio_priv *mmio_priv = w5100_mmio_priv(ndev);
-
- spin_lock_init(&mmio_priv->reg_lock);
-
- mmio_priv->base = devm_platform_get_and_ioremap_resource(pdev, 0, NULL);
- if (IS_ERR(mmio_priv->base))
- return PTR_ERR(mmio_priv->base);
-
- return 0;
-}
-
-static const struct w5100_ops w5100_mmio_direct_ops = {
- .chip_id = W5100,
- .read = w5100_read_direct,
- .write = w5100_write_direct,
- .read16 = w5100_read16_direct,
- .write16 = w5100_write16_direct,
- .readbulk = w5100_readbulk_direct,
- .writebulk = w5100_writebulk_direct,
- .init = w5100_mmio_init,
-};
-
-/*
- * In indirect address mode host system indirectly accesses registers by
- * using Indirect Mode Address Register (IDM_AR) and Indirect Mode Data
- * Register (IDM_DR), which are directly mapped to Memory-Mapped I/O space.
- * Mode Register (MR) is directly accessible.
- *
- * Only 0x04 bytes are required for memory space.
- */
-#define W5100_IDM_AR 0x01 /* Indirect Mode Address Register */
-#define W5100_IDM_DR 0x03 /* Indirect Mode Data Register */
-
-static int w5100_read_indirect(struct net_device *ndev, u32 addr)
-{
- struct w5100_mmio_priv *mmio_priv = w5100_mmio_priv(ndev);
- unsigned long flags;
- u8 data;
-
- spin_lock_irqsave(&mmio_priv->reg_lock, flags);
- w5100_write16_direct(ndev, W5100_IDM_AR, addr);
- data = w5100_read_direct(ndev, W5100_IDM_DR);
- spin_unlock_irqrestore(&mmio_priv->reg_lock, flags);
-
- return data;
-}
-
-static int w5100_write_indirect(struct net_device *ndev, u32 addr, u8 data)
-{
- struct w5100_mmio_priv *mmio_priv = w5100_mmio_priv(ndev);
- unsigned long flags;
-
- spin_lock_irqsave(&mmio_priv->reg_lock, flags);
- w5100_write16_direct(ndev, W5100_IDM_AR, addr);
- w5100_write_direct(ndev, W5100_IDM_DR, data);
- spin_unlock_irqrestore(&mmio_priv->reg_lock, flags);
-
- return 0;
-}
-
-static int w5100_read16_indirect(struct net_device *ndev, u32 addr)
-{
- struct w5100_mmio_priv *mmio_priv = w5100_mmio_priv(ndev);
- unsigned long flags;
- u16 data;
-
- spin_lock_irqsave(&mmio_priv->reg_lock, flags);
- w5100_write16_direct(ndev, W5100_IDM_AR, addr);
- data = w5100_read_direct(ndev, W5100_IDM_DR) << 8;
- data |= w5100_read_direct(ndev, W5100_IDM_DR);
- spin_unlock_irqrestore(&mmio_priv->reg_lock, flags);
-
- return data;
-}
-
-static int w5100_write16_indirect(struct net_device *ndev, u32 addr, u16 data)
-{
- struct w5100_mmio_priv *mmio_priv = w5100_mmio_priv(ndev);
- unsigned long flags;
-
- spin_lock_irqsave(&mmio_priv->reg_lock, flags);
- w5100_write16_direct(ndev, W5100_IDM_AR, addr);
- __w5100_write_direct(ndev, W5100_IDM_DR, data >> 8);
- w5100_write_direct(ndev, W5100_IDM_DR, data);
- spin_unlock_irqrestore(&mmio_priv->reg_lock, flags);
-
- return 0;
-}
-
-static int w5100_readbulk_indirect(struct net_device *ndev, u32 addr, u8 *buf,
- int len)
-{
- struct w5100_mmio_priv *mmio_priv = w5100_mmio_priv(ndev);
- unsigned long flags;
- int i;
-
- spin_lock_irqsave(&mmio_priv->reg_lock, flags);
- w5100_write16_direct(ndev, W5100_IDM_AR, addr);
-
- for (i = 0; i < len; i++)
- *buf++ = w5100_read_direct(ndev, W5100_IDM_DR);
-
- spin_unlock_irqrestore(&mmio_priv->reg_lock, flags);
-
- return 0;
-}
-
-static int w5100_writebulk_indirect(struct net_device *ndev, u32 addr,
- const u8 *buf, int len)
-{
- struct w5100_mmio_priv *mmio_priv = w5100_mmio_priv(ndev);
- unsigned long flags;
- int i;
-
- spin_lock_irqsave(&mmio_priv->reg_lock, flags);
- w5100_write16_direct(ndev, W5100_IDM_AR, addr);
-
- for (i = 0; i < len; i++)
- __w5100_write_direct(ndev, W5100_IDM_DR, *buf++);
-
- spin_unlock_irqrestore(&mmio_priv->reg_lock, flags);
-
- return 0;
-}
-
-static int w5100_reset_indirect(struct net_device *ndev)
-{
- w5100_write_direct(ndev, W5100_MR, MR_RST);
- mdelay(5);
- w5100_write_direct(ndev, W5100_MR, MR_PB | MR_AI | MR_IND);
-
- return 0;
-}
-
-static const struct w5100_ops w5100_mmio_indirect_ops = {
- .chip_id = W5100,
- .read = w5100_read_indirect,
- .write = w5100_write_indirect,
- .read16 = w5100_read16_indirect,
- .write16 = w5100_write16_indirect,
- .readbulk = w5100_readbulk_indirect,
- .writebulk = w5100_writebulk_indirect,
- .init = w5100_mmio_init,
- .reset = w5100_reset_indirect,
-};
-
-#if defined(CONFIG_WIZNET_BUS_DIRECT)
-
-static int w5100_read(struct w5100_priv *priv, u32 addr)
-{
- return w5100_read_direct(priv->ndev, addr);
-}
-
-static int w5100_write(struct w5100_priv *priv, u32 addr, u8 data)
-{
- return w5100_write_direct(priv->ndev, addr, data);
-}
-
-static int w5100_read16(struct w5100_priv *priv, u32 addr)
-{
- return w5100_read16_direct(priv->ndev, addr);
-}
-
-static int w5100_write16(struct w5100_priv *priv, u32 addr, u16 data)
-{
- return w5100_write16_direct(priv->ndev, addr, data);
-}
-
-static int w5100_readbulk(struct w5100_priv *priv, u32 addr, u8 *buf, int len)
-{
- return w5100_readbulk_direct(priv->ndev, addr, buf, len);
-}
-
-static int w5100_writebulk(struct w5100_priv *priv, u32 addr, const u8 *buf,
- int len)
-{
- return w5100_writebulk_direct(priv->ndev, addr, buf, len);
-}
-
-#elif defined(CONFIG_WIZNET_BUS_INDIRECT)
-
-static int w5100_read(struct w5100_priv *priv, u32 addr)
-{
- return w5100_read_indirect(priv->ndev, addr);
-}
-
-static int w5100_write(struct w5100_priv *priv, u32 addr, u8 data)
-{
- return w5100_write_indirect(priv->ndev, addr, data);
-}
-
-static int w5100_read16(struct w5100_priv *priv, u32 addr)
-{
- return w5100_read16_indirect(priv->ndev, addr);
-}
-
-static int w5100_write16(struct w5100_priv *priv, u32 addr, u16 data)
-{
- return w5100_write16_indirect(priv->ndev, addr, data);
-}
-
-static int w5100_readbulk(struct w5100_priv *priv, u32 addr, u8 *buf, int len)
-{
- return w5100_readbulk_indirect(priv->ndev, addr, buf, len);
-}
-
-static int w5100_writebulk(struct w5100_priv *priv, u32 addr, const u8 *buf,
- int len)
-{
- return w5100_writebulk_indirect(priv->ndev, addr, buf, len);
-}
-
-#else /* CONFIG_WIZNET_BUS_ANY */
-
static int w5100_read(struct w5100_priv *priv, u32 addr)
{
return priv->ops->read(priv->ndev, addr);
@@ -508,8 +202,6 @@ static int w5100_writebulk(struct w5100_priv *priv, u32 addr, const u8 *buf,
return priv->ops->writebulk(priv->ndev, addr, buf, len);
}
-#endif
-
static int w5100_readbuf(struct w5100_priv *priv, u16 offset, u8 *buf, int len)
{
u32 addr;
@@ -1035,38 +727,6 @@ static const struct net_device_ops w5100_netdev_ops = {
.ndo_validate_addr = eth_validate_addr,
};
-static int w5100_mmio_probe(struct platform_device *pdev)
-{
- struct wiznet_platform_data *data = dev_get_platdata(&pdev->dev);
- const void *mac_addr = NULL;
- struct resource *mem;
- const struct w5100_ops *ops;
- int irq;
-
- if (data && is_valid_ether_addr(data->mac_addr))
- mac_addr = data->mac_addr;
-
- mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
- if (!mem)
- return -EINVAL;
- if (resource_size(mem) < W5100_BUS_DIRECT_SIZE)
- ops = &w5100_mmio_indirect_ops;
- else
- ops = &w5100_mmio_direct_ops;
-
- irq = platform_get_irq(pdev, 0);
- if (irq < 0)
- return irq;
-
- return w5100_probe(&pdev->dev, ops, sizeof(struct w5100_mmio_priv),
- mac_addr, irq, data ? data->link_gpio : -EINVAL);
-}
-
-static void w5100_mmio_remove(struct platform_device *pdev)
-{
- w5100_remove(&pdev->dev);
-}
-
void *w5100_ops_priv(const struct net_device *ndev)
{
return netdev_priv(ndev) +
@@ -1264,13 +924,3 @@ static int w5100_resume(struct device *dev)
SIMPLE_DEV_PM_OPS(w5100_pm_ops, w5100_suspend, w5100_resume);
EXPORT_SYMBOL_GPL(w5100_pm_ops);
-
-static struct platform_driver w5100_mmio_driver = {
- .driver = {
- .name = DRV_NAME,
- .pm = &w5100_pm_ops,
- },
- .probe = w5100_mmio_probe,
- .remove = w5100_mmio_remove,
-};
-module_platform_driver(w5100_mmio_driver);
diff --git a/include/linux/platform_data/wiznet.h b/include/linux/platform_data/wiznet.h
deleted file mode 100644
index 1154c4db8a13..000000000000
--- a/include/linux/platform_data/wiznet.h
+++ /dev/null
@@ -1,23 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * Ethernet driver for the WIZnet W5x00 chip.
- */
-
-#ifndef PLATFORM_DATA_WIZNET_H
-#define PLATFORM_DATA_WIZNET_H
-
-#include <linux/if_ether.h>
-
-struct wiznet_platform_data {
- int link_gpio;
- u8 mac_addr[ETH_ALEN];
-};
-
-#ifndef CONFIG_WIZNET_BUS_SHIFT
-#define CONFIG_WIZNET_BUS_SHIFT 0
-#endif
-
-#define W5100_BUS_DIRECT_SIZE (0x8000 << CONFIG_WIZNET_BUS_SHIFT)
-#define W5300_BUS_DIRECT_SIZE (0x0400 << CONFIG_WIZNET_BUS_SHIFT)
-
-#endif /* PLATFORM_DATA_WIZNET_H */
--
2.39.5
^ permalink raw reply related
* Re: [PATCH net 1/1] net/rds: handle zerocopy send cleanup before the message is queued
From: Allison Henderson @ 2026-05-05 18:04 UTC (permalink / raw)
To: Paolo Abeni, linux-rdma, rds-devel
Cc: davem, edumazet, kuba, horms, santosh.shilimkar, sowmini.varadhan,
willemb, yuantan098, yifanwucs, tomapufckgml, bird, lx24,
tonanli66, Ren Wei, netdev
In-Reply-To: <673288e7-37ac-4533-a4d3-2fa87dc282f1@redhat.com>
On Tue, 2026-05-05 at 15:32 +0200, Paolo Abeni wrote:
> On 5/1/26 9:40 PM, Allison Henderson wrote:
> > On Fri, 2026-05-01 at 09:08 +0800, Ren Wei wrote:
> > > From: Nan Li <tonanli66@gmail.com>
> > >
> > > A zerocopy send can fail after user pages have been pinned but before
> > > the message is attached to the sending socket.
> > >
> > > The purge path currently infers zerocopy state from rm->m_rs, so an
> > > unqueued message can be cleaned up as if it owned normal payload pages.
> > > However, zerocopy ownership is really determined by the presence of
> > > op_mmp_znotifier, regardless of whether the message has reached the
> > > socket queue.
> > >
> > > Capture op_mmp_znotifier up front in rds_message_purge() and use it as
> > > the cleanup discriminator. If the message is already associated with a
> > > socket, keep the existing completion path. Otherwise, drop the pinned
> > > page accounting directly and release the notifier before putting the
> > > payload pages.
> > >
> > > This keeps early send failure cleanup consistent with the zerocopy
> > > lifetime rules without changing the normal queued completion path.
> > >
> > > Fixes: 0cebaccef3ac ("rds: zerocopy Tx support.")
> > > Cc: stable@kernel.org
> > > Reported-by: Yuan Tan <yuantan098@gmail.com>
> > > Reported-by: Yifan Wu <yifanwucs@gmail.com>
> > > Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> > > Reported-by: Xin Liu <bird@lzu.edu.cn>
> > > Co-developed-by: Xiao Liu <lx24@stu.ynu.edu.cn>
> > > Signed-off-by: Xiao Liu <lx24@stu.ynu.edu.cn>
> > > Signed-off-by: Nan Li <tonanli66@gmail.com>
> > > Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
> >
> > This fix looks fine to me. Thanks Ren Wei!
> > Reviewed-by: Allison Henderson <achender@kernel.org>
>
> Note that sashiko spotted more pre-existing problems in this area,
> please have a look:
>
> https://sashiko.dev/#/patchset/d2ea98a6313d5467bac00f7c9fef8c7acddb9258.1777550074.git.tonanli66%40gmail.com
>
> /P
Thanks for pointing this out. It looks like a valid catch; I will see if I can get a fix out today or tomorrow.
Thank you!
Allison
>
^ permalink raw reply
* [PATCH net] net: shaper: Reject reparenting of existing nodes
From: Mohsin Bashir @ 2026-05-05 17:43 UTC (permalink / raw)
To: netdev
Cc: alexanderduyck, davem, edumazet, horms, kees, kuba, linux-kernel,
mohsin.bashr, p, pabeni
When an existing node-scope shaper is moved to a different parent
via the group operation, the framework fails to update the leaves
count on both the old and new parent shapers. Only newly created
nodes (handle.id == NET_SHAPER_ID_UNSPEC) trigger the parent
leaves increment at line 1039.
This causes the parent's leaves counter to diverge from the
actual number of children in the xarray. When the node is later
deleted, pre_del_node() allocates an array sized by the stale
leaves count, but the xarray iteration finds more children than
expected, hitting the WARN_ON_ONCE guard and returning -EINVAL.
Rather than adding reparenting support with complex leaves count
bookkeeping, reject group calls that attempt to change an existing
node's parent. Updates to an existing node's rate or leaves under
the same parent remain permitted. For any modification of the
topology, users are expected to create new groups and let the
kernel garbage-collect the leaf-less nodes.
Fixes: 5d5d4700e75d ("net-shapers: implement NL group operation")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <hmohsin@meta.com>
---
net/shaper/shaper.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/net/shaper/shaper.c b/net/shaper/shaper.c
index 94bc9c7382ea..81e32e91bade 100644
--- a/net/shaper/shaper.c
+++ b/net/shaper/shaper.c
@@ -966,13 +966,26 @@ static int __net_shaper_group(struct net_shaper_binding *binding,
if (node->handle.scope == NET_SHAPER_SCOPE_NODE) {
new_node = node->handle.id == NET_SHAPER_ID_UNSPEC;
- if (!new_node && !net_shaper_lookup(binding, &node->handle)) {
- /* The related attribute is not available when
- * reaching here from the delete() op.
- */
- NL_SET_ERR_MSG_FMT(extack, "Node shaper %d:%d does not exists",
- node->handle.scope, node->handle.id);
- return -ENOENT;
+ if (!new_node) {
+ struct net_shaper *cur;
+
+ cur = net_shaper_lookup(binding, &node->handle);
+ if (!cur) {
+ /* The related attribute is not available
+ * when reaching here from the delete() op.
+ */
+ NL_SET_ERR_MSG_FMT(extack, "Node shaper %d:%d does not exist",
+ node->handle.scope,
+ node->handle.id);
+ return -ENOENT;
+ }
+ if (net_shaper_handle_cmp(&cur->parent,
+ &node->parent)) {
+ NL_SET_ERR_MSG_FMT(extack, "Cannot reparent node shaper %d:%d",
+ node->handle.scope,
+ node->handle.id);
+ return -EOPNOTSUPP;
+ }
}
/* When unspecified, the node parent scope is inherited from
--
2.52.0
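The reparenting check described in the commit message above can be modelled in userspace. The following is only a minimal sketch under stated assumptions: the `demo_*` types and helpers and the `DEMO_ID_UNSPEC` value are hypothetical stand-ins for the kernel's shaper structures, the xarray is replaced by a plain array, and parent-scope inheritance is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's shaper types. */
enum demo_scope { DEMO_SCOPE_NETDEV, DEMO_SCOPE_QUEUE, DEMO_SCOPE_NODE };

/* Placeholder only; the real NET_SHAPER_ID_UNSPEC value is not shown here. */
#define DEMO_ID_UNSPEC UINT_MAX

struct demo_handle { int scope; unsigned int id; };
struct demo_shaper { struct demo_handle handle; struct demo_handle parent; };

/* Returns non-zero when the two handles differ (cmp-style helper). */
static int demo_handle_cmp(const struct demo_handle *a,
			   const struct demo_handle *b)
{
	return a->scope != b->scope || a->id != b->id;
}

/* Linear lookup standing in for the kernel's xarray-based lookup. */
static const struct demo_shaper *demo_lookup(const struct demo_shaper *tbl,
					     size_t n,
					     const struct demo_handle *h)
{
	for (size_t i = 0; i < n; i++)
		if (!demo_handle_cmp(&tbl[i].handle, h))
			return &tbl[i];
	return NULL;
}

/* Validate a group request for a node-scope shaper. */
static int demo_check_group_node(const struct demo_shaper *tbl, size_t n,
				 const struct demo_shaper *node)
{
	const struct demo_shaper *cur;

	if (node->handle.scope != DEMO_SCOPE_NODE)
		return 0;
	if (node->handle.id == DEMO_ID_UNSPEC)
		return 0;		/* new node: nothing to compare against */

	cur = demo_lookup(tbl, n, &node->handle);
	if (!cur)
		return -ENOENT;		/* existing node was requested but not found */
	if (demo_handle_cmp(&cur->parent, &node->parent))
		return -EOPNOTSUPP;	/* parent differs: reparenting is rejected */
	return 0;			/* same parent: update is allowed */
}
```

In this model, an update that keeps the parent returns 0, a request naming a different parent returns -EOPNOTSUPP, and an unknown node ID returns -ENOENT, which mirrors the error paths in the diff.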
* Re: [PATCH net-next v2 1/6] net: add netmem_tx modes that indicate dma capability
From: Harshitha Ramamurthy @ 2026-05-05 17:41 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
Joshua Washington, Saeed Mahameed, Tariq Toukan, Mark Bloch,
Leon Romanovsky, Alexander Duyck, kernel-team, Daniel Borkmann,
Nikolay Aleksandrov, Shuah Khan, netdev, linux-doc, linux-kernel,
linux-rdma, bpf, linux-kselftest, Stanislav Fomichev,
Mina Almasry, Bobby Eshleman
In-Reply-To: <20260504-tcp-dm-netkit-v2-1-56d52ac72fd4@meta.com>
On Mon, May 4, 2026 at 5:27 PM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
>
> From: Bobby Eshleman <bobbyeshleman@meta.com>
>
> Devices that support netmem TX previously set dev->netmem_tx = true.
> This was checked in validate_xmit_unreadable_skb() to drop unreadable
> skbs (skbs with dmabuf-backed frags) before they reach drivers that
> would mishandle them or devices that would not have the iommu mappings
> for them.
>
> Some virtual devices like netkit (or ifb) never DMA and never touch frag
> contents, as they essentially just forward the skb to another device.
> They are unable to forward unreadable skbs, however, because they fail
> to pass TX validation checks on dev->netmem_tx. This single bit flag
> doesn't give the TX validator enough information to differentiate
> devices that will attempt DMA on the unreadable skb and those that will
> simply route it untouched.
>
> This patch fixes this issue by adding an additional bit to netmem_tx, so
> that drivers can indicate 1) if they have netmem support, and 2) if they
> do, are they DMA-capable or not?
>
> Replace the boolean with a 2-bit enum:
>
> NETMEM_TX_NONE - no netmem TX support (drop unreadable skbs)
> NETMEM_TX_DMA - full support, device does DMA
> NETMEM_TX_NO_DMA - pass-through, device never DMAs
>
> Update drivers to reflect these definitions. NIC drivers use
> NETMEM_TX_DMA, and netkit uses NETMEM_TX_NO_DMA.
>
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> ---
> Changes in v2:
> - Squash driver conversion patches (2-5) into patch 1 (Jakub)
> ---
> Documentation/networking/net_cachelines/net_device.rst | 2 +-
> Documentation/networking/netmem.rst | 8 +++++++-
> Documentation/translations/zh_CN/networking/netmem.rst | 7 ++++++-
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
> drivers/net/ethernet/google/gve/gve_main.c | 2 +-
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
> drivers/net/ethernet/meta/fbnic/fbnic_netdev.c | 2 +-
> drivers/net/netkit.c | 1 +
> include/linux/netdevice.h | 11 +++++++++--
> 9 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst
> index 1c19bb7705df..c85784259544 100644
> --- a/Documentation/networking/net_cachelines/net_device.rst
> +++ b/Documentation/networking/net_cachelines/net_device.rst
> @@ -10,7 +10,7 @@ Type Name fastpath_tx_acce
> =================================== =========================== =================== =================== ===================================================================================
> unsigned_long:32 priv_flags read_mostly __dev_queue_xmit(tx)
> unsigned_long:1 lltx read_mostly HARD_TX_LOCK,HARD_TX_TRYLOCK,HARD_TX_UNLOCK(tx)
> -unsigned long:1 netmem_tx:1; read_mostly
> +unsigned long:2 netmem_tx:2; read_mostly
> char name[16]
> struct netdev_name_node* name_node
> struct dev_ifalias* ifalias
> diff --git a/Documentation/networking/netmem.rst b/Documentation/networking/netmem.rst
> index b63aded46337..217869d1108d 100644
> --- a/Documentation/networking/netmem.rst
> +++ b/Documentation/networking/netmem.rst
> @@ -95,4 +95,10 @@ Driver TX Requirements
> netdev@, or reach out to the maintainers and/or almasrymina@google.com for
> help adding the netmem API.
>
> -2. Driver should declare support by setting `netdev->netmem_tx = true`
> +2. Driver should declare support by setting `netdev->netmem_tx` to the
> + appropriate mode:
> +
> + - `NETMEM_TX_DMA`: for physical devices that perform DMA.
> +
> + - `NETMEM_TX_NO_DMA`: for virtual or passthrough devices that do
> + not DMA, but still support handling of netmem-backed skbs.
> diff --git a/Documentation/translations/zh_CN/networking/netmem.rst b/Documentation/translations/zh_CN/networking/netmem.rst
> index fe351a240f02..320f3eacf51b 100644
> --- a/Documentation/translations/zh_CN/networking/netmem.rst
> +++ b/Documentation/translations/zh_CN/networking/netmem.rst
> @@ -89,4 +89,9 @@ dma-mapping API 去处理。
> 使用某个还不存在的 netmem API,你可以自行添加并提交到 netdev@,也可以联系维护
> 人员或者发送邮件至 almasrymina@google.com 寻求帮助。
>
> -2. 驱动程序应通过设置 netdev->netmem_tx = true 来表明自身支持 netmem 功能。
> +2. 驱动程序应将 `netdev->netmem_tx` 设置为适当的模式:
> +
> + - `NETMEM_TX_DMA`:适用于执行 DMA 的物理设备。
> +
> + - `NETMEM_TX_NO_DMA`:适用于不执行 DMA 的虚拟或透传设备,但仍支持
> + 处理 netmem 支持的 skb。
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 8c55874f44ca..ed9c22dc4a5a 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -17120,7 +17120,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops_unsupp;
> if (BNXT_SUPPORTS_QUEUE_API(bp))
> dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops;
> - dev->netmem_tx = true;
> + dev->netmem_tx = NETMEM_TX_DMA;
>
> rc = register_netdev(dev);
> if (rc)
> diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
> index 424d973c97f2..dd2b8f087163 100644
> --- a/drivers/net/ethernet/google/gve/gve_main.c
> +++ b/drivers/net/ethernet/google/gve/gve_main.c
> @@ -2894,7 +2894,7 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> goto abort_with_wq;
>
> if (!gve_is_gqi(priv) && !gve_is_qpl(priv))
> - dev->netmem_tx = true;
> + dev->netmem_tx = NETMEM_TX_DMA;
Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
>
> err = register_netdev(dev);
> if (err)
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 5a46870c4b74..fc49aae38807 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -5924,7 +5924,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
>
> netdev->priv_flags |= IFF_UNICAST_FLT;
>
> - netdev->netmem_tx = true;
> + netdev->netmem_tx = NETMEM_TX_DMA;
>
> netif_set_tso_max_size(netdev, GSO_MAX_SIZE);
> mlx5e_set_xdp_feature(priv);
> diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
> index c406a3b56b37..138e522ef9b9 100644
> --- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
> +++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
> @@ -752,7 +752,7 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
> netdev->netdev_ops = &fbnic_netdev_ops;
> netdev->stat_ops = &fbnic_stat_ops;
> netdev->queue_mgmt_ops = &fbnic_queue_mgmt_ops;
> - netdev->netmem_tx = true;
> + netdev->netmem_tx = NETMEM_TX_DMA;
>
> fbnic_set_ethtool_ops(netdev);
>
> diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c
> index 5e2eecc3165d..0ad6a806d7d5 100644
> --- a/drivers/net/netkit.c
> +++ b/drivers/net/netkit.c
> @@ -466,6 +466,7 @@ static void netkit_setup(struct net_device *dev)
> dev->priv_flags |= IFF_NO_QUEUE;
> dev->priv_flags |= IFF_DISABLE_NETPOLL;
> dev->lltx = true;
> + dev->netmem_tx = NETMEM_TX_NO_DMA;
>
> dev->netdev_ops = &netkit_netdev_ops;
> dev->ethtool_ops = &netkit_ethtool_ops;
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0e1e581efc5a..11d68e75eb4f 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1788,6 +1788,12 @@ enum netdev_stat_type {
> NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
> };
>
> +enum netmem_tx_mode {
> + NETMEM_TX_NONE, /* no netmem TX support */
> + NETMEM_TX_DMA, /* DMA-capable netmem TX (real HW) */
> + NETMEM_TX_NO_DMA, /* no DMA, e.g. passthrough for virtual devs */
> +};
> +
> enum netdev_reg_state {
> NETREG_UNINITIALIZED = 0,
> NETREG_REGISTERED, /* completed register_netdevice */
> @@ -1809,7 +1815,8 @@ enum netdev_reg_state {
> * @lltx: device supports lockless Tx. Deprecated for real HW
> * drivers. Mainly used by logical interfaces, such as
> * bonding and tunnels
> - * @netmem_tx: device support netmem_tx.
> + * @netmem_tx: device netmem TX mode (NETMEM_TX_NONE, NETMEM_TX_DMA,
> + * or NETMEM_TX_NO_DMA).
> *
> * @name: This is the first field of the "visible" part of this structure
> * (i.e. as seen by users in the "Space.c" file). It is the name
> @@ -2132,7 +2139,7 @@ struct net_device {
> struct_group(priv_flags_fast,
> unsigned long priv_flags:32;
> unsigned long lltx:1;
> - unsigned long netmem_tx:1;
> + unsigned long netmem_tx:2;
> );
> const struct net_device_ops *netdev_ops;
> const struct header_ops *header_ops;
>
> --
> 2.52.0
>
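As a rough illustration of the three modes quoted above, the sketch below stores the enum in a 2-bit field and derives two decisions from it. The enum names come from the patch; `struct demo_netdev` and both helper functions are hypothetical, and the real TX validation logic is not part of this patch and may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Enum values as defined in the quoted patch. */
enum netmem_tx_mode {
	NETMEM_TX_NONE,		/* no netmem TX support */
	NETMEM_TX_DMA,		/* DMA-capable netmem TX (real HW) */
	NETMEM_TX_NO_DMA,	/* no DMA, e.g. passthrough for virtual devs */
};

/* Hypothetical miniature of struct net_device; only the flag bits matter.
 * The kernel uses unsigned long bitfields, unsigned int is used here for
 * portability.
 */
struct demo_netdev {
	unsigned int lltx:1;
	unsigned int netmem_tx:2;	/* two bits can hold the values 0..3 */
};

/* Hypothetical check: may an unreadable skb be handed to this device? */
static bool demo_accepts_unreadable(const struct demo_netdev *dev)
{
	return dev->netmem_tx != NETMEM_TX_NONE;
}

/* Hypothetical check: will the device itself DMA from the frags? */
static bool demo_device_does_dma(const struct demo_netdev *dev)
{
	return dev->netmem_tx == NETMEM_TX_DMA;
}
```

A 1-bit field could only distinguish "supported" from "unsupported"; the second bit is what lets a pass-through device such as netkit be told apart from a NIC that needs DMA mappings.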
* [PATCH net-next v3 5/5] net/sched: netem: add per-impairment extended statistics
From: Stephen Hemminger @ 2026-05-05 17:38 UTC (permalink / raw)
To: netdev
Cc: jhs, jiri, Stephen Hemminger, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, open list
In-Reply-To: <20260505174059.651260-1-stephen@networkplumber.org>
Adds new counters that keep track of when netem applied
impairments (delay, loss, corruption, duplication, reordering).
Add a struct tc_netem_xstats reported via TCA_STATS_APP so that
userspace (tc -s qdisc show) can display per-impairment counters.
Also fixes a pre-existing accounting bug: ECN CE-marked packets
were being counted as drops (that is not what other qdiscs do).
Now that there is a proper ecn_marked counter,
the spurious qdisc_qstats_drop() call can be removed.
These counters are for information only and follow the
access pattern done by other qdisc implementations.
Accompanying iproute2 change is submitted separately.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
include/uapi/linux/pkt_sched.h | 9 +++++++++
net/sched/sch_netem.c | 36 ++++++++++++++++++++++++++++++++--
2 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 66e8072f44df..1c84c8076e22 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -569,6 +569,15 @@ struct tc_netem_gemodel {
#define NETEM_DIST_SCALE 8192
#define NETEM_DIST_MAX 16384
+struct tc_netem_xstats {
+ __u64 delayed; /* packets delayed */
+ __u64 dropped; /* packets dropped by loss model */
+ __u64 corrupted; /* packets with bit errors injected */
+ __u64 duplicated; /* duplicate packets generated */
+ __u64 reordered; /* packets sent out of order */
+ __u64 ecn_marked; /* packets ECN CE-marked (not dropped)*/
+};
+
/* DRR */
enum {
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 020dbfb6e4a7..af6b95922a5d 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -152,6 +152,14 @@ struct netem_sched_data {
u8 state;
} clg;
+ /* Impairment counters */
+ u64 delayed;
+ u64 dropped;
+ u64 corrupted;
+ u64 duplicated;
+ u64 ecn_marked;
+ u64 reordered;
+
/* Cold tail: slot reschedule config and the watchdog timer. */
struct tc_netem_slot slot_config;
struct qdisc_watchdog watchdog;
@@ -462,17 +470,21 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
skb->prev = NULL;
/* Random duplication */
- if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
+ if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng)) {
++count;
+ WRITE_ONCE(q->duplicated, q->duplicated + 1);
+ }
/* Drop packet? */
if (loss_event(q)) {
if (q->ecn && INET_ECN_set_ce(skb))
- qdisc_qstats_drop(sch); /* mark packet */
+ WRITE_ONCE(q->ecn_marked, q->ecn_marked + 1);
else
--count;
}
+
if (count == 0) {
+ WRITE_ONCE(q->dropped, q->dropped + 1);
qdisc_qstats_drop(sch);
__qdisc_drop(skb, to_free);
return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
@@ -518,6 +530,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (skb->len) {
u32 offset = get_random_u32_below(skb->len);
skb->data[offset] ^= 1 << get_random_u32_below(8);
+ WRITE_ONCE(q->corrupted, q->corrupted + 1);
}
}
@@ -599,12 +612,15 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
cb->time_to_send = now + delay;
++q->counter;
+ WRITE_ONCE(q->delayed, q->delayed + 1);
+
tfifo_enqueue(skb, sch);
} else {
/*
* Do re-ordering by putting one out of N packets at the front
* of the queue.
*/
+ WRITE_ONCE(q->reordered, q->reordered + 1);
cb->time_to_send = ktime_get_ns();
q->counter = 0;
@@ -1345,6 +1361,21 @@ static int netem_dump(struct Qdisc *sch, struct sk_buff *skb)
return -1;
}
+static int netem_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+ struct netem_sched_data *q = qdisc_priv(sch);
+ struct tc_netem_xstats st = {
+ .delayed = READ_ONCE(q->delayed),
+ .dropped = READ_ONCE(q->dropped),
+ .corrupted = READ_ONCE(q->corrupted),
+ .duplicated = READ_ONCE(q->duplicated),
+ .reordered = READ_ONCE(q->reordered),
+ .ecn_marked = READ_ONCE(q->ecn_marked),
+ };
+
+ return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
static int netem_dump_class(struct Qdisc *sch, unsigned long cl,
struct sk_buff *skb, struct tcmsg *tcm)
{
@@ -1407,6 +1438,7 @@ static struct Qdisc_ops netem_qdisc_ops __read_mostly = {
.destroy = netem_destroy,
.change = netem_change,
.dump = netem_dump,
+ .dump_stats = netem_dump_stats,
.owner = THIS_MODULE,
};
MODULE_ALIAS_NET_SCH("netem");
--
2.53.0
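The drop versus ECN-mark accounting changed by the patch above can be modelled as a pure function. This is a simplified userspace sketch: the random decisions are passed in as booleans, `struct demo_xstats` and `demo_account()` are hypothetical names, and the WRITE_ONCE()/READ_ONCE() access pattern is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the per-impairment counters. */
struct demo_xstats {
	uint64_t dropped;
	uint64_t duplicated;
	uint64_t ecn_marked;
};

/* Returns how many copies of the packet go on to be queued (0 = dropped).
 * dup_hit/loss_hit stand in for the random duplication and loss decisions,
 * ce_mark_ok for a successful CE marking of the packet.
 */
static int demo_account(struct demo_xstats *st, bool dup_hit, bool loss_hit,
			bool ecn_enabled, bool ce_mark_ok)
{
	int count = 1;

	if (dup_hit) {
		++count;
		st->duplicated++;
	}

	if (loss_hit) {
		if (ecn_enabled && ce_mark_ok)
			st->ecn_marked++;	/* marked, not dropped */
		else
			--count;
	}

	if (count == 0)
		st->dropped++;			/* only real drops are counted */

	return count;
}
```

With this accounting a CE-marked packet increments only `ecn_marked`, while a loss event that coincides with a duplication leaves one copy queued and does not count as a drop.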
* [PATCH net-next v3 4/5] net/sched: netem: handle multi-segment skb in corruption
From: Stephen Hemminger @ 2026-05-05 17:38 UTC (permalink / raw)
To: netdev
Cc: jhs, jiri, Stephen Hemminger, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, open list
In-Reply-To: <20260505174059.651260-1-stephen@networkplumber.org>
The packet corruption code only flipped bits in the linear
header portion of the skb, skipping corruption when
skb_headlen() was zero.
Linearize the whole skb if necessary before corruption.
That step also makes calling skb_unshare() unnecessary.
Extends d64cb81dcbd5 ("net/sched: sch_netem: fix out-of-bounds access
in packet corruption") with a more general solution.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
net/sched/sch_netem.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 53bdfa77ee2d..020dbfb6e4a7 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -508,21 +508,17 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
qdisc_skb_cb(skb)->pkt_len = skb->len;
}
- skb = skb_unshare(skb, GFP_ATOMIC);
- if (unlikely(!skb)) {
- qdisc_qstats_drop(sch);
- goto finish_segs;
- }
- if (skb->ip_summed == CHECKSUM_PARTIAL &&
- skb_checksum_help(skb)) {
+ if (skb_linearize(skb) ||
+ (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))) {
qdisc_drop(skb, sch, to_free);
skb = NULL;
goto finish_segs;
}
- if (skb_headlen(skb))
- skb->data[get_random_u32_below(skb_headlen(skb))] ^=
- 1 << get_random_u32_below(8);
+ if (skb->len) {
+ u32 offset = get_random_u32_below(skb->len);
+ skb->data[offset] ^= 1 << get_random_u32_below(8);
+ }
}
if (unlikely(sch->q.qlen >= sch->limit)) {
--
2.53.0
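Once the packet is linear, the corruption step reduces to flipping one random bit anywhere in the buffer. A minimal userspace sketch follows, assuming a plain byte array in place of the skb and `rand()` as a stand-in for get_random_u32_below() (so it has modulo bias and is not suitable beyond illustration); the `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Flip one randomly chosen bit in buf[0..len). Does nothing when len == 0,
 * matching the "if (skb->len)" guard in the patch.
 */
static void demo_corrupt(uint8_t *buf, size_t len)
{
	if (len) {
		size_t offset = (size_t)rand() % len;

		buf[offset] ^= (uint8_t)(1u << (rand() % 8));
	}
}

/* Count the bits that differ between two buffers of equal length. */
static unsigned int demo_bit_diff(const uint8_t *a, const uint8_t *b,
				  size_t len)
{
	unsigned int n = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t x = a[i] ^ b[i];

		for (; x; x &= (uint8_t)(x - 1))	/* clear lowest set bit */
			n++;
	}
	return n;
}
```

Comparing a corrupted buffer against an untouched copy with `demo_bit_diff()` should report exactly one differing bit, whatever offset was picked.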
* [PATCH net-next v3 3/5] net/sched: netem: replace pr_info with netlink extack error messages
From: Stephen Hemminger @ 2026-05-05 17:38 UTC (permalink / raw)
To: netdev
Cc: jhs, jiri, Stephen Hemminger, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, open list
In-Reply-To: <20260505174059.651260-1-stephen@networkplumber.org>
Use netlink extack to report errors instead of sending them
to the kernel log with pr_info(). The error message can then be seen
with tc commands, which also avoids log spam.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
net/sched/sch_netem.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 2b0b5c032e70..53bdfa77ee2d 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -921,7 +921,8 @@ static void get_rate(struct netem_sched_data *q, const struct nlattr *attr)
q->cell_size_reciprocal = (struct reciprocal_value) { 0 };
}
-static int get_loss_clg(struct netem_sched_data *q, const struct nlattr *attr)
+static int get_loss_clg(struct netem_sched_data *q, const struct nlattr *attr,
+ struct netlink_ext_ack *extack)
{
const struct nlattr *la;
int rem;
@@ -934,7 +935,8 @@ static int get_loss_clg(struct netem_sched_data *q, const struct nlattr *attr)
const struct tc_netem_gimodel *gi = nla_data(la);
if (nla_len(la) < sizeof(struct tc_netem_gimodel)) {
- pr_info("netem: incorrect gi model size\n");
+ NL_SET_ERR_MSG_ATTR(extack, la,
+ "netem: incorrect gi model size");
return -EINVAL;
}
@@ -953,7 +955,8 @@ static int get_loss_clg(struct netem_sched_data *q, const struct nlattr *attr)
const struct tc_netem_gemodel *ge = nla_data(la);
if (nla_len(la) < sizeof(struct tc_netem_gemodel)) {
- pr_info("netem: incorrect ge model size\n");
+ NL_SET_ERR_MSG_ATTR(extack, la,
+ "netem: incorrect ge model size");
return -EINVAL;
}
@@ -967,7 +970,8 @@ static int get_loss_clg(struct netem_sched_data *q, const struct nlattr *attr)
}
default:
- pr_info("netem: unknown loss type %u\n", type);
+ NL_SET_ERR_MSG_ATTR_FMT(extack, la,
+ "netem: unknown loss type %u", type);
return -EINVAL;
}
}
@@ -990,19 +994,21 @@ static const struct nla_policy netem_policy[TCA_NETEM_MAX + 1] = {
};
static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla,
- const struct nla_policy *policy, int len)
+ const struct nla_policy *policy, int len,
+ struct netlink_ext_ack *extack)
{
int nested_len = nla_len(nla) - NLA_ALIGN(len);
if (nested_len < 0) {
- pr_info("netem: invalid attributes len %d\n", nested_len);
+ NL_SET_ERR_MSG_FMT(extack, "netem: invalid attributes len %d < %d",
+ nla_len(nla), NLA_ALIGN(len));
return -EINVAL;
}
if (nested_len >= nla_attr_size(0))
return nla_parse_deprecated(tb, maxtype,
nla_data(nla) + NLA_ALIGN(len),
- nested_len, policy, NULL);
+ nested_len, policy, extack);
memset(tb, 0, sizeof(struct nlattr *) * (maxtype + 1));
return 0;
@@ -1057,7 +1063,7 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
int ret;
qopt = nla_data(opt);
- ret = parse_attr(tb, TCA_NETEM_MAX, opt, netem_policy, sizeof(*qopt));
+ ret = parse_attr(tb, TCA_NETEM_MAX, opt, netem_policy, sizeof(*qopt), extack);
if (ret < 0)
return ret;
@@ -1097,7 +1103,7 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
old_loss_model = q->loss_model;
if (tb[TCA_NETEM_LOSS]) {
- ret = get_loss_clg(q, tb[TCA_NETEM_LOSS]);
+ ret = get_loss_clg(q, tb[TCA_NETEM_LOSS], extack);
if (ret) {
q->loss_model = old_loss_model;
q->clg = old_clg;
@@ -1193,8 +1199,6 @@ static int netem_init(struct Qdisc *sch, struct nlattr *opt,
prandom_seed_state(&q->prng.prng_state, q->prng.seed);
ret = netem_change(sch, opt, extack);
- if (ret)
- pr_info("netem: change failed\n");
return ret;
}
--
2.53.0
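The idea of returning the error text to the caller instead of logging it can be sketched outside the kernel. The example below models only the length check from parse_attr(); `struct demo_extack`, `demo_parse_attr()` and `DEMO_NLA_ALIGN` are hypothetical userspace stand-ins (netlink attributes are 4-byte aligned, which is what the macro assumes), not the kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Round up to the 4-byte netlink attribute alignment. */
#define DEMO_NLA_ALIGN(len) (((len) + 3) & ~3)

/* Hypothetical stand-in for struct netlink_ext_ack: just a message buffer. */
struct demo_extack {
	char msg[80];
};

/* Check that the attribute is long enough to hold the fixed header.
 * On failure the reason is written to extack (if provided) rather than
 * to a log, so the requester sees why the request was rejected.
 */
static int demo_parse_attr(int attr_len, int hdr_len,
			   struct demo_extack *extack)
{
	int nested_len = attr_len - DEMO_NLA_ALIGN(hdr_len);

	if (nested_len < 0) {
		if (extack)
			snprintf(extack->msg, sizeof(extack->msg),
				 "netem: invalid attributes len %d < %d",
				 attr_len, DEMO_NLA_ALIGN(hdr_len));
		return -EINVAL;
	}
	return 0;
}
```

The message reports both the received length and the required aligned length, which is more useful to the user than the negative difference the old pr_info() printed.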
* [PATCH net-next v3 2/5] net/sched: netem: remove useless VERSION
From: Stephen Hemminger @ 2026-05-05 17:38 UTC (permalink / raw)
To: netdev
Cc: jhs, jiri, Stephen Hemminger, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, open list
In-Reply-To: <20260505174059.651260-1-stephen@networkplumber.org>
The version printed was never updated, and the kernel version is a
better indication of what is fixed or not.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
net/sched/sch_netem.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 616d33879fdc..2b0b5c032e70 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -27,8 +27,6 @@
#include <net/pkt_sched.h>
#include <net/inet_ecn.h>
-#define VERSION "1.3"
-
/* Network Emulation Queuing algorithm.
====================================
@@ -1413,16 +1411,15 @@ static struct Qdisc_ops netem_qdisc_ops __read_mostly = {
};
MODULE_ALIAS_NET_SCH("netem");
-
static int __init netem_module_init(void)
{
- pr_info("netem: version " VERSION "\n");
return register_qdisc(&netem_qdisc_ops);
}
static void __exit netem_module_exit(void)
{
unregister_qdisc(&netem_qdisc_ops);
}
+
module_init(netem_module_init)
module_exit(netem_module_exit)
MODULE_LICENSE("GPL");
--
2.53.0
* [PATCH net-next v3 1/5] net/sched: netem: reorder struct netem_sched_data
From: Stephen Hemminger @ 2026-05-05 17:38 UTC (permalink / raw)
To: netdev
Cc: jhs, jiri, Stephen Hemminger, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, open list
In-Reply-To: <20260505174059.651260-1-stephen@networkplumber.org>
The current layout of struct netem_sched_data can be improved
by optimizing cache locality, compacting data types (use u8
for enum) and eliminating unused elements.
Reorganize the struct as follows:
- Cacheline 0 holds the tfifo state (t_root/t_head/t_tail/t_len),
counter, and the unconditional enqueue scalars
latency/jitter/rate/gap/loss.
- Cacheline 1 holds the remaining zero-check scalars
(duplicate/reorder/corrupt/ecn), all five crndstate correlation
structures, and loss_model.
- Cacheline 2 holds prng, delay_dist, the slot dequeue state,
slot_dist, and the inner classful qdisc pointer.
- Rate-shaping fields, q->limit (config-only; the fast path reads
sch->limit), and the CLG Markov state move to the warm tail.
- tc_netem_slot slot_config and qdisc_watchdog (only consulted on
slot reschedule and watchdog wake) move to the cold tail.
Also reorder struct clgstate to place the u8 state member after the
u32 transition probabilities. This removes the 3-byte interior hole
without changing the struct's size.
Should have no functional change.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
net/sched/sch_netem.c | 123 +++++++++++++++++++++---------------------
1 file changed, 63 insertions(+), 60 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index bc18e1976b6e..616d33879fdc 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -71,89 +71,92 @@ struct disttable {
s16 table[] __counted_by(size);
};
-struct netem_sched_data {
- /* internal t(ime)fifo qdisc uses t_root and sch->limit */
- struct rb_root t_root;
-
- /* a linear queue; reduces rbtree rebalancing when jitter is low */
- struct sk_buff *t_head;
- struct sk_buff *t_tail;
-
- u32 t_len;
-
- /* optional qdisc for classful handling (NULL at netem init) */
- struct Qdisc *qdisc;
-
- struct qdisc_watchdog watchdog;
+/* Loss models */
+enum {
+ CLG_RANDOM,
+ CLG_4_STATES,
+ CLG_GILB_ELL,
+};
- s64 latency;
- s64 jitter;
+/* States in GE model */
+enum {
+ GOOD_STATE = 1,
+ BAD_STATE,
+};
- u32 loss;
- u32 ecn;
- u32 limit;
- u32 counter;
- u32 gap;
- u32 duplicate;
- u32 reorder;
- u32 corrupt;
- u64 rate;
- s32 packet_overhead;
- u32 cell_size;
- struct reciprocal_value cell_size_reciprocal;
- s32 cell_overhead;
+/* States in 4 state model */
+enum {
+ TX_IN_GAP_PERIOD = 1,
+ TX_IN_BURST_PERIOD,
+ LOST_IN_GAP_PERIOD,
+ LOST_IN_BURST_PERIOD,
+};
+struct netem_sched_data {
+ /* Cacheline 0: tfifo state and per-packet enqueue/dequeue scalars. */
+ struct rb_root t_root;
+ struct sk_buff *t_head;
+ struct sk_buff *t_tail;
+ u32 t_len;
+ u32 counter;
+ s64 latency;
+ s64 jitter;
+ u64 rate;
+ u32 gap;
+ u32 loss;
+
+ /* Cacheline 1: zero-check scalars and correlation states. */
+ u32 duplicate;
+ u32 reorder;
+ u32 corrupt;
+ u32 ecn;
struct crndstate {
u32 last;
u32 rho;
} delay_cor, loss_cor, dup_cor, reorder_cor, corrupt_cor;
+ u8 loss_model;
- struct prng {
+ /* Cacheline 2: PRNG, distribution tables, slot dequeue state etc. */
+ struct prng {
u64 seed;
struct rnd_state prng_state;
} prng;
+ struct disttable *delay_dist;
+ struct slotstate {
+ u64 slot_next;
+ s32 packets_left;
+ s32 bytes_left;
+ } slot;
+ struct disttable *slot_dist;
+ struct Qdisc *qdisc;
- struct disttable *delay_dist;
-
- enum {
- CLG_RANDOM,
- CLG_4_STATES,
- CLG_GILB_ELL,
- } loss_model;
-
- enum {
- TX_IN_GAP_PERIOD = 1,
- TX_IN_BURST_PERIOD,
- LOST_IN_GAP_PERIOD,
- LOST_IN_BURST_PERIOD,
- } _4_state_model;
-
- enum {
- GOOD_STATE = 1,
- BAD_STATE,
- } GE_state_model;
+ /*
+ * Warm: rate-shaping parameters (only read when rate != 0) and
+ * configuration-only fields. The fast path reads sch->limit, not
+ * q->limit.
+ */
+ s32 packet_overhead;
+ u32 cell_size;
+ struct reciprocal_value cell_size_reciprocal;
+ s32 cell_overhead;
+ u32 limit;
/* Correlated Loss Generation models */
struct clgstate {
- /* state of the Markov chain */
- u8 state;
-
/* 4-states and Gilbert-Elliot models */
u32 a1; /* p13 for 4-states or p for GE */
u32 a2; /* p31 for 4-states or r for GE */
u32 a3; /* p32 for 4-states or h for GE */
u32 a4; /* p14 for 4-states or 1-k for GE */
u32 a5; /* p23 used only in 4-states */
- } clg;
- struct tc_netem_slot slot_config;
- struct slotstate {
- u64 slot_next;
- s32 packets_left;
- s32 bytes_left;
- } slot;
+ /* state of the Markov chain */
+ u8 state;
+ } clg;
- struct disttable *slot_dist;
+ /* Cold tail: slot reschedule config and the watchdog timer. */
+ struct tc_netem_slot slot_config;
+ struct qdisc_watchdog watchdog;
};
/* Time stamp put into socket buffer control block
--
2.53.0
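The effect of moving the u8 state member in struct clgstate can be checked with offsetof() and sizeof. The sketch below uses two hypothetical userspace copies of the struct, and the numbers in the comments assume a common ABI where uint32_t is 4-byte aligned (for example x86-64 or arm64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old layout: the 1-byte state comes first, so 3 padding bytes are needed
 * before the first 4-byte aligned member.
 */
struct demo_clg_old {
	uint8_t state;
	uint32_t a1, a2, a3, a4, a5;
};

/* New layout: state comes last, so the padding moves to the tail. */
struct demo_clg_new {
	uint32_t a1, a2, a3, a4, a5;
	uint8_t state;
};

/* Size of the interior hole between state and a1 in the old layout. */
static size_t demo_old_interior_hole(void)
{
	return offsetof(struct demo_clg_old, a1) - sizeof(uint8_t);
}

/* Bytes of padding after state in the new layout. */
static size_t demo_new_tail_padding(void)
{
	return sizeof(struct demo_clg_new) -
	       offsetof(struct demo_clg_new, state) - sizeof(uint8_t);
}
```

On such an ABI both structs occupy 24 bytes; the reorder only turns the 3-byte interior hole into tail padding, which is consistent with the "without changing the struct's size" remark in the commit message.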
* [PATCH net-next v3 0/5] net/sched: netem: fixes and improvements
From: Stephen Hemminger @ 2026-05-05 17:38 UTC (permalink / raw)
To: netdev; +Cc: jhs, jiri, Stephen Hemminger
This is a collection of improvements to netem found while
investigating the fixes now in net tree.
V3 - fix the multi-skb corruption patch
- don't count ECN as dropped
- try and clarify to AI why statistics are ok
Stephen Hemminger (5):
net/sched: netem: reorder struct netem_sched_data
net/sched: netem: remove useless VERSION
net/sched: netem: replace pr_info with netlink extack error messages
net/sched: netem: handle multi-segment skb in corruption
net/sched: netem: add per-impairment extended statistics
include/uapi/linux/pkt_sched.h | 9 ++
net/sched/sch_netem.c | 206 +++++++++++++++++++--------------
2 files changed, 128 insertions(+), 87 deletions(-)
--
2.53.0