* [PATCH net-next v2 0/2] net: dsa: mxl862xx: add statistics support
From: Daniel Golle @ 2026-04-12 0:01 UTC
To: Daniel Golle, Andrew Lunn, Vladimir Oltean, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Russell King, netdev,
linux-kernel
Cc: Frank Wunderlich, Chad Monroe, Cezary Wilmanski, Liang Xu,
Benny (Ying-Tsan) Weng, Jose Maria Verdu Munoz, Avinash Jayaraman,
John Crispin
Add per-port RMON statistics support for the MxL862xx DSA driver,
covering hardware-specific ethtool -S counters, standard IEEE 802.3
MAC/ctrl/pause statistics, and rtnl_link_stats64 via polled 64-bit
accumulation.
Changes since v1:
* trim mxl862xx_mib[] to only those counters not covered elsewhere
* remove histogram counters (moved to .get_rmon_stats)
* remove RMON error counters (moved to .get_rmon_stats)
* remove counters already in .get_eth_mac_stats
* remove counters already in .get_stats64
* add mxl862xx_rmon_ranges[] and mxl862xx_get_rmon_stats()
Daniel Golle (2):
net: dsa: mxl862xx: add ethtool statistics support
net: dsa: mxl862xx: implement .get_stats64
drivers/net/dsa/mxl862xx/mxl862xx-api.h | 142 +++++++++
drivers/net/dsa/mxl862xx/mxl862xx-cmd.h | 3 +
drivers/net/dsa/mxl862xx/mxl862xx-host.c | 8 +-
drivers/net/dsa/mxl862xx/mxl862xx.c | 348 +++++++++++++++++++++++
drivers/net/dsa/mxl862xx/mxl862xx.h | 94 +++++-
5 files changed, 588 insertions(+), 7 deletions(-)
--
2.53.0
* [PATCH net-next v2 1/2] net: dsa: mxl862xx: add ethtool statistics support
From: Daniel Golle @ 2026-04-12 0:01 UTC
To: Daniel Golle, Andrew Lunn, Vladimir Oltean, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Russell King, netdev,
linux-kernel
Cc: Frank Wunderlich, Chad Monroe, Cezary Wilmanski, Liang Xu,
Benny (Ying-Tsan) Weng, Jose Maria Verdu Munoz, Avinash Jayaraman,
John Crispin
In-Reply-To: <cover.1775951347.git.daniel@makrotopia.org>
The MxL862xx firmware exposes per-port RMON counters through the
RMON_PORT_GET command, covering standard IEEE 802.3 MAC statistics
(unicast/multicast/broadcast packet and byte counts, collision
counters, pause frames) as well as hardware-specific counters such
as extended VLAN discard and MTU exceed events.
Add the RMON counter firmware API structures and command definitions.
Implement .get_strings, .get_sset_count, and .get_ethtool_stats for
legacy ethtool -S support. Implement .get_eth_mac_stats,
.get_eth_ctrl_stats, .get_pause_stats, and .get_rmon_stats for the
standardized ethtool statistics interface.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
v2:
* trim mxl862xx_mib[] to only those counters not covered elsewhere
* remove histogram counters (moved to .get_rmon_stats)
* remove RMON error counters (moved to .get_rmon_stats)
* remove counters already in .get_eth_mac_stats
* remove counters already in .get_stats64
* add mxl862xx_rmon_ranges[] and mxl862xx_get_rmon_stats()
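For readers unfamiliar with the table-driven approach that
.get_ethtool_stats uses in this patch, here is a minimal user-space
sketch. It is not the driver code: struct demo_cnt, demo_mib[] and the
demo_* helpers are hypothetical stand-ins, and only the idea of
(size, offset, name) descriptors built with offsetof() is taken from
the patch. The firmware structure is packed and a u8 precedes the
counters, so the counter fields are not naturally aligned; the sketch
therefore assembles each value byte by byte, which is one way (an
assumption here, not necessarily what the driver should do) to stay
independent of alignment and host endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct mxl862xx_rmon_port_cnt.
 * Byte arrays model the little-endian wire layout without needing
 * a compiler-specific packed attribute.
 */
struct demo_cnt {
	uint8_t pce_bypass;	/* odd-sized field: all that follows is unaligned */
	uint8_t filtered[4];	/* little-endian 32-bit counter */
	uint8_t bad_bytes[8];	/* little-endian 64-bit counter */
};

struct demo_desc {
	unsigned int size;	/* width in 32-bit words: 1 or 2, as in the patch */
	unsigned int offset;	/* byte offset of the counter in struct demo_cnt */
	const char *name;	/* string reported for the counter */
};

#define DEMO_DESC(_size, _name, _element)			\
{								\
	.size = (_size),					\
	.name = (_name),					\
	.offset = offsetof(struct demo_cnt, _element)		\
}

static const struct demo_desc demo_mib[] = {
	DEMO_DESC(1, "RxFilteredPkts", filtered),
	DEMO_DESC(2, "RxBadBytes", bad_bytes),
};

#define DEMO_MIB_COUNT (sizeof(demo_mib) / sizeof(demo_mib[0]))

/* Assemble an nbytes-wide little-endian value one byte at a time. */
static uint64_t demo_read_le(const uint8_t *p, unsigned int nbytes)
{
	uint64_t v = 0;
	unsigned int i;

	for (i = 0; i < nbytes; i++)
		v |= (uint64_t)p[i] << (8 * i);

	return v;
}

/* Walk the descriptor table and emit one u64 per counter, in table order. */
static void demo_get_stats(const struct demo_cnt *cnt, uint64_t *data)
{
	size_t i;

	for (i = 0; i < DEMO_MIB_COUNT; i++) {
		const uint8_t *field = (const uint8_t *)cnt + demo_mib[i].offset;

		*data++ = demo_read_le(field, demo_mib[i].size * 4);
	}
}
```

The string table and the value array stay in sync because both are
produced by iterating over the same descriptor array, which is the
property the ethtool -S interface depends on.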
drivers/net/dsa/mxl862xx/mxl862xx-api.h | 142 +++++++++++++++++++
drivers/net/dsa/mxl862xx/mxl862xx-cmd.h | 3 +
drivers/net/dsa/mxl862xx/mxl862xx.c | 173 ++++++++++++++++++++++++
3 files changed, 318 insertions(+)
diff --git a/drivers/net/dsa/mxl862xx/mxl862xx-api.h b/drivers/net/dsa/mxl862xx/mxl862xx-api.h
index c902e90397e5f..fb21ddc1bf1c0 100644
--- a/drivers/net/dsa/mxl862xx/mxl862xx-api.h
+++ b/drivers/net/dsa/mxl862xx/mxl862xx-api.h
@@ -1224,4 +1224,146 @@ struct mxl862xx_sys_fw_image_version {
__le32 iv_build_num;
} __packed;
+/**
+ * enum mxl862xx_port_type - Port Type
+ * @MXL862XX_LOGICAL_PORT: Logical Port
+ * @MXL862XX_PHYSICAL_PORT: Physical Port
+ * @MXL862XX_CTP_PORT: Connectivity Termination Port (CTP)
+ * @MXL862XX_BRIDGE_PORT: Bridge Port
+ */
+enum mxl862xx_port_type {
+ MXL862XX_LOGICAL_PORT = 0,
+ MXL862XX_PHYSICAL_PORT,
+ MXL862XX_CTP_PORT,
+ MXL862XX_BRIDGE_PORT,
+};
+
+/**
+ * enum mxl862xx_rmon_port_type - RMON counter table type
+ * @MXL862XX_RMON_CTP_PORT_RX: CTP RX counters
+ * @MXL862XX_RMON_CTP_PORT_TX: CTP TX counters
+ * @MXL862XX_RMON_BRIDGE_PORT_RX: Bridge port RX counters
+ * @MXL862XX_RMON_BRIDGE_PORT_TX: Bridge port TX counters
+ * @MXL862XX_RMON_CTP_PORT_PCE_BYPASS: CTP PCE bypass counters
+ * @MXL862XX_RMON_TFLOW_RX: TFLOW RX counters
+ * @MXL862XX_RMON_TFLOW_TX: TFLOW TX counters
+ * @MXL862XX_RMON_QMAP: QMAP counters
+ * @MXL862XX_RMON_METER: Meter counters
+ * @MXL862XX_RMON_PMAC: PMAC counters
+ */
+enum mxl862xx_rmon_port_type {
+ MXL862XX_RMON_CTP_PORT_RX = 0,
+ MXL862XX_RMON_CTP_PORT_TX,
+ MXL862XX_RMON_BRIDGE_PORT_RX,
+ MXL862XX_RMON_BRIDGE_PORT_TX,
+ MXL862XX_RMON_CTP_PORT_PCE_BYPASS,
+ MXL862XX_RMON_TFLOW_RX,
+ MXL862XX_RMON_TFLOW_TX,
+ MXL862XX_RMON_QMAP = 0x0e,
+ MXL862XX_RMON_METER = 0x19,
+ MXL862XX_RMON_PMAC = 0x1c,
+};
+
+/**
+ * struct mxl862xx_rmon_port_cnt - RMON counters for a port
+ * @port_type: Port type for counter retrieval (see &enum mxl862xx_port_type)
+ * @port_id: Ethernet port number (zero-based)
+ * @sub_if_id_group: Sub-interface ID group
+ * @pce_bypass: Separate CTP Tx counters when PCE is bypassed
+ * @rx_extended_vlan_discard_pkts: Discarded at extended VLAN operation
+ * @mtu_exceed_discard_pkts: Discarded due to MTU exceeded
+ * @tx_under_size_good_pkts: Tx undersize (<64) packet count
+ * @tx_oversize_good_pkts: Tx oversize (>1518) packet count
+ * @rx_good_pkts: Received good packet count
+ * @rx_unicast_pkts: Received unicast packet count
+ * @rx_broadcast_pkts: Received broadcast packet count
+ * @rx_multicast_pkts: Received multicast packet count
+ * @rx_fcserror_pkts: Received FCS error packet count
+ * @rx_under_size_good_pkts: Received undersize good packet count
+ * @rx_oversize_good_pkts: Received oversize good packet count
+ * @rx_under_size_error_pkts: Received undersize error packet count
+ * @rx_good_pause_pkts: Received good pause packet count
+ * @rx_oversize_error_pkts: Received oversize error packet count
+ * @rx_align_error_pkts: Received alignment error packet count
+ * @rx_filtered_pkts: Filtered packet count
+ * @rx64byte_pkts: Received 64-byte packet count
+ * @rx127byte_pkts: Received 65-127 byte packet count
+ * @rx255byte_pkts: Received 128-255 byte packet count
+ * @rx511byte_pkts: Received 256-511 byte packet count
+ * @rx1023byte_pkts: Received 512-1023 byte packet count
+ * @rx_max_byte_pkts: Received 1024-max byte packet count
+ * @tx_good_pkts: Transmitted good packet count
+ * @tx_unicast_pkts: Transmitted unicast packet count
+ * @tx_broadcast_pkts: Transmitted broadcast packet count
+ * @tx_multicast_pkts: Transmitted multicast packet count
+ * @tx_single_coll_count: Transmit single collision count
+ * @tx_mult_coll_count: Transmit multiple collision count
+ * @tx_late_coll_count: Transmit late collision count
+ * @tx_excess_coll_count: Transmit excessive collision count
+ * @tx_coll_count: Transmit collision count
+ * @tx_pause_count: Transmit pause packet count
+ * @tx64byte_pkts: Transmitted 64-byte packet count
+ * @tx127byte_pkts: Transmitted 65-127 byte packet count
+ * @tx255byte_pkts: Transmitted 128-255 byte packet count
+ * @tx511byte_pkts: Transmitted 256-511 byte packet count
+ * @tx1023byte_pkts: Transmitted 512-1023 byte packet count
+ * @tx_max_byte_pkts: Transmitted 1024-max byte packet count
+ * @tx_dropped_pkts: Transmit dropped packet count
+ * @tx_acm_dropped_pkts: Transmit ACM dropped packet count
+ * @rx_dropped_pkts: Received dropped packet count
+ * @rx_good_bytes: Received good byte count (64-bit)
+ * @rx_bad_bytes: Received bad byte count (64-bit)
+ * @tx_good_bytes: Transmitted good byte count (64-bit)
+ */
+struct mxl862xx_rmon_port_cnt {
+ __le32 port_type; /* enum mxl862xx_port_type */
+ __le16 port_id;
+ __le16 sub_if_id_group;
+ u8 pce_bypass;
+ __le32 rx_extended_vlan_discard_pkts;
+ __le32 mtu_exceed_discard_pkts;
+ __le32 tx_under_size_good_pkts;
+ __le32 tx_oversize_good_pkts;
+ __le32 rx_good_pkts;
+ __le32 rx_unicast_pkts;
+ __le32 rx_broadcast_pkts;
+ __le32 rx_multicast_pkts;
+ __le32 rx_fcserror_pkts;
+ __le32 rx_under_size_good_pkts;
+ __le32 rx_oversize_good_pkts;
+ __le32 rx_under_size_error_pkts;
+ __le32 rx_good_pause_pkts;
+ __le32 rx_oversize_error_pkts;
+ __le32 rx_align_error_pkts;
+ __le32 rx_filtered_pkts;
+ __le32 rx64byte_pkts;
+ __le32 rx127byte_pkts;
+ __le32 rx255byte_pkts;
+ __le32 rx511byte_pkts;
+ __le32 rx1023byte_pkts;
+ __le32 rx_max_byte_pkts;
+ __le32 tx_good_pkts;
+ __le32 tx_unicast_pkts;
+ __le32 tx_broadcast_pkts;
+ __le32 tx_multicast_pkts;
+ __le32 tx_single_coll_count;
+ __le32 tx_mult_coll_count;
+ __le32 tx_late_coll_count;
+ __le32 tx_excess_coll_count;
+ __le32 tx_coll_count;
+ __le32 tx_pause_count;
+ __le32 tx64byte_pkts;
+ __le32 tx127byte_pkts;
+ __le32 tx255byte_pkts;
+ __le32 tx511byte_pkts;
+ __le32 tx1023byte_pkts;
+ __le32 tx_max_byte_pkts;
+ __le32 tx_dropped_pkts;
+ __le32 tx_acm_dropped_pkts;
+ __le32 rx_dropped_pkts;
+ __le64 rx_good_bytes;
+ __le64 rx_bad_bytes;
+ __le64 tx_good_bytes;
+} __packed;
+
#endif /* __MXL862XX_API_H */
diff --git a/drivers/net/dsa/mxl862xx/mxl862xx-cmd.h b/drivers/net/dsa/mxl862xx/mxl862xx-cmd.h
index 45df37cde40d1..f1ea40aa7ea08 100644
--- a/drivers/net/dsa/mxl862xx/mxl862xx-cmd.h
+++ b/drivers/net/dsa/mxl862xx/mxl862xx-cmd.h
@@ -16,6 +16,7 @@
#define MXL862XX_BRDGPORT_MAGIC 0x400
#define MXL862XX_CTP_MAGIC 0x500
#define MXL862XX_QOS_MAGIC 0x600
+#define MXL862XX_RMON_MAGIC 0x700
#define MXL862XX_SWMAC_MAGIC 0xa00
#define MXL862XX_EXTVLAN_MAGIC 0xb00
#define MXL862XX_VLANFILTER_MAGIC 0xc00
@@ -43,6 +44,8 @@
#define MXL862XX_QOS_METERCFGSET (MXL862XX_QOS_MAGIC + 0x2)
#define MXL862XX_QOS_METERALLOC (MXL862XX_QOS_MAGIC + 0x2a)
+#define MXL862XX_RMON_PORT_GET (MXL862XX_RMON_MAGIC + 0x1)
+
#define MXL862XX_MAC_TABLEENTRYADD (MXL862XX_SWMAC_MAGIC + 0x2)
#define MXL862XX_MAC_TABLEENTRYREAD (MXL862XX_SWMAC_MAGIC + 0x3)
#define MXL862XX_MAC_TABLEENTRYQUERY (MXL862XX_SWMAC_MAGIC + 0x4)
diff --git a/drivers/net/dsa/mxl862xx/mxl862xx.c b/drivers/net/dsa/mxl862xx/mxl862xx.c
index fca9a3e36bb69..58bf7210c6d40 100644
--- a/drivers/net/dsa/mxl862xx/mxl862xx.c
+++ b/drivers/net/dsa/mxl862xx/mxl862xx.c
@@ -30,6 +30,38 @@
#define MXL862XX_API_READ_QUIET(dev, cmd, data) \
mxl862xx_api_wrap(dev, cmd, &(data), sizeof((data)), true, true)
+struct mxl862xx_mib_desc {
+ unsigned int size;
+ unsigned int offset;
+ const char *name;
+};
+
+#define MIB_DESC(_size, _name, _element) \
+{ \
+ .size = _size, \
+ .name = _name, \
+ .offset = offsetof(struct mxl862xx_rmon_port_cnt, _element) \
+}
+
+/* Hardware-specific counters not covered by any standardized stats callback. */
+static const struct mxl862xx_mib_desc mxl862xx_mib[] = {
+ MIB_DESC(1, "TxAcmDroppedPkts", tx_acm_dropped_pkts),
+ MIB_DESC(1, "RxFilteredPkts", rx_filtered_pkts),
+ MIB_DESC(1, "RxExtendedVlanDiscardPkts", rx_extended_vlan_discard_pkts),
+ MIB_DESC(1, "MtuExceedDiscardPkts", mtu_exceed_discard_pkts),
+ MIB_DESC(2, "RxBadBytes", rx_bad_bytes),
+};
+
+static const struct ethtool_rmon_hist_range mxl862xx_rmon_ranges[] = {
+ { 0, 64 },
+ { 65, 127 },
+ { 128, 255 },
+ { 256, 511 },
+ { 512, 1023 },
+ { 1024, 10240 },
+ {}
+};
+
#define MXL862XX_SDMA_PCTRLP(p) (0xbc0 + ((p) * 0x6))
#define MXL862XX_SDMA_PCTRL_EN BIT(0)
@@ -1734,6 +1766,140 @@ static int mxl862xx_port_bridge_flags(struct dsa_switch *ds, int port,
return 0;
}
+static void mxl862xx_get_strings(struct dsa_switch *ds, int port,
+ u32 stringset, u8 *data)
+{
+ int i;
+
+ if (stringset != ETH_SS_STATS)
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(mxl862xx_mib); i++)
+ ethtool_puts(&data, mxl862xx_mib[i].name);
+}
+
+static int mxl862xx_get_sset_count(struct dsa_switch *ds, int port, int sset)
+{
+ if (sset != ETH_SS_STATS)
+ return 0;
+
+ return ARRAY_SIZE(mxl862xx_mib);
+}
+
+static int mxl862xx_read_rmon(struct dsa_switch *ds, int port,
+ struct mxl862xx_rmon_port_cnt *cnt)
+{
+ memset(cnt, 0, sizeof(*cnt));
+ cnt->port_type = cpu_to_le32(MXL862XX_CTP_PORT);
+ cnt->port_id = cpu_to_le16(port);
+
+ return MXL862XX_API_READ(ds->priv, MXL862XX_RMON_PORT_GET, *cnt);
+}
+
+static void mxl862xx_get_ethtool_stats(struct dsa_switch *ds, int port,
+ u64 *data)
+{
+ const struct mxl862xx_mib_desc *mib;
+ struct mxl862xx_rmon_port_cnt cnt;
+ int ret, i;
+ void *field;
+
+ ret = mxl862xx_read_rmon(ds, port, &cnt);
+ if (ret) {
+ dev_err(ds->dev, "failed to read RMON stats on port %d\n", port);
+ return;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(mxl862xx_mib); i++) {
+ mib = &mxl862xx_mib[i];
+ field = (u8 *)&cnt + mib->offset;
+
+ if (mib->size == 1)
+ *data++ = le32_to_cpu(*(__le32 *)field);
+ else
+ *data++ = le64_to_cpu(*(__le64 *)field);
+ }
+}
+
+static void mxl862xx_get_eth_mac_stats(struct dsa_switch *ds, int port,
+ struct ethtool_eth_mac_stats *mac_stats)
+{
+ struct mxl862xx_rmon_port_cnt cnt;
+
+ if (mxl862xx_read_rmon(ds, port, &cnt))
+ return;
+
+ mac_stats->FramesTransmittedOK = le32_to_cpu(cnt.tx_good_pkts);
+ mac_stats->SingleCollisionFrames = le32_to_cpu(cnt.tx_single_coll_count);
+ mac_stats->MultipleCollisionFrames = le32_to_cpu(cnt.tx_mult_coll_count);
+ mac_stats->FramesReceivedOK = le32_to_cpu(cnt.rx_good_pkts);
+ mac_stats->FrameCheckSequenceErrors = le32_to_cpu(cnt.rx_fcserror_pkts);
+ mac_stats->AlignmentErrors = le32_to_cpu(cnt.rx_align_error_pkts);
+ mac_stats->OctetsTransmittedOK = le64_to_cpu(cnt.tx_good_bytes);
+ mac_stats->LateCollisions = le32_to_cpu(cnt.tx_late_coll_count);
+ mac_stats->FramesAbortedDueToXSColls = le32_to_cpu(cnt.tx_excess_coll_count);
+ mac_stats->OctetsReceivedOK = le64_to_cpu(cnt.rx_good_bytes);
+ mac_stats->MulticastFramesXmittedOK = le32_to_cpu(cnt.tx_multicast_pkts);
+ mac_stats->BroadcastFramesXmittedOK = le32_to_cpu(cnt.tx_broadcast_pkts);
+ mac_stats->MulticastFramesReceivedOK = le32_to_cpu(cnt.rx_multicast_pkts);
+ mac_stats->BroadcastFramesReceivedOK = le32_to_cpu(cnt.rx_broadcast_pkts);
+ mac_stats->FrameTooLongErrors = le32_to_cpu(cnt.rx_oversize_error_pkts);
+}
+
+static void mxl862xx_get_eth_ctrl_stats(struct dsa_switch *ds, int port,
+ struct ethtool_eth_ctrl_stats *ctrl_stats)
+{
+ struct mxl862xx_rmon_port_cnt cnt;
+
+ if (mxl862xx_read_rmon(ds, port, &cnt))
+ return;
+
+ ctrl_stats->MACControlFramesTransmitted = le32_to_cpu(cnt.tx_pause_count);
+ ctrl_stats->MACControlFramesReceived = le32_to_cpu(cnt.rx_good_pause_pkts);
+}
+
+static void mxl862xx_get_pause_stats(struct dsa_switch *ds, int port,
+ struct ethtool_pause_stats *pause_stats)
+{
+ struct mxl862xx_rmon_port_cnt cnt;
+
+ if (mxl862xx_read_rmon(ds, port, &cnt))
+ return;
+
+ pause_stats->tx_pause_frames = le32_to_cpu(cnt.tx_pause_count);
+ pause_stats->rx_pause_frames = le32_to_cpu(cnt.rx_good_pause_pkts);
+}
+
+static void mxl862xx_get_rmon_stats(struct dsa_switch *ds, int port,
+ struct ethtool_rmon_stats *rmon_stats,
+ const struct ethtool_rmon_hist_range **ranges)
+{
+ struct mxl862xx_rmon_port_cnt cnt;
+
+ if (mxl862xx_read_rmon(ds, port, &cnt))
+ return;
+
+ rmon_stats->undersize_pkts = le32_to_cpu(cnt.rx_under_size_good_pkts);
+ rmon_stats->oversize_pkts = le32_to_cpu(cnt.rx_oversize_good_pkts);
+ rmon_stats->fragments = le32_to_cpu(cnt.rx_under_size_error_pkts);
+ rmon_stats->jabbers = le32_to_cpu(cnt.rx_oversize_error_pkts);
+
+ rmon_stats->hist[0] = le32_to_cpu(cnt.rx64byte_pkts);
+ rmon_stats->hist[1] = le32_to_cpu(cnt.rx127byte_pkts);
+ rmon_stats->hist[2] = le32_to_cpu(cnt.rx255byte_pkts);
+ rmon_stats->hist[3] = le32_to_cpu(cnt.rx511byte_pkts);
+ rmon_stats->hist[4] = le32_to_cpu(cnt.rx1023byte_pkts);
+ rmon_stats->hist[5] = le32_to_cpu(cnt.rx_max_byte_pkts);
+
+ rmon_stats->hist_tx[0] = le32_to_cpu(cnt.tx64byte_pkts);
+ rmon_stats->hist_tx[1] = le32_to_cpu(cnt.tx127byte_pkts);
+ rmon_stats->hist_tx[2] = le32_to_cpu(cnt.tx255byte_pkts);
+ rmon_stats->hist_tx[3] = le32_to_cpu(cnt.tx511byte_pkts);
+ rmon_stats->hist_tx[4] = le32_to_cpu(cnt.tx1023byte_pkts);
+ rmon_stats->hist_tx[5] = le32_to_cpu(cnt.tx_max_byte_pkts);
+
+ *ranges = mxl862xx_rmon_ranges;
+}
static const struct dsa_switch_ops mxl862xx_switch_ops = {
.get_tag_protocol = mxl862xx_get_tag_protocol,
.setup = mxl862xx_setup,
@@ -1758,6 +1924,13 @@ static const struct dsa_switch_ops mxl862xx_switch_ops = {
.port_vlan_filtering = mxl862xx_port_vlan_filtering,
.port_vlan_add = mxl862xx_port_vlan_add,
.port_vlan_del = mxl862xx_port_vlan_del,
+ .get_strings = mxl862xx_get_strings,
+ .get_sset_count = mxl862xx_get_sset_count,
+ .get_ethtool_stats = mxl862xx_get_ethtool_stats,
+ .get_eth_mac_stats = mxl862xx_get_eth_mac_stats,
+ .get_eth_ctrl_stats = mxl862xx_get_eth_ctrl_stats,
+ .get_pause_stats = mxl862xx_get_pause_stats,
+ .get_rmon_stats = mxl862xx_get_rmon_stats,
};
static void mxl862xx_phylink_mac_config(struct phylink_config *config,
--
2.53.0
* [PATCH net-next v2 2/2] net: dsa: mxl862xx: implement .get_stats64
From: Daniel Golle @ 2026-04-12 0:02 UTC
To: Daniel Golle, Andrew Lunn, Vladimir Oltean, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Russell King, netdev,
linux-kernel
Cc: Frank Wunderlich, Chad Monroe, Cezary Wilmanski, Liang Xu,
Benny (Ying-Tsan) Weng, Jose Maria Verdu Munoz, Avinash Jayaraman,
John Crispin
In-Reply-To: <cover.1775951347.git.daniel@makrotopia.org>
Poll free-running firmware RMON counters every 2 seconds and accumulate
deltas into 64-bit per-port statistics. 32-bit packet counters wrap
in ~220s at 10 Gbps line rate with minimum-size frames; the 2s polling
interval provides a comfortable margin. The .get_stats64 callback
returns the accumulated values and then schedules an immediate poll
(a no-op if one is already pending), so that the next query sees
up-to-date counters.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
v2: no changes
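The wrap-time figures and the delta computation can be checked with a
small user-space sketch. The demo_* names are hypothetical and exist
only for illustration; demo_delta32() mirrors the unsigned-subtraction
idiom of mxl862xx_delta32() in this patch. The wrap estimate counts
only the 64 frame bytes and ignores preamble and inter-frame gap,
which appears to be the assumption behind both the ~220s figure
(10 Gbps) in the commit message and the ~880s figure (2.5 Gbps) in the
code comment.

```c
#include <stdint.h>

/* Seconds until a 32-bit packet counter wraps at the given line rate,
 * counting only the frame bytes passed in frame_bytes.
 */
static double demo_wrap_seconds(double bits_per_sec, unsigned int frame_bytes)
{
	double pkts_per_sec = bits_per_sec / (8.0 * frame_bytes);

	return 4294967296.0 / pkts_per_sec;	/* 2^32 packets */
}

/* Delta between two free-running 32-bit snapshots. Unsigned subtraction
 * is modulo 2^32, so a single wrap between polls is handled for free.
 */
static uint64_t demo_delta32(uint32_t cur, uint32_t prev)
{
	return (uint32_t)(cur - prev);
}

/* One 64-bit accumulator plus the previous raw snapshot. */
struct demo_acc {
	uint64_t total;
	uint32_t prev;
};

/* Fold a new raw snapshot into the accumulator and remember it. */
static void demo_accumulate(struct demo_acc *a, uint32_t cur)
{
	a->total += demo_delta32(cur, a->prev);
	a->prev = cur;
}
```

Passing 84 bytes instead of 64 to demo_wrap_seconds() accounts for
preamble and inter-frame gap and gives roughly 289s at 10 Gbps. Either
way the 2s polling interval stays far below the wrap time, so at most
one wrap can occur between two polls.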
drivers/net/dsa/mxl862xx/mxl862xx-host.c | 8 +-
drivers/net/dsa/mxl862xx/mxl862xx.c | 175 +++++++++++++++++++++++
drivers/net/dsa/mxl862xx/mxl862xx.h | 94 +++++++++++-
3 files changed, 270 insertions(+), 7 deletions(-)
diff --git a/drivers/net/dsa/mxl862xx/mxl862xx-host.c b/drivers/net/dsa/mxl862xx/mxl862xx-host.c
index cadbdb590cf43..d55f9dff6433e 100644
--- a/drivers/net/dsa/mxl862xx/mxl862xx-host.c
+++ b/drivers/net/dsa/mxl862xx/mxl862xx-host.c
@@ -48,7 +48,7 @@ static void mxl862xx_crc_err_work_fn(struct work_struct *work)
dev_close(dp->conduit);
rtnl_unlock();
- clear_bit(0, &priv->crc_err);
+ clear_bit(MXL862XX_FLAG_CRC_ERR, &priv->flags);
}
/* Firmware CRC error codes (outside normal Zephyr errno range). */
@@ -247,7 +247,7 @@ static int mxl862xx_issue_cmd(struct mxl862xx_priv *priv, u16 cmd, u16 len)
ret = mxl862xx_crc6_verify(ctrl_enc, len_enc, &fw_result);
if (ret) {
- if (!test_and_set_bit(0, &priv->crc_err))
+ if (!test_and_set_bit(MXL862XX_FLAG_CRC_ERR, &priv->flags))
schedule_work(&priv->crc_err_work);
return -EIO;
}
@@ -314,7 +314,7 @@ static int mxl862xx_send_cmd(struct mxl862xx_priv *priv, u16 cmd, u16 size,
if (ret < 0) {
if ((ret == MXL862XX_FW_CRC6_ERR ||
ret == MXL862XX_FW_CRC16_ERR) &&
- !test_and_set_bit(0, &priv->crc_err))
+ !test_and_set_bit(MXL862XX_FLAG_CRC_ERR, &priv->flags))
schedule_work(&priv->crc_err_work);
if (!quiet)
dev_err(&priv->mdiodev->dev,
@@ -458,7 +458,7 @@ int mxl862xx_api_wrap(struct mxl862xx_priv *priv, u16 cmd, void *_data,
}
if (crc16(0xffff, (const u8 *)data, size) != crc) {
- if (!test_and_set_bit(0, &priv->crc_err))
+ if (!test_and_set_bit(MXL862XX_FLAG_CRC_ERR, &priv->flags))
schedule_work(&priv->crc_err_work);
ret = -EIO;
goto out;
diff --git a/drivers/net/dsa/mxl862xx/mxl862xx.c b/drivers/net/dsa/mxl862xx/mxl862xx.c
index 58bf7210c6d40..b60482d93a855 100644
--- a/drivers/net/dsa/mxl862xx/mxl862xx.c
+++ b/drivers/net/dsa/mxl862xx/mxl862xx.c
@@ -30,6 +30,12 @@
#define MXL862XX_API_READ_QUIET(dev, cmd, data) \
mxl862xx_api_wrap(dev, cmd, &(data), sizeof((data)), true, true)
+/* Polling interval for RMON counter accumulation. At 2.5 Gbps with
+ * minimum-size (64-byte) frames, a 32-bit packet counter wraps in ~880s.
+ * 2s gives a comfortable margin.
+ */
+#define MXL862XX_STATS_POLL_INTERVAL (2 * HZ)
+
struct mxl862xx_mib_desc {
unsigned int size;
unsigned int offset;
@@ -677,6 +683,9 @@ static int mxl862xx_setup(struct dsa_switch *ds)
if (ret)
return ret;
+ schedule_delayed_work(&priv->stats_work,
+ MXL862XX_STATS_POLL_INTERVAL);
+
return mxl862xx_setup_mdio(ds);
}
@@ -1900,6 +1909,159 @@ static void mxl862xx_get_rmon_stats(struct dsa_switch *ds, int port,
*ranges = mxl862xx_rmon_ranges;
}
+
+/* Compute the delta between two 32-bit free-running counter snapshots,
+ * handling a single wrap-around correctly via unsigned subtraction.
+ */
+static u64 mxl862xx_delta32(u32 cur, u32 prev)
+{
+ return (u32)(cur - prev);
+}
+
+/**
+ * mxl862xx_stats_poll - Read RMON counters and accumulate into 64-bit stats
+ * @ds: DSA switch
+ * @port: port index
+ *
+ * The firmware RMON counters are free-running 32-bit values (64-bit for
+ * byte counters). This function reads the hardware via MDIO (may sleep),
+ * computes deltas from the previous snapshot, and accumulates them into
+ * 64-bit per-port stats under a spinlock.
+ *
+ * Called only from the stats polling workqueue -- serialized by the
+ * single-threaded delayed_work, so no MDIO locking is needed here.
+ */
+static void mxl862xx_stats_poll(struct dsa_switch *ds, int port)
+{
+ struct mxl862xx_priv *priv = ds->priv;
+ struct mxl862xx_port_stats *s = &priv->ports[port].stats;
+ u32 rx_fcserr, rx_under, rx_over, rx_align, tx_drop;
+ u32 rx_drop, rx_evlan, mtu_exc, tx_acm;
+ struct mxl862xx_rmon_port_cnt cnt;
+ u64 rx_bytes, tx_bytes;
+ u32 rx_mcast, tx_coll;
+ u32 rx_pkts, tx_pkts;
+
+ /* MDIO read -- may sleep, done outside the spinlock. */
+ if (mxl862xx_read_rmon(ds, port, &cnt))
+ return;
+
+ rx_pkts = le32_to_cpu(cnt.rx_good_pkts);
+ tx_pkts = le32_to_cpu(cnt.tx_good_pkts);
+ rx_bytes = le64_to_cpu(cnt.rx_good_bytes);
+ tx_bytes = le64_to_cpu(cnt.tx_good_bytes);
+ rx_fcserr = le32_to_cpu(cnt.rx_fcserror_pkts);
+ rx_under = le32_to_cpu(cnt.rx_under_size_error_pkts);
+ rx_over = le32_to_cpu(cnt.rx_oversize_error_pkts);
+ rx_align = le32_to_cpu(cnt.rx_align_error_pkts);
+ tx_drop = le32_to_cpu(cnt.tx_dropped_pkts);
+ rx_drop = le32_to_cpu(cnt.rx_dropped_pkts);
+ rx_evlan = le32_to_cpu(cnt.rx_extended_vlan_discard_pkts);
+ mtu_exc = le32_to_cpu(cnt.mtu_exceed_discard_pkts);
+ tx_acm = le32_to_cpu(cnt.tx_acm_dropped_pkts);
+ rx_mcast = le32_to_cpu(cnt.rx_multicast_pkts);
+ tx_coll = le32_to_cpu(cnt.tx_coll_count);
+
+ /* Accumulate deltas under spinlock -- .get_stats64 reads these. */
+ spin_lock_bh(&priv->ports[port].stats_lock);
+
+ s->rx_packets += mxl862xx_delta32(rx_pkts, s->prev_rx_good_pkts);
+ s->tx_packets += mxl862xx_delta32(tx_pkts, s->prev_tx_good_pkts);
+ s->rx_bytes += rx_bytes - s->prev_rx_good_bytes;
+ s->tx_bytes += tx_bytes - s->prev_tx_good_bytes;
+
+ s->rx_errors +=
+ mxl862xx_delta32(rx_fcserr, s->prev_rx_fcserror_pkts) +
+ mxl862xx_delta32(rx_under, s->prev_rx_under_size_error_pkts) +
+ mxl862xx_delta32(rx_over, s->prev_rx_oversize_error_pkts) +
+ mxl862xx_delta32(rx_align, s->prev_rx_align_error_pkts);
+ s->tx_errors +=
+ mxl862xx_delta32(tx_drop, s->prev_tx_dropped_pkts);
+
+ s->rx_dropped +=
+ mxl862xx_delta32(rx_drop, s->prev_rx_dropped_pkts) +
+ mxl862xx_delta32(rx_evlan, s->prev_rx_evlan_discard_pkts) +
+ mxl862xx_delta32(mtu_exc, s->prev_mtu_exceed_discard_pkts);
+ s->tx_dropped +=
+ mxl862xx_delta32(tx_drop, s->prev_tx_dropped_pkts) +
+ mxl862xx_delta32(tx_acm, s->prev_tx_acm_dropped_pkts);
+
+ s->multicast += mxl862xx_delta32(rx_mcast, s->prev_rx_multicast_pkts);
+ s->collisions += mxl862xx_delta32(tx_coll, s->prev_tx_coll_count);
+
+ s->rx_length_errors +=
+ mxl862xx_delta32(rx_under, s->prev_rx_under_size_error_pkts) +
+ mxl862xx_delta32(rx_over, s->prev_rx_oversize_error_pkts);
+ s->rx_crc_errors +=
+ mxl862xx_delta32(rx_fcserr, s->prev_rx_fcserror_pkts);
+ s->rx_frame_errors +=
+ mxl862xx_delta32(rx_align, s->prev_rx_align_error_pkts);
+
+ s->prev_rx_good_pkts = rx_pkts;
+ s->prev_tx_good_pkts = tx_pkts;
+ s->prev_rx_good_bytes = rx_bytes;
+ s->prev_tx_good_bytes = tx_bytes;
+ s->prev_rx_fcserror_pkts = rx_fcserr;
+ s->prev_rx_under_size_error_pkts = rx_under;
+ s->prev_rx_oversize_error_pkts = rx_over;
+ s->prev_rx_align_error_pkts = rx_align;
+ s->prev_tx_dropped_pkts = tx_drop;
+ s->prev_rx_dropped_pkts = rx_drop;
+ s->prev_rx_evlan_discard_pkts = rx_evlan;
+ s->prev_mtu_exceed_discard_pkts = mtu_exc;
+ s->prev_tx_acm_dropped_pkts = tx_acm;
+ s->prev_rx_multicast_pkts = rx_mcast;
+ s->prev_tx_coll_count = tx_coll;
+
+ spin_unlock_bh(&priv->ports[port].stats_lock);
+}
+
+static void mxl862xx_stats_work_fn(struct work_struct *work)
+{
+ struct mxl862xx_priv *priv =
+ container_of(work, struct mxl862xx_priv, stats_work.work);
+ struct dsa_switch *ds = priv->ds;
+ struct dsa_port *dp;
+
+ dsa_switch_for_each_available_port(dp, ds)
+ mxl862xx_stats_poll(ds, dp->index);
+
+ if (!test_bit(MXL862XX_FLAG_WORK_STOPPED, &priv->flags))
+ schedule_delayed_work(&priv->stats_work,
+ MXL862XX_STATS_POLL_INTERVAL);
+}
+
+static void mxl862xx_get_stats64(struct dsa_switch *ds, int port,
+ struct rtnl_link_stats64 *s)
+{
+ struct mxl862xx_priv *priv = ds->priv;
+ struct mxl862xx_port_stats *ps = &priv->ports[port].stats;
+
+ spin_lock_bh(&priv->ports[port].stats_lock);
+
+ s->rx_packets = ps->rx_packets;
+ s->tx_packets = ps->tx_packets;
+ s->rx_bytes = ps->rx_bytes;
+ s->tx_bytes = ps->tx_bytes;
+ s->rx_errors = ps->rx_errors;
+ s->tx_errors = ps->tx_errors;
+ s->rx_dropped = ps->rx_dropped;
+ s->tx_dropped = ps->tx_dropped;
+ s->multicast = ps->multicast;
+ s->collisions = ps->collisions;
+ s->rx_length_errors = ps->rx_length_errors;
+ s->rx_crc_errors = ps->rx_crc_errors;
+ s->rx_frame_errors = ps->rx_frame_errors;
+
+ spin_unlock_bh(&priv->ports[port].stats_lock);
+
+ /* Trigger a fresh poll so the next read sees up-to-date counters.
+ * No-op if the work is already pending, running, or teardown started.
+ */
+ if (!test_bit(MXL862XX_FLAG_WORK_STOPPED, &priv->flags))
+ schedule_delayed_work(&priv->stats_work, 0);
+}
+
static const struct dsa_switch_ops mxl862xx_switch_ops = {
.get_tag_protocol = mxl862xx_get_tag_protocol,
.setup = mxl862xx_setup,
@@ -1931,6 +2093,7 @@ static const struct dsa_switch_ops mxl862xx_switch_ops = {
.get_eth_ctrl_stats = mxl862xx_get_eth_ctrl_stats,
.get_pause_stats = mxl862xx_get_pause_stats,
.get_rmon_stats = mxl862xx_get_rmon_stats,
+ .get_stats64 = mxl862xx_get_stats64,
};
static void mxl862xx_phylink_mac_config(struct phylink_config *config,
@@ -1992,16 +2155,22 @@ static int mxl862xx_probe(struct mdio_device *mdiodev)
priv->ports[i].priv = priv;
INIT_WORK(&priv->ports[i].host_flood_work,
mxl862xx_host_flood_work_fn);
+ spin_lock_init(&priv->ports[i].stats_lock);
}
+ INIT_DELAYED_WORK(&priv->stats_work, mxl862xx_stats_work_fn);
+
dev_set_drvdata(dev, ds);
err = dsa_register_switch(ds);
if (err) {
+ set_bit(MXL862XX_FLAG_WORK_STOPPED, &priv->flags);
+ cancel_delayed_work_sync(&priv->stats_work);
mxl862xx_host_shutdown(priv);
for (i = 0; i < MXL862XX_MAX_PORTS; i++)
cancel_work_sync(&priv->ports[i].host_flood_work);
}
+
return err;
}
@@ -2016,6 +2185,9 @@ static void mxl862xx_remove(struct mdio_device *mdiodev)
priv = ds->priv;
+ set_bit(MXL862XX_FLAG_WORK_STOPPED, &priv->flags);
+ cancel_delayed_work_sync(&priv->stats_work);
+
dsa_unregister_switch(ds);
mxl862xx_host_shutdown(priv);
@@ -2042,6 +2214,9 @@ static void mxl862xx_shutdown(struct mdio_device *mdiodev)
dsa_switch_shutdown(ds);
+ set_bit(MXL862XX_FLAG_WORK_STOPPED, &priv->flags);
+ cancel_delayed_work_sync(&priv->stats_work);
+
mxl862xx_host_shutdown(priv);
for (i = 0; i < MXL862XX_MAX_PORTS; i++)
diff --git a/drivers/net/dsa/mxl862xx/mxl862xx.h b/drivers/net/dsa/mxl862xx/mxl862xx.h
index a010cf6b961a9..80053ab40e4ce 100644
--- a/drivers/net/dsa/mxl862xx/mxl862xx.h
+++ b/drivers/net/dsa/mxl862xx/mxl862xx.h
@@ -116,6 +116,79 @@ struct mxl862xx_evlan_block {
u16 n_active;
};
+/**
+ * struct mxl862xx_port_stats - 64-bit accumulated hardware port statistics
+ * @rx_packets: total received packets
+ * @tx_packets: total transmitted packets
+ * @rx_bytes: total received bytes
+ * @tx_bytes: total transmitted bytes
+ * @rx_errors: total receive errors
+ * @tx_errors: total transmit errors
+ * @rx_dropped: total received packets dropped
+ * @tx_dropped: total transmitted packets dropped
+ * @multicast: total received multicast packets
+ * @collisions: total transmit collisions
+ * @rx_length_errors: received length errors (undersize + oversize)
+ * @rx_crc_errors: received FCS errors
+ * @rx_frame_errors: received alignment errors
+ * @prev_rx_good_pkts: previous snapshot of rx good packet counter
+ * @prev_tx_good_pkts: previous snapshot of tx good packet counter
+ * @prev_rx_good_bytes: previous snapshot of rx good byte counter
+ * @prev_tx_good_bytes: previous snapshot of tx good byte counter
+ * @prev_rx_fcserror_pkts: previous snapshot of rx FCS error counter
+ * @prev_rx_under_size_error_pkts: previous snapshot of rx undersize
+ * error counter
+ * @prev_rx_oversize_error_pkts: previous snapshot of rx oversize
+ * error counter
+ * @prev_rx_align_error_pkts: previous snapshot of rx alignment
+ * error counter
+ * @prev_tx_dropped_pkts: previous snapshot of tx dropped counter
+ * @prev_rx_dropped_pkts: previous snapshot of rx dropped counter
+ * @prev_rx_evlan_discard_pkts: previous snapshot of extended VLAN
+ * discard counter
+ * @prev_mtu_exceed_discard_pkts: previous snapshot of MTU exceed
+ * discard counter
+ * @prev_tx_acm_dropped_pkts: previous snapshot of tx ACM dropped
+ * counter
+ * @prev_rx_multicast_pkts: previous snapshot of rx multicast counter
+ * @prev_tx_coll_count: previous snapshot of tx collision counter
+ *
+ * The firmware RMON counters are 32-bit free-running (64-bit for byte
+ * counters). This structure holds 64-bit accumulators alongside the
+ * previous raw snapshot so that deltas can be computed across polls,
+ * handling 32-bit wrap correctly via unsigned subtraction.
+ */
+struct mxl862xx_port_stats {
+ u64 rx_packets;
+ u64 tx_packets;
+ u64 rx_bytes;
+ u64 tx_bytes;
+ u64 rx_errors;
+ u64 tx_errors;
+ u64 rx_dropped;
+ u64 tx_dropped;
+ u64 multicast;
+ u64 collisions;
+ u64 rx_length_errors;
+ u64 rx_crc_errors;
+ u64 rx_frame_errors;
+ u32 prev_rx_good_pkts;
+ u32 prev_tx_good_pkts;
+ u64 prev_rx_good_bytes;
+ u64 prev_tx_good_bytes;
+ u32 prev_rx_fcserror_pkts;
+ u32 prev_rx_under_size_error_pkts;
+ u32 prev_rx_oversize_error_pkts;
+ u32 prev_rx_align_error_pkts;
+ u32 prev_tx_dropped_pkts;
+ u32 prev_rx_dropped_pkts;
+ u32 prev_rx_evlan_discard_pkts;
+ u32 prev_mtu_exceed_discard_pkts;
+ u32 prev_tx_acm_dropped_pkts;
+ u32 prev_rx_multicast_pkts;
+ u32 prev_tx_coll_count;
+};
+
/**
* struct mxl862xx_port - per-port state tracked by the driver
* @priv: back-pointer to switch private data; needed by
@@ -145,6 +218,10 @@ struct mxl862xx_evlan_block {
* The worker acquires rtnl_lock() to serialize with
* DSA callbacks and checks @setup_done to avoid
* acting on torn-down ports.
+ * @stats: 64-bit accumulated hardware statistics; updated
+ * periodically by the stats polling work
+ * @stats_lock: protects accumulator reads in .get_stats64 against
+ * concurrent updates from the polling work
*/
struct mxl862xx_port {
struct mxl862xx_priv *priv;
@@ -160,16 +237,24 @@ struct mxl862xx_port {
bool host_flood_uc;
bool host_flood_mc;
struct work_struct host_flood_work;
+ struct mxl862xx_port_stats stats;
+ spinlock_t stats_lock; /* protects stats accumulators */
};
+/* Bit indices for struct mxl862xx_priv::flags */
+#define MXL862XX_FLAG_CRC_ERR 0
+#define MXL862XX_FLAG_WORK_STOPPED 1
+
/**
* struct mxl862xx_priv - driver private data for an MxL862xx switch
* @ds: pointer to the DSA switch instance
* @mdiodev: MDIO device used to communicate with the switch firmware
* @crc_err_work: deferred work for shutting down all ports on MDIO CRC
* errors
- * @crc_err: set atomically before CRC-triggered shutdown, cleared
- * after
+ * @flags: atomic status flags; %MXL862XX_FLAG_CRC_ERR is set
+ * before CRC-triggered shutdown and cleared after;
+ * %MXL862XX_FLAG_WORK_STOPPED is set before cancelling
+ * stats_work to prevent rescheduling during teardown
* @drop_meter: index of the single shared zero-rate firmware meter
* used to unconditionally drop traffic (used to block
* flooding)
@@ -181,18 +266,21 @@ struct mxl862xx_port {
* @evlan_ingress_size: per-port ingress Extended VLAN block size
* @evlan_egress_size: per-port egress Extended VLAN block size
* @vf_block_size: per-port VLAN Filter block size
+ * @stats_work: periodic work item that polls RMON hardware counters
+ * and accumulates them into 64-bit per-port stats
*/
struct mxl862xx_priv {
struct dsa_switch *ds;
struct mdio_device *mdiodev;
struct work_struct crc_err_work;
- unsigned long crc_err;
+ unsigned long flags;
u16 drop_meter;
struct mxl862xx_port ports[MXL862XX_MAX_PORTS];
u16 bridges[MXL862XX_MAX_BRIDGES + 1];
u16 evlan_ingress_size;
u16 evlan_egress_size;
u16 vf_block_size;
+ struct delayed_work stats_work;
};
#endif /* __MXL862XX_H */
--
2.53.0
^ permalink raw reply related
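A note on the "polled 64-bit accumulation" used in the series above: the prev_* fields keep the last raw reading of each hardware counter, and the polling work folds the difference into a 64-bit total. A minimal sketch of that pattern in plain C, assuming a 32-bit hardware counter that wraps modulo 2^32. The struct and function names are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-counter state: last raw reading plus 64-bit total. */
struct demo_counter_acc {
	uint32_t prev;   /* last raw value read from the hardware */
	uint64_t total;  /* accumulated 64-bit value */
};

/*
 * Fold a new raw 32-bit reading into the 64-bit total. Unsigned
 * subtraction is performed modulo 2^32, so the delta stays correct
 * across one wrap, provided the counter is polled before it can
 * wrap a second time.
 */
static uint64_t demo_counter_acc_update(struct demo_counter_acc *c,
					uint32_t raw)
{
	uint32_t delta;

	assert(c != NULL);
	delta = raw - c->prev;
	c->prev = raw;
	c->total += delta;
	return c->total;
}
```

The polling interval therefore has to be short enough that no counter can advance by 2^32 between two reads at line rate.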
* [PATCH v4] net/mlx5: Fix OOB access and stack information leak in PTP event handling
From: Prathamesh Deshpande @ 2026-04-12 0:04 UTC (permalink / raw)
To: Carolina Jubran, Saeed Mahameed, Leon Romanovsky
Cc: Richard Cochran, Tariq Toukan, Mark Bloch, netdev, linux-rdma,
linux-kernel, Prathamesh Deshpande
In mlx5_pps_event(), several critical issues were identified:
1. The 'pin' index from the hardware event was used without bounds
checking to index 'pin_config' and 'pps_info->start'. Check against
MAX_PIN_NUM to prevent out-of-bounds access.
2. 'ptp_event' was not zero-initialized, potentially leaking stack
memory through the union.
3. A NULL 'pin_config' could be dereferenced if initialization failed.
4. 'clock->ptp' could be NULL if ptp_clock_register() failed.
Fixes: 7c39afb394c7 ("net/mlx5: PTP code migration to driver core section")
Suggested-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
---
v4:
- Validate pin index against MAX_PIN_NUM instead of n_pins [Carolina].
v3:
- Fix union corruption by using a local timestamp variable [Sashiko].
- Validate pin index against n_pins with WARN_ON_ONCE [Carolina].
- Remove redundant pin < 0 check and cleanup TODO comment.
v2:
- Zero-initialize ptp_event to prevent stack information leak [Sashiko].
- Add bounds check for hardware pin index to prevent OOB access [Sashiko].
- Add NULL guard for pin_config to handle initialization failures [Sashiko].
- Add NULL check for clock->ptp as originally intended.
.../net/ethernet/mellanox/mlx5/core/lib/clock.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
index bd4e042077af..ff03dfa12a67 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
@@ -1164,16 +1164,22 @@ static int mlx5_pps_event(struct notifier_block *nb,
pps_nb);
struct mlx5_core_dev *mdev = clock_state->mdev;
struct mlx5_clock *clock = mdev->clock;
- struct ptp_clock_event ptp_event;
+ struct ptp_clock_event ptp_event = {};
struct mlx5_eqe *eqe = data;
int pin = eqe->data.pps.pin;
unsigned long flags;
u64 ns;
+ if (!clock->ptp_info.pin_config)
+ return NOTIFY_OK;
+
+ if (WARN_ON_ONCE(pin >= MAX_PIN_NUM))
+ return NOTIFY_OK;
+
switch (clock->ptp_info.pin_config[pin].func) {
case PTP_PF_EXTTS:
ptp_event.index = pin;
- ptp_event.timestamp = mlx5_real_time_mode(mdev) ?
+ ns = mlx5_real_time_mode(mdev) ?
mlx5_real_time_cyc2time(clock,
be64_to_cpu(eqe->data.pps.time_stamp)) :
mlx5_timecounter_cyc2time(clock,
@@ -1181,12 +1187,13 @@ static int mlx5_pps_event(struct notifier_block *nb,
if (clock->pps_info.enabled) {
ptp_event.type = PTP_CLOCK_PPSUSR;
ptp_event.pps_times.ts_real =
- ns_to_timespec64(ptp_event.timestamp);
+ ns_to_timespec64(ns);
} else {
ptp_event.type = PTP_CLOCK_EXTTS;
+ ptp_event.timestamp = ns;
}
- /* TODOL clock->ptp can be NULL if ptp_clock_register fails */
- ptp_clock_event(clock->ptp, &ptp_event);
+ if (clock->ptp)
+ ptp_clock_event(clock->ptp, &ptp_event);
break;
case PTP_PF_PEROUT:
if (clock->shared) {
--
2.43.0
^ permalink raw reply related
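The union handling in the patch above (zero-initialize the event, keep the converted time in a local, then write only the member that matches the event type) can be illustrated in userspace C. This is a sketch under assumptions: the types below are simplified stand-ins for illustration, not the kernel's struct ptp_clock_event, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_timespec {
	int64_t sec;
	int64_t nsec;
};

struct demo_event {
	int type;                             /* 0 = external timestamp, 1 = PPS */
	int index;
	union {
		uint64_t timestamp;           /* valid when type == 0 */
		struct demo_timespec ts_real; /* valid when type == 1 */
	};
};

/*
 * Build an event from a nanosecond value held in a local variable.
 * The whole struct is cleared first so no stale stack bytes survive,
 * and only the union member selected by the type is written.
 */
static void demo_fill_event(struct demo_event *ev, int pin, uint64_t ns,
			    bool pps)
{
	assert(ev != NULL);
	memset(ev, 0, sizeof(*ev));
	ev->index = pin;
	if (pps) {
		ev->type = 1;
		ev->ts_real.sec = (int64_t)(ns / 1000000000u);
		ev->ts_real.nsec = (int64_t)(ns % 1000000000u);
	} else {
		ev->type = 0;
		ev->timestamp = ns;
	}
}
```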
* Re: [PATCH v3] net/mlx5: Fix OOB access and stack information leak in PTP event handling
From: prathamesh deshpande @ 2026-04-12 0:15 UTC (permalink / raw)
To: Carolina Jubran
Cc: Richard Cochran, Tariq Toukan, Mark Bloch, netdev, linux-rdma,
linux-kernel, Leon Romanovsky, Saeed Mahameed
In-Reply-To: <c30f21a3-5a27-43fb-957d-107775b00faf@nvidia.com>
Hi Carolina,
Thanks for the feedback. I have just submitted v4 which addresses this
by checking the pin index against MAX_PIN_NUM.
Thanks,
Prathamesh
On Sat, Apr 11, 2026 at 12:35 PM Carolina Jubran <cjubran@nvidia.com> wrote:
>
>
> On 10/04/2026 4:53, Prathamesh Deshpande wrote:
> > In mlx5_pps_event(), several critical issues were identified during
> > review by Sashiko:
> >
> > 1. The 'pin' index from the hardware event was used without bounds
> > checking to index 'pin_config' and 'pps_info->start', leading to
> > potential out-of-bounds memory access.
> > 2. 'ptp_event' was not zero-initialized. Since it contains a union,
> > assigning a timestamp partially leaves the 'ts_raw' field with
> > uninitialized stack memory, which can leak kernel data or
> > corrupt time sync logic in hardpps().
> > 3. A NULL 'pin_config' could be dereferenced if initialization failed.
> > 4. 'clock->ptp' could be NULL if ptp_clock_register() failed.
> >
> > Fix these by zero-initializing the event struct, adding a bounds
> > check against n_pins, and adding appropriate NULL guards.
> >
> > Fixes: 7c39afb394c7 ("net/mlx5: PTP code migration to driver core section")
> > Suggested-by: Carolina Jubran <cjubran@nvidia.com>
> > Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
> > ---
> > v3:
> > - Fix union corruption by using a local timestamp variable [Sashiko].
> > - Validate pin index against n_pins with WARN_ON_ONCE [Carolina].
> > - Remove redundant pin < 0 check and cleanup TODO comment.
> > v2:
> > - Zero-initialize ptp_event to prevent stack information leak [Sashiko].
> > - Add bounds check for hardware pin index to prevent OOB access [Sashiko].
> > - Add NULL guard for pin_config to handle initialization failures [Sashiko].
> > - Add NULL check for clock->ptp as originally intended.
> >
> > .../net/ethernet/mellanox/mlx5/core/lib/clock.c | 17 ++++++++++++-----
> > 1 file changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
> > index bd4e042077af..674dd048a6b8 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
> > @@ -1164,16 +1164,22 @@ static int mlx5_pps_event(struct notifier_block *nb,
> > pps_nb);
> > struct mlx5_core_dev *mdev = clock_state->mdev;
> > struct mlx5_clock *clock = mdev->clock;
> > - struct ptp_clock_event ptp_event;
> > + struct ptp_clock_event ptp_event = {};
> > struct mlx5_eqe *eqe = data;
> > int pin = eqe->data.pps.pin;
> > unsigned long flags;
> > u64 ns;
> >
> > + if (!clock->ptp_info.pin_config)
> > + return NOTIFY_OK;
> > +
> > + if (WARN_ON_ONCE(pin >= clock->ptp_info.n_pins))
> > + return NOTIFY_OK;
>
>
> Sorry if my previous comment wasn't clear enough.
>
> The firmware will never report a pin higher than n_pins; that's not the
> concern here. If future hardware reports n_pins > 8, checking against
> n_pins would still allow OOB access on those arrays. The check should
> compare against MAX_PIN_NUM instead, since that's the actual hard limit
> of the driver's data structures. And if a new device supports more than
> 8 pins, the WARN_ON_ONCE would let us know we need to update the driver.
>
>
> Thanks,
>
> Carolina
>
--
Thanks and Regards,
Prathamesh Deshpande
^ permalink raw reply
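The review point in the thread above is that an index must be validated against the capacity of the array it indexes, not against a count reported by the device. A minimal userspace sketch of that idea, assuming a fixed table of 8 entries; DEMO_MAX_PINS and the struct are hypothetical stand-ins, not the mlx5 definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PINS 8  /* hypothetical fixed capacity of the table */

struct demo_clock {
	int n_pins;                   /* count reported by the device */
	int pin_func[DEMO_MAX_PINS];  /* fixed-size driver-side table */
};

/*
 * Return true only if @pin can safely index pin_func[]. The limit is
 * the array capacity rather than n_pins: a device that reports
 * n_pins > DEMO_MAX_PINS would pass an n_pins check and still index
 * past the end of the table.
 */
static bool demo_pin_valid(const struct demo_clock *clk, int pin)
{
	assert(clk != NULL);
	return pin >= 0 &&
	       (size_t)pin < sizeof(clk->pin_func) / sizeof(clk->pin_func[0]);
}
```

With n_pins set to 12, index 11 would pass a check against n_pins yet is rejected here, which is the case the review comment describes.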
* Re: [PATCH v2 net-next 2/5] net: phy: make mdio_device.c part of libphy
From: Stephen Boyd @ 2026-04-12 0:25 UTC (permalink / raw)
To: Andrew Lunn, Bjorn Andersson, David Miller, Eric Dumazet,
Heiner Kallweit, Jakub Kicinski, Michael Turquette,
Neil Armstrong, Paolo Abeni, Russell King - ARM Linux, Vinod Koul
Cc: netdev@vger.kernel.org, Philipp Zabel, linux-arm-msm, linux-clk,
linux-phy
In-Reply-To: <c6dbf9b3-3ca0-434b-ad3a-71fe602ab809@gmail.com>
Quoting Heiner Kallweit (2026-03-09 10:03:31)
> This patch
> - makes mdio_device.c part of libphy
> - makes mdio_device_(un)register_reset() static
> - moves mdiobus_(un)register_device() from mdio_bus.c to mdio_device.c,
> stops exporting both functions and makes them private to phylib
>
> This further decouples the MDIO consumer functionality from libphy.
>
> Note: This makes MDIO driver registration part of phylib, therefore
> adjust Kconfig dependencies where needed.
>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
Acked-by: Stephen Boyd <sboyd@kernel.org>
^ permalink raw reply
* [PATCH net-next v2] r8169: Use napi_schedule_irqoff()
From: Matt Vollrath @ 2026-04-12 1:40 UTC (permalink / raw)
To: netdev
Cc: Matt Vollrath, edumazet, pabeni, hkallweit1, kuba, andrew+netdev,
nic_swsd
napi_schedule() masks hard interrupts while doing its work, which is
redundant when called from an interrupt handler where hard interrupts
are already masked. Use napi_schedule_irqoff() instead to bypass this
redundant masking. This is an optimization.
Tested on a Lenovo RTL8168h/8111h.
Signed-off-by: Matt Vollrath <tactii@gmail.com>
---
drivers/net/ethernet/realtek/r8169_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 791277e750ba..4c0ad0de3410 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4873,7 +4873,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
phy_mac_interrupt(tp->phydev);
rtl_irq_disable(tp);
- napi_schedule(&tp->napi);
+ napi_schedule_irqoff(&tp->napi);
out:
rtl_ack_events(tp, status);
--
2.43.0
Changes:
v2:
* CC the maintainers, make the CI board green
^ permalink raw reply related
* [PATCH] xfrm: fix memory leak in xfrm_add_policy()
From: Deepanshu Kartikey @ 2026-04-12 2:08 UTC (permalink / raw)
To: steffen.klassert, herbert, davem, edumazet, kuba, pabeni, horms
Cc: leon, netdev, linux-kernel, Deepanshu Kartikey,
syzbot+901d48e0b95aed4a2548
When xfrm_policy_insert() fails, the error path performs manual
cleanup by calling xfrm_dev_policy_free(), security_xfrm_policy_free()
and kfree() directly. This is incorrect because xfrm_policy_destroy()
already handles all of these, causing a memory leak detected by
kmemleak.
Replace the open-coded cleanup with xfrm_policy_destroy(), consistent
with the error handling in xfrm_policy_construct(). The walk.dead
flag must be set before calling xfrm_policy_destroy() as it requires
it via BUG_ON(!policy->walk.dead).
Fixes: 94b95dfaa814 ("xfrm: release all offloaded policy memory")
Reported-by: syzbot+901d48e0b95aed4a2548@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=901d48e0b95aed4a2548
Tested-by: syzbot+901d48e0b95aed4a2548@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
net/xfrm/xfrm_user.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index d56450f61669..ae144d1e4a65 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2267,9 +2267,8 @@ static int xfrm_add_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err) {
xfrm_dev_policy_delete(xp);
- xfrm_dev_policy_free(xp);
- security_xfrm_policy_free(xp->security);
- kfree(xp);
+ xp->walk.dead = 1;
+ xfrm_policy_destroy(xp);
return err;
}
--
2.43.0
^ permalink raw reply related
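The cleanup change above replaces open-coded frees with one destroy routine that has a precondition (the dead flag). A userspace sketch of that shape, with a crude allocation counter to show that the error path releases everything. All names are hypothetical, and the two owned pointers are only stand-ins for whatever the real policy object owns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int demo_live_allocs;  /* crude leak counter for the demo */

static void *demo_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live_allocs--;
		free(p);
	}
}

struct demo_policy {
	bool dead;       /* must be set before destroy, like walk.dead */
	void *security;  /* stand-in for a security blob */
	void *offload;   /* stand-in for device offload state */
};

/* Single teardown path: releases everything the policy owns. */
static void demo_policy_destroy(struct demo_policy *p)
{
	assert(p && p->dead);
	demo_free(p->offload);
	demo_free(p->security);
	demo_free(p);
}

static int demo_policy_add(bool insert_fails)
{
	struct demo_policy *p = demo_alloc(sizeof(*p));

	if (!p)
		return -1;
	p->security = demo_alloc(16);
	p->offload = demo_alloc(16);

	if (insert_fails || !p->security || !p->offload) {
		p->dead = true;  /* satisfy the destroy precondition */
		demo_policy_destroy(p);
		return -1;
	}

	/* The demo has no table to insert into, so release it again. */
	p->dead = true;
	demo_policy_destroy(p);
	return 0;
}
```

Routing every exit through one destroy function means a resource added to the object later only needs to be released in one place.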
* Re: [PATCH net] netrom: do some basic forms of validation on incoming frames
From: Hugh Blemings @ 2026-04-12 2:32 UTC (permalink / raw)
To: Greg KH
Cc: Kuniyuki Iwashima, kuba, davem, edumazet, horms, linux-hams,
linux-kernel, netdev, pabeni, stable, workflows, yizhe
In-Reply-To: <2026041124-hyphen-circulate-34ae@gregkh>
On 11/4/2026 18:58, Greg KH wrote:
> On Sat, Apr 11, 2026 at 05:24:17PM +1000, Hugh Blemings wrote:
>> On 11/4/2026 15:50, Greg KH wrote:
>>> On Sat, Apr 11, 2026 at 08:25:19AM +1000, Hugh Blemings wrote:
>>>> On 11/4/2026 08:11, Kuniyuki Iwashima wrote:
>>>>> From: Jakub Kicinski <kuba@kernel.org>
>>>>> Date: Fri, 10 Apr 2026 14:54:48 -0700
>>>>>> On Fri, 10 Apr 2026 14:30:42 -0700 Jakub Kicinski wrote:
>>>>>>> On Fri, 10 Apr 2026 07:24:36 +0200 Greg Kroah-Hartman wrote:
>>>>>>>> On Thu, Apr 09, 2026 at 08:32:35PM -0700, Jakub Kicinski wrote:
>>>>>>>>> Or for simplicity we could also be testing against skb_headlen()
>>>>>>>>> since we don't expect any legit non-linear frames here? Dunno.
>>>>>>>> I'll be glad to change this either way, your call. Given that this is
>>>>>>>> an obsolete protocol that seems to only be a target for drive-by fuzzers
>>>>>>>> to attack, whatever the simplest thing to do to quiet them up I'll be
>>>>>>>> glad to implement.
>>>>>>>>
>>>>>>>> Or can we just delete this stuff entirely? :)
>>>>>>> Yes.
>>>>>>>
>>>>>>> My thinking is to delete hamradio, nfc, atm, caif.. [more to come]
>>>>>>> Create GH repos which provide them as OOT modules.
>>>>>>> Hopefully we can convince any existing users to switch to that.
>>>>>>>
>>>>>>> The only thing stopping me is the concern that this is just the softest
>>>>>>> target and the LLMs will find something else to focus on which we can't
>>>>>>> delete. I suspect any PCIe driver can be flooded with "aren't you
>>>>>>> trusting the HW to provide valid responses here?" bullshit.
>>>>>>>
>>>>>>> But hey, let's try. I'll post a patch nuking all of hamradio later
>>>>>>> today.
>>>>>> Well, either we "expunge" this code to OOT repos, or we mark it
>>>>>> as broken and tell everyone that we don't take security fixes
>>>>>> for anything that depends on BROKEN. I'd personally rather expunge.
>>>>> +1 for "expunge" to prevent LLM-based patch flood.
>>>>>
>>>>> IIRC, we did that recently for one driver only used by OpenWRT ?
>>>>>
>>>>>
>>>> If the main concern here is ongoing maintenance of these Ham Radio related
>>>> protocols/drivers, can we pause for a moment on anything as dramatic as
>>>> removing from the tree entirely ?
>>> Sure, but:
>>>
>>>> There is a good cohort of capable kernel folks that either are or were ham
>>>> radio operators who I believe, upon realising that things have got to this
>>>> point, will be happy to redouble efforts to ensure this code is maintained and
>>>> tested to a satisfactory standard.
>>> We need this code to be maintained, because as is being shown, there are
>>> reported problems with it that will affect these devices/networks that
>>> you all are using. So all we need is a maintainer for this to be able
>>> to take reports that we get and fix things up as needed. I know you
>>> have that experience, want to come back to kernel development, we've
>>> missed you :)
>> That's most kind Greg, thank you, have missed all you cool kids too :)
>>
>> More seriously though - I'd be up for doing it, but I think there may be
>> others better placed than I who haven't yet realised we have this conundrum.
>> I'm nudging a few folks offline on this front.
> The main "conundrum" is that this protocol completely trusts the
> hardware to give the kernel the "correct" data. So if you trust the
> hardware to work properly, it will be fine, but as the fuzzing tools are
> finding, if the data from the hardware modems is a bit out-of-spec,
> "bad" things can happen.
>
> I don't know how well controlled the data is from these devices, if it's
> just a "pass through" from what they get off the "wire" or if the
> devices always ensure the protocol packets are sane before passing them
> off to the kernel. That's going to be something you all with the
> hardware are going to have to determine in order to keep this a working
> system over time. Especially given that this is a wireless protocol
> where you "have" to trust the remote end.
Thanks for the thoughts Greg - and ya, I guess on balance I come back to
being generally skeptical of both hardware and software to Do The Right
Thing (TM).
So bounds checking and the like seems prudent irrespective of whether
the kernel is getting the data from real hardware, software modems etc.
I've done some initial digging around that confirms my suspicion that
this in-kernel code remains quite widely used, if somewhat out of view.
Accordingly I lean towards working to get these various mitigations
in place, with some revised patches etc. as needed, and into the main tree.
Once this is done I think that'll give me a good sense of whether I or
someone else is well positioned to keep the code maintained longer term,
and thus whether it is justified to keep it in tree or not.
More to follow once I finish remembering this kernel thing!
Cheers,
Hugh
^ permalink raw reply
* [PATCH v2 0/3] bpf: fix sock_ops rtt_min OOB read and related guard issues
From: Werner Kasselman @ 2026-04-12 3:03 UTC (permalink / raw)
To: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko
Cc: John Fastabend, Lawrence Brakmo, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, bpf@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Werner Kasselman
Patch 3 fixes an out-of-bounds read in sock_ops_convert_ctx_access()
for the rtt_min context field. It is the only tcp_sock-backed field
that bypasses the is_locked_tcp_sock guard, so on request_sock-backed
sock_ops callbacks the converted BPF load reads past the end of a
tcp_request_sock.
Patches 1 and 2 are groundwork. Patch 1 fixes a pre-existing info
leak in SOCK_OPS_GET_FIELD() and SOCK_OPS_GET_SK() where dst_reg is
left holding the context pointer on the guard-failure branch when
dst_reg == src_reg, instead of being zeroed. Patch 2 extracts
SOCK_OPS_LOAD_TCP_SOCK_FIELD() from SOCK_OPS_GET_FIELD() so the
rtt_min sub-field access in patch 3 can reuse it.
Patches 1 and 3 carry Fixes: tags and Cc: stable. Patch 2 is a pure
refactor.
v1: https://lore.kernel.org/bpf/ (earlier single-patch posting)
- Inlined the guarded load sequence by hand.
- Feedback: please factor it through the existing helper instead
of open-coding 30 lines.
v2:
- Patch 1 (new): fix latent dst == src info leak in both macros.
- Patch 2 (new): refactor SOCK_OPS_GET_FIELD().
- Patch 3: use SOCK_OPS_LOAD_TCP_SOCK_FIELD() for rtt_min and use
offsetof(struct minmax_sample, v) for the sub-field offset.
Werner Kasselman (3):
bpf: zero dst_reg on sock_ops field guard failure when dst == src
bpf: extract SOCK_OPS_LOAD_TCP_SOCK_FIELD from SOCK_OPS_GET_FIELD
bpf: guard sock_ops rtt_min against non-locked tcp_sock
net/core/filter.c | 37 +++++++++++++++++++++----------------
1 file changed, 21 insertions(+), 16 deletions(-)
--
2.43.0
^ permalink raw reply
* [PATCH v2 1/3] bpf: zero dst_reg on sock_ops field guard failure when dst == src
From: Werner Kasselman @ 2026-04-12 3:03 UTC (permalink / raw)
To: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko
Cc: John Fastabend, Lawrence Brakmo, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, bpf@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Werner Kasselman
In-Reply-To: <20260412030306.3469543-1-werner@verivus.com>
When a BPF_PROG_TYPE_SOCK_OPS program reads a tcp_sock-backed context
field (e.g. ctx->snd_ssthresh) or ctx->sk using the same register for
source and destination, SOCK_OPS_GET_FIELD() and SOCK_OPS_GET_SK()
load is_locked_tcp_sock/is_fullsock into a scratch register rather
than into dst_reg. On the guard-failure branch the macro only
restores the scratch register before falling through, leaving
dst_reg holding the unchanged context pointer.
Callers expect dst_reg to read as a scalar 0 when the guard fails.
Instead the BPF program sees a kernel heap address, which the
verifier has already typed as a scalar, giving a narrow kernel
pointer leak. Clang does not emit the dst == src pattern for normal
C ctx field reads, but it is reachable via inline asm and
hand-written BPF.
Add an explicit BPF_MOV64_IMM(dst_reg, 0) on the failure path in
both macros and bump the success-path BPF_JMP_A() to skip over it.
Found via AST-based call-graph analysis using sqry.
Fixes: fd09af010788 ("bpf: sock_ops ctx access may stomp registers in corner case")
Fixes: 84f44df664e9 ("bpf: sock_ops sk access may stomp registers when dst_reg = src_reg")
Cc: stable@vger.kernel.org
Signed-off-by: Werner Kasselman <werner@verivus.com>
---
net/core/filter.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 78b548158fb0..53ce06ed4a88 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10581,10 +10581,11 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
si->dst_reg, si->dst_reg, \
offsetof(OBJ, OBJ_FIELD)); \
if (si->dst_reg == si->src_reg) { \
- *insn++ = BPF_JMP_A(1); \
+ *insn++ = BPF_JMP_A(2); \
*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg, \
offsetof(struct bpf_sock_ops_kern, \
temp)); \
+ *insn++ = BPF_MOV64_IMM(si->dst_reg, 0); \
} \
} while (0)
@@ -10618,10 +10619,11 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
si->dst_reg, si->src_reg, \
offsetof(struct bpf_sock_ops_kern, sk));\
if (si->dst_reg == si->src_reg) { \
- *insn++ = BPF_JMP_A(1); \
+ *insn++ = BPF_JMP_A(2); \
*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg, \
offsetof(struct bpf_sock_ops_kern, \
temp)); \
+ *insn++ = BPF_MOV64_IMM(si->dst_reg, 0); \
} \
} while (0)
--
2.43.0
^ permalink raw reply related
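Why the success-path jump in the patch above goes from BPF_JMP_A(1) to BPF_JMP_A(2): an eBPF jump offset is counted from the instruction that follows the jump, so once a second instruction (the MOV that zeroes dst_reg) is appended to the failure path, the success path has to skip two instructions rather than one. A small sketch of that arithmetic; the helper name and the index layout are illustrative only.

```c
#include <assert.h>

/*
 * Tail of the dst == src sequence after the patch, by index:
 *   0: JMP_A(off)        success path, skips the failure tail
 *   1: LDX reg = temp    failure path: restore the scratch register
 *   2: MOV dst_reg = 0   failure path: zero the destination
 *   3: (next instruction after the macro)
 */
static int demo_jump_target(int pc, int off)
{
	assert(pc >= 0);
	/* Relative jumps are taken from the instruction after the jump. */
	return pc + 1 + off;
}
```

With the old offset of 1, the jump at index 0 would land on index 2 and zero dst_reg on the success path as well; an offset of 2 lands on index 3, past the failure tail.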
* [PATCH v2 2/3] bpf: extract SOCK_OPS_LOAD_TCP_SOCK_FIELD from SOCK_OPS_GET_FIELD
From: Werner Kasselman @ 2026-04-12 3:03 UTC (permalink / raw)
To: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko
Cc: John Fastabend, Lawrence Brakmo, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, bpf@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Werner Kasselman
In-Reply-To: <20260412030306.3469543-1-werner@verivus.com>
Split the tcp_sock field load sequence out of SOCK_OPS_GET_FIELD()
into SOCK_OPS_LOAD_TCP_SOCK_FIELD(FIELD_SIZE, FIELD_OFFSET) so it can
be reused for fields that are not direct struct members.
No functional change.
Signed-off-by: Werner Kasselman <werner@verivus.com>
---
net/core/filter.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 53ce06ed4a88..385fc3e9eb4a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10544,12 +10544,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
struct bpf_insn *insn = insn_buf;
int off;
-/* Helper macro for adding read access to tcp_sock or sock fields. */
-#define SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ) \
+/* Helper macro for adding guarded read access to tcp_sock fields. */
+#define SOCK_OPS_LOAD_TCP_SOCK_FIELD(FIELD_SIZE, FIELD_OFFSET) \
do { \
int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2; \
- BUILD_BUG_ON(sizeof_field(OBJ, OBJ_FIELD) > \
- sizeof_field(struct bpf_sock_ops, BPF_FIELD)); \
if (si->dst_reg == reg || si->src_reg == reg) \
reg--; \
if (si->dst_reg == reg || si->src_reg == reg) \
@@ -10576,10 +10574,9 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
struct bpf_sock_ops_kern, sk),\
si->dst_reg, si->src_reg, \
offsetof(struct bpf_sock_ops_kern, sk));\
- *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(OBJ, \
- OBJ_FIELD), \
+ *insn++ = BPF_LDX_MEM(FIELD_SIZE, \
si->dst_reg, si->dst_reg, \
- offsetof(OBJ, OBJ_FIELD)); \
+ FIELD_OFFSET); \
if (si->dst_reg == si->src_reg) { \
*insn++ = BPF_JMP_A(2); \
*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg, \
@@ -10589,6 +10586,14 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
} \
} while (0)
+#define SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ) \
+ do { \
+ BUILD_BUG_ON(sizeof_field(OBJ, OBJ_FIELD) > \
+ sizeof_field(struct bpf_sock_ops, BPF_FIELD)); \
+ SOCK_OPS_LOAD_TCP_SOCK_FIELD(BPF_FIELD_SIZEOF(OBJ, OBJ_FIELD),\
+ offsetof(OBJ, OBJ_FIELD)); \
+ } while (0)
+
#define SOCK_OPS_GET_SK() \
do { \
int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 1; \
--
2.43.0
^ permalink raw reply related
* [PATCH v2 3/3] bpf: guard sock_ops rtt_min against non-locked tcp_sock
From: Werner Kasselman @ 2026-04-12 3:03 UTC (permalink / raw)
To: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko
Cc: John Fastabend, Lawrence Brakmo, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, bpf@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Werner Kasselman
In-Reply-To: <20260412030306.3469543-1-werner@verivus.com>
sock_ops_convert_ctx_access() reads rtt_min without the
is_locked_tcp_sock guard used for every other tcp_sock field. On
request_sock-backed sock_ops callbacks, sk points at a
tcp_request_sock and the converted load reads past the end of the
allocation.
Use SOCK_OPS_LOAD_TCP_SOCK_FIELD() so the load is guarded, and
compute the offset via offsetof(struct minmax_sample, v).
Found via AST-based call-graph analysis using sqry.
Fixes: 44f0e43037d3 ("bpf: Add support for reading sk_state and more")
Cc: stable@vger.kernel.org
Signed-off-by: Werner Kasselman <werner@verivus.com>
---
net/core/filter.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 385fc3e9eb4a..88fa290caeaa 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10836,14 +10836,12 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
sizeof(struct minmax));
BUILD_BUG_ON(sizeof(struct minmax) <
sizeof(struct minmax_sample));
+ BUILD_BUG_ON(offsetof(struct tcp_sock, rtt_min) +
+ offsetof(struct minmax_sample, v) > S16_MAX);
- *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
- struct bpf_sock_ops_kern, sk),
- si->dst_reg, si->src_reg,
- offsetof(struct bpf_sock_ops_kern, sk));
- *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
- offsetof(struct tcp_sock, rtt_min) +
- sizeof_field(struct minmax_sample, t));
+ off = offsetof(struct tcp_sock, rtt_min) +
+ offsetof(struct minmax_sample, v);
+ SOCK_OPS_LOAD_TCP_SOCK_FIELD(BPF_W, off);
break;
case offsetof(struct bpf_sock_ops, bpf_sock_ops_cb_flags):
--
2.43.0
^ permalink raw reply related
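On the offset change in the patch above: for a sample struct whose first member is t and whose second is v, sizeof_field(..., t) and offsetof(..., v) give the same number, but offsetof states the intent directly and stays correct if the layout ever gains padding. A sketch with a local mirror of such a struct; the two-u32 layout is an assumption made for illustration, and the demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for a (time, value) sample; layout assumed. */
struct demo_minmax_sample {
	uint32_t t;  /* time of the measurement */
	uint32_t v;  /* measured value */
};

/* Userspace equivalent of the kernel's sizeof_field() helper. */
#define demo_sizeof_field(type, member) sizeof(((type *)0)->member)

/* Both spellings agree for this layout because v directly follows t. */
static_assert(offsetof(struct demo_minmax_sample, v) ==
	      demo_sizeof_field(struct demo_minmax_sample, t),
	      "v is expected to follow t without padding");

/* Offset of the value inside a containing object that embeds the sample. */
static size_t demo_value_offset(size_t sample_base)
{
	return sample_base + offsetof(struct demo_minmax_sample, v);
}
```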
* [PATCH net-next v4 00/10] enic: SR-IOV V2 admin channel and MBOX protocol
From: Satish Kharat via B4 Relay @ 2026-04-12 5:06 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel,
20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9, Satish Kharat,
Breno Leitao
This series adds the admin channel infrastructure and mailbox (MBOX)
protocol needed for V2 SR-IOV support in the enic driver.
The V2 SR-IOV design uses a direct PF-VF communication channel built on
dedicated WQ/RQ/CQ hardware resources and an MSI-X interrupt.
Firmware capability and admin channel infrastructure (patches 1-4):
- Probe-time firmware feature check for V2 SR-IOV support
- Admin channel open/close, RQ buffer management, CQ service
with MSI-X interrupt and NAPI polling
MBOX protocol and VF enable (patches 5-10):
- MBOX message types, core send/receive, PF and VF handlers
- V2 SR-IOV enable wiring with admin channel setup
- V2 VF probe with admin channel and PF registration
This series depends on "enic: SR-IOV V2 resource discovery and VF
type detection" (Series 1), which has been accepted.
Depends-on: 20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9@cisco.com
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
Changes in v4:
- Fix reverse xmas tree variable ordering (patches 1, 6)
- Use kzalloc_obj instead of kzalloc with sizeof (patch 9)
- Add NULL check for pp allocation in V1 SR-IOV disable path (patch 9)
- Link to v3: https://lore.kernel.org/r/20260408-enic-sriov-v2-admin-channel-v2-v3-0-1d4999a03cec@cisco.com
Changes in v3:
- Use early-return pattern in enic_sriov_detect_vf_type to reduce
nesting (patch 1) [Breno Leitao]
- Link to v2: https://lore.kernel.org/r/20260408-enic-sriov-v2-admin-channel-v2-v2-0-d05dd3623fd3@cisco.com
Changes in v2:
- Fix lines exceeding 80 columns (patches 4, 6, 7, 8)
- Add __maybe_unused to enic_sriov_configure and enic_sriov_v2_enable;
.sriov_configure wiring deferred to a later series after devcmd
hardening is in place (patch 9)
- Guard probe-time auto-enable to skip V2 VFs (patch 9)
- Link to v1: https://lore.kernel.org/r/20260406-enic-sriov-v2-admin-channel-v2-v1-0-82cc47636a78@cisco.com
---
Satish Kharat (10):
enic: verify firmware supports V2 SR-IOV at probe time
enic: add admin channel open and close for SR-IOV
enic: add admin RQ buffer management
enic: add admin CQ service with MSI-X interrupt and NAPI polling
enic: define MBOX message types and header structures
enic: add MBOX core send and receive for admin channel
enic: add MBOX PF handlers for VF register and capability
enic: add MBOX VF handlers for capability, register and link state
enic: wire V2 SR-IOV enable with admin channel and MBOX
enic: add V2 VF probe with admin channel and PF registration
drivers/net/ethernet/cisco/enic/Makefile | 3 +-
drivers/net/ethernet/cisco/enic/enic.h | 29 +-
drivers/net/ethernet/cisco/enic/enic_admin.c | 511 ++++++++++++++++++++++++
drivers/net/ethernet/cisco/enic/enic_admin.h | 27 ++
drivers/net/ethernet/cisco/enic/enic_main.c | 218 +++++++++-
drivers/net/ethernet/cisco/enic/enic_mbox.c | 546 ++++++++++++++++++++++++++
drivers/net/ethernet/cisco/enic/enic_mbox.h | 87 ++++
drivers/net/ethernet/cisco/enic/enic_res.c | 4 +-
drivers/net/ethernet/cisco/enic/vnic_devcmd.h | 11 +
drivers/net/ethernet/cisco/enic/vnic_enet.h | 4 +-
10 files changed, 1425 insertions(+), 15 deletions(-)
---
base-commit: 3e6ef4fb822c971b464d44910a1561b4e7f9efa7
change-id: 20260404-enic-sriov-v2-admin-channel-v2-c0aa3e988833
Best regards,
--
Satish Kharat <satishkh@cisco.com>
^ permalink raw reply
* [PATCH net-next v4 01/10] enic: verify firmware supports V2 SR-IOV at probe time
From: Satish Kharat via B4 Relay @ 2026-04-12 5:06 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel,
20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9, Satish Kharat,
Breno Leitao
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-0-f052326c2a57@cisco.com>
From: Satish Kharat <satishkh@cisco.com>
During PF probe, query the firmware get-supported-feature interface
to verify that the running firmware supports V2 SR-IOV. Firmware
version 5.3(4.72) and later report VIC_FEATURE_SRIOV via
CMD_GET_SUPP_FEATURE_VER. If the firmware does not support the
feature, set vf_type to ENIC_VF_TYPE_NONE and log a warning so the
admin knows a firmware upgrade is needed.
The VIC_FEATURE_SRIOV enum value (4) matches the firmware ABI. A
placeholder entry (VIC_FEATURE_PTP at position 3) is added to keep
the enum in sync with firmware's feature numbering.
Suggested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
drivers/net/ethernet/cisco/enic/enic_main.c | 21 ++++++++++++++++++++-
drivers/net/ethernet/cisco/enic/vnic_devcmd.h | 2 ++
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index e7125b818087..53d68272d06a 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -2641,8 +2641,10 @@ static void enic_iounmap(struct enic *enic)
static void enic_sriov_detect_vf_type(struct enic *enic)
{
struct pci_dev *pdev = enic->pdev;
- int pos;
+ u64 supported_versions, a1 = 0;
u16 vf_dev_id;
+ int pos;
+ int err;
if (enic_is_sriov_vf(enic) || enic_is_dynamic(enic))
return;
@@ -2669,6 +2671,23 @@ static void enic_sriov_detect_vf_type(struct enic *enic)
enic->vf_type = ENIC_VF_TYPE_NONE;
break;
}
+
+ if (enic->vf_type != ENIC_VF_TYPE_V2)
+ return;
+
+ /* A successful command means firmware recognizes
+ * VIC_FEATURE_SRIOV; supported_versions is available
+ * for sub-feature versioning in the future.
+ */
+ err = vnic_dev_get_supported_feature_ver(enic->vdev,
+ VIC_FEATURE_SRIOV,
+ &supported_versions,
+ &a1);
+ if (err) {
+ dev_warn(&pdev->dev,
+ "SR-IOV V2 not supported by current firmware. Upgrade to VIC FW 5.3(4.72) or higher.\n");
+ enic->vf_type = ENIC_VF_TYPE_NONE;
+ }
}
#endif
diff --git a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
index 605ef17f967e..7a4bce736105 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
@@ -734,6 +734,8 @@ enum vic_feature_t {
VIC_FEATURE_VXLAN,
VIC_FEATURE_RDMA,
VIC_FEATURE_VXLAN_PATCH,
+ VIC_FEATURE_PTP,
+ VIC_FEATURE_SRIOV,
VIC_FEATURE_MAX,
};
--
2.43.0
* [PATCH net-next v4 03/10] enic: add admin RQ buffer management
From: Satish Kharat via B4 Relay @ 2026-04-12 5:06 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel,
20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9, Satish Kharat
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-0-f052326c2a57@cisco.com>
From: Satish Kharat <satishkh@cisco.com>
The admin receive queue needs pre-posted DMA buffers for incoming
mailbox messages from VFs. Each buffer is a kmalloc'd region mapped
for DMA (2048 bytes, sufficient for any MBOX message).
Add enic_admin_rq_fill() to post buffers at open time, and
enic_admin_rq_drain() to unmap and free them at close time.
Wire both into the admin channel open/close paths.
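As a rough userspace sketch of the fill/drain pairing described above
(not the driver code): malloc() stands in for kmalloc() plus DMA mapping,
and the sketch_* names and the 8-entry ring size are hypothetical. The
point it shows is that drain must cope with a partially filled ring, since
fill stops at the first allocation failure.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_DESC_COUNT 8	/* hypothetical ring size */
#define SKETCH_BUF_SIZE 2048	/* mirrors the 2048-byte buffers above */

struct sketch_rq {
	void *bufs[SKETCH_DESC_COUNT];
	unsigned int posted;	/* number of buffers currently posted */
};

/* Post buffers until the ring is full; stop at the first failure. */
static int sketch_rq_fill(struct sketch_rq *rq)
{
	assert(rq->posted <= SKETCH_DESC_COUNT);

	while (rq->posted < SKETCH_DESC_COUNT) {
		void *buf = malloc(SKETCH_BUF_SIZE);

		if (!buf)
			return -1;
		rq->bufs[rq->posted++] = buf;
	}
	return 0;
}

/* Free every posted buffer; safe to call after a partial fill. */
static void sketch_rq_drain(struct sketch_rq *rq)
{
	while (rq->posted > 0) {
		rq->posted--;
		free(rq->bufs[rq->posted]);
		rq->bufs[rq->posted] = NULL;
	}
}
```

Calling sketch_rq_drain() on the error path of an open routine and again
in close is then harmless, because an empty ring is a no-op.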
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
drivers/net/ethernet/cisco/enic/enic_admin.c | 66 +++++++++++++++++++++++++++-
1 file changed, 64 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/cisco/enic/enic_admin.c b/drivers/net/ethernet/cisco/enic/enic_admin.c
index d1abe6a50095..a8fcd5f116d1 100644
--- a/drivers/net/ethernet/cisco/enic/enic_admin.c
+++ b/drivers/net/ethernet/cisco/enic/enic_admin.c
@@ -3,6 +3,7 @@
#include <linux/kernel.h>
#include <linux/netdevice.h>
+#include <linux/dma-mapping.h>
#include "vnic_dev.h"
#include "vnic_wq.h"
@@ -23,10 +24,63 @@ static void enic_admin_wq_buf_clean(struct vnic_wq *wq,
{
}
-/* No-op: admin RQ buffer teardown is handled in enic_admin_channel_close */
static void enic_admin_rq_buf_clean(struct vnic_rq *rq,
struct vnic_rq_buf *buf)
{
+ struct enic *enic = vnic_dev_priv(rq->vdev);
+
+ if (!buf->os_buf)
+ return;
+
+ dma_unmap_single(&enic->pdev->dev, buf->dma_addr, buf->len,
+ DMA_FROM_DEVICE);
+ kfree(buf->os_buf);
+ buf->os_buf = NULL;
+}
+
+static int enic_admin_rq_post_one(struct enic *enic)
+{
+ struct vnic_rq *rq = &enic->admin_rq;
+ struct rq_enet_desc *desc;
+ dma_addr_t dma_addr;
+ void *buf;
+
+ buf = kmalloc(ENIC_ADMIN_BUF_SIZE, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ dma_addr = dma_map_single(&enic->pdev->dev, buf, ENIC_ADMIN_BUF_SIZE,
+ DMA_FROM_DEVICE);
+ if (dma_mapping_error(&enic->pdev->dev, dma_addr)) {
+ kfree(buf);
+ return -ENOMEM;
+ }
+
+ desc = vnic_rq_next_desc(rq);
+ rq_enet_desc_enc(desc, (u64)dma_addr | VNIC_PADDR_TARGET,
+ RQ_ENET_TYPE_ONLY_SOP, ENIC_ADMIN_BUF_SIZE);
+ vnic_rq_post(rq, buf, 0, dma_addr, ENIC_ADMIN_BUF_SIZE, 0);
+
+ return 0;
+}
+
+static int enic_admin_rq_fill(struct enic *enic)
+{
+ struct vnic_rq *rq = &enic->admin_rq;
+ int err;
+
+ while (vnic_rq_desc_avail(rq) > 0) {
+ err = enic_admin_rq_post_one(enic);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+static void enic_admin_rq_drain(struct enic *enic)
+{
+ vnic_rq_clean(&enic->admin_rq, enic_admin_rq_buf_clean);
}
static int enic_admin_qp_type_set(struct enic *enic, u32 enable)
@@ -138,6 +192,13 @@ int enic_admin_channel_open(struct enic *enic)
vnic_wq_enable(&enic->admin_wq);
vnic_rq_enable(&enic->admin_rq);
+ err = enic_admin_rq_fill(enic);
+ if (err) {
+ netdev_err(enic->netdev,
+ "Failed to fill admin RQ buffers: %d\n", err);
+ goto disable_queues;
+ }
+
err = enic_admin_qp_type_set(enic, 1);
if (err) {
netdev_err(enic->netdev,
@@ -151,6 +212,7 @@ int enic_admin_channel_open(struct enic *enic)
vnic_wq_disable(&enic->admin_wq);
vnic_rq_disable(&enic->admin_rq);
enic_admin_qp_type_set(enic, 0);
+ enic_admin_rq_drain(enic);
enic_admin_free_resources(enic);
return err;
}
@@ -166,7 +228,7 @@ void enic_admin_channel_close(struct enic *enic)
enic_admin_qp_type_set(enic, 0);
vnic_wq_clean(&enic->admin_wq, enic_admin_wq_buf_clean);
- vnic_rq_clean(&enic->admin_rq, enic_admin_rq_buf_clean);
+ enic_admin_rq_drain(enic);
vnic_cq_clean(&enic->admin_cq[0]);
vnic_cq_clean(&enic->admin_cq[1]);
vnic_intr_clean(&enic->admin_intr);
--
2.43.0
* [PATCH net-next v4 02/10] enic: add admin channel open and close for SR-IOV
From: Satish Kharat via B4 Relay @ 2026-04-12 5:06 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel,
20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9, Satish Kharat
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-0-f052326c2a57@cisco.com>
From: Satish Kharat <satishkh@cisco.com>
The V2 SR-IOV design uses a dedicated admin channel (WQ/RQ/CQ/INTR
on separate BAR resources) for PF-VF mailbox communication rather
than firmware-proxied devcmds.
Introduce enic_admin_channel_open() and enic_admin_channel_close().
Open allocates and initialises the admin WQ, RQ, two CQs (one per
direction) and one SR-IOV interrupt, then issues CMD_QP_TYPE_SET to
tell firmware the queues are admin-type. Close reverses the sequence.
Add CMD_QP_TYPE_SET (97) and QP_TYPE_ADMIN/DATA defines to
vnic_devcmd.h.
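The "close reverses the sequence" rule can be sketched in plain C as
follows. This is a simplified model, not the driver: the sketch_* helpers,
the RES_* indices and the fail_at fault-injection field are hypothetical
stand-ins for the WQ, RQ, two CQs and the interrupt.

```c
#include <assert.h>

/* Hypothetical stand-ins for the five admin channel resources. */
enum { RES_WQ, RES_RQ, RES_CQ0, RES_CQ1, RES_INTR, RES_COUNT };

struct sketch_chan {
	int allocated[RES_COUNT];
	int fail_at;	/* index whose allocation should fail, or -1 */
};

static int sketch_alloc(struct sketch_chan *c, int idx)
{
	if (idx == c->fail_at)
		return -1;
	c->allocated[idx] = 1;
	return 0;
}

static void sketch_free(struct sketch_chan *c, int idx)
{
	assert(c->allocated[idx]);	/* never free what was not allocated */
	c->allocated[idx] = 0;
}

/* Allocate in order; on failure unwind only what was already allocated. */
static int sketch_open(struct sketch_chan *c)
{
	int err;

	err = sketch_alloc(c, RES_WQ);
	if (err)
		return err;
	err = sketch_alloc(c, RES_RQ);
	if (err)
		goto free_wq;
	err = sketch_alloc(c, RES_CQ0);
	if (err)
		goto free_rq;
	err = sketch_alloc(c, RES_CQ1);
	if (err)
		goto free_cq0;
	err = sketch_alloc(c, RES_INTR);
	if (err)
		goto free_cq1;
	return 0;

free_cq1:
	sketch_free(c, RES_CQ1);
free_cq0:
	sketch_free(c, RES_CQ0);
free_rq:
	sketch_free(c, RES_RQ);
free_wq:
	sketch_free(c, RES_WQ);
	return err;
}

/* Close releases everything in the reverse of allocation order. */
static void sketch_close(struct sketch_chan *c)
{
	int idx;

	for (idx = RES_COUNT - 1; idx >= 0; idx--)
		sketch_free(c, idx);
}
```

Each label frees exactly the resources allocated before the failing step,
so a failure at any stage leaves nothing behind.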
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
drivers/net/ethernet/cisco/enic/Makefile | 3 +-
drivers/net/ethernet/cisco/enic/enic_admin.c | 175 ++++++++++++++++++++++++++
drivers/net/ethernet/cisco/enic/enic_admin.h | 15 +++
drivers/net/ethernet/cisco/enic/vnic_devcmd.h | 9 ++
4 files changed, 201 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cisco/enic/Makefile b/drivers/net/ethernet/cisco/enic/Makefile
index a96b8332e6e2..7ae72fefc99a 100644
--- a/drivers/net/ethernet/cisco/enic/Makefile
+++ b/drivers/net/ethernet/cisco/enic/Makefile
@@ -3,5 +3,6 @@ obj-$(CONFIG_ENIC) := enic.o
enic-y := enic_main.o vnic_cq.o vnic_intr.o vnic_wq.o \
enic_res.o enic_dev.o enic_pp.o vnic_dev.o vnic_rq.o vnic_vic.o \
- enic_ethtool.o enic_api.o enic_clsf.o enic_rq.o enic_wq.o
+ enic_ethtool.o enic_api.o enic_clsf.o enic_rq.o enic_wq.o \
+ enic_admin.o
diff --git a/drivers/net/ethernet/cisco/enic/enic_admin.c b/drivers/net/ethernet/cisco/enic/enic_admin.c
new file mode 100644
index 000000000000..d1abe6a50095
--- /dev/null
+++ b/drivers/net/ethernet/cisco/enic/enic_admin.c
@@ -0,0 +1,175 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2025 Cisco Systems, Inc. All rights reserved.
+
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+
+#include "vnic_dev.h"
+#include "vnic_wq.h"
+#include "vnic_rq.h"
+#include "vnic_cq.h"
+#include "vnic_intr.h"
+#include "vnic_resource.h"
+#include "vnic_devcmd.h"
+#include "enic.h"
+#include "enic_admin.h"
+#include "cq_desc.h"
+#include "wq_enet_desc.h"
+#include "rq_enet_desc.h"
+
+/* No-op: admin WQ buffers are freed inline after completion polling */
+static void enic_admin_wq_buf_clean(struct vnic_wq *wq,
+ struct vnic_wq_buf *buf)
+{
+}
+
+/* No-op: admin RQ buffer teardown is handled in enic_admin_channel_close */
+static void enic_admin_rq_buf_clean(struct vnic_rq *rq,
+ struct vnic_rq_buf *buf)
+{
+}
+
+static int enic_admin_qp_type_set(struct enic *enic, u32 enable)
+{
+ u64 a0 = QP_TYPE_ADMIN, a1 = enable;
+ int wait = 1000;
+ int err;
+
+ spin_lock_bh(&enic->devcmd_lock);
+ err = vnic_dev_cmd(enic->vdev, CMD_QP_TYPE_SET, &a0, &a1, wait);
+ spin_unlock_bh(&enic->devcmd_lock);
+
+ return err;
+}
+
+static int enic_admin_alloc_resources(struct enic *enic)
+{
+ int err;
+
+ err = vnic_wq_alloc_with_type(enic->vdev, &enic->admin_wq, 0,
+ ENIC_ADMIN_DESC_COUNT,
+ sizeof(struct wq_enet_desc),
+ RES_TYPE_ADMIN_WQ);
+ if (err)
+ return err;
+
+ err = vnic_rq_alloc_with_type(enic->vdev, &enic->admin_rq, 0,
+ ENIC_ADMIN_DESC_COUNT,
+ sizeof(struct rq_enet_desc),
+ RES_TYPE_ADMIN_RQ);
+ if (err)
+ goto free_wq;
+
+ err = vnic_cq_alloc_with_type(enic->vdev, &enic->admin_cq[0], 0,
+ ENIC_ADMIN_DESC_COUNT,
+ sizeof(struct cq_desc),
+ RES_TYPE_ADMIN_CQ);
+ if (err)
+ goto free_rq;
+
+ err = vnic_cq_alloc_with_type(enic->vdev, &enic->admin_cq[1], 1,
+ ENIC_ADMIN_DESC_COUNT,
+ 16 << enic->ext_cq,
+ RES_TYPE_ADMIN_CQ);
+ if (err)
+ goto free_cq0;
+
+ /* PFs have dedicated SRIOV_INTR resources for admin channel.
+ * VFs lack SRIOV_INTR; use a regular INTR_CTRL slot instead.
+ */
+ if (vnic_dev_get_res_count(enic->vdev, RES_TYPE_SRIOV_INTR) >= 1)
+ err = vnic_intr_alloc_with_type(enic->vdev,
+ &enic->admin_intr, 0,
+ RES_TYPE_SRIOV_INTR);
+ else
+ err = vnic_intr_alloc(enic->vdev, &enic->admin_intr,
+ enic->intr_count);
+ if (err)
+ goto free_cq1;
+
+ return 0;
+
+free_cq1:
+ vnic_cq_free(&enic->admin_cq[1]);
+free_cq0:
+ vnic_cq_free(&enic->admin_cq[0]);
+free_rq:
+ vnic_rq_free(&enic->admin_rq);
+free_wq:
+ vnic_wq_free(&enic->admin_wq);
+ return err;
+}
+
+static void enic_admin_free_resources(struct enic *enic)
+{
+ vnic_intr_free(&enic->admin_intr);
+ vnic_cq_free(&enic->admin_cq[1]);
+ vnic_cq_free(&enic->admin_cq[0]);
+ vnic_rq_free(&enic->admin_rq);
+ vnic_wq_free(&enic->admin_wq);
+}
+
+static void enic_admin_init_resources(struct enic *enic)
+{
+ vnic_wq_init(&enic->admin_wq, 0, 0, 0);
+ vnic_rq_init(&enic->admin_rq, 1, 0, 0);
+ vnic_cq_init(&enic->admin_cq[0], 0, 1, 0, 0, 1, 0, 1, 0, 0, 0);
+ vnic_cq_init(&enic->admin_cq[1], 0, 1, 0, 0, 1, 0, 1, 0, 0, 0);
+ vnic_intr_init(&enic->admin_intr, 0, 0, 1);
+}
+
+int enic_admin_channel_open(struct enic *enic)
+{
+ int err;
+
+ if (!enic->has_admin_channel)
+ return -ENODEV;
+
+ err = enic_admin_alloc_resources(enic);
+ if (err) {
+ netdev_err(enic->netdev,
+ "Failed to alloc admin channel resources: %d\n",
+ err);
+ return err;
+ }
+
+ enic_admin_init_resources(enic);
+
+ vnic_wq_enable(&enic->admin_wq);
+ vnic_rq_enable(&enic->admin_rq);
+
+ err = enic_admin_qp_type_set(enic, 1);
+ if (err) {
+ netdev_err(enic->netdev,
+ "Failed to set admin QP type: %d\n", err);
+ goto disable_queues;
+ }
+
+ return 0;
+
+disable_queues:
+ vnic_wq_disable(&enic->admin_wq);
+ vnic_rq_disable(&enic->admin_rq);
+ enic_admin_qp_type_set(enic, 0);
+ enic_admin_free_resources(enic);
+ return err;
+}
+
+void enic_admin_channel_close(struct enic *enic)
+{
+ if (!enic->has_admin_channel)
+ return;
+
+ vnic_wq_disable(&enic->admin_wq);
+ vnic_rq_disable(&enic->admin_rq);
+
+ enic_admin_qp_type_set(enic, 0);
+
+ vnic_wq_clean(&enic->admin_wq, enic_admin_wq_buf_clean);
+ vnic_rq_clean(&enic->admin_rq, enic_admin_rq_buf_clean);
+ vnic_cq_clean(&enic->admin_cq[0]);
+ vnic_cq_clean(&enic->admin_cq[1]);
+ vnic_intr_clean(&enic->admin_intr);
+
+ enic_admin_free_resources(enic);
+}
diff --git a/drivers/net/ethernet/cisco/enic/enic_admin.h b/drivers/net/ethernet/cisco/enic/enic_admin.h
new file mode 100644
index 000000000000..569aadeb9312
--- /dev/null
+++ b/drivers/net/ethernet/cisco/enic/enic_admin.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2025 Cisco Systems, Inc. All rights reserved. */
+
+#ifndef _ENIC_ADMIN_H_
+#define _ENIC_ADMIN_H_
+
+#define ENIC_ADMIN_DESC_COUNT 64
+#define ENIC_ADMIN_BUF_SIZE 2048
+
+struct enic;
+
+int enic_admin_channel_open(struct enic *enic);
+void enic_admin_channel_close(struct enic *enic);
+
+#endif /* _ENIC_ADMIN_H_ */
diff --git a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
index 7a4bce736105..a1c8f522c7d7 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
@@ -455,8 +455,17 @@ enum vnic_devcmd_cmd {
*/
CMD_CQ_ENTRY_SIZE_SET = _CMDC(_CMD_DIR_WRITE, _CMD_VTYPE_ENET, 90),
+ /*
+ * Set queue pair type (admin or data)
+ * in: (u32) a0 = queue pair type (0 = admin, 1 = data)
+ * in: (u32) a1 = enable (1) / disable (0)
+ */
+ CMD_QP_TYPE_SET = _CMDC(_CMD_DIR_WRITE, _CMD_VTYPE_ENET, 97),
};
+#define QP_TYPE_ADMIN 0
+#define QP_TYPE_DATA 1
+
/* CMD_ENABLE2 flags */
#define CMD_ENABLE2_STANDBY 0x0
#define CMD_ENABLE2_ACTIVE 0x1
--
2.43.0
* [PATCH net-next v4 04/10] enic: add admin CQ service with MSI-X interrupt and NAPI polling
From: Satish Kharat via B4 Relay @ 2026-04-12 5:06 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel,
20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9, Satish Kharat
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-0-f052326c2a57@cisco.com>
From: Satish Kharat <satishkh@cisco.com>
Add completion queue service for the admin channel WQ and RQ, driven
by an MSI-X interrupt and NAPI polling.
The receive pipeline is: MSI-X ISR -> NAPI poll -> RQ CQ service ->
message enqueue -> workqueue handler -> admin_rq_handler callback.
NAPI drains the RQ CQ in softirq context, copying each received
buffer into an enic_admin_msg and appending it to a spinlock-protected
list. A system workqueue handler then processes each message in
process context where sleeping (mutex, GFP_KERNEL allocations) is
safe.
The WQ CQ service counts transmit completions and is called from the
synchronous MBOX send path.
RQ refill from NAPI context during CQ processing uses GFP_ATOMIC; the
initial fill at open time keeps GFP_KERNEL, so enic_admin_rq_fill() now
takes a gfp_t argument.
The admin channel open/close paths set up and tear down the MSI-X
interrupt, NAPI instance, and workqueue. CQ init enables interrupt
delivery and sets the interrupt offset so completions trigger the
admin ISR.
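The CQ service loops in this patch detect new completions by comparing a
per-descriptor color bit with the consumer's last_color. A minimal
userspace model of that scheme, with a budget as in the RQ CQ path, might
look like the sketch below. The sketch_* names and the 4-entry ring are
hypothetical; the initial tail color of 1 follows the cq_tail_color
argument in the patch, while the initial last_color of 0 is an assumption.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_CQ_SIZE 4	/* hypothetical ring size */

struct sketch_cq {
	unsigned char color[SKETCH_CQ_SIZE];	/* color bit per descriptor */
	unsigned int to_clean;		/* consumer index */
	unsigned int last_color;	/* color of already-consumed entries */
	unsigned int tail;		/* producer index (device side) */
	unsigned int tail_color;	/* color the producer writes next */
};

static void sketch_cq_init(struct sketch_cq *cq)
{
	memset(cq, 0, sizeof(*cq));
	cq->tail_color = 1;	/* first pass writes color 1 (assumed) */
}

/* Producer: write one completion, flipping the color on wrap. */
static void sketch_cq_produce(struct sketch_cq *cq)
{
	assert(cq->tail < SKETCH_CQ_SIZE);

	cq->color[cq->tail] = (unsigned char)cq->tail_color;
	if (++cq->tail == SKETCH_CQ_SIZE) {
		cq->tail = 0;
		cq->tail_color ^= 1;
	}
}

/* Consumer: an entry is new while its color differs from last_color. */
static unsigned int sketch_cq_service(struct sketch_cq *cq,
				      unsigned int budget)
{
	unsigned int work = 0;

	while (work < budget &&
	       cq->color[cq->to_clean] != cq->last_color) {
		if (++cq->to_clean == SKETCH_CQ_SIZE) {
			cq->to_clean = 0;
			cq->last_color ^= 1;
		}
		work++;
	}
	return work;
}
```

The model omits the rmb() that the driver places between the color check
and the descriptor reads, since there is no concurrent device here.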
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
drivers/net/ethernet/cisco/enic/enic.h | 8 +
drivers/net/ethernet/cisco/enic/enic_admin.c | 297 +++++++++++++++++++++++++--
drivers/net/ethernet/cisco/enic/enic_admin.h | 12 ++
3 files changed, 295 insertions(+), 22 deletions(-)
diff --git a/drivers/net/ethernet/cisco/enic/enic.h b/drivers/net/ethernet/cisco/enic/enic.h
index 08472420f3a1..1c09da3c0b1a 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -296,6 +296,14 @@ struct enic {
struct vnic_rq admin_rq;
struct vnic_cq admin_cq[2];
struct vnic_intr admin_intr;
+ struct napi_struct admin_napi;
+ unsigned int admin_intr_index;
+ struct work_struct admin_msg_work;
+ spinlock_t admin_msg_lock; /* protects admin_msg_list */
+ struct list_head admin_msg_list;
+ u64 admin_msg_drop_cnt;
+ void (*admin_rq_handler)(struct enic *enic, void *buf,
+ unsigned int len);
};
static inline struct net_device *vnic_get_netdev(struct vnic_dev *vdev)
diff --git a/drivers/net/ethernet/cisco/enic/enic_admin.c b/drivers/net/ethernet/cisco/enic/enic_admin.c
index a8fcd5f116d1..345d194c6eeb 100644
--- a/drivers/net/ethernet/cisco/enic/enic_admin.c
+++ b/drivers/net/ethernet/cisco/enic/enic_admin.c
@@ -4,6 +4,7 @@
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/dma-mapping.h>
+#include <linux/interrupt.h>
#include "vnic_dev.h"
#include "vnic_wq.h"
@@ -15,6 +16,7 @@
#include "enic.h"
#include "enic_admin.h"
#include "cq_desc.h"
+#include "cq_enet_desc.h"
#include "wq_enet_desc.h"
#include "rq_enet_desc.h"
@@ -38,14 +40,14 @@ static void enic_admin_rq_buf_clean(struct vnic_rq *rq,
buf->os_buf = NULL;
}
-static int enic_admin_rq_post_one(struct enic *enic)
+static int enic_admin_rq_post_one(struct enic *enic, gfp_t gfp)
{
struct vnic_rq *rq = &enic->admin_rq;
struct rq_enet_desc *desc;
dma_addr_t dma_addr;
void *buf;
- buf = kmalloc(ENIC_ADMIN_BUF_SIZE, GFP_KERNEL);
+ buf = kmalloc(ENIC_ADMIN_BUF_SIZE, gfp);
if (!buf)
return -ENOMEM;
@@ -64,13 +66,13 @@ static int enic_admin_rq_post_one(struct enic *enic)
return 0;
}
-static int enic_admin_rq_fill(struct enic *enic)
+static int enic_admin_rq_fill(struct enic *enic, gfp_t gfp)
{
struct vnic_rq *rq = &enic->admin_rq;
int err;
while (vnic_rq_desc_avail(rq) > 0) {
- err = enic_admin_rq_post_one(enic);
+ err = enic_admin_rq_post_one(enic, gfp);
if (err)
return err;
}
@@ -83,6 +85,207 @@ static void enic_admin_rq_drain(struct enic *enic)
vnic_rq_clean(&enic->admin_rq, enic_admin_rq_buf_clean);
}
+static unsigned int enic_admin_cq_color(void *cq_desc, unsigned int desc_size)
+{
+ u8 type_color = *((u8 *)cq_desc + desc_size - 1);
+
+ return (type_color >> CQ_DESC_COLOR_SHIFT) & CQ_DESC_COLOR_MASK;
+}
+
+unsigned int enic_admin_wq_cq_service(struct enic *enic)
+{
+ struct vnic_cq *cq = &enic->admin_cq[0];
+ unsigned int work = 0;
+ void *desc;
+
+ desc = vnic_cq_to_clean(cq);
+ while (enic_admin_cq_color(desc, cq->ring.desc_size) !=
+ cq->last_color) {
+ /* Ensure color bit is read before descriptor fields */
+ rmb();
+ vnic_cq_inc_to_clean(cq);
+ work++;
+ desc = vnic_cq_to_clean(cq);
+ }
+
+ return work;
+}
+
+static void enic_admin_msg_enqueue(struct enic *enic, void *buf,
+ unsigned int len)
+{
+ struct enic_admin_msg *msg;
+
+ msg = kmalloc(struct_size(msg, data, len), GFP_ATOMIC);
+ if (!msg) {
+ enic->admin_msg_drop_cnt++;
+ if (net_ratelimit())
+ netdev_warn(enic->netdev,
+ "admin msg enqueue drop (len=%u drops=%llu)\n",
+ len, enic->admin_msg_drop_cnt);
+ return;
+ }
+
+ msg->len = len;
+ memcpy(msg->data, buf, len);
+
+ spin_lock(&enic->admin_msg_lock);
+ list_add_tail(&msg->list, &enic->admin_msg_list);
+ spin_unlock(&enic->admin_msg_lock);
+}
+
+unsigned int enic_admin_rq_cq_service(struct enic *enic, unsigned int budget)
+{
+ struct vnic_cq *cq = &enic->admin_cq[1];
+ struct vnic_rq *rq = &enic->admin_rq;
+ struct vnic_rq_buf *buf;
+ unsigned int work = 0;
+ void *desc;
+
+ desc = vnic_cq_to_clean(cq);
+ while (work < budget &&
+ enic_admin_cq_color(desc, cq->ring.desc_size) !=
+ cq->last_color) {
+ /* Ensure CQ descriptor fields are read after
+ * the color/valid check.
+ */
+ rmb();
+ buf = rq->to_clean;
+
+ dma_sync_single_for_cpu(&enic->pdev->dev,
+ buf->dma_addr, buf->len,
+ DMA_FROM_DEVICE);
+
+ enic_admin_msg_enqueue(enic, buf->os_buf, buf->len);
+
+ enic_admin_rq_buf_clean(rq, rq->to_clean);
+ rq->to_clean = rq->to_clean->next;
+ rq->ring.desc_avail++;
+
+ vnic_cq_inc_to_clean(cq);
+ work++;
+ desc = vnic_cq_to_clean(cq);
+ }
+
+ enic_admin_rq_fill(enic, GFP_ATOMIC);
+
+ return work;
+}
+
+static irqreturn_t enic_admin_isr_msix(int irq, void *data)
+{
+ struct napi_struct *napi = data;
+
+ napi_schedule_irqoff(napi);
+
+ return IRQ_HANDLED;
+}
+
+static void enic_admin_msg_work_handler(struct work_struct *work)
+{
+ struct enic *enic = container_of(work, struct enic, admin_msg_work);
+ struct enic_admin_msg *msg, *tmp;
+ LIST_HEAD(local_list);
+
+ spin_lock_bh(&enic->admin_msg_lock);
+ list_splice_init(&enic->admin_msg_list, &local_list);
+ spin_unlock_bh(&enic->admin_msg_lock);
+
+ list_for_each_entry_safe(msg, tmp, &local_list, list) {
+ if (enic->admin_rq_handler)
+ enic->admin_rq_handler(enic, msg->data, msg->len);
+ list_del(&msg->list);
+ kfree(msg);
+ }
+}
+
+static int enic_admin_napi_poll(struct napi_struct *napi, int budget)
+{
+ struct enic *enic = container_of(napi, struct enic, admin_napi);
+ unsigned int credits;
+ unsigned int rq_work;
+
+ credits = vnic_intr_credits(&enic->admin_intr);
+
+ rq_work = enic_admin_rq_cq_service(enic, budget);
+
+ if (rq_work > 0)
+ schedule_work(&enic->admin_msg_work);
+
+ if (rq_work < budget && napi_complete_done(napi, rq_work)) {
+ if (credits)
+ vnic_intr_return_credits(&enic->admin_intr, credits,
+ 1 /* unmask */, 0);
+ } else {
+ if (credits)
+ vnic_intr_return_credits(&enic->admin_intr, credits,
+ 0 /* don't unmask */, 0);
+ }
+
+ return rq_work;
+}
+
+static int enic_admin_setup_intr(struct enic *enic)
+{
+ unsigned int intr_index = enic->intr_count;
+ int err;
+
+ if (vnic_dev_get_intr_mode(enic->vdev) != VNIC_DEV_INTR_MODE_MSIX ||
+ intr_index >= enic->intr_avail)
+ return -ENODEV;
+
+ err = vnic_intr_alloc(enic->vdev, &enic->admin_intr, intr_index);
+ if (err) {
+ netdev_warn(enic->netdev,
+ "Failed to alloc admin intr at index %u: %d\n",
+ intr_index, err);
+ return err;
+ }
+
+ enic->admin_intr_index = intr_index;
+
+ snprintf(enic->msix[intr_index].devname,
+ sizeof(enic->msix[intr_index].devname),
+ "%s-admin", enic->netdev->name);
+ enic->msix[intr_index].isr = enic_admin_isr_msix;
+ enic->msix[intr_index].devid = &enic->admin_napi;
+
+ err = request_irq(enic->msix_entry[intr_index].vector,
+ enic->msix[intr_index].isr, 0,
+ enic->msix[intr_index].devname,
+ enic->msix[intr_index].devid);
+ if (err) {
+ netdev_warn(enic->netdev,
+ "Failed to request admin MSI-X irq: %d\n", err);
+ vnic_intr_free(&enic->admin_intr);
+ return err;
+ }
+
+ enic->msix[intr_index].requested = 1;
+
+ netif_napi_add(enic->netdev, &enic->admin_napi,
+ enic_admin_napi_poll);
+ napi_enable(&enic->admin_napi);
+
+ netdev_dbg(enic->netdev,
+ "admin channel using MSI-X interrupt (index %u)\n",
+ intr_index);
+
+ return 0;
+}
+
+static void enic_admin_teardown_intr(struct enic *enic)
+{
+ unsigned int intr_index = enic->admin_intr_index;
+
+ napi_disable(&enic->admin_napi);
+ netif_napi_del(&enic->admin_napi);
+
+ free_irq(enic->msix_entry[intr_index].vector,
+ enic->msix[intr_index].devid);
+ enic->msix[intr_index].requested = 0;
+}
+
static int enic_admin_qp_type_set(struct enic *enic, u32 enable)
{
u64 a0 = QP_TYPE_ADMIN, a1 = enable;
@@ -128,23 +331,8 @@ static int enic_admin_alloc_resources(struct enic *enic)
if (err)
goto free_cq0;
- /* PFs have dedicated SRIOV_INTR resources for admin channel.
- * VFs lack SRIOV_INTR; use a regular INTR_CTRL slot instead.
- */
- if (vnic_dev_get_res_count(enic->vdev, RES_TYPE_SRIOV_INTR) >= 1)
- err = vnic_intr_alloc_with_type(enic->vdev,
- &enic->admin_intr, 0,
- RES_TYPE_SRIOV_INTR);
- else
- err = vnic_intr_alloc(enic->vdev, &enic->admin_intr,
- enic->intr_count);
- if (err)
- goto free_cq1;
-
return 0;
-free_cq1:
- vnic_cq_free(&enic->admin_cq[1]);
free_cq0:
vnic_cq_free(&enic->admin_cq[0]);
free_rq:
@@ -165,10 +353,32 @@ static void enic_admin_free_resources(struct enic *enic)
static void enic_admin_init_resources(struct enic *enic)
{
+ unsigned int intr_offset = enic->admin_intr_index;
+
vnic_wq_init(&enic->admin_wq, 0, 0, 0);
vnic_rq_init(&enic->admin_rq, 1, 0, 0);
- vnic_cq_init(&enic->admin_cq[0], 0, 1, 0, 0, 1, 0, 1, 0, 0, 0);
- vnic_cq_init(&enic->admin_cq[1], 0, 1, 0, 0, 1, 0, 1, 0, 0, 0);
+ vnic_cq_init(&enic->admin_cq[0],
+ 0 /* flow_control_enable */,
+ 1 /* color_enable */,
+ 0 /* cq_head */,
+ 0 /* cq_tail */,
+ 1 /* cq_tail_color */,
+ 1 /* interrupt_enable */,
+ 1 /* cq_entry_enable */,
+ 0 /* cq_message_enable */,
+ intr_offset,
+ 0 /* cq_message_addr */);
+ vnic_cq_init(&enic->admin_cq[1],
+ 0 /* flow_control_enable */,
+ 1 /* color_enable */,
+ 0 /* cq_head */,
+ 0 /* cq_tail */,
+ 1 /* cq_tail_color */,
+ 1 /* interrupt_enable */,
+ 1 /* cq_entry_enable */,
+ 0 /* cq_message_enable */,
+ intr_offset,
+ 0 /* cq_message_addr */);
vnic_intr_init(&enic->admin_intr, 0, 0, 1);
}
@@ -187,12 +397,24 @@ int enic_admin_channel_open(struct enic *enic)
return err;
}
+ err = enic_admin_setup_intr(enic);
+ if (err) {
+ netdev_err(enic->netdev,
+ "Admin channel requires MSI-X, SR-IOV unavailable: %d\n",
+ err);
+ goto free_resources;
+ }
+
+ spin_lock_init(&enic->admin_msg_lock);
+ INIT_LIST_HEAD(&enic->admin_msg_list);
+ INIT_WORK(&enic->admin_msg_work, enic_admin_msg_work_handler);
+
enic_admin_init_resources(enic);
vnic_wq_enable(&enic->admin_wq);
vnic_rq_enable(&enic->admin_rq);
- err = enic_admin_rq_fill(enic);
+ err = enic_admin_rq_fill(enic, GFP_KERNEL);
if (err) {
netdev_err(enic->netdev,
"Failed to fill admin RQ buffers: %d\n", err);
@@ -206,22 +428,53 @@ int enic_admin_channel_open(struct enic *enic)
goto disable_queues;
}
+ vnic_intr_unmask(&enic->admin_intr);
+
+ netdev_dbg(enic->netdev,
+ "admin channel open: intr=%u wq_avail=%u rq_avail=%u cq0_color=%u cq1_color=%u\n",
+ enic->admin_intr_index,
+ vnic_wq_desc_avail(&enic->admin_wq),
+ vnic_rq_desc_avail(&enic->admin_rq),
+ enic->admin_cq[0].last_color,
+ enic->admin_cq[1].last_color);
+
return 0;
disable_queues:
+ enic_admin_teardown_intr(enic);
vnic_wq_disable(&enic->admin_wq);
vnic_rq_disable(&enic->admin_rq);
enic_admin_qp_type_set(enic, 0);
enic_admin_rq_drain(enic);
+free_resources:
enic_admin_free_resources(enic);
return err;
}
+static void enic_admin_msg_drain(struct enic *enic)
+{
+ struct enic_admin_msg *msg, *tmp;
+
+ spin_lock_bh(&enic->admin_msg_lock);
+ list_for_each_entry_safe(msg, tmp, &enic->admin_msg_list, list) {
+ list_del(&msg->list);
+ kfree(msg);
+ }
+ spin_unlock_bh(&enic->admin_msg_lock);
+}
+
void enic_admin_channel_close(struct enic *enic)
{
if (!enic->has_admin_channel)
return;
+ netdev_dbg(enic->netdev, "admin channel close\n");
+
+ vnic_intr_mask(&enic->admin_intr);
+ enic_admin_teardown_intr(enic);
+ cancel_work_sync(&enic->admin_msg_work);
+ enic_admin_msg_drain(enic);
+
vnic_wq_disable(&enic->admin_wq);
vnic_rq_disable(&enic->admin_rq);
diff --git a/drivers/net/ethernet/cisco/enic/enic_admin.h b/drivers/net/ethernet/cisco/enic/enic_admin.h
index 569aadeb9312..73cdd3dac7ec 100644
--- a/drivers/net/ethernet/cisco/enic/enic_admin.h
+++ b/drivers/net/ethernet/cisco/enic/enic_admin.h
@@ -9,7 +9,19 @@
struct enic;
+/* Wrapper for received admin messages queued for deferred processing.
+ * NAPI enqueues these; a workqueue handler processes them in process context
+ * where sleeping (mutex, GFP_KERNEL) is safe.
+ */
+struct enic_admin_msg {
+ struct list_head list;
+ unsigned int len;
+ u8 data[];
+};
+
int enic_admin_channel_open(struct enic *enic);
void enic_admin_channel_close(struct enic *enic);
+unsigned int enic_admin_wq_cq_service(struct enic *enic);
+unsigned int enic_admin_rq_cq_service(struct enic *enic, unsigned int budget);
#endif /* _ENIC_ADMIN_H_ */
--
2.43.0
* [PATCH net-next v4 06/10] enic: add MBOX core send and receive for admin channel
From: Satish Kharat via B4 Relay @ 2026-04-12 5:06 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel,
20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9, Satish Kharat
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-0-f052326c2a57@cisco.com>
From: Satish Kharat <satishkh@cisco.com>
Implement the mailbox protocol engine used for PF-VF communication
over the admin channel.
The send path (enic_mbox_send_msg) builds a message with a common
header, DMA-maps it, posts a single WQ descriptor with the
destination vnic ID encoded in the VLAN tag field, and polls
the WQ CQ for completion.
The receive path (enic_mbox_recv_handler) is installed as the admin
RQ callback and validates incoming message headers. PF/VF-specific
dispatch will be added in subsequent commits.
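The VLAN-field routing convention used by this patch (0 = PF, 1-based =
VF index) can be sketched as a pair of pure helpers. The sketch_* names
are hypothetical, and the 0xFFFF sentinel is only an assumed value for
ENIC_MBOX_DST_PF, whose real definition lives in enic_mbox.h and is not
shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel; the real ENIC_MBOX_DST_PF value is not shown. */
#define SKETCH_MBOX_DST_PF 0xFFFFu

static_assert(SKETCH_MBOX_DST_PF <= UINT16_MAX,
	      "sentinel must fit in the 16-bit vnic ID field");

/* Send side: map a destination vnic ID to the WQ descriptor VLAN tag. */
static uint16_t sketch_vnic_to_vlan(uint16_t vnic_id)
{
	if (vnic_id == SKETCH_MBOX_DST_PF)
		return 0;
	return (uint16_t)(vnic_id + 1);
}

/* Receive side: map the CQ descriptor VLAN field back to a vnic ID. */
static uint16_t sketch_vlan_to_vnic(uint16_t vlan)
{
	if (vlan == 0)
		return SKETCH_MBOX_DST_PF;
	return (uint16_t)(vlan - 1);
}
```

With these helpers, VF 0 travels as VLAN 1 and the PF as VLAN 0, and the
two mappings are inverses of each other for every VF index.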
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
drivers/net/ethernet/cisco/enic/Makefile | 2 +-
drivers/net/ethernet/cisco/enic/enic.h | 6 ++
drivers/net/ethernet/cisco/enic/enic_admin.c | 23 +++-
drivers/net/ethernet/cisco/enic/enic_mbox.c | 156 +++++++++++++++++++++++++++
drivers/net/ethernet/cisco/enic/enic_mbox.h | 8 ++
5 files changed, 193 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/cisco/enic/Makefile b/drivers/net/ethernet/cisco/enic/Makefile
index 7ae72fefc99a..e38aaf34c148 100644
--- a/drivers/net/ethernet/cisco/enic/Makefile
+++ b/drivers/net/ethernet/cisco/enic/Makefile
@@ -4,5 +4,5 @@ obj-$(CONFIG_ENIC) := enic.o
enic-y := enic_main.o vnic_cq.o vnic_intr.o vnic_wq.o \
enic_res.o enic_dev.o enic_pp.o vnic_dev.o vnic_rq.o vnic_vic.o \
enic_ethtool.o enic_api.o enic_clsf.o enic_rq.o enic_wq.o \
- enic_admin.o
+ enic_admin.o enic_mbox.o
diff --git a/drivers/net/ethernet/cisco/enic/enic.h b/drivers/net/ethernet/cisco/enic/enic.h
index 1c09da3c0b1a..42f345aceced 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -292,6 +292,8 @@ struct enic {
/* Admin channel resources for SR-IOV MBOX */
bool has_admin_channel;
+ /* set on send timeout; cleared on channel re-open */
+ bool mbox_send_disabled;
struct vnic_wq admin_wq;
struct vnic_rq admin_rq;
struct vnic_cq admin_cq[2];
@@ -304,6 +306,10 @@ struct enic {
u64 admin_msg_drop_cnt;
void (*admin_rq_handler)(struct enic *enic, void *buf,
unsigned int len);
+
+ /* MBOX protocol state */
+ struct mutex mbox_lock;
+ u64 mbox_msg_num;
};
static inline struct net_device *vnic_get_netdev(struct vnic_dev *vdev)
diff --git a/drivers/net/ethernet/cisco/enic/enic_admin.c b/drivers/net/ethernet/cisco/enic/enic_admin.c
index 345d194c6eeb..c96268adc173 100644
--- a/drivers/net/ethernet/cisco/enic/enic_admin.c
+++ b/drivers/net/ethernet/cisco/enic/enic_admin.c
@@ -19,6 +19,7 @@
#include "cq_enet_desc.h"
#include "wq_enet_desc.h"
#include "rq_enet_desc.h"
+#include "enic_mbox.h"
/* No-op: admin WQ buffers are freed inline after completion polling */
static void enic_admin_wq_buf_clean(struct vnic_wq *wq,
@@ -156,7 +157,26 @@ unsigned int enic_admin_rq_cq_service(struct enic *enic, unsigned int budget)
buf->dma_addr, buf->len,
DMA_FROM_DEVICE);
- enic_admin_msg_enqueue(enic, buf->os_buf, buf->len);
+ if (enic->admin_rq_handler) {
+ struct cq_enet_rq_desc *rq_desc = desc;
+ u16 sender_vlan;
+
+ /* Firmware sets the CQ VLAN field to identify the
+ * sender: 0 = PF, 1-based = VF index. Overwrite
+ * the untrusted src_vnic_id in the MBOX header with
+ * the hardware-verified value.
+ */
+ sender_vlan = le16_to_cpu(rq_desc->vlan);
+ if (buf->len >= sizeof(struct enic_mbox_hdr)) {
+ struct enic_mbox_hdr *hdr = buf->os_buf;
+
+ hdr->src_vnic_id = (sender_vlan == 0) ?
+ cpu_to_le16(ENIC_MBOX_DST_PF) :
+ cpu_to_le16(sender_vlan - 1);
+ }
+
+ enic_admin_msg_enqueue(enic, buf->os_buf, buf->len);
+ }
enic_admin_rq_buf_clean(rq, rq->to_clean);
rq->to_clean = rq->to_clean->next;
@@ -389,6 +409,7 @@ int enic_admin_channel_open(struct enic *enic)
if (!enic->has_admin_channel)
return -ENODEV;
+ enic->mbox_send_disabled = false;
err = enic_admin_alloc_resources(enic);
if (err) {
netdev_err(enic->netdev,
diff --git a/drivers/net/ethernet/cisco/enic/enic_mbox.c b/drivers/net/ethernet/cisco/enic/enic_mbox.c
new file mode 100644
index 000000000000..d144c86d9ef8
--- /dev/null
+++ b/drivers/net/ethernet/cisco/enic/enic_mbox.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2025 Cisco Systems, Inc. All rights reserved.
+
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/dma-mapping.h>
+#include <linux/delay.h>
+
+#include "vnic_dev.h"
+#include "vnic_wq.h"
+#include "vnic_cq.h"
+#include "enic.h"
+#include "enic_admin.h"
+#include "enic_mbox.h"
+#include "wq_enet_desc.h"
+
+#define ENIC_MBOX_POLL_TIMEOUT_US 5000000
+#define ENIC_MBOX_POLL_INTERVAL_US 100
+
+static void enic_mbox_fill_hdr(struct enic *enic, struct enic_mbox_hdr *hdr,
+ u8 msg_type, u16 dst_vnic_id, u16 msg_len)
+{
+ memset(hdr, 0, sizeof(*hdr));
+ hdr->dst_vnic_id = cpu_to_le16(dst_vnic_id);
+ hdr->msg_type = msg_type;
+ hdr->msg_len = cpu_to_le16(msg_len);
+ hdr->msg_num = cpu_to_le64(++enic->mbox_msg_num);
+}
+
+int enic_mbox_send_msg(struct enic *enic, u8 msg_type, u16 dst_vnic_id,
+ void *payload, u16 payload_len)
+{
+ u16 total_len = sizeof(struct enic_mbox_hdr) + payload_len;
+ struct vnic_wq *wq = &enic->admin_wq;
+ struct wq_enet_desc *desc;
+ unsigned long timeout;
+ dma_addr_t dma_addr;
+ u16 vlan_tag;
+ void *buf;
+ int err;
+
+ /* Serialize MBOX sends. The admin channel is a low-frequency
+ * control path; holding the mutex across the poll is acceptable.
+ */
+ mutex_lock(&enic->mbox_lock);
+
+ if (!enic->has_admin_channel || enic->mbox_send_disabled) {
+ err = -ENODEV;
+ goto unlock;
+ }
+
+ if (vnic_wq_desc_avail(wq) == 0) {
+ err = -ENOSPC;
+ goto unlock;
+ }
+
+ buf = kmalloc(total_len, GFP_KERNEL);
+ if (!buf) {
+ err = -ENOMEM;
+ goto unlock;
+ }
+
+ enic_mbox_fill_hdr(enic, buf, msg_type, dst_vnic_id, total_len);
+ if (payload_len) {
+ void *dst = buf + sizeof(struct enic_mbox_hdr);
+
+ memcpy(dst, payload, payload_len);
+ }
+
+ dma_addr = dma_map_single(&enic->pdev->dev, buf, total_len,
+ DMA_TO_DEVICE);
+ if (dma_mapping_error(&enic->pdev->dev, dma_addr)) {
+ kfree(buf);
+ err = -ENOMEM;
+ goto unlock;
+ }
+
+ /* Firmware uses vlan field for routing: 0 = PF, 1-based = VF index */
+ if (dst_vnic_id == ENIC_MBOX_DST_PF)
+ vlan_tag = 0;
+ else
+ vlan_tag = dst_vnic_id + 1;
+
+ desc = vnic_wq_next_desc(wq);
+ wq_enet_desc_enc(desc, (u64)dma_addr | VNIC_PADDR_TARGET,
+ total_len, 0, 0, 0, 1, 1, 0, 1, vlan_tag, 0);
+ vnic_wq_post(wq, buf, dma_addr, total_len, 1, 1, 1, 1, 0, 0);
+ vnic_wq_doorbell(wq);
+
+ timeout = jiffies + usecs_to_jiffies(ENIC_MBOX_POLL_TIMEOUT_US);
+ err = -ETIMEDOUT;
+ while (time_before(jiffies, timeout)) {
+ if (enic_admin_wq_cq_service(enic)) {
+ err = 0;
+ break;
+ }
+ usleep_range(ENIC_MBOX_POLL_INTERVAL_US,
+ ENIC_MBOX_POLL_INTERVAL_US + 50);
+ }
+
+ if (!err) {
+ wq->to_clean = wq->to_clean->next;
+ wq->ring.desc_avail++;
+ dma_unmap_single(&enic->pdev->dev, dma_addr, total_len,
+ DMA_TO_DEVICE);
+ kfree(buf);
+ } else {
+ netdev_err(enic->netdev,
+ "MBOX send timed out (type %u dst %u), disabling channel\n",
+ msg_type, dst_vnic_id);
+ /*
+ * The WQ descriptor is still live in hardware. Do not unmap
+ * or free the buffer: the device may still DMA from dma_addr.
+ * Mark the channel unusable so no further sends are attempted.
+ */
+ enic->mbox_send_disabled = true;
+ }
+
+ netdev_dbg(enic->netdev,
+ "MBOX send msg_type %u dst %u vlan %u err %d\n",
+ msg_type, dst_vnic_id, vlan_tag, err);
+unlock:
+ mutex_unlock(&enic->mbox_lock);
+ return err;
+}
+
+static void enic_mbox_recv_handler(struct enic *enic, void *buf,
+ unsigned int len)
+{
+ struct enic_mbox_hdr *hdr = buf;
+
+ if (len < sizeof(*hdr)) {
+ netdev_warn(enic->netdev,
+ "MBOX: truncated message (len %u < %zu)\n",
+ len, sizeof(*hdr));
+ return;
+ }
+
+ if (hdr->msg_type >= ENIC_MBOX_MAX) {
+ netdev_warn(enic->netdev, "MBOX: unknown msg type %u\n",
+ hdr->msg_type);
+ return;
+ }
+
+ netdev_dbg(enic->netdev,
+ "MBOX recv: type %u from vnic %u len %u\n",
+ hdr->msg_type, le16_to_cpu(hdr->src_vnic_id),
+ le16_to_cpu(hdr->msg_len));
+}
+
+void enic_mbox_init(struct enic *enic)
+{
+ enic->mbox_msg_num = 0;
+ mutex_init(&enic->mbox_lock);
+ enic->admin_rq_handler = enic_mbox_recv_handler;
+}
diff --git a/drivers/net/ethernet/cisco/enic/enic_mbox.h b/drivers/net/ethernet/cisco/enic/enic_mbox.h
index 84cb6bbc1ead..554269b78780 100644
--- a/drivers/net/ethernet/cisco/enic/enic_mbox.h
+++ b/drivers/net/ethernet/cisco/enic/enic_mbox.h
@@ -72,4 +72,12 @@ struct enic_mbox_pf_link_state_ack_msg {
struct enic_mbox_generic_reply ack;
};
+#define ENIC_MBOX_DST_PF 0xFFFF
+
+struct enic;
+
+void enic_mbox_init(struct enic *enic);
+int enic_mbox_send_msg(struct enic *enic, u8 msg_type, u16 dst_vnic_id,
+ void *payload, u16 payload_len);
+
#endif /* _ENIC_MBOX_H_ */
--
2.43.0
* [PATCH net-next v4 05/10] enic: define MBOX message types and header structures
From: Satish Kharat via B4 Relay @ 2026-04-12 5:06 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel,
20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9, Satish Kharat
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-0-f052326c2a57@cisco.com>
From: Satish Kharat <satishkh@cisco.com>
Define the mailbox protocol used for PF-VF communication over the
admin channel. The protocol uses request/reply pairs where even
message types are requests and odd are replies.
Initial message types cover the core SR-IOV handshake:
- VF_CAPABILITY: version negotiation
- VF_REGISTER/UNREGISTER: VF lifecycle management
- PF_LINK_STATE_NOTIF: PF-initiated link state changes
Each message carries a common header (src/dst vnic ID, type,
length, sequence number) followed by a type-specific payload.
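As an illustration only (not part of this patch), the even/odd
convention and the header layout described above can be checked with a
small userspace sketch. The names mbox_is_request() and
mbox_reply_type() are hypothetical helpers, and struct mbox_hdr is a
plain fixed-width mirror of struct enic_mbox_hdr, assuming a common ABI
with no unusual padding:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of enum enic_mbox_msg_type from the patch. */
enum mbox_msg_type {
	MBOX_VF_CAPABILITY_REQUEST = 0,
	MBOX_VF_CAPABILITY_REPLY = 1,
	MBOX_VF_REGISTER_REQUEST = 2,
	MBOX_VF_REGISTER_REPLY = 3,
	MBOX_VF_UNREGISTER_REQUEST = 4,
	MBOX_VF_UNREGISTER_REPLY = 5,
	MBOX_PF_LINK_STATE_NOTIF = 6,
	MBOX_PF_LINK_STATE_ACK = 7,
	MBOX_MAX
};

/* Same field order as struct enic_mbox_hdr, using host-order types. */
struct mbox_hdr {
	uint16_t src_vnic_id;
	uint16_t dst_vnic_id;
	uint8_t msg_type;
	uint8_t flags;
	uint16_t msg_len;
	uint64_t msg_num;
};

/* Hypothetical helper: even, in-range message types are requests. */
static bool mbox_is_request(unsigned int type)
{
	return type < MBOX_MAX && (type & 1) == 0;
}

/* Hypothetical helper: reply type for a request, -1 for anything else. */
static int mbox_reply_type(unsigned int type)
{
	return mbox_is_request(type) ? (int)type + 1 : -1;
}
```

With this numbering the reply to any request is simply type + 1, and
the fixed header occupies 16 bytes ahead of the payload.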
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
drivers/net/ethernet/cisco/enic/enic_mbox.h | 75 +++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)
diff --git a/drivers/net/ethernet/cisco/enic/enic_mbox.h b/drivers/net/ethernet/cisco/enic/enic_mbox.h
new file mode 100644
index 000000000000..84cb6bbc1ead
--- /dev/null
+++ b/drivers/net/ethernet/cisco/enic/enic_mbox.h
@@ -0,0 +1,75 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2025 Cisco Systems, Inc. All rights reserved. */
+
+#ifndef _ENIC_MBOX_H_
+#define _ENIC_MBOX_H_
+
+/*
+ * Mailbox protocol for PF-VF communication over the admin channel.
+ *
+ * Even numbers are requests, odd numbers are replies/acks.
+ * The prefix indicates the initiator: VF_ = VF-initiated, PF_ = PF-initiated.
+ */
+enum enic_mbox_msg_type {
+ ENIC_MBOX_VF_CAPABILITY_REQUEST = 0,
+ ENIC_MBOX_VF_CAPABILITY_REPLY = 1,
+ ENIC_MBOX_VF_REGISTER_REQUEST = 2,
+ ENIC_MBOX_VF_REGISTER_REPLY = 3,
+ ENIC_MBOX_VF_UNREGISTER_REQUEST = 4,
+ ENIC_MBOX_VF_UNREGISTER_REPLY = 5,
+ ENIC_MBOX_PF_LINK_STATE_NOTIF = 6,
+ ENIC_MBOX_PF_LINK_STATE_ACK = 7,
+ ENIC_MBOX_MAX
+};
+
+struct enic_mbox_hdr {
+ __le16 src_vnic_id;
+ __le16 dst_vnic_id;
+ u8 msg_type;
+ u8 flags;
+ __le16 msg_len;
+ __le64 msg_num;
+};
+
+struct enic_mbox_generic_reply {
+ __le16 ret_major;
+ __le16 ret_minor;
+};
+
+#define ENIC_MBOX_ERR_GENERIC BIT(0)
+#define ENIC_MBOX_ERR_VF_NOT_REGISTERED BIT(1)
+#define ENIC_MBOX_ERR_MSG_NOT_SUPPORTED BIT(2)
+
+/* ENIC_MBOX_VF_CAPABILITY_REQUEST / _REPLY */
+#define ENIC_MBOX_CAP_VERSION_0 0
+#define ENIC_MBOX_CAP_VERSION_1 1
+
+struct enic_mbox_vf_capability_msg {
+ __le32 version;
+ __le32 reserved[32];
+};
+
+struct enic_mbox_vf_capability_reply_msg {
+ struct enic_mbox_generic_reply reply;
+ __le32 version;
+ __le32 reserved[32];
+};
+
+/* ENIC_MBOX_VF_REGISTER / _UNREGISTER */
+struct enic_mbox_vf_register_reply_msg {
+ struct enic_mbox_generic_reply reply;
+};
+
+/* ENIC_MBOX_PF_LINK_STATE_NOTIF / _ACK */
+#define ENIC_MBOX_LINK_STATE_DISABLE 0
+#define ENIC_MBOX_LINK_STATE_ENABLE 1
+
+struct enic_mbox_pf_link_state_notif_msg {
+ __le32 link_state;
+};
+
+struct enic_mbox_pf_link_state_ack_msg {
+ struct enic_mbox_generic_reply ack;
+};
+
+#endif /* _ENIC_MBOX_H_ */
--
2.43.0
* [PATCH net-next v4 10/10] enic: add V2 VF probe with admin channel and PF registration
From: Satish Kharat via B4 Relay @ 2026-04-12 5:06 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel,
20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9, Satish Kharat
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-0-f052326c2a57@cisco.com>
From: Satish Kharat <satishkh@cisco.com>
When a V2 SR-IOV VF probes, open the admin channel, initialize the
MBOX protocol, perform the capability check with the PF, and register
with the PF. This establishes the PF-VF communication path that the PF
uses to send link state notifications.
The admin channel and MBOX registration happen after enic_dev_init()
(which discovers admin channel resources) and before register_netdev()
so the VF is fully initialized before the interface is visible to
userspace.
On remove, the VF unregisters from the PF and closes its admin channel
before tearing down data path resources.
V2 VFs are not provisioned with a RES_TYPE_SRIOV_INTR resource by
firmware, so bypass that check in the admin channel capability
detection for V2 VFs. The PF still requires this resource.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
drivers/net/ethernet/cisco/enic/enic.h | 1 +
drivers/net/ethernet/cisco/enic/enic_main.c | 58 ++++++++++++++++++++++++++++-
drivers/net/ethernet/cisco/enic/enic_res.c | 3 +-
3 files changed, 59 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/cisco/enic/enic.h b/drivers/net/ethernet/cisco/enic/enic.h
index 29ce26284493..6301930903ee 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -441,6 +441,7 @@ void enic_reset_addr_lists(struct enic *enic);
int enic_sriov_enabled(struct enic *enic);
int enic_is_valid_vf(struct enic *enic, int vf);
int enic_is_dynamic(struct enic *enic);
+int enic_is_sriov_vf_v2(struct enic *enic);
void enic_set_ethtool_ops(struct net_device *netdev);
int __enic_set_rsskey(struct enic *enic);
void enic_ext_cq(struct enic *enic);
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 057716ccc283..bf4417e67b16 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -316,6 +316,11 @@ static int enic_is_sriov_vf(struct enic *enic)
enic->pdev->device == PCI_DEVICE_ID_CISCO_VIC_ENET_VF_V2;
}
+int enic_is_sriov_vf_v2(struct enic *enic)
+{
+ return enic->pdev->device == PCI_DEVICE_ID_CISCO_VIC_ENET_VF_V2;
+}
+
int enic_is_valid_vf(struct enic *enic, int vf)
{
#ifdef CONFIG_PCI_IOV
@@ -2992,6 +2997,32 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err_out_dev_close;
}
+ /* V2 VF: open admin channel and register with PF.
+ * Must happen before register_netdev so the VF is fully
+ * initialized before the interface is visible to userspace.
+ */
+ if (enic_is_sriov_vf_v2(enic)) {
+ err = enic_admin_channel_open(enic);
+ if (err) {
+ dev_err(dev,
+ "Failed to open admin channel: %d\n", err);
+ goto err_out_dev_deinit;
+ }
+ enic_mbox_init(enic);
+ err = enic_mbox_vf_capability_check(enic);
+ if (err) {
+ dev_err(dev,
+ "MBOX capability check failed: %d\n", err);
+ goto err_out_admin_close;
+ }
+ err = enic_mbox_vf_register(enic);
+ if (err) {
+ dev_err(dev,
+ "MBOX VF registration failed: %d\n", err);
+ goto err_out_admin_close;
+ }
+ }
+
netif_set_real_num_tx_queues(netdev, enic->wq_count);
netif_set_real_num_rx_queues(netdev, enic->rq_count);
@@ -3016,7 +3047,7 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
err = enic_set_mac_addr(netdev, enic->mac_addr);
if (err) {
dev_err(dev, "Invalid MAC address, aborting\n");
- goto err_out_dev_deinit;
+ goto err_out_admin_close;
}
enic->tx_coalesce_usecs = enic->config.intr_timer_usec;
@@ -3114,11 +3145,23 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
err = register_netdev(netdev);
if (err) {
dev_err(dev, "Cannot register net device, aborting\n");
- goto err_out_dev_deinit;
+ goto err_out_admin_close;
}
return 0;
+err_out_admin_close:
+ if (enic_is_sriov_vf_v2(enic)) {
+ if (enic->vf_registered) {
+ int unreg_err = enic_mbox_vf_unregister(enic);
+
+ if (unreg_err)
+ netdev_warn(netdev,
+ "Failed to unregister from PF: %d\n",
+ unreg_err);
+ }
+ enic_admin_channel_close(enic);
+ }
err_out_dev_deinit:
enic_dev_deinit(enic);
err_out_dev_close:
@@ -3156,6 +3199,17 @@ static void enic_remove(struct pci_dev *pdev)
cancel_work_sync(&enic->reset);
cancel_work_sync(&enic->change_mtu_work);
unregister_netdev(netdev);
+ if (enic_is_sriov_vf_v2(enic)) {
+ if (enic->vf_registered) {
+ int unreg_err = enic_mbox_vf_unregister(enic);
+
+ if (unreg_err)
+ netdev_warn(netdev,
+ "Failed to unregister from PF: %d\n",
+ unreg_err);
+ }
+ enic_admin_channel_close(enic);
+ }
#ifdef CONFIG_PCI_IOV
if (enic_sriov_enabled(enic)) {
if (enic->vf_type == ENIC_VF_TYPE_V2)
diff --git a/drivers/net/ethernet/cisco/enic/enic_res.c b/drivers/net/ethernet/cisco/enic/enic_res.c
index 436326ace049..74cd2ee3af5c 100644
--- a/drivers/net/ethernet/cisco/enic/enic_res.c
+++ b/drivers/net/ethernet/cisco/enic/enic_res.c
@@ -211,7 +211,8 @@ void enic_get_res_counts(struct enic *enic)
vnic_dev_get_res_count(enic->vdev, RES_TYPE_ADMIN_RQ) >= 1 &&
vnic_dev_get_res_count(enic->vdev, RES_TYPE_ADMIN_CQ) >=
ARRAY_SIZE(enic->admin_cq) &&
- vnic_dev_get_res_count(enic->vdev, RES_TYPE_SRIOV_INTR) >= 1;
+ (enic_is_sriov_vf_v2(enic) ||
+ vnic_dev_get_res_count(enic->vdev, RES_TYPE_SRIOV_INTR) >= 1);
dev_info(enic_get_dev(enic),
"vNIC resources avail: wq %d rq %d cq %d intr %d admin %s\n",
--
2.43.0
* [PATCH net-next v4 08/10] enic: add MBOX VF handlers for capability, register and link state
From: Satish Kharat via B4 Relay @ 2026-04-12 5:06 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel,
20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9, Satish Kharat
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-0-f052326c2a57@cisco.com>
From: Satish Kharat <satishkh@cisco.com>
Implement VF-side mailbox message processing for SR-IOV V2
admin channel communication.
VF receive handlers:
- VF_CAPABILITY_REPLY: store PF protocol version, signal
completion
- VF_REGISTER_REPLY: mark VF as registered, signal completion
- VF_UNREGISTER_REPLY: mark VF as unregistered, signal
completion
- PF_LINK_STATE_NOTIF: update carrier state via
netif_carrier_on/off, send ACK back to PF
VF initiation functions for the probe-time handshake:
- enic_mbox_vf_capability_check: send capability request,
wait for PF reply via completion
- enic_mbox_vf_register: send register request, wait for
PF confirmation via completion
- enic_mbox_vf_unregister: send unregister request, wait
for PF confirmation
The wait helper (enic_mbox_wait_reply) uses
wait_for_completion_timeout, signaled when the admin ISR/NAPI/
workqueue pipeline delivers the reply message.
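For readers following the handshake, here is a rough userspace sketch
of how the capability reply is interpreted once it arrives. It is an
illustration under stated assumptions, not the driver code:
cap_reply_version() and the get_le16()/get_le32() decoders are
hypothetical, and the -EINVAL return for a short payload is a
simplification (in the patch a short payload is dropped, so the caller
sees a timeout instead):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_VERSION_1	1u
/* generic reply (2 x le16) + version (le32) + reserved[32] (le32 each) */
#define CAP_REPLY_LEN	(4u + 4u + 32u * 4u)

/* Decode a little-endian 16-bit value from a byte buffer. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a little-endian 32-bit value from a byte buffer. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: returns the PF protocol version (>= 1) on
 * success or a negative errno. As in the patch, a non-zero ret_major
 * leaves the version at 0, which then fails the minimum-version check.
 */
static int cap_reply_version(const uint8_t *payload, size_t len)
{
	uint32_t version = 0;

	if (len < CAP_REPLY_LEN)
		return -EINVAL;		/* sketch only; see note above */
	if (get_le16(payload) == 0)	/* ret_major == 0 means accepted */
		version = get_le32(payload + 4);
	if (version < CAP_VERSION_1)
		return -EOPNOTSUPP;
	return (int)version;
}
```

A rejected reply and a too-old PF therefore end in the same
-EOPNOTSUPP result on the VF side.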
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
drivers/net/ethernet/cisco/enic/enic.h | 9 +-
drivers/net/ethernet/cisco/enic/enic_mbox.c | 220 ++++++++++++++++++++++++++++
drivers/net/ethernet/cisco/enic/enic_mbox.h | 3 +
3 files changed, 231 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cisco/enic/enic.h b/drivers/net/ethernet/cisco/enic/enic.h
index 9b1fa3857df5..29ce26284493 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -258,6 +258,8 @@ struct enic {
u32 tx_coalesce_usecs;
u16 num_vfs;
enum enic_vf_type vf_type;
+ bool vf_registered;
+ u32 pf_cap_version;
unsigned int enable_count;
spinlock_t enic_api_lock;
bool enic_api_busy;
@@ -305,9 +307,14 @@ struct enic {
void (*admin_rq_handler)(struct enic *enic, void *buf,
unsigned int len);
- /* MBOX protocol state */
+ /* MBOX protocol state -- single-flight: on the VF, all callers
+ * that wait on mbox_comp run under RTNL or during probe/remove,
+ * so only one completion is outstanding at a time. mbox_lock
+ * protects the shared admin WQ from concurrent senders.
+ */
struct mutex mbox_lock;
u64 mbox_msg_num;
+ struct completion mbox_comp;
/* PF: per-VF MBOX state, allocated when SRIOV V2 is enabled */
struct enic_vf_state {
diff --git a/drivers/net/ethernet/cisco/enic/enic_mbox.c b/drivers/net/ethernet/cisco/enic/enic_mbox.c
index f5784624ebbd..b5ed31450ee7 100644
--- a/drivers/net/ethernet/cisco/enic/enic_mbox.c
+++ b/drivers/net/ethernet/cisco/enic/enic_mbox.c
@@ -5,6 +5,7 @@
#include <linux/netdevice.h>
#include <linux/dma-mapping.h>
#include <linux/delay.h>
+#include <linux/completion.h>
#include "vnic_dev.h"
#include "vnic_wq.h"
@@ -124,6 +125,16 @@ int enic_mbox_send_msg(struct enic *enic, u8 msg_type, u16 dst_vnic_id,
return err;
}
+static int enic_mbox_wait_reply(struct enic *enic, unsigned long timeout_ms)
+{
+ unsigned long left;
+
+ left = wait_for_completion_timeout(&enic->mbox_comp,
+ msecs_to_jiffies(timeout_ms));
+
+ return left ? 0 : -ETIMEDOUT;
+}
+
int enic_mbox_send_link_state(struct enic *enic, u16 vf_id, u32 link_state)
{
struct enic_mbox_pf_link_state_notif_msg notif = {};
@@ -280,6 +291,136 @@ static void enic_mbox_pf_process_msg(struct enic *enic,
hdr->msg_type, vf_id, err);
}
+static void enic_mbox_vf_handle_capability_reply(struct enic *enic,
+ void *payload)
+{
+ struct enic_mbox_vf_capability_reply_msg *reply = payload;
+
+ if (le16_to_cpu(reply->reply.ret_major) == 0)
+ enic->pf_cap_version = le32_to_cpu(reply->version);
+ complete(&enic->mbox_comp);
+}
+
+static void enic_mbox_vf_handle_register_reply(struct enic *enic,
+ void *payload)
+{
+ struct enic_mbox_vf_register_reply_msg *reply = payload;
+
+ if (le16_to_cpu(reply->reply.ret_major)) {
+ netdev_warn(enic->netdev,
+ "MBOX: VF register rejected by PF: %u/%u\n",
+ le16_to_cpu(reply->reply.ret_major),
+ le16_to_cpu(reply->reply.ret_minor));
+ } else {
+ enic->vf_registered = true;
+ }
+ complete(&enic->mbox_comp);
+}
+
+static void enic_mbox_vf_handle_unregister_reply(struct enic *enic,
+ void *payload)
+{
+ struct enic_mbox_vf_register_reply_msg *reply = payload;
+
+ if (le16_to_cpu(reply->reply.ret_major)) {
+ netdev_warn(enic->netdev,
+ "MBOX: VF unregister rejected by PF: %u/%u\n",
+ le16_to_cpu(reply->reply.ret_major),
+ le16_to_cpu(reply->reply.ret_minor));
+ } else {
+ enic->vf_registered = false;
+ }
+ complete(&enic->mbox_comp);
+}
+
+static void enic_mbox_vf_handle_link_state(struct enic *enic, void *payload)
+{
+ struct enic_mbox_pf_link_state_notif_msg *notif = payload;
+ struct enic_mbox_pf_link_state_ack_msg ack = {};
+
+ switch (le32_to_cpu(notif->link_state)) {
+ case ENIC_MBOX_LINK_STATE_ENABLE:
+ if (!netif_carrier_ok(enic->netdev))
+ netif_carrier_on(enic->netdev);
+ netdev_dbg(enic->netdev, "MBOX: link state -> UP\n");
+ break;
+ case ENIC_MBOX_LINK_STATE_DISABLE:
+ if (netif_carrier_ok(enic->netdev))
+ netif_carrier_off(enic->netdev);
+ netdev_dbg(enic->netdev, "MBOX: link state -> DOWN\n");
+ break;
+ default:
+ netdev_warn(enic->netdev, "MBOX: unknown link state %u\n",
+ le32_to_cpu(notif->link_state));
+ ack.ack.ret_major = cpu_to_le16(ENIC_MBOX_ERR_GENERIC);
+ break;
+ }
+
+ enic_mbox_send_msg(enic, ENIC_MBOX_PF_LINK_STATE_ACK, ENIC_MBOX_DST_PF,
+ &ack, sizeof(ack));
+}
+
+static bool enic_mbox_vf_payload_ok(struct enic *enic, u8 msg_type,
+ u16 payload_len, size_t min_len)
+{
+ if (payload_len < min_len) {
+ netdev_warn(enic->netdev,
+ "MBOX: short payload for type %u (%u < %zu)\n",
+ msg_type, payload_len, min_len);
+ return false;
+ }
+ return true;
+}
+
+static void enic_mbox_vf_process_msg(struct enic *enic,
+ struct enic_mbox_hdr *hdr, void *payload,
+ u16 payload_len)
+{
+ switch (hdr->msg_type) {
+ case ENIC_MBOX_VF_CAPABILITY_REPLY: {
+ size_t exp = sizeof(struct enic_mbox_vf_capability_reply_msg);
+
+ if (!enic_mbox_vf_payload_ok(enic, hdr->msg_type,
+ payload_len, exp))
+ return;
+ enic_mbox_vf_handle_capability_reply(enic, payload);
+ break;
+ }
+ case ENIC_MBOX_VF_REGISTER_REPLY: {
+ size_t exp = sizeof(struct enic_mbox_vf_register_reply_msg);
+
+ if (!enic_mbox_vf_payload_ok(enic, hdr->msg_type,
+ payload_len, exp))
+ return;
+ enic_mbox_vf_handle_register_reply(enic, payload);
+ break;
+ }
+ case ENIC_MBOX_VF_UNREGISTER_REPLY: {
+ size_t exp = sizeof(struct enic_mbox_vf_register_reply_msg);
+
+ if (!enic_mbox_vf_payload_ok(enic, hdr->msg_type,
+ payload_len, exp))
+ return;
+ enic_mbox_vf_handle_unregister_reply(enic, payload);
+ break;
+ }
+ case ENIC_MBOX_PF_LINK_STATE_NOTIF: {
+ size_t exp = sizeof(struct enic_mbox_pf_link_state_notif_msg);
+
+ if (!enic_mbox_vf_payload_ok(enic, hdr->msg_type,
+ payload_len, exp))
+ return;
+ enic_mbox_vf_handle_link_state(enic, payload);
+ break;
+ }
+ default:
+ netdev_dbg(enic->netdev,
+ "MBOX: VF unhandled msg type %u\n",
+ hdr->msg_type);
+ break;
+ }
+}
+
static void enic_mbox_recv_handler(struct enic *enic, void *buf,
unsigned int len)
{
@@ -316,11 +457,90 @@ static void enic_mbox_recv_handler(struct enic *enic, void *buf,
if (enic->vf_state)
enic_mbox_pf_process_msg(enic, hdr, payload);
+ else
+ enic_mbox_vf_process_msg(enic, hdr, payload,
+ msg_len - (u16)sizeof(*hdr));
+}
+
+int enic_mbox_vf_capability_check(struct enic *enic)
+{
+ struct enic_mbox_vf_capability_msg req = {};
+ int err;
+
+ enic->pf_cap_version = 0;
+ reinit_completion(&enic->mbox_comp);
+ req.version = cpu_to_le32(ENIC_MBOX_CAP_VERSION_1);
+
+ err = enic_mbox_send_msg(enic, ENIC_MBOX_VF_CAPABILITY_REQUEST,
+ ENIC_MBOX_DST_PF, &req, sizeof(req));
+ if (err)
+ return err;
+
+ err = enic_mbox_wait_reply(enic, 3000);
+ if (err) {
+ netdev_warn(enic->netdev,
+ "MBOX: no capability reply from PF\n");
+ return err;
+ }
+
+ if (enic->pf_cap_version < ENIC_MBOX_CAP_VERSION_1) {
+ netdev_warn(enic->netdev,
+ "MBOX: PF version %u too old\n",
+ enic->pf_cap_version);
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+int enic_mbox_vf_register(struct enic *enic)
+{
+ int err;
+
+ enic->vf_registered = false;
+ reinit_completion(&enic->mbox_comp);
+
+ err = enic_mbox_send_msg(enic, ENIC_MBOX_VF_REGISTER_REQUEST,
+ ENIC_MBOX_DST_PF, NULL, 0);
+ if (err)
+ return err;
+
+ err = enic_mbox_wait_reply(enic, 3000);
+ if (err) {
+ netdev_warn(enic->netdev,
+ "MBOX: VF registration with PF timed out\n");
+ return err;
+ }
+
+ if (!enic->vf_registered)
+ return -ENODEV;
+
+ return 0;
+}
+
+int enic_mbox_vf_unregister(struct enic *enic)
+{
+ int err;
+
+ if (!enic->vf_registered)
+ return 0;
+
+ reinit_completion(&enic->mbox_comp);
+
+ err = enic_mbox_send_msg(enic, ENIC_MBOX_VF_UNREGISTER_REQUEST,
+ ENIC_MBOX_DST_PF, NULL, 0);
+ if (err)
+ return err;
+
+ err = enic_mbox_wait_reply(enic, 3000);
+
+ return enic->vf_registered ? -ETIMEDOUT : 0;
}
void enic_mbox_init(struct enic *enic)
{
enic->mbox_msg_num = 0;
mutex_init(&enic->mbox_lock);
+ init_completion(&enic->mbox_comp);
enic->admin_rq_handler = enic_mbox_recv_handler;
}
diff --git a/drivers/net/ethernet/cisco/enic/enic_mbox.h b/drivers/net/ethernet/cisco/enic/enic_mbox.h
index a6f6798d14f4..fa2fb08bf7d0 100644
--- a/drivers/net/ethernet/cisco/enic/enic_mbox.h
+++ b/drivers/net/ethernet/cisco/enic/enic_mbox.h
@@ -80,5 +80,8 @@ void enic_mbox_init(struct enic *enic);
int enic_mbox_send_msg(struct enic *enic, u8 msg_type, u16 dst_vnic_id,
void *payload, u16 payload_len);
int enic_mbox_send_link_state(struct enic *enic, u16 vf_id, u32 link_state);
+int enic_mbox_vf_capability_check(struct enic *enic);
+int enic_mbox_vf_register(struct enic *enic);
+int enic_mbox_vf_unregister(struct enic *enic);
#endif /* _ENIC_MBOX_H_ */
--
2.43.0
* [PATCH net-next v4 07/10] enic: add MBOX PF handlers for VF register and capability
From: Satish Kharat via B4 Relay @ 2026-04-12 5:06 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel,
20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9, Satish Kharat
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-0-f052326c2a57@cisco.com>
From: Satish Kharat <satishkh@cisco.com>
Implement PF-side mailbox message processing for SR-IOV V2
admin channel communication.
When the PF receives messages from VFs, the dispatch routes
them to type-specific handlers:
- VF_CAPABILITY_REQUEST: reply with protocol version 1
- VF_REGISTER_REQUEST: mark VF registered, reply, then
send PF_LINK_STATE_NOTIF with link enabled
- VF_UNREGISTER_REQUEST: mark VF unregistered, send reply
- PF_LINK_STATE_ACK: log errors from VF acknowledgment
Per-VF state (struct enic_vf_state) is tracked via enic->vf_state
which will be allocated when SRIOV V2 is enabled.
Remove the CONFIG_PCI_IOV guard from num_vfs in struct enic. The
PF handlers reference enic->num_vfs for VF ID bounds checking in
enic_mbox.c, which is compiled unconditionally. The field must be
visible regardless of CONFIG_PCI_IOV to avoid build failures.
Add enic_mbox_send_link_state() helper for PF-initiated link
state notifications, also used later by ndo_set_vf_link_state.
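The checks the PF applies before dispatching a message can be
summarized in a standalone sketch. This is only an approximation for
review purposes: mbox_pf_validate() is a hypothetical function, the
byte offsets assume the 16-byte header layout of struct enic_mbox_hdr,
and the errno values are invented here (the driver logs a warning and
drops the message rather than returning a code):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MBOX_HDR_LEN	16u	/* assumed sizeof(struct enic_mbox_hdr) */
#define MBOX_TYPE_MAX	8u	/* mirrors ENIC_MBOX_MAX */

/* Decode a little-endian 16-bit value from a byte buffer. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical validator: returns the payload length on success or a
 * negative errno (the errno choices are assumptions of this sketch).
 */
static int mbox_pf_validate(const uint8_t *buf, size_t len, uint16_t num_vfs)
{
	uint16_t src, msg_len;

	if (len < MBOX_HDR_LEN)
		return -EINVAL;		/* truncated header */
	if (buf[4] >= MBOX_TYPE_MAX)
		return -EOPNOTSUPP;	/* unknown message type */

	msg_len = get_le16(buf + 6);
	if (msg_len < MBOX_HDR_LEN || msg_len > len)
		return -EMSGSIZE;	/* msg_len inconsistent with buffer */

	src = get_le16(buf);
	if (src >= num_vfs)
		return -ENODEV;		/* sender is not a known VF */

	return (int)(msg_len - MBOX_HDR_LEN);
}
```

Validating msg_len against the received buffer length before touching
the payload is what keeps the type-specific handlers from reading past
the end of the buffer.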
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
drivers/net/ethernet/cisco/enic/enic.h | 7 +-
drivers/net/ethernet/cisco/enic/enic_mbox.c | 174 +++++++++++++++++++++++++++-
drivers/net/ethernet/cisco/enic/enic_mbox.h | 1 +
3 files changed, 178 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/cisco/enic/enic.h b/drivers/net/ethernet/cisco/enic/enic.h
index 42f345aceced..9b1fa3857df5 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -256,9 +256,7 @@ struct enic {
struct enic_rx_coal rx_coalesce_setting;
u32 rx_coalesce_usecs;
u32 tx_coalesce_usecs;
-#ifdef CONFIG_PCI_IOV
u16 num_vfs;
-#endif
enum enic_vf_type vf_type;
unsigned int enable_count;
spinlock_t enic_api_lock;
@@ -310,6 +308,11 @@ struct enic {
/* MBOX protocol state */
struct mutex mbox_lock;
u64 mbox_msg_num;
+
+ /* PF: per-VF MBOX state, allocated when SRIOV V2 is enabled */
+ struct enic_vf_state {
+ bool registered;
+ } *vf_state;
};
static inline struct net_device *vnic_get_netdev(struct vnic_dev *vdev)
diff --git a/drivers/net/ethernet/cisco/enic/enic_mbox.c b/drivers/net/ethernet/cisco/enic/enic_mbox.c
index d144c86d9ef8..f5784624ebbd 100644
--- a/drivers/net/ethernet/cisco/enic/enic_mbox.c
+++ b/drivers/net/ethernet/cisco/enic/enic_mbox.c
@@ -124,10 +124,168 @@ int enic_mbox_send_msg(struct enic *enic, u8 msg_type, u16 dst_vnic_id,
return err;
}
+int enic_mbox_send_link_state(struct enic *enic, u16 vf_id, u32 link_state)
+{
+ struct enic_mbox_pf_link_state_notif_msg notif = {};
+
+ if (!enic->vf_state || vf_id >= enic->num_vfs ||
+ !enic->vf_state[vf_id].registered) {
+ netdev_dbg(enic->netdev,
+ "MBOX: skip link state to unregistered VF %u\n",
+ vf_id);
+ return 0;
+ }
+
+ notif.link_state = cpu_to_le32(link_state);
+ return enic_mbox_send_msg(enic, ENIC_MBOX_PF_LINK_STATE_NOTIF, vf_id,
+ &notif, sizeof(notif));
+}
+
+static int enic_mbox_pf_handle_capability(struct enic *enic, void *msg,
+ u16 vf_id, u64 msg_num)
+{
+ struct enic_mbox_vf_capability_reply_msg reply = {};
+
+ reply.reply.ret_major = cpu_to_le16(0);
+ reply.version = cpu_to_le32(ENIC_MBOX_CAP_VERSION_1);
+
+ return enic_mbox_send_msg(enic, ENIC_MBOX_VF_CAPABILITY_REPLY, vf_id,
+ &reply, sizeof(reply));
+}
+
+static int enic_mbox_pf_handle_register(struct enic *enic, void *msg,
+ u16 vf_id, u64 msg_num)
+{
+ struct enic_mbox_vf_register_reply_msg reply = {};
+ int err;
+
+ if (!enic->vf_state || vf_id >= enic->num_vfs) {
+ netdev_warn(enic->netdev,
+ "MBOX: register from invalid VF %u\n", vf_id);
+ return -EINVAL;
+ }
+
+ /* VF re-registering (e.g. guest reboot without clean unregister):
+ * mark the previous registration inactive before accepting the new one.
+ */
+ if (enic->vf_state[vf_id].registered) {
+ netdev_dbg(enic->netdev,
+ "MBOX: VF %u re-register, cleaning previous state\n",
+ vf_id);
+ enic->vf_state[vf_id].registered = false;
+ }
+
+ reply.reply.ret_major = cpu_to_le16(0);
+ err = enic_mbox_send_msg(enic, ENIC_MBOX_VF_REGISTER_REPLY, vf_id,
+ &reply, sizeof(reply));
+ if (err)
+ return err;
+
+ enic->vf_state[vf_id].registered = true;
+ netdev_info(enic->netdev, "VF %u registered via MBOX\n", vf_id);
+
+ err = enic_mbox_send_link_state(enic, vf_id,
+ ENIC_MBOX_LINK_STATE_ENABLE);
+ if (err)
+ netdev_warn(enic->netdev,
+ "VF %u: failed to send initial link state: %d\n",
+ vf_id, err);
+ /* Registration succeeded; link state will be (re-)sent on next
+ * enic_link_check() event.
+ */
+ return 0;
+}
+
+static int enic_mbox_pf_handle_unregister(struct enic *enic, void *msg,
+ u16 vf_id, u64 msg_num)
+{
+ struct enic_mbox_vf_register_reply_msg reply = {};
+ int err;
+
+ if (!enic->vf_state || vf_id >= enic->num_vfs) {
+ netdev_warn(enic->netdev,
+ "MBOX: unregister from invalid VF %u\n", vf_id);
+ return -EINVAL;
+ }
+
+ reply.reply.ret_major = cpu_to_le16(0);
+ err = enic_mbox_send_msg(enic, ENIC_MBOX_VF_UNREGISTER_REPLY, vf_id,
+ &reply, sizeof(reply));
+ if (err)
+ return err;
+
+ enic->vf_state[vf_id].registered = false;
+
+ netdev_info(enic->netdev, "VF %u unregistered via MBOX\n", vf_id);
+
+ return 0;
+}
+
+static void enic_mbox_pf_process_msg(struct enic *enic,
+ struct enic_mbox_hdr *hdr, void *payload)
+{
+ u16 vf_id = le16_to_cpu(hdr->src_vnic_id);
+ u16 msg_len = le16_to_cpu(hdr->msg_len);
+ int err = 0;
+
+ if (!enic->vf_state) {
+ netdev_dbg(enic->netdev,
+ "MBOX: PF received msg but SRIOV not active\n");
+ return;
+ }
+
+ if (vf_id >= enic->num_vfs) {
+ netdev_warn(enic->netdev,
+ "MBOX: PF received msg from invalid VF %u\n",
+ vf_id);
+ return;
+ }
+
+ switch (hdr->msg_type) {
+ case ENIC_MBOX_VF_CAPABILITY_REQUEST:
+ err = enic_mbox_pf_handle_capability(enic, payload, vf_id,
+ le64_to_cpu(hdr->msg_num));
+ break;
+ case ENIC_MBOX_VF_REGISTER_REQUEST:
+ err = enic_mbox_pf_handle_register(enic, payload, vf_id,
+ le64_to_cpu(hdr->msg_num));
+ break;
+ case ENIC_MBOX_VF_UNREGISTER_REQUEST:
+ err = enic_mbox_pf_handle_unregister(enic, payload, vf_id,
+ le64_to_cpu(hdr->msg_num));
+ break;
+ case ENIC_MBOX_PF_LINK_STATE_ACK: {
+ struct enic_mbox_pf_link_state_ack_msg *ack = payload;
+
+ if (msg_len < sizeof(*hdr) + sizeof(*ack))
+ break;
+ if (le16_to_cpu(ack->ack.ret_major))
+ netdev_warn(enic->netdev,
+ "MBOX: VF %u link state ACK error %u/%u\n",
+ vf_id, le16_to_cpu(ack->ack.ret_major),
+ le16_to_cpu(ack->ack.ret_minor));
+ break;
+ }
+ default:
+ netdev_dbg(enic->netdev,
+ "MBOX: PF unhandled msg type %u from VF %u\n",
+ hdr->msg_type, vf_id);
+ err = -EOPNOTSUPP;
+ break;
+ }
+
+ if (err)
+ netdev_warn(enic->netdev,
+ "MBOX: PF handler for msg type %u from VF %u failed: %d\n",
+ hdr->msg_type, vf_id, err);
+}
+
static void enic_mbox_recv_handler(struct enic *enic, void *buf,
unsigned int len)
{
struct enic_mbox_hdr *hdr = buf;
+ void *payload;
+ u16 msg_len;
if (len < sizeof(*hdr)) {
netdev_warn(enic->netdev,
@@ -142,10 +300,22 @@ static void enic_mbox_recv_handler(struct enic *enic, void *buf,
return;
}
+ msg_len = le16_to_cpu(hdr->msg_len);
+ if (msg_len < sizeof(*hdr) || msg_len > len) {
+ netdev_warn(enic->netdev,
+ "MBOX: invalid msg_len %u (buf len %u)\n",
+ msg_len, len);
+ return;
+ }
+
netdev_dbg(enic->netdev,
"MBOX recv: type %u from vnic %u len %u\n",
- hdr->msg_type, le16_to_cpu(hdr->src_vnic_id),
- le16_to_cpu(hdr->msg_len));
+ hdr->msg_type, le16_to_cpu(hdr->src_vnic_id), msg_len);
+
+ payload = buf + sizeof(*hdr);
+
+ if (enic->vf_state)
+ enic_mbox_pf_process_msg(enic, hdr, payload);
}
void enic_mbox_init(struct enic *enic)
diff --git a/drivers/net/ethernet/cisco/enic/enic_mbox.h b/drivers/net/ethernet/cisco/enic/enic_mbox.h
index 554269b78780..a6f6798d14f4 100644
--- a/drivers/net/ethernet/cisco/enic/enic_mbox.h
+++ b/drivers/net/ethernet/cisco/enic/enic_mbox.h
@@ -79,5 +79,6 @@ struct enic;
void enic_mbox_init(struct enic *enic);
int enic_mbox_send_msg(struct enic *enic, u8 msg_type, u16 dst_vnic_id,
void *payload, u16 payload_len);
+int enic_mbox_send_link_state(struct enic *enic, u16 vf_id, u32 link_state);
#endif /* _ENIC_MBOX_H_ */
--
2.43.0
* [PATCH net-next v4 09/10] enic: wire V2 SR-IOV enable with admin channel and MBOX
From: Satish Kharat via B4 Relay @ 2026-04-12 5:06 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel,
20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9, Satish Kharat
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-0-f052326c2a57@cisco.com>
From: Satish Kharat <satishkh@cisco.com>
Extend enic_sriov_configure() to handle V2 SR-IOV VFs. When the PF
detects V2 VF device IDs, the enable path allocates per-VF MBOX state,
opens the admin channel, initializes the MBOX protocol, and then calls
pci_enable_sriov(). The admin channel must be ready before VFs are
created so that VF drivers can immediately begin the MBOX capability
and registration handshake during their probe.
The disable path reverses this order: pci_disable_sriov() first (so VF
drivers unregister via MBOX), then the admin channel is closed and
per-VF state is freed.
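The ordering above, including the error unwinding, can be modelled
with a small trace-recording sketch. Everything in it is hypothetical
(the step names, the fail_at selector and the trace buffer exist only
for illustration); it records which steps would run rather than
calling any driver or PCI function:

```c
#include <string.h>

enum v2_step { STEP_NONE, STEP_ALLOC, STEP_OPEN, STEP_SRIOV };

static char trace[160];

/* Append one step name to the trace, never overflowing the buffer. */
static void log_step(const char *name)
{
	strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
	strncat(trace, " ", sizeof(trace) - strlen(trace) - 1);
}

/* Returns 0 on success, -1 if the step selected by fail_at fails. */
static int v2_enable(enum v2_step fail_at)
{
	trace[0] = '\0';

	if (fail_at == STEP_ALLOC)
		return -1;
	log_step("alloc_vf_state");

	if (fail_at == STEP_OPEN)
		goto free_vf_state;
	log_step("admin_open");
	log_step("mbox_init");

	/* VFs are created last, so the channel is ready when they probe. */
	if (fail_at == STEP_SRIOV)
		goto close_admin;
	log_step("enable_sriov");
	return 0;

close_admin:
	log_step("admin_close");
free_vf_state:
	log_step("free_vf_state");
	return -1;
}

/* Disable: remove the VFs first, then tear down in reverse order. */
static const char *v2_disable(void)
{
	trace[0] = '\0';
	log_step("disable_sriov");
	log_step("admin_close");
	log_step("free_vf_state");
	return trace;
}
```

Each failure label undoes only the steps that already succeeded, which
is the same shape as the goto-based unwinding in enic_sriov_v2_enable().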
The existing V1/USNIC SR-IOV paths are unchanged.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
drivers/net/ethernet/cisco/enic/enic_main.c | 139 ++++++++++++++++++++++++++--
drivers/net/ethernet/cisco/enic/enic_res.c | 1 +
drivers/net/ethernet/cisco/enic/vnic_enet.h | 4 +-
3 files changed, 136 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 53d68272d06a..057716ccc283 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -60,6 +60,8 @@
#include "enic_clsf.h"
#include "enic_rq.h"
#include "enic_wq.h"
+#include "enic_admin.h"
+#include "enic_mbox.h"
#define ENIC_NOTIFY_TIMER_PERIOD (2 * HZ)
@@ -2689,6 +2691,122 @@ static void enic_sriov_detect_vf_type(struct enic *enic)
enic->vf_type = ENIC_VF_TYPE_NONE;
}
}
+
+static int __maybe_unused
+enic_sriov_v2_enable(struct enic *enic, int num_vfs)
+{
+ int err;
+
+ if (!enic->has_admin_channel) {
+ netdev_err(enic->netdev,
+ "V2 SR-IOV requires admin channel resources\n");
+ return -EOPNOTSUPP;
+ }
+
+ enic->vf_state = kcalloc(num_vfs, sizeof(*enic->vf_state), GFP_KERNEL);
+ if (!enic->vf_state)
+ return -ENOMEM;
+
+ err = enic_admin_channel_open(enic);
+ if (err) {
+ netdev_err(enic->netdev,
+ "Failed to open admin channel: %d\n", err);
+ goto free_vf_state;
+ }
+
+ enic_mbox_init(enic);
+
+ enic->num_vfs = num_vfs;
+
+ err = pci_enable_sriov(enic->pdev, num_vfs);
+ if (err) {
+ netdev_err(enic->netdev,
+ "pci_enable_sriov failed: %d\n", err);
+ goto close_admin;
+ }
+
+ enic->priv_flags |= ENIC_SRIOV_ENABLED;
+ return num_vfs;
+
+close_admin:
+ enic->num_vfs = 0;
+ enic_admin_channel_close(enic);
+free_vf_state:
+ kfree(enic->vf_state);
+ enic->vf_state = NULL;
+ return err;
+}
+
+static void enic_sriov_v2_disable(struct enic *enic)
+{
+ pci_disable_sriov(enic->pdev);
+ enic_admin_channel_close(enic);
+ kfree(enic->vf_state);
+ enic->vf_state = NULL;
+ enic->num_vfs = 0;
+ enic->priv_flags &= ~ENIC_SRIOV_ENABLED;
+}
+
+static int __maybe_unused
+enic_sriov_configure(struct pci_dev *pdev, int num_vfs)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct enic *enic = netdev_priv(netdev);
+ struct enic_port_profile *pp;
+ int err;
+
+ if (num_vfs > 0) {
+ if (enic->config.mq_subvnic_count) {
+ netdev_err(netdev,
+ "SR-IOV not supported with multi-queue sub-vnics\n");
+ return -EOPNOTSUPP;
+ }
+
+ if (enic->vf_type == ENIC_VF_TYPE_NONE) {
+ netdev_err(netdev,
+ "SR-IOV not supported on this firmware version\n");
+ return -EOPNOTSUPP;
+ }
+
+ if (enic->vf_type == ENIC_VF_TYPE_V2)
+ return enic_sriov_v2_enable(enic, num_vfs);
+
+ pp = kcalloc(num_vfs, sizeof(*pp), GFP_KERNEL);
+ if (!pp)
+ return -ENOMEM;
+
+ err = pci_enable_sriov(pdev, num_vfs);
+ if (err) {
+ kfree(pp);
+ return err;
+ }
+
+ kfree(enic->pp);
+ enic->pp = pp;
+ enic->num_vfs = num_vfs;
+ enic->priv_flags |= ENIC_SRIOV_ENABLED;
+ return num_vfs;
+ }
+
+ if (!enic_sriov_enabled(enic))
+ return 0;
+
+ if (enic->vf_type == ENIC_VF_TYPE_V2) {
+ enic_sriov_v2_disable(enic);
+ return 0;
+ }
+
+ pci_disable_sriov(pdev);
+ enic->num_vfs = 0;
+ enic->priv_flags &= ~ENIC_SRIOV_ENABLED;
+
+ kfree(enic->pp);
+ enic->pp = kzalloc_obj(*enic->pp, GFP_KERNEL);
+ if (!enic->pp)
+ return -ENOMEM;
+
+ return 0;
+}
#endif
static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
@@ -2787,12 +2905,18 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err_out_vnic_unregister;
#ifdef CONFIG_PCI_IOV
- /* Get number of subvnics */
+ enic_sriov_detect_vf_type(enic);
+
+ /* Auto-enable SR-IOV if VFs were pre-configured (e.g. at boot).
+ * V2 VFs require the admin channel, which is not yet set up at probe
+ * time; use sysfs (enic_sriov_configure) to enable V2 SR-IOV instead.
+ */
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
if (pos) {
pci_read_config_word(pdev, pos + PCI_SRIOV_TOTAL_VF,
&enic->num_vfs);
- if (enic->num_vfs) {
+ if (enic->num_vfs &&
+ enic->vf_type != ENIC_VF_TYPE_V2) {
err = pci_enable_sriov(pdev, enic->num_vfs);
if (err) {
dev_err(dev, "SRIOV enable failed, aborting."
@@ -2804,7 +2928,6 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
num_pps = enic->num_vfs;
}
}
- enic_sriov_detect_vf_type(enic);
#endif
/* Allocate structure for port profiles */
@@ -3033,14 +3156,16 @@ static void enic_remove(struct pci_dev *pdev)
cancel_work_sync(&enic->reset);
cancel_work_sync(&enic->change_mtu_work);
unregister_netdev(netdev);
- enic_dev_deinit(enic);
- vnic_dev_close(enic->vdev);
#ifdef CONFIG_PCI_IOV
if (enic_sriov_enabled(enic)) {
- pci_disable_sriov(pdev);
- enic->priv_flags &= ~ENIC_SRIOV_ENABLED;
+ if (enic->vf_type == ENIC_VF_TYPE_V2)
+ enic_sriov_v2_disable(enic);
+ else
+ pci_disable_sriov(pdev);
}
#endif
+ enic_dev_deinit(enic);
+ vnic_dev_close(enic->vdev);
kfree(enic->pp);
vnic_dev_unregister(enic->vdev);
enic_iounmap(enic);
diff --git a/drivers/net/ethernet/cisco/enic/enic_res.c b/drivers/net/ethernet/cisco/enic/enic_res.c
index 2b7545d6a67f..436326ace049 100644
--- a/drivers/net/ethernet/cisco/enic/enic_res.c
+++ b/drivers/net/ethernet/cisco/enic/enic_res.c
@@ -59,6 +59,7 @@ int enic_get_vnic_config(struct enic *enic)
GET_CONFIG(intr_timer_usec);
GET_CONFIG(loop_tag);
GET_CONFIG(num_arfs);
+ GET_CONFIG(mq_subvnic_count);
GET_CONFIG(max_rq_ring);
GET_CONFIG(max_wq_ring);
GET_CONFIG(max_cq_ring);
diff --git a/drivers/net/ethernet/cisco/enic/vnic_enet.h b/drivers/net/ethernet/cisco/enic/vnic_enet.h
index 9e8e86262a3f..519d2969990b 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_enet.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_enet.h
@@ -21,7 +21,9 @@ struct vnic_enet_config {
u16 loop_tag;
u16 vf_rq_count;
u16 num_arfs;
- u8 reserved[66];
+ u8 reserved1[32];
+ u16 mq_subvnic_count;
+ u8 reserved2[32];
u32 max_rq_ring; // MAX RQ ring size
u32 max_wq_ring; // MAX WQ ring size
u32 max_cq_ring; // MAX CQ ring size
--
2.43.0
* [PATCH net 1/1] ipv6: xfrm6: release dst on error in xfrm6_rcv_encap()
From: Ren Wei @ 2026-04-12 5:07 UTC (permalink / raw)
To: netdev
Cc: steffen.klassert, herbert, davem, dsahern, edumazet, kuba, pabeni,
horms, sd, yifanwucs, tomapufckgml, yuantan098, bird, caoruide123,
zylzyl2333, n05ec
In-Reply-To: <cover.1775886482.git.zylzyl2333@gmail.com>
From: Yilin Zhu <zylzyl2333@gmail.com>
xfrm6_rcv_encap() performs an IPv6 route lookup when the skb does not
already have a dst attached. ip6_route_input_lookup() returns a
referenced dst entry even when the lookup resolves to an error route.
If dst->error is set, xfrm6_rcv_encap() drops the skb without attaching
the dst to the skb and without releasing the reference returned by the
lookup. Repeated packets hitting this path therefore leak dst entries.
Release the dst before jumping to the drop path.
Fixes: 0146dca70b87 ("xfrm: add support for UDPv6 encapsulation of ESP")
Cc: stable@kernel.org
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Tested-by: Ruide Cao <caoruide123@gmail.com>
Signed-off-by: Yilin Zhu <zylzyl2333@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---
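Note (not part of the commit): the leak described above can be
illustrated with a small userspace reference-count model. This is a
sketch only; struct model_dst, model_lookup(), model_release() and
model_rcv() are hypothetical stand-ins and not kernel APIs. The lookup
always takes a reference, so the error path has to drop it, while the
success path hands the reference over to the packet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted route entry. */
struct model_dst {
	int refcnt;
	int error;
};

/* Like the route lookup: returns a referenced entry, even an error one. */
struct model_dst *model_lookup(struct model_dst *d)
{
	d->refcnt++;
	return d;
}

void model_release(struct model_dst *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

/*
 * Returns 0 and stores the entry in *attached on success (the packet
 * now owns the reference), or -1 if the packet is dropped.
 */
int model_rcv(struct model_dst *route, struct model_dst **attached)
{
	struct model_dst *dst = model_lookup(route);

	if (dst->error) {
		model_release(dst);	/* without this the reference leaks */
		return -1;
	}

	*attached = dst;
	return 0;
}
```

Without the model_release() call in the error branch, each dropped
packet would leave refcnt one higher than before, which corresponds to
the leak this patch closes.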
net/ipv6/xfrm6_protocol.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/xfrm6_protocol.c b/net/ipv6/xfrm6_protocol.c
index ea2f805d3b01..9b586fcec485 100644
--- a/net/ipv6/xfrm6_protocol.c
+++ b/net/ipv6/xfrm6_protocol.c
@@ -88,8 +88,10 @@ int xfrm6_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
dst = ip6_route_input_lookup(dev_net(skb->dev), skb->dev, &fl6,
skb, flags);
- if (dst->error)
+ if (dst->error) {
+ dst_release(dst);
goto drop;
+ }
skb_dst_set(skb, dst);
}
--
2.43.0