* Re: [PATCH net-next 1/3] net: ipv4: report multicast group user count
From: Vadim Fedorenko @ 2026-07-01 20:54 UTC
To: Yuyang Huang
Cc: David S. Miller, Andrew Lunn, David Ahern, Donald Hunter,
Eric Dumazet, Ido Schimmel, Jakub Kicinski, Paolo Abeni,
Shuah Khan, Simon Horman, linux-kernel, linux-kselftest, netdev
In-Reply-To: <20260630110207.37841-2-sigefriedhyy@gmail.com>
On 30/06/2026 12:02, Yuyang Huang wrote:
> RTM_GETMULTICAST has been part of the rtnetlink ABI for a long time
> and already reports IPv4 multicast group membership through
> IFA_MULTICAST and IFA_CACHEINFO. It does not report how many consumers
> hold each membership, so userspace still has to parse /proc/net/igmp to
> get the Users column.
>
> Add IFA_MC_USERS as a u32 attribute carrying ip_mc_list::users in
> RTM_GETMULTICAST replies and entry-lifecycle notifications.
>
> This gives iproute2 enough information to migrate the IPv4 part of
> "ip maddr show" from procfs parsing to rtnetlink.
>
> Signed-off-by: Yuyang Huang <sigefriedhyy@gmail.com>
> ---
> Documentation/netlink/specs/rt-addr.yaml | 4 ++++
> include/uapi/linux/if_addr.h | 1 +
> net/ipv4/igmp.c | 2 ++
> 3 files changed, 7 insertions(+)
>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
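For readers who want to consume the new attribute, here is a minimal userspace sketch of pulling a u32 attribute out of an rtnetlink-style TLV stream. It assumes the caller has already skipped the nlmsghdr and ifaddrmsg and is looking at the attribute area only. The struct, the helper names, and in particular EXAMPLE_IFA_MC_USERS are hypothetical; the real attribute number comes from include/uapi/linux/if_addr.h and is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the rtattr TLV header: 16-bit length, 16-bit type. */
struct attr_hdr {
	uint16_t len;	/* header + payload, excluding padding */
	uint16_t type;
};

/* Attributes are padded to a 4-byte boundary. */
#define ATTR_ALIGN(n)	(((size_t)(n) + 3) & ~(size_t)3)

/* Hypothetical number, for illustration only (not the real IFA_MC_USERS). */
#define EXAMPLE_IFA_MC_USERS 100

/* Test helper: append one u32 attribute at offset off, return the new offset. */
static size_t put_u32_attr(uint8_t *buf, size_t off, uint16_t type,
			   uint32_t val)
{
	struct attr_hdr hdr = { .len = sizeof(hdr) + sizeof(val), .type = type };

	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), &val, sizeof(val));
	return off + ATTR_ALIGN(sizeof(hdr) + sizeof(val));
}

/*
 * Scan the attribute area for a u32 attribute of the given type.
 * Returns 0 and stores the value in *out, or -1 if absent or malformed.
 */
static int find_u32_attr(const uint8_t *buf, size_t len, uint16_t type,
			 uint32_t *out)
{
	size_t off = 0;

	while (off < len && len - off >= sizeof(struct attr_hdr)) {
		struct attr_hdr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			return -1;	/* truncated or corrupt attribute */

		if (hdr.type == type) {
			if (hdr.len != sizeof(hdr) + sizeof(uint32_t))
				return -1;	/* payload is not a u32 */
			memcpy(out, buf + off + sizeof(hdr), sizeof(*out));
			return 0;
		}
		off += ATTR_ALIGN(hdr.len);
	}
	return -1;
}
```

A real consumer would more likely use libmnl or the RTA_* macros from linux/rtnetlink.h; the hand-rolled walk is only meant to show that an older parser which does not know the attribute type simply skips over it.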
* Re: [PATCH net-next 2/3] net: ipv6: report multicast group user count
From: Vadim Fedorenko @ 2026-07-01 20:55 UTC
To: Yuyang Huang
Cc: David S. Miller, Andrew Lunn, David Ahern, Donald Hunter,
Eric Dumazet, Ido Schimmel, Jakub Kicinski, Paolo Abeni,
Shuah Khan, Simon Horman, linux-kernel, linux-kselftest, netdev
In-Reply-To: <20260630110207.37841-3-sigefriedhyy@gmail.com>
On 30/06/2026 12:02, Yuyang Huang wrote:
> The previous patch added IFA_MC_USERS and emits it for IPv4 multicast
> groups. Add the same snapshot attribute to IPv6 RTM_GETMULTICAST
> replies and entry-lifecycle notifications, carrying
> ifmcaddr6::mca_users.
>
> This makes the multicast rtnetlink ABI symmetric across IPv4 and IPv6
> and gives userspace the same user count that /proc/net/igmp6 exposes.
>
> Signed-off-by: Yuyang Huang <sigefriedhyy@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
* Re: [PATCH net-next 3/3] selftests: net: check multicast group user count
From: Vadim Fedorenko @ 2026-07-01 20:55 UTC
To: Yuyang Huang
Cc: David S. Miller, Andrew Lunn, David Ahern, Donald Hunter,
Eric Dumazet, Ido Schimmel, Jakub Kicinski, Paolo Abeni,
Shuah Khan, Simon Horman, linux-kernel, linux-kselftest, netdev
In-Reply-To: <20260630110207.37841-4-sigefriedhyy@gmail.com>
On 30/06/2026 12:02, Yuyang Huang wrote:
> Extend the RTM_GETMULTICAST dump test to verify IFA_MC_USERS for both
> IPv4 and IPv6 multicast groups.
>
> Run each protocol test in a fresh network namespace to avoid changing
> host-network state or racing with unrelated multicast users. Join a
> fixed multicast group twice using separate sockets and check that the
> reported user count increases by two.
>
> Signed-off-by: Yuyang Huang <sigefriedhyy@gmail.com>
> ---
> tools/testing/selftests/net/rtnetlink.py | 101 ++++++++++++++++++++---
> 1 file changed, 90 insertions(+), 11 deletions(-)
>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
* Re: [REGRESSION][BISECTED] tun/tap & vhost-net: multi-threaded network performance
From: Michael S. Tsirkin @ 2026-07-01 20:56 UTC
To: Brett Sheffield
Cc: regressions, netdev, Jakub Kicinski, Simon Schippers, Tim Gebauer,
Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
Eric Dumazet, Paolo Abeni, linux-kernel
In-Reply-To: <akVnoOYQOrt8k-Gu@karahi.librecast.net>
On Wed, Jul 01, 2026 at 09:16:48PM +0200, Brett Sheffield wrote:
> TL;DR - Commit 1d6e569b7d0c0b2736636749e4be0a27f3cefcb3 causes
> significant performance regressions with TAP interfaces and multithreaded
> network code. Please revert.
>
>
> Librecast is an IPv6 multicast library. One of the tests (0055) fails under
> Linux 7.2-rc1. The test performs data synchronization over IPv6 multicast using a TAP
> interface. This test has run successfully on every stable, LTS and mainline RC
> released in the past year. Every kernel with my Tested-by has run this test.
>
> There have been a bunch of changes to MLDv2 so I started bisecting there, but
> the culprit is actually 1d6e569b7d0c0b2736636749e4be0a27f3cefcb3 "tun/tap &
> vhost-net: avoid ptr_ring tail-drop when a qdisc is present"
>
> Reverting this commit fixes the test.
>
> To eliminate my code and any multicast weirdness, I ran tests with iperf3
> comparing the same host running 7.2-rc1 both with and without 1d6e569b7d0
> reverted.
Thanks a lot for the bisect! Reverting is not out of the question, but
before we do, it is worth analyzing the situation.
Could you please tell us:
- do you see packet drops?
- does it help to increase the tun queue size?
Thanks a lot!
> CPU: AMD Ryzen 9 9950X
>
> [ host ] - [ bridge ] - [ tap ] - [ guest (qemu) ]
>
> Running matching kernels on host and guest, I started iperf3 in server mode on
> the guest and tested from the host so traffic passes through the tap interface.
>
> iperf3 -s -V # server
> iperf3 -c guest -P nthreads # client
>
> 7.2.0-rc1 (threads 1):
>
> [ 5] 0.00-10.00 sec 20.2 GBytes 17.4 Gbits/sec 0 sender
> [ 5] 0.00-10.00 sec 2.00 GBytes 1.72 Gbits/sec receiver
>
> 7.2.0-rc1 (threads 1, reverted):
>
> [ 5] 0.00-10.00 sec 15.3 GBytes 13.1 Gbits/sec 368 sender
> [ 5] 0.00-10.00 sec 2.00 GBytes 1.72 Gbits/sec receiver
>
> 7.2.0-rc1 (threads 2):
>
> [SUM] 0.00-10.00 sec 10.9 GBytes 9.33 Gbits/sec 0 sender
> [SUM] 0.00-10.00 sec 4.00 GBytes 3.43 Gbits/sec receiver
>
> 7.2.0-rc1 (threads 2, reverted):
>
> [SUM] 0.00-10.00 sec 15.9 GBytes 13.7 Gbits/sec 1567 sender
> [SUM] 0.00-10.00 sec 4.00 GBytes 3.43 Gbits/sec receiver
>
> 7.2.0-rc1 (threads 4):
>
> [SUM] 0.00-10.00 sec 10.9 GBytes 9.33 Gbits/sec 0 sender
> [SUM] 0.00-10.00 sec 8.00 GBytes 6.87 Gbits/sec receiver
>
> 7.2.0-rc1 (threads 4, reverted):
>
> [SUM] 0.00-10.00 sec 16.5 GBytes 14.1 Gbits/sec 6701 sender
> [SUM] 0.00-10.00 sec 8.00 GBytes 6.87 Gbits/sec receiver
>
> 7.2.0-rc1 (threads 8):
>
> [SUM] 0.00-10.00 sec 10.7 GBytes 9.15 Gbits/sec 0 sender
> [SUM] 0.00-10.01 sec 10.6 GBytes 9.13 Gbits/sec receiver
>
> 7.2.0-rc1 (threads 8, reverted):
>
> [SUM] 0.00-10.00 sec 16.2 GBytes 14.0 Gbits/sec 19319 sender
> [SUM] 0.00-10.00 sec 15.7 GBytes 13.5 Gbits/sec receiver
>
> 7.2.0-rc1 (threads 16):
>
> [SUM] 0.00-10.00 sec 10.9 GBytes 9.35 Gbits/sec 0 sender
> [SUM] 0.00-10.01 sec 10.9 GBytes 9.32 Gbits/sec receiver
>
> 7.2.0-rc1 (threads 16, reverted):
>
> [SUM] 0.00-10.00 sec 14.4 GBytes 12.4 Gbits/sec 43593 sender
> [SUM] 0.00-10.00 sec 14.4 GBytes 12.4 Gbits/sec receiver
>
>
> As you can see, the new code works for the single-threaded case, but in all
> other cases there's a significant performance drop. I see this trade-off is
> mentioned in the commit, but the drop is much worse than the commit message
> suggests.
>
> In our multicast use case, data is sent by multiple threads to multiple groups
> simultaneously. This breaks things to the extent that a <2 second test
> times out after 5 minutes.
>
>
> git bisect start
> # status: waiting for both good and bad commits
> # bad: [dc59e4fea9d83f03bad6bddf3fa2e52491777482] Linux 7.2-rc1
> git bisect bad dc59e4fea9d83f03bad6bddf3fa2e52491777482
> # status: waiting for good commit(s), bad commit known
> # good: [36bdc0e815b4e8a05b9028d8ef8a25e1ead35cc1] net: usb: asix: ax88772: re-add usbnet_link_change() in phylink callbacks
> git bisect good 36bdc0e815b4e8a05b9028d8ef8a25e1ead35cc1
> # good: [db314398f618a3a23315f73c87f7d318eaf06c1b] Merge branch 'net-bridge-mcast-support-exponential-field-encoding'
> git bisect good db314398f618a3a23315f73c87f7d318eaf06c1b
> # bad: [079a028d6327e68cfa5d38b36123637b321c19a7] string: Remove strncpy() from the kernel
> git bisect bad 079a028d6327e68cfa5d38b36123637b321c19a7
> # bad: [f396f4005180928cd9e15e352a6512865d3bc908] Bluetooth: btmtk: fix URB leak in alloc_mtk_intr_urb error path
> git bisect bad f396f4005180928cd9e15e352a6512865d3bc908
> # bad: [ec1806a730a1c0b3d68a7f9afe81514fb0dd7991] netfilter: x_tables: disable 32bit compat interface in user namespaces
> git bisect bad ec1806a730a1c0b3d68a7f9afe81514fb0dd7991
> # good: [50c2d91c5dfa0e465826ec1f8dbad9cdc254bd85] mptcp: do not drop partial packets
> git bisect good 50c2d91c5dfa0e465826ec1f8dbad9cdc254bd85
> # good: [68993ced0f618e36cf33388f1e50223e5e6e78cc] Merge tag 'net-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> git bisect good 68993ced0f618e36cf33388f1e50223e5e6e78cc
> # good: [34c78dff59a25110a4ce50c208e42a91490fe615] Merge branch 'net-use-ip_outnoroutes-drop-reason'
> git bisect good 34c78dff59a25110a4ce50c208e42a91490fe615
> # bad: [9587ed8137fb83d93f84b858337412f4500b21e9] Merge branch 'gve-add-support-for-ptp-gettimex64'
> git bisect bad 9587ed8137fb83d93f84b858337412f4500b21e9
> # bad: [83ea7fd73b11dd8cbf4416507a5eac3890b49fb0] net: dsa: microchip: remove unused phylink_mac_link_up() callback
> git bisect bad 83ea7fd73b11dd8cbf4416507a5eac3890b49fb0
> # bad: [f0de88303d5e7e04a1224bc7a00512b5a1c4fe7a] net: make is_skb_wmem() available to modules
> git bisect bad f0de88303d5e7e04a1224bc7a00512b5a1c4fe7a
> # bad: [c411baa463e85a779a7e68a00ba6298770b58c4c] netconsole: move push_ipv6() from netpoll
> git bisect bad c411baa463e85a779a7e68a00ba6298770b58c4c
> # good: [fba362c17d9d9211fc51f272156bb84fc23bdf98] ptr_ring: move free-space check into separate helper
> git bisect good fba362c17d9d9211fc51f272156bb84fc23bdf98
> # bad: [d0273dbe8be1640e597552f81faf1d6c9997d3e3] ipvlan: use netif_receive_skb() in ipvlan_process_multicast()
> git bisect bad d0273dbe8be1640e597552f81faf1d6c9997d3e3
> # bad: [3803065cd6b0630d4161d86aa04e2d1db0f3a0b5] Merge branch 'tun-tap-vhost-net-apply-qdisc-backpressure-on-full-ptr_ring-to-reduce-tx-drops'
> git bisect bad 3803065cd6b0630d4161d86aa04e2d1db0f3a0b5
> # bad: [1d6e569b7d0c0b2736636749e4be0a27f3cefcb3] tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present
> git bisect bad 1d6e569b7d0c0b2736636749e4be0a27f3cefcb3
> # first bad commit: [1d6e569b7d0c0b2736636749e4be0a27f3cefcb3] tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present
>
> --
> Brett Sheffield (he/him)
> Librecast - Decentralising the Internet with Multicast
> https://librecast.net/
> https://blog.brettsheffield.com/
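To put the quoted iperf3 figures side by side, a small sketch that computes the relative change in sender throughput with the commit applied, using the reverted kernel as the baseline. The table holds the sender-side Gbits/sec values copied from the report above; the struct and function names are illustrative only.

```c
#include <stddef.h>
#include <stdio.h>

/* Sender-side throughput in Gbits/sec, copied from the quoted report. */
struct run {
	int threads;
	double with_commit;	/* 7.2.0-rc1 as released */
	double reverted;	/* 7.2.0-rc1 with 1d6e569b7d0c reverted */
};

static const struct run sender_gbps[] = {
	{  1, 17.4, 13.1 },
	{  2, 9.33, 13.7 },
	{  4, 9.33, 14.1 },
	{  8, 9.15, 14.0 },
	{ 16, 9.35, 12.4 },
};

/* Percentage change of observed relative to baseline. */
static double relative_change_pct(double baseline, double observed)
{
	return (observed - baseline) / baseline * 100.0;
}

/* Print one line per thread count; negative means slower with the commit. */
static void print_summary(void)
{
	size_t n = sizeof(sender_gbps) / sizeof(sender_gbps[0]);

	for (size_t i = 0; i < n; i++)
		printf("%2d threads: %+.1f%%\n", sender_gbps[i].threads,
		       relative_change_pct(sender_gbps[i].reverted,
					   sender_gbps[i].with_commit));
}
```

By this measure the single-thread sender is about a third faster with the commit, while the 2 to 16 thread runs land roughly a quarter to a third below the reverted kernel. This only restates the numbers already posted; it says nothing about drops or queue sizing, which are the open questions above.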
* [PATCH net-next 00/11][pull request] Intel Wired LAN Driver Updates 2026-07-01 (igc, igb)
From: Tony Nguyen @ 2026-07-01 21:02 UTC
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev; +Cc: Tony Nguyen, horms
Kohei Enju adds ethtool support for get/set hash key on igc and adds
setting of skb hash type based on values from Rx descriptor on igb.
Takashi Kozu adds ethtool support for get/set hash key on igb.
Faizal adds support for forcing link speed via ethtool when
autonegotiation is disabled on the igc driver.
The following are changes since commit d6e81529749190123aa0040626c7e5dbc20fdc9a:
Merge branch 'net-fib_rules-rtnl-less-rtm_newrule-and-rtm_delrule'
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue 1GbE
Faizal Rahim (4):
igc: remove unused autoneg_failed field
igc: move autoneg-enabled settings into igc_handle_autoneg_enabled()
igc: replace goto out with direct returns in
igc_config_fc_after_link_up()
igc: add support for forcing link speed without autonegotiation
Kohei Enju (4):
igc: prepare for RSS key get/set support
igc: expose RSS key via ethtool get_rxfh
igc: allow configuring RSS key via ethtool set_rxfh
igb: set skb hash type from RSS_TYPE
Takashi Kozu (3):
igb: prepare for RSS key get/set support
igb: expose RSS key via ethtool get_rxfh
igb: allow configuring RSS key via ethtool set_rxfh
drivers/net/ethernet/intel/igb/e1000_82575.h | 21 ++
drivers/net/ethernet/intel/igb/igb.h | 3 +
drivers/net/ethernet/intel/igb/igb_ethtool.c | 85 ++++--
drivers/net/ethernet/intel/igb/igb_main.c | 25 +-
drivers/net/ethernet/intel/igc/igc.h | 3 +
drivers/net/ethernet/intel/igc/igc_base.c | 35 ++-
drivers/net/ethernet/intel/igc/igc_defines.h | 9 +-
drivers/net/ethernet/intel/igc/igc_ethtool.c | 277 +++++++++++++------
drivers/net/ethernet/intel/igc/igc_hw.h | 10 +-
drivers/net/ethernet/intel/igc/igc_mac.c | 35 ++-
drivers/net/ethernet/intel/igc/igc_main.c | 10 +-
drivers/net/ethernet/intel/igc/igc_phy.c | 65 ++++-
drivers/net/ethernet/intel/igc/igc_phy.h | 1 +
13 files changed, 425 insertions(+), 154 deletions(-)
--
2.47.1
* [PATCH net-next 01/11] igc: prepare for RSS key get/set support
From: Tony Nguyen @ 2026-07-01 21:02 UTC
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Kohei Enju, anthony.l.nguyen, dima.ruinskiy, kohei.enju, horms,
Aleksandr Loktionov, Avigail Dahan
In-Reply-To: <20260701210303.1745310-1-anthony.l.nguyen@intel.com>
From: Kohei Enju <kohei@enjuk.jp>
Store the RSS key inside struct igc_adapter and introduce the
igc_write_rss_key() helper function. This allows the driver to program
the RSSRK registers using a persistent RSS key, instead of using a
stack-local buffer in igc_setup_mrqc().
This is a preparation patch for adding RSS key get/set support in
subsequent changes, and no functional change is intended in this patch.
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
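Note for readers: the register packing that igc_write_rss_key() performs can be modelled in userspace as below. This is a sketch, not driver code: load_le32() stands in for the kernel's get_unaligned_le32(), and the regs[] array stands in for the ten RSSRK registers that wr32() would write.

```c
#include <stddef.h>
#include <stdint.h>

#define RSS_KEY_SIZE 40	/* same size as IGC_RSS_KEY_SIZE in the patch */

/* Stand-in for get_unaligned_le32(): assemble 4 bytes, least significant first. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 |
	       (uint32_t)p[3] << 24;
}

/* Split the byte-array key into ten 32-bit register values. */
static void pack_rss_key(const uint8_t key[RSS_KEY_SIZE],
			 uint32_t regs[RSS_KEY_SIZE / 4])
{
	for (size_t i = 0; i < RSS_KEY_SIZE / 4; i++)
		regs[i] = load_le32(&key[i * 4]);
}
```

Reading the key byte by byte keeps the register contents independent of host endianness and of the alignment of the u8 array, which a plain u32 load from the array would not guarantee.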
drivers/net/ethernet/intel/igc/igc.h | 3 +++
drivers/net/ethernet/intel/igc/igc_ethtool.c | 20 ++++++++++++++++++++
drivers/net/ethernet/intel/igc/igc_main.c | 8 ++++----
3 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 46d625b15f44..17f213cc93e4 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -30,6 +30,7 @@ void igc_ethtool_set_ops(struct net_device *);
#define MAX_ETYPE_FILTER 8
#define IGC_RETA_SIZE 128
+#define IGC_RSS_KEY_SIZE 40
/* SDP support */
#define IGC_N_EXTTS 2
@@ -302,6 +303,7 @@ struct igc_adapter {
unsigned int nfc_rule_count;
u8 rss_indir_tbl[IGC_RETA_SIZE];
+ u8 rss_key[IGC_RSS_KEY_SIZE];
unsigned long link_check_timeout;
struct igc_info ei;
@@ -361,6 +363,7 @@ unsigned int igc_get_max_rss_queues(struct igc_adapter *adapter);
void igc_set_flag_queue_pairs(struct igc_adapter *adapter,
const u32 max_rss_queues);
int igc_reinit_queues(struct igc_adapter *adapter);
+void igc_write_rss_key(struct igc_adapter *adapter);
void igc_write_rss_indir_tbl(struct igc_adapter *adapter);
bool igc_has_link(struct igc_adapter *adapter);
void igc_reset(struct igc_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index 0122009bedd0..f01222f12776 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -1460,6 +1460,26 @@ static int igc_ethtool_set_rxnfc(struct net_device *dev,
}
}
+/**
+ * igc_write_rss_key - Program the RSS key into device registers
+ * @adapter: board private structure
+ *
+ * Write the RSS key stored in adapter->rss_key to the IGC_RSSRK registers.
+ * Each 32-bit chunk of the key is read using get_unaligned_le32() and written
+ * to the appropriate register.
+ */
+void igc_write_rss_key(struct igc_adapter *adapter)
+{
+ struct igc_hw *hw = &adapter->hw;
+ u32 val;
+ int i;
+
+ for (i = 0; i < IGC_RSS_KEY_SIZE / 4; i++) {
+ val = get_unaligned_le32(&adapter->rss_key[i * 4]);
+ wr32(IGC_RSSRK(i), val);
+ }
+}
+
void igc_write_rss_indir_tbl(struct igc_adapter *adapter)
{
struct igc_hw *hw = &adapter->hw;
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 2c9e2dfd8499..5ef229a5931f 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -785,11 +785,8 @@ static void igc_setup_mrqc(struct igc_adapter *adapter)
struct igc_hw *hw = &adapter->hw;
u32 j, num_rx_queues;
u32 mrqc, rxcsum;
- u32 rss_key[10];
- netdev_rss_key_fill(rss_key, sizeof(rss_key));
- for (j = 0; j < 10; j++)
- wr32(IGC_RSSRK(j), rss_key[j]);
+ igc_write_rss_key(adapter);
num_rx_queues = adapter->rss_queues;
@@ -5048,6 +5045,9 @@ static int igc_sw_init(struct igc_adapter *adapter)
pci_read_config_word(pdev, PCI_COMMAND, &hw->bus.pci_cmd_word);
+ /* init RSS key */
+ netdev_rss_key_fill(adapter->rss_key, sizeof(adapter->rss_key));
+
/* set default ring sizes */
adapter->tx_ring_count = IGC_DEFAULT_TXD;
adapter->rx_ring_count = IGC_DEFAULT_RXD;
--
2.47.1
* [PATCH net-next 02/11] igc: expose RSS key via ethtool get_rxfh
From: Tony Nguyen @ 2026-07-01 21:02 UTC
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Kohei Enju, anthony.l.nguyen, dima.ruinskiy, kohei.enju, horms,
Avigail Dahan, Aleksandr Loktionov
In-Reply-To: <20260701210303.1745310-1-anthony.l.nguyen@intel.com>
From: Kohei Enju <kohei@enjuk.jp>
Implement igc_ethtool_get_rxfh_key_size() and extend
igc_ethtool_get_rxfh() to return the RSS key to userspace.
This can be tested using `ethtool -x <dev>`.
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/igc/igc_ethtool.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index f01222f12776..0e76ffe7be65 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -1502,6 +1502,11 @@ void igc_write_rss_indir_tbl(struct igc_adapter *adapter)
}
}
+static u32 igc_ethtool_get_rxfh_key_size(struct net_device *netdev)
+{
+ return IGC_RSS_KEY_SIZE;
+}
+
static u32 igc_ethtool_get_rxfh_indir_size(struct net_device *netdev)
{
return IGC_RETA_SIZE;
@@ -1514,10 +1519,13 @@ static int igc_ethtool_get_rxfh(struct net_device *netdev,
int i;
rxfh->hfunc = ETH_RSS_HASH_TOP;
- if (!rxfh->indir)
- return 0;
- for (i = 0; i < IGC_RETA_SIZE; i++)
- rxfh->indir[i] = adapter->rss_indir_tbl[i];
+
+ if (rxfh->indir)
+ for (i = 0; i < IGC_RETA_SIZE; i++)
+ rxfh->indir[i] = adapter->rss_indir_tbl[i];
+
+ if (rxfh->key)
+ memcpy(rxfh->key, adapter->rss_key, sizeof(adapter->rss_key));
return 0;
}
@@ -2195,6 +2203,7 @@ static const struct ethtool_ops igc_ethtool_ops = {
.get_rxnfc = igc_ethtool_get_rxnfc,
.set_rxnfc = igc_ethtool_set_rxnfc,
.get_rx_ring_count = igc_ethtool_get_rx_ring_count,
+ .get_rxfh_key_size = igc_ethtool_get_rxfh_key_size,
.get_rxfh_indir_size = igc_ethtool_get_rxfh_indir_size,
.get_rxfh = igc_ethtool_get_rxfh,
.set_rxfh = igc_ethtool_set_rxfh,
--
2.47.1
* [PATCH net-next 03/11] igc: allow configuring RSS key via ethtool set_rxfh
From: Tony Nguyen @ 2026-07-01 21:02 UTC
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Kohei Enju, anthony.l.nguyen, dima.ruinskiy, kohei.enju, horms,
Avigail Dahan
In-Reply-To: <20260701210303.1745310-1-anthony.l.nguyen@intel.com>
From: Kohei Enju <kohei@enjuk.jp>
Change igc_ethtool_set_rxfh() to accept and save a userspace-provided
RSS key. When a key is provided, store it in the adapter and write the
RSSRK registers accordingly.
This can be tested using `ethtool -X <dev> hkey <key>`.
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
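Note for readers: the control flow that the reworked igc_ethtool_set_rxfh() follows (validate the whole indirection table before committing any of it, then apply the key if one was given) can be sketched as a small userspace model. The struct and function names are hypothetical, the hash-function check is omitted, and the hardware writes are left out.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define RETA_SIZE	128	/* mirrors IGC_RETA_SIZE */
#define RSS_KEY_SIZE	40	/* mirrors IGC_RSS_KEY_SIZE */

/* Hypothetical stand-in for the RSS fields kept in the adapter struct. */
struct rss_state {
	uint8_t indir[RETA_SIZE];
	uint8_t key[RSS_KEY_SIZE];
};

/*
 * Apply an optional indirection table and an optional key.
 * A NULL pointer means "leave this part unchanged".
 * Returns 0 on success or -EINVAL if any table entry names a queue
 * that does not exist; in that case nothing is modified.
 */
static int rss_update(struct rss_state *st, const uint32_t *indir,
		      const uint8_t *key, uint32_t num_queues)
{
	if (indir) {
		/* Verify all entries first so a bad table is rejected whole. */
		for (int i = 0; i < RETA_SIZE; i++)
			if (indir[i] >= num_queues)
				return -EINVAL;

		for (int i = 0; i < RETA_SIZE; i++)
			st->indir[i] = (uint8_t)indir[i];
	}

	if (key)
		memcpy(st->key, key, sizeof(st->key));

	return 0;
}
```

One consequence of this ordering, visible in the model, is that a request carrying both an invalid table and a key fails before the key is touched.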
drivers/net/ethernet/intel/igc/igc_ethtool.c | 30 +++++++++++---------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index 0e76ffe7be65..fbba3e84673a 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -1539,24 +1539,28 @@ static int igc_ethtool_set_rxfh(struct net_device *netdev,
int i;
/* We do not allow change in unsupported parameters */
- if (rxfh->key ||
- (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
- rxfh->hfunc != ETH_RSS_HASH_TOP))
+ if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
+ rxfh->hfunc != ETH_RSS_HASH_TOP)
return -EOPNOTSUPP;
- if (!rxfh->indir)
- return 0;
- num_queues = adapter->rss_queues;
+ if (rxfh->indir) {
+ num_queues = adapter->rss_queues;
- /* Verify user input. */
- for (i = 0; i < IGC_RETA_SIZE; i++)
- if (rxfh->indir[i] >= num_queues)
- return -EINVAL;
+ /* Verify user input. */
+ for (i = 0; i < IGC_RETA_SIZE; i++)
+ if (rxfh->indir[i] >= num_queues)
+ return -EINVAL;
- for (i = 0; i < IGC_RETA_SIZE; i++)
- adapter->rss_indir_tbl[i] = rxfh->indir[i];
+ for (i = 0; i < IGC_RETA_SIZE; i++)
+ adapter->rss_indir_tbl[i] = rxfh->indir[i];
- igc_write_rss_indir_tbl(adapter);
+ igc_write_rss_indir_tbl(adapter);
+ }
+
+ if (rxfh->key) {
+ memcpy(adapter->rss_key, rxfh->key, sizeof(adapter->rss_key));
+ igc_write_rss_key(adapter);
+ }
return 0;
}
--
2.47.1
* [PATCH net-next 04/11] igb: prepare for RSS key get/set support
From: Tony Nguyen @ 2026-07-01 21:02 UTC
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Takashi Kozu, anthony.l.nguyen, horms, enjuk, kohei.enju,
Piotr Kwapulinski, Aleksandr Loktionov, Rinitha S
In-Reply-To: <20260701210303.1745310-1-anthony.l.nguyen@intel.com>
From: Takashi Kozu <takkozu@amazon.com>
Store the RSS key inside struct igb_adapter and introduce the
igb_write_rss_key() helper function. This allows the driver to program
the E1000 registers using a persistent RSS key, instead of using a
stack-local buffer in igb_setup_mrqc().
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Takashi Kozu <takkozu@amazon.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/igb/igb.h | 3 +++
drivers/net/ethernet/intel/igb/igb_ethtool.c | 21 ++++++++++++++++++++
drivers/net/ethernet/intel/igb/igb_main.c | 8 ++++----
3 files changed, 28 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 0fff1df81b7b..8c9b02058cec 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -495,6 +495,7 @@ struct hwmon_buff {
#define IGB_N_PEROUT 2
#define IGB_N_SDP 4
#define IGB_RETA_SIZE 128
+#define IGB_RSS_KEY_SIZE 40
enum igb_filter_match_flags {
IGB_FILTER_FLAG_ETHER_TYPE = 0x1,
@@ -655,6 +656,7 @@ struct igb_adapter {
struct i2c_client *i2c_client;
u32 rss_indir_tbl_init;
u8 rss_indir_tbl[IGB_RETA_SIZE];
+ u8 rss_key[IGB_RSS_KEY_SIZE];
unsigned long link_check_timeout;
int copper_tries;
@@ -735,6 +737,7 @@ void igb_down(struct igb_adapter *);
void igb_reinit_locked(struct igb_adapter *);
void igb_reset(struct igb_adapter *);
int igb_reinit_queues(struct igb_adapter *);
+void igb_write_rss_key(struct igb_adapter *adapter);
void igb_write_rss_indir_tbl(struct igb_adapter *);
int igb_set_spd_dplx(struct igb_adapter *, u32, u8);
int igb_setup_tx_resources(struct igb_ring *);
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index f7938c1da835..9a105e59f432 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -3019,6 +3019,27 @@ static int igb_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
return ret;
}
+/**
+ * igb_write_rss_key - Program the RSS key into device registers
+ * @adapter: board private structure
+ *
+ * Write the RSS key stored in adapter->rss_key to the E1000 hardware registers.
+ * Each 32-bit chunk of the key is read using get_unaligned_le32() and written
+ * to the appropriate register.
+ */
+void igb_write_rss_key(struct igb_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+
+ ASSERT_RTNL();
+
+ for (int i = 0; i < IGB_RSS_KEY_SIZE / 4; i++) {
+ u32 val = get_unaligned_le32(&adapter->rss_key[i * 4]);
+
+ wr32(E1000_RSSRK(i), val);
+ }
+}
+
static int igb_get_eee(struct net_device *netdev, struct ethtool_keee *edata)
{
struct igb_adapter *adapter = netdev_priv(netdev);
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index a1e89a375744..b7d36dd0b8e4 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4048,6 +4048,9 @@ static int igb_sw_init(struct igb_adapter *adapter)
pci_read_config_word(pdev, PCI_COMMAND, &hw->bus.pci_cmd_word);
+ /* init RSS key */
+ netdev_rss_key_fill(adapter->rss_key, sizeof(adapter->rss_key));
+
/* set default ring sizes */
adapter->tx_ring_count = IGB_DEFAULT_TXD;
adapter->rx_ring_count = IGB_DEFAULT_RXD;
@@ -4522,11 +4525,8 @@ static void igb_setup_mrqc(struct igb_adapter *adapter)
struct e1000_hw *hw = &adapter->hw;
u32 mrqc, rxcsum;
u32 j, num_rx_queues;
- u32 rss_key[10];
- netdev_rss_key_fill(rss_key, sizeof(rss_key));
- for (j = 0; j < 10; j++)
- wr32(E1000_RSSRK(j), rss_key[j]);
+ igb_write_rss_key(adapter);
num_rx_queues = adapter->rss_queues;
--
2.47.1
* [PATCH net-next 05/11] igb: expose RSS key via ethtool get_rxfh
From: Tony Nguyen @ 2026-07-01 21:02 UTC
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Takashi Kozu, anthony.l.nguyen, horms, enjuk, kohei.enju,
Aleksandr Loktionov, Rinitha S
In-Reply-To: <20260701210303.1745310-1-anthony.l.nguyen@intel.com>
From: Takashi Kozu <takkozu@amazon.com>
Implement igb_get_rxfh_key_size() and extend
igb_get_rxfh() to return the RSS key to userspace.
This can be tested using `ethtool -x <dev>`.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Takashi Kozu <takkozu@amazon.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/igb/igb_ethtool.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 9a105e59f432..47fc620026a9 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -3297,10 +3297,12 @@ static int igb_get_rxfh(struct net_device *netdev,
int i;
rxfh->hfunc = ETH_RSS_HASH_TOP;
- if (!rxfh->indir)
- return 0;
- for (i = 0; i < IGB_RETA_SIZE; i++)
- rxfh->indir[i] = adapter->rss_indir_tbl[i];
+ if (rxfh->indir)
+ for (i = 0; i < IGB_RETA_SIZE; i++)
+ rxfh->indir[i] = adapter->rss_indir_tbl[i];
+
+ if (rxfh->key)
+ memcpy(rxfh->key, adapter->rss_key, sizeof(adapter->rss_key));
return 0;
}
@@ -3340,6 +3342,11 @@ void igb_write_rss_indir_tbl(struct igb_adapter *adapter)
}
}
+static u32 igb_get_rxfh_key_size(struct net_device *netdev)
+{
+ return IGB_RSS_KEY_SIZE;
+}
+
static int igb_set_rxfh(struct net_device *netdev,
struct ethtool_rxfh_param *rxfh,
struct netlink_ext_ack *extack)
@@ -3504,6 +3511,7 @@ static const struct ethtool_ops igb_ethtool_ops = {
.get_module_eeprom = igb_get_module_eeprom,
.get_rxfh_indir_size = igb_get_rxfh_indir_size,
.get_rxfh = igb_get_rxfh,
+ .get_rxfh_key_size = igb_get_rxfh_key_size,
.set_rxfh = igb_set_rxfh,
.get_rxfh_fields = igb_get_rxfh_fields,
.set_rxfh_fields = igb_set_rxfh_fields,
--
2.47.1
* [PATCH net-next 06/11] igb: allow configuring RSS key via ethtool set_rxfh
From: Tony Nguyen @ 2026-07-01 21:02 UTC
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Takashi Kozu, anthony.l.nguyen, horms, enjuk, kohei.enju,
Kohei Enju, Rinitha S
In-Reply-To: <20260701210303.1745310-1-anthony.l.nguyen@intel.com>
From: Takashi Kozu <takkozu@amazon.com>
Change igb_set_rxfh() to accept and save a userspace-provided
RSS key. When a key is provided, store it in the adapter and write the
E1000 registers accordingly.
This can be tested using `ethtool -X <dev> hkey <key>`.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Takashi Kozu <takkozu@amazon.com>
Tested-by: Kohei Enju <kohei@enjuk.jp>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
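Note for readers: the key matters because the only hash function the driver accepts, ETH_RSS_HASH_TOP, is the Toeplitz hash, whose output depends on every key bit the input window slides across. The following is a generic software sketch of that hash, written from the textbook definition; it is not taken from the driver or from toeplitz.py, and the function name and error handling are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash: for every set bit of the input (most significant bit
 * first), XOR in the 32-bit window of the key that starts at that bit
 * position. Requires key_len >= len + 4 so the window never runs past
 * the end of the key; returns 0 if the key is too short (sketch-level
 * handling, a real caller would report an error).
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
			      const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	uint32_t window;

	if (key_len < len + 4)
		return 0;

	/* Initial window: the first 32 key bits, big-endian. */
	window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
		 (uint32_t)key[2] << 8 | (uint32_t)key[3];

	for (size_t i = 0; i < len; i++) {
		for (int bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;

			/* Slide the window one bit, pulling in the next key bit. */
			window <<= 1;
			if (key[i + 4] & (1u << bit))
				window |= 1;
		}
	}
	return hash;
}
```

With a 40-byte key the window can slide across at most 36 input bytes, which is the size of an IPv6 address pair plus two 16-bit ports.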
drivers/net/ethernet/intel/igb/igb_ethtool.c | 48 +++++++++++---------
1 file changed, 26 insertions(+), 22 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 47fc620026a9..65014a54a6d1 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -3357,35 +3357,39 @@ static int igb_set_rxfh(struct net_device *netdev,
u32 num_queues;
/* We do not allow change in unsupported parameters */
- if (rxfh->key ||
- (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
- rxfh->hfunc != ETH_RSS_HASH_TOP))
+ if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
+ rxfh->hfunc != ETH_RSS_HASH_TOP)
return -EOPNOTSUPP;
- if (!rxfh->indir)
- return 0;
- num_queues = adapter->rss_queues;
+ if (rxfh->indir) {
+ num_queues = adapter->rss_queues;
- switch (hw->mac.type) {
- case e1000_82576:
- /* 82576 supports 2 RSS queues for SR-IOV */
- if (adapter->vfs_allocated_count)
- num_queues = 2;
- break;
- default:
- break;
- }
+ switch (hw->mac.type) {
+ case e1000_82576:
+ /* 82576 supports 2 RSS queues for SR-IOV */
+ if (adapter->vfs_allocated_count)
+ num_queues = 2;
+ break;
+ default:
+ break;
+ }
- /* Verify user input. */
- for (i = 0; i < IGB_RETA_SIZE; i++)
- if (rxfh->indir[i] >= num_queues)
- return -EINVAL;
+ /* Verify user input. */
+ for (i = 0; i < IGB_RETA_SIZE; i++)
+ if (rxfh->indir[i] >= num_queues)
+ return -EINVAL;
- for (i = 0; i < IGB_RETA_SIZE; i++)
- adapter->rss_indir_tbl[i] = rxfh->indir[i];
+ for (i = 0; i < IGB_RETA_SIZE; i++)
+ adapter->rss_indir_tbl[i] = rxfh->indir[i];
+
+ igb_write_rss_indir_tbl(adapter);
+ }
- igb_write_rss_indir_tbl(adapter);
+ if (rxfh->key) {
+ memcpy(adapter->rss_key, rxfh->key, sizeof(adapter->rss_key));
+ igb_write_rss_key(adapter);
+ }
return 0;
}
--
2.47.1
* [PATCH net-next 07/11] igb: set skb hash type from RSS_TYPE
From: Tony Nguyen @ 2026-07-01 21:02 UTC
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Kohei Enju, anthony.l.nguyen, kohei.enju, horms,
Aleksandr Loktionov, Paul Menzel
In-Reply-To: <20260701210303.1745310-1-anthony.l.nguyen@intel.com>
From: Kohei Enju <kohei@enjuk.jp>
igb always marks the RX hash as L3 regardless of RSS_TYPE in the
advanced descriptor, which may indicate L4 (TCP/UDP) hash. This can
trigger unnecessary SW hash recalculation and breaks toeplitz selftests.
Use RSS_TYPE from pkt_info to set the correct PKT_HASH_TYPE_*.
Tested by toeplitz.py with the igb RSS key get/set patches applied as
they are required for toeplitz.py (see Link below).
# ethtool -N $DEV rx-flow-hash udp4 sdfn
# ethtool -N $DEV rx-flow-hash udp6 sdfn
# python toeplitz.py | grep -E "^# Totals"
Without patch:
# Totals: pass:0 fail:12 xfail:0 xpass:0 skip:0 error:0
With patch:
# Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0
Link: https://lore.kernel.org/intel-wired-lan/20260119084511.95287-5-takkozu@amazon.com/
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/igb/e1000_82575.h | 21 ++++++++++++++++++++
drivers/net/ethernet/intel/igb/igb_main.c | 17 ++++++++++++----
2 files changed, 34 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.h b/drivers/net/ethernet/intel/igb/e1000_82575.h
index 63ec253ac788..9e696d55e512 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.h
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.h
@@ -87,6 +87,27 @@ union e1000_adv_rx_desc {
} wb; /* writeback */
};
+#define E1000_RSS_TYPE_NO_HASH 0
+#define E1000_RSS_TYPE_HASH_TCP_IPV4 1
+#define E1000_RSS_TYPE_HASH_IPV4 2
+#define E1000_RSS_TYPE_HASH_TCP_IPV6 3
+#define E1000_RSS_TYPE_HASH_IPV6_EX 4
+#define E1000_RSS_TYPE_HASH_IPV6 5
+#define E1000_RSS_TYPE_HASH_TCP_IPV6_EX 6
+#define E1000_RSS_TYPE_HASH_UDP_IPV4 7
+#define E1000_RSS_TYPE_HASH_UDP_IPV6 8
+#define E1000_RSS_TYPE_HASH_UDP_IPV6_EX 9
+
+#define E1000_RSS_TYPE_MASK GENMASK(3, 0)
+
+#define E1000_RSS_L4_TYPES_MASK \
+ (BIT(E1000_RSS_TYPE_HASH_TCP_IPV4) | \
+ BIT(E1000_RSS_TYPE_HASH_TCP_IPV6) | \
+ BIT(E1000_RSS_TYPE_HASH_TCP_IPV6_EX) | \
+ BIT(E1000_RSS_TYPE_HASH_UDP_IPV4) | \
+ BIT(E1000_RSS_TYPE_HASH_UDP_IPV6) | \
+ BIT(E1000_RSS_TYPE_HASH_UDP_IPV6_EX))
+
#define E1000_RXDADV_HDRBUFLEN_MASK 0x7FE0
#define E1000_RXDADV_HDRBUFLEN_SHIFT 5
#define E1000_RXDADV_STAT_TS 0x10000 /* Pkt was time stamped */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index b7d36dd0b8e4..d4a897a8c82c 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8820,10 +8820,19 @@ static inline void igb_rx_hash(struct igb_ring *ring,
union e1000_adv_rx_desc *rx_desc,
struct sk_buff *skb)
{
- if (ring->netdev->features & NETIF_F_RXHASH)
- skb_set_hash(skb,
- le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
- PKT_HASH_TYPE_L3);
+ u16 rss_type;
+
+ if (!(ring->netdev->features & NETIF_F_RXHASH))
+ return;
+
+ rss_type = le16_to_cpu(rx_desc->wb.lower.lo_dword.pkt_info) &
+ E1000_RSS_TYPE_MASK;
+ if (!rss_type)
+ return;
+
+ skb_set_hash(skb, le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
+ (E1000_RSS_L4_TYPES_MASK & BIT(rss_type)) ?
+ PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
}
/**
--
2.47.1
^ permalink raw reply related
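The classification performed by the patched igb_rx_hash() can be sketched as a standalone C fragment. This is only an illustration, not driver code: the RSS_TYPE values are copied from the patch, while `classify_rss()` and `enum hash_kind` are hypothetical names standing in for the skb_set_hash() call and the PKT_HASH_TYPE_* constants.

```c
#include <stdint.h>

/* RSS_TYPE codes, values as defined by the patch in e1000_82575.h. */
#define RSS_TYPE_NO_HASH          0
#define RSS_TYPE_HASH_TCP_IPV4    1
#define RSS_TYPE_HASH_IPV4        2
#define RSS_TYPE_HASH_TCP_IPV6    3
#define RSS_TYPE_HASH_IPV6_EX     4
#define RSS_TYPE_HASH_IPV6        5
#define RSS_TYPE_HASH_TCP_IPV6_EX 6
#define RSS_TYPE_HASH_UDP_IPV4    7
#define RSS_TYPE_HASH_UDP_IPV6    8
#define RSS_TYPE_HASH_UDP_IPV6_EX 9

#define RSS_TYPE_MASK 0xFu           /* low four bits of pkt_info */
#define BIT_U(n)      (1u << (n))

/* One bit per RSS_TYPE code that denotes a TCP or UDP (L4) hash. */
#define RSS_L4_TYPES_MASK \
	(BIT_U(RSS_TYPE_HASH_TCP_IPV4) | BIT_U(RSS_TYPE_HASH_TCP_IPV6) | \
	 BIT_U(RSS_TYPE_HASH_TCP_IPV6_EX) | BIT_U(RSS_TYPE_HASH_UDP_IPV4) | \
	 BIT_U(RSS_TYPE_HASH_UDP_IPV6) | BIT_U(RSS_TYPE_HASH_UDP_IPV6_EX))

/* Hypothetical stand-ins for "no hash set", PKT_HASH_TYPE_L3 and _L4. */
enum hash_kind { HASH_KIND_NONE, HASH_KIND_L3, HASH_KIND_L4 };

/* pkt_info is assumed to be already converted to CPU byte order. */
static enum hash_kind classify_rss(uint16_t pkt_info)
{
	unsigned int rss_type = pkt_info & RSS_TYPE_MASK;

	if (rss_type == RSS_TYPE_NO_HASH)
		return HASH_KIND_NONE;  /* the patch returns without setting a hash */

	return (RSS_L4_TYPES_MASK & BIT_U(rss_type)) ? HASH_KIND_L4
						     : HASH_KIND_L3;
}
```

Testing one bit in a precomputed mask keeps the per-packet path to a single AND rather than a switch over ten codes.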
* [PATCH net-next 08/11] igc: remove unused autoneg_failed field
From: Tony Nguyen @ 2026-07-01 21:02 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Faizal Rahim, anthony.l.nguyen, khai.wen.tan, khai.wen.tan,
faizal.abdul.rahim, hong.aun.looi, hector.blanco.alcaine,
dima.ruinskiy, Aleksandr Loktionov, Piotr Kwapulinski,
Simon Horman, Moriya Kadosh
In-Reply-To: <20260701210303.1745310-1-anthony.l.nguyen@intel.com>
From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
autoneg_failed in struct igc_mac_info is never set in the igc driver.
Remove the field and the dead code checking it in
igc_config_fc_after_link_up().
The field originates from the e1000/e1000e fiber/serdes forced-link
path, where MAC-level autoneg timeout sets it to signal the flow-control
code to force pause. igc supports only copper, so it never needs to set
this field.
Reviewed-by: Looi Hong Aun <hong.aun.looi@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Signed-off-by: Khai Wen Tan <khai.wen.tan@linux.intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/igc/igc_hw.h | 1 -
drivers/net/ethernet/intel/igc/igc_mac.c | 16 +---------------
2 files changed, 1 insertion(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_hw.h b/drivers/net/ethernet/intel/igc/igc_hw.h
index be8a49a86d09..86ab8f566f44 100644
--- a/drivers/net/ethernet/intel/igc/igc_hw.h
+++ b/drivers/net/ethernet/intel/igc/igc_hw.h
@@ -92,7 +92,6 @@ struct igc_mac_info {
bool asf_firmware_present;
bool arc_subsystem_valid;
- bool autoneg_failed;
bool get_link_status;
};
diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c b/drivers/net/ethernet/intel/igc/igc_mac.c
index 7ac6637f8db7..142beb9ae557 100644
--- a/drivers/net/ethernet/intel/igc/igc_mac.c
+++ b/drivers/net/ethernet/intel/igc/igc_mac.c
@@ -438,28 +438,14 @@ void igc_config_collision_dist(struct igc_hw *hw)
* Checks the status of auto-negotiation after link up to ensure that the
* speed and duplex were not forced. If the link needed to be forced, then
* flow control needs to be forced also. If auto-negotiation is enabled
- * and did not fail, then we configure flow control based on our link
- * partner.
+ * then we configure flow control based on our link partner.
*/
s32 igc_config_fc_after_link_up(struct igc_hw *hw)
{
u16 mii_status_reg, mii_nway_adv_reg, mii_nway_lp_ability_reg;
- struct igc_mac_info *mac = &hw->mac;
u16 speed, duplex;
s32 ret_val = 0;
- /* Check for the case where we have fiber media and auto-neg failed
- * so we had to force link. In this case, we need to force the
- * configuration of the MAC to match the "fc" parameter.
- */
- if (mac->autoneg_failed)
- ret_val = igc_force_mac_fc(hw);
-
- if (ret_val) {
- hw_dbg("Error forcing flow control settings\n");
- goto out;
- }
-
/* In auto-neg, we need to check and see if Auto-Neg has completed,
* and if so, how the PHY and link partner has flow control
* configured.
--
2.47.1
^ permalink raw reply related
* [PATCH net-next 09/11] igc: move autoneg-enabled settings into igc_handle_autoneg_enabled()
From: Tony Nguyen @ 2026-07-01 21:02 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Faizal Rahim, anthony.l.nguyen, khai.wen.tan, khai.wen.tan,
faizal.abdul.rahim, hong.aun.looi, hector.blanco.alcaine,
dima.ruinskiy, Aleksandr Loktionov, Simon Horman, Moriya Kadosh
In-Reply-To: <20260701210303.1745310-1-anthony.l.nguyen@intel.com>
From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Move the advertised link modes and flow control configuration from
igc_ethtool_set_link_ksettings() into igc_handle_autoneg_enabled().
No functional change.
Reviewed-by: Looi Hong Aun <hong.aun.looi@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Signed-off-by: Khai Wen Tan <khai.wen.tan@linux.intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/igc/igc_ethtool.c | 72 ++++++++++++--------
1 file changed, 44 insertions(+), 28 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index fbba3e84673a..7ee84c24dc4e 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -2032,6 +2032,49 @@ static int igc_ethtool_get_link_ksettings(struct net_device *netdev,
return 0;
}
+/**
+ * igc_handle_autoneg_enabled - Configure autonegotiation advertisement
+ * @adapter: private driver structure
+ * @cmd: ethtool link ksettings from user
+ *
+ * Records advertised speeds and flow control settings when autoneg
+ * is enabled.
+ */
+static void igc_handle_autoneg_enabled(struct igc_adapter *adapter,
+ const struct ethtool_link_ksettings *cmd)
+{
+ struct igc_hw *hw = &adapter->hw;
+ u16 advertised = 0;
+
+ if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
+ 2500baseT_Full))
+ advertised |= ADVERTISE_2500_FULL;
+
+ if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
+ 1000baseT_Full))
+ advertised |= ADVERTISE_1000_FULL;
+
+ if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
+ 100baseT_Full))
+ advertised |= ADVERTISE_100_FULL;
+
+ if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
+ 100baseT_Half))
+ advertised |= ADVERTISE_100_HALF;
+
+ if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
+ 10baseT_Full))
+ advertised |= ADVERTISE_10_FULL;
+
+ if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
+ 10baseT_Half))
+ advertised |= ADVERTISE_10_HALF;
+
+ hw->phy.autoneg_advertised = advertised;
+ if (adapter->fc_autoneg)
+ hw->fc.requested_mode = igc_fc_default;
+}
+
static int
igc_ethtool_set_link_ksettings(struct net_device *netdev,
const struct ethtool_link_ksettings *cmd)
@@ -2039,7 +2082,6 @@ igc_ethtool_set_link_ksettings(struct net_device *netdev,
struct igc_adapter *adapter = netdev_priv(netdev);
struct net_device *dev = adapter->netdev;
struct igc_hw *hw = &adapter->hw;
- u16 advertised = 0;
/* When adapter in resetting mode, autoneg/speed/duplex
* cannot be changed
@@ -2064,34 +2106,8 @@ igc_ethtool_set_link_ksettings(struct net_device *netdev,
while (test_and_set_bit(__IGC_RESETTING, &adapter->state))
usleep_range(1000, 2000);
- if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
- 2500baseT_Full))
- advertised |= ADVERTISE_2500_FULL;
-
- if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
- 1000baseT_Full))
- advertised |= ADVERTISE_1000_FULL;
-
- if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
- 100baseT_Full))
- advertised |= ADVERTISE_100_FULL;
-
- if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
- 100baseT_Half))
- advertised |= ADVERTISE_100_HALF;
-
- if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
- 10baseT_Full))
- advertised |= ADVERTISE_10_FULL;
-
- if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
- 10baseT_Half))
- advertised |= ADVERTISE_10_HALF;
-
if (cmd->base.autoneg == AUTONEG_ENABLE) {
- hw->phy.autoneg_advertised = advertised;
- if (adapter->fc_autoneg)
- hw->fc.requested_mode = igc_fc_default;
+ igc_handle_autoneg_enabled(adapter, cmd);
} else {
netdev_info(dev, "Force mode currently not supported\n");
}
--
2.47.1
^ permalink raw reply related
* [PATCH net-next 10/11] igc: replace goto out with direct returns in igc_config_fc_after_link_up()
From: Tony Nguyen @ 2026-07-01 21:02 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Faizal Rahim, anthony.l.nguyen, khai.wen.tan, khai.wen.tan,
faizal.abdul.rahim, hong.aun.looi, hector.blanco.alcaine,
dima.ruinskiy, Simon Horman, Moriya Kadosh
In-Reply-To: <20260701210303.1745310-1-anthony.l.nguyen@intel.com>
From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
The out: label only returns ret_val with no cleanup. The kernel coding
style guide states: "If there is no cleanup needed then just return
directly." (Documentation/process/coding-style.rst, section 7).
This improves readability ahead of a subsequent patch that introduces a
new goto label in this function.
No functional change.
Reviewed-by: Looi Hong Aun <hong.aun.looi@intel.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Signed-off-by: Khai Wen Tan <khai.wen.tan@linux.intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/igc/igc_mac.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c b/drivers/net/ethernet/intel/igc/igc_mac.c
index 142beb9ae557..0a3d3f357505 100644
--- a/drivers/net/ethernet/intel/igc/igc_mac.c
+++ b/drivers/net/ethernet/intel/igc/igc_mac.c
@@ -458,15 +458,15 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS,
&mii_status_reg);
if (ret_val)
- goto out;
+ return ret_val;
ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS,
&mii_status_reg);
if (ret_val)
- goto out;
+ return ret_val;
if (!(mii_status_reg & MII_SR_AUTONEG_COMPLETE)) {
hw_dbg("Copper PHY and Auto Neg has not completed.\n");
- goto out;
+ return ret_val;
}
/* The AutoNeg process has completed, so we now need to
@@ -478,11 +478,11 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
ret_val = hw->phy.ops.read_reg(hw, PHY_AUTONEG_ADV,
&mii_nway_adv_reg);
if (ret_val)
- goto out;
+ return ret_val;
ret_val = hw->phy.ops.read_reg(hw, PHY_LP_ABILITY,
&mii_nway_lp_ability_reg);
if (ret_val)
- goto out;
+ return ret_val;
/* Two bits in the Auto Negotiation Advertisement Register
* (Address 4) and two bits in the Auto Negotiation Base
* Page Ability Register (Address 5) determine flow control
@@ -598,7 +598,7 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
ret_val = hw->mac.ops.get_speed_and_duplex(hw, &speed, &duplex);
if (ret_val) {
hw_dbg("Error getting link speed and duplex\n");
- goto out;
+ return ret_val;
}
if (duplex == HALF_DUPLEX)
@@ -610,10 +610,9 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
ret_val = igc_force_mac_fc(hw);
if (ret_val) {
hw_dbg("Error forcing flow control settings\n");
- goto out;
+ return ret_val;
}
-out:
return ret_val;
}
--
2.47.1
^ permalink raw reply related
* [PATCH net-next 11/11] igc: add support for forcing link speed without autonegotiation
From: Tony Nguyen @ 2026-07-01 21:03 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Faizal Rahim, anthony.l.nguyen, khai.wen.tan, khai.wen.tan,
faizal.abdul.rahim, hong.aun.looi, hector.blanco.alcaine,
dima.ruinskiy, Simon Horman, Moriya Kadosh
In-Reply-To: <20260701210303.1745310-1-anthony.l.nguyen@intel.com>
From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Allow users to force 10/100 Mb/s link speed and duplex via ethtool
when autonegotiation is disabled. Previously, the driver rejected
these requests with "Force mode currently not supported".
Forcing at 1000 Mb/s and 2500 Mb/s is not supported.
Reviewed-by: Looi Hong Aun <hong.aun.looi@intel.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Signed-off-by: Khai Wen Tan <khai.wen.tan@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/igc/igc_base.c | 35 ++++-
drivers/net/ethernet/intel/igc/igc_defines.h | 9 +-
drivers/net/ethernet/intel/igc/igc_ethtool.c | 138 ++++++++++++++-----
drivers/net/ethernet/intel/igc/igc_hw.h | 9 ++
drivers/net/ethernet/intel/igc/igc_mac.c | 12 ++
drivers/net/ethernet/intel/igc/igc_main.c | 2 +-
drivers/net/ethernet/intel/igc/igc_phy.c | 65 ++++++++-
drivers/net/ethernet/intel/igc/igc_phy.h | 1 +
8 files changed, 220 insertions(+), 51 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_base.c b/drivers/net/ethernet/intel/igc/igc_base.c
index 1613b562d17c..ab9120a3127f 100644
--- a/drivers/net/ethernet/intel/igc/igc_base.c
+++ b/drivers/net/ethernet/intel/igc/igc_base.c
@@ -114,11 +114,35 @@ static s32 igc_setup_copper_link_base(struct igc_hw *hw)
u32 ctrl;
ctrl = rd32(IGC_CTRL);
- ctrl |= IGC_CTRL_SLU;
- ctrl &= ~(IGC_CTRL_FRCSPD | IGC_CTRL_FRCDPX);
- wr32(IGC_CTRL, ctrl);
-
- ret_val = igc_setup_copper_link(hw);
+ ctrl &= ~(IGC_CTRL_FRCSPD | IGC_CTRL_FRCDPX |
+ IGC_CTRL_SPEED_MASK | IGC_CTRL_FD);
+
+ if (hw->mac.autoneg_enabled) {
+ ctrl |= IGC_CTRL_SLU;
+ wr32(IGC_CTRL, ctrl);
+ ret_val = igc_setup_copper_link(hw);
+ } else {
+ ctrl |= IGC_CTRL_SLU | IGC_CTRL_FRCSPD | IGC_CTRL_FRCDPX;
+
+ switch (hw->mac.forced_speed_duplex) {
+ case IGC_FORCED_10H:
+ ctrl |= IGC_CTRL_SPEED_10;
+ break;
+ case IGC_FORCED_10F:
+ ctrl |= IGC_CTRL_SPEED_10 | IGC_CTRL_FD;
+ break;
+ case IGC_FORCED_100H:
+ ctrl |= IGC_CTRL_SPEED_100;
+ break;
+ case IGC_FORCED_100F:
+ ctrl |= IGC_CTRL_SPEED_100 | IGC_CTRL_FD;
+ break;
+ default:
+ return -IGC_ERR_CONFIG;
+ }
+ wr32(IGC_CTRL, ctrl);
+ ret_val = igc_setup_copper_link(hw);
+ }
return ret_val;
}
@@ -443,6 +467,7 @@ static const struct igc_phy_operations igc_phy_ops_base = {
.reset = igc_phy_hw_reset,
.read_reg = igc_read_phy_reg_gpy,
.write_reg = igc_write_phy_reg_gpy,
+ .force_speed_duplex = igc_force_speed_duplex,
};
const struct igc_info igc_base_info = {
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index 9482ab11f050..3f504751c2d9 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -129,10 +129,13 @@
#define IGC_ERR_SWFW_SYNC 13
/* Device Control */
+#define IGC_CTRL_FD BIT(0) /* Full Duplex */
#define IGC_CTRL_RST 0x04000000 /* Global reset */
-
#define IGC_CTRL_PHY_RST 0x80000000 /* PHY Reset */
#define IGC_CTRL_SLU 0x00000040 /* Set link up (Force Link) */
+#define IGC_CTRL_SPEED_MASK GENMASK(10, 8)
+#define IGC_CTRL_SPEED_10 FIELD_PREP(IGC_CTRL_SPEED_MASK, 0)
+#define IGC_CTRL_SPEED_100 FIELD_PREP(IGC_CTRL_SPEED_MASK, 1)
#define IGC_CTRL_FRCSPD 0x00000800 /* Force Speed */
#define IGC_CTRL_FRCDPX 0x00001000 /* Force Duplex */
#define IGC_CTRL_VME 0x40000000 /* IEEE VLAN mode enable */
@@ -673,6 +676,10 @@
#define IGC_GEN_POLL_TIMEOUT 1920
/* PHY Control Register */
+#define MII_CR_SPEED_MASK (BIT(6) | BIT(13))
+#define MII_CR_SPEED_10 0x0000 /* SSM=0, SSL=0: 10 Mb/s */
+#define MII_CR_SPEED_100 BIT(13) /* SSM=0, SSL=1: 100 Mb/s */
+#define MII_CR_DUPLEX_EN BIT(8) /* 0 = Half Duplex, 1 = Full Duplex */
#define MII_CR_RESTART_AUTO_NEG 0x0200 /* Restart auto negotiation */
#define MII_CR_POWER_DOWN 0x0800 /* Power down */
#define MII_CR_AUTO_NEG_EN 0x1000 /* Auto Neg Enable */
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index 7ee84c24dc4e..89fe2788a565 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -1946,44 +1946,58 @@ static int igc_ethtool_get_link_ksettings(struct net_device *netdev,
ethtool_link_ksettings_add_link_mode(cmd, supported, TP);
ethtool_link_ksettings_add_link_mode(cmd, advertising, TP);
- /* advertising link modes */
- if (hw->phy.autoneg_advertised & ADVERTISE_10_HALF)
- ethtool_link_ksettings_add_link_mode(cmd, advertising, 10baseT_Half);
- if (hw->phy.autoneg_advertised & ADVERTISE_10_FULL)
- ethtool_link_ksettings_add_link_mode(cmd, advertising, 10baseT_Full);
- if (hw->phy.autoneg_advertised & ADVERTISE_100_HALF)
- ethtool_link_ksettings_add_link_mode(cmd, advertising, 100baseT_Half);
- if (hw->phy.autoneg_advertised & ADVERTISE_100_FULL)
- ethtool_link_ksettings_add_link_mode(cmd, advertising, 100baseT_Full);
- if (hw->phy.autoneg_advertised & ADVERTISE_1000_FULL)
- ethtool_link_ksettings_add_link_mode(cmd, advertising, 1000baseT_Full);
- if (hw->phy.autoneg_advertised & ADVERTISE_2500_FULL)
- ethtool_link_ksettings_add_link_mode(cmd, advertising, 2500baseT_Full);
-
/* set autoneg settings */
ethtool_link_ksettings_add_link_mode(cmd, supported, Autoneg);
- ethtool_link_ksettings_add_link_mode(cmd, advertising, Autoneg);
+ if (hw->mac.autoneg_enabled) {
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, Autoneg);
+ cmd->base.autoneg = AUTONEG_ENABLE;
+
+ /* advertising link modes only apply when autoneg is on */
+ if (hw->phy.autoneg_advertised & ADVERTISE_10_HALF)
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ 10baseT_Half);
+ if (hw->phy.autoneg_advertised & ADVERTISE_10_FULL)
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ 10baseT_Full);
+ if (hw->phy.autoneg_advertised & ADVERTISE_100_HALF)
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ 100baseT_Half);
+ if (hw->phy.autoneg_advertised & ADVERTISE_100_FULL)
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ 100baseT_Full);
+ if (hw->phy.autoneg_advertised & ADVERTISE_1000_FULL)
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ 1000baseT_Full);
+ if (hw->phy.autoneg_advertised & ADVERTISE_2500_FULL)
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ 2500baseT_Full);
+
+ /* Set pause flow control advertising */
+ switch (hw->fc.requested_mode) {
+ case igc_fc_full:
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ Pause);
+ break;
+ case igc_fc_rx_pause:
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ Pause);
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ Asym_Pause);
+ break;
+ case igc_fc_tx_pause:
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ Asym_Pause);
+ break;
+ default:
+ break;
+ }
+ } else {
+ cmd->base.autoneg = AUTONEG_DISABLE;
+ }
- /* Set pause flow control settings */
+ /* Pause is always supported */
ethtool_link_ksettings_add_link_mode(cmd, supported, Pause);
- switch (hw->fc.requested_mode) {
- case igc_fc_full:
- ethtool_link_ksettings_add_link_mode(cmd, advertising, Pause);
- break;
- case igc_fc_rx_pause:
- ethtool_link_ksettings_add_link_mode(cmd, advertising, Pause);
- ethtool_link_ksettings_add_link_mode(cmd, advertising,
- Asym_Pause);
- break;
- case igc_fc_tx_pause:
- ethtool_link_ksettings_add_link_mode(cmd, advertising,
- Asym_Pause);
- break;
- default:
- break;
- }
-
status = pm_runtime_suspended(&adapter->pdev->dev) ?
0 : rd32(IGC_STATUS);
@@ -2015,7 +2029,6 @@ static int igc_ethtool_get_link_ksettings(struct net_device *netdev,
cmd->base.duplex = DUPLEX_UNKNOWN;
}
cmd->base.speed = speed;
- cmd->base.autoneg = AUTONEG_ENABLE;
/* MDI-X => 2; MDI =>1; Invalid =>0 */
if (hw->phy.media_type == igc_media_type_copper)
@@ -2032,6 +2045,37 @@ static int igc_ethtool_get_link_ksettings(struct net_device *netdev,
return 0;
}
+/**
+ * igc_handle_autoneg_disabled - Configure forced speed/duplex settings
+ * @adapter: private driver structure
+ * @speed: requested speed (must be SPEED_10 or SPEED_100)
+ * @duplex: requested duplex
+ *
+ * Records forced speed/duplex when autoneg is disabled.
+ * Caller must validate speed before calling this function.
+ */
+static void igc_handle_autoneg_disabled(struct igc_adapter *adapter, u32 speed,
+ u8 duplex)
+{
+ struct igc_mac_info *mac = &adapter->hw.mac;
+
+ switch (speed) {
+ case SPEED_10:
+ mac->forced_speed_duplex = (duplex == DUPLEX_FULL) ?
+ IGC_FORCED_10F : IGC_FORCED_10H;
+ break;
+ case SPEED_100:
+ mac->forced_speed_duplex = (duplex == DUPLEX_FULL) ?
+ IGC_FORCED_100F : IGC_FORCED_100H;
+ break;
+ default:
+ WARN_ONCE(1, "Unsupported speed %u\n", speed);
+ return;
+ }
+
+ mac->autoneg_enabled = false;
+}
+
/**
* igc_handle_autoneg_enabled - Configure autonegotiation advertisement
* @adapter: private driver structure
@@ -2070,6 +2114,7 @@ static void igc_handle_autoneg_enabled(struct igc_adapter *adapter,
10baseT_Half))
advertised |= ADVERTISE_10_HALF;
+ hw->mac.autoneg_enabled = true;
hw->phy.autoneg_advertised = advertised;
if (adapter->fc_autoneg)
hw->fc.requested_mode = igc_fc_default;
@@ -2091,6 +2136,12 @@ igc_ethtool_set_link_ksettings(struct net_device *netdev,
return -EINVAL;
}
+ if (cmd->base.autoneg != AUTONEG_ENABLE &&
+ cmd->base.autoneg != AUTONEG_DISABLE) {
+ netdev_info(dev, "Unsupported autoneg setting\n");
+ return -EINVAL;
+ }
+
/* MDI setting is only allowed when autoneg enabled because
* some hardware doesn't allow MDI setting when speed or
* duplex is forced.
@@ -2103,14 +2154,25 @@ igc_ethtool_set_link_ksettings(struct net_device *netdev,
}
}
+ if (cmd->base.autoneg == AUTONEG_DISABLE) {
+ if (cmd->base.speed != SPEED_10 && cmd->base.speed != SPEED_100) {
+ netdev_info(dev, "Unsupported speed for forced link\n");
+ return -EINVAL;
+ }
+ if (cmd->base.duplex != DUPLEX_HALF && cmd->base.duplex != DUPLEX_FULL) {
+ netdev_info(dev, "Duplex must be half or full for forced link\n");
+ return -EINVAL;
+ }
+ }
+
while (test_and_set_bit(__IGC_RESETTING, &adapter->state))
usleep_range(1000, 2000);
- if (cmd->base.autoneg == AUTONEG_ENABLE) {
+ if (cmd->base.autoneg == AUTONEG_ENABLE)
igc_handle_autoneg_enabled(adapter, cmd);
- } else {
- netdev_info(dev, "Force mode currently not supported\n");
- }
+ else
+ igc_handle_autoneg_disabled(adapter, cmd->base.speed,
+ cmd->base.duplex);
/* MDI-X => 2; MDI => 1; Auto => 3 */
if (cmd->base.eth_tp_mdix_ctrl) {
diff --git a/drivers/net/ethernet/intel/igc/igc_hw.h b/drivers/net/ethernet/intel/igc/igc_hw.h
index 86ab8f566f44..62aaee55668a 100644
--- a/drivers/net/ethernet/intel/igc/igc_hw.h
+++ b/drivers/net/ethernet/intel/igc/igc_hw.h
@@ -73,6 +73,13 @@ struct igc_info {
extern const struct igc_info igc_base_info;
+enum igc_forced_speed_duplex {
+ IGC_FORCED_10H,
+ IGC_FORCED_10F,
+ IGC_FORCED_100H,
+ IGC_FORCED_100F,
+};
+
struct igc_mac_info {
struct igc_mac_operations ops;
@@ -93,6 +100,8 @@ struct igc_mac_info {
bool arc_subsystem_valid;
bool get_link_status;
+ bool autoneg_enabled;
+ enum igc_forced_speed_duplex forced_speed_duplex;
};
struct igc_nvm_operations {
diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c b/drivers/net/ethernet/intel/igc/igc_mac.c
index 0a3d3f357505..d6f3f6618469 100644
--- a/drivers/net/ethernet/intel/igc/igc_mac.c
+++ b/drivers/net/ethernet/intel/igc/igc_mac.c
@@ -446,6 +446,17 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
u16 speed, duplex;
s32 ret_val = 0;
+ /* Without autoneg, flow control capability is not exchanged with the
+ * link partner. IEEE 802.3 prohibits flow control in half-duplex mode.
+ */
+ if (!hw->mac.autoneg_enabled) {
+ if (hw->mac.forced_speed_duplex == IGC_FORCED_10H ||
+ hw->mac.forced_speed_duplex == IGC_FORCED_100H)
+ hw->fc.current_mode = igc_fc_none;
+
+ goto force_fc;
+ }
+
/* In auto-neg, we need to check and see if Auto-Neg has completed,
* and if so, how the PHY and link partner has flow control
* configured.
@@ -607,6 +618,7 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
/* Now we call a subroutine to actually force the MAC
* controller to use the correct flow control settings.
*/
+force_fc:
ret_val = igc_force_mac_fc(hw);
if (ret_val) {
hw_dbg("Error forcing flow control settings\n");
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 5ef229a5931f..e6e9441fc3d4 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -7298,7 +7298,7 @@ static int igc_probe(struct pci_dev *pdev,
/* Initialize link properties that are user-changeable */
adapter->fc_autoneg = true;
hw->phy.autoneg_advertised = 0xaf;
-
+ hw->mac.autoneg_enabled = true;
hw->fc.requested_mode = igc_fc_default;
hw->fc.current_mode = igc_fc_default;
diff --git a/drivers/net/ethernet/intel/igc/igc_phy.c b/drivers/net/ethernet/intel/igc/igc_phy.c
index 6c4d204aecfa..4cf737fb3b21 100644
--- a/drivers/net/ethernet/intel/igc/igc_phy.c
+++ b/drivers/net/ethernet/intel/igc/igc_phy.c
@@ -494,12 +494,20 @@ s32 igc_setup_copper_link(struct igc_hw *hw)
s32 ret_val = 0;
bool link;
- /* Setup autoneg and flow control advertisement and perform
- * autonegotiation.
- */
- ret_val = igc_copper_link_autoneg(hw);
- if (ret_val)
- goto out;
+ if (hw->mac.autoneg_enabled) {
+ /* Setup autoneg and flow control advertisement and perform
+ * autonegotiation.
+ */
+ ret_val = igc_copper_link_autoneg(hw);
+ if (ret_val)
+ goto out;
+ } else {
+ ret_val = hw->phy.ops.force_speed_duplex(hw);
+ if (ret_val) {
+ hw_dbg("Error Forcing Speed/Duplex\n");
+ goto out;
+ }
+ }
/* Check link status. Wait up to 100 microseconds for link to become
* valid.
@@ -778,3 +786,48 @@ u16 igc_read_phy_fw_version(struct igc_hw *hw)
return gphy_version;
}
+
+/**
+ * igc_force_speed_duplex - Force PHY speed and duplex settings
+ * @hw: pointer to the HW structure
+ *
+ * Programs the GPY PHY control register to disable autonegotiation
+ * and force the speed/duplex indicated by hw->mac.forced_speed_duplex.
+ */
+s32 igc_force_speed_duplex(struct igc_hw *hw)
+{
+ struct igc_phy_info *phy = &hw->phy;
+ u16 phy_ctrl;
+ s32 ret_val;
+
+ ret_val = phy->ops.read_reg(hw, PHY_CONTROL, &phy_ctrl);
+ if (ret_val)
+ return ret_val;
+
+ phy_ctrl &= ~(MII_CR_SPEED_MASK | MII_CR_DUPLEX_EN |
+ MII_CR_AUTO_NEG_EN | MII_CR_RESTART_AUTO_NEG);
+
+ switch (hw->mac.forced_speed_duplex) {
+ case IGC_FORCED_10H:
+ phy_ctrl |= MII_CR_SPEED_10;
+ break;
+ case IGC_FORCED_10F:
+ phy_ctrl |= MII_CR_SPEED_10 | MII_CR_DUPLEX_EN;
+ break;
+ case IGC_FORCED_100H:
+ phy_ctrl |= MII_CR_SPEED_100;
+ break;
+ case IGC_FORCED_100F:
+ phy_ctrl |= MII_CR_SPEED_100 | MII_CR_DUPLEX_EN;
+ break;
+ default:
+ return -IGC_ERR_CONFIG;
+ }
+
+ ret_val = phy->ops.write_reg(hw, PHY_CONTROL, phy_ctrl);
+ if (ret_val)
+ return ret_val;
+
+ hw->mac.get_link_status = true;
+ return 0;
+}
diff --git a/drivers/net/ethernet/intel/igc/igc_phy.h b/drivers/net/ethernet/intel/igc/igc_phy.h
index 832a7e359f18..d37a89174826 100644
--- a/drivers/net/ethernet/intel/igc/igc_phy.h
+++ b/drivers/net/ethernet/intel/igc/igc_phy.h
@@ -18,5 +18,6 @@ void igc_power_down_phy_copper(struct igc_hw *hw);
s32 igc_write_phy_reg_gpy(struct igc_hw *hw, u32 offset, u16 data);
s32 igc_read_phy_reg_gpy(struct igc_hw *hw, u32 offset, u16 *data);
u16 igc_read_phy_fw_version(struct igc_hw *hw);
+s32 igc_force_speed_duplex(struct igc_hw *hw);
#endif
--
2.47.1
^ permalink raw reply related
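As a rough illustration of the register values the forced-link path ends up writing, the bit composition in igc_setup_copper_link_base() and igc_force_speed_duplex() can be reproduced outside the kernel. The bit positions below are taken from the igc_defines.h hunks in the patch; the helper names `forced_ctrl()` and `forced_mii_cr()` and the `enum forced_mode` type are hypothetical and exist only for this sketch.

```c
#include <stdint.h>

/* Device Control (CTRL) bits, values as in the patch. */
#define CTRL_FD         0x00000001u  /* Full Duplex */
#define CTRL_SLU        0x00000040u  /* Set link up */
#define CTRL_SPEED_MASK 0x00000700u  /* GENMASK(10, 8) */
#define CTRL_SPEED_10   0x00000000u  /* field value 0 */
#define CTRL_SPEED_100  0x00000100u  /* field value 1 */
#define CTRL_FRCSPD     0x00000800u  /* Force Speed */
#define CTRL_FRCDPX     0x00001000u  /* Force Duplex */

/* PHY Control Register bits, values as in the patch. */
#define MII_CR_SPEED_MASK       0x2040u  /* BIT(6) | BIT(13) */
#define MII_CR_SPEED_10         0x0000u
#define MII_CR_SPEED_100        0x2000u  /* BIT(13) */
#define MII_CR_DUPLEX_EN        0x0100u  /* BIT(8) */
#define MII_CR_RESTART_AUTO_NEG 0x0200u
#define MII_CR_AUTO_NEG_EN      0x1000u

/* Hypothetical mirror of enum igc_forced_speed_duplex. */
enum forced_mode { FORCED_10H, FORCED_10F, FORCED_100H, FORCED_100F };

/* Compute the CTRL value for a forced link; returns 0 on success, -1 on a bad mode. */
static int forced_ctrl(uint32_t ctrl, enum forced_mode mode, uint32_t *out)
{
	ctrl &= ~(CTRL_FRCSPD | CTRL_FRCDPX | CTRL_SPEED_MASK | CTRL_FD);
	ctrl |= CTRL_SLU | CTRL_FRCSPD | CTRL_FRCDPX;

	switch (mode) {
	case FORCED_10H:  ctrl |= CTRL_SPEED_10; break;
	case FORCED_10F:  ctrl |= CTRL_SPEED_10 | CTRL_FD; break;
	case FORCED_100H: ctrl |= CTRL_SPEED_100; break;
	case FORCED_100F: ctrl |= CTRL_SPEED_100 | CTRL_FD; break;
	default: return -1;
	}
	*out = ctrl;
	return 0;
}

/* Compute the PHY control value with autoneg off and speed/duplex forced. */
static int forced_mii_cr(uint16_t cr, enum forced_mode mode, uint16_t *out)
{
	cr &= (uint16_t)~(MII_CR_SPEED_MASK | MII_CR_DUPLEX_EN |
			  MII_CR_AUTO_NEG_EN | MII_CR_RESTART_AUTO_NEG);

	switch (mode) {
	case FORCED_10H:  cr |= MII_CR_SPEED_10; break;
	case FORCED_10F:  cr |= MII_CR_SPEED_10 | MII_CR_DUPLEX_EN; break;
	case FORCED_100H: cr |= MII_CR_SPEED_100; break;
	case FORCED_100F: cr |= MII_CR_SPEED_100 | MII_CR_DUPLEX_EN; break;
	default: return -1;
	}
	*out = cr;
	return 0;
}
```

For example, starting from a zeroed CTRL register, 100 Mb/s full duplex yields 0x1941 (SLU, FRCSPD, FRCDPX, speed field 1, FD), and the matching PHY control value is 0x2100.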
* Re: RTL8159 firmware
From: Aleksander Jan Bajkowski @ 2026-07-01 21:06 UTC (permalink / raw)
To: Birger Koblitz, Andrew Lunn
Cc: Jan Hendrik Farr, andrew+netdev, davem, edumazet, hsu.chih.kai,
kuba, linux-kernel, linux-usb, netdev, pabeni
In-Reply-To: <6677d82e-2b8c-4173-ba6d-6743a5059bc1@birger-koblitz.de>
Hi Birger,
Realtek recently released firmware for the RTL8261C[1]. I expect
the RTL8159 has an RTL8261x PHY built in. Do you know whether
the RTL8159 firmware consists only of the PHY firmware?
1.
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/rtl_nic/rtl8261c.bin
Best regards,
Aleksander
^ permalink raw reply
* Re: Ethtool : PRBS feature
From: Srinivasan, Vijay @ 2026-07-01 21:38 UTC (permalink / raw)
To: Andrew Lunn, Das, Shubham
Cc: Alexander Duyck, Lee Trager, Maxime Chevallier,
netdev@vger.kernel.org, mkubecek@suse.cz, D H, Siddaraju,
Chintalapalle, Balaji, Lindberg, Magnus,
niklas.damberg@ericsson.com, Wirandi, Jonas
In-Reply-To: <BL3PR11MB63854B0A4AA33A718D474C6588F62@BL3PR11MB6385.namprd11.prod.outlook.com>
[-- Attachment #1.1: Type: text/plain, Size: 6186 bytes --]
Hi Andrew,
I think there is a disconnect here. Please see the attached diagram indicating the error injection location.
Here, we are referring to error injection at the "bit" level, not the "frame" level.
Bit errors injected at the PMA/PMD boundary do not distinguish between data (frames) and test patterns (PRBS).
Many SerDes IPs, if not all, include error injection as part of the test pattern block (BIST in the diagram).
Some IPs may have error injection outside of BIST in the common data path (as shown in the diagram), in which case bit errors may be injected in both data mode (frame/traffic) and test pattern mode (BIST).
Regardless, we are seeking to have error injection capability at the PMA/PMD made available through user/driver space (ethtool).
Notes:
1.
Error injection can be "one-shot" (single error) or continuous (fixed rate, say 1 bit error every 10**6 bits (1E-6)).
2.
Error injection availability is IP/API dependent. One-shot mode is highly likely to be available in all SerDes IP's.
3.
Error injection location and implementation is IP dependent.
4.
If IP/API supported:
*
One-shot error injected in test pattern (PRBS) mode captured as individual/single bit error at the far-end checker.
*
One-shot error injected in data (traffic/frame) mode captured as:
*
Individual/single corrected codeword error if FEC is used for the link
*
Individual/single CRC: (a) if FEC is not used for the link and (b) if bit error injected corrupts any of the bits used to compute CRC.
*
Effect of errors injected at fixed rate is a corollary to one-shot with additional :
*
Errors injected at higher rates (> 1E-4) may result in uncorrected FEC codewords or loss of link
*
Presence of uncorrected FEC codewords will lead to MAC CRCs
What is requested:
1. One-shot bit error injection (required/preferred), fixed rate is optional
How to inject error:
1. Test pattern mode:
   * Generate any PRBS pattern
   * Configure error checker on receive side (same device in loopback mode or far-end device) - measure Bit Error Ratio (BER) without error injection
   * Inject error and verify BER>0 (if BER==0 without injection)
2. Data (Traffic/Frame) mode:
   * Configure and establish link (same device in loopback mode or far-end device)
   * Measure MAC CRC and/or FEC corrected/uncorrected codeword counts
   * Inject error and verify MAC CRC or FEC corrected counts match injected error count
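As a rough illustration of the test pattern mode steps above, the following userspace C sketch models a PRBS7 stream (polynomial x^7 + x^6 + 1), a locked far-end checker, and one-shot bit flips. It is not driver or ethtool code: prbs7_next(), prbs7_count_errors() and the flip[] position list are hypothetical names, and the checker is assumed to run a reference generator with the same seed, so each injected bit counts as exactly one error.

```c
#include <stddef.h>
#include <stdint.h>

/* Advance a PRBS7 LFSR (x^7 + x^6 + 1) by one step and return the new bit. */
static unsigned int prbs7_next(uint8_t *state)
{
	unsigned int bit = ((*state >> 6) ^ (*state >> 5)) & 1u;

	*state = (uint8_t)(((*state << 1) | bit) & 0x7f);
	return bit;
}

/*
 * Send nbits of PRBS7 over a modelled link that flips the bits at the
 * positions listed in flip[] (sorted ascending, no duplicates), and
 * return how many mismatches the far-end checker counts.
 */
static size_t prbs7_count_errors(size_t nbits, const size_t *flip, size_t nflip)
{
	uint8_t tx = 0x7f, rx = 0x7f;	/* same non-zero seed: checker is locked */
	size_t errors = 0, next = 0;

	for (size_t i = 0; i < nbits; i++) {
		unsigned int bit = prbs7_next(&tx);

		if (next < nflip && flip[next] == i) {
			bit ^= 1u;	/* one-shot bit error injected on the wire */
			next++;
		}
		if (bit != prbs7_next(&rx))
			errors++;
	}
	return errors;
}
```

In this model, a run with no flips should report 0 errors, and a run with N distinct flip positions inside the stream should report N, which is the check described in the last step of the test pattern mode.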
Vijay
________________________________
From: Andrew Lunn <andrew@lunn.ch>
Sent: Wednesday, July 1, 2026 10:32 AM
To: Das, Shubham <shubham.das@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>; Lee Trager <lee@trager.us>; Maxime Chevallier <maxime.chevallier@bootlin.com>; netdev@vger.kernel.org <netdev@vger.kernel.org>; mkubecek@suse.cz <mkubecek@suse.cz>; D H, Siddaraju <siddaraju.dh@intel.com>; Chintalapalle, Balaji <balaji.chintalapalle@intel.com>; Lindberg, Magnus <magnus.k.lindberg@ericsson.com>; niklas.damberg@ericsson.com <niklas.damberg@ericsson.com>; Wirandi, Jonas <jonas.wirandi@ericsson.com>; Srinivasan, Vijay <vijay.srinivasan@intel.com>
Subject: Re: Ethtool : PRBS feature
On Wed, Jul 01, 2026 at 05:10:43PM +0000, Das, Shubham wrote:
> > Sorry, but i could not implement that, in a sensible way, given its current
> > specification.
> >
> > I suppose i could simply flip the first `inject-error-count` bits, and make the rest of
> > the stream perfect? I could also wait until the stop command is received, and
> > then flip that many bits before i stop the stream? But none of these seem
> > sensible.
> >
> > Please make this specification have sufficient details, or references to 802.3, that
> > you could give it to another engineer and get back a reasonable implementation,
> > without having to answer any questions.
>
> Andrew,
>
> IEEE has clear documentation of the PRBS Receiver block and the BER counter as an output.
> Before performing the actual BER validation, it is a usual industry practice to introduce errors
> to guarantee that the checker is functional and accurately identifying them.
>
> Similarly, in DATA mode, error injection is used to verify the FEC block
> by ensuring that injected errors are detected and corrected as expected.
>
> Updated description.
>
> + name: inject-error-count
> + type: u32
> + doc: |
> + Request the PHY to inject exactly this many bit errors into the
> + currently active test data stream.
> +
> + This is a diagnostic tool used to validate that the far-end PRBS
> + checker or FEC decoder is functioning correctly. For example,
> + after enabling a PRBS pattern and confirming ber-lock-status is
> + locked, injecting N errors should cause ber-error-count to
> + increment by exactly N on the receiving port, confirming the
> + checker is actively detecting bit errors. Similarly, in normal
> + data mode with FEC enabled, injecting errors verifies that the
> + FEC block detects errors as expected.
There is no mention of how many frames to send in the stream. I don't
think that is part of the API? Because we have no idea of how many
frames will be sent, it is not possible to distribute the corrupted
frames over the duration of the stream. So that means i should flip
one bit, anywhere in the first inject-error-count frames. All frames
after that should not have bit flips. The assumption being, the stream
has a minimum of inject-error-count frames, and if the stream is
short, the counter will be too low. But it does not matter if the
stream is longer.
Your description has no mention of frames. Should it? What exactly
does the ber-error-count count? Can multiple bit flip within one frame
be counted individually? I don't see how, since the checksum just says
the frame is bad, and cannot report how bad.
As i said, give this description to another engineer and ask him/her
how it could be implemented.
https://www.youtube.com/watch?v=j-6N3bLgYyQ
Andrew
[-- Attachment #1.2: Type: text/html, Size: 16159 bytes --]
[-- Attachment #2: Error Injection at Bit Level.pdf --]
[-- Type: application/pdf, Size: 38062 bytes --]
^ permalink raw reply
* [PATCH v1 net-next 00/14] net: Support per-netns device unregistration
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
The biggest blocker to per-netns RTNL is netdev unregistration.
It starts within a single netns, but it can eventually involve
multiple namespaces.
There are three types of such cross-netns devices:
1. Paired devices (e.g., netkit, veth, vxcan)
-> Unregistering one device also deletes its peer, which
may reside in another netns.
2. Tunnel devices (e.g., bareudp, geneve)
-> Destroying a netns removes devices in another netns if
their backend sockets reside in the dying netns.
3. Stacked devices (e.g., ipvlan, macvlan)
-> Removing the lower device also removes multiple upper
devices, each of which may reside in different namespaces.
While the first two device types require at most two rtnl_net_lock()s,
the stacked type has no upper limit. This makes it impossible to
freeze all necessary namespaces in advance.
This series introduces per-netns work, initially suggested at
NetConf 2024, to delegate the unregistration of such cross-netns
devices.
https://netdev.bots.linux.dev/netconf/2024/kuniyu.pdf#page=62
The first half of the series wraps NETDEV_UNREGISTER (in core) with
per-netns RTNL, adds a helper for per-netns device unregistration,
and forces per-netns device unregistration in the core code when
CONFIG_DEBUG_NET_SMALL_RTNL=y.
The latter half picks out one from each type (veth, bareudp, ipvlan)
and converts them to support per-netns device unregistration,
although the operations are **still serialised under RTNL** for now.
Please note that this series focuses only on the device unregistration
paths. For example, there are ASSERT_RTNL() calls left in other paths,
and Sashiko may point them out, but they are out of scope.
This is just the first step, and we need more incremental changes to
completely remove RTNL anyway.
Now, we can see that unregistering a lower device (veth0 below)
removes upper devices (ipvl2, ipvl3) in different namespaces using
per-netns work with a different PID. The lower device (veth0) is
freed only after all upper ipvlan devices have called netdev_put()
in ipvlan_uninit().
# ip netns add ns1
# ip netns add ns2
# ip netns add ns3
# ip -n ns1 link add veth0 type veth peer veth1
# ip -n ns2 link add ipvl2 link veth0 link-netns ns1 type ipvlan mode l2
# ip -n ns3 link add ipvl3 link veth0 link-netns ns1 type ipvlan mode l2
# ip -n ns1 link del veth0
# bpftrace -e '#include <linux/netdevice.h>
kprobe:ipvlan_uninit,
kprobe:veth_dellink,
kprobe:free_netdev {
$dev = (struct net_device *)arg0;
printf("PID: %d | DEV: %s%s\n", pid, $dev->name, kstack());
}'
PID: 2010 | DEV: veth0
veth_dellink+5
rtnl_dellink+1213
rtnetlink_rcv_msg+1791
...
PID: 440 | DEV: ipvl2
ipvlan_uninit+5
unregister_netdevice_many_notify+7129
unregister_netdevice_many_net+1050
rtnl_net_work_func+136
...
PID: 440 | DEV: ipvl2
free_netdev+5
netdev_run_todo+4798
process_scheduled_works+2538
...
PID: 440 | DEV: ipvl3
ipvlan_uninit+5
unregister_netdevice_many_notify+7129
unregister_netdevice_many_net+1050
rtnl_net_work_func+136
process_scheduled_works+2538
...
PID: 2010 | DEV: veth0
free_netdev+5
netdev_run_todo+4798
rtnl_dellink+1507
rtnetlink_rcv_msg+1791
...
PID: 440 | DEV: ipvl3
free_netdev+5
netdev_run_todo+4798
process_scheduled_works+2538
...
Kuniyuki Iwashima (14):
rtnetlink: Lock sock_net(skb->sk) in rtnl_newlink().
rtnetlink: Call unregister_netdevice_many() only once in
rtnl_link_unregister().
rtnetlink: Add per-netns rtnl_work.
net: Wrap default_device_exit_net() with __rtnl_net_lock().
net: Hold __rtnl_net_lock() in netdev_wait_allrefs_any().
net: Add per-netns netdev unregistration infra.
net: Call unregister_netdevice_many() per netns.
veth: Support per-netns device unregistration.
bareudp: Protect bareudp_list with mutex.
bareudp: Support per-netns netdev unregistration.
ipvlan: Convert ipvl_port.count to refcount_t.
ipvlan: Synchronise ipvlan_init() and ipvlan_uninit() for the same
lower dev.
ipvlan: Protect ipvl_port.ipvlans with mutex.
ipvlan: Support per-netns netdev unregistration.
drivers/net/bareudp.c | 43 ++++++++-
drivers/net/ipvlan/ipvlan.h | 18 +++-
drivers/net/ipvlan/ipvlan_main.c | 153 +++++++++++++++++++++++++------
drivers/net/ipvlan/ipvtap.c | 16 ++--
drivers/net/veth.c | 34 ++++---
include/linux/netdevice.h | 22 +++++
include/linux/rtnetlink.h | 8 ++
include/net/net_namespace.h | 3 +
net/core/dev.c | 129 +++++++++++++++++++++++++-
net/core/net_namespace.c | 4 +
net/core/rtnetlink.c | 57 ++++++++++--
11 files changed, 418 insertions(+), 69 deletions(-)
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply
* [PATCH v1 net-next 01/14] rtnetlink: Lock sock_net(skb->sk) in rtnl_newlink().
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260701214334.266991-1-kuniyu@google.com>
There are a few cases where rtnl_net_lock() is not properly
held in rtnl_newlink().
When any of IFLA_NET_NS_PID / IFLA_NET_NS_FD / IFLA_TARGET_NETNSID
is specified but IFLA_LINK_NETNSID is not, sock_net(skb->sk) is used
as link_net in rtnl_newlink_link_net().
In addition, the do_setlink() path uses sock_net(skb->sk) and one
of the three netns attributes, while rtnl_link_get_net_capable()
returns only one of the four.
Let's add sock_net(skb->sk) to rtnl_nets in rtnl_newlink().
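To illustrate why unconditionally adding one more netns is harmless, here is a userspace model of a small namespace set that is kept sorted and deduplicated before locking. struct ns, struct ns_set and ns_set_add() are hypothetical stand-ins, and ordering by address is an assumption of this sketch, not a statement about how rtnl_nets_add() is implemented.

```c
#include <stddef.h>
#include <stdint.h>

#define NS_SET_MAX 4	/* mirrors the 4-entry array used in this patch */

struct ns { int id; };	/* hypothetical stand-in for struct net */

struct ns_set {
	struct ns *ns[NS_SET_MAX];
	size_t len;
};

/*
 * Add ns to the set, keeping entries unique and sorted by address so
 * that the locks can later be taken in one fixed order.
 * Returns 0 on success (including a duplicate), -1 if the set is full.
 */
static int ns_set_add(struct ns_set *set, struct ns *ns)
{
	size_t i;

	for (i = 0; i < set->len; i++)
		if (set->ns[i] == ns)
			return 0;	/* already tracked, never locked twice */

	if (set->len == NS_SET_MAX)
		return -1;

	/* Insertion sort step: shift larger addresses up by one slot. */
	for (i = set->len; i > 0 && (uintptr_t)set->ns[i - 1] > (uintptr_t)ns; i--)
		set->ns[i] = set->ns[i - 1];

	set->ns[i] = ns;
	set->len++;
	return 0;
}
```

In this model, adding the socket's netns when it is already one of the collected namespaces is a no-op, and a genuinely new entry only needs one extra slot.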
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
No Fixes tag is needed since there is neither a real bug nor an assertion.
---
net/core/rtnetlink.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 12aa3aa1688b..f39c93e80e20 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -282,10 +282,11 @@ static int rtnl_net_cmp_locks(const struct net *net_a, const struct net *net_b)
#endif
struct rtnl_nets {
- /* ->newlink() needs to freeze 3 netns at most;
- * 2 for the new device, 1 for its peer.
+ /* ->newlink() needs to freeze 4 netns at most;
+ * 2 for the new device, 1 for its peer, 1 for
+ * an existing device (do_setlink() path).
*/
- struct net *net[3];
+ struct net *net[4];
unsigned char len;
};
@@ -4155,6 +4156,8 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
}
}
+ rtnl_nets_add(&rtnl_nets, get_net(sock_net(skb->sk)));
+
rtnl_nets_lock(&rtnl_nets);
ret = __rtnl_newlink(skb, nlh, ops, tgt_net, link_net, peer_net, tbs, data, extack);
rtnl_nets_unlock(&rtnl_nets);
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply related
* [PATCH v1 net-next 02/14] rtnetlink: Call unregister_netdevice_many() only once in rtnl_link_unregister().
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260701214334.266991-1-kuniyu@google.com>
When rtnl_link_unregister() is called during module unload, it
calls __rtnl_kill_links() for every netns.
__rtnl_kill_links() collects all devices of the unloaded module
and passes them to unregister_netdevice_many().
Let's move unregister_netdevice_many() to rtnl_link_unregister()
to unregister all devices across netns in a single batch.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
net/core/rtnetlink.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f39c93e80e20..7207da002fb5 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -637,16 +637,15 @@ int rtnl_link_register(struct rtnl_link_ops *ops)
}
EXPORT_SYMBOL_GPL(rtnl_link_register);
-static void __rtnl_kill_links(struct net *net, struct rtnl_link_ops *ops)
+static void __rtnl_kill_links(struct net *net, struct rtnl_link_ops *ops,
+ struct list_head *dev_kill_list)
{
struct net_device *dev;
- LIST_HEAD(list_kill);
for_each_netdev(net, dev) {
if (dev->rtnl_link_ops == ops)
- ops->dellink(dev, &list_kill);
+ ops->dellink(dev, dev_kill_list);
}
- unregister_netdevice_many(&list_kill);
}
/* Return with the rtnl_lock held when there are no network
@@ -677,6 +676,7 @@ static void rtnl_lock_unregistering_all(void)
*/
void rtnl_link_unregister(struct rtnl_link_ops *ops)
{
+ LIST_HEAD(dev_kill_list);
struct net *net;
mutex_lock(&link_ops_mutex);
@@ -691,7 +691,9 @@ void rtnl_link_unregister(struct rtnl_link_ops *ops)
rtnl_lock_unregistering_all();
for_each_net(net)
- __rtnl_kill_links(net, ops);
+ __rtnl_kill_links(net, ops, &dev_kill_list);
+
+ unregister_netdevice_many(&dev_kill_list);
rtnl_unlock();
up_write(&pernet_ops_rwsem);
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply related
* [PATCH v1 net-next 03/14] rtnetlink: Add per-netns rtnl_work.
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260701214334.266991-1-kuniyu@google.com>
The biggest blocker to per-netns RTNL is netdev unregistration.
It starts within a single netns (e.g., during a device lookup or
netns dismantle), but it can eventually involve multiple namespaces,
such as when upper ipvlan devices reside in different netns.
This prevents us from acquiring multiple rtnl_net_lock()s beforehand.
When we encounter such a cross-netns device, we must delegate the
unregistration to the work of the netns where the device actually
resides.
Let's add per-netns rtnl_work to support the deferred netdev
unregistration.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
include/linux/rtnetlink.h | 8 ++++++++
include/net/net_namespace.h | 1 +
net/core/net_namespace.c | 1 +
net/core/rtnetlink.c | 26 ++++++++++++++++++++++++++
4 files changed, 36 insertions(+)
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index ea39dd23a197..95729339e7a5 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -115,6 +115,10 @@ bool rtnl_net_is_locked(struct net *net);
bool lockdep_rtnl_net_is_held(struct net *net);
+void rtnl_net_queue_work(struct net *net);
+void rtnl_net_flush_workqueue(void);
+void rtnl_net_work_func(struct work_struct *work);
+
#define rcu_dereference_rtnl_net(net, p) \
rcu_dereference_check(p, lockdep_rtnl_net_is_held(net))
#define rtnl_net_dereference(net, p) \
@@ -150,6 +154,10 @@ static inline void ASSERT_RTNL_NET(struct net *net)
ASSERT_RTNL();
}
+static inline void rtnl_net_flush_workqueue(void)
+{
+}
+
#define rcu_dereference_rtnl_net(net, p) \
rcu_dereference_rtnl(p)
#define rtnl_net_dereference(net, p) \
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 80de5e98a66d..a989019af5f7 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -197,6 +197,7 @@ struct net {
#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
/* Move to a better place when the config guard is removed. */
struct mutex rtnl_mutex;
+ struct work_struct rtnl_work;
#endif
#if IS_ENABLED(CONFIG_VSOCKETS)
struct netns_vsock vsock;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index d9dafe24f57e..d1aeff9de580 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -422,6 +422,7 @@ static __net_init int preinit_net(struct net *net, struct user_namespace *user_n
#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
mutex_init(&net->rtnl_mutex);
lock_set_cmp_fn(&net->rtnl_mutex, rtnl_net_lock_cmp_fn, NULL);
+ INIT_WORK(&net->rtnl_work, rtnl_net_work_func);
#endif
INIT_LIST_HEAD(&net->ptype_all);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 7207da002fb5..7959519e7375 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -273,6 +273,26 @@ bool lockdep_rtnl_net_is_held(struct net *net)
return lockdep_rtnl_is_held() && lockdep_is_held(&net->rtnl_mutex);
}
EXPORT_SYMBOL(lockdep_rtnl_net_is_held);
+
+static struct workqueue_struct *rtnl_net_wq;
+
+void rtnl_net_queue_work(struct net *net)
+{
+ queue_work(rtnl_net_wq, &net->rtnl_work);
+}
+
+void rtnl_net_flush_workqueue(void)
+{
+ flush_workqueue(rtnl_net_wq);
+}
+
+void rtnl_net_work_func(struct work_struct *work)
+{
+ struct net *net = container_of(work, struct net, rtnl_work);
+
+ rtnl_net_lock(net);
+ rtnl_net_unlock(net);
+}
#else
static int rtnl_net_cmp_locks(const struct net *net_a, const struct net *net_b)
{
@@ -7226,4 +7246,10 @@ void __init rtnetlink_init(void)
register_netdevice_notifier(&rtnetlink_dev_notifier);
rtnl_register_many(rtnetlink_rtnl_msg_handlers);
+
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+ rtnl_net_wq = create_workqueue("rtnl_net");
+ if (!rtnl_net_wq)
+ panic("Could not create rtnl_net workq");
+#endif
}
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply related
* [PATCH v1 net-next 04/14] net: Wrap default_device_exit_net() with __rtnl_net_lock().
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260701214334.266991-1-kuniyu@google.com>
default_device_exit_net() could call dev_change_net_namespace()
to move devices from a dying netns to init_net.
Let's hold __rtnl_net_lock() of both netns around it.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
net/core/dev.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 4b3d5cfdf6e0..c477c4f84ed9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -13034,7 +13034,7 @@ static void __net_exit default_device_exit_net(struct net *net)
* Push all migratable network devices back to the
* initial network namespace
*/
- ASSERT_RTNL();
+
for_each_netdev_safe(net, dev, aux) {
int err;
char fb_name[IFNAMSIZ];
@@ -13077,11 +13077,19 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
LIST_HEAD(dev_kill_list);
rtnl_lock();
+
+ __rtnl_net_lock(&init_net);
+
list_for_each_entry(net, net_list, exit_list) {
+ __rtnl_net_lock(net);
default_device_exit_net(net);
+ __rtnl_net_unlock(net);
+
cond_resched();
}
+ __rtnl_net_unlock(&init_net);
+
list_for_each_entry(net, net_list, exit_list) {
for_each_netdev_reverse(net, dev) {
if (dev->rtnl_link_ops && dev->rtnl_link_ops->dellink)
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply related
* [PATCH v1 net-next 05/14] net: Hold __rtnl_net_lock() in netdev_wait_allrefs_any().
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260701214334.266991-1-kuniyu@google.com>
Currently, netdev_run_todo() processes pending devices from multiple
namespaces in a batch.
To expand the per-netns RTNL coverage for NETDEV_UNREGISTER, let's
acquire __rtnl_net_lock() in netdev_wait_allrefs_any().
Note that netdev_run_todo() itself will need to be namespacified
before RTNL is removed.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
net/core/dev.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index c477c4f84ed9..48818a194fa5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11608,8 +11608,13 @@ static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
rtnl_lock();
/* Rebroadcast unregister notification */
- list_for_each_entry(dev, list, todo_list)
+ list_for_each_entry(dev, list, todo_list) {
+ struct net *net = dev_net(dev);
+
+ __rtnl_net_lock(net);
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+ __rtnl_net_unlock(net);
+ }
__rtnl_unlock();
rcu_barrier();
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply related
* [PATCH v1 net-next 06/14] net: Add per-netns netdev unregistration infra.
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260701214334.266991-1-kuniyu@google.com>
When we need to unregister a netdev in a different netns, we will
delegate its unregistration to per-netns work.
There are three types of such cross-netns devices:
1. Paired devices (e.g., netkit, veth, vxcan)
-> Unregistering one device also deletes its peer, which
may reside in another netns.
2. Tunnel devices (e.g., bareudp, geneve)
-> Destroying a netns removes devices in another netns if
their backend sockets reside in the dying netns.
3. Stacked devices (e.g., ipvlan, macvlan)
-> Removing the lower device also removes multiple upper
devices, each of which may reside in different namespaces.
In these cases, we will use unregister_netdevice_queue_net() to
queue such potential cross-netns devices for destruction.
unregister_netdevice_queue_net() takes net and dev. If dev resides
in the net, it simply calls unregister_netdevice_queue().
If dev_net(dev) is different from the net, it enqueues the device
to dev_net(dev)->dev_unreg_head and schedules the per-netns work.
When __rtnl_net_unlock() is called from the per-netns work (or another
thread already holding the lock), unregister_netdevice_many_net()
collects the queued devices and calls unregister_netdevice_many()
to perform the actual unregistration.
During netns dismantle, rtnl_net_flush_workqueue() is called at the
end of default_device_exit_batch() to ensure that cross-netns
devices in other live netns are unregistered.
Once RTNL is removed, a device could be moved to another netns while
being queued to net->dev_unreg_head.
__dev_change_net_namespace() handles this race by acquiring
net->dev_unreg_lock of both the old and new netns after dev_set_net()
and moving the device between their dev_unreg_head lists.
Since dev_set_net() and unregister_netdevice_queue_net() are
synchronised by netdev_lock(), the device is either queued to the
old netns's dev_unreg_head and then moved, or queued directly to
the new netns.
Note that unregister_netdevice_move_net() does not need to call
rtnl_net_queue_work() because __dev_change_net_namespace() is
(supposed to be) called with rtnl_net_lock(). (Not all callers
hold it yet, but the race does not happen until all callers
are converted and RTNL is removed.)
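The queue-or-delegate decision described above can be modelled in userspace roughly as follows. This is a simplified sketch with no locking and no real workqueue: struct ns, struct dev, queue_unregister() and drain_pending() are hypothetical names, the pending list stands in for net->dev_unreg_head, and the work_queued flag stands in for the scheduled per-netns work.

```c
#include <stdbool.h>
#include <stddef.h>

struct ns;

struct dev {
	struct ns *ns;		/* namespace the device currently lives in */
	struct dev *next;	/* link in whichever list holds the device */
	bool unregistered;
};

struct ns {
	struct dev *pending;	/* models net->dev_unreg_head */
	bool work_queued;	/* models a scheduled per-netns work item */
};

/* Queue dev for removal on behalf of a caller that holds cur's lock. */
static void queue_unregister(struct ns *cur, struct dev *dev, struct dev **head)
{
	if (dev->ns == cur) {
		dev->next = *head;	/* same netns: goes into the caller's batch */
		*head = dev;
		return;
	}

	dev->next = dev->ns->pending;	/* cross-netns: hand over to the owner */
	dev->ns->pending = dev;
	dev->ns->work_queued = true;
}

/*
 * Models the drain done when the owning netns drops its lock:
 * unregister everything queued for ns and return how many devices
 * were handled.
 */
static size_t drain_pending(struct ns *ns)
{
	struct dev *dev = ns->pending;
	size_t n = 0;

	ns->pending = NULL;
	ns->work_queued = false;

	for (; dev; dev = dev->next) {
		dev->unregistered = true;
		n++;
	}
	return n;
}
```

With a lower device in one namespace and two upper devices in two others, only the lower device lands in the caller's batch. Each upper device is marked unregistered only when its own namespace drains its pending list.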
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
include/linux/netdevice.h | 16 +++++++
include/net/net_namespace.h | 2 +
net/core/dev.c | 85 +++++++++++++++++++++++++++++++++++++
net/core/net_namespace.c | 2 +
net/core/rtnetlink.c | 4 ++
5 files changed, 109 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9981d637f8b5..53454db3611a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2241,6 +2241,9 @@ struct net_device {
struct list_head dev_list;
struct list_head napi_list;
struct list_head unreg_list;
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+ struct list_head unreg_list_net;
+#endif
struct list_head close_list;
struct list_head ptype_all;
@@ -3472,6 +3475,19 @@ static inline void unregister_netdevice(struct net_device *dev)
unregister_netdevice_queue(dev, NULL);
}
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+void unregister_netdevice_queue_net(struct net *net, struct net_device *dev,
+ struct list_head *head);
+void unregister_netdevice_many_net(struct net *net);
+#else
+static inline void unregister_netdevice_queue_net(struct net *net,
+ struct net_device *dev,
+ struct list_head *head)
+{
+ unregister_netdevice_queue(dev, head);
+}
+#endif
+
int netdev_refcnt_read(const struct net_device *dev);
void free_netdev(struct net_device *dev);
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index a989019af5f7..501af1999fe8 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -198,6 +198,8 @@ struct net {
/* Move to a better place when the config guard is removed. */
struct mutex rtnl_mutex;
struct work_struct rtnl_work;
+ struct list_head dev_unreg_head;
+ spinlock_t dev_unreg_lock;
#endif
#if IS_ENABLED(CONFIG_VSOCKETS)
struct netns_vsock vsock;
diff --git a/net/core/dev.c b/net/core/dev.c
index 48818a194fa5..0f0bf65f5bf9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -12092,6 +12092,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
INIT_LIST_HEAD(&dev->napi_list);
INIT_LIST_HEAD(&dev->unreg_list);
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+ INIT_LIST_HEAD(&dev->unreg_list_net);
+#endif
INIT_LIST_HEAD(&dev->close_list);
INIT_LIST_HEAD(&dev->link_watch_list);
INIT_LIST_HEAD(&dev->adj_list.upper);
@@ -12485,6 +12488,16 @@ void unregister_netdevice_many_notify(struct list_head *head,
synchronize_net();
list_for_each_entry(dev, head, unreg_list) {
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+ struct net *net = dev_net(dev);
+
+ /* spin_lock() can be moved outside of the loop
+ * once the per-netns RTNL conversion completes.
+ */
+ spin_lock(&net->dev_unreg_lock);
+ list_del(&dev->unreg_list_net);
+ spin_unlock(&net->dev_unreg_lock);
+#endif
netdev_put(dev, &dev->dev_registered_tracker);
net_set_todo(dev);
cnt++;
@@ -12507,6 +12520,72 @@ void unregister_netdevice_many(struct list_head *head)
}
EXPORT_SYMBOL(unregister_netdevice_many);
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+void unregister_netdevice_queue_net(struct net *net, struct net_device *dev,
+ struct list_head *head)
+{
+ netdev_lock(dev);
+
+ if (net_eq(dev_net(dev), net)) {
+ netdev_unlock(dev);
+ unregister_netdevice_queue(dev, head);
+ return;
+ }
+
+ net = dev_net(dev);
+
+ spin_lock(&net->dev_unreg_lock);
+
+ DEBUG_NET_WARN_ON_ONCE(!list_empty(&dev->unreg_list_net));
+ list_add_tail(&dev->unreg_list_net, &net->dev_unreg_head);
+ rtnl_net_queue_work(net);
+
+ spin_unlock(&net->dev_unreg_lock);
+
+ netdev_unlock(dev);
+}
+EXPORT_SYMBOL(unregister_netdevice_queue_net);
+
+static void unregister_netdevice_move_net(struct net *net_old,
+ struct net *net,
+ struct net_device *dev)
+{
+ if (net_old > net) {
+ spin_lock(&net->dev_unreg_lock);
+ spin_lock(&net_old->dev_unreg_lock);
+ } else {
+ spin_lock(&net_old->dev_unreg_lock);
+ spin_lock(&net->dev_unreg_lock);
+ }
+
+ if (!list_empty(&dev->unreg_list_net)) {
+ list_del(&dev->unreg_list_net);
+ list_add_tail(&dev->unreg_list_net, &net->dev_unreg_head);
+ }
+
+ spin_unlock(&net_old->dev_unreg_lock);
+ spin_unlock(&net->dev_unreg_lock);
+}
+
+void unregister_netdevice_many_net(struct net *net)
+{
+ struct net_device *dev, *tmp;
+ LIST_HEAD(unreg_head_net);
+ LIST_HEAD(unreg_head);
+
+ spin_lock(&net->dev_unreg_lock);
+ list_splice_init(&net->dev_unreg_head, &unreg_head_net);
+ spin_unlock(&net->dev_unreg_lock);
+
+ list_for_each_entry_safe(dev, tmp, &unreg_head_net, unreg_list_net) {
+ list_del_init(&dev->unreg_list_net);
+ list_add_tail(&dev->unreg_list, &unreg_head);
+ }
+
+ unregister_netdevice_many(&unreg_head);
+}
+#endif
+
/**
* unregister_netdev - remove device from the kernel
* @dev: device
@@ -12663,6 +12742,10 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
netdev_unlock(dev);
dev->ifindex = new_ifindex;
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+ unregister_netdevice_move_net(net_old, net, dev);
+#endif
+
if (new_name[0]) {
/* Rename the netdev to prepared name */
write_seqlock_bh(&netdev_rename_lock);
@@ -13105,6 +13188,8 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
}
unregister_netdevice_many(&dev_kill_list);
rtnl_unlock();
+
+ rtnl_net_flush_workqueue();
}
static struct pernet_operations __net_initdata default_device_ops = {
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index d1aeff9de580..578b48cf5318 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -423,6 +423,8 @@ static __net_init int preinit_net(struct net *net, struct user_namespace *user_n
mutex_init(&net->rtnl_mutex);
lock_set_cmp_fn(&net->rtnl_mutex, rtnl_net_lock_cmp_fn, NULL);
INIT_WORK(&net->rtnl_work, rtnl_net_work_func);
+ INIT_LIST_HEAD(&net->dev_unreg_head);
+ spin_lock_init(&net->dev_unreg_lock);
#endif
INIT_LIST_HEAD(&net->ptype_all);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 7959519e7375..544498d3c325 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -197,6 +197,7 @@ void __rtnl_net_unlock(struct net *net)
{
ASSERT_RTNL();
+ unregister_netdevice_many_net(net);
mutex_unlock(&net->rtnl_mutex);
}
EXPORT_SYMBOL(__rtnl_net_unlock);
@@ -290,6 +291,9 @@ void rtnl_net_work_func(struct work_struct *work)
{
struct net *net = container_of(work, struct net, rtnl_work);
+ if (list_empty(&net->dev_unreg_head))
+ return;
+
rtnl_net_lock(net);
rtnl_net_unlock(net);
}
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply related