* [PATCH net-next v8 0/6] net: Split ndo_set_rx_mode into snapshot and deferred write
@ 2026-01-12 18:16 I Viswanath
2026-01-12 18:16 ` [PATCH net-next v8 1/6] net: refactor set_rx_mode into snapshot and deferred I/O I Viswanath
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: I Viswanath @ 2026-01-12 18:16 UTC (permalink / raw)
To: edumazet, horms, sdf, kuba, andrew+netdev, pabeni, jasowang,
eperezma, mst, xuanzhuo, przemyslaw.kitszel, anthony.l.nguyen,
ronak.doshi, pcnet32
Cc: bcm-kernel-feedback-list, intel-wired-lan, virtualization, netdev,
I Viswanath
This is an implementation of the idea provided by Jakub here
https://lore.kernel.org/netdev/20250923163727.5e97abdb@kernel.org/
ndo_set_rx_mode is problematic because it cannot sleep.

To address this, this series divides setting rx_mode into two stages:
snapshot and deferred I/O. To achieve this, we change the semantics of
set_rx_mode and add a new ndo, write_rx_mode. The new set_rx_mode is
responsible for customizing the rx_mode snapshot, which write_rx_mode
then uses to update the hardware.

This refactor has the secondary benefit of stopping set_rx_mode
requests from building up, as only the most recent request (before the
work has had a chance to run) will be confirmed/executed.
In brief, the new flow will look something like:

  set_rx_mode():
      ndo_set_rx_mode();
      prepare_rx_mode();

  write_rx_mode():
      use_snapshot();
      ndo_write_rx_mode();
write_rx_mode() is called from a work item and doesn't hold the
netif_addr_lock spinlock during ndo_write_rx_mode(), making that
section sleepable.
This model works correctly if the following conditions hold:

1. write_rx_mode() must use the rx_mode set by the most recent call to
   prepare_rx_mode() before its execution.

2. If a prepare_rx_mode() call happens while write_rx_mode() is
   executing, write_rx_mode() must be rescheduled.

3. All calls that modify rx_mode must pass through the
   prepare_rx_mode() + schedule write_rx_mode() execution flow.
   netif_schedule_rx_mode_work() has been implemented in core for this
   purpose.
Conditions 1 and 2 are implemented in core; drivers need to ensure
condition 3 by using netif_schedule_rx_mode_work().
To use this model, a driver needs to implement the ndo_write_rx_mode
callback, change its set_rx_mode callback appropriately, and replace
all calls that modify rx mode with netif_schedule_rx_mode_work().
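For a hypothetical driver "foo", the conversion would look roughly
like the following kernel-style sketch. The netif_rx_mode_* helpers
are the ones added by patch 1; the foo_hw_* functions are invented
stand-ins for the driver's hardware I/O:

```c
/* Sketch only, not compilable on its own; foo_* names are invented. */

/* Snapshot stage: runs under netif_addr_lock_bh(); must not sleep.
 * Only customizes the snapshot, no hardware access.
 */
static void foo_set_rx_mode(struct net_device *dev)
{
	netif_rx_mode_set_cfg(dev, NETIF_RX_MODE_CFG_PROMISC,
			      !!(dev->flags & IFF_PROMISC));
	netif_rx_mode_set_cfg(dev, NETIF_RX_MODE_CFG_ALLMULTI,
			      !!(dev->flags & IFF_ALLMULTI));
}

/* Deferred I/O stage: runs from the work item without the addr lock,
 * so it may sleep while talking to the hardware.
 */
static void foo_write_rx_mode(struct net_device *dev)
{
	char *ha_addr;
	int i;

	foo_hw_set_promisc(dev,
		netif_rx_mode_get_cfg(dev, NETIF_RX_MODE_CFG_PROMISC));
	netif_rx_mode_for_each_mc_addr(ha_addr, dev, i)
		foo_hw_add_mc_filter(dev, ha_addr);
}

static const struct net_device_ops foo_netdev_ops = {
	.ndo_set_rx_mode	= foo_set_rx_mode,
	.ndo_write_rx_mode	= foo_write_rx_mode,
};

/* Anywhere the driver previously invoked its rx-mode setup directly
 * (e.g. from open/reset paths), it now calls:
 *
 *	netif_schedule_rx_mode_work(dev);
 */
```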
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
v1:
Link: https://lore.kernel.org/netdev/20251020134857.5820-1-viswanathiyyappan@gmail.com/
v2:
- Exported set_and_schedule_rx_config as a symbol for use in modules
- Fixed incorrect cleanup for the case of rx_work alloc failing in alloc_netdev_mqs
- Removed the locked version (cp_set_rx_mode) and renamed __cp_set_rx_mode to cp_set_rx_mode
Link: https://lore.kernel.org/netdev/20251026175445.1519537-1-viswanathiyyappan@gmail.com/
v3:
- Added RFT tag
- Corrected mangled patch
Link: https://lore.kernel.org/netdev/20251028174222.1739954-1-viswanathiyyappan@gmail.com/
v4:
- Completely reworked the snapshot mechanism as per v3 comments
- Implemented the callback for virtio-net instead of 8139cp driver
- Removed RFC tag
Link: https://lore.kernel.org/netdev/20251118164333.24842-1-viswanathiyyappan@gmail.com/
v5:
- Fix broken code and titles
- Remove RFT tag
Link: https://lore.kernel.org/netdev/20251120141354.355059-1-viswanathiyyappan@gmail.com/
v6:
- Added struct netif_deferred_work_cleanup and members needs_deferred_cleanup and deferred_work_cleanup in net_device
- Moved out ctrl bits from netif_rx_mode_config to netif_rx_mode_work_ctx
Link: https://lore.kernel.org/netdev/20251227174225.699975-1-viswanathiyyappan@gmail.com/
v7:
- Improved function, enum and struct names
Link: https://lore.kernel.org/netdev/20260102180530.1559514-1-viswanathiyyappan@gmail.com/
v8:
- Implemented the callback for drivers e1000, 8139cp, vmxnet3 and pcnet32
- Moved the rx_mode config set calls (for prom and allmulti) in prepare_rx_mode to the ndo_set_rx_mode callback for consistency
- Improved commit messages
I Viswanath (6):
net: refactor set_rx_mode into snapshot and deferred I/O
virtio-net: Implement ndo_write_rx_mode callback
e1000: Implement ndo_write_rx_mode callback
8139cp: Implement ndo_write_rx_mode callback
vmxnet3: Implement ndo_write_rx_mode callback
pcnet32: Implement ndo_write_rx_mode callback
drivers/net/ethernet/amd/pcnet32.c | 57 +++-
drivers/net/ethernet/intel/e1000/e1000_main.c | 59 ++--
drivers/net/ethernet/realtek/8139cp.c | 33 ++-
drivers/net/virtio_net.c | 61 ++--
drivers/net/vmxnet3/vmxnet3_drv.c | 38 ++-
include/linux/netdevice.h | 112 +++++++-
net/core/dev.c | 265 +++++++++++++++++-
7 files changed, 522 insertions(+), 103 deletions(-)
--
2.47.3
* [PATCH net-next v8 1/6] net: refactor set_rx_mode into snapshot and deferred I/O
2026-01-12 18:16 [PATCH net-next v8 0/6] net: Split ndo_set_rx_mode into snapshot and deferred write I Viswanath
@ 2026-01-12 18:16 ` I Viswanath
2026-01-12 18:16 ` [PATCH net-next v8 2/6] virtio-net: Implement ndo_write_rx_mode callback I Viswanath
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: I Viswanath @ 2026-01-12 18:16 UTC (permalink / raw)
To: edumazet, horms, sdf, kuba, andrew+netdev, pabeni, jasowang,
eperezma, mst, xuanzhuo, przemyslaw.kitszel, anthony.l.nguyen,
ronak.doshi, pcnet32
Cc: bcm-kernel-feedback-list, intel-wired-lan, virtualization, netdev,
I Viswanath
Refactor set_rx_mode into two stages: a snapshot stage and the
actual I/O. When __dev_set_rx_mode() is called, the core takes a
snapshot of the current rx_mode config and commits it to
hardware later via a work item.
In this model, ndo_set_rx_mode() is responsible for customizing the
rx mode snapshot and deciding whether the work should be scheduled,
while ndo_write_rx_mode() applies the snapshot to hardware.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
include/linux/netdevice.h | 112 +++++++++++++++-
net/core/dev.c | 265 +++++++++++++++++++++++++++++++++++++-
2 files changed, 370 insertions(+), 7 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d99b0fbc1942..5f9268ac7b75 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1062,6 +1062,45 @@ struct netdev_net_notifier {
struct notifier_block *nb;
};
+struct netif_cleanup_work {
+ struct work_struct work;
+ struct net_device *dev;
+};
+
+enum netif_rx_mode_cfg {
+ NETIF_RX_MODE_CFG_ALLMULTI,
+ NETIF_RX_MODE_CFG_PROMISC,
+ NETIF_RX_MODE_CFG_VLAN,
+ NETIF_RX_MODE_CFG_BROADCAST
+};
+
+enum netif_rx_mode_flags {
+ NETIF_RX_MODE_READY,
+
+ /* if set, rx_mode set work will be skipped */
+ NETIF_RX_MODE_SET_SKIP,
+
+ /* if set, uc/mc lists will not be part of rx_mode config */
+ NETIF_RX_MODE_UC_SKIP,
+ NETIF_RX_MODE_MC_SKIP
+};
+
+struct netif_rx_mode_config {
+ char *uc_addrs;
+ char *mc_addrs;
+ int uc_count;
+ int mc_count;
+ int cfg;
+};
+
+struct netif_rx_mode_ctx {
+ struct netif_rx_mode_config *pending;
+ struct netif_rx_mode_config *ready;
+ struct work_struct work;
+ struct net_device *dev;
+ int flags;
+};
+
/*
* This structure defines the management hooks for network devices.
* The following hooks can be defined; unless noted otherwise, they are
@@ -1114,9 +1153,14 @@ struct netdev_net_notifier {
* changes to configuration when multicast or promiscuous is enabled.
*
* void (*ndo_set_rx_mode)(struct net_device *dev);
- * This function is called device changes address list filtering.
+ * This function is called when device changes address list filtering.
* If driver handles unicast address filtering, it should set
- * IFF_UNICAST_FLT in its priv_flags.
+ * IFF_UNICAST_FLT in its priv_flags. This is used to configure
+ * the rx_mode snapshot that will be written to the hardware.
+ *
+ * void (*ndo_write_rx_mode)(struct net_device *dev);
+ * This function is scheduled after set_rx_mode and is responsible for
+ * writing the rx_mode snapshot to the hardware.
*
* int (*ndo_set_mac_address)(struct net_device *dev, void *addr);
* This function is called when the Media Access Control address
@@ -1437,6 +1481,7 @@ struct net_device_ops {
void (*ndo_change_rx_flags)(struct net_device *dev,
int flags);
void (*ndo_set_rx_mode)(struct net_device *dev);
+ void (*ndo_write_rx_mode)(struct net_device *dev);
int (*ndo_set_mac_address)(struct net_device *dev,
void *addr);
int (*ndo_validate_addr)(struct net_device *dev);
@@ -1939,7 +1984,7 @@ enum netdev_reg_state {
* @ingress_queue: XXX: need comments on this one
* @nf_hooks_ingress: netfilter hooks executed for ingress packets
* @broadcast: hw bcast address
- *
+ * @rx_mode_ctx: rx_mode work context
* @rx_cpu_rmap: CPU reverse-mapping for RX completion interrupts,
* indexed by RX queue number. Assigned by driver.
* This must only be set if the ndo_rx_flow_steer
@@ -1971,6 +2016,8 @@ enum netdev_reg_state {
* @link_watch_list: XXX: need comments on this one
*
* @reg_state: Register/unregister state machine
+ * @needs_cleanup_work: Should dev_close schedule the cleanup work?
+ * @cleanup_work: Cleanup work context
* @dismantle: Device is going to be freed
* @needs_free_netdev: Should unregister perform free_netdev?
* @priv_destructor: Called from unregister
@@ -2350,6 +2397,7 @@ struct net_device {
#endif
unsigned char broadcast[MAX_ADDR_LEN];
+ struct netif_rx_mode_ctx *rx_mode_ctx;
#ifdef CONFIG_RFS_ACCEL
struct cpu_rmap *rx_cpu_rmap;
#endif
@@ -2387,6 +2435,10 @@ struct net_device {
u8 reg_state;
+ bool needs_cleanup_work;
+
+ struct netif_cleanup_work *cleanup_work;
+
bool dismantle;
/** @moving_ns: device is changing netns, protected by @lock */
@@ -3373,6 +3425,60 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *newskb);
u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb,
struct net_device *sb_dev);
+/* Helpers to be used in the set_rx_mode implementation */
+static inline void netif_rx_mode_set_cfg(struct net_device *dev, int b,
+ bool val)
+{
+ if (val)
+ dev->rx_mode_ctx->pending->cfg |= BIT(b);
+ else
+ dev->rx_mode_ctx->pending->cfg &= ~BIT(b);
+}
+
+static inline void netif_rx_mode_set_flag(struct net_device *dev, int b,
+ bool val)
+{
+ if (val)
+ dev->rx_mode_ctx->flags |= BIT(b);
+ else
+ dev->rx_mode_ctx->flags &= ~BIT(b);
+}
+
+/* Helpers to be used in the write_rx_mode implementation */
+static inline int netif_rx_mode_get_cfg(struct net_device *dev, int b)
+{
+ return !!(dev->rx_mode_ctx->ready->cfg & BIT(b));
+}
+
+static inline int netif_rx_mode_get_flag(struct net_device *dev, int b)
+{
+ return !!(dev->rx_mode_ctx->flags & BIT(b));
+}
+
+static inline int netif_rx_mode_mc_count(struct net_device *dev)
+{
+ return dev->rx_mode_ctx->ready->mc_count;
+}
+
+static inline int netif_rx_mode_uc_count(struct net_device *dev)
+{
+ return dev->rx_mode_ctx->ready->uc_count;
+}
+
+void netif_schedule_rx_mode_work(struct net_device *dev);
+
+void netif_flush_rx_mode_work(struct net_device *dev);
+
+#define netif_rx_mode_for_each_uc_addr(ha_addr, dev, __i) \
+ for (__i = 0, ha_addr = (dev)->rx_mode_ctx->ready->uc_addrs; \
+ __i < (dev)->rx_mode_ctx->ready->uc_count; \
+ __i++, ha_addr += (dev)->addr_len)
+
+#define netif_rx_mode_for_each_mc_addr(ha_addr, dev, __i) \
+ for (__i = 0, ha_addr = (dev)->rx_mode_ctx->ready->mc_addrs; \
+ __i < (dev)->rx_mode_ctx->ready->mc_count; \
+ __i++, ha_addr += (dev)->addr_len)
+
int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev);
int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
diff --git a/net/core/dev.c b/net/core/dev.c
index c711da335510..072da874a958 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1586,6 +1586,197 @@ void netif_state_change(struct net_device *dev)
}
}
+/* This function attempts to copy the current state of the
+ * net device into pending (reallocating if necessary). If it fails,
+ * pending is guaranteed to be unmodified.
+ */
+static int __netif_prepare_rx_mode(struct net_device *dev)
+{
+ struct netif_rx_mode_config *pending = dev->rx_mode_ctx->pending;
+ bool skip_uc = false, skip_mc = false;
+ int uc_count = 0, mc_count = 0;
+ struct netdev_hw_addr *ha;
+ char *tmp;
+ int i;
+
+ skip_uc = netif_rx_mode_get_flag(dev, NETIF_RX_MODE_UC_SKIP);
+ skip_mc = netif_rx_mode_get_flag(dev, NETIF_RX_MODE_MC_SKIP);
+
+ /* The allocations need to be atomic since this will be called under
+ * netif_addr_lock_bh()
+ */
+ if (!skip_uc) {
+ uc_count = netdev_uc_count(dev);
+ tmp = krealloc(pending->uc_addrs, uc_count * dev->addr_len,
+ GFP_ATOMIC);
+ if (!tmp)
+ return -ENOMEM;
+ pending->uc_addrs = tmp;
+ }
+
+ if (!skip_mc) {
+ mc_count = netdev_mc_count(dev);
+ tmp = krealloc(pending->mc_addrs, mc_count * dev->addr_len,
+ GFP_ATOMIC);
+ if (!tmp)
+ return -ENOMEM;
+ pending->mc_addrs = tmp;
+ }
+
+ /* This function cannot fail after this point */
+ i = 0;
+ if (!skip_uc) {
+ pending->uc_count = uc_count;
+ netdev_for_each_uc_addr(ha, dev)
+ memcpy(pending->uc_addrs + (i++) * dev->addr_len,
+ ha->addr, dev->addr_len);
+ }
+
+ i = 0;
+ if (!skip_mc) {
+ pending->mc_count = mc_count;
+ netdev_for_each_mc_addr(ha, dev)
+ memcpy(pending->mc_addrs + (i++) * dev->addr_len,
+ ha->addr, dev->addr_len);
+ }
+ return 0;
+}
+
+static void netif_prepare_rx_mode(struct net_device *dev)
+{
+ int rc;
+
+ lockdep_assert_held(&dev->addr_list_lock);
+
+ rc = __netif_prepare_rx_mode(dev);
+ if (rc)
+ return;
+
+ netif_rx_mode_set_flag(dev, NETIF_RX_MODE_READY, true);
+}
+
+static void netif_write_rx_mode(struct work_struct *param)
+{
+ struct netif_rx_mode_ctx *ctx;
+ struct net_device *dev;
+
+ rtnl_lock();
+ ctx = container_of(param, struct netif_rx_mode_ctx, work);
+ dev = ctx->dev;
+
+ if (!netif_running(dev)) {
+ rtnl_unlock();
+ return;
+ }
+
+ /* Paranoia. */
+ if (WARN_ON(!dev->netdev_ops->ndo_write_rx_mode)) {
+ rtnl_unlock();
+ return;
+ }
+
+ /* We could introduce a new lock for this but reusing the addr
+ * lock works well enough
+ */
+ netif_addr_lock_bh(dev);
+
+ /* There's no point continuing if the pending config is not ready */
+ if (!netif_rx_mode_get_flag(dev, NETIF_RX_MODE_READY)) {
+ netif_addr_unlock_bh(dev);
+ rtnl_unlock();
+ return;
+ }
+
+ swap(ctx->ready, ctx->pending);
+ netif_rx_mode_set_flag(dev, NETIF_RX_MODE_READY, false);
+ netif_addr_unlock_bh(dev);
+
+ dev->netdev_ops->ndo_write_rx_mode(dev);
+ rtnl_unlock();
+}
+
+static int netif_alloc_rx_mode_ctx(struct net_device *dev)
+{
+ dev->rx_mode_ctx = kzalloc(sizeof(*dev->rx_mode_ctx), GFP_KERNEL);
+ if (!dev->rx_mode_ctx)
+ goto fail_all;
+
+ dev->rx_mode_ctx->ready = kzalloc(sizeof(*dev->rx_mode_ctx->ready),
+ GFP_KERNEL);
+ if (!dev->rx_mode_ctx->ready)
+ goto fail_ready;
+
+ dev->rx_mode_ctx->pending = kzalloc(sizeof(*dev->rx_mode_ctx->pending),
+ GFP_KERNEL);
+ if (!dev->rx_mode_ctx->pending)
+ goto fail_pending;
+
+ dev->rx_mode_ctx->dev = dev;
+ INIT_WORK(&dev->rx_mode_ctx->work, netif_write_rx_mode);
+ return 0;
+
+fail_pending:
+ kfree(dev->rx_mode_ctx->ready);
+
+fail_ready:
+ kfree(dev->rx_mode_ctx);
+
+fail_all:
+ return -ENOMEM;
+}
+
+static void netif_free_rx_mode_ctx(struct net_device *dev)
+{
+ if (!dev->rx_mode_ctx)
+ return;
+
+ cancel_work_sync(&dev->rx_mode_ctx->work);
+
+ kfree(dev->rx_mode_ctx->ready->uc_addrs);
+ kfree(dev->rx_mode_ctx->ready->mc_addrs);
+ kfree(dev->rx_mode_ctx->ready);
+
+ kfree(dev->rx_mode_ctx->pending->uc_addrs);
+ kfree(dev->rx_mode_ctx->pending->mc_addrs);
+ kfree(dev->rx_mode_ctx->pending);
+
+ kfree(dev->rx_mode_ctx);
+ dev->rx_mode_ctx = NULL;
+}
+
+static void netif_cleanup_work_fn(struct work_struct *param)
+{
+ struct netif_cleanup_work *ctx;
+ struct net_device *dev;
+
+ ctx = container_of(param, struct netif_cleanup_work, work);
+ dev = ctx->dev;
+
+ if (dev->netdev_ops->ndo_write_rx_mode)
+ netif_free_rx_mode_ctx(dev);
+}
+
+static int netif_alloc_cleanup_work(struct net_device *dev)
+{
+ dev->cleanup_work = kzalloc(sizeof(*dev->cleanup_work), GFP_KERNEL);
+ if (!dev->cleanup_work)
+ return -ENOMEM;
+
+ dev->cleanup_work->dev = dev;
+ INIT_WORK(&dev->cleanup_work->work, netif_cleanup_work_fn);
+ return 0;
+}
+
+static void netif_free_cleanup_work(struct net_device *dev)
+{
+ if (!dev->cleanup_work)
+ return;
+
+ flush_work(&dev->cleanup_work->work);
+ kfree(dev->cleanup_work);
+ dev->cleanup_work = NULL;
+}
+
/**
* __netdev_notify_peers - notify network peers about existence of @dev,
* to be called when rtnl lock is already held.
@@ -1678,6 +1869,16 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
if (ops->ndo_validate_addr)
ret = ops->ndo_validate_addr(dev);
+ if (!ret && dev->needs_cleanup_work) {
+ if (!dev->cleanup_work)
+ ret = netif_alloc_cleanup_work(dev);
+ else
+ flush_work(&dev->cleanup_work->work);
+ }
+
+ if (!ret && ops->ndo_write_rx_mode)
+ ret = netif_alloc_rx_mode_ctx(dev);
+
if (!ret && ops->ndo_open)
ret = ops->ndo_open(dev);
@@ -1754,6 +1955,9 @@ static void __dev_close_many(struct list_head *head)
if (ops->ndo_stop)
ops->ndo_stop(dev);
+ if (dev->needs_cleanup_work)
+ schedule_work(&dev->cleanup_work->work);
+
netif_set_up(dev, false);
netpoll_poll_enable(dev);
}
@@ -9622,6 +9826,57 @@ int netif_set_allmulti(struct net_device *dev, int inc, bool notify)
return 0;
}
+/* netif_schedule_rx_mode_work - Sets up the rx_config snapshot and
+ * schedules the deferred I/O.
+ */
+static void __netif_schedule_rx_mode_work(struct net_device *dev)
+{
+ const struct net_device_ops *ops = dev->netdev_ops;
+
+ if (ops->ndo_set_rx_mode)
+ ops->ndo_set_rx_mode(dev);
+
+ if (!ops->ndo_write_rx_mode)
+ return;
+
+ /* This part is only for drivers that implement ndo_write_rx_mode */
+
+ /* If rx_mode set is to be skipped, we don't schedule the work */
+ if (netif_rx_mode_get_flag(dev, NETIF_RX_MODE_SET_SKIP))
+ return;
+
+ netif_prepare_rx_mode(dev);
+ schedule_work(&dev->rx_mode_ctx->work);
+}
+
+void netif_schedule_rx_mode_work(struct net_device *dev)
+{
+ if (WARN_ON(!netif_running(dev)))
+ return;
+
+ netif_addr_lock_bh(dev);
+ __netif_schedule_rx_mode_work(dev);
+ netif_addr_unlock_bh(dev);
+}
+EXPORT_SYMBOL(netif_schedule_rx_mode_work);
+
+/* Drivers that implement rx mode as work flush the work item when closing
+ * or suspending. This is the substitute for those calls.
+ */
+void netif_flush_rx_mode_work(struct net_device *dev)
+{
+ /* Calling this function with RTNL held will result in a deadlock. */
+ if (WARN_ON(rtnl_is_locked()))
+ return;
+
+ /* Doing nothing is enough to "flush" work on a closed interface */
+ if (!netif_running(dev))
+ return;
+
+ flush_work(&dev->rx_mode_ctx->work);
+}
+EXPORT_SYMBOL(netif_flush_rx_mode_work);
+
/*
* Upload unicast and multicast address lists to device and
* configure RX filtering. When the device doesn't support unicast
@@ -9630,8 +9885,6 @@ int netif_set_allmulti(struct net_device *dev, int inc, bool notify)
*/
void __dev_set_rx_mode(struct net_device *dev)
{
- const struct net_device_ops *ops = dev->netdev_ops;
-
/* dev_open will call this function so the list will stay sane. */
if (!(dev->flags&IFF_UP))
return;
@@ -9652,8 +9905,7 @@ void __dev_set_rx_mode(struct net_device *dev)
}
}
- if (ops->ndo_set_rx_mode)
- ops->ndo_set_rx_mode(dev);
+ __netif_schedule_rx_mode_work(dev);
}
void dev_set_rx_mode(struct net_device *dev)
@@ -11324,6 +11576,9 @@ int register_netdevice(struct net_device *dev)
}
}
+ if (dev->netdev_ops->ndo_write_rx_mode)
+ dev->needs_cleanup_work = true;
+
if (((dev->hw_features | dev->features) &
NETIF_F_HW_VLAN_CTAG_FILTER) &&
(!dev->netdev_ops->ndo_vlan_rx_add_vid ||
@@ -12067,6 +12322,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
dev->real_num_rx_queues = rxqs;
if (netif_alloc_rx_queues(dev))
goto free_all;
+
dev->ethtool = kzalloc(sizeof(*dev->ethtool), GFP_KERNEL_ACCOUNT);
if (!dev->ethtool)
goto free_all;
@@ -12150,6 +12406,7 @@ void free_netdev(struct net_device *dev)
kfree(dev->ethtool);
netif_free_tx_queues(dev);
netif_free_rx_queues(dev);
+ netif_free_cleanup_work(dev);
kfree(rcu_dereference_protected(dev->ingress_queue, 1));
--
2.47.3
* [PATCH net-next v8 2/6] virtio-net: Implement ndo_write_rx_mode callback
2026-01-12 18:16 [PATCH net-next v8 0/6] net: Split ndo_set_rx_mode into snapshot and deferred write I Viswanath
2026-01-12 18:16 ` [PATCH net-next v8 1/6] net: refactor set_rx_mode into snapshot and deferred I/O I Viswanath
@ 2026-01-12 18:16 ` I Viswanath
2026-01-12 18:16 ` [PATCH net-next v8 3/6] e1000: " I Viswanath
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: I Viswanath @ 2026-01-12 18:16 UTC (permalink / raw)
To: edumazet, horms, sdf, kuba, andrew+netdev, pabeni, jasowang,
eperezma, mst, xuanzhuo, przemyslaw.kitszel, anthony.l.nguyen,
ronak.doshi, pcnet32
Cc: bcm-kernel-feedback-list, intel-wired-lan, virtualization, netdev,
I Viswanath
Add the ndo_write_rx_mode callback and update the code to use the
rx_mode snapshot and deferred write model.
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
drivers/net/virtio_net.c | 61 +++++++++++++++++-----------------------
1 file changed, 26 insertions(+), 35 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 22d894101c01..1d0e5f6ceb88 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -460,9 +460,6 @@ struct virtnet_info {
/* Work struct for config space updates */
struct work_struct config_work;
- /* Work struct for setting rx mode */
- struct work_struct rx_mode_work;
-
/* OK to queue work setting RX mode? */
bool rx_mode_work_enabled;
@@ -3866,33 +3863,30 @@ static int virtnet_close(struct net_device *dev)
return 0;
}
-static void virtnet_rx_mode_work(struct work_struct *work)
+static void virtnet_write_rx_mode(struct net_device *dev)
{
- struct virtnet_info *vi =
- container_of(work, struct virtnet_info, rx_mode_work);
+ struct virtnet_info *vi = netdev_priv(dev);
u8 *promisc_allmulti __free(kfree) = NULL;
- struct net_device *dev = vi->dev;
struct scatterlist sg[2];
struct virtio_net_ctrl_mac *mac_data;
- struct netdev_hw_addr *ha;
+ char *ha_addr;
int uc_count;
int mc_count;
void *buf;
- int i;
+ int i, ni;
/* We can't dynamically set ndo_set_rx_mode, so return gracefully */
if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
return;
- promisc_allmulti = kzalloc(sizeof(*promisc_allmulti), GFP_KERNEL);
+ promisc_allmulti = kzalloc(sizeof(*promisc_allmulti), GFP_ATOMIC);
if (!promisc_allmulti) {
dev_warn(&dev->dev, "Failed to set RX mode, no memory.\n");
return;
}
- rtnl_lock();
-
- *promisc_allmulti = !!(dev->flags & IFF_PROMISC);
+ *promisc_allmulti = netif_rx_mode_get_cfg(dev,
+ NETIF_RX_MODE_CFG_PROMISC);
sg_init_one(sg, promisc_allmulti, sizeof(*promisc_allmulti));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
@@ -3900,7 +3894,8 @@ static void virtnet_rx_mode_work(struct work_struct *work)
dev_warn(&dev->dev, "Failed to %sable promisc mode.\n",
*promisc_allmulti ? "en" : "dis");
- *promisc_allmulti = !!(dev->flags & IFF_ALLMULTI);
+ *promisc_allmulti = netif_rx_mode_get_cfg(dev,
+ NETIF_RX_MODE_CFG_ALLMULTI);
sg_init_one(sg, promisc_allmulti, sizeof(*promisc_allmulti));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
@@ -3908,27 +3903,22 @@ static void virtnet_rx_mode_work(struct work_struct *work)
dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
*promisc_allmulti ? "en" : "dis");
- netif_addr_lock_bh(dev);
-
- uc_count = netdev_uc_count(dev);
- mc_count = netdev_mc_count(dev);
+ uc_count = netif_rx_mode_uc_count(dev);
+ mc_count = netif_rx_mode_mc_count(dev);
/* MAC filter - use one buffer for both lists */
buf = kzalloc(((uc_count + mc_count) * ETH_ALEN) +
(2 * sizeof(mac_data->entries)), GFP_ATOMIC);
mac_data = buf;
- if (!buf) {
- netif_addr_unlock_bh(dev);
- rtnl_unlock();
+ if (!buf)
return;
- }
sg_init_table(sg, 2);
/* Store the unicast list and count in the front of the buffer */
mac_data->entries = cpu_to_virtio32(vi->vdev, uc_count);
i = 0;
- netdev_for_each_uc_addr(ha, dev)
- memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
+ netif_rx_mode_for_each_uc_addr(ha_addr, dev, ni)
+ memcpy(&mac_data->macs[i++][0], ha_addr, ETH_ALEN);
sg_set_buf(&sg[0], mac_data,
sizeof(mac_data->entries) + (uc_count * ETH_ALEN));
@@ -3938,10 +3928,8 @@ static void virtnet_rx_mode_work(struct work_struct *work)
mac_data->entries = cpu_to_virtio32(vi->vdev, mc_count);
i = 0;
- netdev_for_each_mc_addr(ha, dev)
- memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
-
- netif_addr_unlock_bh(dev);
+ netif_rx_mode_for_each_mc_addr(ha_addr, dev, ni)
+ memcpy(&mac_data->macs[i++][0], ha_addr, ETH_ALEN);
sg_set_buf(&sg[1], mac_data,
sizeof(mac_data->entries) + (mc_count * ETH_ALEN));
@@ -3950,17 +3938,20 @@ static void virtnet_rx_mode_work(struct work_struct *work)
VIRTIO_NET_CTRL_MAC_TABLE_SET, sg))
dev_warn(&dev->dev, "Failed to set MAC filter table.\n");
- rtnl_unlock();
-
kfree(buf);
}
static void virtnet_set_rx_mode(struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
+ char cfg_disabled = !vi->rx_mode_work_enabled;
+ bool allmulti = !!(dev->flags & IFF_ALLMULTI);
+ bool promisc = !!(dev->flags & IFF_PROMISC);
+
+ netif_rx_mode_set_flag(dev, NETIF_RX_MODE_SET_SKIP, cfg_disabled);
- if (vi->rx_mode_work_enabled)
- schedule_work(&vi->rx_mode_work);
+ netif_rx_mode_set_cfg(dev, NETIF_RX_MODE_CFG_ALLMULTI, allmulti);
+ netif_rx_mode_set_cfg(dev, NETIF_RX_MODE_CFG_PROMISC, promisc);
}
static int virtnet_vlan_rx_add_vid(struct net_device *dev,
@@ -5776,7 +5767,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
/* Make sure no work handler is accessing the device */
flush_work(&vi->config_work);
disable_rx_mode_work(vi);
- flush_work(&vi->rx_mode_work);
+ netif_flush_rx_mode_work(vi->dev);
if (netif_running(vi->dev)) {
rtnl_lock();
@@ -6279,6 +6270,7 @@ static const struct net_device_ops virtnet_netdev = {
.ndo_validate_addr = eth_validate_addr,
.ndo_set_mac_address = virtnet_set_mac_address,
.ndo_set_rx_mode = virtnet_set_rx_mode,
+ .ndo_write_rx_mode = virtnet_write_rx_mode,
.ndo_get_stats64 = virtnet_stats,
.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
@@ -6900,7 +6892,6 @@ static int virtnet_probe(struct virtio_device *vdev)
vdev->priv = vi;
INIT_WORK(&vi->config_work, virtnet_config_changed_work);
- INIT_WORK(&vi->rx_mode_work, virtnet_rx_mode_work);
spin_lock_init(&vi->refill_lock);
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) {
@@ -7205,7 +7196,7 @@ static void virtnet_remove(struct virtio_device *vdev)
/* Make sure no work handler is accessing the device. */
flush_work(&vi->config_work);
disable_rx_mode_work(vi);
- flush_work(&vi->rx_mode_work);
+ netif_flush_rx_mode_work(vi->dev);
virtnet_free_irq_moder(vi);
--
2.47.3
* [PATCH net-next v8 3/6] e1000: Implement ndo_write_rx_mode callback
2026-01-12 18:16 [PATCH net-next v8 0/6] net: Split ndo_set_rx_mode into snapshot and deferred write I Viswanath
2026-01-12 18:16 ` [PATCH net-next v8 1/6] net: refactor set_rx_mode into snapshot and deferred I/O I Viswanath
2026-01-12 18:16 ` [PATCH net-next v8 2/6] virtio-net: Implement ndo_write_rx_mode callback I Viswanath
@ 2026-01-12 18:16 ` I Viswanath
2026-01-12 18:16 ` [PATCH net-next v8 4/6] 8139cp: " I Viswanath
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: I Viswanath @ 2026-01-12 18:16 UTC (permalink / raw)
To: edumazet, horms, sdf, kuba, andrew+netdev, pabeni, jasowang,
eperezma, mst, xuanzhuo, przemyslaw.kitszel, anthony.l.nguyen,
ronak.doshi, pcnet32
Cc: bcm-kernel-feedback-list, intel-wired-lan, virtualization, netdev,
I Viswanath
Add the ndo_write_rx_mode callback and update the code to use the
rx_mode snapshot and deferred write model.
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
The suspend callback was calling the set_rx_mode ndo even when the netif was
down. Since that wouldn't make sense in the new model, it now does so only
when the netif is up. Correct me if this is a mistake.
drivers/net/ethernet/intel/e1000/e1000_main.c | 59 ++++++++++++-------
1 file changed, 38 insertions(+), 21 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 7f078ec9c14c..3b0260d502d4 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -99,6 +99,7 @@ static void e1000_clean_tx_ring(struct e1000_adapter *adapter,
static void e1000_clean_rx_ring(struct e1000_adapter *adapter,
struct e1000_rx_ring *rx_ring);
static void e1000_set_rx_mode(struct net_device *netdev);
+static void e1000_write_rx_mode(struct net_device *netdev);
static void e1000_update_phy_info_task(struct work_struct *work);
static void e1000_watchdog(struct work_struct *work);
static void e1000_82547_tx_fifo_stall_task(struct work_struct *work);
@@ -359,7 +360,7 @@ static void e1000_configure(struct e1000_adapter *adapter)
struct net_device *netdev = adapter->netdev;
int i;
- e1000_set_rx_mode(netdev);
+ netif_schedule_rx_mode_work(netdev);
e1000_restore_vlan(adapter);
e1000_init_manageability(adapter);
@@ -823,6 +824,7 @@ static const struct net_device_ops e1000_netdev_ops = {
.ndo_stop = e1000_close,
.ndo_start_xmit = e1000_xmit_frame,
.ndo_set_rx_mode = e1000_set_rx_mode,
+ .ndo_write_rx_mode = e1000_write_rx_mode,
.ndo_set_mac_address = e1000_set_mac,
.ndo_tx_timeout = e1000_tx_timeout,
.ndo_change_mtu = e1000_change_mtu,
@@ -1827,7 +1829,7 @@ static void e1000_setup_rctl(struct e1000_adapter *adapter)
/* This is useful for sniffing bad packets. */
if (adapter->netdev->features & NETIF_F_RXALL) {
/* UPE and MPE will be handled by normal PROMISC logic
- * in e1000e_set_rx_mode
+ * in e1000_write_rx_mode
*/
rctl |= (E1000_RCTL_SBP | /* Receive bad packets */
E1000_RCTL_BAM | /* RX All Bcast Pkts */
@@ -2222,26 +2224,39 @@ static int e1000_set_mac(struct net_device *netdev, void *p)
return 0;
}
+static void e1000_set_rx_mode(struct net_device *netdev)
+{
+ struct e1000_adapter *adapter = netdev_priv(netdev);
+
+ bool allmulti = !!(netdev->flags & IFF_ALLMULTI);
+ bool promisc = !!(netdev->flags & IFF_PROMISC);
+ bool vlan = e1000_vlan_used(adapter);
+
+ netif_rx_mode_set_flag(netdev, NETIF_RX_MODE_UC_SKIP, promisc);
+
+ netif_rx_mode_set_cfg(netdev, NETIF_RX_MODE_CFG_ALLMULTI, allmulti);
+ netif_rx_mode_set_cfg(netdev, NETIF_RX_MODE_CFG_PROMISC, promisc);
+ netif_rx_mode_set_cfg(netdev, NETIF_RX_MODE_CFG_VLAN, vlan);
+}
+
/**
- * e1000_set_rx_mode - Secondary Unicast, Multicast and Promiscuous mode set
+ * e1000_write_rx_mode - Secondary Unicast, Multicast and Promiscuous mode set
* @netdev: network interface device structure
*
- * The set_rx_mode entry point is called whenever the unicast or multicast
- * address lists or the network interface flags are updated. This routine is
- * responsible for configuring the hardware for proper unicast, multicast,
- * promiscuous mode, and all-multi behavior.
+ * This routine is responsible for configuring the hardware for proper unicast,
+ * multicast, promiscuous mode, and all-multi behavior.
**/
-static void e1000_set_rx_mode(struct net_device *netdev)
+static void e1000_write_rx_mode(struct net_device *netdev)
{
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
- struct netdev_hw_addr *ha;
bool use_uc = false;
u32 rctl;
u32 hash_value;
- int i, rar_entries = E1000_RAR_ENTRIES;
+ int i, rar_entries = E1000_RAR_ENTRIES, ni;
int mta_reg_count = E1000_NUM_MTA_REGISTERS;
u32 *mcarray = kcalloc(mta_reg_count, sizeof(u32), GFP_ATOMIC);
+ char *ha_addr;
if (!mcarray)
return;
@@ -2250,22 +2265,22 @@ static void e1000_set_rx_mode(struct net_device *netdev)
rctl = er32(RCTL);
- if (netdev->flags & IFF_PROMISC) {
+ if (netif_rx_mode_get_cfg(netdev, NETIF_RX_MODE_CFG_PROMISC)) {
rctl |= (E1000_RCTL_UPE | E1000_RCTL_MPE);
rctl &= ~E1000_RCTL_VFE;
} else {
- if (netdev->flags & IFF_ALLMULTI)
+ if (netif_rx_mode_get_cfg(netdev, NETIF_RX_MODE_CFG_ALLMULTI))
rctl |= E1000_RCTL_MPE;
else
rctl &= ~E1000_RCTL_MPE;
/* Enable VLAN filter if there is a VLAN */
- if (e1000_vlan_used(adapter))
+ if (netif_rx_mode_get_cfg(netdev, NETIF_RX_MODE_CFG_VLAN))
rctl |= E1000_RCTL_VFE;
}
- if (netdev_uc_count(netdev) > rar_entries - 1) {
+ if (netif_rx_mode_uc_count(netdev) > rar_entries - 1) {
rctl |= E1000_RCTL_UPE;
- } else if (!(netdev->flags & IFF_PROMISC)) {
+ } else if (!netif_rx_mode_get_cfg(netdev, NETIF_RX_MODE_CFG_PROMISC)) {
rctl &= ~E1000_RCTL_UPE;
use_uc = true;
}
@@ -2286,23 +2301,23 @@ static void e1000_set_rx_mode(struct net_device *netdev)
*/
i = 1;
if (use_uc)
- netdev_for_each_uc_addr(ha, netdev) {
+ netif_rx_mode_for_each_uc_addr(ha_addr, netdev, ni) {
if (i == rar_entries)
break;
- e1000_rar_set(hw, ha->addr, i++);
+ e1000_rar_set(hw, ha_addr, i++);
}
- netdev_for_each_mc_addr(ha, netdev) {
+ netif_rx_mode_for_each_mc_addr(ha_addr, netdev, ni) {
if (i == rar_entries) {
/* load any remaining addresses into the hash table */
u32 hash_reg, hash_bit, mta;
- hash_value = e1000_hash_mc_addr(hw, ha->addr);
+ hash_value = e1000_hash_mc_addr(hw, ha_addr);
hash_reg = (hash_value >> 5) & 0x7F;
hash_bit = hash_value & 0x1F;
mta = (1 << hash_bit);
mcarray[hash_reg] |= mta;
} else {
- e1000_rar_set(hw, ha->addr, i++);
+ e1000_rar_set(hw, ha_addr, i++);
}
}
@@ -5094,7 +5109,9 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool *enable_wake)
if (wufc) {
e1000_setup_rctl(adapter);
- e1000_set_rx_mode(netdev);
+
+ if (netif_running(netdev))
+ netif_schedule_rx_mode_work(netdev);
rctl = er32(RCTL);
--
2.47.3
* [PATCH net-next v8 4/6] 8139cp: Implement ndo_write_rx_mode callback
From: I Viswanath @ 2026-01-12 18:16 UTC (permalink / raw)
To: edumazet, horms, sdf, kuba, andrew+netdev, pabeni, jasowang,
eperezma, mst, xuanzhuo, przemyslaw.kitszel, anthony.l.nguyen,
ronak.doshi, pcnet32
Cc: bcm-kernel-feedback-list, intel-wired-lan, virtualization, netdev,
I Viswanath
Add the callback and update the code to use the rx_mode snapshot and
deferred write model.
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
In the old cp_set_rx_mode(), cp->lock protected access to the registers
at RxConfig, MAR0 and MAR0+4. The lock was probably meant to serialize
concurrent calls to cp_set_rx_mode(), as these registers were accessed
exclusively by __cp_set_rx_mode().
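To illustrate why dropping the lock around the register writes is safe, here is a toy userspace model of the snapshot/deferred-write split (none of these names are real kernel symbols): the snapshot is updated in a short critical section, and the single work item is the only writer of the "registers", so the register accesses themselves need no lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not driver code. Under the new scheme set_rx_mode() only
 * records a snapshot (netif_addr_lock would be held there, modelled by
 * comments), and the lone work item later writes the "registers" with
 * no lock at all -- which is why cp->lock can be dropped around the
 * RxConfig/MAR0 accesses. */
static int snapshot;        /* rx_mode snapshot, e.g. flag bits */
static bool work_pending;
static int fake_rx_config;  /* stands in for RxConfig/MAR0 */

static void set_rx_mode(int flags)
{
	/* netif_addr_lock would be held here (atomic, must not sleep) */
	snapshot = flags;   /* latest request wins */
	work_pending = true;
}

static void write_rx_mode_work(void)
{
	/* take the snapshot under the lock ... */
	int snap = snapshot;

	work_pending = false;
	/* ... then do the (possibly sleeping) register write unlocked:
	 * this work item is the only writer of fake_rx_config */
	fake_rx_config = snap;
}
```

Two back-to-back set_rx_mode() calls before the work runs collapse into one hardware write of the most recent snapshot, matching the "only the most recent request is executed" behaviour of the series.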
drivers/net/ethernet/realtek/8139cp.c | 33 +++++++++++++++------------
1 file changed, 18 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
index 5652da8a178c..ab0395640305 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -372,7 +372,6 @@ struct cp_private {
} while (0)
-static void __cp_set_rx_mode (struct net_device *dev);
static void cp_tx (struct cp_private *cp);
static void cp_clean_rings (struct cp_private *cp);
#ifdef CONFIG_NET_POLL_CONTROLLER
@@ -885,30 +884,31 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
/* Set or clear the multicast filter for this adaptor.
This routine is not state sensitive and need not be SMP locked. */
-static void __cp_set_rx_mode (struct net_device *dev)
+static void cp_write_rx_mode(struct net_device *dev)
{
struct cp_private *cp = netdev_priv(dev);
u32 mc_filter[2]; /* Multicast hash filter */
+ char *ha_addr;
int rx_mode;
+ int ni;
/* Note: do not reorder, GCC is clever about common statements. */
- if (dev->flags & IFF_PROMISC) {
+ if (netif_rx_mode_get_cfg(dev, NETIF_RX_MODE_CFG_PROMISC)) {
/* Unconditionally log net taps. */
rx_mode =
AcceptBroadcast | AcceptMulticast | AcceptMyPhys |
AcceptAllPhys;
mc_filter[1] = mc_filter[0] = 0xffffffff;
- } else if ((netdev_mc_count(dev) > multicast_filter_limit) ||
- (dev->flags & IFF_ALLMULTI)) {
+ } else if ((netif_rx_mode_mc_count(dev) > multicast_filter_limit) ||
+ netif_rx_mode_get_cfg(dev, NETIF_RX_MODE_CFG_ALLMULTI)) {
/* Too many to filter perfectly -- accept all multicasts. */
rx_mode = AcceptBroadcast | AcceptMulticast | AcceptMyPhys;
mc_filter[1] = mc_filter[0] = 0xffffffff;
} else {
- struct netdev_hw_addr *ha;
rx_mode = AcceptBroadcast | AcceptMyPhys;
mc_filter[1] = mc_filter[0] = 0;
- netdev_for_each_mc_addr(ha, dev) {
- int bit_nr = ether_crc(ETH_ALEN, ha->addr) >> 26;
+ netif_rx_mode_for_each_mc_addr(ha_addr, dev, ni) {
+ int bit_nr = ether_crc(ETH_ALEN, ha_addr) >> 26;
mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31);
rx_mode |= AcceptMulticast;
@@ -925,12 +925,14 @@ static void __cp_set_rx_mode (struct net_device *dev)
static void cp_set_rx_mode (struct net_device *dev)
{
- unsigned long flags;
- struct cp_private *cp = netdev_priv(dev);
+ bool allmulti = !!(dev->flags & IFF_ALLMULTI);
+ bool promisc = !!(dev->flags & IFF_PROMISC);
- spin_lock_irqsave (&cp->lock, flags);
- __cp_set_rx_mode(dev);
- spin_unlock_irqrestore (&cp->lock, flags);
+ netif_rx_mode_set_flag(dev, NETIF_RX_MODE_UC_SKIP, true);
+ netif_rx_mode_set_flag(dev, NETIF_RX_MODE_MC_SKIP, promisc | allmulti);
+
+ netif_rx_mode_set_cfg(dev, NETIF_RX_MODE_CFG_ALLMULTI, allmulti);
+ netif_rx_mode_set_cfg(dev, NETIF_RX_MODE_CFG_PROMISC, promisc);
}
static void __cp_get_stats(struct cp_private *cp)
@@ -1040,7 +1042,7 @@ static void cp_init_hw (struct cp_private *cp)
cp_start_hw(cp);
cpw8(TxThresh, 0x06); /* XXX convert magic num to a constant */
- __cp_set_rx_mode(dev);
+ netif_schedule_rx_mode_work(dev);
cpw32_f (TxConfig, IFG | (TX_DMA_BURST << TxDMAShift));
cpw8(Config1, cpr8(Config1) | DriverLoaded | PMEnable);
@@ -1262,7 +1264,7 @@ static void cp_tx_timeout(struct net_device *dev, unsigned int txqueue)
cp_clean_rings(cp);
cp_init_rings(cp);
cp_start_hw(cp);
- __cp_set_rx_mode(dev);
+ netif_schedule_rx_mode_work(dev);
cpw16_f(IntrMask, cp_norx_intr_mask);
netif_wake_queue(dev);
@@ -1870,6 +1872,7 @@ static const struct net_device_ops cp_netdev_ops = {
.ndo_validate_addr = eth_validate_addr,
.ndo_set_mac_address = cp_set_mac_address,
.ndo_set_rx_mode = cp_set_rx_mode,
+ .ndo_write_rx_mode = cp_write_rx_mode,
.ndo_get_stats = cp_get_stats,
.ndo_eth_ioctl = cp_ioctl,
.ndo_start_xmit = cp_start_xmit,
--
2.47.3
* [PATCH net-next v8 5/6] vmxnet3: Implement ndo_write_rx_mode callback
From: I Viswanath @ 2026-01-12 18:16 UTC (permalink / raw)
To: edumazet, horms, sdf, kuba, andrew+netdev, pabeni, jasowang,
eperezma, mst, xuanzhuo, przemyslaw.kitszel, anthony.l.nguyen,
ronak.doshi, pcnet32
Cc: bcm-kernel-feedback-list, intel-wired-lan, virtualization, netdev,
I Viswanath
Add the callback and update the code to use the rx_mode snapshot and
deferred write model.
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
drivers/net/vmxnet3/vmxnet3_drv.c | 38 ++++++++++++++++++++++---------
1 file changed, 27 insertions(+), 11 deletions(-)
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 0572f6a9bdb6..fe76f6a2afea 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -2775,18 +2775,18 @@ static u8 *
vmxnet3_copy_mc(struct net_device *netdev)
{
u8 *buf = NULL;
- u32 sz = netdev_mc_count(netdev) * ETH_ALEN;
+ u32 sz = netif_rx_mode_mc_count(netdev) * ETH_ALEN;
+ char *ha_addr;
+ int ni;
/* struct Vmxnet3_RxFilterConf.mfTableLen is u16. */
if (sz <= 0xffff) {
/* We may be called with BH disabled */
buf = kmalloc(sz, GFP_ATOMIC);
if (buf) {
- struct netdev_hw_addr *ha;
int i = 0;
-
- netdev_for_each_mc_addr(ha, netdev)
- memcpy(buf + i++ * ETH_ALEN, ha->addr,
+ netif_rx_mode_for_each_mc_addr(ha_addr, netdev, ni)
+ memcpy(buf + i++ * ETH_ALEN, ha_addr,
ETH_ALEN);
}
}
@@ -2796,8 +2796,23 @@ vmxnet3_copy_mc(struct net_device *netdev)
static void
vmxnet3_set_mc(struct net_device *netdev)
+{
+ bool allmulti = !!(netdev->flags & IFF_ALLMULTI);
+ bool promisc = !!(netdev->flags & IFF_PROMISC);
+ bool broadcast = !!(netdev->flags & IFF_BROADCAST);
+
+ netif_rx_mode_set_flag(netdev, NETIF_RX_MODE_UC_SKIP, true);
+ netif_rx_mode_set_flag(netdev, NETIF_RX_MODE_MC_SKIP, allmulti);
+
+ netif_rx_mode_set_cfg(netdev, NETIF_RX_MODE_CFG_ALLMULTI, allmulti);
+ netif_rx_mode_set_cfg(netdev, NETIF_RX_MODE_CFG_PROMISC, promisc);
+ netif_rx_mode_set_cfg(netdev, NETIF_RX_MODE_CFG_BROADCAST, broadcast);
+}
+
+static void vmxnet3_write_mc(struct net_device *netdev)
{
struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+ int mc_count = netif_rx_mode_mc_count(netdev);
unsigned long flags;
struct Vmxnet3_RxFilterConf *rxConf =
&adapter->shared->devRead.rxFilterConf;
@@ -2806,7 +2821,7 @@ vmxnet3_set_mc(struct net_device *netdev)
bool new_table_pa_valid = false;
u32 new_mode = VMXNET3_RXM_UCAST;
- if (netdev->flags & IFF_PROMISC) {
+ if (netif_rx_mode_get_cfg(netdev, NETIF_RX_MODE_CFG_PROMISC)) {
u32 *vfTable = adapter->shared->devRead.rxFilterConf.vfTable;
memset(vfTable, 0, VMXNET3_VFT_SIZE * sizeof(*vfTable));
@@ -2815,16 +2830,16 @@ vmxnet3_set_mc(struct net_device *netdev)
vmxnet3_restore_vlan(adapter);
}
- if (netdev->flags & IFF_BROADCAST)
+ if (netif_rx_mode_get_cfg(netdev, NETIF_RX_MODE_CFG_BROADCAST))
new_mode |= VMXNET3_RXM_BCAST;
- if (netdev->flags & IFF_ALLMULTI)
+ if (netif_rx_mode_get_cfg(netdev, NETIF_RX_MODE_CFG_ALLMULTI))
new_mode |= VMXNET3_RXM_ALL_MULTI;
else
- if (!netdev_mc_empty(netdev)) {
+ if (mc_count) {
new_table = vmxnet3_copy_mc(netdev);
if (new_table) {
- size_t sz = netdev_mc_count(netdev) * ETH_ALEN;
+ size_t sz = mc_count * ETH_ALEN;
rxConf->mfTableLen = cpu_to_le16(sz);
new_table_pa = dma_map_single(
@@ -3213,7 +3228,7 @@ vmxnet3_activate_dev(struct vmxnet3_adapter *adapter)
}
/* Apply the rx filter settins last. */
- vmxnet3_set_mc(adapter->netdev);
+ netif_schedule_rx_mode_work(adapter->netdev);
/*
* Check link state when first activating device. It will start the
@@ -3977,6 +3992,7 @@ vmxnet3_probe_device(struct pci_dev *pdev,
.ndo_get_stats64 = vmxnet3_get_stats64,
.ndo_tx_timeout = vmxnet3_tx_timeout,
.ndo_set_rx_mode = vmxnet3_set_mc,
+ .ndo_write_rx_mode = vmxnet3_write_mc,
.ndo_vlan_rx_add_vid = vmxnet3_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = vmxnet3_vlan_rx_kill_vid,
#ifdef CONFIG_NET_POLL_CONTROLLER
--
2.47.3
* [PATCH net-next v8 6/6] pcnet32: Implement ndo_write_rx_mode callback
From: I Viswanath @ 2026-01-12 18:16 UTC (permalink / raw)
To: edumazet, horms, sdf, kuba, andrew+netdev, pabeni, jasowang,
eperezma, mst, xuanzhuo, przemyslaw.kitszel, anthony.l.nguyen,
ronak.doshi, pcnet32
Cc: bcm-kernel-feedback-list, intel-wired-lan, virtualization, netdev,
I Viswanath
Add the callback and update the code to use the rx_mode snapshot and
deferred write model.
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
This is a very weird driver in that ndo_open calls pcnet32_load_multicast()
directly to set up the mc filter registers instead of going through the
set_rx_mode callback. I can't find a single other driver that does that.
Apart from that, I don't think it makes sense for the (now) write_rx_mode
callback to call netif_wake_queue(). Correct me if I am wrong here.
drivers/net/ethernet/amd/pcnet32.c | 57 ++++++++++++++++++++++--------
1 file changed, 43 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/amd/pcnet32.c b/drivers/net/ethernet/amd/pcnet32.c
index 9eaefa0f5e80..8bb0bb3da789 100644
--- a/drivers/net/ethernet/amd/pcnet32.c
+++ b/drivers/net/ethernet/amd/pcnet32.c
@@ -314,8 +314,9 @@ static void pcnet32_tx_timeout(struct net_device *dev, unsigned int txqueue);
static irqreturn_t pcnet32_interrupt(int, void *);
static int pcnet32_close(struct net_device *);
static struct net_device_stats *pcnet32_get_stats(struct net_device *);
-static void pcnet32_load_multicast(struct net_device *dev);
+static void pcnet32_load_multicast(struct net_device *dev, bool is_open);
static void pcnet32_set_multicast_list(struct net_device *);
+static void pcnet32_write_multicast_list(struct net_device *);
static int pcnet32_ioctl(struct net_device *, struct ifreq *, int);
static void pcnet32_watchdog(struct timer_list *);
static int mdio_read(struct net_device *dev, int phy_id, int reg_num);
@@ -1580,6 +1581,7 @@ static const struct net_device_ops pcnet32_netdev_ops = {
.ndo_tx_timeout = pcnet32_tx_timeout,
.ndo_get_stats = pcnet32_get_stats,
.ndo_set_rx_mode = pcnet32_set_multicast_list,
+ .ndo_write_rx_mode = pcnet32_write_multicast_list,
.ndo_eth_ioctl = pcnet32_ioctl,
.ndo_set_mac_address = eth_mac_addr,
.ndo_validate_addr = eth_validate_addr,
@@ -2264,7 +2266,7 @@ static int pcnet32_open(struct net_device *dev)
lp->init_block->mode =
cpu_to_le16((lp->options & PCNET32_PORT_PORTSEL) << 7);
- pcnet32_load_multicast(dev);
+ pcnet32_load_multicast(dev, true);
if (pcnet32_init_ring(dev)) {
rc = -ENOMEM;
@@ -2680,18 +2682,26 @@ static struct net_device_stats *pcnet32_get_stats(struct net_device *dev)
}
/* taken from the sunlance driver, which it took from the depca driver */
-static void pcnet32_load_multicast(struct net_device *dev)
+static void pcnet32_load_multicast(struct net_device *dev, bool is_open)
{
struct pcnet32_private *lp = netdev_priv(dev);
volatile struct pcnet32_init_block *ib = lp->init_block;
volatile __le16 *mcast_table = (__le16 *)ib->filter;
struct netdev_hw_addr *ha;
+ char *ha_addr;
+ bool allmulti;
unsigned long ioaddr = dev->base_addr;
- int i;
+ int i, ni;
u32 crc;
+ if (is_open)
+ allmulti = dev->flags & IFF_ALLMULTI;
+ else
+ allmulti = netif_rx_mode_get_cfg(dev,
+ NETIF_RX_MODE_CFG_ALLMULTI);
+
/* set all multicast bits */
- if (dev->flags & IFF_ALLMULTI) {
+ if (allmulti) {
ib->filter[0] = cpu_to_le32(~0U);
ib->filter[1] = cpu_to_le32(~0U);
lp->a->write_csr(ioaddr, PCNET32_MC_FILTER, 0xffff);
@@ -2705,20 +2715,40 @@ static void pcnet32_load_multicast(struct net_device *dev)
ib->filter[1] = 0;
/* Add addresses */
- netdev_for_each_mc_addr(ha, dev) {
- crc = ether_crc_le(6, ha->addr);
- crc = crc >> 26;
- mcast_table[crc >> 4] |= cpu_to_le16(1 << (crc & 0xf));
- }
+ if (is_open)
+ netdev_for_each_mc_addr(ha, dev) {
+ crc = ether_crc_le(6, ha->addr);
+ crc = crc >> 26;
+ mcast_table[crc >> 4] |= cpu_to_le16(1 << (crc & 0xf));
+ }
+ else
+ netif_rx_mode_for_each_mc_addr(ha_addr, dev, ni) {
+ crc = ether_crc_le(6, ha_addr);
+ crc = crc >> 26;
+ mcast_table[crc >> 4] |= cpu_to_le16(1 << (crc & 0xf));
+ }
+
for (i = 0; i < 4; i++)
lp->a->write_csr(ioaddr, PCNET32_MC_FILTER + i,
le16_to_cpu(mcast_table[i]));
}
+static void pcnet32_set_multicast_list(struct net_device *dev)
+{
+ bool allmulti = !!(dev->flags & IFF_ALLMULTI);
+ bool promisc = !!(dev->flags & IFF_PROMISC);
+
+ netif_rx_mode_set_flag(dev, NETIF_RX_MODE_UC_SKIP, true);
+ netif_rx_mode_set_flag(dev, NETIF_RX_MODE_MC_SKIP, promisc | allmulti);
+
+ netif_rx_mode_set_cfg(dev, NETIF_RX_MODE_CFG_ALLMULTI, allmulti);
+ netif_rx_mode_set_cfg(dev, NETIF_RX_MODE_CFG_PROMISC, promisc);
+}
+
/*
* Set or clear the multicast filter for this adaptor.
*/
-static void pcnet32_set_multicast_list(struct net_device *dev)
+static void pcnet32_write_multicast_list(struct net_device *dev)
{
unsigned long ioaddr = dev->base_addr, flags;
struct pcnet32_private *lp = netdev_priv(dev);
@@ -2727,7 +2757,7 @@ static void pcnet32_set_multicast_list(struct net_device *dev)
spin_lock_irqsave(&lp->lock, flags);
suspended = pcnet32_suspend(dev, &flags, 0);
csr15 = lp->a->read_csr(ioaddr, CSR15);
- if (dev->flags & IFF_PROMISC) {
+ if (netif_rx_mode_get_cfg(dev, NETIF_RX_MODE_CFG_PROMISC)) {
/* Log any net taps. */
netif_info(lp, hw, dev, "Promiscuous mode enabled\n");
lp->init_block->mode =
@@ -2738,7 +2768,7 @@ static void pcnet32_set_multicast_list(struct net_device *dev)
lp->init_block->mode =
cpu_to_le16((lp->options & PCNET32_PORT_PORTSEL) << 7);
lp->a->write_csr(ioaddr, CSR15, csr15 & 0x7fff);
- pcnet32_load_multicast(dev);
+ pcnet32_load_multicast(dev, false);
}
if (suspended) {
@@ -2746,7 +2776,6 @@ static void pcnet32_set_multicast_list(struct net_device *dev)
} else {
lp->a->write_csr(ioaddr, CSR0, CSR0_STOP);
pcnet32_restart(dev, CSR0_NORMAL);
- netif_wake_queue(dev);
}
spin_unlock_irqrestore(&lp->lock, flags);
--
2.47.3
* Re: [PATCH net-next v8 0/6] net: Split ndo_set_rx_mode into snapshot and deferred write
From: I Viswanath @ 2026-01-16 5:58 UTC (permalink / raw)
To: edumazet, horms, sdf, kuba, andrew+netdev, pabeni, jasowang,
eperezma, mst, xuanzhuo, przemyslaw.kitszel, anthony.l.nguyen,
ronak.doshi, pcnet32
Cc: bcm-kernel-feedback-list, intel-wired-lan, virtualization, netdev
Patch 1:

> If netif_alloc_rx_mode_ctx() succeeds but ndo_open() subsequently fails,
> does this leak the rx_mode_ctx allocation? The error path only clears
> __LINK_STATE_START but does not appear to free the rx_mode_ctx.

(Yes, there is a leak.)

> Would it make sense to add netif_free_rx_mode_ctx(dev) to the error path,
> or perhaps check if dev->rx_mode_ctx is already allocated before calling
> netif_alloc_rx_mode_ctx()?
This framework should accommodate future NDOs that require deferred work.
Therefore, the best course of action would be to schedule the cleanup
work. If we reuse it, we would have a memory leak in case __dev_open()
never succeeds, as the cleanup is in __dev_close_many().
Does it make sense to move
+ if (!ret && dev->needs_cleanup_work) {
+ if (!dev->cleanup_work)
+ ret = netif_alloc_cleanup_work(dev);
+ else
+ flush_work(&dev->cleanup_work->work);
+ }
+
+ if (!ret && ops->ndo_write_rx_mode)
+ ret = netif_alloc_rx_mode_ctx(dev);
+
to a new function netif_alloc_deferred_ctx() and rename
netif_cleanup_work_fn() to netif_free_deferred_ctx()?
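For what it's worth, that factoring could look roughly like the sketch below. This is a compilable userspace model, not kernel code: the struct fields and the helper name are stand-ins modelled on the quoted hunk, and -12 stands in for -ENOMEM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures under discussion;
 * real code would use struct net_device / struct net_device_ops. */
struct cleanup_work { int dummy; };
struct rx_mode_ctx { int dummy; };

struct net_device_stub {
	bool needs_cleanup_work;
	struct cleanup_work *cleanup_work;
	struct rx_mode_ctx *rx_mode_ctx;
	bool has_write_rx_mode;  /* models ops->ndo_write_rx_mode != NULL */
};

/* Proposed netif_alloc_deferred_ctx(): the two hunks from the quoted
 * diff folded into one helper. Idempotent, so __dev_open() can call it
 * again after an earlier failure without leaking or double-allocating. */
static int netif_alloc_deferred_ctx(struct net_device_stub *dev)
{
	if (dev->needs_cleanup_work && !dev->cleanup_work) {
		dev->cleanup_work = calloc(1, sizeof(*dev->cleanup_work));
		if (!dev->cleanup_work)
			return -12; /* -ENOMEM */
	}
	if (dev->has_write_rx_mode && !dev->rx_mode_ctx) {
		dev->rx_mode_ctx = calloc(1, sizeof(*dev->rx_mode_ctx));
		if (!dev->rx_mode_ctx)
			return -12; /* -ENOMEM */
	}
	return 0;
}
```

The matching netif_free_deferred_ctx() would then undo both allocations in one place, whichever path (error or close) reaches it. The flush_work() from the original hunk is elided here since the model has no work items.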
Patch 3:

First of all, does it make sense to call e1000_set_rx_mode() when the
netif is down?

Second of all, I am not correctly handling the cases where I/O should be
illegal but the netif is still up.

For this, I am thinking of adding netif_enable_deferred_ctx() and
netif_disable_deferred_ctx().
netif_disable_deferred_ctx() will be called in the PM suspend callbacks
and in the PCI shutdown callback, while netif_enable_deferred_ctx() will
be called in the PM resume callbacks.

I know this will be a lot of work, but it is a one-time thing that other
deferred-work NDOs can use for free.
Correct me if I have missed any cases.
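To make the intended suspend/resume gating concrete, here is a toy userspace model of that proposal (only the netif_enable/disable_deferred_ctx names come from the paragraph above; everything else is invented): while the context is disabled, scheduling becomes a no-op, so no deferred I/O can race with suspend or shutdown.

```c
#include <stdbool.h>

/* Toy model of the proposed enable/disable gate for deferred-work
 * contexts; not real kernel code. */
static bool deferred_ctx_enabled = true;
static bool work_scheduled;
static int hw_writes;  /* counts simulated register writes */

static void netif_disable_deferred_ctx(void)
{
	deferred_ctx_enabled = false;
	work_scheduled = false;  /* models cancel_work_sync() */
}

static void netif_enable_deferred_ctx(void)
{
	deferred_ctx_enabled = true;
}

static void netif_schedule_rx_mode_work(void)
{
	if (deferred_ctx_enabled)  /* I/O is illegal while disabled */
		work_scheduled = true;
}

static void run_pending_work(void)
{
	if (work_scheduled) {
		hw_writes++;  /* the sleepable ndo_write_rx_mode() */
		work_scheduled = false;
	}
}
```

Requests issued between disable and enable are simply dropped in this model; a real implementation might instead reschedule the work on resume so the latest snapshot still reaches the hardware.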
Patch 6:
This was stupid on my part. I will add back netif_wake_queue(dev) in
the next version.