public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v7 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write
@ 2026-01-02 18:05 I Viswanath
  2026-01-02 18:05 ` [PATCH net-next v7 1/2] net: refactor set_rx_mode into snapshot and deferred I/O I Viswanath
  2026-01-02 18:05 ` [PATCH net-next v7 2/2] virtio-net: Implement ndo_write_rx_mode callback I Viswanath
  0 siblings, 2 replies; 6+ messages in thread
From: I Viswanath @ 2026-01-02 18:05 UTC (permalink / raw)
  To: edumazet, andrew+netdev, horms, kuba, pabeni, mst, eperezma,
	jasowang, xuanzhuo
  Cc: netdev, virtualization, linux-kernel, I Viswanath

This is an implementation of the idea provided by Jakub here

https://lore.kernel.org/netdev/20250923163727.5e97abdb@kernel.org/

ndo_set_rx_mode is problematic because it cannot sleep. 

To address this, this series proposes dividing the act of setting
rx_mode into two stages: snapshot and deferred I/O. To achieve this, we
change the semantics of set_rx_mode and add a new ndo, write_rx_mode.

The new set_rx_mode is responsible for customizing the rx_mode
snapshot, which write_rx_mode then uses to update the hardware.

In brief, the new flow will look something like:

set_rx_mode():
    ndo_set_rx_mode();
    prepare_rx_mode();

write_rx_mode():
    use_snapshot();
    ndo_write_rx_mode();

write_rx_mode() is called from a work item and doesn't hold the 
netif_addr_lock spin lock during ndo_write_rx_mode() making it sleepable
in that section.

This model should work correctly if the following conditions hold:

1. write_rx_mode() should use the rx_mode set by the most recent
   call to prepare_rx_mode() before its execution.

2. If a prepare_rx_mode() call happens during the execution of
   write_rx_mode(), write_rx_mode() should be rescheduled.

3. All calls that modify rx_mode should pass through the
   prepare_rx_mode() + schedule write_rx_mode() flow.
   netif_schedule_rx_mode_work() has been implemented in core for
   this purpose.

Conditions 1 and 2 are implemented in core.

Drivers need to ensure condition 3 by using netif_schedule_rx_mode_work().

To use this model, a driver needs to implement the
ndo_write_rx_mode callback, change its set_rx_mode callback
appropriately, and replace all calls that modify rx mode with
netif_schedule_rx_mode_work().

Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---

In v5, apart from the bug with netif_flush_rx_mode_work(),
netif_free_rx_mode_ctx() was problematic because it needed to cancel
and wait for the work to complete before freeing memory.

The problem was that the work needed to grab the RTNL lock while the
RTNL lock was already held, as this function was part of dev_close().

This means we are guaranteed a deadlock if the work was pending.

Cancelling the work should be done in a context that doesn't hold the
RTNL lock. The only existing function in the teardown path that ran in
such a context was free_netdev(), and it isn't ideal to do the cleanup
there.

My solution is to introduce a new struct netif_cleanup_work and a new
net_device member, cleanup_work. I am not sure if there is a better
solution than this.

cleanup_work is a work item scheduled by dev_close() that executes the
cleanup functions that need an RTNL-free context.

v1:
Link: https://lore.kernel.org/netdev/20251020134857.5820-1-viswanathiyyappan@gmail.com/

v2:
- Exported set_and_schedule_rx_config as a symbol for use in modules
- Fixed incorrect cleanup for the case of rx_work alloc failing in alloc_netdev_mqs
- Removed the locked version (cp_set_rx_mode) and renamed __cp_set_rx_mode to cp_set_rx_mode
Link: https://lore.kernel.org/netdev/20251026175445.1519537-1-viswanathiyyappan@gmail.com/

v3:
- Added RFT tag
- Corrected mangled patch
Link: https://lore.kernel.org/netdev/20251028174222.1739954-1-viswanathiyyappan@gmail.com/

v4:
- Completely reworked the snapshot mechanism as per v3 comments
- Implemented the callback for virtio-net instead of 8139cp driver
- Removed RFC tag
Link: https://lore.kernel.org/netdev/20251118164333.24842-1-viswanathiyyappan@gmail.com/

v5:
- Fix broken code and titles
- Remove RFT tag
Link: https://lore.kernel.org/netdev/20251120141354.355059-1-viswanathiyyappan@gmail.com/

v6:
- Added struct netif_deferred_work_cleanup and members needs_deferred_cleanup and deferred_work_cleanup in net_device
- Moved out ctrl bits from netif_rx_mode_config to netif_rx_mode_work_ctx
Link: https://lore.kernel.org/netdev/20251227174225.699975-1-viswanathiyyappan@gmail.com/

v7:
- Improved function, enum and struct names

I Viswanath (2):
  net: refactor set_rx_mode into snapshot and deferred I/O
  virtio-net: Implement ndo_write_rx_mode callback

 drivers/net/virtio_net.c  |  55 +++-----
 include/linux/netdevice.h | 111 +++++++++++++++-
 net/core/dev.c            | 264 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 389 insertions(+), 41 deletions(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net-next v7 1/2] net: refactor set_rx_mode into snapshot and deferred I/O
  2026-01-02 18:05 [PATCH net-next v7 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write I Viswanath
@ 2026-01-02 18:05 ` I Viswanath
  2026-01-06 10:28   ` I Viswanath
  2026-01-02 18:05 ` [PATCH net-next v7 2/2] virtio-net: Implement ndo_write_rx_mode callback I Viswanath
  1 sibling, 1 reply; 6+ messages in thread
From: I Viswanath @ 2026-01-02 18:05 UTC (permalink / raw)
  To: edumazet, andrew+netdev, horms, kuba, pabeni, mst, eperezma,
	jasowang, xuanzhuo
  Cc: netdev, virtualization, linux-kernel, I Viswanath

ndo_set_rx_mode is problematic as it cannot sleep.

There are drivers that circumvent this by doing the rx_mode work
in a work item. This requires extra work that can be avoided if
core provided a mechanism to do that. This patch proposes such a
mechanism.

Refactor set_rx_mode into two stages: a snapshot stage and the
actual I/O. In this new model, when __dev_set_rx_mode is called,
we take a snapshot of the current rx_config and then commit it
to the hardware later via a work item.

To accomplish this, reinterpret set_rx_mode as the ndo for
customizing the snapshot and enabling/disabling rx_mode set,
and add a new ndo, write_rx_mode, for the deferred I/O.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
 include/linux/netdevice.h | 111 +++++++++++++++-
 net/core/dev.c            | 264 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 368 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5870a9e514a5..210f320d404d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1062,6 +1062,44 @@ struct netdev_net_notifier {
 	struct notifier_block *nb;
 };
 
+struct netif_cleanup_work {
+	struct work_struct work;
+	struct net_device *dev;
+};
+
+enum netif_rx_mode_cfg {
+	NETIF_RX_MODE_CFG_ALLMULTI,
+	NETIF_RX_MODE_CFG_PROMISC,
+	NETIF_RX_MODE_CFG_VLAN
+};
+
+enum netif_rx_mode_flags {
+	NETIF_RX_MODE_READY,
+
+	/* if set, rx_mode set work will be skipped */
+	NETIF_RX_MODE_SET_SKIP,
+
+	/* if set, uc/mc lists will not be part of rx_mode config */
+	NETIF_RX_MODE_UC_SKIP,
+	NETIF_RX_MODE_MC_SKIP
+};
+
+struct netif_rx_mode_config {
+	char	*uc_addrs;
+	char	*mc_addrs;
+	int	uc_count;
+	int	mc_count;
+	int	cfg;
+};
+
+struct netif_rx_mode_ctx {
+	struct netif_rx_mode_config *pending;
+	struct netif_rx_mode_config *ready;
+	struct work_struct work;
+	struct net_device *dev;
+	int flags;
+};
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1114,9 +1152,14 @@ struct netdev_net_notifier {
  *	changes to configuration when multicast or promiscuous is enabled.
  *
  * void (*ndo_set_rx_mode)(struct net_device *dev);
- *	This function is called device changes address list filtering.
+ *	This function is called when device changes address list filtering.
  *	If driver handles unicast address filtering, it should set
- *	IFF_UNICAST_FLT in its priv_flags.
+ *	IFF_UNICAST_FLT in its priv_flags. This is used to configure
+ *	the rx_mode snapshot that will be written to the hardware.
+ *
+ * void (*ndo_write_rx_mode)(struct net_device *dev);
+ *	This function is scheduled after set_rx_mode and is responsible for
+ *	writing the rx_mode snapshot to the hardware.
  *
  * int (*ndo_set_mac_address)(struct net_device *dev, void *addr);
  *	This function  is called when the Media Access Control address
@@ -1437,6 +1480,7 @@ struct net_device_ops {
 	void			(*ndo_change_rx_flags)(struct net_device *dev,
 						       int flags);
 	void			(*ndo_set_rx_mode)(struct net_device *dev);
+	void			(*ndo_write_rx_mode)(struct net_device *dev);
 	int			(*ndo_set_mac_address)(struct net_device *dev,
 						       void *addr);
 	int			(*ndo_validate_addr)(struct net_device *dev);
@@ -1939,7 +1983,7 @@ enum netdev_reg_state {
  *	@ingress_queue:		XXX: need comments on this one
  *	@nf_hooks_ingress:	netfilter hooks executed for ingress packets
  *	@broadcast:		hw bcast address
- *
+ *	@rx_mode_ctx:		rx_mode work context
  *	@rx_cpu_rmap:	CPU reverse-mapping for RX completion interrupts,
  *			indexed by RX queue number. Assigned by driver.
  *			This must only be set if the ndo_rx_flow_steer
@@ -1971,6 +2015,8 @@ enum netdev_reg_state {
  *	@link_watch_list:	XXX: need comments on this one
  *
  *	@reg_state:		Register/unregister state machine
+ *	@needs_cleanup_work:	Should dev_close schedule the cleanup work?
+ *	@cleanup_work:		Cleanup work context
  *	@dismantle:		Device is going to be freed
  *	@needs_free_netdev:	Should unregister perform free_netdev?
  *	@priv_destructor:	Called from unregister
@@ -2350,6 +2396,7 @@ struct net_device {
 #endif
 
 	unsigned char		broadcast[MAX_ADDR_LEN];
+	struct netif_rx_mode_ctx *rx_mode_ctx;
 #ifdef CONFIG_RFS_ACCEL
 	struct cpu_rmap		*rx_cpu_rmap;
 #endif
@@ -2387,6 +2434,10 @@ struct net_device {
 
 	u8 reg_state;
 
+	bool needs_cleanup_work;
+
+	struct netif_cleanup_work *cleanup_work;
+
 	bool dismantle;
 
 	/** @moving_ns: device is changing netns, protected by @lock */
@@ -3373,6 +3424,60 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *newskb);
 u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb,
 		     struct net_device *sb_dev);
 
+/* Helpers to be used in the set_rx_mode implementation */
+static inline void netif_rx_mode_set_cfg(struct net_device *dev, int b,
+					 bool val)
+{
+	if (val)
+		dev->rx_mode_ctx->pending->cfg |= BIT(b);
+	else
+		dev->rx_mode_ctx->pending->cfg &= ~BIT(b);
+}
+
+static inline void netif_rx_mode_set_flag(struct net_device *dev, int b,
+					  bool val)
+{
+	if (val)
+		dev->rx_mode_ctx->flags |= BIT(b);
+	else
+		dev->rx_mode_ctx->flags &= ~BIT(b);
+}
+
+/* Helpers to be used in the write_rx_mode implementation */
+static inline int netif_rx_mode_get_cfg(struct net_device *dev, int b)
+{
+	return !!(dev->rx_mode_ctx->ready->cfg & BIT(b));
+}
+
+static inline int netif_rx_mode_get_flag(struct net_device *dev, int b)
+{
+	return !!(dev->rx_mode_ctx->flags & BIT(b));
+}
+
+static inline int netif_rx_mode_get_mc_count(struct net_device *dev)
+{
+	return dev->rx_mode_ctx->ready->mc_count;
+}
+
+static inline int netif_rx_mode_get_uc_count(struct net_device *dev)
+{
+	return dev->rx_mode_ctx->ready->uc_count;
+}
+
+void netif_schedule_rx_mode_work(struct net_device *dev);
+
+void netif_flush_rx_mode_work(struct net_device *dev);
+
+#define netif_rx_mode_for_each_uc_addr(dev, ha_addr, idx) \
+	for (ha_addr = (dev)->rx_mode_ctx->ready->uc_addrs, idx = 0; \
+	     idx < netif_rx_mode_get_uc_count((dev)); \
+	     ha_addr += (dev)->addr_len, idx++)
+
+#define netif_rx_mode_for_each_mc_addr(dev, ha_addr, idx) \
+	for (ha_addr = (dev)->rx_mode_ctx->ready->mc_addrs, idx = 0; \
+	     idx < netif_rx_mode_get_mc_count((dev)); \
+	     ha_addr += (dev)->addr_len, idx++)
+
 int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev);
 int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 36dc5199037e..ffa0615b688e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1587,6 +1587,206 @@ void netif_state_change(struct net_device *dev)
 	}
 }
 
+/* This function attempts to copy the current state of the
+ * net device into pending (reallocating if necessary). If it fails,
+ * pending is guaranteed to be unmodified.
+ */
+static int __netif_prepare_rx_mode(struct net_device *dev)
+{
+	struct netif_rx_mode_config *pending = dev->rx_mode_ctx->pending;
+	bool skip_uc = false, skip_mc = false;
+	int uc_count = 0, mc_count = 0;
+	struct netdev_hw_addr *ha;
+	char *tmp;
+	int i;
+
+	skip_uc = netif_rx_mode_get_flag(dev, NETIF_RX_MODE_UC_SKIP);
+	skip_mc = netif_rx_mode_get_flag(dev, NETIF_RX_MODE_MC_SKIP);
+
+	/* The allocations need to be atomic since this will be called under
+	 * netif_addr_lock_bh()
+	 */
+	if (!skip_uc) {
+		uc_count = netdev_uc_count(dev);
+		tmp = krealloc(pending->uc_addrs, uc_count * dev->addr_len,
+			       GFP_ATOMIC);
+		if (!tmp)
+			return -ENOMEM;
+		pending->uc_addrs = tmp;
+	}
+
+	if (!skip_mc) {
+		mc_count = netdev_mc_count(dev);
+		tmp = krealloc(pending->mc_addrs, mc_count * dev->addr_len,
+			       GFP_ATOMIC);
+		if (!tmp)
+			return -ENOMEM;
+		pending->mc_addrs = tmp;
+	}
+
+	/* This function cannot fail after this point */
+
+	/* This is going to be the same for every single driver. Better to
+	 * do it here than in the set_rx_mode impl
+	 */
+	netif_rx_mode_set_cfg(dev, NETIF_RX_MODE_CFG_ALLMULTI,
+			      !!(dev->flags & IFF_ALLMULTI));
+
+	netif_rx_mode_set_cfg(dev, NETIF_RX_MODE_CFG_PROMISC,
+			      !!(dev->flags & IFF_PROMISC));
+
+	i = 0;
+	if (!skip_uc) {
+		pending->uc_count = uc_count;
+		netdev_for_each_uc_addr(ha, dev)
+			memcpy(pending->uc_addrs + (i++) * dev->addr_len,
+			       ha->addr, dev->addr_len);
+	}
+
+	i = 0;
+	if (!skip_mc) {
+		pending->mc_count = mc_count;
+		netdev_for_each_mc_addr(ha, dev)
+			memcpy(pending->mc_addrs + (i++) * dev->addr_len,
+			       ha->addr, dev->addr_len);
+	}
+	return 0;
+}
+
+static void netif_prepare_rx_mode(struct net_device *dev)
+{
+	int rc;
+	lockdep_assert_held(&dev->addr_list_lock);
+
+	rc = __netif_prepare_rx_mode(dev);
+	if (rc)
+		return;
+
+	netif_rx_mode_set_flag(dev, NETIF_RX_MODE_READY, true);
+}
+
+static void netif_write_rx_mode(struct work_struct *param)
+{
+	struct netif_rx_mode_ctx *ctx;
+	struct net_device *dev;
+
+	rtnl_lock();
+	ctx = container_of(param, struct netif_rx_mode_ctx, work);
+	dev = ctx->dev;
+
+	if (!netif_running(dev)) {
+		rtnl_unlock();
+		return;
+	}
+
+	/* Paranoia. */
+	if (WARN_ON(!dev->netdev_ops->ndo_write_rx_mode)) {
+		rtnl_unlock();
+		return;
+	}
+
+	/* We could introduce a new lock for this but reusing the addr
+	 * lock works well enough
+	 */
+	netif_addr_lock_bh(dev);
+
+	/* There's no point continuing if the pending config is not ready */
+	if (!netif_rx_mode_get_flag(dev, NETIF_RX_MODE_READY)) {
+		netif_addr_unlock_bh(dev);
+		rtnl_unlock();
+		return;
+	}
+
+	swap(ctx->ready, ctx->pending);
+	netif_rx_mode_set_flag(dev, NETIF_RX_MODE_READY, false);
+	netif_addr_unlock_bh(dev);
+
+	dev->netdev_ops->ndo_write_rx_mode(dev);
+	rtnl_unlock();
+}
+
+static int netif_alloc_rx_mode_ctx(struct net_device *dev)
+{
+	dev->rx_mode_ctx = kzalloc(sizeof(*dev->rx_mode_ctx), GFP_KERNEL);
+	if (!dev->rx_mode_ctx)
+		goto fail_all;
+
+	dev->rx_mode_ctx->ready = kzalloc(sizeof(*dev->rx_mode_ctx->ready),
+					  GFP_KERNEL);
+	if (!dev->rx_mode_ctx->ready)
+		goto fail_ready;
+
+	dev->rx_mode_ctx->pending = kzalloc(sizeof(*dev->rx_mode_ctx->pending),
+					    GFP_KERNEL);
+	if (!dev->rx_mode_ctx->pending)
+		goto fail_pending;
+
+	dev->rx_mode_ctx->dev = dev;
+	INIT_WORK(&dev->rx_mode_ctx->work, netif_write_rx_mode);
+	return 0;
+
+fail_pending:
+	kfree(dev->rx_mode_ctx->ready);
+
+fail_ready:
+	kfree(dev->rx_mode_ctx);
+
+fail_all:
+	return -ENOMEM;
+}
+
+static void netif_free_rx_mode_ctx(struct net_device *dev)
+{
+	if (!dev->rx_mode_ctx)
+		return;
+
+	cancel_work_sync(&dev->rx_mode_ctx->work);
+
+	kfree(dev->rx_mode_ctx->ready->uc_addrs);
+	kfree(dev->rx_mode_ctx->ready->mc_addrs);
+	kfree(dev->rx_mode_ctx->ready);
+
+	kfree(dev->rx_mode_ctx->pending->uc_addrs);
+	kfree(dev->rx_mode_ctx->pending->mc_addrs);
+	kfree(dev->rx_mode_ctx->pending);
+
+	kfree(dev->rx_mode_ctx);
+	dev->rx_mode_ctx = NULL;
+}
+
+static void netif_cleanup_work_fn(struct work_struct *param)
+{
+	struct netif_cleanup_work *ctx;
+	struct net_device *dev;
+
+	ctx = container_of(param, struct netif_cleanup_work, work);
+	dev = ctx->dev;
+
+	if (dev->netdev_ops->ndo_write_rx_mode)
+		netif_free_rx_mode_ctx(dev);
+}
+
+static int netif_alloc_cleanup_work(struct net_device *dev)
+{
+	dev->cleanup_work = kzalloc(sizeof(*dev->cleanup_work), GFP_KERNEL);
+	if (!dev->cleanup_work)
+		return -ENOMEM;
+
+	dev->cleanup_work->dev = dev;
+	INIT_WORK(&dev->cleanup_work->work, netif_cleanup_work_fn);
+	return 0;
+}
+
+static void netif_free_cleanup_work(struct net_device *dev)
+{
+	if (!dev->cleanup_work)
+		return;
+
+	cancel_work_sync(&dev->cleanup_work->work);
+	kfree(dev->cleanup_work);
+	dev->cleanup_work = NULL;
+}
+
 /**
  * __netdev_notify_peers - notify network peers about existence of @dev,
  * to be called when rtnl lock is already held.
@@ -1682,6 +1882,16 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
 	if (!ret && ops->ndo_open)
 		ret = ops->ndo_open(dev);
 
+	if (!ret && dev->needs_cleanup_work) {
+		if (!dev->cleanup_work)
+			ret = netif_alloc_cleanup_work(dev);
+		else
+			cancel_work_sync(&dev->cleanup_work->work);
+	}
+
+	if (!ret && ops->ndo_write_rx_mode)
+		ret = netif_alloc_rx_mode_ctx(dev);
+
 	netpoll_poll_enable(dev);
 
 	if (ret)
@@ -1755,6 +1965,9 @@ static void __dev_close_many(struct list_head *head)
 		if (ops->ndo_stop)
 			ops->ndo_stop(dev);
 
+		if (dev->needs_cleanup_work)
+			schedule_work(&dev->cleanup_work->work);
+
 		netif_set_up(dev, false);
 		netpoll_poll_enable(dev);
 	}
@@ -9623,6 +9836,47 @@ int netif_set_allmulti(struct net_device *dev, int inc, bool notify)
 	return 0;
 }
 
+/* netif_schedule_rx_mode_work - Sets up the rx_config snapshot and
+ * schedules the deferred I/O.
+ */
+void netif_schedule_rx_mode_work(struct net_device *dev)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (ops->ndo_set_rx_mode)
+		ops->ndo_set_rx_mode(dev);
+
+	if (!ops->ndo_write_rx_mode)
+		return;
+
+	/* This part is only for drivers that implement ndo_write_rx_mode */
+
+	/* If rx_mode set is to be skipped, we don't schedule the work */
+	if (netif_rx_mode_get_flag(dev, NETIF_RX_MODE_SET_SKIP))
+		return;
+
+	netif_prepare_rx_mode(dev);
+	schedule_work(&dev->rx_mode_ctx->work);
+}
+EXPORT_SYMBOL(netif_schedule_rx_mode_work);
+
+/* Drivers that implement rx mode as work flush the work item when closing
+ * or suspending. This is the substitute for those calls.
+ */
+void netif_flush_rx_mode_work(struct net_device *dev)
+{
+	/* Calling this function with RTNL held will result in a deadlock. */
+	if (WARN_ON(rtnl_is_locked()))
+		return;
+
+	/* Doing nothing is enough to "flush" work on a closed interface */
+	if (!netif_running(dev))
+		return;
+
+	flush_work(&dev->rx_mode_ctx->work);
+}
+EXPORT_SYMBOL(netif_flush_rx_mode_work);
+
 /*
  *	Upload unicast and multicast address lists to device and
  *	configure RX filtering. When the device doesn't support unicast
@@ -9631,8 +9885,6 @@ int netif_set_allmulti(struct net_device *dev, int inc, bool notify)
  */
 void __dev_set_rx_mode(struct net_device *dev)
 {
-	const struct net_device_ops *ops = dev->netdev_ops;
-
 	/* dev_open will call this function so the list will stay sane. */
 	if (!(dev->flags&IFF_UP))
 		return;
@@ -9653,8 +9905,7 @@ void __dev_set_rx_mode(struct net_device *dev)
 		}
 	}
 
-	if (ops->ndo_set_rx_mode)
-		ops->ndo_set_rx_mode(dev);
+	netif_schedule_rx_mode_work(dev);
 }
 
 void dev_set_rx_mode(struct net_device *dev)
@@ -11325,6 +11576,9 @@ int register_netdevice(struct net_device *dev)
 		}
 	}
 
+	if (dev->netdev_ops->ndo_write_rx_mode)
+		dev->needs_cleanup_work = true;
+
 	if (((dev->hw_features | dev->features) &
 	     NETIF_F_HW_VLAN_CTAG_FILTER) &&
 	    (!dev->netdev_ops->ndo_vlan_rx_add_vid ||
@@ -12068,6 +12322,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->real_num_rx_queues = rxqs;
 	if (netif_alloc_rx_queues(dev))
 		goto free_all;
+
 	dev->ethtool = kzalloc(sizeof(*dev->ethtool), GFP_KERNEL_ACCOUNT);
 	if (!dev->ethtool)
 		goto free_all;
@@ -12151,6 +12406,7 @@ void free_netdev(struct net_device *dev)
 	kfree(dev->ethtool);
 	netif_free_tx_queues(dev);
 	netif_free_rx_queues(dev);
+	netif_free_cleanup_work(dev);
 
 	kfree(rcu_dereference_protected(dev->ingress_queue, 1));
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net-next v7 2/2] virtio-net: Implement ndo_write_rx_mode callback
  2026-01-02 18:05 [PATCH net-next v7 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write I Viswanath
  2026-01-02 18:05 ` [PATCH net-next v7 1/2] net: refactor set_rx_mode into snapshot and deferred I/O I Viswanath
@ 2026-01-02 18:05 ` I Viswanath
  2026-01-06 11:22   ` Xuan Zhuo
  1 sibling, 1 reply; 6+ messages in thread
From: I Viswanath @ 2026-01-02 18:05 UTC (permalink / raw)
  To: edumazet, andrew+netdev, horms, kuba, pabeni, mst, eperezma,
	jasowang, xuanzhuo
  Cc: netdev, virtualization, linux-kernel, I Viswanath

Implement ndo_write_rx_mode callback for virtio-net

Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
 drivers/net/virtio_net.c | 55 +++++++++++++++-------------------------
 1 file changed, 21 insertions(+), 34 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1bb3aeca66c6..83d543bf6ae2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -460,9 +460,6 @@ struct virtnet_info {
 	/* Work struct for config space updates */
 	struct work_struct config_work;
 
-	/* Work struct for setting rx mode */
-	struct work_struct rx_mode_work;
-
 	/* OK to queue work setting RX mode? */
 	bool rx_mode_work_enabled;
 
@@ -3866,33 +3863,31 @@ static int virtnet_close(struct net_device *dev)
 	return 0;
 }
 
-static void virtnet_rx_mode_work(struct work_struct *work)
+static void virtnet_write_rx_mode(struct net_device *dev)
 {
-	struct virtnet_info *vi =
-		container_of(work, struct virtnet_info, rx_mode_work);
+	struct virtnet_info *vi = netdev_priv(dev);
 	u8 *promisc_allmulti  __free(kfree) = NULL;
-	struct net_device *dev = vi->dev;
 	struct scatterlist sg[2];
 	struct virtio_net_ctrl_mac *mac_data;
-	struct netdev_hw_addr *ha;
+	char *ha_addr;
 	int uc_count;
 	int mc_count;
 	void *buf;
+	int idx;
 	int i;
 
 	/* We can't dynamically set ndo_set_rx_mode, so return gracefully */
 	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
 		return;
 
-	promisc_allmulti = kzalloc(sizeof(*promisc_allmulti), GFP_KERNEL);
+	promisc_allmulti = kzalloc(sizeof(*promisc_allmulti), GFP_ATOMIC);
 	if (!promisc_allmulti) {
 		dev_warn(&dev->dev, "Failed to set RX mode, no memory.\n");
 		return;
 	}
 
-	rtnl_lock();
-
-	*promisc_allmulti = !!(dev->flags & IFF_PROMISC);
+	*promisc_allmulti = netif_rx_mode_get_cfg(dev,
+						  NETIF_RX_MODE_CFG_PROMISC);
 	sg_init_one(sg, promisc_allmulti, sizeof(*promisc_allmulti));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
@@ -3900,7 +3895,8 @@ static void virtnet_rx_mode_work(struct work_struct *work)
 		dev_warn(&dev->dev, "Failed to %sable promisc mode.\n",
 			 *promisc_allmulti ? "en" : "dis");
 
-	*promisc_allmulti = !!(dev->flags & IFF_ALLMULTI);
+	*promisc_allmulti = netif_rx_mode_get_cfg(dev,
+						  NETIF_RX_MODE_CFG_ALLMULTI);
 	sg_init_one(sg, promisc_allmulti, sizeof(*promisc_allmulti));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
@@ -3908,27 +3904,22 @@ static void virtnet_rx_mode_work(struct work_struct *work)
 		dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
 			 *promisc_allmulti ? "en" : "dis");
 
-	netif_addr_lock_bh(dev);
-
-	uc_count = netdev_uc_count(dev);
-	mc_count = netdev_mc_count(dev);
+	uc_count = netif_rx_mode_get_uc_count(dev);
+	mc_count = netif_rx_mode_get_mc_count(dev);
 	/* MAC filter - use one buffer for both lists */
 	buf = kzalloc(((uc_count + mc_count) * ETH_ALEN) +
 		      (2 * sizeof(mac_data->entries)), GFP_ATOMIC);
 	mac_data = buf;
-	if (!buf) {
-		netif_addr_unlock_bh(dev);
-		rtnl_unlock();
+	if (!buf)
 		return;
-	}
 
 	sg_init_table(sg, 2);
 
 	/* Store the unicast list and count in the front of the buffer */
 	mac_data->entries = cpu_to_virtio32(vi->vdev, uc_count);
 	i = 0;
-	netdev_for_each_uc_addr(ha, dev)
-		memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
+	netif_rx_mode_for_each_uc_addr(dev, ha_addr, idx)
+		memcpy(&mac_data->macs[i++][0], ha_addr, ETH_ALEN);
 
 	sg_set_buf(&sg[0], mac_data,
 		   sizeof(mac_data->entries) + (uc_count * ETH_ALEN));
@@ -3938,10 +3929,8 @@ static void virtnet_rx_mode_work(struct work_struct *work)
 
 	mac_data->entries = cpu_to_virtio32(vi->vdev, mc_count);
 	i = 0;
-	netdev_for_each_mc_addr(ha, dev)
-		memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
-
-	netif_addr_unlock_bh(dev);
+	netif_rx_mode_for_each_mc_addr(dev, ha_addr, idx)
+		memcpy(&mac_data->macs[i++][0], ha_addr, ETH_ALEN);
 
 	sg_set_buf(&sg[1], mac_data,
 		   sizeof(mac_data->entries) + (mc_count * ETH_ALEN));
@@ -3950,17 +3939,15 @@ static void virtnet_rx_mode_work(struct work_struct *work)
 				  VIRTIO_NET_CTRL_MAC_TABLE_SET, sg))
 		dev_warn(&dev->dev, "Failed to set MAC filter table.\n");
 
-	rtnl_unlock();
-
 	kfree(buf);
 }
 
 static void virtnet_set_rx_mode(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
+	bool cfg_disabled = !vi->rx_mode_work_enabled;
 
-	if (vi->rx_mode_work_enabled)
-		schedule_work(&vi->rx_mode_work);
+	netif_rx_mode_set_flag(dev, NETIF_RX_MODE_SET_SKIP, cfg_disabled);
 }
 
 static int virtnet_vlan_rx_add_vid(struct net_device *dev,
@@ -5776,7 +5763,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 	/* Make sure no work handler is accessing the device */
 	flush_work(&vi->config_work);
 	disable_rx_mode_work(vi);
-	flush_work(&vi->rx_mode_work);
+	netif_flush_rx_mode_work(vi->dev);
 
 	if (netif_running(vi->dev)) {
 		rtnl_lock();
@@ -6279,6 +6266,7 @@ static const struct net_device_ops virtnet_netdev = {
 	.ndo_validate_addr   = eth_validate_addr,
 	.ndo_set_mac_address = virtnet_set_mac_address,
 	.ndo_set_rx_mode     = virtnet_set_rx_mode,
+	.ndo_write_rx_mode   = virtnet_write_rx_mode,
 	.ndo_get_stats64     = virtnet_stats,
 	.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
@@ -6900,7 +6888,6 @@ static int virtnet_probe(struct virtio_device *vdev)
 	vdev->priv = vi;
 
 	INIT_WORK(&vi->config_work, virtnet_config_changed_work);
-	INIT_WORK(&vi->rx_mode_work, virtnet_rx_mode_work);
 	spin_lock_init(&vi->refill_lock);
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) {
@@ -7205,7 +7192,7 @@ static void virtnet_remove(struct virtio_device *vdev)
 	/* Make sure no work handler is accessing the device. */
 	flush_work(&vi->config_work);
 	disable_rx_mode_work(vi);
-	flush_work(&vi->rx_mode_work);
+	netif_flush_rx_mode_work(vi->dev);
 
 	virtnet_free_irq_moder(vi);
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next v7 1/2] net: refactor set_rx_mode into snapshot and deferred I/O
  2026-01-02 18:05 ` [PATCH net-next v7 1/2] net: refactor set_rx_mode into snapshot and deferred I/O I Viswanath
@ 2026-01-06 10:28   ` I Viswanath
  0 siblings, 0 replies; 6+ messages in thread
From: I Viswanath @ 2026-01-06 10:28 UTC (permalink / raw)
  To: edumazet, andrew+netdev, horms, kuba, pabeni, mst, eperezma,
	jasowang, xuanzhuo
  Cc: netdev, virtualization, linux-kernel

> +static void netif_free_cleanup_work(struct net_device *dev)
> +{
> +       if (!dev->cleanup_work)
> +               return;
> +
> +       cancel_work_sync(&dev->cleanup_work->work);
> +       kfree(dev->cleanup_work);
> +       dev->cleanup_work = NULL;
> +}
> +
> @@ -1682,6 +1882,16 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
>         if (!ret && ops->ndo_open)
>                 ret = ops->ndo_open(dev);
>
> +       if (!ret && dev->needs_cleanup_work) {
> +               if (!dev->cleanup_work)
> +                       ret = netif_alloc_cleanup_work(dev);
> +               else
> +                       cancel_work_sync(&dev->cleanup_work->work);
> +       }
> +
> +       if (!ret && ops->ndo_write_rx_mode)
> +               ret = netif_alloc_rx_mode_ctx(dev);
> +
>         netpoll_poll_enable(dev);

This is in response to the AI review. Honestly, I'm impressed by how
good the AI review is.

My bad, it should be flush_work(), not cancel_work_sync(), in
__dev_open() and also in netif_free_cleanup_work(). These are the only
places where execution needs to wait for completion of the cleanup
work.

It's OK to just cancel the rx_mode work, so this issue only affects
the cleanup work.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next v7 2/2] virtio-net: Implement ndo_write_rx_mode callback
  2026-01-02 18:05 ` [PATCH net-next v7 2/2] virtio-net: Implement ndo_write_rx_mode callback I Viswanath
@ 2026-01-06 11:22   ` Xuan Zhuo
  2026-01-07  6:12     ` I Viswanath
  0 siblings, 1 reply; 6+ messages in thread
From: Xuan Zhuo @ 2026-01-06 11:22 UTC (permalink / raw)
  To: I Viswanath
  Cc: netdev, virtualization, linux-kernel, I Viswanath, edumazet,
	andrew+netdev, horms, kuba, pabeni, mst, eperezma, jasowang


If this is a common requirement, it would be better to provide more examples of
driver modifications.

Thanks.


On Fri,  2 Jan 2026 23:35:30 +0530, I Viswanath <viswanathiyyappan@gmail.com> wrote:
> Implement ndo_write_rx_mode callback for virtio-net
>
> Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
> ---
>  drivers/net/virtio_net.c | 55 +++++++++++++++-------------------------
>  1 file changed, 21 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 1bb3aeca66c6..83d543bf6ae2 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -460,9 +460,6 @@ struct virtnet_info {
>  	/* Work struct for config space updates */
>  	struct work_struct config_work;
>
> -	/* Work struct for setting rx mode */
> -	struct work_struct rx_mode_work;
> -
>  	/* OK to queue work setting RX mode? */
>  	bool rx_mode_work_enabled;
>
> @@ -3866,33 +3863,31 @@ static int virtnet_close(struct net_device *dev)
>  	return 0;
>  }
>
> -static void virtnet_rx_mode_work(struct work_struct *work)
> +static void virtnet_write_rx_mode(struct net_device *dev)
>  {
> -	struct virtnet_info *vi =
> -		container_of(work, struct virtnet_info, rx_mode_work);
> +	struct virtnet_info *vi = netdev_priv(dev);
>  	u8 *promisc_allmulti  __free(kfree) = NULL;
> -	struct net_device *dev = vi->dev;
>  	struct scatterlist sg[2];
>  	struct virtio_net_ctrl_mac *mac_data;
> -	struct netdev_hw_addr *ha;
> +	char *ha_addr;
>  	int uc_count;
>  	int mc_count;
>  	void *buf;
> +	int idx;
>  	int i;
>
>  	/* We can't dynamically set ndo_set_rx_mode, so return gracefully */
>  	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
>  		return;
>
> -	promisc_allmulti = kzalloc(sizeof(*promisc_allmulti), GFP_KERNEL);
> +	promisc_allmulti = kzalloc(sizeof(*promisc_allmulti), GFP_ATOMIC);
>  	if (!promisc_allmulti) {
>  		dev_warn(&dev->dev, "Failed to set RX mode, no memory.\n");
>  		return;
>  	}
>
> -	rtnl_lock();
> -
> -	*promisc_allmulti = !!(dev->flags & IFF_PROMISC);
> +	*promisc_allmulti = netif_rx_mode_get_cfg(dev,
> +						  NETIF_RX_MODE_CFG_PROMISC);
>  	sg_init_one(sg, promisc_allmulti, sizeof(*promisc_allmulti));
>
>  	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
> @@ -3900,7 +3895,8 @@ static void virtnet_rx_mode_work(struct work_struct *work)
>  		dev_warn(&dev->dev, "Failed to %sable promisc mode.\n",
>  			 *promisc_allmulti ? "en" : "dis");
>
> -	*promisc_allmulti = !!(dev->flags & IFF_ALLMULTI);
> +	*promisc_allmulti = netif_rx_mode_get_cfg(dev,
> +						  NETIF_RX_MODE_CFG_ALLMULTI);
>  	sg_init_one(sg, promisc_allmulti, sizeof(*promisc_allmulti));
>
>  	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
> @@ -3908,27 +3904,22 @@ static void virtnet_rx_mode_work(struct work_struct *work)
>  		dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
>  			 *promisc_allmulti ? "en" : "dis");
>
> -	netif_addr_lock_bh(dev);
> -
> -	uc_count = netdev_uc_count(dev);
> -	mc_count = netdev_mc_count(dev);
> +	uc_count = netif_rx_mode_get_uc_count(dev);
> +	mc_count = netif_rx_mode_get_mc_count(dev);
>  	/* MAC filter - use one buffer for both lists */
>  	buf = kzalloc(((uc_count + mc_count) * ETH_ALEN) +
>  		      (2 * sizeof(mac_data->entries)), GFP_ATOMIC);
>  	mac_data = buf;
> -	if (!buf) {
> -		netif_addr_unlock_bh(dev);
> -		rtnl_unlock();
> +	if (!buf)
>  		return;
> -	}
>
>  	sg_init_table(sg, 2);
>
>  	/* Store the unicast list and count in the front of the buffer */
>  	mac_data->entries = cpu_to_virtio32(vi->vdev, uc_count);
>  	i = 0;
> -	netdev_for_each_uc_addr(ha, dev)
> -		memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
> +	netif_rx_mode_for_each_uc_addr(dev, ha_addr, idx)
> +		memcpy(&mac_data->macs[i++][0], ha_addr, ETH_ALEN);
>
>  	sg_set_buf(&sg[0], mac_data,
>  		   sizeof(mac_data->entries) + (uc_count * ETH_ALEN));
> @@ -3938,10 +3929,8 @@ static void virtnet_rx_mode_work(struct work_struct *work)
>
>  	mac_data->entries = cpu_to_virtio32(vi->vdev, mc_count);
>  	i = 0;
> -	netdev_for_each_mc_addr(ha, dev)
> -		memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
> -
> -	netif_addr_unlock_bh(dev);
> +	netif_rx_mode_for_each_mc_addr(dev, ha_addr, idx)
> +		memcpy(&mac_data->macs[i++][0], ha_addr, ETH_ALEN);
>
>  	sg_set_buf(&sg[1], mac_data,
>  		   sizeof(mac_data->entries) + (mc_count * ETH_ALEN));
> @@ -3950,17 +3939,15 @@ static void virtnet_rx_mode_work(struct work_struct *work)
>  				  VIRTIO_NET_CTRL_MAC_TABLE_SET, sg))
>  		dev_warn(&dev->dev, "Failed to set MAC filter table.\n");
>
> -	rtnl_unlock();
> -
>  	kfree(buf);
>  }
>
>  static void virtnet_set_rx_mode(struct net_device *dev)
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
> +	char cfg_disabled = !vi->rx_mode_work_enabled;
>
> -	if (vi->rx_mode_work_enabled)
> -		schedule_work(&vi->rx_mode_work);
> +	netif_rx_mode_set_flag(dev, NETIF_RX_MODE_SET_SKIP, cfg_disabled);
>  }
>
>  static int virtnet_vlan_rx_add_vid(struct net_device *dev,
> @@ -5776,7 +5763,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>  	/* Make sure no work handler is accessing the device */
>  	flush_work(&vi->config_work);
>  	disable_rx_mode_work(vi);
> -	flush_work(&vi->rx_mode_work);
> +	netif_flush_rx_mode_work(vi->dev);
>
>  	if (netif_running(vi->dev)) {
>  		rtnl_lock();
> @@ -6279,6 +6266,7 @@ static const struct net_device_ops virtnet_netdev = {
>  	.ndo_validate_addr   = eth_validate_addr,
>  	.ndo_set_mac_address = virtnet_set_mac_address,
>  	.ndo_set_rx_mode     = virtnet_set_rx_mode,
> +	.ndo_write_rx_mode   = virtnet_write_rx_mode,
>  	.ndo_get_stats64     = virtnet_stats,
>  	.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
>  	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
> @@ -6900,7 +6888,6 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	vdev->priv = vi;
>
>  	INIT_WORK(&vi->config_work, virtnet_config_changed_work);
> -	INIT_WORK(&vi->rx_mode_work, virtnet_rx_mode_work);
>  	spin_lock_init(&vi->refill_lock);
>
>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) {
> @@ -7205,7 +7192,7 @@ static void virtnet_remove(struct virtio_device *vdev)
>  	/* Make sure no work handler is accessing the device. */
>  	flush_work(&vi->config_work);
>  	disable_rx_mode_work(vi);
> -	flush_work(&vi->rx_mode_work);
> +	netif_flush_rx_mode_work(vi->dev);
>
>  	virtnet_free_irq_moder(vi);
>
> --
> 2.47.3
>


* Re: [PATCH net-next v7 2/2] virtio-net: Implement ndo_write_rx_mode callback
  2026-01-06 11:22   ` Xuan Zhuo
@ 2026-01-07  6:12     ` I Viswanath
  0 siblings, 0 replies; 6+ messages in thread
From: I Viswanath @ 2026-01-07  6:12 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, virtualization, linux-kernel, edumazet, andrew+netdev,
	horms, kuba, pabeni, mst, eperezma, jasowang

This refactor has the secondary benefit of preventing set rx_mode
requests from piling up: only the most recent request issued before the
work gets a chance to run is actually executed. Does that sound like
enough to justify the refactor for drivers which do the I/O under a
driver-specific spin lock?

On Tue, 6 Jan 2026 at 16:53, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:

> If this is a common requirement, it would be better to provide more examples of
> driver modifications.
>
> Thanks.

I am planning to modify these drivers, which I can test with
QEMU/vng. Does this list sound good?

e1000
8139cp

(I will do these if the secondary benefit sounds useful)

vmxnet3
pcnet32


end of thread, other threads:[~2026-01-07  6:12 UTC | newest]

Thread overview: 6+ messages
2026-01-02 18:05 [PATCH net-next v7 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write I Viswanath
2026-01-02 18:05 ` [PATCH net-next v7 1/2] net: refactor set_rx_mode into snapshot and deferred I/O I Viswanath
2026-01-06 10:28   ` I Viswanath
2026-01-02 18:05 ` [PATCH net-next v7 2/2] virtio-net: Implement ndo_write_rx_mode callback I Viswanath
2026-01-06 11:22   ` Xuan Zhuo
2026-01-07  6:12     ` I Viswanath
