netdev.vger.kernel.org archive mirror
* [RFT net-next v4 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write
@ 2025-11-18 16:43 I Viswanath
  2025-11-18 16:43 ` [RFT net-next v4] net: refactor set_rx_mode into snapshot and deferred I/O I Viswanath
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: I Viswanath @ 2025-11-18 16:43 UTC (permalink / raw)
  To: kuba, andrew+netdev, davem, edumazet, pabeni, horms, mst,
	jasowang, xuanzhuo, eperezma, sdf, kuniyu, skhawaja,
	aleksander.lobakin
  Cc: virtualization, netdev, linux-kernel, linux-kernel-mentees,
	I Viswanath

This is an implementation of the idea provided by Jakub here

https://lore.kernel.org/netdev/20250923163727.5e97abdb@kernel.org/

ndo_set_rx_mode is problematic because it cannot sleep.

To address this, this series proposes dividing the concept of setting
rx_mode into 2 stages: snapshot and deferred I/O. To achieve this, we
reinterpret set_rx_mode and add a new ndo, write_rx_mode, as
explained below:

The new set_rx_mode will be responsible for customizing the rx_mode
snapshot, which write_rx_mode will then use to update the hardware.

In brief, the new flow looks something like:

prepare_rx_mode():
    ndo_set_rx_mode();
    ready_snapshot();

write_rx_mode():
    commit_and_use_snapshot();
    ndo_write_rx_mode();

write_rx_mode() is called from a work item and doesn't hold
netif_addr_lock during ndo_write_rx_mode(), making it sleepable
in that section.

This model should work correctly if the following conditions hold:

1. write_rx_mode should use the rx_mode set by the most recent
    call to prepare_rx_mode before its execution.

2. If a prepare_rx_mode call happens during execution of write_rx_mode,
    write_rx_mode should be rescheduled.

3. All calls to modify rx_mode should pass through the prepare_rx_mode +
    schedule write_rx_mode execution flow. netif_rx_mode_schedule_work 
    has been implemented in core for this.

1 and 2 are guaranteed by the properties of work queues.

Drivers need to ensure 3.
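The pending/ready swap that backs guarantees 1 and 2 can be sketched as a
toy userspace model. All names and fields below are illustrative stand-ins,
not the proposed kernel API; the real series performs the swap under
netif_addr_lock from a work item:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy double buffer: prepare fills *pending, the writer swaps it in. */
struct rx_cfg {
	int promisc;
	int mc_count;
	bool ready;	/* models NETIF_RX_MODE_CFG_READY */
};

static struct rx_cfg buf_a, buf_b;
static struct rx_cfg *pending = &buf_a, *ready_cfg = &buf_b;

/* prepare_rx_mode(): snapshot the current state into *pending */
static void prepare(int promisc, int mc_count)
{
	pending->promisc = promisc;
	pending->mc_count = mc_count;
	pending->ready = true;
}

/* write_rx_mode(): consume the snapshot by swapping the buffers */
static const struct rx_cfg *write_rx_mode(void)
{
	struct rx_cfg *tmp;

	if (!pending->ready)
		return NULL;	/* no fresh snapshot: nothing to commit */
	tmp = pending;
	pending = ready_cfg;
	ready_cfg = tmp;
	pending->ready = false;	/* old buffer must be re-prepared */
	return ready_cfg;	/* driver would now do sleepable I/O */
}
```

A later prepare simply overwrites the pending buffer, so the writer always
commits the most recent snapshot (condition 1); a prepare that races with
an in-progress write marks the new pending buffer ready again, which is
why rescheduling the work (condition 2) picks it up.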

To use this model, a driver needs to implement the
ndo_write_rx_mode callback, change its set_rx_mode callback
appropriately, and replace all calls that modify rx_mode with
netif_rx_mode_schedule_work.
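For illustration only, the adoption pattern can be modeled as a runnable
toy. Callback and helper names loosely follow the series, but the work
item, locking, and real hardware I/O are elided, and the structures are
stand-ins:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device; the "hardware write" is just a counter. */
struct toy_dev {
	bool cfg_disabled;	/* models NETIF_RX_MODE_SET_DIS */
	int hw_writes;		/* counts deferred commits */
};

/* ndo_set_rx_mode: snapshot stage, runs under the addr lock, must not sleep */
static void toy_set_rx_mode(struct toy_dev *d, bool work_enabled)
{
	d->cfg_disabled = !work_enabled;
}

/* ndo_write_rx_mode: deferred stage, may sleep, talks to "hardware" */
static void toy_write_rx_mode(struct toy_dev *d)
{
	d->hw_writes++;
}

/* netif_rx_mode_schedule_work: core glue (work item elided, run inline) */
static void toy_schedule_work(struct toy_dev *d, bool work_enabled)
{
	toy_set_rx_mode(d, work_enabled);
	if (d->cfg_disabled)
		return;			/* config work suppressed */
	toy_write_rx_mode(d);		/* really schedule_work() in core */
}
```

The point of the shape: the driver's set_rx_mode only tweaks snapshot
state and enable bits, while all I/O happens in the deferred callback.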

---

Questions I have:

1) Would there ever be a situation in which you have to wait for
I/O to complete in a call to set_rx_mode before proceeding further?
That is, does netif_rx_mode_schedule_work need the flush argument?

2) Does priv_ptr in netif_rx_mode_config make sense? For virtio_net,
I can get the vi pointer by doing netdev_priv(dev), and I
wonder whether this would be a common pattern.

3) From a previous discussion: 
https://lore.kernel.org/netdev/417c677f-268a-4163-b07e-deea8f9b9b40@intel.com/

On Thu, 23 Oct 2025 at 05:16, Jacob Keller  wrote:
> Is there any mechanism to make this guarantee either implemented or at
> least verified by the core? If not that, what about some sort of way to
> lint driver code and make sure its correct?

I am not sure how to automate this, but basically we need warnings to be
generated when the set_rx_mode implementations are called directly in code
(from my understanding, usually in the open callback or the timeout function)
but not when they are called through ops->set_rx_mode. Can Coccinelle do
something like this?

v1:
Link: https://lore.kernel.org/netdev/20251020134857.5820-1-viswanathiyyappan@gmail.com/

v2:
- Exported set_and_schedule_rx_config as a symbol for use in modules
- Fixed incorrect cleanup for the case of rx_work alloc failing in alloc_netdev_mqs
- Removed the locked version (cp_set_rx_mode) and renamed __cp_set_rx_mode to cp_set_rx_mode
Link: https://lore.kernel.org/netdev/20251026175445.1519537-1-viswanathiyyappan@gmail.com/

v3:
- Added RFT tag
- Corrected mangled patch
Link: https://lore.kernel.org/netdev/20251028174222.1739954-1-viswanathiyyappan@gmail.com/

v4:
- Completely reworked the snapshot mechanism as per v3 comments
- Implemented the callback for virtio-net instead of 8139cp driver
- Removed RFC tag

I Viswanath (2):
  net: refactor set_rx_mode into snapshot and deferred I/O
  virtio-net: Implement ndo_write_rx_mode callback

 drivers/net/virtio_net.c  |  56 +++++------
 include/linux/netdevice.h | 104 ++++++++++++++++++-
 net/core/dev.c            | 207 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 328 insertions(+), 39 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFT net-next v4] net: refactor set_rx_mode into snapshot and deferred I/O
  2025-11-18 16:43 [RFT net-next v4 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write I Viswanath
@ 2025-11-18 16:43 ` I Viswanath
  2025-11-18 16:43 ` [RFT net-next v4 2/2] virtio-net: Implement ndo_write_rx_mode callback I Viswanath
  2025-11-18 16:56 ` [RFT net-next v4 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write I Viswanath
  2 siblings, 0 replies; 7+ messages in thread
From: I Viswanath @ 2025-11-18 16:43 UTC (permalink / raw)
  To: kuba, andrew+netdev, davem, edumazet, pabeni, horms, mst,
	jasowang, xuanzhuo, eperezma, sdf, kuniyu, skhawaja,
	aleksander.lobakin
  Cc: virtualization, netdev, linux-kernel, linux-kernel-mentees,
	I Viswanath

ndo_set_rx_mode is problematic as it cannot sleep.

There are drivers that circumvent this by doing the rx_mode work
in a work item. This duplication could be avoided if the core
provided a common mechanism. This patch proposes such a
mechanism.

Refactor set_rx_mode into 2 stages: a snapshot stage and the
actual I/O. In this new model, when _dev_set_rx_mode is called,
we take a snapshot of the current rx_config and then commit it
to the hardware later via a work item.

To accomplish this, reinterpret set_rx_mode as the ndo for
customizing the snapshot and enabling/disabling rx_mode set,
and add a new ndo, write_rx_mode, for the deferred I/O.

Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
---
 include/linux/netdevice.h | 104 ++++++++++++++++++-
 net/core/dev.c            | 208 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 305 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e808071dbb7d..848f341a677e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1049,6 +1049,40 @@ struct netdev_net_notifier {
 	struct notifier_block *nb;
 };
 
+enum netif_rx_mode_flags {
+	/* enable flags */
+	NETIF_RX_MODE_ALLMULTI_EN,
+	NETIF_RX_MODE_PROM_EN,
+	NETIF_RX_MODE_VLAN_EN,
+
+	/* control flags */
+	/* pending config state */
+	NETIF_RX_MODE_CFG_READY,
+
+	/* if set, rx_mode config work will not be executed */
+	NETIF_RX_MODE_SET_DIS,
+
+	/* if set, uc/mc lists will not be part of rx_mode config */
+	NETIF_RX_MODE_UC_SKIP,
+	NETIF_RX_MODE_MC_SKIP
+};
+
+struct netif_rx_mode_config {
+	char	*uc_addrs;
+	char	*mc_addrs;
+	int	uc_count;
+	int	mc_count;
+	int	ctrl_flags;
+	void	*priv_ptr;
+};
+
+struct netif_rx_mode_ctx {
+	struct work_struct		rx_mode_work;
+	struct net_device		*dev;
+	struct netif_rx_mode_config	*ready;
+	struct netif_rx_mode_config	*pending;
+};
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1101,9 +1135,14 @@ struct netdev_net_notifier {
  *	changes to configuration when multicast or promiscuous is enabled.
  *
  * void (*ndo_set_rx_mode)(struct net_device *dev);
- *	This function is called device changes address list filtering.
+ *	This function is called when device changes address list filtering.
  *	If driver handles unicast address filtering, it should set
- *	IFF_UNICAST_FLT in its priv_flags.
+ *	IFF_UNICAST_FLT in its priv_flags. This is used to configure
+ *	the rx_mode snapshot that will be written to the hardware.
+ *
+ * void (*ndo_write_rx_mode)(struct net_device *dev);
+ *	This function is scheduled after set_rx_mode and is responsible for
+ *	writing the rx_mode snapshot to the hardware.
  *
  * int (*ndo_set_mac_address)(struct net_device *dev, void *addr);
  *	This function  is called when the Media Access Control address
@@ -1424,6 +1463,7 @@ struct net_device_ops {
 	void			(*ndo_change_rx_flags)(struct net_device *dev,
 						       int flags);
 	void			(*ndo_set_rx_mode)(struct net_device *dev);
+	void			(*ndo_write_rx_mode)(struct net_device *dev);
 	int			(*ndo_set_mac_address)(struct net_device *dev,
 						       void *addr);
 	int			(*ndo_validate_addr)(struct net_device *dev);
@@ -1926,7 +1966,7 @@ enum netdev_reg_state {
  *	@ingress_queue:		XXX: need comments on this one
  *	@nf_hooks_ingress:	netfilter hooks executed for ingress packets
  *	@broadcast:		hw bcast address
- *
+ *	@rx_mode_ctx:		context required for rx_mode config work
  *	@rx_cpu_rmap:	CPU reverse-mapping for RX completion interrupts,
  *			indexed by RX queue number. Assigned by driver.
  *			This must only be set if the ndo_rx_flow_steer
@@ -2337,6 +2377,7 @@ struct net_device {
 #endif
 
 	unsigned char		broadcast[MAX_ADDR_LEN];
+	struct netif_rx_mode_ctx *rx_mode_ctx;
 #ifdef CONFIG_RFS_ACCEL
 	struct cpu_rmap		*rx_cpu_rmap;
 #endif
@@ -3360,6 +3401,63 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *newskb);
 u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb,
 		     struct net_device *sb_dev);
 
+void netif_rx_mode_schedule_work(struct net_device *dev, bool flush);
+
+/* Drivers that implement rx mode as work flush the work item when closing
+ * or suspending. This is the substitute for those calls.
+ */
+static inline void netif_rx_mode_flush_work(struct net_device *dev)
+{
+	flush_work(&dev->rx_mode_ctx->rx_mode_work);
+}
+
+/* Helpers to be used in the set_rx_mode implementation */
+static inline void netif_rx_mode_set_bit(struct net_device *dev, int b,
+					 bool val)
+{
+	if (val)
+		dev->rx_mode_ctx->pending->ctrl_flags |= BIT(b);
+	else
+		dev->rx_mode_ctx->pending->ctrl_flags &= ~BIT(b);
+}
+
+static inline void netif_rx_mode_set_priv_ptr(struct net_device *dev,
+					      void *priv)
+{
+	dev->rx_mode_ctx->pending->priv_ptr = priv;
+}
+
+/* Helpers to be used in the write_rx_mode implementation */
+static inline bool netif_rx_mode_get_bit(struct net_device *dev, int b)
+{
+	return dev->rx_mode_ctx->ready->ctrl_flags & BIT(b);
+}
+
+static inline void *netif_rx_mode_get_priv_ptr(struct net_device *dev)
+{
+	return dev->rx_mode_ctx->ready->priv_ptr;
+}
+
+static inline int netif_rx_mode_get_mc_count(struct net_device *dev)
+{
+	return dev->rx_mode_ctx->ready->mc_count;
+}
+
+static inline int netif_rx_mode_get_uc_count(struct net_device *dev)
+{
+	return dev->rx_mode_ctx->ready->uc_count;
+}
+
+#define netif_rx_mode_for_each_uc_addr(dev, ha_addr, idx) \
+	for (ha_addr = dev->rx_mode_ctx->ready->uc_addrs, idx = 0; \
+		idx < dev->rx_mode_ctx->ready->uc_count; \
+		ha_addr += dev->addr_len, idx++)
+
+#define netif_rx_mode_for_each_mc_addr(dev, ha_addr, idx) \
+	for (ha_addr = dev->rx_mode_ctx->ready->mc_addrs, idx = 0; \
+		idx < dev->rx_mode_ctx->ready->mc_count; \
+		ha_addr += dev->addr_len, idx++)
+
 int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev);
 int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 69515edd17bc..021f24c82977 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1645,6 +1645,161 @@ static int napi_kthread_create(struct napi_struct *n)
 	return err;
 }
 
+/* The existence of pending/ready config is an implementation detail. The
+ * caller shouldn't be aware of them. This is a bit hacky. We read
+ * bits from pending because control bits need to be read before pending
+ * is prepared.
+ */
+static inline bool __netif_rx_mode_pending_get_bit(struct net_device *dev,
+						   int b)
+{
+	return dev->rx_mode_ctx->pending->ctrl_flags & BIT(b);
+}
+
+/* This function attempts to copy the current state of the
+ * net device into pending (reallocating if necessary). If it fails,
+ * pending is guaranteed to be unmodified.
+ */
+static int netif_rx_mode_alloc_and_fill_pending(struct net_device *dev)
+{
+	struct netif_rx_mode_config *pending = dev->rx_mode_ctx->pending;
+	struct netdev_hw_addr *ha;
+	int uc_count = 0, mc_count = 0;
+	char *tmp;
+	int i;
+
+	/* The allocations need to be atomic since this will be called under
+	 * netif_addr_lock_bh()
+	 */
+	if (!__netif_rx_mode_pending_get_bit(dev, NETIF_RX_MODE_UC_SKIP)) {
+		uc_count = netdev_uc_count(dev);
+		tmp = krealloc(pending->uc_addrs,
+			       uc_count * dev->addr_len,
+			       GFP_ATOMIC);
+		if (!tmp)
+			return -ENOMEM;
+		pending->uc_addrs = tmp;
+	}
+
+	if (!__netif_rx_mode_pending_get_bit(dev, NETIF_RX_MODE_MC_SKIP)) {
+		mc_count = netdev_mc_count(dev);
+		tmp = krealloc(pending->mc_addrs,
+			       mc_count * dev->addr_len,
+			       GFP_ATOMIC);
+		if (!tmp)
+			return -ENOMEM;
+		pending->mc_addrs = tmp;
+	}
+
+	/* This function cannot fail after this point */
+
+	/* This is going to be the same for every single driver. Better to
+	 * do it here than in the set_rx_mode impl
+	 */
+	netif_rx_mode_set_bit(dev, NETIF_RX_MODE_ALLMULTI_EN,
+			      !!(dev->flags & IFF_ALLMULTI));
+
+	netif_rx_mode_set_bit(dev, NETIF_RX_MODE_PROM_EN,
+			      !!(dev->flags & IFF_PROMISC));
+
+	i = 0;
+	if (!__netif_rx_mode_pending_get_bit(dev, NETIF_RX_MODE_UC_SKIP)) {
+		pending->uc_count = uc_count;
+		netdev_for_each_uc_addr(ha, dev)
+			memcpy(pending->uc_addrs + (i++) * dev->addr_len,
+			       ha->addr,
+			       dev->addr_len);
+	}
+
+	i = 0;
+	if (!__netif_rx_mode_pending_get_bit(dev, NETIF_RX_MODE_MC_SKIP)) {
+		pending->mc_count = mc_count;
+		netdev_for_each_mc_addr(ha, dev)
+			memcpy(pending->mc_addrs + (i++) * dev->addr_len,
+			       ha->addr,
+			       dev->addr_len);
+	}
+	return 0;
+}
+
+static void netif_rx_mode_prepare_pending(struct net_device *dev)
+{
+	int rc;
+
+	lockdep_assert_held(&dev->addr_list_lock);
+	rc = netif_rx_mode_alloc_and_fill_pending(dev);
+	if (rc)
+		return;
+
+	netif_rx_mode_set_bit(dev, NETIF_RX_MODE_CFG_READY, true);
+}
+
+static void netif_rx_mode_write_active(struct work_struct *param)
+{
+	struct netif_rx_mode_ctx *rx_mode_ctx = container_of(param,
+			struct netif_rx_mode_ctx, rx_mode_work);
+
+	struct net_device *dev = rx_mode_ctx->dev;
+
+	/* Paranoia. */
+	WARN_ON(!dev->netdev_ops->ndo_write_rx_mode);
+
+	/* We could introduce a new lock for this but reusing the addr
+	 * lock works well enough
+	 */
+	netif_addr_lock_bh(dev);
+
+	/* There's no point continuing if the pending config is not ready */
+	if (!__netif_rx_mode_pending_get_bit(dev, NETIF_RX_MODE_CFG_READY)) {
+		netif_addr_unlock_bh(dev);
+		return;
+	}
+
+	/* We use the prepared pending config as the new ready config and
+	 * reuse old ready config's memory for the next pending config
+	 */
+	swap(rx_mode_ctx->ready, rx_mode_ctx->pending);
+	netif_rx_mode_set_bit(dev, NETIF_RX_MODE_CFG_READY, false);
+
+	netif_addr_unlock_bh(dev);
+
+	rtnl_lock();
+	dev->netdev_ops->ndo_write_rx_mode(dev);
+	rtnl_unlock();
+}
+
+static int alloc_rx_mode_ctx(struct net_device *dev)
+{
+	dev->rx_mode_ctx = kzalloc(sizeof(*dev->rx_mode_ctx), GFP_KERNEL);
+
+	if (!dev->rx_mode_ctx)
+		goto fail;
+
+	dev->rx_mode_ctx->ready = kzalloc(sizeof(*dev->rx_mode_ctx->ready),
+					  GFP_KERNEL);
+
+	if (!dev->rx_mode_ctx->ready)
+		goto fail_ready;
+
+	dev->rx_mode_ctx->pending = kzalloc(sizeof(*dev->rx_mode_ctx->pending),
+					    GFP_KERNEL);
+
+	if (!dev->rx_mode_ctx->pending)
+		goto fail_pending;
+
+	INIT_WORK(&dev->rx_mode_ctx->rx_mode_work, netif_rx_mode_write_active);
+	dev->rx_mode_ctx->dev = dev;
+
+	return 0;
+
+fail_pending:
+	kfree(dev->rx_mode_ctx->ready);
+fail_ready:
+	kfree(dev->rx_mode_ctx);
+fail:
+	return -ENOMEM;
+}
+
 static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
@@ -1679,6 +1834,9 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
 	if (ops->ndo_validate_addr)
 		ret = ops->ndo_validate_addr(dev);
 
+	if (!ret && ops->ndo_write_rx_mode)
+		ret = alloc_rx_mode_ctx(dev);
+
 	if (!ret && ops->ndo_open)
 		ret = ops->ndo_open(dev);
 
@@ -1713,6 +1871,22 @@ int netif_open(struct net_device *dev, struct netlink_ext_ack *extack)
 	return ret;
 }
 
+static void cleanup_rx_mode_ctx(struct net_device *dev)
+{
+	/* cancel and wait for execution to complete */
+	cancel_work_sync(&dev->rx_mode_ctx->rx_mode_work);
+
+	kfree(dev->rx_mode_ctx->pending->uc_addrs);
+	kfree(dev->rx_mode_ctx->pending->mc_addrs);
+	kfree(dev->rx_mode_ctx->pending);
+
+	kfree(dev->rx_mode_ctx->ready->uc_addrs);
+	kfree(dev->rx_mode_ctx->ready->mc_addrs);
+	kfree(dev->rx_mode_ctx->ready);
+
+	kfree(dev->rx_mode_ctx);
+}
+
 static void __dev_close_many(struct list_head *head)
 {
 	struct net_device *dev;
@@ -1755,6 +1929,8 @@ static void __dev_close_many(struct list_head *head)
 		if (ops->ndo_stop)
 			ops->ndo_stop(dev);
 
+		cleanup_rx_mode_ctx(dev);
+
 		netif_set_up(dev, false);
 		netpoll_poll_enable(dev);
 	}
@@ -9613,6 +9789,33 @@ int netif_set_allmulti(struct net_device *dev, int inc, bool notify)
 	return 0;
 }
 
+/* netif_rx_mode_schedule_work - Sets up the rx_config snapshot and
+ * schedules the deferred I/O. If it's necessary to wait for completion
+ * of I/O, set flush to true.
+ */
+void netif_rx_mode_schedule_work(struct net_device *dev, bool flush)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (ops->ndo_set_rx_mode)
+		ops->ndo_set_rx_mode(dev);
+
+	/* Return early if ndo_write_rx_mode is not implemented */
+	if (!ops->ndo_write_rx_mode)
+		return;
+
+	/* If rx_mode config is disabled, we don't schedule the work */
+	if (__netif_rx_mode_pending_get_bit(dev, NETIF_RX_MODE_SET_DIS))
+		return;
+
+	netif_rx_mode_prepare_pending(dev);
+
+	schedule_work(&dev->rx_mode_ctx->rx_mode_work);
+	if (flush)
+		flush_work(&dev->rx_mode_ctx->rx_mode_work);
+}
+EXPORT_SYMBOL(netif_rx_mode_schedule_work);
+
 /*
  *	Upload unicast and multicast address lists to device and
  *	configure RX filtering. When the device doesn't support unicast
@@ -9621,8 +9824,6 @@ int netif_set_allmulti(struct net_device *dev, int inc, bool notify)
  */
 void __dev_set_rx_mode(struct net_device *dev)
 {
-	const struct net_device_ops *ops = dev->netdev_ops;
-
 	/* dev_open will call this function so the list will stay sane. */
 	if (!(dev->flags&IFF_UP))
 		return;
@@ -9643,8 +9844,7 @@ void __dev_set_rx_mode(struct net_device *dev)
 		}
 	}
 
-	if (ops->ndo_set_rx_mode)
-		ops->ndo_set_rx_mode(dev);
+	netif_rx_mode_schedule_work(dev, false);
 }
 
 void dev_set_rx_mode(struct net_device *dev)
-- 
2.34.1



* [RFT net-next v4 2/2] virtio-net: Implement ndo_write_rx_mode callback
  2025-11-18 16:43 [RFT net-next v4 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write I Viswanath
  2025-11-18 16:43 ` [RFT net-next v4] net: refactor set_rx_mode into snapshot and deferred I/O I Viswanath
@ 2025-11-18 16:43 ` I Viswanath
  2025-11-18 16:56 ` [RFT net-next v4 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write I Viswanath
  2 siblings, 0 replies; 7+ messages in thread
From: I Viswanath @ 2025-11-18 16:43 UTC (permalink / raw)
  To: kuba, andrew+netdev, davem, edumazet, pabeni, horms, mst,
	jasowang, xuanzhuo, eperezma, sdf, kuniyu, skhawaja,
	aleksander.lobakin
  Cc: virtualization, netdev, linux-kernel, linux-kernel-mentees,
	I Viswanath

Implement ndo_write_rx_mode callback for virtio-net

Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
 drivers/net/virtio_net.c | 56 +++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 32 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index cfa006b88688..20494ed5a0f6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -460,9 +460,6 @@ struct virtnet_info {
 	/* Work struct for config space updates */
 	struct work_struct config_work;
 
-	/* Work struct for setting rx mode */
-	struct work_struct rx_mode_work;
-
 	/* OK to queue work setting RX mode? */
 	bool rx_mode_work_enabled;
 
@@ -3857,18 +3854,17 @@ static int virtnet_close(struct net_device *dev)
 	return 0;
 }
 
-static void virtnet_rx_mode_work(struct work_struct *work)
+static void virtnet_write_rx_mode(struct net_device *dev)
 {
-	struct virtnet_info *vi =
-		container_of(work, struct virtnet_info, rx_mode_work);
+	struct virtnet_info *vi = netif_rx_mode_get_priv_ptr(dev);
 	u8 *promisc_allmulti  __free(kfree) = NULL;
-	struct net_device *dev = vi->dev;
 	struct scatterlist sg[2];
 	struct virtio_net_ctrl_mac *mac_data;
-	struct netdev_hw_addr *ha;
+	char *ha_addr;
 	int uc_count;
 	int mc_count;
 	void *buf;
+	int idx;
 	int i;
 
 	/* We can't dynamically set ndo_set_rx_mode, so return gracefully */
@@ -3881,9 +3877,8 @@ static void virtnet_rx_mode_work(struct work_struct *work)
 		return;
 	}
 
-	rtnl_lock();
-
-	*promisc_allmulti = !!(dev->flags & IFF_PROMISC);
+	*promisc_allmulti = netif_rx_mode_get_bit(dev,
+						  NETIF_RX_MODE_PROM_EN);
 	sg_init_one(sg, promisc_allmulti, sizeof(*promisc_allmulti));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
@@ -3891,7 +3886,8 @@ static void virtnet_rx_mode_work(struct work_struct *work)
 		dev_warn(&dev->dev, "Failed to %sable promisc mode.\n",
 			 *promisc_allmulti ? "en" : "dis");
 
-	*promisc_allmulti = !!(dev->flags & IFF_ALLMULTI);
+	*promisc_allmulti = netif_rx_mode_get_bit(dev,
+						  NETIF_RX_MODE_ALLMULTI_EN);
 	sg_init_one(sg, promisc_allmulti, sizeof(*promisc_allmulti));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
@@ -3899,27 +3895,24 @@ static void virtnet_rx_mode_work(struct work_struct *work)
 		dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
 			 *promisc_allmulti ? "en" : "dis");
 
-	netif_addr_lock_bh(dev);
+	uc_count = netif_rx_mode_get_uc_count(dev);
+	mc_count = netif_rx_mode_get_mc_count(dev);
 
-	uc_count = netdev_uc_count(dev);
-	mc_count = netdev_mc_count(dev);
 	/* MAC filter - use one buffer for both lists */
 	buf = kzalloc(((uc_count + mc_count) * ETH_ALEN) +
 		      (2 * sizeof(mac_data->entries)), GFP_ATOMIC);
 	mac_data = buf;
-	if (!buf) {
-		netif_addr_unlock_bh(dev);
-		rtnl_unlock();
+	if (!buf)
 		return;
-	}
 
 	sg_init_table(sg, 2);
 
 	/* Store the unicast list and count in the front of the buffer */
 	mac_data->entries = cpu_to_virtio32(vi->vdev, uc_count);
 	i = 0;
-	netdev_for_each_uc_addr(ha, dev)
-		memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
+
+	netif_rx_mode_for_each_uc_addr(dev, ha_addr, idx)
+		memcpy(&mac_data->macs[i++][0], ha_addr, ETH_ALEN);
 
 	sg_set_buf(&sg[0], mac_data,
 		   sizeof(mac_data->entries) + (uc_count * ETH_ALEN));
@@ -3929,10 +3922,8 @@ static void virtnet_rx_mode_work(struct work_struct *work)
 
 	mac_data->entries = cpu_to_virtio32(vi->vdev, mc_count);
 	i = 0;
-	netdev_for_each_mc_addr(ha, dev)
-		memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
-
-	netif_addr_unlock_bh(dev);
+	netif_rx_mode_for_each_mc_addr(dev, ha_addr, idx)
+		memcpy(&mac_data->macs[i++][0], ha_addr, ETH_ALEN);
 
 	sg_set_buf(&sg[1], mac_data,
 		   sizeof(mac_data->entries) + (mc_count * ETH_ALEN));
@@ -3941,17 +3932,18 @@ static void virtnet_rx_mode_work(struct work_struct *work)
 				  VIRTIO_NET_CTRL_MAC_TABLE_SET, sg))
 		dev_warn(&dev->dev, "Failed to set MAC filter table.\n");
 
-	rtnl_unlock();
-
 	kfree(buf);
 }
 
 static void virtnet_set_rx_mode(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
+	bool cfg_disabled;
+
+	cfg_disabled = !vi->rx_mode_work_enabled;
+	netif_rx_mode_set_bit(dev, NETIF_RX_MODE_SET_DIS, cfg_disabled);
 
-	if (vi->rx_mode_work_enabled)
-		schedule_work(&vi->rx_mode_work);
+	netif_rx_mode_set_priv_ptr(dev, vi);
 }
 
 static int virtnet_vlan_rx_add_vid(struct net_device *dev,
@@ -5767,7 +5759,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 	/* Make sure no work handler is accessing the device */
 	flush_work(&vi->config_work);
 	disable_rx_mode_work(vi);
-	flush_work(&vi->rx_mode_work);
+	netif_rx_mode_flush_work(vi->dev);
 
 	if (netif_running(vi->dev)) {
 		rtnl_lock();
@@ -6270,6 +6262,7 @@ static const struct net_device_ops virtnet_netdev = {
 	.ndo_validate_addr   = eth_validate_addr,
 	.ndo_set_mac_address = virtnet_set_mac_address,
 	.ndo_set_rx_mode     = virtnet_set_rx_mode,
+	.ndo_write_rx_mode   = virtnet_write_rx_mode,
 	.ndo_get_stats64     = virtnet_stats,
 	.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
@@ -6891,7 +6884,6 @@ static int virtnet_probe(struct virtio_device *vdev)
 	vdev->priv = vi;
 
 	INIT_WORK(&vi->config_work, virtnet_config_changed_work);
-	INIT_WORK(&vi->rx_mode_work, virtnet_rx_mode_work);
 	spin_lock_init(&vi->refill_lock);
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) {
@@ -7196,7 +7188,7 @@ static void virtnet_remove(struct virtio_device *vdev)
 	/* Make sure no work handler is accessing the device. */
 	flush_work(&vi->config_work);
 	disable_rx_mode_work(vi);
-	flush_work(&vi->rx_mode_work);
+	netif_rx_mode_flush_work(vi->dev);
 
 	virtnet_free_irq_moder(vi);
 
-- 
2.34.1



* Re: [RFT net-next v4 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write
  2025-11-18 16:43 [RFT net-next v4 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write I Viswanath
  2025-11-18 16:43 ` [RFT net-next v4] net: refactor set_rx_mode into snapshot and deferred I/O I Viswanath
  2025-11-18 16:43 ` [RFT net-next v4 2/2] virtio-net: Implement ndo_write_rx_mode callback I Viswanath
@ 2025-11-18 16:56 ` I Viswanath
  2025-11-18 18:50   ` Jakub Kicinski
  2 siblings, 1 reply; 7+ messages in thread
From: I Viswanath @ 2025-11-18 16:56 UTC (permalink / raw)
  To: kuba, andrew+netdev, davem, edumazet, pabeni, horms, mst,
	jasowang, xuanzhuo, eperezma, sdf, kuniyu, skhawaja,
	aleksander.lobakin
  Cc: virtualization, netdev, linux-kernel, linux-kernel-mentees

Hi,

I am sorry that it is broken. I will submit a v5 as soon as possible.


* Re: [RFT net-next v4 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write
  2025-11-18 16:56 ` [RFT net-next v4 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write I Viswanath
@ 2025-11-18 18:50   ` Jakub Kicinski
  2025-11-19  3:57     ` I Viswanath
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2025-11-18 18:50 UTC (permalink / raw)
  To: I Viswanath
  Cc: andrew+netdev, davem, edumazet, pabeni, horms, mst, jasowang,
	xuanzhuo, eperezma, sdf, kuniyu, skhawaja, aleksander.lobakin,
	virtualization, netdev, linux-kernel, linux-kernel-mentees

On Tue, 18 Nov 2025 22:26:42 +0530 I Viswanath wrote:
> I am sorry that it is broken. I will submit a v5 as soon as possible

Broken as in patch titles or broken as in it crashes the kernel?
Both are true. Just to be safe "as soon as possible" hopefully
takes into account our "no reposts within 24h" rule.
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html

Also I'm not sure what you mean by RFT, doubt anyone will test this
for you, and you're modifying virtio which you should be able to test
yourself. RFC or PATCH is the choice.


* Re: [RFT net-next v4 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write
  2025-11-18 18:50   ` Jakub Kicinski
@ 2025-11-19  3:57     ` I Viswanath
  2025-11-20  2:51       ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: I Viswanath @ 2025-11-19  3:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: andrew+netdev, davem, edumazet, pabeni, horms, mst, jasowang,
	xuanzhuo, eperezma, sdf, kuniyu, skhawaja, aleksander.lobakin,
	virtualization, netdev, linux-kernel, linux-kernel-mentees

On Wed, 19 Nov 2025 at 00:20, Jakub Kicinski <kuba@kernel.org> wrote:
>
> Broken as in patch titles or broken as in it crashes the kernel?
> Both are true.
Yeah, I will fix both

> Just to be safe "as soon as possible" hopefully
> takes into account our "no reposts within 24h" rule.
> https://www.kernel.org/doc/html/next/process/maintainer-netdev.html
Yes, it does.

> Also I'm not sure what you mean by RFT, doubt anyone will test this
> for you, and you're modifying virtio which you should be able to test
> yourself.. RFC or PATCH is the choice.

Just to be clear, would testing packet flow with all the possible mode
combinations under heavy traffic be sufficient and exhaustive? I think
this should be PATCH once I sort everything out.


* Re: [RFT net-next v4 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write
  2025-11-19  3:57     ` I Viswanath
@ 2025-11-20  2:51       ` Jakub Kicinski
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2025-11-20  2:51 UTC (permalink / raw)
  To: I Viswanath
  Cc: andrew+netdev, davem, edumazet, pabeni, horms, mst, jasowang,
	xuanzhuo, eperezma, sdf, kuniyu, skhawaja, aleksander.lobakin,
	virtualization, netdev, linux-kernel, linux-kernel-mentees

On Wed, 19 Nov 2025 09:27:58 +0530 I Viswanath wrote:
> > Also I'm not sure what you mean by RFT, doubt anyone will test this
> > for you, and you're modifying virtio which you should be able to test
> > yourself.. RFC or PATCH is the choice.  
> 
> Just to be clear, would testing packet flow with all the possible mode
> combinations
> under heavy traffic be sufficient and exhaustive? I think this should
> be PATCH once I sort everything out

I don't think traffic matters all that much, we're talking about a
control path change. It's more important to exercise all the address
lists (l2 unicast, multicast) and rx modes (promisc, allmulti) etc.
Then whatever other paths may lead to touching the relevant state in
the driver (eg virtnet_freeze_down(), what triggers that, suspend?,
migration?).

In terms of traffic simple ping or a few UDP packets would be enough,
mostly to confirm the filtering is working as expected.

