* [RFC PATCH net-next v2 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write
@ 2025-10-26 17:54 I Viswanath
2025-10-26 17:54 ` [RFC PATCH net-next v2 1/2] net: Add ndo_write_rx_config and helper structs and functions: I Viswanath
2025-10-26 17:54 ` [RFC PATCH net-next v2 2/2] net: ethernet: Implement ndo_write_rx_config callback for the 8139cp driver I Viswanath
0 siblings, 2 replies; 4+ messages in thread
From: I Viswanath @ 2025-10-26 17:54 UTC
To: davem, edumazet, kuba, pabeni, horms, sdf, kuniyu, ahmed.zaki,
aleksander.lobakin, jacob.e.keller
Cc: netdev, linux-kernel, skhan, linux-kernel-mentees,
david.hunter.linux, khalid, I Viswanath
This is an implementation of the idea provided by Jakub here:
https://lore.kernel.org/netdev/20250923163727.5e97abdb@kernel.org/
ndo_set_rx_mode is problematic because it cannot sleep.
To address this, this series proposes dividing existing set_rx_mode
implementations into set_rx_mode and write_rx_config.
The new set_rx_mode will be responsible for updating the rx_config
snapshot, which will be used by ndo_write_rx_config to update the hardware.
In brief, the callback implementations should look something like:
set_rx_mode():
	prepare_rx_config();
	update_snapshot();
write_rx_config():
	read_snapshot();
	do_io();
write_rx_config() is called from a work item, which allows it to sleep
during the do_io() section.
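For readers who want something concrete, the split described above can be
modelled in userspace. This is only a sketch under assumptions: struct
fake_dev, fake_set_rx_mode() and fake_write_rx_config() are hypothetical
names, a C11 atomic_flag spin loop stands in for config_lock, and a plain
struct copy stands in for the register writes. It is not the code from this
series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver's rx configuration. */
struct fake_rx_config {
	int rx_mode;
	uint32_t mc_filter[2];
};

struct fake_dev {
	atomic_flag config_lock;        /* stand-in for rx_work->config_lock */
	struct fake_rx_config snapshot; /* latest config from set_rx_mode */
	struct fake_rx_config hw;       /* what the "hardware" currently holds */
};

static void fake_lock(struct fake_dev *dev)
{
	while (atomic_flag_test_and_set_explicit(&dev->config_lock,
						 memory_order_acquire))
		; /* spin until the flag is released */
}

static void fake_unlock(struct fake_dev *dev)
{
	atomic_flag_clear_explicit(&dev->config_lock, memory_order_release);
}

static void fake_dev_init(struct fake_dev *dev)
{
	assert(dev);
	atomic_flag_clear(&dev->config_lock);
	dev->snapshot = (struct fake_rx_config){ 0 };
	dev->hw = (struct fake_rx_config){ 0 };
}

/* Non-sleeping half: build the config locally, then publish it. */
static void fake_set_rx_mode(struct fake_dev *dev, int rx_mode,
			     uint32_t mc_lo, uint32_t mc_hi)
{
	struct fake_rx_config new_config = {
		.rx_mode = rx_mode,
		.mc_filter = { mc_lo, mc_hi },
	};

	fake_lock(dev);                 /* update_snapshot() */
	dev->snapshot = new_config;
	fake_unlock(dev);
}

/* Deferred half: copy the snapshot out, then do I/O without the lock. */
static void fake_write_rx_config(struct fake_dev *dev)
{
	struct fake_rx_config local;

	fake_lock(dev);                 /* read_snapshot() */
	local = dev->snapshot;
	fake_unlock(dev);

	dev->hw = local;                /* do_io(): free to sleep here */
}
```

The lock is held only for the struct copy, so the slow part of
fake_write_rx_config() runs with no lock held. Two back-to-back calls to
fake_set_rx_mode() followed by one write leave only the later config in hw.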
This model should work correctly if the following conditions hold:
1. write_rx_config should use the rx_config set by the most recent
call to set_rx_mode before its execution.
2. If a set_rx_mode call happens during execution of write_rx_config,
write_rx_config should be rescheduled.
3. All calls to modify rx_mode should pass through the set_rx_mode +
schedule write_rx_config execution flow.
Conditions 1 and 2 are guaranteed by the properties of work queues.
Drivers need to ensure condition 3.
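The work queue property relied on for conditions 1 and 2 can be illustrated
with a single-threaded toy model. The assumption here is the usual
work_struct behaviour: scheduling an already pending work item is a no-op,
and the pending state is cleared just before the callback runs, so a
schedule that arrives while the callback is executing queues one more run.
All toy_* names and the racing_value field are hypothetical and exist only
to simulate a set_rx_mode call landing in the middle of a write.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_dev {
	bool pending;      /* models the work item's pending state */
	int snapshot;      /* latest config from set_rx_mode */
	int hw;            /* value last written to the "hardware" */
	int writes;        /* number of write_rx_config executions */
	int racing_value;  /* nonzero: a set_rx_mode that lands mid-write */
};

/* Returns false if the work was already pending (request coalesced). */
static bool toy_schedule(struct toy_dev *dev)
{
	if (dev->pending)
		return false;
	dev->pending = true;
	return true;
}

static void toy_set_rx_mode(struct toy_dev *dev, int value)
{
	dev->snapshot = value;
	toy_schedule(dev);
}

static void toy_write_rx_config(struct toy_dev *dev)
{
	int local = dev->snapshot;      /* read_snapshot() */

	if (dev->racing_value) {        /* simulated concurrent update */
		int value = dev->racing_value;

		dev->racing_value = 0;
		toy_set_rx_mode(dev, value);
	}

	dev->hw = local;                /* do_io() with the older snapshot */
	dev->writes++;
}

/* Runs the callback until nothing is pending; returns the run count. */
static int toy_run_worker(struct toy_dev *dev)
{
	int runs = 0;

	assert(dev);
	while (dev->pending) {
		dev->pending = false;   /* cleared before the callback */
		toy_write_rx_config(dev);
		runs++;
	}
	return runs;
}
```

In this model, two toy_set_rx_mode() calls before the worker runs collapse
into one write (condition 1), while an update that arrives during a write
triggers a second run that picks up the newer snapshot (condition 2).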
ndo_write_rx_config has been implemented for the 8139cp driver as a proof
of concept.
To use this model, a driver needs to implement the
ndo_write_rx_config callback, have a member rx_config in
the priv struct, and replace all calls that set the rx mode with
set_and_schedule_rx_config().
I Viswanath (2):
net: Add ndo_write_rx_config and helper structs and functions:
net: ethernet: Implement ndo_write_rx_config callback for the 8139cp
driver
drivers/net/ethernet/realtek/8139cp.c | 78 ++++++++++++++++-----------
include/linux/netdevice.h | 38 ++++++++++++-
net/core/dev.c | 54 +++++++++++++++++--
3 files changed, 132 insertions(+), 38 deletions(-)
---
v1:
Link: https://lore.kernel.org/netdev/20251020134857.5820-1-viswanathiyyappan@gmail.com/
v2:
- Exported set_and_schedule_rx_config as a symbol for use in modules
- Fixed incorrect cleanup for the case of rx_work alloc failing in alloc_netdev_mqs
- Removed the locked version (cp_set_rx_mode) and renamed __cp_set_rx_mode to cp_set_rx_mode
--
2.47.3
* [RFC PATCH net-next v2 1/2] net: Add ndo_write_rx_config and helper structs and functions:
From: I Viswanath @ 2025-10-26 17:54 UTC
To: davem, edumazet, kuba, pabeni, horms, sdf, kuniyu, ahmed.zaki,
aleksander.lobakin, jacob.e.keller
Cc: netdev, linux-kernel, skhan, linux-kernel-mentees,
david.hunter.linux, khalid, I Viswanath
Add ndo_write_rx_config callback and helper structs/functions:
rx_config_work - To schedule the callback and handle synchronization
read_snapshot/update_snapshot - Helper macros to read/update the
	rx_config snapshot
set_and_schedule_rx_config - Helper function to call ndo_set_rx_mode
	and schedule ndo_write_rx_config
execute_write_rx_config - Helper function that will be scheduled
	by rx_work->config_write
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
include/linux/netdevice.h | 38 ++++++++++++++++++++++++++-
net/core/dev.c | 54 +++++++++++++++++++++++++++++++++++----
2 files changed, 86 insertions(+), 6 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d1a687444b27..80d6966d6981 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1100,7 +1100,12 @@ struct netdev_net_notifier {
* void (*ndo_set_rx_mode)(struct net_device *dev);
* This function is called device changes address list filtering.
* If driver handles unicast address filtering, it should set
- * IFF_UNICAST_FLT in its priv_flags.
+ * IFF_UNICAST_FLT in its priv_flags. This sets up the snapshot of
+ * rx_config that will be written to the device.
+ *
+ * void (*ndo_write_rx_config)(struct net_device *dev);
+ * This function is scheduled immediately after ndo_set_rx_mode to
+ * write rx_config to the device.
*
* int (*ndo_set_mac_address)(struct net_device *dev, void *addr);
* This function is called when the Media Access Control address
@@ -1421,6 +1426,7 @@ struct net_device_ops {
void (*ndo_change_rx_flags)(struct net_device *dev,
int flags);
void (*ndo_set_rx_mode)(struct net_device *dev);
+ void (*ndo_write_rx_config)(struct net_device *dev);
int (*ndo_set_mac_address)(struct net_device *dev,
void *addr);
int (*ndo_validate_addr)(struct net_device *dev);
@@ -1767,6 +1773,12 @@ enum netdev_reg_state {
NETREG_DUMMY, /* dummy device for NAPI poll */
};
+struct rx_config_work {
+ struct work_struct config_write;
+ struct net_device *dev;
+ spinlock_t config_lock;
+};
+
/**
* struct net_device - The DEVICE structure.
*
@@ -2082,6 +2094,8 @@ enum netdev_reg_state {
* dev_list, one per address-family.
* @hwprov: Tracks which PTP performs hardware packet time stamping.
*
+ * @rx_work: helper struct to schedule rx config write to the hardware.
+ *
* FIXME: cleanup struct net_device such that network protocol info
* moves out.
*/
@@ -2559,6 +2573,8 @@ struct net_device {
struct hwtstamp_provider __rcu *hwprov;
+ struct rx_config_work *rx_work;
+
u8 priv[] ____cacheline_aligned
__counted_by(priv_len);
} ____cacheline_aligned;
@@ -2734,6 +2750,26 @@ void dev_net_set(struct net_device *dev, struct net *net)
write_pnet(&dev->nd_net, net);
}
+#define update_snapshot(config_ptr, type) \
+ do { \
+ typeof((config_ptr)) rx_config = ((type *)(dev->priv))->rx_config; \
+ unsigned long flags; \
+ spin_lock_irqsave(&((dev)->rx_work->config_lock), flags); \
+ *rx_config = *(config_ptr); \
+ spin_unlock_irqrestore(&((dev)->rx_work->config_lock), flags); \
+ } while (0)
+
+#define read_snapshot(config_ptr, type) \
+ do { \
+ typeof((config_ptr)) rx_config = ((type *)(dev->priv))->rx_config; \
+ unsigned long flags; \
+ spin_lock_irqsave(&((dev)->rx_work->config_lock), flags); \
+ *(config_ptr) = *rx_config; \
+ spin_unlock_irqrestore(&((dev)->rx_work->config_lock), flags); \
+ } while (0)
+
+void set_and_schedule_rx_config(struct net_device *dev, bool flush);
+
/**
* netdev_priv - access network device private data
* @dev: network device
diff --git a/net/core/dev.c b/net/core/dev.c
index 2acfa44927da..2d3c6031e282 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9524,6 +9524,38 @@ int netif_set_allmulti(struct net_device *dev, int inc, bool notify)
return 0;
}
+static void execute_write_rx_config(struct work_struct *param)
+{
+ struct rx_config_work *rx_work = container_of(param,
+ struct rx_config_work,
+ config_write);
+ struct net_device *dev = rx_work->dev;
+
+ // This path should not be hit outside the work item
+ WARN_ON(!dev->netdev_ops->ndo_write_rx_config);
+ dev->netdev_ops->ndo_write_rx_config(dev);
+}
+
+/*
+ * Sets up the rx_config snapshot and schedules write_rx_config. If
+ * it's necessary to wait for completion of write_rx_config, set
+ * flush to true.
+ */
+void set_and_schedule_rx_config(struct net_device *dev, bool flush)
+{
+ const struct net_device_ops *ops = dev->netdev_ops;
+
+ if (ops->ndo_set_rx_mode)
+ ops->ndo_set_rx_mode(dev);
+
+ if (ops->ndo_write_rx_config) {
+ schedule_work(&dev->rx_work->config_write);
+ if (flush)
+ flush_work(&dev->rx_work->config_write);
+ }
+}
+EXPORT_SYMBOL(set_and_schedule_rx_config);
+
/*
* Upload unicast and multicast address lists to device and
* configure RX filtering. When the device doesn't support unicast
@@ -9532,8 +9564,6 @@ int netif_set_allmulti(struct net_device *dev, int inc, bool notify)
*/
void __dev_set_rx_mode(struct net_device *dev)
{
- const struct net_device_ops *ops = dev->netdev_ops;
-
/* dev_open will call this function so the list will stay sane. */
if (!(dev->flags&IFF_UP))
return;
@@ -9554,8 +9584,7 @@ void __dev_set_rx_mode(struct net_device *dev)
}
}
- if (ops->ndo_set_rx_mode)
- ops->ndo_set_rx_mode(dev);
+ set_and_schedule_rx_config(dev, false);
}
void dev_set_rx_mode(struct net_device *dev)
@@ -11914,9 +11943,17 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
refcount_set(&dev->dev_refcnt, 1);
#endif
- if (dev_addr_init(dev))
+ dev->rx_work = kmalloc(sizeof(*dev->rx_work), GFP_KERNEL);
+ if (!dev->rx_work)
goto free_pcpu;
+ dev->rx_work->dev = dev;
+ spin_lock_init(&dev->rx_work->config_lock);
+ INIT_WORK(&dev->rx_work->config_write, execute_write_rx_config);
+
+ if (dev_addr_init(dev))
+ goto free_rx_work;
+
dev_mc_init(dev);
dev_uc_init(dev);
@@ -11998,6 +12035,10 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
free_netdev(dev);
return NULL;
+free_rx_work:
+ cancel_work_sync(&dev->rx_work->config_write);
+ kfree(dev->rx_work);
+
free_pcpu:
#ifdef CONFIG_PCPU_DEV_REFCNT
free_percpu(dev->pcpu_refcnt);
@@ -12083,6 +12124,9 @@ void free_netdev(struct net_device *dev)
return;
}
+ cancel_work_sync(&dev->rx_work->config_write);
+ kfree(dev->rx_work);
+
BUG_ON(dev->reg_state != NETREG_UNREGISTERED);
WRITE_ONCE(dev->reg_state, NETREG_RELEASED);
--
2.47.3
* [RFC PATCH net-next v2 2/2] net: ethernet: Implement ndo_write_rx_config callback for the 8139cp driver
From: I Viswanath @ 2025-10-26 17:54 UTC
To: davem, edumazet, kuba, pabeni, horms, sdf, kuniyu, ahmed.zaki,
aleksander.lobakin, jacob.e.keller
Cc: netdev, linux-kernel, skhan, linux-kernel-mentees,
david.hunter.linux, khalid, I Viswanath
Implement ndo_write_rx_config for the 8139cp driver.
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
drivers/net/ethernet/realtek/8139cp.c | 78 ++++++++++++++++-----------
1 file changed, 46 insertions(+), 32 deletions(-)
diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
index 5652da8a178c..3480079361f3 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -319,6 +319,11 @@ struct cp_extra_stats {
unsigned long rx_frags;
};
+struct cp_rx_config {
+ int rx_mode;
+ u32 mc_filter[2]; /* Multicast hash filter */
+};
+
struct cp_private {
void __iomem *regs;
struct net_device *dev;
@@ -328,7 +333,7 @@ struct cp_private {
struct napi_struct napi;
struct pci_dev *pdev;
- u32 rx_config;
+ struct cp_rx_config *rx_config;
u16 cpcmd;
struct cp_extra_stats cp_stats;
@@ -372,7 +377,6 @@ struct cp_private {
} while (0)
-static void __cp_set_rx_mode (struct net_device *dev);
static void cp_tx (struct cp_private *cp);
static void cp_clean_rings (struct cp_private *cp);
#ifdef CONFIG_NET_POLL_CONTROLLER
@@ -882,55 +886,53 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
goto out_unlock;
}
+static void cp_write_rx_config(struct net_device *dev)
+{
+ struct cp_private *cp = netdev_priv(dev);
+ struct cp_rx_config snapshot;
+
+ read_snapshot((&snapshot), struct cp_private);
+
+ /* We can safely update without stopping the chip. */
+ cpw32_f(RxConfig, snapshot.rx_mode);
+
+ cpw32_f(MAR0 + 0, snapshot.mc_filter[0]);
+ cpw32_f(MAR0 + 4, snapshot.mc_filter[1]);
+}Firstly, think whether you have a bug fix or new "next-like" content.
Then once decided, assuming that you use git, use the prefix flag, i.e.
+
/* Set or clear the multicast filter for this adaptor.
This routine is not state sensitive and need not be SMP locked. */
-static void __cp_set_rx_mode (struct net_device *dev)
+static void cp_set_rx_mode (struct net_device *dev)
{
- struct cp_private *cp = netdev_priv(dev);
- u32 mc_filter[2]; /* Multicast hash filter */
- int rx_mode;
+ struct cp_rx_config new_config;
/* Note: do not reorder, GCC is clever about common statements. */
if (dev->flags & IFF_PROMISC) {
/* Unconditionally log net taps. */
- rx_mode =
+ new_config.rx_mode =
AcceptBroadcast | AcceptMulticast | AcceptMyPhys |
AcceptAllPhys;
- mc_filter[1] = mc_filter[0] = 0xffffffff;
+ new_config.mc_filter[1] = new_config.mc_filter[0] = 0xffffffff;
} else if ((netdev_mc_count(dev) > multicast_filter_limit) ||
(dev->flags & IFF_ALLMULTI)) {
/* Too many to filter perfectly -- accept all multicasts. */
- rx_mode = AcceptBroadcast | AcceptMulticast | AcceptMyPhys;
- mc_filter[1] = mc_filter[0] = 0xffffffff;
+ new_config.rx_mode = AcceptBroadcast | AcceptMulticast | AcceptMyPhys;
+ new_config.mc_filter[1] = new_config.mc_filter[0] = 0xffffffff;
} else {
struct netdev_hw_addr *ha;
- rx_mode = AcceptBroadcast | AcceptMyPhys;
- mc_filter[1] = mc_filter[0] = 0;
+ new_config.rx_mode = AcceptBroadcast | AcceptMyPhys;
+ new_config.mc_filter[1] = new_config.mc_filter[0] = 0;
netdev_for_each_mc_addr(ha, dev) {
int bit_nr = ether_crc(ETH_ALEN, ha->addr) >> 26;
- mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31);
- rx_mode |= AcceptMulticast;
+ new_config.mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31);
+ new_config.rx_mode |= AcceptMulticast;
}
}
- /* We can safely update without stopping the chip. */
- cp->rx_config = cp_rx_config | rx_mode;
- cpw32_f(RxConfig, cp->rx_config);
-
- cpw32_f (MAR0 + 0, mc_filter[0]);
- cpw32_f (MAR0 + 4, mc_filter[1]);
-}
-
-static void cp_set_rx_mode (struct net_device *dev)
-{
- unsigned long flags;
- struct cp_private *cp = netdev_priv(dev);
-
- spin_lock_irqsave (&cp->lock, flags);
- __cp_set_rx_mode(dev);
- spin_unlock_irqrestore (&cp->lock, flags);
+ new_config.rx_mode = cp_rx_config | new_config.rx_mode;
+ update_snapshot(&new_config, struct cp_private);
}
static void __cp_get_stats(struct cp_private *cp)
@@ -1040,7 +1042,7 @@ static void cp_init_hw (struct cp_private *cp)
cp_start_hw(cp);
cpw8(TxThresh, 0x06); /* XXX convert magic num to a constant */
- __cp_set_rx_mode(dev);
+ set_and_schedule_rx_config(dev, true);
cpw32_f (TxConfig, IFG | (TX_DMA_BURST << TxDMAShift));
cpw8(Config1, cpr8(Config1) | DriverLoaded | PMEnable);
@@ -1188,6 +1190,12 @@ static int cp_open (struct net_device *dev)
if (rc)
return rc;
+ cp->rx_config = kmalloc(sizeof(*cp->rx_config), GFP_KERNEL);
+ if (!cp->rx_config) {
+ rc = -ENOMEM;
+ goto err_out_rx_config;
+ }
+
napi_enable(&cp->napi);
cp_init_hw(cp);
@@ -1207,6 +1215,9 @@ static int cp_open (struct net_device *dev)
err_out_hw:
napi_disable(&cp->napi);
cp_stop_hw(cp);
+ kfree(cp->rx_config);
+
+err_out_rx_config:
cp_free_rings(cp);
return rc;
}
@@ -1227,6 +1238,8 @@ static int cp_close (struct net_device *dev)
cp_stop_hw(cp);
+ kfree(cp->rx_config);
+
spin_unlock_irqrestore(&cp->lock, flags);
free_irq(cp->pdev->irq, dev);
@@ -1262,7 +1275,7 @@ static void cp_tx_timeout(struct net_device *dev, unsigned int txqueue)
cp_clean_rings(cp);
cp_init_rings(cp);
cp_start_hw(cp);
- __cp_set_rx_mode(dev);
+ set_and_schedule_rx_config(dev, false);
cpw16_f(IntrMask, cp_norx_intr_mask);
netif_wake_queue(dev);
@@ -1870,6 +1883,7 @@ static const struct net_device_ops cp_netdev_ops = {
.ndo_validate_addr = eth_validate_addr,
.ndo_set_mac_address = cp_set_mac_address,
.ndo_set_rx_mode = cp_set_rx_mode,
+ .ndo_write_rx_config = cp_write_rx_config,
.ndo_get_stats = cp_get_stats,
.ndo_eth_ioctl = cp_ioctl,
.ndo_start_xmit = cp_start_xmit,
--
2.47.3
* Re: [RFC PATCH net-next v2 2/2] net: ethernet: Implement ndo_write_rx_config callback for the 8139cp driver
From: Simon Horman @ 2025-10-28 15:17 UTC
To: I Viswanath
Cc: davem, edumazet, kuba, pabeni, sdf, kuniyu, ahmed.zaki,
aleksander.lobakin, jacob.e.keller, netdev, linux-kernel, skhan,
linux-kernel-mentees, david.hunter.linux, khalid
On Sun, Oct 26, 2025 at 11:24:45PM +0530, I Viswanath wrote:
> Implement ndo_write_rx_config for the 8139cp driver
>
> Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
...
> @@ -882,55 +886,53 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
> goto out_unlock;
> }
>
> +static void cp_write_rx_config(struct net_device *dev)
> +{
> + struct cp_private *cp = netdev_priv(dev);
> + struct cp_rx_config snapshot;
> +
> + read_snapshot((&snapshot), struct cp_private);
> +
> + /* We can safely update without stopping the chip. */
> + cpw32_f(RxConfig, snapshot.rx_mode);
> +
> + cpw32_f(MAR0 + 0, snapshot.mc_filter[0]);
> + cpw32_f(MAR0 + 4, snapshot.mc_filter[1]);
> +}Firstly, think whether you have a bug fix or new "next-like" content.
> Then once decided, assuming that you use git, use the prefix flag, i.e.
FWIW, this patch is mangled, e.g. the lines above.
> +
> /* Set or clear the multicast filter for this adaptor.
> This routine is not state sensitive and need not be SMP locked. */
...