Linux Documentation
* [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs
@ 2026-05-26 16:01 Jakub Kicinski
  2026-05-26 16:01 ` [PATCH net-next 01/10] docs: net: netdevices: small fixes and clarifications Jakub Kicinski
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 16:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc, Jakub Kicinski

I'm hoping to start feeding our docs into the AI review tools, instead
of maintaining a separate repo with review prompts. To experiment with
that we have to refresh the docs a little bit.

A read thru our current docs makes one slightly question the value
of including them in reviews. But directionally, I feel, it's probably
still right. I'm hoping the Rx Checksum section about not dropping packets,
for example, will be impactful. I don't think the current AI agents or
review docs include this guidance.

Jakub Kicinski (10):
  docs: net: netdevices: small fixes and clarifications
  docs: net: fix minor issues with driver guide
  docs: net: statistics: fix kernel-internal stats list
  docs: net: update devmem code examples
  docs: net: fix minor issues with the NAPI guide
  docs: net: refresh netdev feature guidance
  docs: net: fix minor issues with checksum offloads
  docs: net: add Rx notes to the checksum guide
  docs: net: render the checksum comment in checksum-offloads.rst
  docs: net: fix minor issues with segmentation offloads

 .../networking/checksum-offloads.rst          | 67 ++++++++++++-------
 Documentation/networking/devmem.rst           | 27 +++-----
 Documentation/networking/driver.rst           |  7 +-
 Documentation/networking/napi.rst             | 11 ++-
 Documentation/networking/netdev-features.rst  | 60 +++++++++++------
 Documentation/networking/netdevices.rst       | 31 +++++----
 .../networking/segmentation-offloads.rst      | 37 +++++++++-
 Documentation/networking/skbuff.rst           |  6 --
 Documentation/networking/statistics.rst       | 19 ++++--
 9 files changed, 172 insertions(+), 93 deletions(-)

-- 
2.54.0



* [PATCH net-next 01/10] docs: net: netdevices: small fixes and clarifications
  2026-05-26 16:01 [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Jakub Kicinski
@ 2026-05-26 16:01 ` Jakub Kicinski
  2026-05-26 22:12   ` Stanislav Fomichev
  2026-05-26 16:01 ` [PATCH net-next 02/10] docs: net: fix minor issues with driver guide Jakub Kicinski
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 16:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc, Jakub Kicinski

A handful of unrelated nits:

 - free_netdevice() does not exist; replace two stray references
   with free_netdev().
 - The simple-driver probe example fell through into err_undo after
   register_netdev() success; add return 0 for clarity.
 - Clarify the netdev_priv() paragraph: "(netdev_priv())" was easy
   to misread as the thing that needs explicit freeing; spell out
   that it refers to extra pointers stored in the device private
   struct.
 - ndo_setup_tc synchronization note: TC_SETUP_BLOCK / TC_SETUP_FT
   actually run under block->cb_lock, not "NFT locks", and rtnl_lock
   may or may not be held depending on path.
 - ->lltx guidance reads as very outdated; it's not really deprecated.
   I suspect people may have been trying to use it for HW drivers
   in the past but I can't think of such a case in the last decade.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/netdevices.rst | 31 ++++++++++++++-----------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index 93e06e8d51a9..60492d4df2ee 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -21,13 +21,14 @@ by free_netdev(). This is required to handle the pathological case cleanly
 alloc_netdev_mqs() / alloc_netdev() reserve extra space for driver
 private data which gets freed when the network device is freed. If
 separately allocated data is attached to the network device
-(netdev_priv()) then it is up to the module exit handler to free that.
+(extra pointers stored in the device private struct) then it is up
+to the module exit handler to free that.
 
 There are two groups of APIs for registering struct net_device.
 First group can be used in normal contexts where ``rtnl_lock`` is not already
 held: register_netdev(), unregister_netdev().
 Second group can be used when ``rtnl_lock`` is already held:
-register_netdevice(), unregister_netdevice(), free_netdevice().
+register_netdevice(), unregister_netdevice(), free_netdev().
 
 Simple drivers
 --------------
@@ -58,6 +59,7 @@ In that case the struct net_device registration is done using
       goto err_undo;
 
     /* net_device is visible to the user! */
+    return 0;
 
   err_undo:
     /* ... undo the device setup ... */
@@ -73,7 +75,7 @@ In that case the struct net_device registration is done using
 
 Note that after calling register_netdev() the device is visible in the system.
 Users can open it and start sending / receiving traffic immediately,
-or run any other callback, so all initialization must be done prior to
+or run any other callback, so all initialization must be **complete** prior to
 registration.
 
 unregister_netdev() closes the device and waits for all users to be done
@@ -157,7 +159,7 @@ register_netdevice() fails. The callback may be invoked with or without
 There is no explicit constructor callback, driver "constructs" the private
 netdev state after allocating it and before registration.
 
-Setting struct net_device.needs_free_netdev makes core call free_netdevice()
+Setting struct net_device.needs_free_netdev makes core call free_netdev()
 automatically after unregister_netdevice() when all references to the device
 are gone. It only takes effect after a successful call to register_netdevice()
 so if register_netdevice() fails driver is responsible for calling
@@ -256,7 +258,7 @@ struct net_device synchronization rules
 	lock if the driver implements queue management or shaper API.
 	Context: process
 
-ndo_get_stats:
+ndo_get_stats / ndo_get_stats64:
 	Synchronization: RCU (can be called concurrently with the stats
 	update path).
 	Context: atomic (can't sleep under RCU)
@@ -264,12 +266,9 @@ struct net_device synchronization rules
 ndo_start_xmit:
 	Synchronization: __netif_tx_lock spinlock.
 
-	When the driver sets dev->lltx this will be
-	called without holding netif_tx_lock. In this case the driver
-	has to lock by itself when needed.
-	The locking there should also properly protect against
-	set_rx_mode. WARNING: use of dev->lltx is deprecated.
-	Don't use it for new drivers.
+	When the driver sets dev->lltx this will be called without holding
+	netif_tx_lock. dev->lltx is meant for software drivers only, since
+	they often have no per-queue state.
 
 	Context: Process with BHs disabled or BH (timer),
 		 will be called with interrupts disabled by netconsole.
@@ -304,11 +303,15 @@ struct net_device synchronization rules
 	lock if the driver implements queue management or shaper API.
 
 ndo_setup_tc:
-	``TC_SETUP_BLOCK`` and ``TC_SETUP_FT`` are running under NFT locks
-	(i.e. no ``rtnl_lock`` and no device instance lock). The rest of
-	``tc_setup_type`` types run under netdev instance lock if the driver
+	Locking depends on ``tc_setup_type``. For most types the callback
+	is invoked under ``rtnl_lock`` and netdev instance lock if the driver
 	implements queue management or shaper API.
 
+	For ``TC_SETUP_BLOCK`` and ``TC_SETUP_FT`` ``rtnl_lock`` may or
+	may not be held, and the netdev instance lock is not held.
+	``TC_SETUP_BLOCK`` runs under ``block->cb_lock`` and ``TC_SETUP_FT``
+	runs under ``flowtable->flow_block_lock``.
+
 Most ndo callbacks not specified in the list above are running
 under ``rtnl_lock``. In addition, netdev instance lock is taken as well if
 the driver implements queue management or shaper API.
-- 
2.54.0



* [PATCH net-next 02/10] docs: net: fix minor issues with driver guide
  2026-05-26 16:01 [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Jakub Kicinski
  2026-05-26 16:01 ` [PATCH net-next 01/10] docs: net: netdevices: small fixes and clarifications Jakub Kicinski
@ 2026-05-26 16:01 ` Jakub Kicinski
  2026-05-26 16:01 ` [PATCH net-next 03/10] docs: net: statistics: fix kernel-internal stats list Jakub Kicinski
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 16:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc, Jakub Kicinski

Update the driver documentation TX queue example to match current APIs:

- use the ring-local tx_ring_mask field in drv_tx_avail()
- stop the selected netdev_queue with netif_tx_stop_queue() instead of
  stopping queue 0 with netif_stop_queue()

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/driver.rst | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/driver.rst b/Documentation/networking/driver.rst
index 4f5dfa9c022e..195a916dc0de 100644
--- a/Documentation/networking/driver.rst
+++ b/Documentation/networking/driver.rst
@@ -51,7 +51,7 @@ Instead it must maintain the queue properly.  For example,
 	{
 		u32 used = READ_ONCE(dr->prod) - READ_ONCE(dr->cons);
 
-		return dr->tx_ring_size - (used & bp->tx_ring_mask);
+		return dr->tx_ring_size - (used & dr->tx_ring_mask);
 	}
 
 	static netdev_tx_t drv_hard_start_xmit(struct sk_buff *skb,
@@ -69,7 +69,7 @@ Instead it must maintain the queue properly.  For example,
 		//...
 		/* This should be a very rare race - log it. */
 		if (drv_tx_avail(dr) <= skb_shinfo(skb)->nr_frags + 1) {
-			netif_stop_queue(dev);
+			netif_tx_stop_queue(txq);
 			netdev_warn(dev, "Tx Ring full when queue awake!\n");
 			return NETDEV_TX_BUSY;
 		}
@@ -103,6 +103,9 @@ Lockless queue stop / wake helper macros
 .. kernel-doc:: include/net/netdev_queues.h
    :doc: Lockless queue stopping / waking helpers.
 
+The standard macros like netif_txq_maybe_stop(), netif_txq_try_stop() etc.
+are well tested, prefer them over local synchronization schemes.
+
 No exclusive ownership
 ----------------------
 
-- 
2.54.0
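
A minimal userspace sketch of the free-slot arithmetic in the corrected
drv_tx_avail() above. The struct layout is a hypothetical stand-in, not a real
driver structure. It assumes free-running 32-bit prod/cons indices, a ring of
tx_ring_mask + 1 slots, and a usable size tx_ring_size no larger than the mask.
The kernel's READ_ONCE() is left out because this is plain single-threaded C:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver's per-ring Tx state. */
struct drv_ring {
	uint32_t prod;         /* free-running producer index */
	uint32_t cons;         /* free-running consumer index */
	uint32_t tx_ring_size; /* usable descriptors (assumed <= tx_ring_mask) */
	uint32_t tx_ring_mask; /* slot count minus one, slot count a power of two */
};

/* Number of descriptors still free on this ring. */
static uint32_t drv_tx_avail(const struct drv_ring *dr)
{
	/* Unsigned subtraction stays correct when prod has wrapped past cons. */
	uint32_t used = dr->prod - dr->cons;

	/* Sanity check on the assumption stated in the struct comment. */
	assert(dr->tx_ring_size <= dr->tx_ring_mask);

	/* Both fields come from the same ring, as in the fixed example. */
	return dr->tx_ring_size - (used & dr->tx_ring_mask);
}
```

With prod = 5 and cons = 0xfffffffe the producer index has wrapped, used comes
out as 7, and with 255 usable slots 248 remain free.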



* [PATCH net-next 03/10] docs: net: statistics: fix kernel-internal stats list
  2026-05-26 16:01 [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Jakub Kicinski
  2026-05-26 16:01 ` [PATCH net-next 01/10] docs: net: netdevices: small fixes and clarifications Jakub Kicinski
  2026-05-26 16:01 ` [PATCH net-next 02/10] docs: net: fix minor issues with driver guide Jakub Kicinski
@ 2026-05-26 16:01 ` Jakub Kicinski
  2026-05-26 16:01 ` [PATCH net-next 04/10] docs: net: update devmem code examples Jakub Kicinski
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 16:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc, Jakub Kicinski

Update the kernel-internal ethtool stats list to match current code:

- spell the entries as "struct ethtool_*_stats", not as functions
- list the full set of structures, not only pause and fec
- mention that fields are pre-initialized to ETHTOOL_STAT_NOT_SET by
  ethtool_stats_init() and drivers should leave unsupported fields at
  that value rather than zeroing them

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/statistics.rst | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/statistics.rst b/Documentation/networking/statistics.rst
index 66b0ef941457..824ebc549383 100644
--- a/Documentation/networking/statistics.rst
+++ b/Documentation/networking/statistics.rst
@@ -231,8 +231,19 @@ Kernel-internal data structures
 -------------------------------
 
 The following structures are internal to the kernel, their members are
-translated to netlink attributes when dumped. Drivers must not overwrite
-the statistics they don't report with 0.
+translated to netlink attributes when dumped. Fields are pre-initialized
+to ``ETHTOOL_STAT_NOT_SET`` (by ``ethtool_stats_init()``); drivers must
+leave fields they do not report at that value rather than overwriting
+them with 0.
 
-- ethtool_pause_stats()
-- ethtool_fec_stats()
+- ``struct ethtool_eth_ctrl_stats``
+- ``struct ethtool_eth_mac_stats``
+- ``struct ethtool_eth_phy_stats``
+- ``struct ethtool_fec_hist``
+- ``struct ethtool_fec_stats``
+- ``struct ethtool_link_ext_stats``
+- ``struct ethtool_mm_stats``
+- ``struct ethtool_pause_stats``
+- ``struct ethtool_phy_stats``
+- ``struct ethtool_rmon_stats``
+- ``struct ethtool_ts_stats``
-- 
2.54.0
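
A minimal userspace sketch of the "leave unreported fields alone" rule from the
hunk above. Every name here is hypothetical: DEMO_STAT_NOT_SET is assumed to
mirror the kernel's ETHTOOL_STAT_NOT_SET sentinel, demo_stats_init() stands in
for ethtool_stats_init(), and the struct is an illustrative two-counter layout,
not a real ethtool structure:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's ETHTOOL_STAT_NOT_SET sentinel. */
#define DEMO_STAT_NOT_SET (~0ULL)

/* Hypothetical two-counter stats structure, for illustration only. */
struct demo_pause_stats {
	uint64_t tx_pause_frames;
	uint64_t rx_pause_frames;
};

/* Stand-in for the core pre-filling every field before calling the driver. */
static void demo_stats_init(uint64_t *stats, size_t n)
{
	assert(stats != NULL);
	while (n--)
		stats[n] = DEMO_STAT_NOT_SET;
}

/* Hypothetical driver callback: this device only counts Tx pause frames. */
static void demo_get_pause_stats(struct demo_pause_stats *s, uint64_t hw_tx)
{
	s->tx_pause_frames = hw_tx;
	/* rx_pause_frames is deliberately left untouched, not zeroed. */
}

/* A field still holding the sentinel was not reported by the driver. */
static int demo_stat_is_reported(uint64_t v)
{
	return v != DEMO_STAT_NOT_SET;
}
```

Because the unsupported counter keeps the sentinel, the dumping side can tell
"not reported" apart from a genuine count of zero, which a driver that zeroes
the field would make impossible.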



* [PATCH net-next 04/10] docs: net: update devmem code examples
  2026-05-26 16:01 [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Jakub Kicinski
                   ` (2 preceding siblings ...)
  2026-05-26 16:01 ` [PATCH net-next 03/10] docs: net: statistics: fix kernel-internal stats list Jakub Kicinski
@ 2026-05-26 16:01 ` Jakub Kicinski
  2026-05-26 22:17   ` Stanislav Fomichev
  2026-05-26 16:01 ` [PATCH net-next 05/10] docs: net: fix minor issues with the NAPI guide Jakub Kicinski
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 16:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc, Jakub Kicinski

Update the code examples:
 - update the YNL sample with the latest(?) APIs
 - struct dmabuf_tx_cmsg does not exist, use __u32 directly

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/devmem.rst | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
index a6cd7236bfbd..6a3f3c2ac19c 100644
--- a/Documentation/networking/devmem.rst
+++ b/Documentation/networking/devmem.rst
@@ -103,24 +103,22 @@ The user must bind a dmabuf to any number of RX queues on a given NIC using
 the netlink API::
 
 	/* Bind dmabuf to NIC RX queue 15 */
-	struct netdev_queue *queues;
-	queues = malloc(sizeof(*queues) * 1);
+	struct netdev_queue_id *queues;
 
-	queues[0]._present.type = 1;
-	queues[0]._present.idx = 1;
-	queues[0].type = NETDEV_RX_QUEUE_TYPE_RX;
-	queues[0].idx = 15;
+	queues = netdev_queue_id_alloc(1);
+	netdev_queue_id_set_type(&queues[0], NETDEV_QUEUE_TYPE_RX);
+	netdev_queue_id_set_id(&queues[0], 15);
 
 	*ys = ynl_sock_create(&ynl_netdev_family, &yerr);
 
 	req = netdev_bind_rx_req_alloc();
 	netdev_bind_rx_req_set_ifindex(req, 1 /* ifindex */);
-	netdev_bind_rx_req_set_dmabuf_fd(req, dmabuf_fd);
-	__netdev_bind_rx_req_set_queues(req, queues, n_queue_index);
+	netdev_bind_rx_req_set_fd(req, dmabuf_fd);
+	__netdev_bind_rx_req_set_queues(req, queues, 1);
 
 	rsp = netdev_bind_rx(*ys, req);
 
-	dmabuf_id = rsp->dmabuf_id;
+	dmabuf_id = rsp->id;
 
 
 The netlink API returns a dmabuf_id: a unique ID that refers to this dmabuf
@@ -302,13 +300,12 @@ The user should create a msghdr where,
 * iov_base is set to the offset into the dmabuf to start sending from
 * iov_len is set to the number of bytes to be sent from the dmabuf
 
-The user passes the dma-buf id to send from via the dmabuf_tx_cmsg.dmabuf_id.
+The user passes the dma-buf id to send from as a u32 cmsg payload.
 
 The example below sends 1024 bytes from offset 100 into the dmabuf, and 2048
 from offset 2000 into the dmabuf. The dmabuf to send from is tx_dmabuf_id::
 
-       char ctrl_data[CMSG_SPACE(sizeof(struct dmabuf_tx_cmsg))];
-       struct dmabuf_tx_cmsg ddmabuf;
+       char ctrl_data[CMSG_SPACE(sizeof(__u32))];
        struct msghdr msg = {};
        struct cmsghdr *cmsg;
        struct iovec iov[2];
@@ -327,11 +324,9 @@ The example below sends 1024 bytes from offset 100 into the dmabuf, and 2048
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_DEVMEM_DMABUF;
-       cmsg->cmsg_len = CMSG_LEN(sizeof(struct dmabuf_tx_cmsg));
+       cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
 
-       ddmabuf.dmabuf_id = tx_dmabuf_id;
-
-       *((struct dmabuf_tx_cmsg *)CMSG_DATA(cmsg)) = ddmabuf;
+       *((__u32 *)CMSG_DATA(cmsg)) = tx_dmabuf_id;
 
        sendmsg(socket_fd, &msg, MSG_ZEROCOPY);
 
-- 
2.54.0
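
A minimal sketch of building the control message with a bare 32-bit payload, as
the updated Tx example above does. fill_dmabuf_cmsg() is a hypothetical helper,
not part of any kernel or libc API. It takes the cmsg type as a parameter so
the sketch does not depend on the installed headers defining SCM_DEVMEM_DMABUF,
and it copies the ID with memcpy() rather than through a pointer cast:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: attach a single SOL_SOCKET control message whose
 * payload is one 32-bit dmabuf ID. "ctrl" must be suitably aligned for
 * struct cmsghdr and at least CMSG_SPACE(sizeof(uint32_t)) bytes long.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int fill_dmabuf_cmsg(struct msghdr *msg, void *ctrl, size_t ctrl_len,
			    int cmsg_type, uint32_t dmabuf_id)
{
	struct cmsghdr *cmsg;

	if (ctrl_len < CMSG_SPACE(sizeof(dmabuf_id)))
		return -1;

	memset(ctrl, 0, ctrl_len);
	msg->msg_control = ctrl;
	msg->msg_controllen = CMSG_SPACE(sizeof(dmabuf_id));

	cmsg = CMSG_FIRSTHDR(msg);
	assert(cmsg != NULL);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = cmsg_type; /* caller passes SCM_DEVMEM_DMABUF */
	cmsg->cmsg_len = CMSG_LEN(sizeof(dmabuf_id));

	/* Copy the ID into the payload without relying on pointer casts. */
	memcpy(CMSG_DATA(cmsg), &dmabuf_id, sizeof(dmabuf_id));
	return 0;
}
```

A caller would pass SCM_DEVMEM_DMABUF as cmsg_type and tx_dmabuf_id as the ID,
then fill in msg_iov as in the documentation example before calling sendmsg().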



* [PATCH net-next 05/10] docs: net: fix minor issues with the NAPI guide
  2026-05-26 16:01 [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Jakub Kicinski
                   ` (3 preceding siblings ...)
  2026-05-26 16:01 ` [PATCH net-next 04/10] docs: net: update devmem code examples Jakub Kicinski
@ 2026-05-26 16:01 ` Jakub Kicinski
  2026-05-26 16:01 ` [PATCH net-next 06/10] docs: net: refresh netdev feature guidance Jakub Kicinski
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 16:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc, Jakub Kicinski

Update the NAPI documentation to match current API behavior:

- repeated napi_disable() calls hang waiting for ownership, rather
  than deadlock
- NAPI IDs are exposed through SO_INCOMING_NAPI_ID and netdev Netlink
- epoll uses the maxevents parameter spelling
- add that drivers holding the netdev instance lock may need _locked()
  variants

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/napi.rst | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst
index 4e008efebb35..c719924f36ce 100644
--- a/Documentation/networking/napi.rst
+++ b/Documentation/networking/napi.rst
@@ -49,7 +49,11 @@ instance to be released.
 The control APIs are not idempotent. Control API calls are safe against
 concurrent use of datapath APIs but an incorrect sequence of control API
 calls may result in crashes, deadlocks, or race conditions. For example,
-calling napi_disable() multiple times in a row will deadlock.
+calling napi_disable() multiple times in a row will hang waiting for
+ownership of the NAPI instance to be released.
+
+Drivers using the netdev instance lock may need to use the ``_locked()``
+variants of the control APIs when that lock is already held.
 
 Datapath API
 ------------
@@ -190,7 +194,8 @@ User API
 ========
 
 User interactions with NAPI depend on NAPI instance ID. The instance IDs
-are only visible to the user thru the ``SO_INCOMING_NAPI_ID`` socket option.
+are visible to the user through the ``SO_INCOMING_NAPI_ID`` socket option
+and the netdev Netlink API.
 
 Users can query NAPI IDs for a device or device queue using netlink. This can
 be done programmatically in a user application or by using a script included in
@@ -371,7 +376,7 @@ efficiency.
      the application has stalled. This value should be chosen so that it covers
      the amount of time the user application needs to process data from its
      call to epoll_wait, noting that applications can control how much data
-     they retrieve by setting ``max_events`` when calling epoll_wait.
+     they retrieve by setting ``maxevents`` when calling epoll_wait.
 
   2. The sysfs parameter or per-NAPI config parameters ``gro_flush_timeout``
      and ``napi_defer_hard_irqs`` can be set to low values. They will be used
-- 
2.54.0



* [PATCH net-next 06/10] docs: net: refresh netdev feature guidance
  2026-05-26 16:01 [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Jakub Kicinski
                   ` (4 preceding siblings ...)
  2026-05-26 16:01 ` [PATCH net-next 05/10] docs: net: fix minor issues with the NAPI guide Jakub Kicinski
@ 2026-05-26 16:01 ` Jakub Kicinski
  2026-05-26 18:41   ` Maxime Chevallier
  2026-05-26 16:01 ` [PATCH net-next 07/10] docs: net: fix minor issues with checksum offloads Jakub Kicinski
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 16:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc, Jakub Kicinski

Update netdev feature documentation for current locking rules and
feature semantics. Clarify hw_features updates and netdev_update_features()
locking, keep the NETIF_F_NEVER_CHANGE rule with the VLAN challenged
exception, fix the HSR duplication wording, and document netdev->netmem_tx
as a device flag rather than a feature bit.

Split the list of basic feature sets from the "extra" ones like
vlan_features. A bunch of the newer fields weren't documented and
having them all together would be confusing.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/netdev-features.rst | 60 +++++++++++++-------
 1 file changed, 39 insertions(+), 21 deletions(-)

diff --git a/Documentation/networking/netdev-features.rst b/Documentation/networking/netdev-features.rst
index 02bd7536fc0c..6293d47e5b09 100644
--- a/Documentation/networking/netdev-features.rst
+++ b/Documentation/networking/netdev-features.rst
@@ -18,29 +18,38 @@ that relieve an OS of various tasks like generating and checking checksums,
 splitting packets, classifying them.  Those capabilities and their state
 are commonly referred to as netdev features in Linux kernel world.
 
-There are currently three sets of features relevant to the driver, and
-one used internally by network core:
+There are currently three main sets of features on each netdevice,
+first and second are initialized by the driver:
 
  1. netdev->hw_features set contains features whose state may possibly
     be changed (enabled or disabled) for a particular device by user's
-    request.  This set should be initialized in ndo_init callback and not
-    changed later.
+    request.  Drivers normally initialize this set before registration or
+    in the ndo_init callback. Changes after registration should be made
+    very carefully as other parts of the code may assume hw_features are
+    static. At the very least changes must be made under rtnl_lock and
+    the netdev instance lock, and followed by netdev_update_features().
 
  2. netdev->features set contains features which are currently enabled
     for a device.  This should be changed only by network core or in
     error paths of ndo_set_features callback.
 
- 3. netdev->vlan_features set contains features whose state is inherited
-    by child VLAN devices (limits netdev->features set).  This is currently
-    used for all VLAN devices whether tags are stripped or inserted in
-    hardware or software.
-
- 4. netdev->wanted_features set contains feature set requested by user.
+ 3. netdev->wanted_features set contains feature set requested by user.
     This set is filtered by ndo_fix_features callback whenever it or
     some device-specific conditions change. This set is internal to
     networking core and should not be referenced in drivers.
 
+On top of those three main sets, each netdev has:
 
+ 1. Sets which control features inherited by child devices (VLAN, MPLS,
+    hw_enc for L3/L4 tunnels). These sets allow the driver to limit which
+    netdev->features are propagated, in case HW cannot perform the offloads
+    with the extra headers present.
+
+ 2. netdev->mangleid_features, TSO features which are supported only when
+    IP ID field can be mangled (constant instead of incrementing) during TSO.
+
+ 3. netdev->gso_partial_features, additional TSO features which HW can
+    support via NETIF_F_GSO_PARTIAL.
 
 Part II: Controlling enabled features
 =====================================
@@ -62,11 +71,15 @@ ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks
 are treated as always returning success.
 
 A driver that wants to trigger recalculation must do so by calling
-netdev_update_features() while holding rtnl_lock. This should not be done
-from ndo_*_features callbacks. netdev->features should not be modified by
-driver except by means of ndo_fix_features callback.
-
+netdev_update_features() while holding rtnl_lock. If the device uses the
+netdev instance lock, that lock must be held as well. This should not be
+done from ndo_*_features callbacks. netdev->features should not be modified
+by driver except by means of ndo_fix_features callback.
 
+ndo_features_check is called for each skb before that skb is passed to
+ndo_start_xmit. Driver may perform any non-trivial checks (e.g. exact
+header geometry / length) and withdraw features like HW_CSUM or TSO,
+requesting the networking stack to fall back to the software implementation.
 
 Part III: Implementation hints
 ==============================
@@ -83,8 +96,9 @@ stateless).  It can be called multiple times between successive
 ndo_set_features calls.
 
 Callback must not alter features contained in NETIF_F_SOFT_FEATURES or
-NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED but
-care must be taken as the change won't affect already configured VLANs.
+NETIF_F_NEVER_CHANGE, except that NETIF_F_VLAN_CHALLENGED may be changed.
+Care must be taken as changes to NETIF_F_VLAN_CHALLENGED won't affect already
+configured VLANs.
 
  * ndo_set_features:
 
@@ -186,10 +200,14 @@ Redundancy) frames from one port to another in hardware.
 * hsr-dup-offload
 
 This should be set for devices which duplicate outgoing HSR (High-availability
-Seamless Redundancy) or PRP (Parallel Redundancy Protocol) tags automatically
-frames in hardware.
+Seamless Redundancy) or PRP (Parallel Redundancy Protocol) frames
+automatically in hardware.
 
-* netmem-tx
+Part V: Related device flags
+============================
 
-This should be set for devices which support netmem TX. See
-Documentation/networking/netmem.rst
+* netdev->netmem_tx
+
+This is not a netdev feature bit. Drivers support netmem TX by setting
+netdev->netmem_tx to one of the values in enum netmem_tx_mode.
+See Documentation/networking/netmem.rst.
-- 
2.54.0



* [PATCH net-next 07/10] docs: net: fix minor issues with checksum offloads
  2026-05-26 16:01 [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Jakub Kicinski
                   ` (5 preceding siblings ...)
  2026-05-26 16:01 ` [PATCH net-next 06/10] docs: net: refresh netdev feature guidance Jakub Kicinski
@ 2026-05-26 16:01 ` Jakub Kicinski
  2026-05-26 16:01 ` [PATCH net-next 08/10] docs: net: add Rx notes to the checksum guide Jakub Kicinski
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 16:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc, Jakub Kicinski

Update the checksum offload documentation to match current code:

- SCTP CRC32c offload requires NETIF_F_SCTP_CRC, not ordinary IP
  checksum offload
- NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM are restricted legacy
  features; new devices should use NETIF_F_HW_CSUM
- GRE LCO is handled by the shared gre_build_header() helper used by
  both IPv4 and IPv6 GRE
- VXLAN_F_REMCSUM_TX is a VXLAN configuration flag, not a field of
  struct vxlan_rdst

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 .../networking/checksum-offloads.rst          | 36 +++++++++----------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/Documentation/networking/checksum-offloads.rst b/Documentation/networking/checksum-offloads.rst
index 69b23cf6879e..907aed9f3a3b 100644
--- a/Documentation/networking/checksum-offloads.rst
+++ b/Documentation/networking/checksum-offloads.rst
@@ -45,9 +45,11 @@ encapsulation is used, the packet may have multiple checksum fields in
 different header layers, and the rest will have to be handled by another
 mechanism such as LCO or RCO.
 
-CRC32c can also be offloaded using this interface, by means of filling
-skb->csum_start and skb->csum_offset as described above, and setting
-skb->csum_not_inet: see skbuff.h comment (section 'D') for more details.
+SCTP CRC32c can also be offloaded using this interface, by means of filling
+skb->csum_start and skb->csum_offset as described above, setting
+skb->csum_not_inet, and advertising NETIF_F_SCTP_CRC. Drivers must not treat
+ordinary IP checksum offload as SCTP CRC32c support. See the skbuff.h comment
+(section 'D') for more details.
 
 No offloading of the IP header checksum is performed; it is always done in
 software.  This is OK because when we build the IP header, we obviously have it
@@ -59,14 +61,12 @@ recomputed for each resulting segment.  See the skbuff.h comment (section 'E')
 for more details.
 
 A driver declares its offload capabilities in netdev->hw_features; see
-Documentation/networking/netdev-features.rst for more.  Note that a device
-which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and
-csum_offset given in the SKB; if it tries to deduce these itself in hardware
-(as some NICs do) the driver should check that the values in the SKB match
-those which the hardware will deduce, and if not, fall back to checksumming in
-software instead (with skb_csum_hwoffload_help() or one of the
-skb_checksum_help() / skb_crc32c_csum_help functions, as mentioned in
-include/linux/skbuff.h).
+Documentation/networking/netdev-features.rst for more. NETIF_F_IP_CSUM and
+NETIF_F_IPV6_CSUM are restricted legacy features and are being deprecated in
+favor of NETIF_F_HW_CSUM. New devices should use NETIF_F_HW_CSUM to advertise
+generic checksum offload. The skb_csum_hwoffload_help() helper can resolve
+CHECKSUM_PARTIAL according to the device's advertised checksum capabilities,
+falling back to software when needed.
 
 The stack should, for the most part, assume that checksum offload is supported
 by the underlying device.  The only place that should check is
@@ -108,11 +108,9 @@ LCO is performed by the stack when constructing an outer UDP header for an
 encapsulation such as VXLAN or GENEVE, in udp_set_csum().  Similarly for the
 IPv6 equivalents, in udp6_set_csum().
 
-It is also performed when constructing an IPv4 GRE header, in
-net/ipv4/ip_gre.c:build_header().  It is *not* currently performed when
-constructing an IPv6 GRE header; the GRE checksum is computed over the whole
-packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
-LCO here as IPv6 GRE still uses an IP-style checksum.
+It is also performed when constructing GRE headers with the shared
+gre_build_header() helper in include/net/gre.h, which is used by both IPv4 and
+IPv6 GRE.
 
 All of the LCO implementations use a helper function lco_csum(), in
 include/linux/skbuff.h.
@@ -138,6 +136,6 @@ For this reason, it is disabled by default.
 * https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
 
 In Linux, RCO is implemented individually in each encapsulation protocol, and
-most tunnel types have flags controlling its use.  For instance, VXLAN has the
-flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
-used when transmitting to a given remote destination.
+most tunnel types have flags controlling its use. For instance, VXLAN has the
+configuration flag VXLAN_F_REMCSUM_TX to indicate that RCO should be used when
+transmitting.
-- 
2.54.0



* [PATCH net-next 08/10] docs: net: add Rx notes to the checksum guide
  2026-05-26 16:01 [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Jakub Kicinski
                   ` (6 preceding siblings ...)
  2026-05-26 16:01 ` [PATCH net-next 07/10] docs: net: fix minor issues with checksum offloads Jakub Kicinski
@ 2026-05-26 16:01 ` Jakub Kicinski
  2026-05-26 18:56   ` Willem de Bruijn
  2026-05-26 16:01 ` [PATCH net-next 09/10] docs: net: render the checksum comment in checksum-offloads.rst Jakub Kicinski
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 16:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc, Jakub Kicinski

The Rx checksum processing gives people pause. The two main questions
in my experience are:
 - what to do with a bad IPv4 header checksum; and
 - what to do with packets with a bad transport (TCP/UDP) checksum.

Folks often feel the urge to drop the latter, to "avoid overloading
the host".

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/checksum-offloads.rst | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/checksum-offloads.rst b/Documentation/networking/checksum-offloads.rst
index 907aed9f3a3b..d838fe5c1606 100644
--- a/Documentation/networking/checksum-offloads.rst
+++ b/Documentation/networking/checksum-offloads.rst
@@ -19,7 +19,6 @@ take advantage of checksum offload capabilities of various NICs.
 
 Things that should be documented here but aren't yet:
 
-* RX Checksum Offload
 * CHECKSUM_UNNECESSARY conversion
 
 
@@ -139,3 +138,19 @@ In Linux, RCO is implemented individually in each encapsulation protocol, and
 most tunnel types have flags controlling its use. For instance, VXLAN has the
 configuration flag VXLAN_F_REMCSUM_TX to indicate that RCO should be used when
 transmitting.
+
+
+RX Checksum Offload
+===================
+
+RX checksum offload is controlled via NETIF_F_RXCSUM. When it is disabled, the
+driver must not set skb->ip_summed on ingress packets. As mentioned, the IPv4
+header checksum is not offloaded; the RXCSUM feature controls offloading the
+verification of transport layer checksums.
+
+Note that packets with bad TCP/UDP checksums must still be passed
+to the stack. skb->ip_summed of such packets can be set to ``CHECKSUM_COMPLETE``
+or left at ``CHECKSUM_NONE``. Drivers **must not discard** packets with
+bad TCP/UDP checksum and must not configure the device to drop them.
+Checksum validation is relatively inexpensive and having bad packets reflected
+in SNMP counters is crucial for network monitoring.
-- 
2.54.0
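
The Rx rules added by this patch can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not driver code: the
descriptor layout, the struct and function names, and the MODEL_ constants are
hypothetical stand-ins for a device's completion descriptor and for the
kernel's CHECKSUM_NONE / CHECKSUM_UNNECESSARY values. It shows the two points
the text makes: ip_summed stays untouched when NETIF_F_RXCSUM is off, and a
bad transport checksum never causes a drop.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's skb->ip_summed values (names are hypothetical). */
enum model_ip_summed {
	MODEL_CHECKSUM_NONE,
	MODEL_CHECKSUM_UNNECESSARY,
};

/* Hypothetical Rx completion descriptor reported by a device. */
struct model_rx_desc {
	bool l4_csum_checked;	/* device parsed and checked TCP/UDP */
	bool l4_csum_ok;	/* transport checksum verified as good */
};

struct model_rx_result {
	bool pass_to_stack;
	enum model_ip_summed ip_summed;
};

static struct model_rx_result model_rx_csum(bool rxcsum_enabled,
					     struct model_rx_desc d)
{
	/* Every packet goes up the stack; CHECKSUM_NONE is the default. */
	struct model_rx_result r = {
		.pass_to_stack = true,
		.ip_summed = MODEL_CHECKSUM_NONE,
	};

	/* A descriptor cannot report "good" for a checksum it never checked. */
	assert(d.l4_csum_checked || !d.l4_csum_ok);

	/* With RXCSUM disabled the driver must not touch ip_summed. */
	if (!rxcsum_enabled)
		return r;

	/* Only a verified-good checksum is reported as already validated. */
	if (d.l4_csum_checked && d.l4_csum_ok)
		r.ip_summed = MODEL_CHECKSUM_UNNECESSARY;

	/* Bad checksum: left at CHECKSUM_NONE so the stack validates and counts it. */
	return r;
}
```

The model leaves bad packets at CHECKSUM_NONE, which is one of the two options
the patch text allows; the CHECKSUM_COMPLETE option is not modeled here.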



* [PATCH net-next 09/10] docs: net: render the checksum comment in checksum-offloads.rst
  2026-05-26 16:01 [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Jakub Kicinski
                   ` (7 preceding siblings ...)
  2026-05-26 16:01 ` [PATCH net-next 08/10] docs: net: add Rx notes to the checksum guide Jakub Kicinski
@ 2026-05-26 16:01 ` Jakub Kicinski
  2026-05-26 18:56   ` Willem de Bruijn
  2026-05-26 16:01 ` [PATCH net-next 10/10] docs: net: fix minor issues with segmentation offloads Jakub Kicinski
  2026-05-26 18:48 ` [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Randy Dunlap
  10 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 16:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc, Jakub Kicinski

checksum-offloads.rst seems like a better place to render
the checksum comment than skbuff.rst.

Remove the stale references to sections in that comment
(it no longer has A, B, C, D, E sections).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/checksum-offloads.rst | 18 ++++++++++--------
 Documentation/networking/skbuff.rst            |  6 ------
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/Documentation/networking/checksum-offloads.rst b/Documentation/networking/checksum-offloads.rst
index d838fe5c1606..d4ded890011b 100644
--- a/Documentation/networking/checksum-offloads.rst
+++ b/Documentation/networking/checksum-offloads.rst
@@ -25,10 +25,8 @@ take advantage of checksum offload capabilities of various NICs.
 TX Checksum Offload
 ===================
 
-The interface for offloading a transmit checksum to a device is explained in
-detail in comments near the top of include/linux/skbuff.h.
-
-In brief, it allows to request the device fill in a single ones-complement
+In brief, Tx checksum offload allows the stack to request that the device fill
+in a single ones-complement
 checksum defined by the sk_buff fields skb->csum_start and skb->csum_offset.
 The device should compute the 16-bit ones-complement checksum (i.e. the
 'IP-style' checksum) from csum_start to the end of the packet, and fill in the
@@ -47,8 +45,7 @@ mechanism such as LCO or RCO.
 SCTP CRC32c can also be offloaded using this interface, by means of filling
 skb->csum_start and skb->csum_offset as described above, setting
 skb->csum_not_inet, and advertising NETIF_F_SCTP_CRC. Drivers must not treat
-ordinary IP checksum offload as SCTP CRC32c support. See the skbuff.h comment
-(section 'D') for more details.
+ordinary IP checksum offload as SCTP CRC32c support.
 
 No offloading of the IP header checksum is performed; it is always done in
 software.  This is OK because when we build the IP header, we obviously have it
@@ -56,8 +53,7 @@ in cache, so summing it isn't expensive.  It's also rather short.
 
 The requirements for GSO are more complicated, because when segmenting an
 encapsulated packet both the inner and outer checksums may need to be edited or
-recomputed for each resulting segment.  See the skbuff.h comment (section 'E')
-for more details.
+recomputed for each resulting segment.
 
 A driver declares its offload capabilities in netdev->hw_features; see
 Documentation/networking/netdev-features.rst for more. NETIF_F_IP_CSUM and
@@ -154,3 +150,9 @@ or left at ``CHECKSUM_NONE``. Drivers **must not discard** packets with
 bad TCP/UDP checksum and must not configure the device to drop them.
 Checksum validation is relatively inexpensive and having bad packets reflected
 in SNMP counters is crucial for network monitoring.
+
+skb checksum documentation
+==========================
+
+.. kernel-doc:: include/linux/skbuff.h
+   :doc: skb checksums
diff --git a/Documentation/networking/skbuff.rst b/Documentation/networking/skbuff.rst
index 5b74275a73a3..94681523e345 100644
--- a/Documentation/networking/skbuff.rst
+++ b/Documentation/networking/skbuff.rst
@@ -29,9 +29,3 @@ dataref and headerless skbs
 
 .. kernel-doc:: include/linux/skbuff.h
    :doc: dataref and headerless skbs
-
-Checksum information
---------------------
-
-.. kernel-doc:: include/linux/skbuff.h
-   :doc: skb checksums
-- 
2.54.0
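
The Tx description in the hunk above (checksum from csum_start to the end of
the packet, result stored at csum_start + csum_offset) can be sketched in
userspace C. This is an illustrative model, not kernel or firmware code: the
function names are hypothetical, and the packet is a plain byte array rather
than an sk_buff. It assumes, as is usual for TCP and UDP, that the checksum
field has been pre-seeded by the stack (for example with the pseudo-header
sum), so the seed is folded into the result simply because the field lies
inside the summed range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones-complement sum over big-endian words, with end-around carry. */
static uint16_t ones_complement_sum(const uint8_t *data, size_t len,
				    uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)			/* odd trailing byte, zero padded */
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Model of the device side of the offload: sum everything from csum_start
 * to the end of the packet and store the complemented result at
 * csum_start + csum_offset, in network byte order.
 */
static void model_tx_csum(uint8_t *pkt, size_t len, size_t csum_start,
			  size_t csum_offset)
{
	uint16_t csum;

	/* The checksum field must lie inside the packet. */
	assert(csum_start + csum_offset + 2 <= len);

	csum = (uint16_t)~ones_complement_sum(pkt + csum_start,
					      len - csum_start, 0);
	pkt[csum_start + csum_offset] = (uint8_t)(csum >> 8);
	pkt[csum_start + csum_offset + 1] = (uint8_t)(csum & 0xff);
}
```

A receiver that adds the same seed to the sum of the checksummed region should
obtain 0xffff, which is the usual validity check for an IP-style checksum.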



* [PATCH net-next 10/10] docs: net: fix minor issues with segmentation offloads
  2026-05-26 16:01 [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Jakub Kicinski
                   ` (8 preceding siblings ...)
  2026-05-26 16:01 ` [PATCH net-next 09/10] docs: net: render the checksum comment in checksum-offloads.rst Jakub Kicinski
@ 2026-05-26 16:01 ` Jakub Kicinski
  2026-05-26 18:48 ` [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Randy Dunlap
  10 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 16:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc, Jakub Kicinski

Update the segmentation offload documentation to match current GSO types:

- clarify csum_start for encapsulated TSO
- document TCP AccECN GSO and NETIF_F_GSO_ACCECN
- distinguish legacy UFO from UDP L4 GSO
- add ESP and fraglist GSO entries

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 .../networking/segmentation-offloads.rst      | 37 ++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/segmentation-offloads.rst b/Documentation/networking/segmentation-offloads.rst
index 72f69b22b28c..25a8b7eca847 100644
--- a/Documentation/networking/segmentation-offloads.rst
+++ b/Documentation/networking/segmentation-offloads.rst
@@ -14,10 +14,13 @@ to take advantage of segmentation offload capabilities of various NICs.
 The following technologies are described:
  * TCP Segmentation Offload - TSO
  * UDP Fragmentation Offload - UFO
+ * UDP Segmentation Offload - USO
  * IPIP, SIT, GRE, and UDP Tunnel Offloads
  * Generic Segmentation Offload - GSO
  * Generic Receive Offload - GRO
  * Partial Generic Segmentation Offload - GSO_PARTIAL
+ * ESP Segmentation Offload
+ * Fraglist Generic Segmentation Offload - GSO_FRAGLIST
  * SCTP acceleration with GSO - GSO_BY_FRAGS
 
 
@@ -38,7 +41,8 @@ In order to support TCP segmentation offload it is necessary to populate
 the network and transport header offsets of the skbuff so that the device
 drivers will be able determine the offsets of the IP or IPv6 header and the
 TCP header.  In addition as CHECKSUM_PARTIAL is required csum_start should
-also point to the TCP header of the packet.
+also point to the TCP header of the packet, or to the inner transport header
+for encapsulated TSO.
 
 For IPv4 segmentation we support one of two types in terms of the IP ID.
 The default behavior is to increment the IP ID with every segment.  If the
@@ -57,6 +61,10 @@ DF bit is not set on the outer header, in which case the device driver must
 guarantee that the IP ID field is incremented in the outer header with every
 segment.
 
+SKB_GSO_TCP_ACCECN is a modifier used with TCP segmentation offload for
+AccECN packets where the CWR bit must not be cleared during segmentation.
+Devices advertise support for this using NETIF_F_GSO_ACCECN.
+
 
 UDP Fragmentation Offload
 =========================
@@ -71,6 +79,16 @@ still receive them from tuntap and similar devices. Offload of UDP-based
 tunnel protocols is still supported.
 
 
+UDP Segmentation Offload
+========================
+
+UDP segmentation offload allows a device to segment a large UDP packet into
+multiple UDP datagrams.  Unlike UFO, these are not IP fragments.  The payload
+size of each datagram is specified in skb_shinfo()->gso_size and the GSO type
+is SKB_GSO_UDP_L4.  Devices advertise support for this using
+NETIF_F_GSO_UDP_L4.
+
+
 IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
 ========================================================
 
@@ -154,6 +172,23 @@ that the IPv4 ID field is incremented in the case that a given header does
 not have the DF bit set.
 
 
+ESP Segmentation Offload
+========================
+
+ESP segmentation offload uses SKB_GSO_ESP to mark packets that require
+IPsec ESP segmentation.  This type is set by the XFRM output path for GSO
+packets handled by ESP hardware offload.
+
+
+Fraglist Generic Segmentation Offload
+=====================================
+
+Fraglist GSO uses SKB_GSO_FRAGLIST to mark packets whose segments are
+already arranged as a list of skbs.  The segmentation path splits the skb
+based on that list rather than by creating segments of skb_shinfo()->gso_size
+bytes from the linear and page-fragment data.
+
+
 SCTP acceleration with GSO
 ===========================
 
-- 
2.54.0



* Re: [PATCH net-next 06/10] docs: net: refresh netdev feature guidance
  2026-05-26 16:01 ` [PATCH net-next 06/10] docs: net: refresh netdev feature guidance Jakub Kicinski
@ 2026-05-26 18:41   ` Maxime Chevallier
  2026-05-26 22:35     ` Jakub Kicinski
  0 siblings, 1 reply; 20+ messages in thread
From: Maxime Chevallier @ 2026-05-26 18:41 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc

Hi Jakub

On 5/26/26 18:01, Jakub Kicinski wrote:

>   
>    1. netdev->hw_features set contains features whose state may possibly
>       be changed (enabled or disabled) for a particular device by user's
> -    request.  This set should be initialized in ndo_init callback and not
> -    changed later.
> +    request.  Drivers normally initialize this set before registration or
> +    in the ndo_init callback. Changes after registration should be made
> +    very carefully as other parts of the code may assume hw_features are
> +    static. At the very least changes must be made under rtnl_lock and
> +    the netdev instance lock, and followed by netdev_update_features().
Feel free to keep this description as-is, but can we get somewhere the
actual meaning of "hw" in "hw_features" ? I've seen this cause confusion
before as this is sometimes wrongly interpreted as "Hardware features",
which isn't correct as the hardware may do stuff without allowing users
to change that behaviour.

I vaguely recall something along the lines of "Host-Writeable features",
but I am not sure at all about that...

Maxime


* Re: [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs
  2026-05-26 16:01 [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Jakub Kicinski
                   ` (9 preceding siblings ...)
  2026-05-26 16:01 ` [PATCH net-next 10/10] docs: net: fix minor issues with segmentation offloads Jakub Kicinski
@ 2026-05-26 18:48 ` Randy Dunlap
  2026-05-26 22:37   ` Jakub Kicinski
  10 siblings, 1 reply; 20+ messages in thread
From: Randy Dunlap @ 2026-05-26 18:48 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc

Hi,

On 5/26/26 9:01 AM, Jakub Kicinski wrote:
> I'm hoping to start feeding our docs into the AI review tools, instead
> of maintaining a separate repo with review prompts. To experiment with
> that we have to refresh the docs a little bit.
> 
> A read thru our current docs makes one slightly question the value
> of including them in reviews. But directionally, I feel, it's probably
> still right. I'm hoping the Rx Checksum section about not dropping packets
> for example to be impactful. I don't think the current AI agents or
> review docs include this guidance.
> 
> Jakub Kicinski (10):
>   docs: net: netdevices: small fixes and clarifications
>   docs: net: fix minor issues with driver guide
>   docs: net: statistics: fix kernel-internal stats list
>   docs: net: update devmem code examples
>   docs: net: fix minor issues with the NAPI guide
>   docs: net: refresh netdev feature guidance
>   docs: net: fix minor issues with checksum offloads
>   docs: net: add Rx notes to the checksum guide
>   docs: net: render the checksum comment in checksum-offloads.rst
>   docs: net: fix minor issues with segmentation offloads
> 
>  .../networking/checksum-offloads.rst          | 67 ++++++++++++-------
>  Documentation/networking/devmem.rst           | 27 +++-----
>  Documentation/networking/driver.rst           |  7 +-
>  Documentation/networking/napi.rst             | 11 ++-
>  Documentation/networking/netdev-features.rst  | 60 +++++++++++------
>  Documentation/networking/netdevices.rst       | 31 +++++----
>  .../networking/segmentation-offloads.rst      | 37 +++++++++-
>  Documentation/networking/skbuff.rst           |  6 --
>  Documentation/networking/statistics.rst       | 19 ++++--
>  9 files changed, 172 insertions(+), 93 deletions(-)
> 

There is one more cleanup that you could do. Current (linux-next) docs builds
give this warning:

WARNING: ../include/linux/netdevice.h:2622 Excess struct member 'ax25_ptr' description in 'net_device'

-- 
~Randy



* Re: [PATCH net-next 08/10] docs: net: add Rx notes to the checksum guide
  2026-05-26 16:01 ` [PATCH net-next 08/10] docs: net: add Rx notes to the checksum guide Jakub Kicinski
@ 2026-05-26 18:56   ` Willem de Bruijn
  0 siblings, 0 replies; 20+ messages in thread
From: Willem de Bruijn @ 2026-05-26 18:56 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc, Jakub Kicinski

Jakub Kicinski wrote:
> The Rx checksum processing gives people pause. The two main questions
> in my experience are:
>  - what to do with a bad IPv4 header checksum; and
>  - what to do with packets with a bad transport (TCP/UDP) checksum.
> 
> Folks often feel the urge to drop the latter, to "avoid overloading
> the host".
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Willem de Bruijn <willemb@google.com>

Thanks, this is an important clarification.

> ---
>  Documentation/networking/checksum-offloads.rst | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/networking/checksum-offloads.rst b/Documentation/networking/checksum-offloads.rst
> index 907aed9f3a3b..d838fe5c1606 100644
> --- a/Documentation/networking/checksum-offloads.rst
> +++ b/Documentation/networking/checksum-offloads.rst
> @@ -19,7 +19,6 @@ take advantage of checksum offload capabilities of various NICs.
>  
>  Things that should be documented here but aren't yet:
>  
> -* RX Checksum Offload
>  * CHECKSUM_UNNECESSARY conversion
>  
>  
> @@ -139,3 +138,19 @@ In Linux, RCO is implemented individually in each encapsulation protocol, and
>  most tunnel types have flags controlling its use. For instance, VXLAN has the
>  configuration flag VXLAN_F_REMCSUM_TX to indicate that RCO should be used when
>  transmitting.
> +
> +
> +RX Checksum Offload
> +===================
> +
> +RX checksum offload is controlled via NETIF_F_RXCSUM. When it is disabled, the
> +driver must not set skb->ip_summed on ingress packets. As mentioned, the IPv4
> +header checksum is not offloaded; the RXCSUM feature controls offloading the
> +verification of transport layer checksums.
> +
> +Note that packets with bad TCP/UDP checksums must still be passed
> +to the stack. skb->ip_summed of such packets can be set to ``CHECKSUM_COMPLETE``

when also setting skb->csum

> +or left at ``CHECKSUM_NONE``. Drivers **must not discard** packets with
> +bad TCP/UDP checksum and must not configure the device to drop them.
> +Checksum validation is relatively inexpensive and having bad packets reflected
> +in SNMP counters is crucial for network monitoring.
> -- 
> 2.54.0
> 




* Re: [PATCH net-next 09/10] docs: net: render the checksum comment in checksum-offloads.rst
  2026-05-26 16:01 ` [PATCH net-next 09/10] docs: net: render the checksum comment in checksum-offloads.rst Jakub Kicinski
@ 2026-05-26 18:56   ` Willem de Bruijn
  0 siblings, 0 replies; 20+ messages in thread
From: Willem de Bruijn @ 2026-05-26 18:56 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc, Jakub Kicinski

Jakub Kicinski wrote:
> checksum-offloads.rst seems like a better place to render
> the checksum comment than skbuff.rst.
> 
> Remove the stale references to sections in that comment
> (it no longer has A, B, C, D, E sections).
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Willem de Bruijn <willemb@google.com>


* Re: [PATCH net-next 01/10] docs: net: netdevices: small fixes and clarifications
  2026-05-26 16:01 ` [PATCH net-next 01/10] docs: net: netdevices: small fixes and clarifications Jakub Kicinski
@ 2026-05-26 22:12   ` Stanislav Fomichev
  0 siblings, 0 replies; 20+ messages in thread
From: Stanislav Fomichev @ 2026-05-26 22:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, ecree.xilinx, jesse.brandeburg,
	linux-doc

On 05/26, Jakub Kicinski wrote:
> A handful of unrelated nits:
> 
>  - free_netdevice() does not exist; replace two stray references
>    with free_netdev().
>  - The simple-driver probe example fell through into err_undo after
>    register_netdev() success; add return 0 for clarity.
>  - Clarify the netdev_priv() paragraph: "(netdev_priv())" was easy
>    to misread as the thing that needs explicit freeing; spell out
>    that it refers to extra pointers stored in the device private
>    struct.
>  - ndo_setup_tc synchronization note: TC_SETUP_BLOCK / TC_SETUP_FT
>    actually run under block->cb_lock, not "NFT locks", and rtnl_lock
>    may or may not be held depending on path.
>  - ->lltx guidance reads as very outdated, it's not really deprecated.
>    I suspect people may have been trying to use it for HW drivers
>    in the past but I can't think of such a case in the last decade.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  Documentation/networking/netdevices.rst | 31 ++++++++++++++-----------
>  1 file changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
> index 93e06e8d51a9..60492d4df2ee 100644
> --- a/Documentation/networking/netdevices.rst
> +++ b/Documentation/networking/netdevices.rst
> @@ -21,13 +21,14 @@ by free_netdev(). This is required to handle the pathological case cleanly
>  alloc_netdev_mqs() / alloc_netdev() reserve extra space for driver
>  private data which gets freed when the network device is freed. If
>  separately allocated data is attached to the network device
> -(netdev_priv()) then it is up to the module exit handler to free that.
> +(extra pointers stored in the device private struct) then it is up
> +to the module exit handler to free that.
>  
>  There are two groups of APIs for registering struct net_device.
>  First group can be used in normal contexts where ``rtnl_lock`` is not already
>  held: register_netdev(), unregister_netdev().
>  Second group can be used when ``rtnl_lock`` is already held:
> -register_netdevice(), unregister_netdevice(), free_netdevice().
> +register_netdevice(), unregister_netdevice(), free_netdev().
>  
>  Simple drivers
>  --------------
> @@ -58,6 +59,7 @@ In that case the struct net_device registration is done using
>        goto err_undo;
>  
>      /* net_device is visible to the user! */
> +    return 0;
>  
>    err_undo:
>      /* ... undo the device setup ... */
> @@ -73,7 +75,7 @@ In that case the struct net_device registration is done using
>  
>  Note that after calling register_netdev() the device is visible in the system.
>  Users can open it and start sending / receiving traffic immediately,
> -or run any other callback, so all initialization must be done prior to
> +or run any other callback, so all initialization must be **complete** prior to
>  registration.
>  
>  unregister_netdev() closes the device and waits for all users to be done
> @@ -157,7 +159,7 @@ register_netdevice() fails. The callback may be invoked with or without
>  There is no explicit constructor callback, driver "constructs" the private
>  netdev state after allocating it and before registration.
>  
> -Setting struct net_device.needs_free_netdev makes core call free_netdevice()
> +Setting struct net_device.needs_free_netdev makes core call free_netdev()
>  automatically after unregister_netdevice() when all references to the device
>  are gone. It only takes effect after a successful call to register_netdevice()
>  so if register_netdevice() fails driver is responsible for calling
> @@ -256,7 +258,7 @@ struct net_device synchronization rules
>  	lock if the driver implements queue management or shaper API.
>  	Context: process
>  
> -ndo_get_stats:
> +ndo_get_stats / ndo_get_stats64:
>  	Synchronization: RCU (can be called concurrently with the stats
>  	update path).
>  	Context: atomic (can't sleep under RCU)
> @@ -264,12 +266,9 @@ struct net_device synchronization rules
>  ndo_start_xmit:
>  	Synchronization: __netif_tx_lock spinlock.
>  
> -	When the driver sets dev->lltx this will be
> -	called without holding netif_tx_lock. In this case the driver
> -	has to lock by itself when needed.
> -	The locking there should also properly protect against
> -	set_rx_mode. WARNING: use of dev->lltx is deprecated.
> -	Don't use it for new drivers.
> +	When the driver sets dev->lltx this will be called without holding
> +	netif_tx_lock. dev->lltx is meant for software drivers only, since
> +	they often have no per-queue state.
>  
>  	Context: Process with BHs disabled or BH (timer),
>  		 will be called with interrupts disabled by netconsole.
> @@ -304,11 +303,15 @@ struct net_device synchronization rules
>  	lock if the driver implements queue management or shaper API.
>  

[..]

>  ndo_setup_tc:
> -	``TC_SETUP_BLOCK`` and ``TC_SETUP_FT`` are running under NFT locks
> -	(i.e. no ``rtnl_lock`` and no device instance lock). The rest of
> -	``tc_setup_type`` types run under netdev instance lock if the driver
> +	Locking depends on ``tc_setup_type``. For most types the callback
> +	is invoked under ``rtnl_lock`` and netdev instance lock if the driver
>  	implements queue management or shaper API.
>  
> +	For ``TC_SETUP_BLOCK`` and ``TC_SETUP_FT`` ``rtnl_lock`` may or
> +	may not be held, and the netdev instance lock is not held.
> +	``TC_SETUP_BLOCK`` runs under ``block->cb_lock`` and ``TC_SETUP_FT``
> +	runs under ``flowtable->flow_block_lock``.
> +
>  Most ndo callbacks not specified in the list above are running
>  under ``rtnl_lock``. In addition, netdev instance lock is taken as well if
>  the driver implements queue management or shaper API.

LGTM!

Acked-by: Stanislav Fomichev <sdf@fomichev.me>


* Re: [PATCH net-next 04/10] docs: net: update devmem code examples
  2026-05-26 16:01 ` [PATCH net-next 04/10] docs: net: update devmem code examples Jakub Kicinski
@ 2026-05-26 22:17   ` Stanislav Fomichev
  0 siblings, 0 replies; 20+ messages in thread
From: Stanislav Fomichev @ 2026-05-26 22:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, ecree.xilinx, jesse.brandeburg,
	linux-doc

On 05/26, Jakub Kicinski wrote:
> Update the code examples
>  - update the YNL sample with the latest(?) APIs
>  - struct dmabuf_tx_cmsg does not exist, use __u32 directly
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

(wondering if a better strategy is to add links to ncdevmem code)


* Re: [PATCH net-next 06/10] docs: net: refresh netdev feature guidance
  2026-05-26 18:41   ` Maxime Chevallier
@ 2026-05-26 22:35     ` Jakub Kicinski
  0 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 22:35 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc

On Tue, 26 May 2026 20:41:10 +0200 Maxime Chevallier wrote:
> >   
> >    1. netdev->hw_features set contains features whose state may possibly
> >       be changed (enabled or disabled) for a particular device by user's
> > -    request.  This set should be initialized in ndo_init callback and not
> > -    changed later.
> > +    request.  Drivers normally initialize this set before registration or
> > +    in the ndo_init callback. Changes after registration should be made
> > +    very carefully as other parts of the code may assume hw_features are
> > +    static. At the very least changes must be made under rtnl_lock and
> > +    the netdev instance lock, and followed by netdev_update_features().  
> Feel free to keep this description as-is, but can we get somewhere the
> actual meaning of "hw" in "hw_features" ? I've seen this cause confusion
> before as this is sometimes wrongly interpreted as "Hardware features",
> which isn't correct as the hardware may do stuff without allowing users
> to change that behaviour.
> 
> I vaguely recall something along the lines of "Host-Writeable features",
> but I am not sure at all about that...

Hm. I assumed the hw in hw_features stands for hardware.
The magic behavior of host controllable vs hardwired was
probably added later without renaming the field.

As you indicate, the usual confusion is that it's legal to have
a feature in ->features which is not set in ->hw_features, which
means that the feature is hardwired "on" and can't be disabled by
the user.

The current text does say this: "features whose state may [..]
be changed [..] by user's request". But perhaps it's not emphatic
enough.

The main question is whether this series should be clarifying
this, or whether our criterion is that the series doesn't _add_ confusion,
even if it doesn't clarify all the potential confusion points? :)
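
The relationship discussed here between ->features and ->hw_features can be
sketched as a bitmask computation. This is a simplified userspace model, not
the kernel's implementation: the function name and the MODEL_F_ bits are
hypothetical, and driver fixups and feature dependencies are ignored. It only
illustrates the point that a bit set in features but absent from hw_features
stays on no matter what the user requests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define MODEL_F_RXCSUM	(UINT64_C(1) << 0)
#define MODEL_F_TSO	(UINT64_C(1) << 1)

/*
 * features:    bits currently active on the device
 * hw_features: bits the user is allowed to toggle
 * wanted:      bits the user asked to have enabled
 */
static uint64_t model_update_features(uint64_t features, uint64_t hw_features,
				      uint64_t wanted)
{
	/* Active but not user-toggleable: hardwired "on". */
	uint64_t hardwired = features & ~hw_features;
	/* The user's request only counts for toggleable bits. */
	uint64_t result = hardwired | (wanted & hw_features);

	/* Hardwired bits always survive the update. */
	assert((result & hardwired) == hardwired);
	return result;
}
```

With features = RXCSUM | TSO and hw_features = TSO, a request to turn
everything off leaves RXCSUM enabled, since only TSO is under user control.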


* Re: [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs
  2026-05-26 18:48 ` [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs Randy Dunlap
@ 2026-05-26 22:37   ` Jakub Kicinski
  2026-05-26 22:40     ` Jakub Kicinski
  0 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 22:37 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc

On Tue, 26 May 2026 11:48:41 -0700 Randy Dunlap wrote:
> WARNING: ../include/linux/netdevice.h:2622 Excess struct member 'ax25_ptr' description in 'net_device'

I wonder how that sneaked in? ;) ;)

I'll clean this up separately, hopefully we haven't regressed too many
things while the script was broken :(


* Re: [PATCH net-next 00/10] docs: net: updates for old and cobwebbed docs
  2026-05-26 22:37   ` Jakub Kicinski
@ 2026-05-26 22:40     ` Jakub Kicinski
  0 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2026-05-26 22:40 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	vladimir.oltean, willemb, sdf.kernel, ecree.xilinx,
	jesse.brandeburg, linux-doc

On Tue, 26 May 2026 15:37:19 -0700 Jakub Kicinski wrote:
> On Tue, 26 May 2026 11:48:41 -0700 Randy Dunlap wrote:
> > WARNING: ../include/linux/netdevice.h:2622 Excess struct member 'ax25_ptr' description in 'net_device'  
> 
> I wonder how that sneaked in? ;) ;)
> 
> I'll clean this up separately, hopefully we haven't regressed too many
> things while the script was broken :(

Ugh, I'm still not seeing this on Linus's tree.
Is the fix for kernel-doc skipping such warnings on its way to Linus,
or queued for -next?

