netdev.vger.kernel.org archive mirror
* [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
@ 2025-11-19 14:03 Oleksij Rempel
  2025-11-26  2:19 ` Jakub Kicinski
  0 siblings, 1 reply; 21+ messages in thread
From: Oleksij Rempel @ 2025-11-19 14:03 UTC (permalink / raw)
  To: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
	Florian Fainelli, Maxime Chevallier, Kory Maincent,
	Lukasz Majewski, Jonathan Corbet, Donald Hunter, Vadim Fedorenko,
	Jiri Pirko, Vladimir Oltean, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend
  Cc: Oleksij Rempel, kernel, linux-kernel, netdev, Russell King,
	Divya.Koppera, Sabrina Dubroca, Stanislav Fomichev

Introduce a new document, flow_control.rst, to provide a comprehensive
guide on Ethernet Flow Control in Linux. The guide explains how flow
control works, how autonegotiation resolves pause capabilities, and how
to configure it using ethtool and Netlink.

In parallel, document the pause and pause-stat attributes in the
ethtool.yaml netlink spec. This enables the ynl tool to generate
kernel-doc comments for the corresponding enums in the UAPI header,
making the C interface self-documenting.

Finally, replace the legacy flow control section in phy.rst with a
reference to the new document and add pointers in the relevant C source
files.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
changes v8:
- Drop enum-name for the pause and pause-stat attribute sets in
  Documentation/netlink/specs/ethtool.yaml and revert the generated
  pause enums in ethtool_netlink_generated.h back to anonymous enums.
- Simplify the pause stats "doc" string in ethtool.yaml so it only
  describes the counters and does not mention stats-src or MAC Merge.
- Make "Flow Control" capitalization consistent throughout
  Documentation/networking/flow_control.rst and clarify that the
  ethtool pause API does not control PFC.
- Extend the PFC description to reference the 3-bit PCP field in the
  802.1Q VLAN tag and spell out FCoE and RoCE explicitly.
- Reword the "Kernel Policy: Set and Trust" section to say that
  ethtool pause requests express the preferred configuration, but
  drivers may reject unsupported combinations and may require generic
  link autonegotiation before enabling Pause Autonegotiation. Clarify
  that the MAC configuration may differ from the user request depending
  on the active link mode.
- Update the get_pauseparam documentation in include/linux/ethtool.h
  so Pause Autonegotiation is described as part of the link
  autonegotiation process, and state that drivers should reject
  non-zero @autoneg when autonegotiation is disabled or not supported.
changes v7:
- regenerate ethtool_netlink_generated.h
changes v6:
- fix bullet list text parts
changes v5:
- do not render headers from yaml for now
- s/ethtool_a_pause_stat/ethtool-a-pause-stat
- s/ethtool_a_pause/ethtool-a-pause
- drop other yaml related patches
changes v4:
- Reworded pause stats-src doc: clarify that sources are MAC Merge layer
  components, not PHYs.
- Fixed non-ASCII dash in "Link-wide".
- Added explicit note that pause_time = 0 resumes transmission immediately.
- Corrected terminology: use "pause quantum" (singular) consistently.
- Dropped paragraph about user tuning of FIFO watermarks (no ABI support).
- Synced UAPI header comments with YAML wording (MAC Merge layer).
- Ran ASCII sweep to remove stray non-ASCII characters.
changes v3:
- add warning about half-duplex collision-based flow control on shared media
- clarify pause autoneg vs. generic autoneg and forced mode semantics
- document pause quanta defaults used by common MAC drivers, with time examples
- fix vague cross-reference, point to autonegotiation resolution section
- expand notes on PAUSE vs. PFC exclusivity
- include generated enums (pause / pause-stat) in UAPI with kernel-doc
changes v2:
- remove recommendations
- add note about autoneg resolution
---
 Documentation/netlink/specs/ethtool.yaml  |  24 ++
 Documentation/networking/flow_control.rst | 375 ++++++++++++++++++++++
 Documentation/networking/index.rst        |   1 +
 Documentation/networking/phy.rst          |  12 +-
 include/linux/ethtool.h                   |  45 ++-
 net/dcb/dcbnl.c                           |   2 +
 net/ethtool/pause.c                       |   4 +
 7 files changed, 450 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/networking/flow_control.rst

diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 05d2b6508b59..671e65d1b6e9 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -864,6 +864,7 @@ attribute-sets:
 
   -
     name: pause-stat
+    doc: Statistics counters for link-wide PAUSE frames (IEEE 802.3 Annex 31B).
     attr-cnt-name: __ethtool-a-pause-stat-cnt
     attributes:
       -
@@ -875,12 +876,15 @@ attribute-sets:
         type: pad
       -
         name: tx-frames
+        doc: Number of PAUSE frames transmitted.
         type: u64
       -
         name: rx-frames
+        doc: Number of PAUSE frames received.
         type: u64
   -
     name: pause
+    doc: Parameters for link-wide PAUSE (IEEE 802.3 Annex 31B).
     attr-cnt-name: __ethtool-a-pause-cnt
     attributes:
       -
@@ -893,19 +897,39 @@ attribute-sets:
         nested-attributes: header
       -
         name: autoneg
+        doc: |
+          Acts as a mode selector for the driver.
+          On GET: indicates the driver's behavior. If true, the driver will
+          respect the negotiated outcome; if false, the driver will use a
+          forced configuration.
+          On SET: if true, the driver configures the PHY's advertisement based
+          on the rx and tx attributes. If false, the driver forces the MAC
+          into the state defined by the rx and tx attributes.
         type: u8
       -
         name: rx
+        doc: |
+          Enable receiving PAUSE frames (pausing local TX).
+          On GET: reflects the currently preferred configuration state.
         type: u8
       -
         name: tx
+        doc: |
+          Enable transmitting PAUSE frames (pausing peer TX).
+          On GET: reflects the currently preferred configuration state.
         type: u8
       -
         name: stats
+        doc: |
+          Contains the pause statistics counters.
         type: nest
         nested-attributes: pause-stat
       -
         name: stats-src
+        doc: |
+          Selects the source of the MAC statistics, values from
+          enum ethtool_mac_stats_src. This allows requesting statistics
+          from the individual components of the MAC Merge layer.
         type: u32
   -
     name: eee
diff --git a/Documentation/networking/flow_control.rst b/Documentation/networking/flow_control.rst
new file mode 100644
index 000000000000..67e8b3814fcc
--- /dev/null
+++ b/Documentation/networking/flow_control.rst
@@ -0,0 +1,375 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _ethernet-flow-control:
+
+=====================
+Ethernet Flow Control
+=====================
+
+This document is a practical guide to Ethernet Flow Control in Linux, covering
+what it is, how it works, and how to configure it.
+
+What is Flow Control?
+=====================
+
+Flow Control is a mechanism to prevent a fast sender from overwhelming a
+slow receiver with data, which would cause buffer overruns and dropped packets.
+The receiver can signal the sender to temporarily stop transmitting, giving it
+time to process its backlog.
+
+Standards references
+====================
+
+Ethernet Flow Control mechanisms are specified across consolidated IEEE base
+standards; some originated as amendments:
+
+- Collision-based Flow Control is part of CSMA/CD in **IEEE 802.3**
+  (half-duplex).
+- Link-wide PAUSE is defined in **IEEE 802.3 Annex 31B**
+  (originally **802.3x**).
+- Priority-based Flow Control (PFC) is defined in **IEEE 802.1Q Clause 36**
+  (originally **802.1Qbb**).
+
+In the remainder of this document, the consolidated clause numbers are used.
+
+How It Works: The Mechanisms
+============================
+
+The method used for Flow Control depends on the link's duplex mode.
+
+.. note::
+   The user-visible ``ethtool`` pause API described in this document controls
+   **link-wide PAUSE** (IEEE 802.3 Annex 31B) only. It does not control the
+   collision-based behavior on half-duplex links, nor Priority-based Flow
+   Control (PFC).
+
+1. Half-Duplex: Collision-Based Flow Control
+--------------------------------------------
+On half-duplex links, a device cannot send and receive simultaneously, so PAUSE
+frames are not used. Flow Control is achieved by leveraging the CSMA/CD
+(Carrier Sense Multiple Access with Collision Detection) protocol itself.
+
+* **How it works**: To inhibit incoming data, a receiving device can force a
+  collision on the line. When the sending station detects this collision, it
+  terminates its transmission, sends a "jam" signal, and then executes the
+  "Collision backoff and retransmission" procedure as defined in IEEE 802.3,
+  Section 4.2.3.2.5. This algorithm makes the sender wait for a random
+  period before attempting to retransmit. By repeatedly forcing collisions,
+  the receiver can effectively throttle the sender's transmission rate.
+
+.. note::
+    While this mechanism is part of the IEEE standard, there is currently no
+    generic kernel API to configure or control it. Drivers should not enable
+    this feature until a standardized interface is available.
+
+.. warning::
+   On shared-medium networks (e.g. 10BASE2, or twisted-pair networks using a
+   hub rather than a switch) forcing collisions inhibits traffic **across the
+   entire shared segment**, not just a single point-to-point link. Enabling
+   such behavior is generally undesirable.
+
+2. Full-Duplex: Link-wide PAUSE (IEEE 802.3 Annex 31B)
+------------------------------------------------------
+On full-duplex links, devices can send and receive at the same time. Flow
+Control is achieved by sending a special **PAUSE frame**, defined by IEEE
+802.3 Annex 31B. This mechanism pauses all traffic on the link and is therefore
+called *link-wide PAUSE*.
+
+* **What it is**: A standard Ethernet frame with a globally reserved
+  destination MAC address (``01-80-C2-00-00-01``). This address is in a range
+  that standard IEEE 802.1D-compliant bridges do not forward. However, some
+  unmanaged or misconfigured bridges have been reported to forward these
+  frames, which can disrupt Flow Control across a network.
+
+* **How it works**: The frame contains a MAC Control opcode for PAUSE
+  (``0x0001``) and a ``pause_time`` value, telling the sender how long to
+  wait before sending more data frames. This time is specified in units of
+  "pause quantum", where one quantum is the time it takes to transmit 512 bits.
+  For example, one pause quantum is 51.2 microseconds on a 10 Mbit/s link,
+  and 512 nanoseconds on a 1 Gbit/s link. A ``pause_time`` of zero indicates
+  that the transmitter can resume transmission, even if a previous non-zero
+  pause time has not yet elapsed.
+
+* **Who uses it**: Any full-duplex link, from 10 Mbit/s to multi-gigabit speeds.
+
+3. Full-Duplex: Priority-based Flow Control (PFC) (IEEE 802.1Q Clause 36)
+-------------------------------------------------------------------------
+Priority-based Flow Control is an enhancement to the standard PAUSE mechanism
+that allows Flow Control to be applied independently to different classes of
+traffic, identified by their priority level (mapped from the 3-bit PCP field in
+the 802.1Q VLAN tag).
+
+* **What it is**: PFC allows a receiver to pause traffic for one or more of the
+  8 standard priority levels without stopping traffic for other priorities.
+  This is critical in data center environments for protocols that cannot
+  tolerate packet loss due to congestion (e.g., Fibre Channel over Ethernet
+  (FCoE) or RDMA over Converged Ethernet (RoCE)).
+
+* **How it works**: PFC uses a specific PAUSE frame format. It shares the same
+  globally reserved destination MAC address (``01-80-C2-00-00-01``) as legacy
+  PAUSE frames but uses a unique opcode (``0x0101``). The frame payload
+  contains two key fields:
+
+  - **``priority_enable_vector``**: An 8-bit mask where each bit corresponds to
+    one of the 8 priorities. If a bit is set to 1, it means the pause time
+    for that priority is active.
+  - **``time_vector``**: A list of eight 2-octet fields, one for each priority.
+    Each field specifies the ``pause_time`` for its corresponding priority,
+    measured in units of ``pause_quanta`` (the time to transmit 512 bits).
+
+.. note::
+    When PFC is enabled for at least one priority on a port, the standard
+    **link-wide PAUSE** (IEEE 802.3 Annex 31B) must be disabled for that port.
+    The two mechanisms are mutually exclusive (IEEE 802.1Q Clause 36).
+
+Configuring Flow Control
+========================
+
+Link-wide PAUSE and Priority-based Flow Control are configured with different
+tools.
+
+Configuring Link-wide PAUSE with ``ethtool`` (IEEE 802.3 Annex 31B)
+-------------------------------------------------------------------
+Use ``ethtool -a <interface>`` to view and ``ethtool -A <interface>`` to change
+the link-wide PAUSE settings.
+
+.. code-block:: bash
+
+  # View current link-wide PAUSE settings
+  ethtool -a eth0
+
+  # Enable RX and TX pause, with autonegotiation
+  ethtool -A eth0 autoneg on rx on tx on
+
+**Key Configuration Concepts**:
+
+* **Pause Autoneg vs Generic Autoneg**: ``ethtool -A ... autoneg {on,off}``
+  controls **Pause Autoneg** (Annex 31B) only. It is independent from the
+  **Generic link autonegotiation** configured with ``ethtool -s``. A device can
+  have Generic autoneg **on** while Pause Autoneg is **off**, and vice versa.
+
+* **If Pause Autoneg is off** (``-A ... autoneg off``): the device will **not**
+  advertise pause in the PHY. The MAC PAUSE state is **forced** according to
+  ``rx``/``tx`` and does not depend on partner capabilities or resolution.
+  Ensure the peer is configured complementarily for PAUSE to be effective.
+
+* **If Generic autoneg is off** but **Pause Autoneg is on**, the pause policy
+  is **remembered** by the kernel and applied later when Generic autoneg is
+  enabled again.
+
+* **Autonegotiation Mode**: The PHY will *advertise* the ``rx`` and ``tx``
+  capabilities. The final active state is determined by what both sides of the
+  link agree on. See the "Link-wide PAUSE Autonegotiation Details" section
+  below, especially the *Resolution* item, for the negotiation rules.
+
+* **Forced Mode**: This mode is necessary when autonegotiation is not used or
+  not possible. This includes links where one or both partners have
+  autonegotiation disabled, or in setups without a PHY (e.g., direct
+  MAC-to-MAC connections). The driver bypasses PHY advertisement and
+  directly forces the MAC into the specified ``rx``/``tx`` state. The
+  configuration on both sides of the link must be complementary. For
+  example, if one side is set to ``tx on`` ``rx off``, the link partner must be
+  set to ``tx off`` ``rx on`` for Flow Control to function correctly.
+
+Configuring PFC with ``dcb`` (IEEE 802.1Q Clause 36)
+----------------------------------------------------
+PFC is part of the Data Center Bridging (DCB) subsystem and is managed with the
+``dcb`` tool (iproute2). Some deployments use ``dcbtool`` (lldpad) instead; this
+document shows ``dcb(8)`` examples.
+
+**Viewing PFC Settings**:
+
+.. code-block:: text
+
+  $ dcb pfc show dev eth0
+  pfc-cap 8 macsec-bypass off delay 4096
+  prio-pfc 0:off 1:off 2:off 3:off 4:off 5:off 6:on 7:on
+
+This shows the PFC state (on/off) for each priority (0-7).
+
+**Changing PFC Settings**:
+
+.. code-block:: bash
+
+  # Enable PFC on priorities 6 and 7, leaving others as they are
+  $ dcb pfc set dev eth0 prio-pfc 6:on 7:on
+
+  # Disable PFC for all priorities except 6 and 7
+  $ dcb pfc set dev eth0 prio-pfc all:off 6:on 7:on
+
+Monitoring Flow Control
+=======================
+
+The standard way to check if Flow Control is actively being used is to view the
+pause-related statistics.
+
+**Monitoring Link-wide PAUSE**:
+Use ``ethtool --include-statistics -a <interface>``.
+
+.. code-block:: text
+
+  $ ethtool --include-statistics -a eth0
+  Pause parameters for eth0:
+  ...
+  Statistics:
+    tx_pause_frames: 0
+    rx_pause_frames: 0
+
+**Monitoring PFC**:
+PFC statistics (sent and received frames per priority) are available
+through the ``dcb`` tool.
+
+.. code-block:: text
+
+  $ dcb pfc show dev eth0 requests indications
+  requests 0:0 1:0 2:0 3:1024 4:2048 5:0 6:0 7:0
+  indications 0:0 1:0 2:0 3:512 4:4096 5:0 6:0 7:0
+
+The ``requests`` counters track transmitted PFC frames (TX), and the
+``indications`` counters track received PFC frames (RX).
+
+Link-wide PAUSE Autonegotiation Details
+=======================================
+
+The autonegotiation process for link-wide PAUSE is managed by the PHY and
+involves advertising capabilities and resolving the outcome.
+
+* Terminology (link-wide PAUSE):
+
+  - **Symmetric pause**: both directions are paused when requested (TX+RX
+    enabled).
+  - **Asymmetric pause**: only one direction is paused (e.g., RX-only or
+    TX-only).
+
+  In IEEE 802.3 advertisement/resolution, symmetric/asymmetric are encoded
+  using two bits (Pause/Asym) and resolved per the standard truth tables
+  below.
+
+* **Advertisement**: The PHY advertises the MAC's Flow Control capabilities.
+  This is done using two bits in the advertisement register: "Symmetric
+  Pause" (Pause) and "Asymmetric Pause" (Asym). These bits should be
+  interpreted as a combined value, not as independent flags. The kernel
+  converts the user's ``rx`` and ``tx`` settings into this two-bit value as
+  follows:
+
+  .. code-block:: text
+
+    tx  rx | Pause  Asym
+    -------+-------------
+     0   0 |   0      0
+     0   1 |   1      1
+     1   0 |   0      1
+     1   1 |   1      0
+
+* **Resolution**: After negotiation, the PHY reports the link partner's
+  advertised Pause and Asym bits. The final Flow Control mode is determined
+  by the combination of the local and partner advertisements, according to
+  the IEEE 802.3 standard:
+
+  .. code-block:: text
+
+    Local Device       | Link Partner       | Result
+    Pause  Asym        | Pause   Asym       |
+    -------------------+--------------------+---------
+      0      X         |  0       X         | Disabled
+      0      1         |  1       0         | Disabled
+      0      1         |  1       1         | TX only
+      1      0         |  0       X         | Disabled
+      1      X         |  1       X         | TX + RX
+      1      1         |  0       1         | RX only
+
+  It is important to note that the advertised bits reflect the *current
+  configuration* of the MAC, which may not represent its full hardware
+  capabilities.
+
+Kernel Policy: "Set and Trust"
+==============================
+
+The ethtool pause API is defined as a **wish policy** for
+IEEE 802.3 link-wide PAUSE only. User requests express the preferred
+configuration, but drivers may reject unsupported combinations and it
+may not be possible to apply a request in all link states.
+
+Key constraints:
+
+- Link-wide PAUSE is not valid on half-duplex links.
+- Link-wide PAUSE cannot be used together with Priority-based Flow Control
+  (PFC, IEEE 802.1Q Clause 36).
+- Drivers may require generic link autonegotiation to be enabled before
+  allowing Pause Autonegotiation to be enabled.
+
+Because of these constraints, the configuration applied to the MAC
+may differ from the user request depending on the active link mode.
+
+Implications for userspace:
+
+1. Set once (the "wish"): the requested Rx/Tx PAUSE policy is
+   remembered even if it cannot be applied immediately.
+2. Applied conditionally: when the link comes up, the kernel enables
+   PAUSE only if the active mode allows it.
+
+Component Roles in Flow Control
+===============================
+
+The configuration of Flow Control involves several components, each with a
+distinct role.
+
+The MAC (Media Access Controller)
+---------------------------------
+The MAC is the hardware component that actually sends and receives PAUSE
+frames. Its capabilities define the upper limit of what the driver can support.
+For link-wide PAUSE, MACs can vary in their support for symmetric (both
+directions) or asymmetric (independent TX/RX) Flow Control.
+
+For PFC, the MAC must be capable of generating and interpreting the
+priority-based PAUSE frames and managing separate pause states for each
+traffic class.
+
+Many MACs also implement automatic PAUSE frame transmission based on the fill
+level of their internal RX FIFO. This is typically configured with two
+thresholds:
+
+* **FLOW_ON (High Water Mark)**: When the RX FIFO usage reaches this
+  threshold, the MAC automatically transmits a PAUSE frame to stop the sender.
+
+* **FLOW_OFF (Low Water Mark)**: When the RX FIFO usage drops below this
+  threshold, the MAC transmits a PAUSE frame with a ``pause_time`` of zero to
+  tell the sender it can resume transmission.
+
+The PHY (Physical Layer Transceiver)
+------------------------------------
+The PHY's role is distinct for each Flow Control mechanism:
+
+* **Link-wide PAUSE**: During the autonegotiation process, the PHY is
+  responsible for advertising the device's Flow Control capabilities. See the
+  "Link-wide PAUSE Autonegotiation Details" section for more information.
+
+* **Half-Duplex Collision-Based Flow Control**: The PHY is fundamental to the
+  CSMA/CD process. It performs carrier sensing (checking if the line is idle)
+  and collision detection, which is the mechanism leveraged to throttle the
+  sender.
+
+* **Priority-based Flow Control (PFC)**: The PHY is not directly involved in
+  negotiating PFC capabilities. Its role is to establish the physical link.
+  PFC negotiation happens at a higher layer via the Data Center Bridging
+  Capability Exchange Protocol (DCBX).
+
+User Space Interface
+====================
+The primary user space tools are ``ethtool`` for link-wide PAUSE and ``dcb`` for
+PFC. They communicate with the kernel to configure the network device driver
+and underlying hardware.
+
+**Link-wide PAUSE Netlink Interface (``ethtool``)**
+
+See the ethtool Netlink spec (``Documentation/netlink/specs/ethtool.yaml``)
+for the authoritative definition of the Pause control and Pause statistics
+attributes. The generated UAPI is in
+``include/uapi/linux/ethtool_netlink_generated.h``.
+
+**PFC Netlink Interface (``dcb``)**
+
+The authoritative definitions for DCB/PFC netlink attributes and commands are in
+``include/uapi/linux/dcbnl.h``. See also the ``dcb(8)`` manual page and the DCB
+subsystem documentation for userspace configuration details.
+
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 75db2251649b..7efec5ab08cb 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -55,6 +55,7 @@ Contents:
    eql
    fib_trie
    filter
+   flow_control
    generic-hdlc
    generic_netlink
    ../netlink/specs/index
diff --git a/Documentation/networking/phy.rst b/Documentation/networking/phy.rst
index b0f2ef83735d..40cc0a988d60 100644
--- a/Documentation/networking/phy.rst
+++ b/Documentation/networking/phy.rst
@@ -343,16 +343,8 @@ Some of the interface modes are described below:
 Pause frames / flow control
 ===========================
 
-The PHY does not participate directly in flow control/pause frames except by
-making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
-MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
-controller supports such a thing. Since flow control/pause frames generation
-involves the Ethernet MAC driver, it is recommended that this driver takes care
-of properly indicating advertisement and support for such features by setting
-the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
-either before or after phy_connect() and/or as a result of implementing the
-ethtool::set_pauseparam feature.
-
+For detailed link-wide PAUSE and PFC behavior and configuration, see
+flow_control.rst.
 
 Keeping Close Tabs on the PAL
 =============================
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 5c9162193d26..7738fe0f4461 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -953,9 +953,48 @@ struct kernel_ethtool_ts_info {
  * @get_pause_stats: Report pause frame statistics. Drivers must not zero
  *	statistics which they don't report. The stats structure is initialized
  *	to ETHTOOL_STAT_NOT_SET indicating driver does not report statistics.
- * @get_pauseparam: Report pause parameters
- * @set_pauseparam: Set pause parameters.  Returns a negative error code
- *	or zero.
+ *
+ * @get_pauseparam: Report the configured policy for link-wide PAUSE
+ *      (IEEE 802.3 Annex 31B). Drivers must fill struct ethtool_pauseparam
+ *      such that:
+ *      @autoneg:
+ *              This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only
+ *              and is part of the link autonegotiation process.
+ *              true  -> the device follows the negotiated result of pause
+ *                       autonegotiation (Pause/Asym);
+ *              false -> the device uses a forced MAC state independent of
+ *                       negotiation.
+ *      @rx_pause/@tx_pause:
+ *              represent the desired policy (preferred configuration).
+ *              In autoneg mode they describe what is to be advertised;
+ *              in forced mode they describe the MAC state to apply.
+ *
+ *      Drivers should reject a non-zero setting of @autoneg when
+ *      autonegotiation is disabled (or not supported) for the link.
+ *      If generic autonegotiation is disabled, pause autonegotiation is
+ *      treated as disabled/inactive.
+ *
+ * @set_pauseparam: Apply a policy for link-wide PAUSE (IEEE 802.3 Annex 31B).
+ *      If @autoneg is true:
+ *              Arrange for pause advertisement (Pause/Asym) based on
+ *              @rx_pause/@tx_pause and program the MAC to follow the
+ *              negotiated result (which may be symmetric, asymmetric, or off
+ *              depending on the link partner).
+ *      If @autoneg is false:
+ *              Do not rely on autonegotiation; force the MAC RX/TX pause
+ *              state directly per @rx_pause/@tx_pause.
+ *
+ *      Implementations that integrate with PHYLIB/PHYLINK should cooperate
+ *      with those frameworks for advertisement and resolution; MAC drivers are
+ *      still responsible for applying the required MAC state.
+ *
+ *      Return: 0 on success or a negative errno. Return -EOPNOTSUPP if
+ *      link-wide PAUSE is unsupported. If only symmetric pause is supported,
+ *      reject unsupported asymmetric requests with -EINVAL (or document any
+ *      coercion policy).
+ *
+ *      See also: Documentation/networking/flow_control.rst
+ *
  * @self_test: Run specified self-tests
  * @get_strings: Return a set of strings that describe the requested objects
  * @set_phys_id: Identify the physical devices, e.g. by flashing an LED
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 03eb1d941fca..91ee22f53774 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -27,6 +27,8 @@
  *
  * Priority-based Flow Control (PFC) - provides a flow control mechanism which
  *   can work independently for each 802.1p priority.
+ *   See Documentation/networking/flow_control.rst for a high-level description
+ *   of the user space interface for Priority-based Flow Control (PFC).
  *
  * Congestion Notification - provides a mechanism for end-to-end congestion
  *   control for protocols which do not have built-in congestion management.
diff --git a/net/ethtool/pause.c b/net/ethtool/pause.c
index 0f9af1e66548..eacf6a4859bf 100644
--- a/net/ethtool/pause.c
+++ b/net/ethtool/pause.c
@@ -1,5 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
+/* See Documentation/networking/flow_control.rst for a high-level description of
+ * the userspace interface.
+ */
+
 #include "netlink.h"
 #include "common.h"
 
-- 
2.47.3
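
As a rough illustration of the advertisement and resolution tables in
flow_control.rst above, here is a minimal Python sketch of the tx/rx to
Pause/Asym mapping, the resolution truth table, and the pause quantum
arithmetic. The function names (advertise, resolve, pause_quantum_ns) are
hypothetical and exist only for this example; they are not kernel or ethtool
APIs, and the sketch assumes both link partners follow the tables exactly as
written in the guide.

```python
from typing import Tuple

# Illustrative sketch only: these helpers are hypothetical names, not kernel APIs.


def advertise(tx: bool, rx: bool) -> Tuple[int, int]:
    """Map the requested tx/rx pause settings to (Pause, Asym) bits."""
    pause = int(rx)        # Pause follows the rx request
    asym = int(rx != tx)   # Asym is set when rx and tx differ
    return pause, asym


def resolve(local: Tuple[int, int], partner: Tuple[int, int]) -> Tuple[bool, bool]:
    """Return (tx_pause, rx_pause) for the local device after negotiation."""
    l_pause, l_asym = local
    p_pause, p_asym = partner
    if l_pause and p_pause:
        return True, True        # both advertise Pause: TX + RX
    if l_asym and p_asym:
        if l_pause:
            return False, True   # local 1/1, partner 0/1: RX only
        if p_pause:
            return True, False   # local 0/1, partner 1/1: TX only
    return False, False          # every other combination: disabled


def pause_quantum_ns(link_mbps: int) -> float:
    """Duration of one pause quantum (512 bit times) in nanoseconds."""
    return 512 * 1000 / link_mbps
```

For example, resolve(advertise(tx=False, rx=True), advertise(tx=True, rx=False))
evaluates to (False, True): the local side honours received PAUSE frames but
does not send any, which is the "RX only" row of the resolution table.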



* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-19 14:03 [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API Oleksij Rempel
@ 2025-11-26  2:19 ` Jakub Kicinski
  2025-11-26  8:36   ` Oleksij Rempel
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2025-11-26  2:19 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Florian Fainelli,
	Maxime Chevallier, Kory Maincent, Lukasz Majewski,
	Jonathan Corbet, Donald Hunter, Vadim Fedorenko, Jiri Pirko,
	Vladimir Oltean, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, kernel, linux-kernel,
	netdev, Russell King, Divya.Koppera, Sabrina Dubroca,
	Stanislav Fomichev

On Wed, 19 Nov 2025 15:03:17 +0100 Oleksij Rempel wrote:
> +Kernel Policy: "Set and Trust"
> +==============================
> +
> +The ethtool pause API is defined as a **wish policy** for
> +IEEE 802.3 link-wide PAUSE only. User requests express the preferred
> +configuration, but drivers may reject unsupported combinations and it
> +may not be possible to apply a request in all link states.
> +
> +Key constraints:
> +
> +- Link-wide PAUSE is not valid on half-duplex links.
> +- Link-wide PAUSE cannot be used together with Priority-based Flow Control
> +  (PFC, IEEE 802.1Q Clause 36).
> +- Drivers may require generic link autonegotiation to be enabled before
> +  allowing Pause Autonegotiation to be enabled.
> +
> +Because of these constraints, the configuration applied to the MAC
> +may differ from the user request depending on the active link mode.
> +
> +Implications for userspace:
> +
> +1. Set once (the "wish"): the requested Rx/Tx PAUSE policy is
> +   remembered even if it cannot be applied immediately.
> +2. Applied conditionally: when the link comes up, the kernel enables
> +   PAUSE only if the active mode allows it.

This section is quite confusing. Documenting the constraints makes sense
but it seems like this mostly applies to autoneg on, without really
saying so. Plus the get behavior.. see below..

> + * @get_pauseparam: Report the configured policy for link-wide PAUSE
> + *      (IEEE 802.3 Annex 31B). Drivers must fill struct ethtool_pauseparam
> + *      such that:
> + *      @autoneg:
> + *              This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only
> + *              and is part of the link autonegotiation process.
> + *              true  -> the device follows the negotiated result of pause
> + *                       autonegotiation (Pause/Asym);
> + *              false -> the device uses a forced MAC state independent of
> + *                       negotiation.
> + *      @rx_pause/@tx_pause:
> + *              represent the desired policy (preferred configuration).
> + *              In autoneg mode they describe what is to be advertised;
> + *              in forced mode they describe the MAC state to apply.

How is the user supposed to know what ended up getting configured?
Why do we need to configure autoneg via this API and not link modes directly?

> + *      Drivers should reject a non-zero setting of @autoneg when
> + *      autonegotiation is disabled (or not supported) for the link.

I think this belongs in the @set doc below..

> + *      If generic autonegotiation is disabled, pause autonegotiation is
> + *      treated as disabled/inactive.
> + *
> + * @set_pauseparam: Apply a policy for link-wide PAUSE (IEEE 802.3 Annex 31B).
> + *      If @autoneg is true:
> + *              Arrange for pause advertisement (Pause/Asym) based on
> + *              @rx_pause/@tx_pause and program the MAC to follow the
> + *              negotiated result (which may be symmetric, asymmetric, or off
> + *              depending on the link partner).
> + *      If @autoneg is false:
> + *              Do not rely on autonegotiation; force the MAC RX/TX pause
> + *              state directly per @rx_pause/@tx_pause.
> + *
> + *      Implementations that integrate with PHYLIB/PHYLINK should cooperate
> + *      with those frameworks for advertisement and resolution; MAC drivers are
> + *      still responsible for applying the required MAC state.
> + *
> + *      Return: 0 on success or a negative errno. Return -EOPNOTSUPP if
> + *      link-wide PAUSE is unsupported. If only symmetric pause is supported,
> + *      reject unsupported asymmetric requests with -EINVAL (or document any
> + *      coercion policy).
> + *
> + *      See also: Documentation/networking/flow_control.rst


* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-26  2:19 ` Jakub Kicinski
@ 2025-11-26  8:36   ` Oleksij Rempel
  2025-11-26 22:42     ` Jakub Kicinski
  2025-11-28  1:27     ` Russell King (Oracle)
  0 siblings, 2 replies; 21+ messages in thread
From: Oleksij Rempel @ 2025-11-26  8:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, Vladimir Oltean, Alexei Starovoitov, Russell King,
	Eric Dumazet, Rob Herring, Florian Fainelli, Donald Hunter,
	Daniel Borkmann, Jonathan Corbet, John Fastabend, Lukasz Majewski,
	Maxime Chevallier, Stanislav Fomichev, Paolo Abeni, Jiri Pirko,
	Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

On Tue, Nov 25, 2025 at 06:19:57PM -0800, Jakub Kicinski wrote:
> On Wed, 19 Nov 2025 15:03:17 +0100 Oleksij Rempel wrote:
> > +Kernel Policy: "Set and Trust"
> > +==============================
> > +
> > +The ethtool pause API is defined as a **wish policy** for
> > +IEEE 802.3 link-wide PAUSE only. User requests express the preferred
> > +configuration, but drivers may reject unsupported combinations and it
> > +may not be possible to apply a request in all link states.
> > +
> > +Key constraints:
> > +
> > +- Link-wide PAUSE is not valid on half-duplex links.
> > +- Link-wide PAUSE cannot be used together with Priority-based Flow Control
> > +  (PFC, IEEE 802.1Q Clause 36).
> > +- Drivers may require generic link autonegotiation to be enabled before
> > +  allowing Pause Autonegotiation to be enabled.
> > +
> > +Because of these constraints, the configuration applied to the MAC
> > +may differ from the user request depending on the active link mode.
> > +
> > +Implications for userspace:
> > +
> > +1. Set once (the "wish"): the requested Rx/Tx PAUSE policy is
> > +   remembered even if it cannot be applied immediately.
> > +2. Applied conditionally: when the link comes up, the kernel enables
> > +   PAUSE only if the active mode allows it.
> 
> This section is quite confusing. Documenting the constraints makes sense,
> but it seems like this mostly applies to autoneg on, without really
> saying so. Plus the get behavior.. see below..
> 
> > + * @get_pauseparam: Report the configured policy for link-wide PAUSE
> > + *      (IEEE 802.3 Annex 31B). Drivers must fill struct ethtool_pauseparam
> > + *      such that:
> > + *      @autoneg:
> > + *              This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only
> > + *              and is part of the link autonegotiation process.
> > + *              true  -> the device follows the negotiated result of pause
> > + *                       autonegotiation (Pause/Asym);
> > + *              false -> the device uses a forced MAC state independent of
> > + *                       negotiation.
> > + *      @rx_pause/@tx_pause:
> > + *              represent the desired policy (preferred configuration).
> > + *              In autoneg mode they describe what is to be advertised;
> > + *              in forced mode they describe the MAC state to apply.
> 
> How is the user supposed to know what ended up getting configured?

My current understanding is that get_pauseparam() is mainly a
configuration API. It seems to be designed symmetric to
set_pauseparam(): it reports the requested policy (autoneg flag and
rx/tx pause), not the resolved MAC state.

In autoneg mode this means the user sees what we intend to advertise
or force, but not necessarily what the MAC actually ended up with
after resolution.

The ethtool userspace tool tries to fill this gap by showing
"RX negotiated" and "TX negotiated" fields, for example:

  Pause parameters for lan1:
    Autonegotiate:  on
    RX:             off
    TX:             off
    RX negotiated:  on
    TX negotiated:  on

As far as I can see, these "negotiated" values are not read from hardware or
kernel. They are guessed in userspace from the local and link partner
advertisements, assuming that the kernel follows the same pause resolution
rules as ethtool does. If the kernel or hardware behaves differently, these
values can be wrong.
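
To make that guess concrete, here is a minimal userspace sketch of the
standard pause resolution as I understand it (IEEE 802.3 Annex 28B). The
type and function names (pause_adv, pause_state, pause_policy_to_adv(),
pause_resolve()) are hypothetical and exist only for this illustration;
the rx/tx to Pause/Asym_Pause mapping is the convention I believe the
kernel helpers use, so treat it as an assumption rather than a statement
about any particular driver.

```c
#include <stdbool.h>

/* Pause-related bits exchanged during link autonegotiation. */
struct pause_adv {
	bool pause;      /* "Pause" (symmetric) bit */
	bool asym_pause; /* "Asym_Pause" bit */
};

/* Resolved pause state as seen from the local side. */
struct pause_state {
	bool rx;
	bool tx;
};

/*
 * Assumed mapping from a requested rx/tx policy to advertisement bits:
 * Pause = rx, Asym_Pause = rx XOR tx.
 */
static struct pause_adv pause_policy_to_adv(bool rx, bool tx)
{
	struct pause_adv adv = { .pause = rx, .asym_pause = (rx != tx) };

	return adv;
}

/* Resolve the local pause state from local and partner advertisements. */
static struct pause_state pause_resolve(struct pause_adv local,
					struct pause_adv partner)
{
	struct pause_state st = { .rx = false, .tx = false };

	if (local.pause && partner.pause) {
		/* Both sides advertise symmetric pause. */
		st.rx = true;
		st.tx = true;
	} else if (local.asym_pause && partner.asym_pause) {
		/* Asymmetric case: direction depends on who set Pause. */
		st.tx = partner.pause;
		st.rx = local.pause;
	}

	return st;
}
```

If the kernel or the hardware resolves pause differently from this table,
a userspace computation of this kind and the real MAC state will disagree,
which is exactly the gap described above.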

So, with the current API, the user gets:
- the configured policy via get_pauseparam(), and
- an ethtool-side guess of the resolved state via
  "RX negotiated"/"TX negotiated".

> Why do we need to configure autoneg via this API and not link modes directly?

I am not aware of a clear reason. This documentation aims to describe
the current behavior and capture the rationale of the existing API.

Configuring it via link modes directly would likely resolve some of this
confusion, but for now we focus on documenting how the current API is
expected to behave.

> > + *      Drivers should reject a non-zero setting of @autoneg when
> > + *      autonegotiation is disabled (or not supported) for the link.
> 
> I think this belongs in the @set doc below..

ack

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-26  8:36   ` Oleksij Rempel
@ 2025-11-26 22:42     ` Jakub Kicinski
  2025-11-27  9:20       ` Oleksij Rempel
  2025-11-28  1:27     ` Russell King (Oracle)
  1 sibling, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2025-11-26 22:42 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Andrew Lunn, Vladimir Oltean, Alexei Starovoitov, Russell King,
	Eric Dumazet, Rob Herring, Florian Fainelli, Donald Hunter,
	Daniel Borkmann, Jonathan Corbet, John Fastabend, Lukasz Majewski,
	Maxime Chevallier, Stanislav Fomichev, Paolo Abeni, Jiri Pirko,
	Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

On Wed, 26 Nov 2025 09:36:42 +0100 Oleksij Rempel wrote:
> On Tue, Nov 25, 2025 at 06:19:57PM -0800, Jakub Kicinski wrote:
> > On Wed, 19 Nov 2025 15:03:17 +0100 Oleksij Rempel wrote:  
> > > + * @get_pauseparam: Report the configured policy for link-wide PAUSE
> > > + *      (IEEE 802.3 Annex 31B). Drivers must fill struct ethtool_pauseparam
> > > + *      such that:
> > > + *      @autoneg:
> > > + *              This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only
> > > + *              and is part of the link autonegotiation process.
> > > + *              true  -> the device follows the negotiated result of pause
> > > + *                       autonegotiation (Pause/Asym);
> > > + *              false -> the device uses a forced MAC state independent of
> > > + *                       negotiation.
> > > + *      @rx_pause/@tx_pause:
> > > + *              represent the desired policy (preferred configuration).
> > > + *              In autoneg mode they describe what is to be advertised;
> > > + *              in forced mode they describe the MAC state to apply.  
> > 
> > How is the user supposed to know what ended up getting configured?  
> 
> My current understanding is that get_pauseparam() is mainly a
> configuration API. It seems to be designed symmetric to
> set_pauseparam(): it reports the requested policy (autoneg flag and
> rx/tx pause), not the resolved MAC state.
> 
> In autoneg mode this means the user sees what we intend to advertise
> or force, but not necessarily what the MAC actually ended up with
> after resolution.
> 
> The ethtool userspace tool tries to fill this gap by showing
> "RX negotiated" and "TX negotiated" fields, for example:
> 
>   Pause parameters for lan1:
>     Autonegotiate:  on
>     RX:             off
>     TX:             off
>     RX negotiated:  on
>     TX negotiated:  on
> 
> As far as I can see, these "negotiated" values are not read from hardware or
> kernel. They are guessed in userspace from the local and link partner
> advertisements, assuming that the kernel follows the same pause resolution
> rules as ethtool does. If the kernel or hardware behaves differently, these
> values can be wrong.
> 
> So, with the current API, the user gets:
> - the configured policy via get_pauseparam(), and
> - an ethtool-side guess of the resolved state via
>   "RX negotiated"/"TX negotiated".

Again, that's all well and good for autoneg, but in DC use cases with
integrated NICs autoneg is usually off. And in that case having get
report "desired" config of some sort makes much less sense, when we also
recommend that drivers reject unsupported configurations.

> > Why do we need to configure autoneg via this API and not link modes directly?  
> 
> I am not aware of a clear reason. This documentation aims to describe
> the current behavior and capture the rationale of the existing API.

To spell it out more forcefully I think it describes the current
behavior for certain devices. I could be wrong but the expectations
for when autoneg is off should be different.

> Configuring it via link modes directly would likely resolve some of this
> confusion, but for now we focus on documenting how the current API is
> expected to behave.

You say current API - is setting Pause and Asym_Pause via link modes
today rejected? I don't see an explicit check by grepping but I haven't
really tried..


* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-26 22:42     ` Jakub Kicinski
@ 2025-11-27  9:20       ` Oleksij Rempel
  2025-11-27 15:07         ` Andrew Lunn
  2025-11-27 16:10         ` Russell King (Oracle)
  0 siblings, 2 replies; 21+ messages in thread
From: Oleksij Rempel @ 2025-11-27  9:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, Vladimir Oltean, Alexei Starovoitov, Russell King,
	Eric Dumazet, Rob Herring, Florian Fainelli, Donald Hunter,
	Daniel Borkmann, Jonathan Corbet, John Fastabend, Lukasz Majewski,
	Maxime Chevallier, Stanislav Fomichev, Paolo Abeni, Jiri Pirko,
	Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

On Wed, Nov 26, 2025 at 02:42:25PM -0800, Jakub Kicinski wrote:
> On Wed, 26 Nov 2025 09:36:42 +0100 Oleksij Rempel wrote:
> > On Tue, Nov 25, 2025 at 06:19:57PM -0800, Jakub Kicinski wrote:
> > > On Wed, 19 Nov 2025 15:03:17 +0100 Oleksij Rempel wrote:  
> > > > + * @get_pauseparam: Report the configured policy for link-wide PAUSE
> > > > + *      (IEEE 802.3 Annex 31B). Drivers must fill struct ethtool_pauseparam
> > > > + *      such that:
> > > > + *      @autoneg:
> > > > + *              This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only
> > > > + *              and is part of the link autonegotiation process.
> > > > + *              true  -> the device follows the negotiated result of pause
> > > > + *                       autonegotiation (Pause/Asym);
> > > > + *              false -> the device uses a forced MAC state independent of
> > > > + *                       negotiation.
> > > > + *      @rx_pause/@tx_pause:
> > > > + *              represent the desired policy (preferred configuration).
> > > > + *              In autoneg mode they describe what is to be advertised;
> > > > + *              in forced mode they describe the MAC state to apply.  
> > > 
> > > How is the user supposed to know what ended up getting configured?  
> > 
> > My current understanding is that get_pauseparam() is mainly a
> > configuration API. It seems to be designed symmetric to
> > set_pauseparam(): it reports the requested policy (autoneg flag and
> > rx/tx pause), not the resolved MAC state.
> > 
> > In autoneg mode this means the user sees what we intend to advertise
> > or force, but not necessarily what the MAC actually ended up with
> > after resolution.
> > 
> > The ethtool userspace tool tries to fill this gap by showing
> > "RX negotiated" and "TX negotiated" fields, for example:
> > 
> >   Pause parameters for lan1:
> >     Autonegotiate:  on
> >     RX:             off
> >     TX:             off
> >     RX negotiated:  on
> >     TX negotiated:  on
> > 
> > As far as I can see, these "negotiated" values are not read from hardware or
> > kernel. They are guessed in userspace from the local and link partner
> > advertisements, assuming that the kernel follows the same pause resolution
> > rules as ethtool does. If the kernel or hardware behaves differently, these
> > values can be wrong.
> > 
> > So, with the current API, the user gets:
> > - the configured policy via get_pauseparam(), and
> > - an ethtool-side guess of the resolved state via
> >   "RX negotiated"/"TX negotiated".
> 
> Again, that's all well and good for autoneg, but in DC use cases with
> integrated NICs autoneg is usually off. And in that case having get
> report "desired" config of some sort makes much less sense, when we also
> recommend that drivers reject unsupported configurations.
> 
> > > Why do we need to configure autoneg via this API and not link modes directly?  
> > 
> > I am not aware of a clear reason. This documentation aims to describe
> > the current behavior and capture the rationale of the existing API.
> 
> To spell it out more forcefully I think it describes the current
> behavior for certain devices. I could be wrong but the expectations
> for when autoneg is off should be different.
> 
> > Configuring it via link modes directly would likely resolve some of this
> > confusion, but for now we focus on documenting how the current API is
> > expected to behave.
> 
> You say current API - is setting Pause and Asym_Pause via link modes
> today rejected? I don't see an explicit check by grepping but I haven't
> really tried..

How about the following wording:

Kernel Policy: Administrative vs. Operational State
===================================================

The ethtool pause API configures the **administrative state** of the network
device. The **operational state** (the actual pause behavior active on the
wire) depends on the active link mode and the link partner.

The semantics of the configuration depend on the ``autoneg`` parameter:

1. **Autonegotiation Mode** (``autoneg on``)
   In this mode, the ``rx`` and ``tx`` parameters specify the **advertisement**
   (the "wish").

   - The driver configures the PHY to advertise these capabilities.
   - The actual Flow Control mode is determined by the standard resolution
     truth table (see "Link-wide PAUSE Autonegotiation Details") based on the
     link partner's advertisement.
   - ``get_pauseparam`` reports the advertisement policy, not the resolved
     outcome.

2. **Forced Mode** (``autoneg off``)
   In this mode, the ``rx`` and ``tx`` parameters constitute a direct
   **command** to the interface.

   - The system bypasses advertisement and forces the MAC into the specified
     configuration.
   - Drivers should reject configurations that the hardware cannot support in
     forced mode.
   - ``get_pauseparam`` reports the forced configuration.

**Common Constraints**
Regardless of the mode, the following constraints apply:

- Link-wide PAUSE is not valid on half-duplex links.
- Link-wide PAUSE cannot be used together with Priority-based Flow Control
  (PFC).
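
As a rough illustration of the administrative vs. operational split, the
sketch below derives the operational state at link-up from the stored
administrative request. It is a userspace model, not driver code: all
structure and function names are hypothetical, and the fallback to the
forced values when link autonegotiation is off is an assumption about
how "pause autoneg is treated as inactive" would be implemented.

```c
#include <stdbool.h>

/* Administrative state: what the user requested via the pause API. */
struct pause_admin {
	bool autoneg; /* pause autoneg (ethtool -A), not link autoneg */
	bool rx;
	bool tx;
};

/* Link facts known once the link is up. */
struct link_state {
	bool full_duplex;
	bool link_autoneg; /* link autoneg (ethtool -s) */
	bool neg_rx;       /* resolved result of pause negotiation */
	bool neg_tx;
	bool pfc_enabled;
};

/* Operational state: what is programmed into the MAC. */
struct pause_oper {
	bool rx;
	bool tx;
};

static struct pause_oper pause_oper_state(const struct pause_admin *admin,
					  const struct link_state *link)
{
	struct pause_oper oper = { .rx = false, .tx = false };

	/* Common constraints: no link-wide PAUSE on half duplex or with PFC. */
	if (!link->full_duplex || link->pfc_enabled)
		return oper;

	if (admin->autoneg && link->link_autoneg) {
		/* Autonegotiation mode: follow the negotiated result. */
		oper.rx = link->neg_rx;
		oper.tx = link->neg_tx;
	} else {
		/* Forced mode, or pause autoneg inactive: use the request. */
		oper.rx = admin->rx;
		oper.tx = admin->tx;
	}

	return oper;
}
```

The administrative request is never modified here; only the derived
operational state changes when the link mode changes, so the user does
not have to repeat the request after a duplex change.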


/**
 * ...
 * @get_pauseparam: Report the configured administrative policy for link-wide
 *	PAUSE (IEEE 802.3 Annex 31B). Drivers must fill struct ethtool_pauseparam
 *	such that:
 *	@autoneg:
 *		This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only
 *		and is part of the link autonegotiation process.
 *		true  -> the device follows the negotiated result of pause
 *			 autonegotiation (Pause/Asym);
 *		false -> the device uses a forced configuration independent
 *			 of negotiation.
 *	@rx_pause/@tx_pause:
 *		represent the desired policy (administrative state).
 *		In autoneg mode they describe what is to be advertised;
 *		in forced mode they describe the MAC configuration to be forced.
 *
 * @set_pauseparam: Apply a policy for link-wide PAUSE (IEEE 802.3 Annex 31B).
 *	@rx_pause/@tx_pause:
 *		Desired state. If @autoneg is true, these define the
 *		advertisement. If @autoneg is false, these define the
 *		forced MAC configuration.
 *	@autoneg:
 *		Select autonegotiation or forced mode.
 *
 *	**Constraint Checking:**
 *	Drivers should reject a non-zero setting of @autoneg when
 *	autonegotiation is disabled (or not supported) for the link.
 *	Drivers should reject unsupported rx/tx combinations with -EINVAL.
 * ...
 */

Open Questions:

Pre-link Configuration (Administrative UP, Physical DOWN):
How should drivers handle set_pauseparam when the link is physically down?

  Fully Forced: If speed/duplex are forced, we can validate the pause request
  immediately.

  Parallel Detection: If the link comes up later (e.g., as Half Duplex via
  parallel detection), a previously accepted "forced pause" configuration might
  become invalid. Should we block forced pause settings until the link is
  physically up?

State Persistence and Toggling:
When toggling autoneg (e.g., autoneg on -> off -> on), should the kernel or
driver cache the previous advertisement?

  Currently, if a user switches to forced mode and back, the previous
  advertisement preferences might be lost or reset to defaults depending on the
  driver.

  Similarly, if no administrative configuration has ever been set, what should
  get_pauseparam report? Should it read the current hardware state (which might
  be default) or return zero/empty?

Synchronization with Link Modes:
Configuring pause via set_pauseparam vs. link_ksettings can lead to
desynchronization.

  My testing shows that set_pauseparam often updates the driver's internal
  pause state but may not trigger the necessary link reset/re-advertisement
  that link_ksettings does.

  This results in the reported "Advertised" pause modes in ethtool output being
  out of sync with the actual Pause API settings.

  Combining configuration over different interfaces can sometimes skip the
  link reset, so the new configuration is not advertised.



* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-27  9:20       ` Oleksij Rempel
@ 2025-11-27 15:07         ` Andrew Lunn
  2025-11-27 15:31           ` Maxime Chevallier
  2025-11-27 16:14           ` Russell King (Oracle)
  2025-11-27 16:10         ` Russell King (Oracle)
  1 sibling, 2 replies; 21+ messages in thread
From: Andrew Lunn @ 2025-11-27 15:07 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Jakub Kicinski, Vladimir Oltean, Alexei Starovoitov, Russell King,
	Eric Dumazet, Rob Herring, Florian Fainelli, Donald Hunter,
	Daniel Borkmann, Jonathan Corbet, John Fastabend, Lukasz Majewski,
	Maxime Chevallier, Stanislav Fomichev, Paolo Abeni, Jiri Pirko,
	Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

> How about the following wording:
> Kernel Policy: Administrative vs. Operational State
> ===================================================
> 
> The ethtool pause API configures the **administrative state** of the network
> device. The **operational state** (the actual pause behavior active on the
> wire) depends on the active link mode and the link partner.
> 
> The semantics of the configuration depend on the ``autoneg`` parameter:
> 
> 1. **Autonegotiation Mode** (``autoneg on``)
>    In this mode, the ``rx`` and ``tx`` parameters specify the **advertisement**
>    (the "wish").
> 
>    - The driver configures the PHY to advertise these capabilities.
>    - The actual Flow Control mode is determined by the standard resolution
>      truth table (see "Link-wide PAUSE Autonegotiation Details") based on the
>      link partner's advertisement.
>    - ``get_pauseparam`` reports the advertisement policy, not the resolved
>      outcome.
> 
> 2. **Forced Mode** (``autoneg off``)
>    In this mode, the ``rx`` and ``tx`` parameters constitute a direct
>    **command** to the interface.
> 
>    - The system bypasses advertisement and forces the MAC into the specified
>      configuration.
>    - Drivers should reject configurations that the hardware cannot support in
>      forced mode.
>    - ``get_pauseparam`` reports the forced configuration.

There is one additional thing which plays into this, link
autonegotiation, ethtool -s autoneg on|off.

If link auto negotiation is on, you can then have both of the two
cases above, negotiated pause, or forced pause. If link auto
negotiation is off, you can only have forced mode. The text you have
below does however cover this. But this is one of the areas developers
get wrong, they don't consider how the link autoneg affects the pause
autoneg.

But i do agree that get_pauseparam is rather odd. It returns the
current configuration, not necessarily how the MAC hardware has been
programmed.

> **Common Constraints**
> Regardless of the mode, the following constraints apply:
> 
> - Link-wide PAUSE is not valid on half-duplex links.
> - Link-wide PAUSE cannot be used together with Priority-based Flow Control
>   (PFC).
> 
> 
> /**
>  * ...
>  * @get_pauseparam: Report the configured administrative policy for link-wide
>  *	PAUSE (IEEE 802.3 Annex 31B). Drivers must fill struct ethtool_pauseparam
>  *	such that:
>  *	@autoneg:
>  *		This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only
>  *		and is part of the link autonegotiation process.
>  *		true  -> the device follows the negotiated result of pause
>  *			 autonegotiation (Pause/Asym);
>  *		false -> the device uses a forced configuration independent
>  *			 of negotiation.
>  *	@rx_pause/@tx_pause:
>  *		represent the desired policy (administrative state).
>  *		In autoneg mode they describe what is to be advertised;
>  *		in forced mode they describe the MAC configuration to be forced.
>  *
>  * @set_pauseparam: Apply a policy for link-wide PAUSE (IEEE 802.3 Annex 31B).
>  *	@rx_pause/@tx_pause:
>  *		Desired state. If @autoneg is true, these define the
>  *		advertisement. If @autoneg is false, these define the
>  *		forced MAC configuration.
>  *	@autoneg:
>  *		Select autonegotiation or forced mode.
>  *
>  *	**Constraint Checking:**
>  *	Drivers should reject a non-zero setting of @autoneg when
>  *	autonegotiation is disabled (or not supported) for the link.
>  *	Drivers should reject unsupported rx/tx combinations with -EINVAL.

I'm not so keen on this last little section. What we actually want is
for drivers to use phylink, and let phylink implement all the 'business
logic'. phylink will then tell the MAC driver the two bits it needs to
program the hardware. phylink does all the validation, so all a MAC
driver needs to do is call phylink_ethtool_get_pauseparam() and
phylink_ethtool_set_pauseparam(). If we say drivers should reject some
combinations, we might have developers implementing that before
calling phylink_ethtool_set_pauseparam(), which is pointless, and
maybe getting it wrong.

So I would prefer something more like:

 *	**Constraint Checking:**
 *	Ideally, drivers should simply call phylink_ethtool_get_pauseparam()
 *	and phylink_ethtool_set_pauseparam(). phylink will then perform
 *	all the needed validation, and perform all the actions based on
 *	the current **Pause Autoneg** and link Autoneg.
 *
 *	If phylink is not being used, the driver must perform validation:
 *	reject a non-zero setting of @autoneg when autonegotiation is disabled
 *	(or not supported) for the link, and reject unsupported rx/tx
 *	combinations with -EINVAL.
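
For the non-phylink case, the validation described above could look
roughly like the following userspace sketch. The structures and the
pause_validate() helper are hypothetical and only model the checks; the
-EOPNOTSUPP return follows the earlier kernel-doc draft in this thread,
and returning -EINVAL for the @autoneg case is an assumption, since the
proposed text only says "reject".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a set_pauseparam request. */
struct pause_request {
	bool autoneg;
	bool rx;
	bool tx;
};

/* Hypothetical model of what the device and link currently allow. */
struct pause_caps {
	bool supported;      /* link-wide PAUSE supported at all */
	bool asym_supported; /* rx != tx configurations supported */
	bool link_autoneg;   /* link autoneg enabled and supported */
};

/* Return 0 if the request is acceptable, or a negative errno. */
static int pause_validate(const struct pause_request *req,
			  const struct pause_caps *caps)
{
	if (!caps->supported)
		return -EOPNOTSUPP;

	/* Pause autoneg needs link autoneg to be active. */
	if (req->autoneg && !caps->link_autoneg)
		return -EINVAL;

	/* Symmetric-only hardware cannot honour rx != tx. */
	if (req->rx != req->tx && !caps->asym_supported)
		return -EINVAL;

	return 0;
}
```

A driver using phylink would not need any of this and would just forward
the request to phylink_ethtool_set_pauseparam().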

> Open Questions:
> 
> Pre-link Configuration (Administrative UP, Physical DOWN) How should drivers
> handle set_pauseparam when the link is physically down?

You can program the PHY/PCS with what you want it to negotiate. Once
the link comes up, you can then look if you are in a half duplex mode
when determining how to program the MAC hardware.

>  Parallel Detection: If the link comes up later (e.g., as Half Duplex via
>  parallel detection), a previously accepted "forced pause" configuration might
>  become invalid. Should we block forced pause settings until the link is
>  physically up?

Forced is forced. Forced is always a potential foot gun, since you can
end up with the link peers having different ideas about what is being
used on the link. autoneg of half duplex link is just one of the
scenarios where you gain a hole in your foot.

> State Persistence and Toggling When toggling autoneg (e.g., autoneg on -> off
> -> on), should the kernel or driver cache the previous advertisement?

This has been discussed in the past, and i _think_ phylink does.

But before we go too far into edge cases, my review experience is
that MAC drivers get the basics wrong. What we really want to do here
is:

1) Push driver developers towards phylink
2) For those who don't use phylink give clear documentation of the
   basics.

We can look at edge cases, but i would only do it in the context of
phylink. Its one central implementation means we can add complexity
there and not overload developers who get the basics wrong.

	Andrew


* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-27 15:07         ` Andrew Lunn
@ 2025-11-27 15:31           ` Maxime Chevallier
  2025-11-27 15:48             ` Andrew Lunn
  2025-11-27 16:14           ` Russell King (Oracle)
  1 sibling, 1 reply; 21+ messages in thread
From: Maxime Chevallier @ 2025-11-27 15:31 UTC (permalink / raw)
  To: Andrew Lunn, Oleksij Rempel
  Cc: Jakub Kicinski, Vladimir Oltean, Alexei Starovoitov, Russell King,
	Eric Dumazet, Rob Herring, Florian Fainelli, Donald Hunter,
	Daniel Borkmann, Jonathan Corbet, John Fastabend, Lukasz Majewski,
	Stanislav Fomichev, Paolo Abeni, Jiri Pirko,
	Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

Hi Andrew

I am sorry, I have a bit of sidetracking...

>> State Persistence and Toggling When toggling autoneg (e.g., autoneg on -> off
>> -> on), should the kernel or driver cache the previous advertisement?
> 
> This has been discussed in the past, and i _think_ phylink does.
> 
> But before we go too far into edge cases, my review experience is
> that MAC drivers get the basics wrong. What we really want to do here
> is:
> 
> 1) Push driver developers towards phylink

Is it something we should insist on in the review process ? Can we make
it a hard requirement that _new_ MAC drivers need to use phylink, if the
driver plans to interact with a PHY ?

phylink has long outgrown the original use-case of supporting SFPs by
abstracting away the MAC to [PHY/SFP] interactions; it's now used as
an abstraction layer that keeps MAC drivers from making the same mistakes
over and over again in a lot of cases that don't have anything to do
with SFP.

I think we can no longer really say "If your driver is simple enough,
you can stick to using phylib directly", at least not for new drivers,
as phylink now simplifies EEE, WoL, Pause, etc.

> 2) For those who don't use phylink give clear documentation of the
>    basics.
> 
> We can look at edge cases, but i would only do it in the context of
> phylink. Its one central implementation means we can add complexity
> there and not overload developers who get the basics wrong.
> 
> 	Andrew

Maxime


* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-27 15:31           ` Maxime Chevallier
@ 2025-11-27 15:48             ` Andrew Lunn
  2025-11-27 16:18               ` Russell King (Oracle)
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Lunn @ 2025-11-27 15:48 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: Oleksij Rempel, Jakub Kicinski, Vladimir Oltean,
	Alexei Starovoitov, Russell King, Eric Dumazet, Rob Herring,
	Florian Fainelli, Donald Hunter, Daniel Borkmann, Jonathan Corbet,
	John Fastabend, Lukasz Majewski, Stanislav Fomichev, Paolo Abeni,
	Jiri Pirko, Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

On Thu, Nov 27, 2025 at 04:31:50PM +0100, Maxime Chevallier wrote:
> Hi Andrew
> 
> I am sorry, I have a bit of sidetracking...
> 
> >> State Persistence and Toggling When toggling autoneg (e.g., autoneg on -> off
> >> -> on), should the kernel or driver cache the previous advertisement?
> > 
> > This has been discussed in the past, and i _think_ phylink does.
> > 
> > But before we go too far into edge cases, my review experience is
> > that MAC drivers get the basics wrong. What we really want to do here
> > is:
> > 
> > 1) Push driver developers towards phylink
> 
> Is it something we should insist on in the review process ? Can we make
> it a hard requirement that _new_ MAC drivers need to use phylink, if the
> driver plans to interact with a PHY ?
> 
> phylink has long outgrown the original use-case of supporting SFPs by
> abstracting away the MAC to [PHY/SFP] interactions; it's now used as
> an abstraction layer that keeps MAC drivers from making the same mistakes
> over and over again in a lot of cases that don't have anything to do
> with SFP.

This is something i've been considering for a while.

Maybe for the last year, when i have seen broken pause, i've been
reporting the problems but also pushing developers towards
phylink. phylink also does all the business logic for EEE, and is
starting to get WoL support. So we really should be pushing developers
in that direction.

Is it time to deprecate direct phylib access?

I think the answer is Yes.

	Andrew


* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-27  9:20       ` Oleksij Rempel
  2025-11-27 15:07         ` Andrew Lunn
@ 2025-11-27 16:10         ` Russell King (Oracle)
  1 sibling, 0 replies; 21+ messages in thread
From: Russell King (Oracle) @ 2025-11-27 16:10 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Jakub Kicinski, Andrew Lunn, Vladimir Oltean, Alexei Starovoitov,
	Eric Dumazet, Rob Herring, Florian Fainelli, Donald Hunter,
	Daniel Borkmann, Jonathan Corbet, John Fastabend, Lukasz Majewski,
	Maxime Chevallier, Stanislav Fomichev, Paolo Abeni, Jiri Pirko,
	Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

On Thu, Nov 27, 2025 at 10:20:54AM +0100, Oleksij Rempel wrote:
> 2. **Forced Mode** (``autoneg off``)

This should state that this is the "autoneg" on the ethtool -A / --pause
command, not the ethtool -s / --change command.

> Pre-link Configuration (Administrative UP, Physical DOWN) How should drivers
> handle set_pauseparam when the link is physically down?
> 
>  Fully Forced: If speed/duplex are forced, we can validate the pause request
>  immediately.
> 
>  Parallel Detection: If the link comes up later (e.g., as Half Duplex via
>  parallel detection), a previously accepted "forced pause" configuration might
>  become invalid. Should we block forced pause settings until the link is
>  physically up?

Why would the user's request become invalid? Why should the user have to
re-set their requested policy if the link flips from FD to HD and back
to FD for whatever reason? The kernel should accept the user's requested
policy, and apply it when appropriate (in other words, when in FD mode.)

> State Persistence and Toggling When toggling autoneg (e.g., autoneg on -> off
> -> on), should the kernel or driver cache the previous advertisement?
> 
>   Currently, if a user switches to forced mode and back, the previous
>   advertisement preferences might be lost or reset to defaults depending on the
>   driver.

Turning pause autoneg off should not change the advertisement. It should
be thought of as a control that selects whether the results of autoneg are
used vs not used. Note that phylink updates the advertisement even when
pause autoneg is turned off. This follows the stated API documentation
(please ensure your documentation conforms to the already existing API
documentation, and doesn't inadvertently propose something different -
this is exactly why I hate that we're getting multiple definitions of
the same stuff in different places.)

 * If the link is autonegotiated, drivers should use
 * mii_advertise_flowctrl() or similar code to set the advertised
 * pause frame capabilities based on the @rx_pause and @tx_pause flags,
 * even if @autoneg is zero.  They should also allow the advertised
 * pause frame capabilities to be controlled directly through the
 * advertising field of &struct ethtool_cmd.

Note that this requires that the advertisement is updated even if pause
autoneg is zero. Phylink implements this.
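
A minimal userspace sketch of that mapping, for illustration only. It
mirrors the logic of mii_advertise_flowctrl() as I understand it; the
function name pause_to_adv() and the ADV_* macros are local stand-ins
(the values are assumed to match ADVERTISE_PAUSE_CAP and
ADVERTISE_PAUSE_ASYM in the MII headers, please check before relying on
them). Note that the pause autoneg flag is deliberately not an input:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the MII advertisement register pause bits. */
#define ADV_PAUSE_CAP	0x0400u	/* symmetric pause */
#define ADV_PAUSE_ASYM	0x0800u	/* asymmetric pause */

/*
 * Translate the requested rx/tx pause policy into advertisement bits.
 * The result depends only on rx_pause and tx_pause, so the same
 * advertisement is produced whether pause autoneg is on or off.
 */
static uint32_t pause_to_adv(bool rx_pause, bool tx_pause)
{
	uint32_t adv = 0;

	if (rx_pause)
		adv = ADV_PAUSE_CAP | ADV_PAUSE_ASYM;
	if (tx_pause)
		adv ^= ADV_PAUSE_ASYM;

	return adv;
}
```

With this mapping, rx+tx advertises Pause only, rx alone advertises
Pause and Asym, tx alone advertises Asym only, and neither advertises
nothing.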

>   Similarly, if no administrative configuration has ever been set, what should
>   get_pauseparam report? Should it read the current hardware state (which might
>   be default) or return zero/empty?
> 
> Synchronization with Link Modes Configuring pause via set_pauseparam vs.
> link_ksettings can lead to desynchronization.
> 
>   My testing shows that set_pauseparam often updates the driver's internal
>   pause state but may not trigger the necessary link reset/re-advertisement
>   that link_ksettings does.
> 
>   This results in the reported "Advertised" pause modes in ethtool output being
>   out of sync with the actual Pause API settings.
> 
>   Combining configuration over different interfaces sometimes will avoid
>   link reset, so new configuration is not advertised.

... which I'm sure Andrew will argue is a reason for drivers to use
phylink which implements this properly!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-27 15:07         ` Andrew Lunn
  2025-11-27 15:31           ` Maxime Chevallier
@ 2025-11-27 16:14           ` Russell King (Oracle)
  2025-11-28  1:21             ` Jakub Kicinski
  1 sibling, 1 reply; 21+ messages in thread
From: Russell King (Oracle) @ 2025-11-27 16:14 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Oleksij Rempel, Jakub Kicinski, Vladimir Oltean,
	Alexei Starovoitov, Eric Dumazet, Rob Herring, Florian Fainelli,
	Donald Hunter, Daniel Borkmann, Jonathan Corbet, John Fastabend,
	Lukasz Majewski, Maxime Chevallier, Stanislav Fomichev,
	Paolo Abeni, Jiri Pirko, Jesper Dangaard Brouer, Divya.Koppera,
	Kory Maincent, Vadim Fedorenko, netdev, Sabrina Dubroca,
	linux-kernel, kernel, Krzysztof Kozlowski, David S. Miller,
	Heiner Kallweit

On Thu, Nov 27, 2025 at 04:07:19PM +0100, Andrew Lunn wrote:
> There is one additional thing which plays into this, link
> autonegotiation, ethtool -s autoneg on|off.
> 
> If link auto negotiation is on, you can then have both of the two
> cases above, negotiated pause, or forced pause. If link auto
> negotiation is off, you can only have forced mode. The text you have
> below does however cover this. But this is one of the areas developers
> get wrong, they don't consider how the link autoneg affects the pause
> autoneg.

If there is no autoneg exchange, the capabilities of the remote end have
to be assumed to be Pause=0 AsymDir=0.

> But i do agree that get_pauseparam is rather odd. It returns the
> current configuration, not necessarily how the MAC hardware has been
> programmed.
> 
> > **Common Constraints**
> > Regardless of the mode, the following constraints apply:
> > 
> > - Link-wide PAUSE is not valid on half-duplex links.
> > - Link-wide PAUSE cannot be used together with Priority-based Flow Control
> >   (PFC).
> > 
> > 
> > /**
> >  * ...
> >  * @get_pauseparam: Report the configured administrative policy for link-wide
> >  *	PAUSE (IEEE 802.3 Annex 31B). Drivers must fill struct ethtool_pauseparam
> >  *	such that:
> >  *	@autoneg:
> >  *		This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only
> >  *		and is part of the link autonegotiation process.
> >  *		true  -> the device follows the negotiated result of pause
> >  *			 autonegotiation (Pause/Asym);
> >  *		false -> the device uses a forced configuration independent
> >  *			 of negotiation.
> >  *	@rx_pause/@tx_pause:
> >  *		represent the desired policy (administrative state).
> >  *		In autoneg mode they describe what is to be advertised;
> >  *		in forced mode they describe the MAC configuration to be forced.
> >  *
> >  * @set_pauseparam: Apply a policy for link-wide PAUSE (IEEE 802.3 Annex 31B).
> >  *	@rx_pause/@tx_pause:
> >  *		Desired state. If @autoneg is true, these define the
> >  *		advertisement. If @autoneg is false, these define the
> >  *		forced MAC configuration.
> >  *	@autoneg:
> >  *		Select autonegotiation or forced mode.
> >  *
> >  *	**Constraint Checking:**
> >  *	Drivers should reject a non-zero setting of @autoneg when
> >  *	autonegotiation is disabled (or not supported) for the link.
> >  *	Drivers should reject unsupported rx/tx combinations with -EINVAL.

Definitely not. Drivers should accept autoneg=1 because that is the
user stating "my desire is to use the result of autonegotiation when
it becomes available". Just because autoneg may be disabled doesn't
mean it will remain disabled, and having to issue ethtool commands
in the right sequence leads to poor user experiences.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-27 15:48             ` Andrew Lunn
@ 2025-11-27 16:18               ` Russell King (Oracle)
  0 siblings, 0 replies; 21+ messages in thread
From: Russell King (Oracle) @ 2025-11-27 16:18 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Maxime Chevallier, Oleksij Rempel, Jakub Kicinski,
	Vladimir Oltean, Alexei Starovoitov, Eric Dumazet, Rob Herring,
	Florian Fainelli, Donald Hunter, Daniel Borkmann, Jonathan Corbet,
	John Fastabend, Lukasz Majewski, Stanislav Fomichev, Paolo Abeni,
	Jiri Pirko, Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

On Thu, Nov 27, 2025 at 04:48:42PM +0100, Andrew Lunn wrote:
> Maybe for the last year, when i have seen broken pause, i've been
> reporting the problems but also pushing developers towards
> phylink. phylink also does all the business logic for EEE, and is
> starting to get WoL support.

Not "starting" - it has a full implementation for it, unless I'm
missing something.

What is missing is the upgrade of phylib drivers to a more modern WoL
approach where they actually tell us that they truly are capable of
waking the system - I don't think that's something we as core code
maintainers can really get involved with, but push people to do the
necessary leg work to make it so.

We have far too many phylib drivers that implement the get_wol
without a thought to whether the hardware can actually wake the
system, and that is completely incompatible with having logic to
determine whether a certain WoL configuration should be set on the
PHY or the MAC.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-27 16:14           ` Russell King (Oracle)
@ 2025-11-28  1:21             ` Jakub Kicinski
  0 siblings, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2025-11-28  1:21 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, Oleksij Rempel, Vladimir Oltean, Alexei Starovoitov,
	Eric Dumazet, Rob Herring, Florian Fainelli, Donald Hunter,
	Daniel Borkmann, Jonathan Corbet, John Fastabend, Lukasz Majewski,
	Maxime Chevallier, Stanislav Fomichev, Paolo Abeni, Jiri Pirko,
	Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

On Thu, 27 Nov 2025 16:14:15 +0000 Russell King (Oracle) wrote:
> > >  *	**Constraint Checking:**
> > >  *	Drivers should reject a non-zero setting of @autoneg when
> > >  *	autonegotiation is disabled (or not supported) for the link.
> > >  *	Drivers should reject unsupported rx/tx combinations with -EINVAL.  
> 
> Definitely not. Drivers should accept autoneg=1 because that is the
> user stating "my desire is to use the result of autonegotiation when
> it becomes available". Just because autoneg may be disabled doesn't
> mean it will remain disabled, and having to issue ethtool commands
> in the right sequence leads to poor user experiences.

It's an existing recommendation, coming from 6a7a1081cebacc4.
I thought it's just because of the ambiguity what the settings mean
with autoneg on or off. But looks like Ben has been trying to push
people towards link mode bits 11 years ago already :(

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-26  8:36   ` Oleksij Rempel
  2025-11-26 22:42     ` Jakub Kicinski
@ 2025-11-28  1:27     ` Russell King (Oracle)
  2025-11-28  1:47       ` Russell King (Oracle)
  1 sibling, 1 reply; 21+ messages in thread
From: Russell King (Oracle) @ 2025-11-28  1:27 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Jakub Kicinski, Andrew Lunn, Vladimir Oltean, Alexei Starovoitov,
	Eric Dumazet, Rob Herring, Florian Fainelli, Donald Hunter,
	Daniel Borkmann, Jonathan Corbet, John Fastabend, Lukasz Majewski,
	Maxime Chevallier, Stanislav Fomichev, Paolo Abeni, Jiri Pirko,
	Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

On Wed, Nov 26, 2025 at 09:36:42AM +0100, Oleksij Rempel wrote:
> My current understanding is that get_pauseparam() is mainly a
> configuration API. It seems to be designed symmetric to
> set_pauseparam(): it reports the requested policy (autoneg flag and
> rx/tx pause), not the resolved MAC state.
> 
> In autoneg mode this means the user sees what we intend to advertise
> or force, but not necessarily what the MAC actually ended up with
> after resolution.
> 
> The ethtool userspace tool tries to fill this gap by showing
> "RX negotiated" and "TX negotiated" fields, for example:
> 
>   Pause parameters for lan1:
>     Autonegotiate:  on
>     RX:             off
>     TX:             off
>     RX negotiated:  on
>     TX negotiated:  on
> 
> As far as I can see, these "negotiated" values are not read from hardware or
> kernel. They are guessed in userspace from the local and link partner
> advertisements

They are not "guessed". IEEE 802.3 defines how the negotiation resolves
to these, and ethtool implements that, just the same as how we resolve
it in phylib.

Whether the MAC takes any notice of that or not is a MAC driver problem.

> , assuming that the kernel follows the same pause resolution
> rules as ethtool does. If the kernel or hardware behaves differently, these
> values can be wrong.

If it doesn't follow IEEE 802.3 resolution, then it's quite simply
broken. IEEE 802.3 requires certain resolution methods from the
negotiation in order for both link partners to inter-operate.

Don't make this more complex than it needs to be!
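
For reference, the resolution in question is small enough to sketch in
a few lines of userspace C. This is an illustration, not the kernel or
ethtool source: struct pause_adv and resolve_pause() are made-up names,
and the logic is my reading of the standard resolution rules (the same
shape as what phylib does with the Pause and Asym_Pause bits):

```c
#include <stdbool.h>

/* Hypothetical container: the two pause bits advertised by one link end. */
struct pause_adv {
	bool pause;	/* symmetric pause bit */
	bool asym;	/* asymmetric pause bit */
};

/*
 * Resolve tx/rx pause for the local end from both advertisements.
 * If no autoneg exchange took place, pass a partner with both bits
 * clear, which resolves to no pause in either direction.
 */
static void resolve_pause(struct pause_adv local, struct pause_adv partner,
			  bool *tx_pause, bool *rx_pause)
{
	if (local.pause && partner.pause) {
		/* Both ends advertise symmetric pause. */
		*tx_pause = true;
		*rx_pause = true;
	} else if (local.asym && partner.asym) {
		/* Asymmetric: direction depends on who set the Pause bit. */
		*tx_pause = partner.pause;
		*rx_pause = local.pause;
	} else {
		*tx_pause = false;
		*rx_pause = false;
	}
}
```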

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-28  1:27     ` Russell King (Oracle)
@ 2025-11-28  1:47       ` Russell King (Oracle)
  2025-11-28  8:55         ` Oleksij Rempel
  0 siblings, 1 reply; 21+ messages in thread
From: Russell King (Oracle) @ 2025-11-28  1:47 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Jakub Kicinski, Andrew Lunn, Vladimir Oltean, Alexei Starovoitov,
	Eric Dumazet, Rob Herring, Florian Fainelli, Donald Hunter,
	Daniel Borkmann, Jonathan Corbet, John Fastabend, Lukasz Majewski,
	Maxime Chevallier, Stanislav Fomichev, Paolo Abeni, Jiri Pirko,
	Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

On Fri, Nov 28, 2025 at 01:27:29AM +0000, Russell King (Oracle) wrote:
> On Wed, Nov 26, 2025 at 09:36:42AM +0100, Oleksij Rempel wrote:
> > My current understanding is that get_pauseparam() is mainly a
> > configuration API. It seems to be designed symmetric to
> > set_pauseparam(): it reports the requested policy (autoneg flag and
> > rx/tx pause), not the resolved MAC state.
> > 
> > In autoneg mode this means the user sees what we intend to advertise
> > or force, but not necessarily what the MAC actually ended up with
> > after resolution.
> > 
> > The ethtool userspace tool tries to fill this gap by showing
> > "RX negotiated" and "TX negotiated" fields, for example:
> > 
> >   Pause parameters for lan1:
> >     Autonegotiate:  on
> >     RX:             off
> >     TX:             off
> >     RX negotiated:  on
> >     TX negotiated:  on
> > 
> > As far as I can see, these "negotiated" values are not read from hardware or
> > kernel. They are guessed in userspace from the local and link partner
> > advertisements
> 
> They are not "guessed". IEEE 802.3 defines how the negotiation resolves
> to these, and ethtool implements that, just the same as how we resolve
> it in phylib.
> 
> Whether the MAC takes any notice of that or not is a MAC driver problem.
> 
> > , assuming that the kernel follows the same pause resolution
> > rules as ethtool does. If the kernel or hardware behaves differently, these
> > values can be wrong.
> 
> If it doesn't follow IEEE 802.3 resolution, then it's quite simply
> broken. IEEE 802.3 requires certain resolution methods from the
> negotiation in order for both link partners to inter-operate.
> 
> Don't make this more complex than it needs to be!

Also note that there is hardware out there which can't tell us "the
hardware enabled transmission of pause frames" and "the hardware will
respect received pause frames". One example is some of the Marvell
DSA switches which only have a single status bit. Whether that means
they only support symmetric pause, I'm not certain, the docs don't
say.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-28  1:47       ` Russell King (Oracle)
@ 2025-11-28  8:55         ` Oleksij Rempel
  2025-11-28  9:35           ` Russell King (Oracle)
  2025-11-28 18:32           ` Jakub Kicinski
  0 siblings, 2 replies; 21+ messages in thread
From: Oleksij Rempel @ 2025-11-28  8:55 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jakub Kicinski, Andrew Lunn, Vladimir Oltean, Alexei Starovoitov,
	Eric Dumazet, Rob Herring, Florian Fainelli, Donald Hunter,
	Daniel Borkmann, Jonathan Corbet, John Fastabend, Lukasz Majewski,
	Maxime Chevallier, Stanislav Fomichev, Paolo Abeni, Jiri Pirko,
	Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

Hi all,

Before sending v9, I would like to summarize the discussion and validate
the intended logic one last time.

Based on the feedback (specifically Russell's clarification on API
semantics and Phylink behavior), I will document the following logic.

Proposed Text: Documentation/networking/flow_control.rst
--------------------------------------------------------

Kernel Policy: User Intent & Resolution
=======================================

The ethtool pause API ('ethtool -A' or '--pause') configures the **User
Intent** for **Link-wide PAUSE** (IEEE 802.3 Annex 31B). The
**Operational State** (what actually happens on the wire) is derived
from this intent, the active link mode, and the link partner.

**Disambiguation: Pause Autoneg vs. Link Autoneg**
In this section, "autonegotiation" refers exclusively to the **Pause
Autonegotiation** parameter ('ethtool -A / --pause ... autoneg <on|off>').
This is distinct from, but interacts with, **Generic Link
Autonegotiation** ('ethtool -s / --change ... autoneg <on|off>').

The semantics of the Pause API depend on the 'autoneg' parameter:

1. **Resolution Mode** ('ethtool -A ... autoneg on')
   The user intends for the device to **respect the negotiated result**.

   - **Advertisement:** The system updates the PHY advertisement
     (Symmetric/Asymmetric pause bits if the link medium supports
     advertisement) to match the ``rx`` and ``tx`` parameters.
   - **Resolution:** The system configures the MAC to follow the standard
     IEEE 802.3 Resolution Truth Table based on the Local Advertisement
     vs. Link Partner Advertisement.
   - **Constraint:** If Link Autonegotiation ('ethtool -s / --change')
     is disabled, the resolution cannot occur. The Operational State
     effectively becomes **Disabled** (as negotiation is impossible)
     regardless of the advertisement. However, the system **MUST**
     accept this configuration as a valid stored intent for future use.

2. **Forced Mode** ('ethtool -A ... autoneg off')
   The user intends to **override negotiation** and force a specific
   state (if the link mode permits).

   - **Advertisement:** The system should update the PHY advertisement
     (if the link medium supports advertisement) to match the ``rx`` and
     ``tx`` parameters, ensuring the link partner is aware of the forced
     configuration.
   - **Resolution:** The system configures the MAC according to the
     specified ``rx`` and ``tx`` parameters, ignoring the link partner's
     advertisement.

**Global Constraint: Full-Duplex Only**
Link-wide PAUSE (Annex 31B) is strictly defined for **Full-Duplex** links.
If the link mode is **Half-Duplex** (whether forced or negotiated),
Link-wide PAUSE is operationally **disabled** regardless of the
parameters set above.

**Summary of "autoneg" Flag Meaning:**
- true  -> **Delegate decision:** "Use the IEEE 802.3 logic to decide."
- false -> **Force decision:** "Do exactly what I say (if the link supports it)."
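
The rules above can be sketched as a small decision function. This is a
userspace illustration under stated assumptions, not driver code: all
struct and function names are hypothetical, and resolved_rx/resolved_tx
simply stand for the outcome of the IEEE 802.3 pause resolution, which
is taken as an input here rather than computed:

```c
#include <stdbool.h>

/* User Intent as stored from 'ethtool -A' (hypothetical layout). */
struct pause_intent {
	bool autoneg;	/* pause autoneg: Resolution Mode vs. Forced Mode */
	bool rx;
	bool tx;
};

/* Current link facts (hypothetical layout). */
struct link_facts {
	bool link_autoneg;	/* generic link autoneg ('ethtool -s') */
	bool full_duplex;
	bool resolved_rx;	/* result of IEEE 802.3 pause resolution */
	bool resolved_tx;
};

struct pause_oper {
	bool rx;
	bool tx;
};

/* Derive the Operational State without ever modifying the stored intent. */
static struct pause_oper pause_operational(const struct pause_intent *intent,
					   const struct link_facts *link)
{
	struct pause_oper oper = { false, false };

	/* Global constraint: link-wide PAUSE needs full duplex. */
	if (!link->full_duplex)
		return oper;

	/* Forced Mode: use the requested rx/tx as-is. */
	if (!intent->autoneg) {
		oper.rx = intent->rx;
		oper.tx = intent->tx;
		return oper;
	}

	/* Resolution Mode without link autoneg: nothing to resolve. */
	if (!link->link_autoneg)
		return oper;

	oper.rx = link->resolved_rx;
	oper.tx = link->resolved_tx;
	return oper;
}
```

Because the intent is only read, a setting made while link autoneg is
off or the link is half duplex takes effect later without the user
having to repeat it.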

Proposed Text: include/linux/ethtool.h
--------------------------------------

/**
 * @get_pauseparam: Report the configured administrative policy for
 * link-wide PAUSE (IEEE 802.3 Annex 31B). Drivers must fill struct
 * ethtool_pauseparam such that:
 * @autoneg:
 *   This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only.
 *   true  -> the device follows the negotiated result of pause
 *     autonegotiation (Pause/Asym) when the link allows it;
 *   false -> the device uses a forced configuration.
 * @rx_pause/@tx_pause:
 *   Represent the desired policy (Administrative State).
 *   In autoneg mode they describe what is to be advertised;
 *   in forced mode they describe the MAC configuration to be forced.
 *
 * @set_pauseparam: Apply a policy for link-wide PAUSE (IEEE 802.3 Annex 31B).
 * @rx_pause/@tx_pause:
 *   Desired state. If @autoneg is true, these define the
 *   advertisement. If @autoneg is false, these define the
 *   forced MAC configuration (and preferably the advertisement too).
 * @autoneg:
 *   Select Resolution Mode (true) or Forced Mode (false).
 *
 * **Constraint Checking:**
 *   Drivers MUST accept a setting of @autoneg (true) even if generic
 *   link autonegotiation ('ethtool -s / --change') is currently disabled.
 *   This allows the user to pre-configure the desired policy for future
 *   link modes.
 *
 * New drivers are strongly encouraged to use phylink_ethtool_get_pauseparam()
 * and phylink_ethtool_set_pauseparam() which implement this logic
 * correctly.
 */

Best Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-28  8:55         ` Oleksij Rempel
@ 2025-11-28  9:35           ` Russell King (Oracle)
  2025-11-28 18:32           ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Russell King (Oracle) @ 2025-11-28  9:35 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Jakub Kicinski, Andrew Lunn, Vladimir Oltean, Alexei Starovoitov,
	Eric Dumazet, Rob Herring, Florian Fainelli, Donald Hunter,
	Daniel Borkmann, Jonathan Corbet, John Fastabend, Lukasz Majewski,
	Maxime Chevallier, Stanislav Fomichev, Paolo Abeni, Jiri Pirko,
	Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

On Fri, Nov 28, 2025 at 09:55:22AM +0100, Oleksij Rempel wrote:
> Hi all,
> 
> Before sending v9, I would like to summarize the discussion and validate
> the intended logic one last time.
> 
> Based on the feedback (specifically Russell's clarification on API
> semantics and Phylink behavior), I will document the following logic.
> 
> Proposed Text: Documentation/networking/flow_control.rst
> --------------------------------------------------------
> 
> Kernel Policy: User Intent & Resolution
> =======================================
> 
> The ethtool pause API ('ethtool -A' or '--pause') configures the **User
> Intent** for **Link-wide PAUSE** (IEEE 802.3 Annex 31B). The
> **Operational State** (what actually happens on the wire) is derived
> from this intent, the active link mode, and the link partner.
> 
> **Disambiguation: Pause Autoneg vs. Link Autoneg**
> In this section, "autonegotiation" refers exclusively to the **Pause
> Autonegotiation** parameter ('ethtool -A / --pause ... autoneg <on|off>').
> This is distinct from, but interacts with, **Generic Link
> Autonegotiation** ('ethtool -s / --change ... autoneg <on|off>').
> 
> The semantics of the Pause API depend on the 'autoneg' parameter:
> 
> 1. **Resolution Mode** ('ethtool -A ... autoneg on')
>    The user intends for the device to **respect the negotiated result**.
> 
>    - **Advertisement:** The system updates the PHY advertisement
>      (Symmetric/Asymmetric pause bits if the link medium supports
>      advertisement) to match the ``rx`` and ``tx`` parameters.
>    - **Resolution:** The system configures the MAC to follow the standard
>      IEEE 802.3 Resolution Truth Table based on the Local Advertisement
>      vs. Link Partner Advertisement.
>    - **Constraint:** If Link Autonegotiation ('ethtool -s / --change')
>      is disabled, the resolution cannot occur. The Operational State
>      effectively becomes **Disabled** (as negotiation is impossible)
>      regardless of the advertisement. However, the system **MUST**
>      accept this configuration as a valid stored intent for future use.

This looks fine to me now, thanks.

> 
> 2. **Forced Mode** ('ethtool -A ... autoneg off')
>    The user intends to **override negotiation** and force a specific
>    state (if the link mode permits).
> 
>    - **Advertisement:** The system should update the PHY advertisement
>      (if the link medium supports advertisement) to match the ``rx`` and
>      ``tx`` parameters, ensuring the link partner is aware of the forced
>      configuration.
>    - **Resolution:** The system configures the MAC according to the
>      specified ``rx`` and ``tx`` parameters, ignoring the link partner's
>      advertisement.
> 
> **Global Constraint: Full-Duplex Only**
> Link-wide PAUSE (Annex 31B) is strictly defined for **Full-Duplex** links.
> If the link mode is **Half-Duplex** (whether forced or negotiated),
> Link-wide PAUSE is operationally **disabled** regardless of the
> parameters set above.
> 
> **Summary of "autoneg" Flag Meaning:**
> - true  -> **Delegate decision:** "Use the IEEE 802.3 logic to decide."
> - false -> **Force decision:** "Do exactly what I say (if the link supports it)."

"if the network device supports it"

> 
> Proposed Text: include/linux/ethtool.h
> --------------------------------------
> 
> /**
>  * @get_pauseparam: Report the configured administrative policy for
>  * link-wide PAUSE (IEEE 802.3 Annex 31B). Drivers must fill struct
>  * ethtool_pauseparam such that:
>  * @autoneg:
>  *   This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only.
>  *   true  -> the device follows the negotiated result of pause
>  *     autonegotiation (Pause/Asym) when the link allows it;

               "the device follows the result of pause autonegotiation
	 when the link allows it;"

>  *   false -> the device uses a forced configuration.
>  * @rx_pause/@tx_pause:
>  *   Represent the desired policy (Administrative State).
>  *   In autoneg mode they describe what is to be advertised;
>  *   in forced mode they describe the MAC configuration to be forced.
>  *
>  * @set_pauseparam: Apply a policy for link-wide PAUSE (IEEE 802.3 Annex 31B).
>  * @rx_pause/@tx_pause:
>  *   Desired state. If @autoneg is true, these define the
>  *   advertisement. If @autoneg is false, these define the
>  *   forced MAC configuration (and preferably the advertisement too).
>  * @autoneg:
>  *   Select Resolution Mode (true) or Forced Mode (false).
>  *
>  * **Constraint Checking:**
>  *   Drivers MUST accept a setting of @autoneg (true) even if generic
>  *   link autonegotiation ('ethtool -s / --change') is currently disabled.
>  *   This allows the user to pre-configure the desired policy for future
>  *   link modes.
>  *
>  * New drivers are strongly encouraged to use phylink_ethtool_get_pauseparam()
>  * and phylink_ethtool_set_pauseparam() which implement this logic
>  * correctly.
>  */

Apart from the two minor issues above,

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Thanks!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-28  8:55         ` Oleksij Rempel
  2025-11-28  9:35           ` Russell King (Oracle)
@ 2025-11-28 18:32           ` Jakub Kicinski
  2025-11-28 20:16             ` Andrew Lunn
  1 sibling, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2025-11-28 18:32 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Russell King (Oracle), Andrew Lunn, Vladimir Oltean,
	Alexei Starovoitov, Eric Dumazet, Rob Herring, Florian Fainelli,
	Donald Hunter, Daniel Borkmann, Jonathan Corbet, John Fastabend,
	Lukasz Majewski, Maxime Chevallier, Stanislav Fomichev,
	Paolo Abeni, Jiri Pirko, Jesper Dangaard Brouer, Divya.Koppera,
	Kory Maincent, Vadim Fedorenko, netdev, Sabrina Dubroca,
	linux-kernel, kernel, Krzysztof Kozlowski, David S. Miller,
	Heiner Kallweit

On Fri, 28 Nov 2025 09:55:22 +0100 Oleksij Rempel wrote:
>  * **Constraint Checking:**
>  *   Drivers MUST accept a setting of @autoneg (true) even if generic
>  *   link autonegotiation ('ethtool -s / --change') is currently disabled.
>  *   This allows the user to pre-configure the desired policy for future
>  *   link modes.

!? I pointed out so many times that this contradicts the long standing
recommendation.

Can you please tell me what is preventing us from deprecating pauseparam
API *for autoneg* and using linkmodes which are completely unambiguous.
And allows the user to "pre configure" the advertisement.

The pause set API should remain primarily for forced mode configuration.
Perhaps the move is to make it read only for new drivers when aneg is
turned on?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-28 18:32           ` Jakub Kicinski
@ 2025-11-28 20:16             ` Andrew Lunn
  2025-11-28 20:38               ` Russell King (Oracle)
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Lunn @ 2025-11-28 20:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Oleksij Rempel, Russell King (Oracle), Vladimir Oltean,
	Alexei Starovoitov, Eric Dumazet, Rob Herring, Florian Fainelli,
	Donald Hunter, Daniel Borkmann, Jonathan Corbet, John Fastabend,
	Lukasz Majewski, Maxime Chevallier, Stanislav Fomichev,
	Paolo Abeni, Jiri Pirko, Jesper Dangaard Brouer, Divya.Koppera,
	Kory Maincent, Vadim Fedorenko, netdev, Sabrina Dubroca,
	linux-kernel, kernel, Krzysztof Kozlowski, David S. Miller,
	Heiner Kallweit

> Can you please tell me what is preventing us from deprecating pauseparam
> API *for autoneg* and using linkmodes which are completely unambiguous.

Just to make sure i understand you here...

You mean make use of

        ETHTOOL_LINK_MODE_Pause_BIT             = 13,
        ETHTOOL_LINK_MODE_Asym_Pause_BIT        = 14,

So i would do a ksettings_set() with

__ETHTOOL_LINK_MODE_LEGACY_MASK(Pause) | __ETHTOOL_LINK_MODE_LEGACY_MASK(Asym_Pause)

to indicate both pause and asym pause should be advertised.

The man page for ethtool does not indicate you can do this. It does
have a list of link mode bits you can pass via the advertise option to
ethtool -s, but they are all actual link modes, not features like TP,
AUI, BNC, Pause, Backplane, FEC none, FEC baser, etc.

	Andrew

* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-28 20:16             ` Andrew Lunn
@ 2025-11-28 20:38               ` Russell King (Oracle)
  2025-11-28 22:17                 ` Jakub Kicinski
  0 siblings, 1 reply; 21+ messages in thread
From: Russell King (Oracle) @ 2025-11-28 20:38 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, Oleksij Rempel, Vladimir Oltean,
	Alexei Starovoitov, Eric Dumazet, Rob Herring, Florian Fainelli,
	Donald Hunter, Daniel Borkmann, Jonathan Corbet, John Fastabend,
	Lukasz Majewski, Maxime Chevallier, Stanislav Fomichev,
	Paolo Abeni, Jiri Pirko, Jesper Dangaard Brouer, Divya.Koppera,
	Kory Maincent, Vadim Fedorenko, netdev, Sabrina Dubroca,
	linux-kernel, kernel, Krzysztof Kozlowski, David S. Miller,
	Heiner Kallweit

On Fri, Nov 28, 2025 at 09:16:24PM +0100, Andrew Lunn wrote:
> > Can you please tell me what is preventing us from deprecating pauseparam
> > API *for autoneg* and using linkmodes which are completely unambiguous.
> 
> Just to make sure i understand you here...
> 
> You mean make use of
> 
>         ETHTOOL_LINK_MODE_Pause_BIT             = 13,
>         ETHTOOL_LINK_MODE_Asym_Pause_BIT        = 14,
> 
> So i would do a ksettings_set() with
> 
> __ETHTOOL_LINK_MODE_LEGACY_MASK(Pause) | __ETHTOOL_LINK_MODE_LEGACY_MASK(Asym_Pause)
> 
> to indicate both pause and asym pause should be advertised.
> 
> The man page for ethtool does not indicate you can do this. It does
> have a list of link mode bits you can pass via the advertise option to
> ethtool -s, but they are all actual link modes, not features like TP,
> AUI, BNC, Pause, Backplane, FEC none, FEC baser, etc.

I see the latest ethtool now supports -s ethX advertise MODE on|off,
but it doesn't describe that in the parameter entry for "advertise"
and doesn't suggest what MODE should be, nor how to specify multiple
modes that one may wish to turn on/off. I'm guessing this is what you're
referring to.

The ports never get advertised, so I don't think they're relevant.

However, the lack of the pause bits means that one is forced to use
the hex number, and I don't deem that to be a user interface. That's
a programmers interface, or rather a nightmare, because even if you're
a programmer, you still end up looking at include/uapi/linux/ethtool.h
and doing the maths to work out the hex number to pass, and then you
mistype it with the wrong number of zeros, so you try again, and
eventually you get the advertisement you wanted.
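
To make the arithmetic above concrete, here is a minimal sketch in C. It
assumes only the two bit positions quoted earlier in this thread; the
helper name pause_advertise_mask() is made up for illustration and is
not part of ethtool or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as quoted earlier in this thread. */
#define ETHTOOL_LINK_MODE_Pause_BIT      13
#define ETHTOOL_LINK_MODE_Asym_Pause_BIT 14

/* The sketch packs the bits into a single 64-bit word. */
static_assert(ETHTOOL_LINK_MODE_Asym_Pause_BIT < 64,
	      "pause bits must fit in a 64-bit mask");

/*
 * Hypothetical helper: build a numeric mask that carries only the
 * requested pause bits. Non-zero arguments mean "advertise this bit".
 */
uint64_t pause_advertise_mask(int pause, int asym_pause)
{
	uint64_t mask = 0;

	if (pause)
		mask |= UINT64_C(1) << ETHTOOL_LINK_MODE_Pause_BIT;
	if (asym_pause)
		mask |= UINT64_C(1) << ETHTOOL_LINK_MODE_Asym_Pause_BIT;

	return mask;
}
```

With both bits requested this yields 0x6000 (0x2000 | 0x4000). A real
hex value for "advertise" would presumably also have to carry the bits
of every speed/duplex mode that should stay advertised, which is where
the counting of zeros starts.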

So no, I don't accept Jakub's argument right now. Forcing people into
the nightmare of working out a hex number isn't something for users.
That's a debug tool at best.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-28 20:38               ` Russell King (Oracle)
@ 2025-11-28 22:17                 ` Jakub Kicinski
  2025-12-01  9:49                   ` Oleksij Rempel
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2025-11-28 22:17 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, Oleksij Rempel, Vladimir Oltean, Alexei Starovoitov,
	Eric Dumazet, Rob Herring, Florian Fainelli, Donald Hunter,
	Daniel Borkmann, Jonathan Corbet, John Fastabend, Lukasz Majewski,
	Maxime Chevallier, Stanislav Fomichev, Paolo Abeni, Jiri Pirko,
	Jesper Dangaard Brouer, Divya.Koppera, Kory Maincent,
	Vadim Fedorenko, netdev, Sabrina Dubroca, linux-kernel, kernel,
	Krzysztof Kozlowski, David S. Miller, Heiner Kallweit

On Fri, 28 Nov 2025 20:38:28 +0000 Russell King (Oracle) wrote:
> On Fri, Nov 28, 2025 at 09:16:24PM +0100, Andrew Lunn wrote:
> > > Can you please tell me what is preventing us from deprecating pauseparam
> > > API *for autoneg* and using linkmodes which are completely unambiguous.  
> > 
> > Just to make sure i understand you here...
> > 
> > You mean make use of
> > 
> >         ETHTOOL_LINK_MODE_Pause_BIT             = 13,
> >         ETHTOOL_LINK_MODE_Asym_Pause_BIT        = 14,
> > 
> > So i would do a ksettings_set() with
> > 
> > __ETHTOOL_LINK_MODE_LEGACY_MASK(Pause) | __ETHTOOL_LINK_MODE_LEGACY_MASK(Asym_Pause)
> > 
> > to indicate both pause and asym pause should be advertised.
> > 
> > The man page for ethtool does not indicate you can do this. It does
> > have a list of link mode bits you can pass via the advertise option to
> > ethtool -s, but they are all actual link modes, not features like TP,
> > AUI, BNC, Pause, Backplane, FEC none, FEC baser, etc.  
> 
> I see the latest ethtool now supports -s ethX advertise MODE on|off,
> but it doesn't describe that in the parameter entry for "advertise"
> and doesn't suggest what MODE should be, nor how to specify multiple
> modes that one may wish to turn on/off. I'm guessing this is what you're
> referring to.
> 
> The ports never get advertised, so I don't think they're relevant.
> 
> However, the lack of the pause bits means that one is forced to use
> the hex number, and I don't deem that to be a user interface. That's
> a programmers interface, or rather a nightmare, because even if you're
> a programmer, you still end up looking at include/uapi/linux/ethtool.h
> and doing the maths to work out the hex number to pass, and then you
> mistype it with the wrong number of zeros, so you try again, and
> eventually you get the advertisement you wanted.
> 
> So no, I don't accept Jakub's argument right now. Forcing people into
> the nightmare of working out a hex number isn't something for users.

I did some digging, too, just now. Looks like the options are indeed
not documented in the man page but ethtool uses the "forward compatible"
scheme with strings coming from the kernel. So this:

  ethtool -s enp0s13f0u1u1 advertise Pause on Asym_Pause on

works just fine, with no changes in CLI.

We should probably document that it works in the ethtool help and man
page. And possibly add some synthetic options like Receive-Only /
Transmit-Only so that users don't have to be aware of the encoding
details? Let me know if it's impractical, otherwise I think we'll
agree that having ethtool that makes it obvious how to achieve the
desired configuration beats the best long-form docs in the kernel.

* Re: [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API
  2025-11-28 22:17                 ` Jakub Kicinski
@ 2025-12-01  9:49                   ` Oleksij Rempel
  0 siblings, 0 replies; 21+ messages in thread
From: Oleksij Rempel @ 2025-12-01  9:49 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Russell King (Oracle), Andrew Lunn, Vladimir Oltean,
	Alexei Starovoitov, Eric Dumazet, Rob Herring, Florian Fainelli,
	Donald Hunter, Daniel Borkmann, Jonathan Corbet, John Fastabend,
	Lukasz Majewski, Maxime Chevallier, Stanislav Fomichev,
	Paolo Abeni, Jiri Pirko, Jesper Dangaard Brouer, Divya.Koppera,
	Kory Maincent, Vadim Fedorenko, netdev, Sabrina Dubroca,
	linux-kernel, kernel, Krzysztof Kozlowski, David S. Miller,
	Heiner Kallweit

Hi Jakub, Russell, all,

On Fri, Nov 28, 2025 at 02:17:10PM -0800, Jakub Kicinski wrote:
> On Fri, 28 Nov 2025 20:38:28 +0000 Russell King (Oracle) wrote:
> > On Fri, Nov 28, 2025 at 09:16:24PM +0100, Andrew Lunn wrote:
> > > > Can you please tell me what is preventing us from deprecating pauseparam
> > > > API *for autoneg* and using linkmodes which are completely unambiguous.  
> > > 
> > > Just to make sure i understand you here...
> > > 
> > > You mean make use of
> > > 
> > >         ETHTOOL_LINK_MODE_Pause_BIT             = 13,
> > >         ETHTOOL_LINK_MODE_Asym_Pause_BIT        = 14,
> > > 
> > > So i would do a ksettings_set() with
> > > 
> > > __ETHTOOL_LINK_MODE_LEGACY_MASK(Pause) | __ETHTOOL_LINK_MODE_LEGACY_MASK(Asym_Pause)
> > > 
> > > to indicate both pause and asym pause should be advertised.
> > > 
> > > The man page for ethtool does not indicate you can do this. It does
> > > have a list of link mode bits you can pass via the advertise option to
> > > ethtool -s, but they are all actual link modes, not features like TP,
> > > AUI, BNC, Pause, Backplane, FEC none, FEC baser, etc.  
> > 
> > I see the latest ethtool now supports -s ethX advertise MODE on|off,
> > but it doesn't describe that in the parameter entry for "advertise"
> > and doesn't suggest what MODE should be, nor how to specify multiple
> > modes that one may wish to turn on/off. I'm guessing this is what you're
> > referring to.
> > 
> > The ports never get advertised, so I don't think they're relevant.
> > 
> > However, the lack of the pause bits means that one is forced to use
> > the hex number, and I don't deem that to be a user interface. That's
> > a programmers interface, or rather a nightmare, because even if you're
> > a programmer, you still end up looking at include/uapi/linux/ethtool.h
> > and doing the maths to work out the hex number to pass, and then you
> > mistype it with the wrong number of zeros, so you try again, and
> > eventually you get the advertisement you wanted.
> > 
> > So no, I don't accept Jakub's argument right now. Forcing people into
> > the nightmare of working out a hex number isn't something for users.
> 
> I did some digging, too, just now. Looks like the options are indeed
> not documented in the man page but ethtool uses the "forward compatible"
> scheme with strings coming from the kernel. So this:
> 
>   ethtool -s enp0s13f0u1u1 advertise Pause on Asym_Pause on
> 
> works just fine, with no changes in CLI.
> 
> We should probably document that it works in the ethtool help and man
> page. And possibly add some synthetic options like Receive-Only /
> Transmit-Only so that users don't have to be aware of the encoding
> details? Let me know if it's impractical, otherwise I think we'll
> agree that having ethtool that makes it obvious how to achieve the
> desired configuration beats the best long-form docs in the kernel.

1. Reject vs Accept autoneg=1

I audited set_pauseparam implementations across the tree. We are seeing two
valid but distinct models here, driven by different hardware realities:

- Strict Hardware Model (Jakub's point): Mostly Enterprise/Server NICs (bnx2x,
  bnxt, i40e, ice, cxgb4). These devices often reject advertisement changes
  if Link AN is off. They enforce a strict dependency for correctness.

- User Intent Model (Russell's point): Mostly embedded, older drivers, and
  phylink users (e1000, igb, fec, mvneta, stmmac). These drivers handle the
  state in software, accepting the config as a "wish" for when Link AN becomes
  active.

Plan for v9: Since this is not a discussion about which model will win, but
rather documentation of the current reality, the text will support both
realities. I will document "User Intent" (Accepting configuration) as the
recommended behavior for flexible hardware to keep administrative state
separate from operational state. However, I will explicitly note that drivers
MAY enforce a strict dependency if their hardware/firmware model requires it,
so users are aware that behavior varies.

2. Deprecating pauseparam in favor of ethtool -s ... advertise

Jakub suggested deprecating set_pauseparam for autoneg in favor of ethtool -s
... advertise.

I agree with the technical merit: ethtool -s ... advertise is cleaner for
negotiation because it targets the Advertiser (PHY/Autoneg logic) directly. It
maps 1:1 to the hardware capability and avoids ambiguity.

However, ethtool -s cannot replace set_pauseparam entirely because it cannot
handle Forced Mode (Manual MAC override). We would still need a separate
interface for that.

Therefore, I prefer to keep ethtool -A (Pause UAPI) as the unified Link-wide
PAUSE Abstraction. It shields the user from knowing whether the underlying
hardware is using an Advertiser (Resolution Mode) or a Manual Override (Forced
Mode).

Proposed Text: Documentation/networking/flow_control.rst

Kernel Policy: User Intent & Resolution
=======================================

The ethtool pause API ('ethtool -A' or '--pause') configures the **User
Intent** for **Link-wide PAUSE** (IEEE 802.3 Annex 31B). The
**Operational State** (what actually happens on the wire) is derived
from this intent, the active link mode, and the link partner.

**Disambiguation: Pause Autoneg vs. Link Autoneg**
In this section, "autonegotiation" refers exclusively to the **Pause
Autonegotiation** parameter ('ethtool -A / --pause ... autoneg <on|off>').
This is distinct from, but interacts with, **Generic Link
Autonegotiation** ('ethtool -s / --change ... autoneg <on|off>').

The semantics of the Pause API depend on the 'autoneg' parameter:

1. **Resolution Mode** ('ethtool -A ... autoneg on')
   The user intends for the device to **respect the negotiated result**.

   - **Hardware Capability Check:** The driver must verify that the hardware
     is capable of Autonegotiation. If the hardware is fixed-link or
     lacks AN logic entirely, this request must be rejected (``-EOPNOTSUPP``).
   - **Advertisement:** The system updates the PHY advertisement
     (Symmetric/Asymmetric pause bits) to match the ``rx`` and ``tx`` parameters.
   - **Resolution:** The system configures the MAC to follow the standard
     IEEE 802.3 Resolution Truth Table based on the Local Advertisement
     vs. Link Partner Advertisement.
   - **Interaction with Link Autoneg:** If Generic Link Autonegotiation is
     currently disabled, resolution cannot occur. The Operational State
     effectively becomes **Disabled**.
     
     **Note on Implementation Variation:** Provided the hardware supports AN
     in principle, the system **SHOULD** accept this configuration as a valid
     stored intent for when Link Autonegotiation is re-enabled. However,
     legacy or strict-hardware drivers **MAY** reject this request if Link
     Autonegotiation is disabled, enforcing a strict dependency.

2. **Forced Mode** ('ethtool -A ... autoneg off')
   The user intends to **override negotiation** and force a specific
   state.

   - **Hardware Capability Check:** The driver must verify that the hardware
     supports forced manual configuration. If the hardware is tightly coupled
     to AN logic and cannot be forced, this request must be rejected.
   - **Advertisement:** The system should update the PHY advertisement
     to match the ``rx`` and ``tx`` parameters, ensuring the link partner
     is aware of the forced configuration.
   - **Resolution:** The system configures the MAC according to the
     specified ``rx`` and ``tx`` parameters, ignoring the link partner's
     advertisement.
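
Illustration only (not part of the proposed text): the Advertisement and
Resolution steps of both modes can be sketched in standalone C. The
struct and function names are hypothetical. The rx/tx encoding shown is
the one I believe phylib-style drivers commonly use, and the resolution
follows the IEEE 802.3 pause truth table as I understand it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two advertised pause bits. */
struct pause_adv {
	bool pause;      /* symmetric PAUSE bit */
	bool asym_pause; /* asymmetric direction (ASM_DIR) bit */
};

/* Encode the rx/tx intent from 'ethtool -A' into advertisement bits. */
struct pause_adv pause_adv_from_intent(bool tx, bool rx)
{
	struct pause_adv adv = {
		.pause = rx,
		.asym_pause = (rx != tx),
	};

	return adv;
}

/*
 * Resolution Mode: derive the operational tx/rx state from the local
 * and link partner advertisements.
 */
void pause_resolve(struct pause_adv local, struct pause_adv partner,
		   bool *tx_pause, bool *rx_pause)
{
	assert(tx_pause && rx_pause);

	*tx_pause = false;
	*rx_pause = false;

	if (local.pause && partner.pause) {
		/* Both sides advertise symmetric pause. */
		*tx_pause = true;
		*rx_pause = true;
	} else if (local.asym_pause && partner.asym_pause) {
		/* Asymmetric: only the side advertising PAUSE receives. */
		*tx_pause = partner.pause;
		*rx_pause = local.pause;
	}
}
```

One consequence of this encoding: an "rx only" intent advertises both
bits, so against a partner that advertises only symmetric PAUSE the
result resolves to pause in both directions. In Forced Mode the same
encoding would only feed the advertisement, while the MAC takes rx/tx
directly.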

**Global Constraint: Full-Duplex Only**
Link-wide PAUSE (Annex 31B) is strictly defined for **Full-Duplex** links.
If the link mode is **Half-Duplex** (whether forced or negotiated),
Link-wide PAUSE is operationally **disabled** regardless of the
parameters set above.

**Summary of "autoneg" Flag Meaning:**
- true  -> **Delegate decision:** "Use the IEEE 802.3 logic to decide."
- false -> **Force decision:** "Do exactly what I say (if the network device
  supports it)."
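
Illustration only (not part of the proposed text): a minimal sketch of
how the Operational State could be derived from the rules above. All
names are hypothetical, and the negotiated result is taken as an input
rather than computed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tx/rx pair. */
struct pause_state {
	bool tx;
	bool rx;
};

/* Hypothetical inputs to the derivation. */
struct pause_inputs {
	bool pause_autoneg;          /* 'ethtool -A ... autoneg' */
	bool link_autoneg;           /* 'ethtool -s ... autoneg' */
	bool full_duplex;            /* current link duplex */
	struct pause_state intent;   /* rx/tx from 'ethtool -A' */
	struct pause_state resolved; /* result of pause resolution */
};

/* Derive what actually happens on the wire. */
struct pause_state pause_operational(const struct pause_inputs *in)
{
	const struct pause_state off = { false, false };

	assert(in);

	/* Global constraint: Annex 31B is full-duplex only. */
	if (!in->full_duplex)
		return off;

	/* Forced Mode: apply the user's rx/tx as given. */
	if (!in->pause_autoneg)
		return in->intent;

	/* Resolution Mode without Link AN: nothing to resolve. */
	if (!in->link_autoneg)
		return off;

	return in->resolved;
}
```

The stored intent is never modified here, which is the point of keeping
administrative and operational state separate.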

Proposed Text: include/linux/ethtool.h

/**
 * @get_pauseparam: Report the configured administrative policy for
 *   link-wide PAUSE (IEEE 802.3 Annex 31B). Drivers must fill struct
 *   ethtool_pauseparam such that:
 * @autoneg:
 *   This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only.
 *   true  -> the device follows the result of pause autonegotiation
 *     (Pause/Asym) when the link allows it;
 *   false -> the device uses a forced configuration.
 * @rx_pause/@tx_pause:
 *   Represent the desired policy (Administrative State).
 *   In autoneg mode they describe what is to be advertised;
 *   in forced mode they describe the MAC configuration to be forced.
 *
 * @set_pauseparam: Apply a policy for link-wide PAUSE (IEEE 802.3 Annex 31B).
 * @rx_pause/@tx_pause:
 *   Desired state. If @autoneg is true, these define the
 *   advertisement. If @autoneg is false, these define the
 *   forced MAC configuration (and preferably the advertisement too).
 * @autoneg:
 *   Select Resolution Mode (true) or Forced Mode (false).
 *
 * **Constraint Checking:**
 *   Drivers MUST validate that the hardware capabilities support the
 *   requested mode.
 * - If the hardware does not support Autonegotiation (e.g. fixed link),
 *   drivers MUST reject @autoneg=1 with -EOPNOTSUPP.
 * - If the hardware does not support Forced configuration (e.g. strict AN),
 *   drivers MUST reject @autoneg=0 with -EOPNOTSUPP.
 *
 * Provided the hardware capability exists, drivers SHOULD accept a setting
 * of @autoneg=1 even if generic link autonegotiation ('ethtool -s') is
 * currently disabled. This allows the user to pre-configure the desired
 * policy for future link modes. Users should be aware that some drivers
 * may strictly enforce the dependency and reject this configuration.
 *
 * New drivers are strongly encouraged to use phylink_ethtool_get_pauseparam()
 * and phylink_ethtool_set_pauseparam() which implement this logic
 * correctly.
 */
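
Illustration only (not part of the proposed kernel-doc): the constraint
checking could look roughly like the userspace mock below. The structs
and the function are hypothetical and are not the kernel's
struct ethtool_pauseparam or any driver callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hardware capability flags. */
struct pause_hw_caps {
	bool supports_autoneg; /* false for e.g. fixed link */
	bool supports_forced;  /* false for e.g. strict AN hardware */
};

/* Hypothetical stored administrative state. */
struct pause_admin {
	bool autoneg;
	bool rx_pause;
	bool tx_pause;
};

/* Validate the request against the capabilities, then store it. */
int pause_set_checked(const struct pause_hw_caps *caps,
		      struct pause_admin *admin,
		      bool autoneg, bool rx_pause, bool tx_pause)
{
	assert(caps && admin);

	if (autoneg && !caps->supports_autoneg)
		return -EOPNOTSUPP;
	if (!autoneg && !caps->supports_forced)
		return -EOPNOTSUPP;

	/*
	 * The current state of generic link autoneg is deliberately not
	 * consulted: the request is stored as intent (the SHOULD case).
	 */
	admin->autoneg = autoneg;
	admin->rx_pause = rx_pause;
	admin->tx_pause = tx_pause;

	return 0;
}
```

A strict-hardware driver would add one more check on the link autoneg
state before storing, which is the MAY case described above.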

Best Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


Thread overview: 21+ messages
2025-11-19 14:03 [PATCH net-next v8 1/1] Documentation: net: add flow control guide and document ethtool API Oleksij Rempel
2025-11-26  2:19 ` Jakub Kicinski
2025-11-26  8:36   ` Oleksij Rempel
2025-11-26 22:42     ` Jakub Kicinski
2025-11-27  9:20       ` Oleksij Rempel
2025-11-27 15:07         ` Andrew Lunn
2025-11-27 15:31           ` Maxime Chevallier
2025-11-27 15:48             ` Andrew Lunn
2025-11-27 16:18               ` Russell King (Oracle)
2025-11-27 16:14           ` Russell King (Oracle)
2025-11-28  1:21             ` Jakub Kicinski
2025-11-27 16:10         ` Russell King (Oracle)
2025-11-28  1:27     ` Russell King (Oracle)
2025-11-28  1:47       ` Russell King (Oracle)
2025-11-28  8:55         ` Oleksij Rempel
2025-11-28  9:35           ` Russell King (Oracle)
2025-11-28 18:32           ` Jakub Kicinski
2025-11-28 20:16             ` Andrew Lunn
2025-11-28 20:38               ` Russell King (Oracle)
2025-11-28 22:17                 ` Jakub Kicinski
2025-12-01  9:49                   ` Oleksij Rempel
