linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/11] net: phy: Add support for rate adaptation
@ 2022-07-19 23:49 Sean Anderson
  2022-07-19 23:49 ` [PATCH v2 05/11] " Sean Anderson
  2022-07-20 12:03 ` [PATCH v2 00/11] " Russell King (Oracle)
  0 siblings, 2 replies; 4+ messages in thread
From: Sean Anderson @ 2022-07-19 23:49 UTC (permalink / raw)
  To: netdev, Andrew Lunn, Heiner Kallweit, Russell King
  Cc: Alexandru Marginean, Paolo Abeni, David S . Miller, linux-kernel,
	Vladimir Oltean, Eric Dumazet, Jakub Kicinski, Sean Anderson,
	Bhadram Varka, Jonathan Corbet, Madalin Bucur, linux-doc

This adds support for phy rate adaptation: when a phy adapts between
differing phy interface and link speeds. It was originally submitted as
part of [1], which is considered "v1" of this series.

We need support for rate adaptation for two reasons. First, the phy
consumer needs to know if the phy will perform rate adaptation in order to
program the correct advertising. An unaware consumer will only program
support for link modes at the phy interface mode's native speed. This will
cause autonegotiation to fail if the link partner only advertises support
for lower speed link modes.

Second, to reduce packet loss it may be desirable to throttle packet
throughput. In past discussions [2-4], this behavior has been
controversial. It is the opinion of several developers that it is the
responsibility of the system integrator or end user to set the link
settings appropriately for rate adaptation. In particular, it was argued
that it is difficult to determine whether a particular phy has rate
adaptation enabled, and it is simpler to keep such determinations out of
the kernel. Another criticism is that packet loss may happen anyway, such
as if a faster link is used with a switch or repeater that does not support
pause frames.

I believe that our current approach is limiting, especially when
considering that rate adaptation (in two forms) has made it into IEEE
standards. In general, When we have appropriate information we should set
sensible defaults. To consider use a contrasting example, we enable pause
frames by default for switches which autonegotiate for them. When it's the
phy itself generating these frames, we don't even have to autonegotiate to
know that we should enable pause frames.

Our current approach also encourages workarounds, such as commit
73a21fa817f0 ("dpaa_eth: support all modes with rate adapting PHYs").
These workarounds are fine for phylib drivers, but phylink drivers cannot
use this approach (since there is no direct access to the phy). Note that
even when we determine (e.g.) the pause settings based on whether rate
adaptation is enabled, they can still be overridden by userspace (using
ethtool). It might be prudent to allow disabling of rate adaptation
generally in ethtool as well.

[1] https://lore.kernel.org/netdev/20220715215954.1449214-1-sean.anderson@seco.com/T/#t
[2] https://lore.kernel.org/netdev/1579701573-6609-1-git-send-email-madalin.bucur@oss.nxp.com/
[3] https://lore.kernel.org/netdev/1580137671-22081-1-git-send-email-madalin.bucur@oss.nxp.com/
[4] https://lore.kernel.org/netdev/20200116181933.32765-1-olteanv@gmail.com/

Changes in v2:
- Use int/defines instead of enum to allow for use in ioctls/netlink
- Add locking to phy_get_rate_adaptation
- Add (read-only) ethtool support for rate adaptation
- Move part of commit message to cover letter, as it gives a good
  overview of the whole series, and allows this patch to focus more on
  the specifics.
- Support keeping track of link duplex
- Rewrite commit message for clarity
- Expand documentation of (link_)?(speed|duplex)
- Fix handling of speed/duplex gotten from MAC drivers
- Use the phy's rate adaptation setting to determine whether to use its
  link speed/duplex or the MAC's speed/duplex with MLO_AN_INBAND.
- Always use the rate adaptation setting to determine the interface
  speed/duplex (instead of sometimes using the interface mode).
- Determine the interface speed and max mac speed directly instead of
  guessing based on the caps.
- Add comments clarifying the register defines
- Reorder variables in aqr107_read_rate

Sean Anderson (11):
  net: dpaa: Fix <1G ethernet on LS1046ARDB
  net: phy: Add 1000BASE-KX interface mode
  net: phylink: Export phylink_caps_to_linkmodes
  net: phylink: Generate caps and convert to linkmodes separately
  net: phy: Add support for rate adaptation
  net: phylink: Support differing link/interface speed/duplex
  net: phylink: Adjust link settings based on rate adaptation
  net: phylink: Adjust advertisement based on rate adaptation
  [RFC] net: phylink: Add support for CRS-based rate adaptation
  net: phy: aquantia: Add some additional phy interfaces
  net: phy: aquantia: Add support for rate adaptation

 Documentation/networking/ethtool-netlink.rst  |   2 +
 .../net/ethernet/freescale/dpaa/dpaa_eth.c    |   6 +-
 drivers/net/phy/aquantia_main.c               |  68 +++-
 drivers/net/phy/phy-core.c                    |  15 +
 drivers/net/phy/phy.c                         |  28 ++
 drivers/net/phy/phylink.c                     | 340 +++++++++++++++---
 include/linux/phy.h                           |  26 +-
 include/linux/phylink.h                       |  23 +-
 include/uapi/linux/ethtool.h                  |  18 +-
 include/uapi/linux/ethtool_netlink.h          |   1 +
 net/ethtool/ioctl.c                           |   1 +
 net/ethtool/linkmodes.c                       |   5 +
 12 files changed, 470 insertions(+), 63 deletions(-)

-- 
2.35.1.1320.gc452695387.dirty


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2 05/11] net: phy: Add support for rate adaptation
  2022-07-19 23:49 [PATCH v2 00/11] net: phy: Add support for rate adaptation Sean Anderson
@ 2022-07-19 23:49 ` Sean Anderson
  2022-07-20 12:03 ` [PATCH v2 00/11] " Russell King (Oracle)
  1 sibling, 0 replies; 4+ messages in thread
From: Sean Anderson @ 2022-07-19 23:49 UTC (permalink / raw)
  To: netdev, Andrew Lunn, Heiner Kallweit, Russell King
  Cc: Alexandru Marginean, Paolo Abeni, David S . Miller, linux-kernel,
	Vladimir Oltean, Eric Dumazet, Jakub Kicinski, Sean Anderson,
	Jonathan Corbet, linux-doc

This adds support for rate adaptation to the phy subsystem. The general
idea is that the phy interface runs at one speed, and the MAC throttles
the rate at which it sends packets to the link speed. There's a good
overview of several techniques for achieving this at [1]. This patch
adds support for three: pause-frame based (such as in Aquantia phys),
CRS-based (such as in 10PASS-TS and 2BASE-TL), and open-loop-based (such
as in 10GBASE-W).

This patch makes a few assumptions and a few non assumptions about the
types of rate adaptation available. First, it assumes that different phys
may use different forms of rate adaptation. Second, it assumes that phys
can use rate adaptation for any of their supported link speeds (e.g. if a
phy supports 10BASE-T and XGMII, then it can adapt XGMII to 10BASE-T).
Third, it does not assume that all interface modes will use the same form
of rate adaptation. Fourth, it does not assume that all phy devices will
support rate adaptation (even some do). Relaxing or strengthening these
(non-)assumptions could result in a different API. For example, if all
interface modes were assumed to use the same form of rate adaptation, then
a bitmask of interface modes supportting rate adaptation would suffice.

For some better visibility into the process, the current rate adaptation
mode is exposed as part of the ethtool ksettings. For the moment, only
read access is supported. I'm not sure what userspace might want to
configure yet (disable it altogether, disable just one mode, specify the
mode to use, etc.). For the moment, since only pause-based rate
adaptation support is added in the next few commits, rate adaptation can
be disabled altogether by adjusting the advertisement.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
---
Should the unimplemented adaptation modes be kept in?

Changes in v2:
- Use int/defines instead of enum to allow for use in ioctls/netlink
- Add locking to phy_get_rate_adaptation
- Add (read-only) ethtool support for rate adaptation
- Move part of commit message to cover letter, as it gives a good
  overview of the whole series, and allows this patch to focus more on
  the specifics.

 Documentation/networking/ethtool-netlink.rst |  2 ++
 drivers/net/phy/phy-core.c                   | 15 +++++++++++
 drivers/net/phy/phy.c                        | 28 ++++++++++++++++++++
 include/linux/phy.h                          | 22 ++++++++++++++-
 include/uapi/linux/ethtool.h                 | 18 +++++++++++--
 include/uapi/linux/ethtool_netlink.h         |  1 +
 net/ethtool/ioctl.c                          |  1 +
 net/ethtool/linkmodes.c                      |  5 ++++
 8 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index dbca3e9ec782..65ed29e78499 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -426,6 +426,7 @@ Kernel response contents:
   ``ETHTOOL_A_LINKMODES_DUPLEX``              u8      duplex mode
   ``ETHTOOL_A_LINKMODES_MASTER_SLAVE_CFG``    u8      Master/slave port mode
   ``ETHTOOL_A_LINKMODES_MASTER_SLAVE_STATE``  u8      Master/slave port state
+  ``ETHTOOL_A_LINKMODES_RATE_ADAPTATION``     u8      PHY rate adaptation
   ==========================================  ======  ==========================
 
 For ``ETHTOOL_A_LINKMODES_OURS``, value represents advertised modes and mask
@@ -449,6 +450,7 @@ Request contents:
   ``ETHTOOL_A_LINKMODES_SPEED``               u32     link speed (Mb/s)
   ``ETHTOOL_A_LINKMODES_DUPLEX``              u8      duplex mode
   ``ETHTOOL_A_LINKMODES_MASTER_SLAVE_CFG``    u8      Master/slave port mode
+  ``ETHTOOL_A_LINKMODES_RATE_ADAPTATION``     u8      PHY rate adaptation
   ``ETHTOOL_A_LINKMODES_LANES``               u32     lanes
   ==========================================  ======  ==========================
 
diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index 1f2531a1a876..dc70a9088544 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -74,6 +74,21 @@ const char *phy_duplex_to_str(unsigned int duplex)
 }
 EXPORT_SYMBOL_GPL(phy_duplex_to_str);
 
+const char *phy_rate_adaptation_to_str(int rate_adaptation)
+{
+	switch (rate_adaptation) {
+	case RATE_ADAPT_NONE:
+		return "none";
+	case RATE_ADAPT_PAUSE:
+		return "pause";
+	case RATE_ADAPT_CRS:
+		return "crs";
+	case RATE_ADAPT_OPEN_LOOP:
+		return "open-loop";
+	}
+	return "Unsupported (update phy-core.c)";
+}
+
 /* A mapping of all SUPPORTED settings to speed/duplex.  This table
  * must be grouped by speed and sorted in descending match priority
  * - iow, descending speed.
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 8d3ee3a6495b..77cbf07852e6 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -114,6 +114,33 @@ void phy_print_status(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(phy_print_status);
 
+/**
+ * phy_get_rate_adaptation - determine if rate adaptation is supported
+ * @phydev: The phy device to return rate adaptation for
+ * @iface: The interface mode to use
+ *
+ * This determines the type of rate adaptation (if any) that @phy supports
+ * using @iface. @iface may be %PHY_INTERFACE_MODE_NA to determine if any
+ * interface supports rate adaptation.
+ *
+ * Return: The type of rate adaptation @phy supports for @iface, or
+ *         %RATE_ADAPT_NONE.
+ */
+int phy_get_rate_adaptation(struct phy_device *phydev,
+			    phy_interface_t iface)
+{
+	int ret = RATE_ADAPT_NONE;
+
+	if (phydev->drv->get_rate_adaptation) {
+		mutex_lock(&phydev->lock);
+		ret = phydev->drv->get_rate_adaptation(phydev, iface);
+		mutex_unlock(&phydev->lock);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(phy_get_rate_adaptation);
+
 /**
  * phy_config_interrupt - configure the PHY device for the requested interrupts
  * @phydev: the phy_device struct
@@ -256,6 +283,7 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev,
 	cmd->base.duplex = phydev->duplex;
 	cmd->base.master_slave_cfg = phydev->master_slave_get;
 	cmd->base.master_slave_state = phydev->master_slave_state;
+	cmd->base.rate_adaptation = phydev->rate_adaptation;
 	if (phydev->interface == PHY_INTERFACE_MODE_MOCA)
 		cmd->base.port = PORT_BNC;
 	else
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 81ce76c3e799..4ba8126b64f3 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -276,7 +276,6 @@ static inline const char *phy_modes(phy_interface_t interface)
 	}
 }
 
-
 #define PHY_INIT_TIMEOUT	100000
 #define PHY_FORCE_TIMEOUT	10
 
@@ -570,6 +569,7 @@ struct macsec_ops;
  * @lp_advertising: Current link partner advertised linkmodes
  * @eee_broken_modes: Energy efficient ethernet modes which should be prohibited
  * @autoneg: Flag autoneg being used
+ * @rate_adaptation: Current rate adaptation mode
  * @link: Current link state
  * @autoneg_complete: Flag auto negotiation of the link has completed
  * @mdix: Current crossover
@@ -637,6 +637,8 @@ struct phy_device {
 	unsigned irq_suspended:1;
 	unsigned irq_rerun:1;
 
+	int rate_adaptation;
+
 	enum phy_state state;
 
 	u32 dev_flags;
@@ -801,6 +803,21 @@ struct phy_driver {
 	 */
 	int (*get_features)(struct phy_device *phydev);
 
+	/**
+	 * @get_rate_adaptation: Get the supported type of rate adaptation for a
+	 * particular phy interface. This is used by phy consumers to determine
+	 * whether to advertise lower-speed modes for that interface. It is
+	 * assumed that if a rate adaptation mode is supported on an interface,
+	 * then that interface's rate can be adapted to all slower link speeds
+	 * supported by the phy. If iface is %PHY_INTERFACE_MODE_NA, and the phy
+	 * supports any kind of rate adaptation for any interface, then it must
+	 * return that rate adaptation mode (preferring %RATE_ADAPT_PAUSE, to
+	 * %RATE_ADAPT_CRS). If the interface is not supported, this should
+	 * return %RATE_ADAPT_NONE.
+	 */
+	int (*get_rate_adaptation)(struct phy_device *phydev,
+				   phy_interface_t iface);
+
 	/* PHY Power Management */
 	/** @suspend: Suspend the hardware, saving state if needed */
 	int (*suspend)(struct phy_device *phydev);
@@ -967,6 +984,7 @@ struct phy_fixup {
 
 const char *phy_speed_to_str(int speed);
 const char *phy_duplex_to_str(unsigned int duplex);
+const char *phy_rate_adaptation_to_str(int rate_adaptation);
 
 /* A structure for mapping a particular speed and duplex
  * combination to a particular SUPPORTED and ADVERTISED value
@@ -1681,6 +1699,8 @@ int phy_disable_interrupts(struct phy_device *phydev);
 void phy_request_interrupt(struct phy_device *phydev);
 void phy_free_interrupt(struct phy_device *phydev);
 void phy_print_status(struct phy_device *phydev);
+int phy_get_rate_adaptation(struct phy_device *phydev,
+			    phy_interface_t iface);
 void phy_set_max_speed(struct phy_device *phydev, u32 max_speed);
 void phy_remove_link_mode(struct phy_device *phydev, u32 link_mode);
 void phy_advertise_supported(struct phy_device *phydev);
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index e0f0ee9bc89e..3978f9c3fb83 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1840,6 +1840,20 @@ static inline int ethtool_validate_duplex(__u8 duplex)
 #define MASTER_SLAVE_STATE_SLAVE		3
 #define MASTER_SLAVE_STATE_ERR			4
 
+/* These are used to throttle the rate of data on the phy interface when the
+ * native speed of the interface is higher than the link speed. These should
+ * not be used for phy interfaces which natively support multiple speeds (e.g.
+ * MII or SGMII).
+ */
+/* No rate adaptation performed. */
+#define RATE_ADAPT_NONE		0
+/* The phy sends pause frames to throttle the MAC. */
+#define RATE_ADAPT_PAUSE	1
+/* The phy asserts CRS to prevent the MAC from transmitting. */
+#define RATE_ADAPT_CRS		2
+/* The MAC is programmed with a sufficiently-large IPG. */
+#define RATE_ADAPT_OPEN_LOOP	3
+
 /* Which connector port. */
 #define PORT_TP			0x00
 #define PORT_AUI		0x01
@@ -2033,8 +2047,8 @@ enum ethtool_reset_flags {
  *	reported consistently by PHYLIB.  Read-only.
  * @master_slave_cfg: Master/slave port mode.
  * @master_slave_state: Master/slave port state.
+ * @rate_adaptation: Rate adaptation performed by the PHY
  * @reserved: Reserved for future use; see the note on reserved space.
- * @reserved1: Reserved for future use; see the note on reserved space.
  * @link_mode_masks: Variable length bitmaps.
  *
  * If autonegotiation is disabled, the speed and @duplex represent the
@@ -2085,7 +2099,7 @@ struct ethtool_link_settings {
 	__u8	transceiver;
 	__u8	master_slave_cfg;
 	__u8	master_slave_state;
-	__u8	reserved1[1];
+	__u8	rate_adaptation;
 	__u32	reserved[7];
 	__u32	link_mode_masks[0];
 	/* layout of link_mode_masks fields:
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index d2fb4f7be61b..3a5d81769ff4 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -242,6 +242,7 @@ enum {
 	ETHTOOL_A_LINKMODES_MASTER_SLAVE_CFG,	/* u8 */
 	ETHTOOL_A_LINKMODES_MASTER_SLAVE_STATE,	/* u8 */
 	ETHTOOL_A_LINKMODES_LANES,		/* u32 */
+	ETHTOOL_A_LINKMODES_RATE_ADAPTATION,	/* u8 */
 
 	/* add new constants above here */
 	__ETHTOOL_A_LINKMODES_CNT,
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 6a7308de192d..ef0ad300393a 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -571,6 +571,7 @@ static int ethtool_get_link_ksettings(struct net_device *dev,
 		= __ETHTOOL_LINK_MODE_MASK_NU32;
 	link_ksettings.base.master_slave_cfg = MASTER_SLAVE_CFG_UNSUPPORTED;
 	link_ksettings.base.master_slave_state = MASTER_SLAVE_STATE_UNSUPPORTED;
+	link_ksettings.base.rate_adaptation = RATE_ADAPT_NONE;
 
 	return store_link_ksettings_for_user(useraddr, &link_ksettings);
 }
diff --git a/net/ethtool/linkmodes.c b/net/ethtool/linkmodes.c
index 99b29b4fe947..7905bd985c7f 100644
--- a/net/ethtool/linkmodes.c
+++ b/net/ethtool/linkmodes.c
@@ -70,6 +70,7 @@ static int linkmodes_reply_size(const struct ethnl_req_info *req_base,
 		+ nla_total_size(sizeof(u32)) /* LINKMODES_SPEED */
 		+ nla_total_size(sizeof(u32)) /* LINKMODES_LANES */
 		+ nla_total_size(sizeof(u8)) /* LINKMODES_DUPLEX */
+		+ nla_total_size(sizeof(u8)) /* LINKMODES_RATE_ADAPTATION */
 		+ 0;
 	ret = ethnl_bitset_size(ksettings->link_modes.advertising,
 				ksettings->link_modes.supported,
@@ -143,6 +144,10 @@ static int linkmodes_fill_reply(struct sk_buff *skb,
 		       lsettings->master_slave_state))
 		return -EMSGSIZE;
 
+	if (nla_put_u8(skb, ETHTOOL_A_LINKMODES_RATE_ADAPTATION,
+		       lsettings->rate_adaptation))
+		return -EMSGSIZE;
+
 	return 0;
 }
 
-- 
2.35.1.1320.gc452695387.dirty


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 00/11] net: phy: Add support for rate adaptation
  2022-07-19 23:49 [PATCH v2 00/11] net: phy: Add support for rate adaptation Sean Anderson
  2022-07-19 23:49 ` [PATCH v2 05/11] " Sean Anderson
@ 2022-07-20 12:03 ` Russell King (Oracle)
  2022-07-21 17:40   ` Sean Anderson
  1 sibling, 1 reply; 4+ messages in thread
From: Russell King (Oracle) @ 2022-07-20 12:03 UTC (permalink / raw)
  To: Sean Anderson
  Cc: netdev, Andrew Lunn, Heiner Kallweit, Alexandru Marginean,
	Paolo Abeni, David S . Miller, linux-kernel, Vladimir Oltean,
	Eric Dumazet, Jakub Kicinski, Bhadram Varka, Jonathan Corbet,
	Madalin Bucur, linux-doc

On Tue, Jul 19, 2022 at 07:49:50PM -0400, Sean Anderson wrote:
> Second, to reduce packet loss it may be desirable to throttle packet
> throughput. In past discussions [2-4], this behavior has been
> controversial.

It isn't controversial at all. It's something we need to support, but
the point I've been making is that if we're adding rate adaption, then
we need to do a better job when designing the infrastructure to cater
for all currently known forms of rate adaption amongst the knowledge
pool that we have, not just one. That's why I brought up the IPG-based
method used by 88x3310.

Phylink development is extremely difficult, and takes months or years
for changes to get into mainline when updates to drivers are required -
this is why I have a massive queue of changes all the time.

> It is the opinion of several developers that it is the
> responsibility of the system integrator or end user to set the link
> settings appropriately for rate adaptation. In particular, it was argued
> that it is difficult to determine whether a particular phy has rate
> adaptation enabled, and it is simpler to keep such determinations out of
> the kernel.

I don't think I've ever said that...

> Another criticism is that packet loss may happen anyway, such
> as if a faster link is used with a switch or repeater that does not support
> pause frames.

That isn't what I've said. Packet loss may happen if (a) pause frames
can not be sent by a PHY in rate adaption mode and (b) if the MAC can't
pace its transmission for the media/line speed. This is a fundamental
fact where a PHY will only have so much buffering capability, that if
the MAC sends packets at a faster rate than the PHY can get them out, it
runs out of buffer space. That isn't a criticism, it's a statement of
fact.

> I believe that our current approach is limiting, especially when
> considering that rate adaptation (in two forms) has made it into IEEE
> standards. In general, When we have appropriate information we should set
> sensible defaults. To consider use a contrasting example, we enable pause
> frames by default for switches which autonegotiate for them. When it's the
> phy itself generating these frames, we don't even have to autonegotiate to
> know that we should enable pause frames.

I'm not sure I understand what you're saying, because it doesn't match
what I've seen.

"we enable pause frames by default for swithes which autonegotiate for
them" - what are you talking about there? The "user" ports on the
switch, or the DSA/CPU ports? It has been argued that pause frames
should not be enabled for the CPU port, particularly when the CPU port
runs at a slower speed than the switch - which happens e.g. on the VF610
platforms.

Most CPU ports to switches I'm aware of are specified either using a
fixed link in firmware or default to a fixed link both without pause
frames. Maybe this is just a quirk of the mv88e6xxx setup.

"when it's the phy itself generating these frames, we don't even have to
autonegotiate to know that we should enable pause frames." I'm not sure
that's got any relevance. When a PHY is in rate adapting mode, there are
two separate things that are going on. There's the media side link
negotiation and parameters, and then there's the requirements of the
host-side link. The parameters of the host-side link do not need to be
negotiated with the link partner, but they do potentially affect what
link modes we can negotiate with our link partner (for example, if the
PHY can't handle HD on the media side with the MAC operating FD). In any
case, if the PHY requires the MAC to receive pause frames for its rate
adaption to work, then this doesn't affect the media side
autonegotiation at all. Hence, I don't understand this comment.

> Note that
> even when we determine (e.g.) the pause settings based on whether rate
> adaptation is enabled, they can still be overridden by userspace (using
> ethtool). It might be prudent to allow disabling of rate adaptation
> generally in ethtool as well.

This is no longer true as this patch set overrides whatever receive
pause state has been negotiated or requested by userspace so that rate
adaption can still work.

The future work here is to work out whether we should disable rate
adaption if userspace requests receive pause frames to be disabled, or
whether switching to another form of controlling rate adaption would be
appropriate and/or possible.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 00/11] net: phy: Add support for rate adaptation
  2022-07-20 12:03 ` [PATCH v2 00/11] " Russell King (Oracle)
@ 2022-07-21 17:40   ` Sean Anderson
  0 siblings, 0 replies; 4+ messages in thread
From: Sean Anderson @ 2022-07-21 17:40 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: netdev, Andrew Lunn, Heiner Kallweit, Alexandru Marginean,
	Paolo Abeni, David S . Miller, linux-kernel, Vladimir Oltean,
	Eric Dumazet, Jakub Kicinski, Bhadram Varka, Jonathan Corbet,
	Madalin Bucur, linux-doc



On 7/20/22 8:03 AM, Russell King (Oracle) wrote:
> On Tue, Jul 19, 2022 at 07:49:50PM -0400, Sean Anderson wrote:
>> Second, to reduce packet loss it may be desirable to throttle packet
>> throughput. In past discussions [2-4], this behavior has been
>> controversial.
> 
> It isn't controversial at all. It's something we need to support, but
> the point I've been making is that if we're adding rate adaption, then
> we need to do a better job when designing the infrastructure to cater
> for all currently known forms of rate adaption amongst the knowledge
> pool that we have, not just one. That's why I brought up the IPG-based
> method used by 88x3310.
> 
> Phylink development is extremely difficult, and takes months or years
> for changes to get into mainline when updates to drivers are required -
> this is why I have a massive queue of changes all the time.
> 
>> It is the opinion of several developers that it is the
>> responsibility of the system integrator or end user to set the link
>> settings appropriately for rate adaptation. In particular, it was argued
>> that it is difficult to determine whether a particular phy has rate
>> adaptation enabled, and it is simpler to keep such determinations out of
>> the kernel.
> 
> I don't think I've ever said that...

You haven't. This mostly stems from

https://lore.kernel.org/netdev/DB8PR04MB6985139D4ABED85B701445A9EC050@DB8PR04MB6985.eurprd04.prod.outlook.com/

where there was some discussion about whose responsibility it was to determine
whether rate adaptation was supported. The implication being that we should
delay support for rate adaptation until we could reliably determine whether
it was supported. This of course is not quite implemented yet. While we can
determine whether rate adaptation is actually in-use, I don't know if we can
determine whether it is available before trying to bring the link up.

>> Another criticism is that packet loss may happen anyway, such
>> as if a faster link is used with a switch or repeater that does not support
>> pause frames.
> 
> That isn't what I've said. Packet loss may happen if (a) pause frames
> can not be sent by a PHY in rate adaption mode and (b) if the MAC can't
> pace its transmission for the media/line speed. This is a fundamental
> fact where a PHY will only have so much buffering capability, that if
> the MAC sends packets at a faster rate than the PHY can get them out, it
> runs out of buffer space. That isn't a criticism, it's a statement of
> fact.

You're right. I mainly wanted to bring up what you just noted: that we may
have packet loss anyway, and that higher-layer protocols already deal with
packet loss. So a MAC unaware of the rate adaptation is not necessarily the
worst thing.

>> I believe that our current approach is limiting, especially when
>> considering that rate adaptation (in two forms) has made it into IEEE
>> standards. In general, When we have appropriate information we should set
>> sensible defaults. To consider use a contrasting example, we enable pause
>> frames by default for switches which autonegotiate for them. When it's the
>> phy itself generating these frames, we don't even have to autonegotiate to
>> know that we should enable pause frames.
> 
> I'm not sure I understand what you're saying, because it doesn't match
> what I've seen.
> 
Sorry, I was unclear here. I meant link partners, not local (DSA) switches.

> "we enable pause frames by default for swithes which autonegotiate for
> them" - what are you talking about there? The "user" ports on the
> switch, or the DSA/CPU ports? It has been argued that pause frames
> should not be enabled for the CPU port, particularly when the CPU port
> runs at a slower speed than the switch - which happens e.g. on the VF610
> platforms.
> 
> Most CPU ports to switches I'm aware of are specified either using a
> fixed link in firmware or default to a fixed link both without pause
> frames. Maybe this is just a quirk of the mv88e6xxx setup.
> 
> "when it's the phy itself generating these frames, we don't even have to
> autonegotiate to know that we should enable pause frames." I'm not sure
> that's got any relevance. When a PHY is in rate adapting mode, there are
> two separate things that are going on. There's the media side link
> negotiation and parameters, and then there's the requirements of the
> host-side link. The parameters of the host-side link do not need to be
> negotiated with the link partner, but they do potentially affect what
> link modes we can negotiate with our link partner (for example, if the
> PHY can't handle HD on the media side with the MAC operating FD). In any
> case, if the PHY requires the MAC to receive pause frames for its rate
> adaption to work, then this doesn't affect the media side
> autonegotiation at all. Hence, I don't understand this comment.
> 
>> Note that
>> even when we determine (e.g.) the pause settings based on whether rate
>> adaptation is enabled, they can still be overridden by userspace (using
>> ethtool). It might be prudent to allow disabling of rate adaptation
>> generally in ethtool as well.
> 
> This is no longer true as this patch set overrides whatever receive
> pause state has been negotiated or requested by userspace so that rate
> adaption can still work.

Right, I forgot to edit this.

> The future work here is to work out whether we should disable rate
> adaption if userspace requests receive pause frames to be disabled, or
> whether switching to another form of controlling rate adaption would be
> appropriate and/or possible.
> 

I'm not sure what the best course here is either.

--Sean

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-07-21 17:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-07-19 23:49 [PATCH v2 00/11] net: phy: Add support for rate adaptation Sean Anderson
2022-07-19 23:49 ` [PATCH v2 05/11] " Sean Anderson
2022-07-20 12:03 ` [PATCH v2 00/11] " Russell King (Oracle)
2022-07-21 17:40   ` Sean Anderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).