* [PATCH net 0/2] aqc111: Thermal throttling feature
@ 2018-12-12 13:50 Igor Russkikh
2018-12-12 13:50 ` [PATCH net 1/2] net: usb: aqc111: Add read_mdio operation Igor Russkikh
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Igor Russkikh @ 2018-12-12 13:50 UTC (permalink / raw)
To: linux-usb@vger.kernel.org, davem@davemloft.net,
netdev@vger.kernel.org
Cc: Dmitry Bezrukov, Igor Russkikh
This patches introduce the thermal throttling feature to prevent possible
heat damage to the hardware.
Dmitry Bezrukov (2):
net: usb: aqc111: Add read_mdio operation
net: usb: aqc111: Support for thermal throttling feature
drivers/net/usb/aqc111.c | 83 ++++++++++++++++++++++++++++++++++++++++
drivers/net/usb/aqc111.h | 19 +++++++++
2 files changed, 102 insertions(+)
--
2.17.1
^ permalink raw reply [flat|nested] 16+ messages in thread* [PATCH net 1/2] net: usb: aqc111: Add read_mdio operation 2018-12-12 13:50 [PATCH net 0/2] aqc111: Thermal throttling feature Igor Russkikh @ 2018-12-12 13:50 ` Igor Russkikh 2018-12-12 16:11 ` Andrew Lunn 2018-12-12 13:50 ` [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature Igor Russkikh ` (2 subsequent siblings) 3 siblings, 1 reply; 16+ messages in thread From: Igor Russkikh @ 2018-12-12 13:50 UTC (permalink / raw) To: linux-usb@vger.kernel.org, davem@davemloft.net, netdev@vger.kernel.org Cc: Dmitry Bezrukov, Igor Russkikh From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Add necessary defines to read phy registers and temparature Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> --- drivers/net/usb/aqc111.c | 5 +++++ drivers/net/usb/aqc111.h | 11 +++++++++++ 2 files changed, 16 insertions(+) diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c index 57f1c94fca0b..8867f6a3eaa7 100644 --- a/drivers/net/usb/aqc111.c +++ b/drivers/net/usb/aqc111.c @@ -193,6 +193,11 @@ static int aqc111_write16_cmd_async(struct usbnet *dev, u8 cmd, u16 value, sizeof(tmp), &tmp); } +static int aqc111_mdio_read(struct usbnet *dev, u16 value, u16 index, u16 *data) +{ + return aqc111_read16_cmd(dev, AQ_PHY_CMD, value, index, data); +} + static void aqc111_get_drvinfo(struct net_device *net, struct ethtool_drvinfo *info) { diff --git a/drivers/net/usb/aqc111.h b/drivers/net/usb/aqc111.h index 4d68b3a6067c..359663635b49 100644 --- a/drivers/net/usb/aqc111.h +++ b/drivers/net/usb/aqc111.h @@ -18,9 +18,20 @@ #define AQ_ACCESS_MAC 0x01 #define AQ_FLASH_PARAMETERS 0x20 #define AQ_PHY_POWER 0x31 +#define AQ_PHY_CMD 0x32 #define AQ_WOL_CFG 0x60 #define AQ_PHY_OPS 0x61 +#define AQC111_PHY_ID 0x00 +#define AQ_PHY_ADDR(mmd) ((AQC111_PHY_ID << 8) | (mmd)) + +#define AQ_PHY_GLOBAL_MMD 0x1E +#define AQ_PHY_GLOBAL_ADDR AQ_PHY_ADDR(AQ_PHY_GLOBAL_MMD) + +#define AQ_GLB_THERMAL_STATUS1 0xC820 +#define AQ_GLB_THERMAL_STATUS2 0xC821 + #define AQ_THERM_ST_READY 0x0001 + #define AQ_USB_PHY_SET_TIMEOUT 10000 #define AQ_USB_SET_TIMEOUT 4000 -- 2.17.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH net 1/2] net: usb: aqc111: Add read_mdio operation 2018-12-12 13:50 ` [PATCH net 1/2] net: usb: aqc111: Add read_mdio operation Igor Russkikh @ 2018-12-12 16:11 ` Andrew Lunn 2018-12-12 19:38 ` Igor Russkikh 0 siblings, 1 reply; 16+ messages in thread From: Andrew Lunn @ 2018-12-12 16:11 UTC (permalink / raw) To: Igor Russkikh Cc: linux-usb@vger.kernel.org, davem@davemloft.net, netdev@vger.kernel.org, Dmitry Bezrukov On Wed, Dec 12, 2018 at 01:50:08PM +0000, Igor Russkikh wrote: > From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> > > Add necessary defines to read phy registers and temparature Hi Igor [Puts tongue in cheek] I thought the firmware was supposed to manage the PHY. So maybe the better fix is to add code to allow firmware upgrade? And issue new firmware to linux-firmware? Andrew ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 1/2] net: usb: aqc111: Add read_mdio operation 2018-12-12 16:11 ` Andrew Lunn @ 2018-12-12 19:38 ` Igor Russkikh 0 siblings, 0 replies; 16+ messages in thread From: Igor Russkikh @ 2018-12-12 19:38 UTC (permalink / raw) To: Andrew Lunn Cc: linux-usb@vger.kernel.org, davem@davemloft.net, netdev@vger.kernel.org, Dmitry Bezrukov >> >> Add necessary defines to read phy registers and temparature > > Hi Igor > > [Puts tongue in cheek] > > I thought the firmware was supposed to manage the PHY. FW team due to various reasons do not want to have thermal throttling in their control, Thus at this moment we are trying to release the product which will not burn out the table under it ;-) > So maybe the better fix is to add code to allow firmware upgrade? And > issue new firmware to linux-firmware? I would say thats our long term future. On your request for linux-firmware, I've pushed the question to Phy team, will inform you on any news. Regards, Igor ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature 2018-12-12 13:50 [PATCH net 0/2] aqc111: Thermal throttling feature Igor Russkikh 2018-12-12 13:50 ` [PATCH net 1/2] net: usb: aqc111: Add read_mdio operation Igor Russkikh @ 2018-12-12 13:50 ` Igor Russkikh 2018-12-12 16:08 ` Andrew Lunn 2018-12-12 13:54 ` [PATCH net 0/2] aqc111: Thermal " Igor Russkikh 2018-12-13 0:18 ` David Miller 3 siblings, 1 reply; 16+ messages in thread From: Igor Russkikh @ 2018-12-12 13:50 UTC (permalink / raw) To: linux-usb@vger.kernel.org, davem@davemloft.net, netdev@vger.kernel.org Cc: Dmitry Bezrukov, Igor Russkikh From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Lab testing shows that chip may get overheated when network link is connected on high speed (2.5G/5G). Default hardware design uses only passive heatsink without a cooler, and that makes things worse. To prevent possible chip damage here we add throttling mechanism. There is a worker that monitors the PHY's temperature via reading PHY registers. When PHY's temperature reaches the critical threshold (it is 106 degrees of Celsius) it changes the link speed to the lowest regardless user/default link speed settings. It should reduce the PHY's temperature to normal numbers. When the PHY's temparature is back to normal (low threshold, it is 85 degrees) it restores user/default link speed settings. Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> --- drivers/net/usb/aqc111.c | 78 ++++++++++++++++++++++++++++++++++++++++ drivers/net/usb/aqc111.h | 8 +++++ 2 files changed, 86 insertions(+) diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c index 8867f6a3eaa7..fa6b9dfc2a6f 100644 --- a/drivers/net/usb/aqc111.c +++ b/drivers/net/usb/aqc111.c @@ -17,6 +17,7 @@ #include <linux/usb/cdc.h> #include <linux/usb/usbnet.h> #include <linux/linkmode.h> +#include <linux/workqueue.h> #include "aqc111.h" @@ -334,6 +335,9 @@ static void aqc111_set_phy_speed(struct usbnet *dev, u8 autoneg, u16 speed) aqc111_data->phy_cfg |= (3 << AQ_DSH_RETRIES_SHIFT) & AQ_DSH_RETRIES_MASK; + if (aqc111_data->thermal_throttling) + speed = SPEED_100; + if (autoneg == AUTONEG_ENABLE) { switch (speed) { case SPEED_5000: @@ -714,6 +718,8 @@ static int aqc111_bind(struct usbnet *dev, struct usb_interface *intf) /* store aqc111_data pointer in device data field */ dev->driver_priv = aqc111_data; + aqc111_data->dev = dev; + /* Init the MAC address */ ret = aqc111_read_perm_mac(dev); if (ret) @@ -991,6 +997,71 @@ static int aqc111_link_reset(struct usbnet *dev) return 0; } +int aqc111_get_temperature(struct usbnet *dev, u32 *temperature) +{ + u16 reg16 = 0; + + if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS2, AQ_PHY_GLOBAL_ADDR, + ®16) < 0) + goto err; + + if (!(reg16 & AQ_THERM_ST_READY)) + goto err; + + if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS1, AQ_PHY_GLOBAL_ADDR, + ®16) < 0) + goto err; + + /*convert from 1/256 to 1/100 degrees of Celsius*/ + *temperature = (u32)reg16 * 100 / 256; + return 0; + +err: + *temperature = 0; + return -1; +} + +void aqc111_thermal_work_cb(struct work_struct *w) +{ + struct delayed_work *dw = to_delayed_work(w); + struct aqc111_data *aqc111_data = container_of(dw, struct aqc111_data, + therm_work); + unsigned long timeout = msecs_to_jiffies(AQ_THERMAL_TIMER_MS); + struct usbnet *dev = aqc111_data->dev; + u32 temperature = 0; + u8 reset_speed = 0; + + if (!aqc111_data->link) + /* poll not so frequently */ + timeout *= 2; + + if (aqc111_get_temperature(dev, &temperature) != 0) + goto end; + + if (aqc111_data->thermal_throttling && + temperature <= AQ_NORMAL_TEMP_THRESHOLD) { + netdev_info(dev->net, "The temperature is back to normal(%u)", + AQ_NORMAL_TEMP_THRESHOLD / 100); + aqc111_data->thermal_throttling = 0; + reset_speed = 1; + } + + if (!aqc111_data->thermal_throttling && + temperature >= AQ_CRITICAL_TEMP_THRESHOLD) { + netdev_warn(dev->net, "Critical temperature(%u) is reached", + AQ_CRITICAL_TEMP_THRESHOLD / 100); + aqc111_data->thermal_throttling = 1; + reset_speed = 1; + } + + if (reset_speed) + aqc111_set_phy_speed(dev, aqc111_data->autoneg, + aqc111_data->advertised_speed); + +end: + schedule_delayed_work(&aqc111_data->therm_work, timeout); +} + static int aqc111_reset(struct usbnet *dev) { struct aqc111_data *aqc111_data = dev->driver_priv; @@ -1032,6 +1103,10 @@ static int aqc111_reset(struct usbnet *dev) aqc111_set_phy_speed(dev, aqc111_data->autoneg, aqc111_data->advertised_speed); + INIT_DELAYED_WORK(&aqc111_data->therm_work, &aqc111_thermal_work_cb); + schedule_delayed_work(&aqc111_data->therm_work, + msecs_to_jiffies(AQ_THERMAL_TIMER_MS)); + return 0; } @@ -1040,6 +1115,9 @@ static int aqc111_stop(struct usbnet *dev) struct aqc111_data *aqc111_data = dev->driver_priv; u16 reg16 = 0; + cancel_delayed_work_sync(&aqc111_data->therm_work); + aqc111_data->thermal_throttling = 0; + aqc111_read16_cmd(dev, AQ_ACCESS_MAC, SFR_MEDIUM_STATUS_MODE, 2, ®16); reg16 &= ~SFR_MEDIUM_RECEIVE_EN; diff --git a/drivers/net/usb/aqc111.h b/drivers/net/usb/aqc111.h index 359663635b49..7b50e3cd217b 100644 --- a/drivers/net/usb/aqc111.h +++ b/drivers/net/usb/aqc111.h @@ -35,6 +35,11 @@ #define AQ_USB_PHY_SET_TIMEOUT 10000 #define AQ_USB_SET_TIMEOUT 4000 +#define AQ_THERMAL_TIMER_MS 500 +/* Temperature thresholds in units 1/100 degree of Celsius */ +#define AQ_NORMAL_TEMP_THRESHOLD 8500 +#define AQ_CRITICAL_TEMP_THRESHOLD 10600 + /* Feature. ********************************************/ #define AQ_SUPPORT_FEATURE (NETIF_F_SG | NETIF_F_IP_CSUM |\ NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM |\ @@ -183,6 +188,9 @@ struct aqc111_data { } fw_ver; u32 phy_cfg; u8 wol_flags; + struct delayed_work therm_work; + u8 thermal_throttling; + struct usbnet *dev; }; #define AQ_LS_MASK 0x8000 -- 2.17.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature 2018-12-12 13:50 ` [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature Igor Russkikh @ 2018-12-12 16:08 ` Andrew Lunn 2018-12-12 19:34 ` Igor Russkikh 0 siblings, 1 reply; 16+ messages in thread From: Andrew Lunn @ 2018-12-12 16:08 UTC (permalink / raw) To: Igor Russkikh Cc: linux-usb@vger.kernel.org, davem@davemloft.net, netdev@vger.kernel.org, Dmitry Bezrukov On Wed, Dec 12, 2018 at 01:50:10PM +0000, Igor Russkikh wrote: > From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> > > Lab testing shows that chip may get overheated when > network link is connected on high speed (2.5G/5G). > > Default hardware design uses only passive heatsink without a cooler, > and that makes things worse. > > To prevent possible chip damage here we add throttling mechanism. > > There is a worker that monitors the PHY's temperature via reading > PHY registers. When PHY's temperature reaches the critical threshold > (it is 106 degrees of Celsius) it changes the link speed to the lowest > regardless user/default link speed settings. It should reduce the PHY's > temperature to normal numbers. > > When the PHY's temparature is back to normal (low threshold, it is > 85 degrees) it restores user/default link speed settings. Hi Igor Please could you also export the temperature using HWMON. The Marvell PHY drivers are good examples. I also have a patch for driver/net/phy/aquantia.c which adds HWMON support to that PHY. > > Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> > Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> > --- > drivers/net/usb/aqc111.c | 78 ++++++++++++++++++++++++++++++++++++++++ > drivers/net/usb/aqc111.h | 8 +++++ > 2 files changed, 86 insertions(+) > > diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c > index 8867f6a3eaa7..fa6b9dfc2a6f 100644 > --- a/drivers/net/usb/aqc111.c > +++ b/drivers/net/usb/aqc111.c > @@ -17,6 +17,7 @@ > #include <linux/usb/cdc.h> > #include <linux/usb/usbnet.h> > #include <linux/linkmode.h> > +#include <linux/workqueue.h> > > #include "aqc111.h" > > @@ -334,6 +335,9 @@ static void aqc111_set_phy_speed(struct usbnet *dev, u8 autoneg, u16 speed) > aqc111_data->phy_cfg |= (3 << AQ_DSH_RETRIES_SHIFT) & > AQ_DSH_RETRIES_MASK; > > + if (aqc111_data->thermal_throttling) > + speed = SPEED_100; > + That is a big jump. Do you need to cool is down quickly, or would 1Gbps be sufficient? I think as a user, i would prefer it ping-pongs between 5G and 1G, not 5G and 100M. > if (autoneg == AUTONEG_ENABLE) { > switch (speed) { > case SPEED_5000: It looks like this only works when auto-neg is enabled. If i've fixed configured it i don't think this works? > @@ -714,6 +718,8 @@ static int aqc111_bind(struct usbnet *dev, struct usb_interface *intf) > /* store aqc111_data pointer in device data field */ > dev->driver_priv = aqc111_data; > > + aqc111_data->dev = dev; > + > /* Init the MAC address */ > ret = aqc111_read_perm_mac(dev); > if (ret) > @@ -991,6 +997,71 @@ static int aqc111_link_reset(struct usbnet *dev) > return 0; > } > > +int aqc111_get_temperature(struct usbnet *dev, u32 *temperature) > +{ > + u16 reg16 = 0; > + > + if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS2, AQ_PHY_GLOBAL_ADDR, > + ®16) < 0) > + goto err; > + > + if (!(reg16 & AQ_THERM_ST_READY)) > + goto err; > + > + if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS1, AQ_PHY_GLOBAL_ADDR, > + ®16) < 0) > + goto err; > + > + /*convert from 1/256 to 1/100 degrees of Celsius*/ > + *temperature = (u32)reg16 * 100 / 256; > + return 0; > + > +err: > + *temperature = 0; > + return -1; > +} > + > +void aqc111_thermal_work_cb(struct work_struct *w) > +{ > + struct delayed_work *dw = to_delayed_work(w); > + struct aqc111_data *aqc111_data = container_of(dw, struct aqc111_data, > + therm_work); > + unsigned long timeout = msecs_to_jiffies(AQ_THERMAL_TIMER_MS); > + struct usbnet *dev = aqc111_data->dev; > + u32 temperature = 0; > + u8 reset_speed = 0; > + > + if (!aqc111_data->link) > + /* poll not so frequently */ > + timeout *= 2; > + > + if (aqc111_get_temperature(dev, &temperature) != 0) > + goto end; > + > + if (aqc111_data->thermal_throttling && > + temperature <= AQ_NORMAL_TEMP_THRESHOLD) { > + netdev_info(dev->net, "The temperature is back to normal(%u)", > + AQ_NORMAL_TEMP_THRESHOLD / 100); How often do you see these messages? I'm wondering if they need to be rate limited? > + aqc111_data->thermal_throttling = 0; > + reset_speed = 1; > + } > + > + if (!aqc111_data->thermal_throttling && > + temperature >= AQ_CRITICAL_TEMP_THRESHOLD) { Should there be some hysteresis in here? In fact, if temperature is AQ_CRITICAL_TEMP_THRESHOLD it is both back to normal, and throttled at the same time! > + netdev_warn(dev->net, "Critical temperature(%u) is reached", > + AQ_CRITICAL_TEMP_THRESHOLD / 100); > + aqc111_data->thermal_throttling = 1; > + reset_speed = 1; update_speed might be a better name, since you are not always resetting it. Andrew ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature 2018-12-12 16:08 ` Andrew Lunn @ 2018-12-12 19:34 ` Igor Russkikh 2018-12-12 20:28 ` Andrew Lunn 0 siblings, 1 reply; 16+ messages in thread From: Igor Russkikh @ 2018-12-12 19:34 UTC (permalink / raw) To: Andrew Lunn Cc: linux-usb@vger.kernel.org, davem@davemloft.net, netdev@vger.kernel.org, Dmitry Bezrukov Hi Andrew, >> When the PHY's temparature is back to normal (low threshold, it is >> 85 degrees) it restores user/default link speed settings. > > Hi Igor > > Please could you also export the temperature using HWMON. The Marvell > PHY drivers are good examples. > > I also have a patch for driver/net/phy/aquantia.c which adds HWMON > support to that PHY. We actually have almost ready patches to submit hwmon support separately for both AQC USB and AQC PCI MAC drivers. Will do that in near time. >> + if (aqc111_data->thermal_throttling) >> + speed = SPEED_100; >> + > > That is a big jump. Do you need to cool is down quickly, or would > 1Gbps be sufficient? I think as a user, i would prefer it ping-pongs > between 5G and 1G, not 5G and 100M. In case overheat happens that really means something bad is happening outside, or heatsink is bad/detached. I'll consult our HW team once again, but 100M was the original request from our phy/hw team. It seems it really much less on power and 1G may not be enough. > >> if (autoneg == AUTONEG_ENABLE) { >> switch (speed) { >> case SPEED_5000: > > It looks like this only works when auto-neg is enabled. If i've fixed > configured it i don't think this works? Looks you are right, will check this. >> + if (aqc111_data->thermal_throttling && >> + temperature <= AQ_NORMAL_TEMP_THRESHOLD) { >> + netdev_info(dev->net, "The temperature is back to normal(%u)", >> + AQ_NORMAL_TEMP_THRESHOLD / 100); > > How often do you see these messages? I'm wondering if they need to be > rate limited? In worst case that will be at least limited by link down/up internal, which in case of 5G normally takes 3-5secs. >> + if (!aqc111_data->thermal_throttling && >> + temperature >= AQ_CRITICAL_TEMP_THRESHOLD) { > > Should there be some hysteresis in here? In fact, if temperature is > AQ_CRITICAL_TEMP_THRESHOLD it is both back to normal, and throttled at > the same time! NORMAL_TEMP and CRITICAL_TEMP values are actually a hysteresis. In cool down case only after TEMP_NORMAL temperature link will be flipped back again. > >> + netdev_warn(dev->net, "Critical temperature(%u) is reached", >> + AQ_CRITICAL_TEMP_THRESHOLD / 100); >> + aqc111_data->thermal_throttling = 1; >> + reset_speed = 1; > > update_speed might be a better name, since you are not always > resetting it. Agreed. Regards, Igor ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature 2018-12-12 19:34 ` Igor Russkikh @ 2018-12-12 20:28 ` Andrew Lunn 0 siblings, 0 replies; 16+ messages in thread From: Andrew Lunn @ 2018-12-12 20:28 UTC (permalink / raw) To: Igor Russkikh Cc: linux-usb@vger.kernel.org, davem@davemloft.net, netdev@vger.kernel.org, Dmitry Bezrukov > NORMAL_TEMP and CRITICAL_TEMP values are actually a hysteresis. In > cool down case only after TEMP_NORMAL temperature link will be > flipped back again. Ah, yes, sorry, i read that wrong. Andrew ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 0/2] aqc111: Thermal throttling feature 2018-12-12 13:50 [PATCH net 0/2] aqc111: Thermal throttling feature Igor Russkikh 2018-12-12 13:50 ` [PATCH net 1/2] net: usb: aqc111: Add read_mdio operation Igor Russkikh 2018-12-12 13:50 ` [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature Igor Russkikh @ 2018-12-12 13:54 ` Igor Russkikh 2018-12-12 18:15 ` Florian Fainelli 2018-12-13 0:18 ` David Miller 3 siblings, 1 reply; 16+ messages in thread From: Igor Russkikh @ 2018-12-12 13:54 UTC (permalink / raw) To: davem@davemloft.net, netdev@vger.kernel.org; +Cc: Dmitry Bezrukov On 12.12.2018 16:50, Igor Russkikh wrote: > This patches introduce the thermal throttling feature to prevent possible > heat damage to the hardware. > > Dmitry Bezrukov (2): > net: usb: aqc111: Add read_mdio operation > net: usb: aqc111: Support for thermal throttling feature > > drivers/net/usb/aqc111.c | 83 ++++++++++++++++++++++++++++++++++++++++ > drivers/net/usb/aqc111.h | 19 +++++++++ > 2 files changed, 102 insertions(+) > Hi David, all, This is of course designated for the "net-next" tree. I'll resubmit this after review. Regards, Igor ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 0/2] aqc111: Thermal throttling feature 2018-12-12 13:54 ` [PATCH net 0/2] aqc111: Thermal " Igor Russkikh @ 2018-12-12 18:15 ` Florian Fainelli 2018-12-12 20:08 ` Igor Russkikh 0 siblings, 1 reply; 16+ messages in thread From: Florian Fainelli @ 2018-12-12 18:15 UTC (permalink / raw) To: Igor Russkikh, davem@davemloft.net, netdev@vger.kernel.org Cc: Dmitry Bezrukov, andrew On 12/12/18 5:54 AM, Igor Russkikh wrote: > > > On 12.12.2018 16:50, Igor Russkikh wrote: >> This patches introduce the thermal throttling feature to prevent possible >> heat damage to the hardware. >> >> Dmitry Bezrukov (2): >> net: usb: aqc111: Add read_mdio operation >> net: usb: aqc111: Support for thermal throttling feature >> >> drivers/net/usb/aqc111.c | 83 ++++++++++++++++++++++++++++++++++++++++ >> drivers/net/usb/aqc111.h | 19 +++++++++ >> 2 files changed, 102 insertions(+) >> > > Hi David, all, > > This is of course designated for the "net-next" tree. > > I'll resubmit this after review. The idea of having the PHY/network device as a cooling agent is something valuable, but as Andrew pointed out, you need to expose this as a standard HWMON device, and you need to let user-space implement the appropriate thermal policy, not do that in the network driver underneath the user's feet with no feedback other than link dropped, got re-negotiated at a different speed. How would one be able to differentiate those events from a faulty link partner for instance? None of what you are doing here is specific to your device driver and the policy of downgrading the link speed to lower the thermal budget is something that is nearly universally applicable to all network equipments because higher speeds just require higher power. -- Florian ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 0/2] aqc111: Thermal throttling feature 2018-12-12 18:15 ` Florian Fainelli @ 2018-12-12 20:08 ` Igor Russkikh 2018-12-12 20:28 ` Florian Fainelli 2018-12-12 20:38 ` Andrew Lunn 0 siblings, 2 replies; 16+ messages in thread From: Igor Russkikh @ 2018-12-12 20:08 UTC (permalink / raw) To: Florian Fainelli, davem@davemloft.net, netdev@vger.kernel.org Cc: Dmitry Bezrukov, andrew@lunn.ch > > The idea of having the PHY/network device as a cooling agent is > something valuable, but as Andrew pointed out, you need to expose this > as a standard HWMON device, and you need to let user-space implement the > appropriate thermal policy, not do that in the network driver underneath > the user's feet with no feedback other than link dropped, got > re-negotiated at a different speed. How would one be able to > differentiate those events from a faulty link partner for instance? > > None of what you are doing here is specific to your device driver and > the policy of downgrading the link speed to lower the thermal budget is > something that is nearly universally applicable to all network > equipments because higher speeds just require higher power. > Hi Florian, Partially agreed with you, but as far as I know there is no much of ready to use infrastructure for this to use right now? IMHO that could be a both-way solution, where short term driver patch will secure against hardware burn out right now, and long term hwmon based infrastructure could handle that on userspace level. A whole separate concern is how much userspace should be involved here. It could be a very device specific (and therefore driver specific) logic on how to do device's thermal control. Regards, Igor ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 0/2] aqc111: Thermal throttling feature 2018-12-12 20:08 ` Igor Russkikh @ 2018-12-12 20:28 ` Florian Fainelli 2018-12-12 20:38 ` Andrew Lunn 1 sibling, 0 replies; 16+ messages in thread From: Florian Fainelli @ 2018-12-12 20:28 UTC (permalink / raw) To: Igor Russkikh, davem@davemloft.net, netdev@vger.kernel.org Cc: Dmitry Bezrukov, andrew@lunn.ch On 12/12/18 12:08 PM, Igor Russkikh wrote: > >> >> The idea of having the PHY/network device as a cooling agent is >> something valuable, but as Andrew pointed out, you need to expose this >> as a standard HWMON device, and you need to let user-space implement the >> appropriate thermal policy, not do that in the network driver underneath >> the user's feet with no feedback other than link dropped, got >> re-negotiated at a different speed. How would one be able to >> differentiate those events from a faulty link partner for instance? > >> >> None of what you are doing here is specific to your device driver and >> the policy of downgrading the link speed to lower the thermal budget is >> something that is nearly universally applicable to all network >> equipments because higher speeds just require higher power. >> > > Hi Florian, > Partially agreed with you, but as far as I know there is no much of > ready to use infrastructure for this to use right now? If you use programs like thermald, I am quite positive you could script and action which involves re-negotiation of the link at a lower speed and that would be something applicable to a variety of network devices. > > IMHO that could be a both-way solution, where short term driver patch > will secure against hardware burn out right now, and long term hwmon > based infrastructure could handle that on userspace level. The short term and most effective solution would be to have the firmware running on the device do the thermal throttling, that way, if the host CPU is crashed/unresponsive, you can still take corrective actions. Your response to Andrew seems to suggest this is not possible, so if we are reaching the critical junction temperature of your chip and that in turn, causes the enclosure to melt down, then clearly the runaway solution is not good. > > A whole separate concern is how much userspace should be involved here. > It could be a very device specific (and therefore driver specific) logic > on how to do device's thermal control. My problem with your approach is people doing the same thing to each and every one of their driver and building policy, as opposed to mechanisms in the kernel. If the argument is "user space may not be running a thermal solution", then clearly you need a hardware driven (or firmware driven) approach) which works across all possible use cases, including those where appropriate SW is not there. If you look at how your desktop PC likely manages the fans in the chassis, they can be SW controlled, or ACPI controlled, for the same reasons. -- Florian ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 0/2] aqc111: Thermal throttling feature 2018-12-12 20:08 ` Igor Russkikh 2018-12-12 20:28 ` Florian Fainelli @ 2018-12-12 20:38 ` Andrew Lunn 2018-12-13 0:43 ` David Miller 1 sibling, 1 reply; 16+ messages in thread From: Andrew Lunn @ 2018-12-12 20:38 UTC (permalink / raw) To: Igor Russkikh Cc: Florian Fainelli, davem@davemloft.net, netdev@vger.kernel.org, Dmitry Bezrukov > A whole separate concern is how much userspace should be involved here. > It could be a very device specific (and therefore driver specific) logic > on how to do device's thermal control. Hi Igor Well, if you fully expose the PHY to Linux using a PHY driver, it would not be device specific at all. The PHY layer knows how to ask the PHY to drop to lower speeds. The Marvell PHYs with their temperature sensors could also use this core code. You are running into trouble because you want to both to hide the PHY from Linux, but also have Linux control the PHY to avoid it melting. This is why i actually think you should be doing this in firmware. Andrew ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 0/2] aqc111: Thermal throttling feature 2018-12-12 20:38 ` Andrew Lunn @ 2018-12-13 0:43 ` David Miller 2018-12-14 11:43 ` Igor Russkikh 0 siblings, 1 reply; 16+ messages in thread From: David Miller @ 2018-12-13 0:43 UTC (permalink / raw) To: andrew; +Cc: Igor.Russkikh, f.fainelli, netdev, Dmitry.Bezrukov From: Andrew Lunn <andrew@lunn.ch> Date: Wed, 12 Dec 2018 21:38:46 +0100 > You are running into trouble because you want to both to hide the PHY > from Linux, but also have Linux control the PHY to avoid it melting. > This is why i actually think you should be doing this in firmware. I completely agree with Andrew. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 0/2] aqc111: Thermal throttling feature 2018-12-13 0:43 ` David Miller @ 2018-12-14 11:43 ` Igor Russkikh 0 siblings, 0 replies; 16+ messages in thread From: Igor Russkikh @ 2018-12-14 11:43 UTC (permalink / raw) To: David Miller, andrew@lunn.ch Cc: f.fainelli@gmail.com, netdev@vger.kernel.org, Dmitry Bezrukov On 13.12.2018 3:43, David Miller wrote: > From: Andrew Lunn <andrew@lunn.ch> > Date: Wed, 12 Dec 2018 21:38:46 +0100 > >> You are running into trouble because you want to both to hide the PHY >> from Linux, but also have Linux control the PHY to avoid it melting. >> This is why i actually think you should be doing this in firmware. > > I completely agree with Andrew. > Hi David, all, Thank you all for your feedback. I basically agree with your arguments - at this stage it's really a better idea to put such a logic into HW/FW. I've passed on that info to our FW team. We are still relatively safe because I've got an info the FW already has a thermal safety trigger. Now it just powers off the chip to eliminate the risk of burnout. In addition to that they've agreed to integrate the same hysteresis speeddown logic into FW. I will resubmit the patch, but without the throttling logic, just hwmon temperature sensor interface. Regards, Igor ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 0/2] aqc111: Thermal throttling feature 2018-12-12 13:50 [PATCH net 0/2] aqc111: Thermal throttling feature Igor Russkikh ` (2 preceding siblings ...) 2018-12-12 13:54 ` [PATCH net 0/2] aqc111: Thermal " Igor Russkikh @ 2018-12-13 0:18 ` David Miller 3 siblings, 0 replies; 16+ messages in thread From: David Miller @ 2018-12-13 0:18 UTC (permalink / raw) To: Igor.Russkikh; +Cc: linux-usb, netdev, Dmitry.Bezrukov From: Igor Russkikh <Igor.Russkikh@aquantia.com> Date: Wed, 12 Dec 2018 13:50:06 +0000 > This patches introduce the thermal throttling feature to prevent possible > heat damage to the hardware. I see what seems to be a bit of a conflict here, maybe you can explain the situation better to me. Andrew suggested that the firmware manage the thermal aspects of the PHY since it manages all other aspects of the PHY too. And then the feedback was that the firmware folks don't want to do that right now. But then it was also stated that the long term goal is to support what Andrew asked for, firmware update in the driver and updated firmwares submitted to linux-firmware. If the firwmare will eventually have support for thermal management added, then the code in this series is going to be not used and just taking up space. Please explain how all of this is going to fit together, and how we'll not end up with having to keep this thermal code around forever. Thanks. ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2018-12-14 11:43 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-12-12 13:50 [PATCH net 0/2] aqc111: Thermal throttling feature Igor Russkikh 2018-12-12 13:50 ` [PATCH net 1/2] net: usb: aqc111: Add read_mdio operation Igor Russkikh 2018-12-12 16:11 ` Andrew Lunn 2018-12-12 19:38 ` Igor Russkikh 2018-12-12 13:50 ` [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature Igor Russkikh 2018-12-12 16:08 ` Andrew Lunn 2018-12-12 19:34 ` Igor Russkikh 2018-12-12 20:28 ` Andrew Lunn 2018-12-12 13:54 ` [PATCH net 0/2] aqc111: Thermal " Igor Russkikh 2018-12-12 18:15 ` Florian Fainelli 2018-12-12 20:08 ` Igor Russkikh 2018-12-12 20:28 ` Florian Fainelli 2018-12-12 20:38 ` Andrew Lunn 2018-12-13 0:43 ` David Miller 2018-12-14 11:43 ` Igor Russkikh 2018-12-13 0:18 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).