From: Andrew Lunn <andrew@lunn.ch>
To: Igor Russkikh <Igor.Russkikh@aquantia.com>
Cc: "linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
"davem@davemloft.net" <davem@davemloft.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Dmitry Bezrukov <Dmitry.Bezrukov@aquantia.com>
Subject: [net,2/2] net: usb: aqc111: Support for thermal throttling feature
Date: Wed, 12 Dec 2018 17:08:11 +0100 [thread overview]
Message-ID: <20181212160811.GF1549@lunn.ch> (raw)
On Wed, Dec 12, 2018 at 01:50:10PM +0000, Igor Russkikh wrote:
> From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
>
> Lab testing shows that chip may get overheated when
> network link is connected on high speed (2.5G/5G).
>
> Default hardware design uses only passive heatsink without a cooler,
> and that makes things worse.
>
> To prevent possible chip damage here we add throttling mechanism.
>
> There is a worker that monitors the PHY's temperature via reading
> PHY registers. When PHY's temperature reaches the critical threshold
> (it is 106 degrees of Celsius) it changes the link speed to the lowest
> regardless user/default link speed settings. It should reduce the PHY's
> temperature to normal numbers.
>
> When the PHY's temparature is back to normal (low threshold, it is
> 85 degrees) it restores user/default link speed settings.
Hi Igor
Please could you also export the temperature using HWMON. The Marvell
PHY drivers are good examples.
I also have a patch for driver/net/phy/aquantia.c which adds HWMON
support to that PHY.
>
> Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
> ---
> drivers/net/usb/aqc111.c | 78 ++++++++++++++++++++++++++++++++++++++++
> drivers/net/usb/aqc111.h | 8 +++++
> 2 files changed, 86 insertions(+)
>
> diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c
> index 8867f6a3eaa7..fa6b9dfc2a6f 100644
> --- a/drivers/net/usb/aqc111.c
> +++ b/drivers/net/usb/aqc111.c
> @@ -17,6 +17,7 @@
> #include <linux/usb/cdc.h>
> #include <linux/usb/usbnet.h>
> #include <linux/linkmode.h>
> +#include <linux/workqueue.h>
>
> #include "aqc111.h"
>
> @@ -334,6 +335,9 @@ static void aqc111_set_phy_speed(struct usbnet *dev, u8 autoneg, u16 speed)
> aqc111_data->phy_cfg |= (3 << AQ_DSH_RETRIES_SHIFT) &
> AQ_DSH_RETRIES_MASK;
>
> + if (aqc111_data->thermal_throttling)
> + speed = SPEED_100;
> +
That is a big jump. Do you need to cool is down quickly, or would
1Gbps be sufficient? I think as a user, i would prefer it ping-pongs
between 5G and 1G, not 5G and 100M.
> if (autoneg == AUTONEG_ENABLE) {
> switch (speed) {
> case SPEED_5000:
It looks like this only works when auto-neg is enabled. If i've fixed
configured it i don't think this works?
> @@ -714,6 +718,8 @@ static int aqc111_bind(struct usbnet *dev, struct usb_interface *intf)
> /* store aqc111_data pointer in device data field */
> dev->driver_priv = aqc111_data;
>
> + aqc111_data->dev = dev;
> +
> /* Init the MAC address */
> ret = aqc111_read_perm_mac(dev);
> if (ret)
> @@ -991,6 +997,71 @@ static int aqc111_link_reset(struct usbnet *dev)
> return 0;
> }
>
> +int aqc111_get_temperature(struct usbnet *dev, u32 *temperature)
> +{
> + u16 reg16 = 0;
> +
> + if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS2, AQ_PHY_GLOBAL_ADDR,
> + ®16) < 0)
> + goto err;
> +
> + if (!(reg16 & AQ_THERM_ST_READY))
> + goto err;
> +
> + if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS1, AQ_PHY_GLOBAL_ADDR,
> + ®16) < 0)
> + goto err;
> +
> + /*convert from 1/256 to 1/100 degrees of Celsius*/
> + *temperature = (u32)reg16 * 100 / 256;
> + return 0;
> +
> +err:
> + *temperature = 0;
> + return -1;
> +}
> +
> +void aqc111_thermal_work_cb(struct work_struct *w)
> +{
> + struct delayed_work *dw = to_delayed_work(w);
> + struct aqc111_data *aqc111_data = container_of(dw, struct aqc111_data,
> + therm_work);
> + unsigned long timeout = msecs_to_jiffies(AQ_THERMAL_TIMER_MS);
> + struct usbnet *dev = aqc111_data->dev;
> + u32 temperature = 0;
> + u8 reset_speed = 0;
> +
> + if (!aqc111_data->link)
> + /* poll not so frequently */
> + timeout *= 2;
> +
> + if (aqc111_get_temperature(dev, &temperature) != 0)
> + goto end;
> +
> + if (aqc111_data->thermal_throttling &&
> + temperature <= AQ_NORMAL_TEMP_THRESHOLD) {
> + netdev_info(dev->net, "The temperature is back to normal(%u)",
> + AQ_NORMAL_TEMP_THRESHOLD / 100);
How often do you see these messages? I'm wondering if they need to be
rate limited?
> + aqc111_data->thermal_throttling = 0;
> + reset_speed = 1;
> + }
> +
> + if (!aqc111_data->thermal_throttling &&
> + temperature >= AQ_CRITICAL_TEMP_THRESHOLD) {
Should there be some hysteresis in here? In fact, if temperature is
AQ_CRITICAL_TEMP_THRESHOLD it is both back to normal, and throttled at
the same time!
> + netdev_warn(dev->net, "Critical temperature(%u) is reached",
> + AQ_CRITICAL_TEMP_THRESHOLD / 100);
> + aqc111_data->thermal_throttling = 1;
> + reset_speed = 1;
update_speed might be a better name, since you are not always
resetting it.
Andrew
WARNING: multiple messages have this Message-ID (diff)
From: Andrew Lunn <andrew@lunn.ch>
To: Igor Russkikh <Igor.Russkikh@aquantia.com>
Cc: "linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
"davem@davemloft.net" <davem@davemloft.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Dmitry Bezrukov <Dmitry.Bezrukov@aquantia.com>
Subject: Re: [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature
Date: Wed, 12 Dec 2018 17:08:11 +0100 [thread overview]
Message-ID: <20181212160811.GF1549@lunn.ch> (raw)
In-Reply-To: <658f5fab466face2aed2d53dabfe3af16595019a.1544622414.git.igor.russkikh@aquantia.com>
On Wed, Dec 12, 2018 at 01:50:10PM +0000, Igor Russkikh wrote:
> From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
>
> Lab testing shows that chip may get overheated when
> network link is connected on high speed (2.5G/5G).
>
> Default hardware design uses only passive heatsink without a cooler,
> and that makes things worse.
>
> To prevent possible chip damage here we add throttling mechanism.
>
> There is a worker that monitors the PHY's temperature via reading
> PHY registers. When PHY's temperature reaches the critical threshold
> (it is 106 degrees of Celsius) it changes the link speed to the lowest
> regardless user/default link speed settings. It should reduce the PHY's
> temperature to normal numbers.
>
> When the PHY's temparature is back to normal (low threshold, it is
> 85 degrees) it restores user/default link speed settings.
Hi Igor
Please could you also export the temperature using HWMON. The Marvell
PHY drivers are good examples.
I also have a patch for driver/net/phy/aquantia.c which adds HWMON
support to that PHY.
>
> Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
> ---
> drivers/net/usb/aqc111.c | 78 ++++++++++++++++++++++++++++++++++++++++
> drivers/net/usb/aqc111.h | 8 +++++
> 2 files changed, 86 insertions(+)
>
> diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c
> index 8867f6a3eaa7..fa6b9dfc2a6f 100644
> --- a/drivers/net/usb/aqc111.c
> +++ b/drivers/net/usb/aqc111.c
> @@ -17,6 +17,7 @@
> #include <linux/usb/cdc.h>
> #include <linux/usb/usbnet.h>
> #include <linux/linkmode.h>
> +#include <linux/workqueue.h>
>
> #include "aqc111.h"
>
> @@ -334,6 +335,9 @@ static void aqc111_set_phy_speed(struct usbnet *dev, u8 autoneg, u16 speed)
> aqc111_data->phy_cfg |= (3 << AQ_DSH_RETRIES_SHIFT) &
> AQ_DSH_RETRIES_MASK;
>
> + if (aqc111_data->thermal_throttling)
> + speed = SPEED_100;
> +
That is a big jump. Do you need to cool is down quickly, or would
1Gbps be sufficient? I think as a user, i would prefer it ping-pongs
between 5G and 1G, not 5G and 100M.
> if (autoneg == AUTONEG_ENABLE) {
> switch (speed) {
> case SPEED_5000:
It looks like this only works when auto-neg is enabled. If i've fixed
configured it i don't think this works?
> @@ -714,6 +718,8 @@ static int aqc111_bind(struct usbnet *dev, struct usb_interface *intf)
> /* store aqc111_data pointer in device data field */
> dev->driver_priv = aqc111_data;
>
> + aqc111_data->dev = dev;
> +
> /* Init the MAC address */
> ret = aqc111_read_perm_mac(dev);
> if (ret)
> @@ -991,6 +997,71 @@ static int aqc111_link_reset(struct usbnet *dev)
> return 0;
> }
>
> +int aqc111_get_temperature(struct usbnet *dev, u32 *temperature)
> +{
> + u16 reg16 = 0;
> +
> + if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS2, AQ_PHY_GLOBAL_ADDR,
> + ®16) < 0)
> + goto err;
> +
> + if (!(reg16 & AQ_THERM_ST_READY))
> + goto err;
> +
> + if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS1, AQ_PHY_GLOBAL_ADDR,
> + ®16) < 0)
> + goto err;
> +
> + /*convert from 1/256 to 1/100 degrees of Celsius*/
> + *temperature = (u32)reg16 * 100 / 256;
> + return 0;
> +
> +err:
> + *temperature = 0;
> + return -1;
> +}
> +
> +void aqc111_thermal_work_cb(struct work_struct *w)
> +{
> + struct delayed_work *dw = to_delayed_work(w);
> + struct aqc111_data *aqc111_data = container_of(dw, struct aqc111_data,
> + therm_work);
> + unsigned long timeout = msecs_to_jiffies(AQ_THERMAL_TIMER_MS);
> + struct usbnet *dev = aqc111_data->dev;
> + u32 temperature = 0;
> + u8 reset_speed = 0;
> +
> + if (!aqc111_data->link)
> + /* poll not so frequently */
> + timeout *= 2;
> +
> + if (aqc111_get_temperature(dev, &temperature) != 0)
> + goto end;
> +
> + if (aqc111_data->thermal_throttling &&
> + temperature <= AQ_NORMAL_TEMP_THRESHOLD) {
> + netdev_info(dev->net, "The temperature is back to normal(%u)",
> + AQ_NORMAL_TEMP_THRESHOLD / 100);
How often do you see these messages? I'm wondering if they need to be
rate limited?
> + aqc111_data->thermal_throttling = 0;
> + reset_speed = 1;
> + }
> +
> + if (!aqc111_data->thermal_throttling &&
> + temperature >= AQ_CRITICAL_TEMP_THRESHOLD) {
Should there be some hysteresis in here? In fact, if temperature is
AQ_CRITICAL_TEMP_THRESHOLD it is both back to normal, and throttled at
the same time!
> + netdev_warn(dev->net, "Critical temperature(%u) is reached",
> + AQ_CRITICAL_TEMP_THRESHOLD / 100);
> + aqc111_data->thermal_throttling = 1;
> + reset_speed = 1;
update_speed might be a better name, since you are not always
resetting it.
Andrew
next reply other threads:[~2018-12-12 16:08 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-12 16:08 Andrew Lunn [this message]
2018-12-12 16:08 ` [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature Andrew Lunn
-- strict thread matches above, loose matches on Subject: below --
2018-12-12 20:28 [net,2/2] " Andrew Lunn
2018-12-12 20:28 ` [PATCH net 2/2] " Andrew Lunn
2018-12-12 19:38 [net,1/2] net: usb: aqc111: Add read_mdio operation Igor Russkikh
2018-12-12 19:38 ` [PATCH net 1/2] " Igor Russkikh
2018-12-12 19:34 [net,2/2] net: usb: aqc111: Support for thermal throttling feature Igor Russkikh
2018-12-12 19:34 ` [PATCH net 2/2] " Igor Russkikh
2018-12-12 16:11 [net,1/2] net: usb: aqc111: Add read_mdio operation Andrew Lunn
2018-12-12 16:11 ` [PATCH net 1/2] " Andrew Lunn
2018-12-12 13:50 [net,2/2] net: usb: aqc111: Support for thermal throttling feature Igor Russkikh
2018-12-12 13:50 ` [PATCH net 2/2] " Igor Russkikh
2018-12-12 13:50 [net,1/2] net: usb: aqc111: Add read_mdio operation Igor Russkikh
2018-12-12 13:50 ` [PATCH net 1/2] " Igor Russkikh
2018-12-12 13:50 [PATCH net 0/2] aqc111: Thermal throttling feature Igor Russkikh
2018-12-12 13:54 ` Igor Russkikh
2018-12-12 18:15 ` Florian Fainelli
2018-12-12 20:08 ` Igor Russkikh
2018-12-12 20:28 ` Florian Fainelli
2018-12-12 20:38 ` Andrew Lunn
2018-12-13 0:43 ` David Miller
2018-12-14 11:43 ` Igor Russkikh
2018-12-13 0:18 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181212160811.GF1549@lunn.ch \
--to=andrew@lunn.ch \
--cc=Dmitry.Bezrukov@aquantia.com \
--cc=Igor.Russkikh@aquantia.com \
--cc=davem@davemloft.net \
--cc=linux-usb@vger.kernel.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.