From: "Wysocki, Rafael J" <rafael.j.wysocki@intel.com>
To: Petr Machata <petrm@nvidia.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>, <netdev@vger.kernel.org>
Cc: Ido Schimmel <idosch@nvidia.com>, <mlxsw@nvidia.com>,
Lukasz Luba <lukasz.luba@arm.com>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
"Vadim Pasternak" <vadimp@nvidia.com>
Subject: Re: [PATCH net 2/3] mlxsw: core_thermal: Fix driver initialization failure
Date: Mon, 17 Jun 2024 21:53:59 +0200 [thread overview]
Message-ID: <d3c8f29c-22ca-4ece-8beb-ed14587bcaf0@intel.com> (raw)
In-Reply-To: <daab03a50e29278ae1e19a00a23b4f73a9124883.1718641468.git.petrm@nvidia.com>
On 6/17/2024 6:56 PM, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
>
> Commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to
> thermal_debug_cdev_add()") changed the thermal core to read the current
> state of the cooling device as part of the cooling device's
> registration. This is incompatible with the current implementation of
> the cooling device operations in mlxsw, leading to initialization
> failure with errors such as:
>
> mlxsw_spectrum 0000:01:00.0: Failed to register cooling device
> mlxsw_spectrum 0000:01:00.0: cannot register bus device
Is this still a problem after
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=thermal&id=1af89dedc8a58006d8e385b1e0d2cd24df8a3b69
which has been merged into 6.10-rc4?
> The reason for the failure is that when the get current state operation
> is invoked the driver tries to derive the index of the cooling device by
> walking a per thermal zone array and looking for the matching cooling
> device pointer. However, the pointer is returned from the registration
> function and therefore only set in the array after the registration.
>
> Fix by passing to the registration function a per cooling device private
> data that already has the cooling device index populated.
>
> Decided to fix the issue in the driver since as far as I can tell other
> drivers do not suffer from this problem.
>
> Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
> Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information")
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> Cc: Lukasz Luba <lukasz.luba@arm.com>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> ---
> .../ethernet/mellanox/mlxsw/core_thermal.c | 50 ++++++++++---------
> 1 file changed, 26 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
> index 5c511e1a8efa..eee3e37983ca 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
> @@ -100,6 +100,12 @@ static const struct mlxsw_cooling_states default_cooling_states[] = {
>
> struct mlxsw_thermal;
>
> +struct mlxsw_thermal_cooling_device {
> + struct mlxsw_thermal *thermal;
> + struct thermal_cooling_device *cdev;
> + unsigned int idx;
> +};
> +
> struct mlxsw_thermal_module {
> struct mlxsw_thermal *parent;
> struct thermal_zone_device *tzdev;
> @@ -123,7 +129,7 @@ struct mlxsw_thermal {
> const struct mlxsw_bus_info *bus_info;
> struct thermal_zone_device *tzdev;
> int polling_delay;
> - struct thermal_cooling_device *cdevs[MLXSW_MFCR_PWMS_MAX];
> + struct mlxsw_thermal_cooling_device cdevs[MLXSW_MFCR_PWMS_MAX];
> struct thermal_trip trips[MLXSW_THERMAL_NUM_TRIPS];
> struct mlxsw_cooling_states cooling_states[MLXSW_THERMAL_NUM_TRIPS];
> struct mlxsw_thermal_area line_cards[];
> @@ -147,7 +153,7 @@ static int mlxsw_get_cooling_device_idx(struct mlxsw_thermal *thermal,
> int i;
>
> for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++)
> - if (thermal->cdevs[i] == cdev)
> + if (thermal->cdevs[i].cdev == cdev)
> return i;
>
> /* Allow mlxsw thermal zone binding to an external cooling device */
> @@ -352,17 +358,14 @@ static int mlxsw_thermal_get_cur_state(struct thermal_cooling_device *cdev,
> unsigned long *p_state)
>
> {
> - struct mlxsw_thermal *thermal = cdev->devdata;
> + struct mlxsw_thermal_cooling_device *mlxsw_cdev = cdev->devdata;
> + struct mlxsw_thermal *thermal = mlxsw_cdev->thermal;
> struct device *dev = thermal->bus_info->dev;
> char mfsc_pl[MLXSW_REG_MFSC_LEN];
> - int err, idx;
> u8 duty;
> + int err;
>
> - idx = mlxsw_get_cooling_device_idx(thermal, cdev);
> - if (idx < 0)
> - return idx;
> -
> - mlxsw_reg_mfsc_pack(mfsc_pl, idx, 0);
> + mlxsw_reg_mfsc_pack(mfsc_pl, mlxsw_cdev->idx, 0);
> err = mlxsw_reg_query(thermal->core, MLXSW_REG(mfsc), mfsc_pl);
> if (err) {
> dev_err(dev, "Failed to query PWM duty\n");
> @@ -378,22 +381,19 @@ static int mlxsw_thermal_set_cur_state(struct thermal_cooling_device *cdev,
> unsigned long state)
>
> {
> - struct mlxsw_thermal *thermal = cdev->devdata;
> + struct mlxsw_thermal_cooling_device *mlxsw_cdev = cdev->devdata;
> + struct mlxsw_thermal *thermal = mlxsw_cdev->thermal;
> struct device *dev = thermal->bus_info->dev;
> char mfsc_pl[MLXSW_REG_MFSC_LEN];
> - int idx;
> int err;
>
> if (state > MLXSW_THERMAL_MAX_STATE)
> return -EINVAL;
>
> - idx = mlxsw_get_cooling_device_idx(thermal, cdev);
> - if (idx < 0)
> - return idx;
> -
> /* Normalize the state to the valid speed range. */
> state = max_t(unsigned long, MLXSW_THERMAL_MIN_STATE, state);
> - mlxsw_reg_mfsc_pack(mfsc_pl, idx, mlxsw_state_to_duty(state));
> + mlxsw_reg_mfsc_pack(mfsc_pl, mlxsw_cdev->idx,
> + mlxsw_state_to_duty(state));
> err = mlxsw_reg_write(thermal->core, MLXSW_REG(mfsc), mfsc_pl);
> if (err) {
> dev_err(dev, "Failed to write PWM duty\n");
> @@ -753,17 +753,21 @@ int mlxsw_thermal_init(struct mlxsw_core *core,
> }
> for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++) {
> if (pwm_active & BIT(i)) {
> + struct mlxsw_thermal_cooling_device *mlxsw_cdev;
> struct thermal_cooling_device *cdev;
>
> + mlxsw_cdev = &thermal->cdevs[i];
> + mlxsw_cdev->thermal = thermal;
> + mlxsw_cdev->idx = i;
> cdev = thermal_cooling_device_register("mlxsw_fan",
> - thermal,
> + mlxsw_cdev,
> &mlxsw_cooling_ops);
> if (IS_ERR(cdev)) {
> err = PTR_ERR(cdev);
> dev_err(dev, "Failed to register cooling device\n");
> goto err_thermal_cooling_device_register;
> }
> - thermal->cdevs[i] = cdev;
> + mlxsw_cdev->cdev = cdev;
> }
> }
>
> @@ -824,8 +828,8 @@ int mlxsw_thermal_init(struct mlxsw_core *core,
> err_thermal_zone_device_register:
> err_thermal_cooling_device_register:
> for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++)
> - if (thermal->cdevs[i])
> - thermal_cooling_device_unregister(thermal->cdevs[i]);
> + if (thermal->cdevs[i].cdev)
> + thermal_cooling_device_unregister(thermal->cdevs[i].cdev);
> err_reg_write:
> err_reg_query:
> kfree(thermal);
> @@ -848,10 +852,8 @@ void mlxsw_thermal_fini(struct mlxsw_thermal *thermal)
> }
>
> for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++) {
> - if (thermal->cdevs[i]) {
> - thermal_cooling_device_unregister(thermal->cdevs[i]);
> - thermal->cdevs[i] = NULL;
> - }
> + if (thermal->cdevs[i].cdev)
> + thermal_cooling_device_unregister(thermal->cdevs[i].cdev);
> }
>
> kfree(thermal);
next prev parent reply other threads:[~2024-06-17 19:54 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-17 16:55 [PATCH net 0/3] mlxsw: Fixes Petr Machata
2024-06-17 16:56 ` [PATCH net 1/3] mlxsw: pci: Fix driver initialization with Spectrum-4 Petr Machata
2024-06-19 14:50 ` Simon Horman
2024-06-17 16:56 ` [PATCH net 2/3] mlxsw: core_thermal: Fix driver initialization failure Petr Machata
2024-06-17 19:53 ` Wysocki, Rafael J [this message]
2024-06-18 6:55 ` Ido Schimmel
2024-06-17 16:56 ` [PATCH net 3/3] mlxsw: spectrum_buffers: Fix memory corruptions on Spectrum-4 systems Petr Machata
2024-06-19 14:51 ` Simon Horman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d3c8f29c-22ca-4ece-8beb-ed14587bcaf0@intel.com \
--to=rafael.j.wysocki@intel.com \
--cc=daniel.lezcano@linaro.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=idosch@nvidia.com \
--cc=kuba@kernel.org \
--cc=lukasz.luba@arm.com \
--cc=mlxsw@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=petrm@nvidia.com \
--cc=vadimp@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.