public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed
From: Priyansh Jain <priyansh.jain@oss.qualcomm.com>
To: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>,
	Amit Kucheria <amitk@kernel.org>,
	Thara Gopinath <thara.gopinath@gmail.com>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Daniel Lezcano <daniel.lezcano@kernel.org>,
	Zhang Rui <rui.zhang@intel.com>,
	Lukasz Luba <lukasz.luba@arm.com>
Cc: linux-pm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
	linux-kernel@vger.kernel.org, manaf.pallikunhi@oss.qualcomm.com
Subject: Re: [PATCH 1/2] thermal: qcom: tsens: atomic temperature read with hardware-guided retries
Date: Tue, 5 May 2026 11:41:00 +0530	[thread overview]
Message-ID: <e473e26b-f4bc-4044-a893-c0f255de6cb8@oss.qualcomm.com> (raw)
In-Reply-To: <bfecf67e-faf2-4889-b29a-2d4d5cd0d1a6@oss.qualcomm.com>



On 04-05-2026 10:59 pm, Daniel Lezcano wrote:
> On 4/30/26 07:44, Priyansh Jain wrote:
>> The existing TSENS temperature read logic polls the valid bit and then
>> reads the temperature register. When temperature reads are triggered
>> at very short intervals, this can race with hardware updates and allow
>> the temperature field to be read while it is still being updated.
>>
>> In this case, the valid bit may already be asserted even though the
>> temperature value is transitioning, resulting in an incorrect reading.
>>
>> Hardware programming guidelines require the temperature value and the
>> valid bit to be sampled atomically in the same read transaction. A
>> reading is considered valid only if the valid bit is observed set in
>> that same sample.
>>
>> The guidelines further specify that software should attempt the
>> temperature read up to three times to account for transient update
>> windows. If none of the attempts observe a valid sample, a stable
>> fallback value must be returned: if the first and second samples match,
>> the second value is returned; otherwise, if the second and third
>> samples match, the third value is returned.
>>
>> Update the TSENS sensor read logic to implement atomic sampling along
>> with the recommended retry-and-compare fallback behavior. This removes
>> the race window and ensures deterministic temperature values in
>> accordance with hardware requirements.
>>
>> Signed-off-by: Priyansh Jain <priyansh.jain@oss.qualcomm.com>
>> ---
>>   drivers/thermal/qcom/tsens-v1.c |   6 +-
>>   drivers/thermal/qcom/tsens-v2.c |   6 +-
>>   drivers/thermal/qcom/tsens.c    | 118 +++++++++++++++++++++-----------
>>   drivers/thermal/qcom/tsens.h    |  22 ++----
>>   4 files changed, 91 insertions(+), 61 deletions(-)
>>
>> diff --git a/drivers/thermal/qcom/tsens-v1.c b/drivers/thermal/qcom/ 
>> tsens-v1.c
>> index faa5d00788ca..2e0a01348c48 100644
>> --- a/drivers/thermal/qcom/tsens-v1.c
>> +++ b/drivers/thermal/qcom/tsens-v1.c
>> @@ -77,6 +77,9 @@ static struct tsens_features tsens_v1_feat = {
>>       .max_sensors    = 11,
>>       .trip_min_temp    = -40000,
>>       .trip_max_temp    = 120000,
>> +    .valid_bit = BIT(14),
>> +    .last_temp_mask = 0x3FF,
> 
> This is GENMASK(9, 0)
> 
>> +    .last_temp_resolution = 9,
> 
> Please comply with the SSOT, in the init function compute the mask with:
> 
>      ->last_temp_mask = GENMASK(9, 0);
> 
> and remove the initialization here
Thanks for pointing this out — yes, this approach looks better.
If I understand correctly, you’re suggesting that the mask should simply 
be defined in the init function as follows:
priv->feat->last_temp_mask = GENMASK(priv->feat->last_temp_resolution, 0);
?
> 
>>   };
>>   static struct tsens_features tsens_v1_no_rpm_feat = {
>> @@ -132,8 +135,7 @@ static const struct reg_field 
>> tsens_v1_regfields[MAX_REGFIELDS] = {
>>       /* NO CRITICAL INTERRUPT SUPPORT on v1 */
>>       /* Sn_STATUS */
>> -    REG_FIELD_FOR_EACH_SENSOR11(LAST_TEMP,    TM_Sn_STATUS_OFF,  0,  9),
>> -    REG_FIELD_FOR_EACH_SENSOR11(VALID,        TM_Sn_STATUS_OFF, 14, 14),
>> +    REG_FIELD_FOR_EACH_SENSOR11(LAST_TEMP,    TM_Sn_STATUS_OFF,  0,  
>> 14),
>>       /* xxx_STATUS bits: 1 == threshold violated */
>>       REG_FIELD_FOR_EACH_SENSOR11(MIN_STATUS,   TM_Sn_STATUS_OFF, 10, 
>> 10),
>>       REG_FIELD_FOR_EACH_SENSOR11(LOWER_STATUS, TM_Sn_STATUS_OFF, 11, 
>> 11),
>> diff --git a/drivers/thermal/qcom/tsens-v2.c b/drivers/thermal/qcom/ 
>> tsens-v2.c
>> index 8d9698ea3ec4..814147735ba5 100644
>> --- a/drivers/thermal/qcom/tsens-v2.c
>> +++ b/drivers/thermal/qcom/tsens-v2.c
>> @@ -56,6 +56,9 @@ static struct tsens_features tsens_v2_feat = {
>>       .max_sensors    = 16,
>>       .trip_min_temp    = -40000,
>>       .trip_max_temp    = 120000,
>> +    .valid_bit = BIT(21),
>> +    .last_temp_mask = 0xFFF,
>> +    .last_temp_resolution = 11,
> 
> Ditto
ACK
> 
>>   };
>>   static struct tsens_features ipq8074_feat = {
>> @@ -125,8 +128,7 @@ static const struct reg_field 
>> tsens_v2_regfields[MAX_REGFIELDS] = {
>>       [WDOG_BARK_COUNT]  = REG_FIELD(TM_WDOG_LOG_OFF,             0,  7),
>>       /* Sn_STATUS */
>> -    REG_FIELD_FOR_EACH_SENSOR16(LAST_TEMP,       TM_Sn_STATUS_OFF,  
>> 0,  11),
>> -    REG_FIELD_FOR_EACH_SENSOR16(VALID,           TM_Sn_STATUS_OFF, 
>> 21,  21),
>> +    REG_FIELD_FOR_EACH_SENSOR16(LAST_TEMP,       TM_Sn_STATUS_OFF,  
>> 0,  21),
>>       /* xxx_STATUS bits: 1 == threshold violated */
>>       REG_FIELD_FOR_EACH_SENSOR16(MIN_STATUS,      TM_Sn_STATUS_OFF, 
>> 16,  16),
>>       REG_FIELD_FOR_EACH_SENSOR16(LOWER_STATUS,    TM_Sn_STATUS_OFF, 
>> 17,  17),
>> diff --git a/drivers/thermal/qcom/tsens.c b/drivers/thermal/qcom/tsens.c
>> index a2422ebee816..15392a17ef41 100644
>> --- a/drivers/thermal/qcom/tsens.c
>> +++ b/drivers/thermal/qcom/tsens.c
>> @@ -315,10 +315,66 @@ static inline int code_to_degc(u32 adc_code, 
>> const struct tsens_sensor *s)
>>       return degc;
>>   }
>> +static inline enum tsens_ver tsens_version(struct tsens_priv *priv)
>> +{
>> +    return priv->feat->ver_major;
>> +}
> 
> I agree putting accessor functions is a good practice but here as it 
> results in duplicating the function, the benefit is discutable.
> 
I did not introduce this new function; it was already present and I only 
moved it from the bottom of the file to the top since it was being used 
in tsens_read_temp().
However, this change is no longer required as I am removing the use of 
tsens_version() in tsens_read_temp(). As discussed earlier with Konrad, 
it makes more sense to check for valid‑bit support rather than relying 
on the TSENS version check in tsens_read_temp().
>> +/**
>> + * tsens_read_temp - To read temperature from hw in deciCelsius.
>> + * @s:     Pointer to sensor struct
>> + * @field: Index into regmap_field array pointing to temperature data
>> + * @temp: temperature in deciCelsius to be read from hardware
>> + *
>> + * This function handles temperature returned in ADC code or deciCelsius
>> + * depending on IP version.
>> + *
>> + * Return: 0 on success, a negative errno will be returned in error 
>> cases
>> + */
>> +static int tsens_read_temp(const struct tsens_sensor *s, int field, 
>> int *temp)
>> +{
>> +    struct tsens_priv *priv = s->priv;
>> +    int temp_val[3] = {0};
>> +    unsigned int status = 0;
>> +    int ret = 0, i;
>> +    int max_retry = 3;
> 
> Please avoid litterals. Add a macro for max number of retries. As the 
> value 3 is not an arbitrary value but a documented value, add a small 
> comment to tell it is a hardware requirement.
> 
ACK
>> +    ret = regmap_field_read(priv->rf[field], &status);
>> +    if (ret)
>> +        return ret;
>> +
>> +    /* VER_0 doesn't have VALID bit */
>> +    if (tsens_version(priv) == VER_0) {
>> +        *temp = status;
>> +        return ret;
>> +    }
> 
> Please use a callback for v0 and v1. Set it at probe time, so the 
> version does not have to be checked at very read.
> 
Yes i am removing version check, instead adding valid bit check as
discussed with Konrad earlier.

>> +    for (i = 0; i < max_retry; i++) {
>> +        temp_val[i] = status & priv->feat->last_temp_mask;
>> +        if (() {
>> +            *temp = temp_val[i];
>> +            return ret;
>> +        }
>> +        ret = regmap_field_read(priv->rf[field], &status);
>> +        if (ret)
>> +            return ret;
> 
> It looks like more than max_retry is happening. One time before the 
> loop, then 3 times in loop. So 4 times in total.

Thanks for pointing this out, Yes correct read will happen 4 times will
update the logic.

> 
>> +    }
>> +
>> +    if (temp_val[0] == temp_val[1])
>> +        *temp = temp_val[1];
>> +    else if (temp_val[1] == temp_val[2])
>> +        *temp = temp_val[2];
>> +    else
>> +        return -EAGAIN;
> 
> We have a, b and c.
> 
> if a == b, then return b
> else b == c, then return c
> else return -EAGAIN
> 
> It is like we have two consecutives successful read. IMO that could be 
> simplified to:
> 
> int prev = INTMAX;
> 
> /*
>   * An explanation ...
>   */
> 
> for (i = 0; i < max_retry; i++) {
> 
>      int value, valid;
> 
>      ret = regmap_field_read(priv->rf[field], &status);
>      if (ret)
>          return ret;
> 
>      value = FIELD_GET(priv->feat->last_temp_mask, status);
> 
>      valid = FIELD_GET(priv->feat->valid_bit, status)
>      if (valid)
>          return value;
> 
>      if (value == prev)
>          return value;
> 
>      prev = value;
> }
> 
> return -EAGAIN;
> 
> (Not tested)
This approach has some misalignment with the HW recommendations.
As per the HW guidelines, 3 back‑to‑back reads must be performed until a 
valid read is observed.
b or c should be returned only if none of the three reads(a,b,c) report 
the valid bit not set.

If a == b, return b
Else if b == c, return c
Else return -EAGAIN

Regards,
Priyansh
> 
> 
> 


  reply	other threads:[~2026-05-05  6:11 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-30  5:44 [PATCH 0/2] thermal: qcom: tsens: fix temperature handling Priyansh Jain
2026-04-30  5:44 ` [PATCH 1/2] thermal: qcom: tsens: atomic temperature read with hardware-guided retries Priyansh Jain
2026-04-30 15:51   ` Konrad Dybcio
     [not found]     ` <10c07347-a0df-42d3-b216-5150817b9ed2@oss.qualcomm.com>
2026-05-04  9:59       ` Konrad Dybcio
2026-05-04 10:34         ` Priyansh Jain
2026-04-30 16:00   ` Konrad Dybcio
     [not found]     ` <fc027ab4-695b-4622-b30e-8a79ce6e1781@oss.qualcomm.com>
2026-05-04  9:46       ` Konrad Dybcio
2026-05-04 17:29   ` Daniel Lezcano
2026-05-05  6:11     ` Priyansh Jain [this message]
2026-05-05  7:43       ` Daniel Lezcano
2026-05-05  8:48         ` Priyansh Jain
2026-05-05  9:35           ` Daniel Lezcano
2026-05-05  9:39             ` Priyansh Jain
2026-04-30  5:44 ` [PATCH 2/2] thermal: qcom: tsens: widen temperature limits to match hardware range Priyansh Jain
2026-04-30 16:01   ` Konrad Dybcio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e473e26b-f4bc-4044-a893-c0f255de6cb8@oss.qualcomm.com \
    --to=priyansh.jain@oss.qualcomm.com \
    --cc=amitk@kernel.org \
    --cc=daniel.lezcano@kernel.org \
    --cc=daniel.lezcano@oss.qualcomm.com \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=manaf.pallikunhi@oss.qualcomm.com \
    --cc=rafael@kernel.org \
    --cc=rui.zhang@intel.com \
    --cc=thara.gopinath@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox