From: Priyansh Jain <priyansh.jain@oss.qualcomm.com>
To: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>,
Amit Kucheria <amitk@kernel.org>,
Thara Gopinath <thara.gopinath@gmail.com>,
"Rafael J . Wysocki" <rafael@kernel.org>,
Daniel Lezcano <daniel.lezcano@kernel.org>,
Zhang Rui <rui.zhang@intel.com>,
Lukasz Luba <lukasz.luba@arm.com>
Cc: linux-pm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
linux-kernel@vger.kernel.org, manaf.pallikunhi@oss.qualcomm.com
Subject: Re: [PATCH 1/2] thermal: qcom: tsens: atomic temperature read with hardware-guided retries
Date: Tue, 5 May 2026 11:41:00 +0530
Message-ID: <e473e26b-f4bc-4044-a893-c0f255de6cb8@oss.qualcomm.com>
In-Reply-To: <bfecf67e-faf2-4889-b29a-2d4d5cd0d1a6@oss.qualcomm.com>
On 04-05-2026 10:59 pm, Daniel Lezcano wrote:
> On 4/30/26 07:44, Priyansh Jain wrote:
>> The existing TSENS temperature read logic polls the valid bit and then
>> reads the temperature register. When temperature reads are triggered
>> at very short intervals, this can race with hardware updates and allow
>> the temperature field to be read while it is still being updated.
>>
>> In this case, the valid bit may already be asserted even though the
>> temperature value is transitioning, resulting in an incorrect reading.
>>
>> Hardware programming guidelines require the temperature value and the
>> valid bit to be sampled atomically in the same read transaction. A
>> reading is considered valid only if the valid bit is observed set in
>> that same sample.
>>
>> The guidelines further specify that software should attempt the
>> temperature read up to three times to account for transient update
>> windows. If none of the attempts observe a valid sample, a stable
>> fallback value must be returned: if the first and second samples match,
>> the second value is returned; otherwise, if the second and third
>> samples match, the third value is returned.
>>
>> Update the TSENS sensor read logic to implement atomic sampling along
>> with the recommended retry-and-compare fallback behavior. This removes
>> the race window and ensures deterministic temperature values in
>> accordance with hardware requirements.
>>
>> Signed-off-by: Priyansh Jain <priyansh.jain@oss.qualcomm.com>
>> ---
>> drivers/thermal/qcom/tsens-v1.c | 6 +-
>> drivers/thermal/qcom/tsens-v2.c | 6 +-
>> drivers/thermal/qcom/tsens.c | 118 +++++++++++++++++++++-----------
>> drivers/thermal/qcom/tsens.h | 22 ++----
>> 4 files changed, 91 insertions(+), 61 deletions(-)
>>
>> diff --git a/drivers/thermal/qcom/tsens-v1.c b/drivers/thermal/qcom/
>> tsens-v1.c
>> index faa5d00788ca..2e0a01348c48 100644
>> --- a/drivers/thermal/qcom/tsens-v1.c
>> +++ b/drivers/thermal/qcom/tsens-v1.c
>> @@ -77,6 +77,9 @@ static struct tsens_features tsens_v1_feat = {
>> .max_sensors = 11,
>> .trip_min_temp = -40000,
>> .trip_max_temp = 120000,
>> + .valid_bit = BIT(14),
>> + .last_temp_mask = 0x3FF,
>
> This is GENMASK(9, 0)
>
>> + .last_temp_resolution = 9,
>
> Please comply with the SSOT, in the init function compute the mask with:
>
> ->last_temp_mask = GENMASK(9, 0);
>
> and remove the initialization here
Thanks for pointing this out; yes, this approach looks better.
If I understand correctly, you are suggesting that the mask be computed
in the init function from the resolution, as follows:
priv->feat->last_temp_mask = GENMASK(priv->feat->last_temp_resolution, 0);
Is that right?
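To make the question concrete, here is a minimal user-space sketch of that init-time computation. It is only an illustration under assumptions: BIT() and GENMASK() are local stand-ins for the kernel macros, and struct tsens_features_sketch and tsens_init_temp_mask() are hypothetical names, not the driver's actual code.

```c
#include <stdint.h>

/* User-space stand-ins for the kernel's BIT()/GENMASK() (32-bit only). */
#define BIT(n)        (1U << (n))
#define GENMASK(h, l) (((~0U) << (l)) & (~0U >> (31 - (h))))

/* Hypothetical, cut-down stand-in for struct tsens_features. */
struct tsens_features_sketch {
	uint32_t valid_bit;
	uint32_t last_temp_mask;
	/* Index of the highest temperature bit: 9 on v1, 11 on v2 in this patch. */
	unsigned int last_temp_resolution;
};

/* Derive the mask once at init time instead of hard-coding it per version. */
static void tsens_init_temp_mask(struct tsens_features_sketch *feat)
{
	feat->last_temp_mask = GENMASK(feat->last_temp_resolution, 0);
}
```

With last_temp_resolution set to 9 and 11, this yields the same masks as the literals in the patch (0x3FF and 0xFFF), so the resolution stays the single source of truth.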
>
>> };
>> static struct tsens_features tsens_v1_no_rpm_feat = {
>> @@ -132,8 +135,7 @@ static const struct reg_field
>> tsens_v1_regfields[MAX_REGFIELDS] = {
>> /* NO CRITICAL INTERRUPT SUPPORT on v1 */
>> /* Sn_STATUS */
>> - REG_FIELD_FOR_EACH_SENSOR11(LAST_TEMP, TM_Sn_STATUS_OFF, 0, 9),
>> - REG_FIELD_FOR_EACH_SENSOR11(VALID, TM_Sn_STATUS_OFF, 14, 14),
>> + REG_FIELD_FOR_EACH_SENSOR11(LAST_TEMP, TM_Sn_STATUS_OFF, 0,
>> 14),
>> /* xxx_STATUS bits: 1 == threshold violated */
>> REG_FIELD_FOR_EACH_SENSOR11(MIN_STATUS, TM_Sn_STATUS_OFF, 10,
>> 10),
>> REG_FIELD_FOR_EACH_SENSOR11(LOWER_STATUS, TM_Sn_STATUS_OFF, 11,
>> 11),
>> diff --git a/drivers/thermal/qcom/tsens-v2.c b/drivers/thermal/qcom/
>> tsens-v2.c
>> index 8d9698ea3ec4..814147735ba5 100644
>> --- a/drivers/thermal/qcom/tsens-v2.c
>> +++ b/drivers/thermal/qcom/tsens-v2.c
>> @@ -56,6 +56,9 @@ static struct tsens_features tsens_v2_feat = {
>> .max_sensors = 16,
>> .trip_min_temp = -40000,
>> .trip_max_temp = 120000,
>> + .valid_bit = BIT(21),
>> + .last_temp_mask = 0xFFF,
>> + .last_temp_resolution = 11,
>
> Ditto
ACK
>
>> };
>> static struct tsens_features ipq8074_feat = {
>> @@ -125,8 +128,7 @@ static const struct reg_field
>> tsens_v2_regfields[MAX_REGFIELDS] = {
>> [WDOG_BARK_COUNT] = REG_FIELD(TM_WDOG_LOG_OFF, 0, 7),
>> /* Sn_STATUS */
>> - REG_FIELD_FOR_EACH_SENSOR16(LAST_TEMP, TM_Sn_STATUS_OFF,
>> 0, 11),
>> - REG_FIELD_FOR_EACH_SENSOR16(VALID, TM_Sn_STATUS_OFF,
>> 21, 21),
>> + REG_FIELD_FOR_EACH_SENSOR16(LAST_TEMP, TM_Sn_STATUS_OFF,
>> 0, 21),
>> /* xxx_STATUS bits: 1 == threshold violated */
>> REG_FIELD_FOR_EACH_SENSOR16(MIN_STATUS, TM_Sn_STATUS_OFF,
>> 16, 16),
>> REG_FIELD_FOR_EACH_SENSOR16(LOWER_STATUS, TM_Sn_STATUS_OFF,
>> 17, 17),
>> diff --git a/drivers/thermal/qcom/tsens.c b/drivers/thermal/qcom/tsens.c
>> index a2422ebee816..15392a17ef41 100644
>> --- a/drivers/thermal/qcom/tsens.c
>> +++ b/drivers/thermal/qcom/tsens.c
>> @@ -315,10 +315,66 @@ static inline int code_to_degc(u32 adc_code,
>> const struct tsens_sensor *s)
>> return degc;
>> }
>> +static inline enum tsens_ver tsens_version(struct tsens_priv *priv)
>> +{
>> + return priv->feat->ver_major;
>> +}
>
> I agree that adding accessor functions is good practice, but since here it
> results in duplicating the function, the benefit is debatable.
>
I did not introduce this new function; it was already present and I only
moved it from the bottom of the file to the top since it was being used
in tsens_read_temp().
However, this change is no longer required as I am removing the use of
tsens_version() in tsens_read_temp(). As discussed earlier with Konrad,
it makes more sense to check for valid‑bit support rather than relying
on the TSENS version check in tsens_read_temp().
>> +/**
>> + * tsens_read_temp - To read temperature from hw in deciCelsius.
>> + * @s: Pointer to sensor struct
>> + * @field: Index into regmap_field array pointing to temperature data
>> + * @temp: temperature in deciCelsius to be read from hardware
>> + *
>> + * This function handles temperature returned in ADC code or deciCelsius
>> + * depending on IP version.
>> + *
>> + * Return: 0 on success, a negative errno will be returned in error
>> cases
>> + */
>> +static int tsens_read_temp(const struct tsens_sensor *s, int field,
>> int *temp)
>> +{
>> + struct tsens_priv *priv = s->priv;
>> + int temp_val[3] = {0};
>> + unsigned int status = 0;
>> + int ret = 0, i;
>> + int max_retry = 3;
>
> Please avoid literals. Add a macro for the max number of retries. As the
> value 3 is not an arbitrary value but a documented one, add a small
> comment stating that it is a hardware requirement.
>
ACK
>> + ret = regmap_field_read(priv->rf[field], &status);
>> + if (ret)
>> + return ret;
>> +
>> + /* VER_0 doesn't have VALID bit */
>> + if (tsens_version(priv) == VER_0) {
>> + *temp = status;
>> + return ret;
>> + }
>
> Please use a callback for v0 and v1. Set it at probe time, so the
> version does not have to be checked at every read.
>
Yes, I am removing the version check and adding a valid-bit check
instead, as discussed with Konrad earlier.
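Roughly, the check I have in mind looks like the sketch below. It assumes that hardware without a VALID bit simply leaves valid_bit at 0 in its feature struct; that convention, the struct name and both helpers are hypothetical and only illustrate the idea.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the driver's feature struct. */
struct tsens_feat_sketch {
	uint32_t valid_bit; /* assumed to be 0 when the IP has no VALID bit */
};

/* Feature test that replaces the version comparison. */
static bool tsens_has_valid_bit(const struct tsens_feat_sketch *feat)
{
	return feat->valid_bit != 0;
}

/*
 * A sample can be used directly when the IP has no VALID bit, or when the
 * VALID bit is set in the very same status word as the temperature.
 */
static bool tsens_sample_usable(const struct tsens_feat_sketch *feat,
				uint32_t status)
{
	if (!tsens_has_valid_bit(feat))
		return true;

	return (status & feat->valid_bit) != 0;
}
```

This keeps tsens_read_temp() independent of the IP version: the read path only asks whether the feature exists.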
>> + for (i = 0; i < max_retry; i++) {
>> + temp_val[i] = status & priv->feat->last_temp_mask;
>> + if (status & priv->feat->valid_bit) {
>> + *temp = temp_val[i];
>> + return ret;
>> + }
>> + ret = regmap_field_read(priv->rf[field], &status);
>> + if (ret)
>> + return ret;
>
> It looks like more than max_retry reads are happening: one before the
> loop, then 3 in the loop. So 4 in total.
Thanks for pointing this out. Yes, that is correct: the read currently
happens 4 times. I will update the logic.
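One possible shape for the corrected loop is sketched below, with the read moved inside the loop so that at most three reads are issued. This is a user-space illustration only: TSENS_MAX_READ_ATTEMPTS, status_read_fn, struct fake_reg and fake_read() are hypothetical names, with the callback standing in for regmap_field_read().

```c
#include <stdint.h>

/* The hardware guideline quoted in this thread allows up to three attempts. */
#define TSENS_MAX_READ_ATTEMPTS 3

/* Hypothetical read callback standing in for regmap_field_read(). */
typedef int (*status_read_fn)(void *ctx, uint32_t *status);

/* Fake register that returns a fixed value and counts how often it is read. */
struct fake_reg {
	uint32_t value;
	int reads;
};

static int fake_read(void *ctx, uint32_t *status)
{
	struct fake_reg *reg = ctx;

	reg->reads++;
	*status = reg->value;
	return 0;
}

/*
 * Read the status word at most TSENS_MAX_READ_ATTEMPTS times.
 * Returns 0 if the last sample stored has the valid bit set, 1 if no sample
 * was valid, or a negative error code from the read callback.
 * *nr_reads reports how many samples were stored.
 */
static int tsens_collect_samples(status_read_fn read, void *ctx,
				 uint32_t valid_bit,
				 uint32_t samples[TSENS_MAX_READ_ATTEMPTS],
				 int *nr_reads)
{
	int i, ret;

	*nr_reads = 0;
	for (i = 0; i < TSENS_MAX_READ_ATTEMPTS; i++) {
		ret = read(ctx, &samples[i]);
		if (ret)
			return ret;
		(*nr_reads)++;

		/* Valid bit and temperature come from the same status word. */
		if (samples[i] & valid_bit)
			return 0;
	}

	return 1;
}
```

With a fake register that never reports the valid bit, the read counter ends at 3 rather than 4; with one that is valid immediately, only a single read is issued.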
>
>> + }
>> +
>> + if (temp_val[0] == temp_val[1])
>> + *temp = temp_val[1];
>> + else if (temp_val[1] == temp_val[2])
>> + *temp = temp_val[2];
>> + else
>> + return -EAGAIN;
>
> We have a, b and c.
>
> if a == b, then return b
> else if b == c, then return c
> else return -EAGAIN
>
> It is as if we had two consecutive successful reads. IMO that could be
> simplified to:
>
> int prev = INT_MAX;
>
> /*
> * An explanation ...
> */
>
> for (i = 0; i < max_retry; i++) {
>
> int value, valid;
>
> ret = regmap_field_read(priv->rf[field], &status);
> if (ret)
> return ret;
>
> value = FIELD_GET(priv->feat->last_temp_mask, status);
>
> valid = FIELD_GET(priv->feat->valid_bit, status);
> if (valid)
> return value;
>
> if (value == prev)
> return value;
>
> prev = value;
> }
>
> return -EAGAIN;
>
> (Not tested)
This approach does not quite match the HW recommendations.
As per the HW guidelines, up to 3 back-to-back reads must be performed,
stopping as soon as a valid read is observed.
b or c should be returned only if none of the three reads (a, b, c)
reports the valid bit set. In that case:
If a == b, return b
Else if b == c, return c
Else return -EAGAIN
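To illustrate the difference, here is a small user-space sketch of the selection rule as I read the guidelines. tsens_pick_temp() is a hypothetical helper that operates on three already-captured status words; it is not the driver code, and it assumes the words were read in order a, b, c.

```c
#include <errno.h>
#include <stdint.h>

#define TSENS_NR_SAMPLES 3

/*
 * Pick a temperature from three status words read back to back.
 * Any sample with the valid bit set wins, earliest first. Only when none
 * of the three is valid does the compare-based fallback apply.
 * Returns 0 and stores the result in *temp, or -EAGAIN if nothing is stable.
 */
static int tsens_pick_temp(const uint32_t status[TSENS_NR_SAMPLES],
			   uint32_t valid_bit, uint32_t temp_mask,
			   uint32_t *temp)
{
	uint32_t a, b, c;
	int i;

	for (i = 0; i < TSENS_NR_SAMPLES; i++) {
		if (status[i] & valid_bit) {
			*temp = status[i] & temp_mask;
			return 0;
		}
	}

	a = status[0] & temp_mask;
	b = status[1] & temp_mask;
	c = status[2] & temp_mask;

	if (a == b)
		*temp = b;
	else if (b == c)
		*temp = c;
	else
		return -EAGAIN;

	return 0;
}
```

The case that separates the two approaches is a == b with both invalid, followed by a valid c: the rule above returns c, whereas a loop that returns on the first pair of matching values would stop at b and never look at the valid third sample.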
Regards,
Priyansh
>
>
>