Re: [PATCH V10 4/4] thermal: qcom: add support for PMIC5 Gen3 ADC thermal monitoring

From: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
To: Jishnu Prakash <jishnu.prakash@oss.qualcomm.com>
Cc: jic23@kernel.org, robh@kernel.org, krzk+dt@kernel.org,
	conor+dt@kernel.org, agross@kernel.org, andersson@kernel.org,
	lumag@kernel.org, dmitry.baryshkov@oss.qualcomm.com,
	konradybcio@kernel.org, daniel.lezcano@linaro.org,
	sboyd@kernel.org, amitk@kernel.org, thara.gopinath@gmail.com,
	lee@kernel.org, rafael@kernel.org,
	subbaraman.narayanamurthy@oss.qualcomm.com,
	david.collins@oss.qualcomm.com,
	anjelique.melendez@oss.qualcomm.com,
	kamal.wadhwa@oss.qualcomm.com, rui.zhang@intel.com,
	lukasz.luba@arm.com, devicetree@vger.kernel.org,
	linux-arm-msm@vger.kernel.org, linux-iio@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	cros-qcom-dts-watchers@chromium.org, quic_kotarake@quicinc.com,
	neil.armstrong@linaro.org, stephan.gerhold@linaro.org
Subject: Re: [PATCH V10 4/4] thermal: qcom: add support for PMIC5 Gen3 ADC thermal monitoring
Date: Thu, 16 Apr 2026 23:12:04 +0200
Message-ID: <f46ef4af-1c05-4473-9226-901d3393ea89@oss.qualcomm.com>
In-Reply-To: <12d683aa-44c2-4e2d-8459-78ba9f2ab61e@oss.qualcomm.com>

On 4/16/26 10:05, Jishnu Prakash wrote:
> Hi Daniel,
> 
> On 4/9/2026 11:42 AM, Daniel Lezcano wrote:
>> On Fri, Jan 30, 2026 at 05:24:21PM +0530, Jishnu Prakash wrote:
>>> Add support for ADC_TM part of PMIC5 Gen3.
>>>
>>> This is an auxiliary driver under the Gen3 ADC driver, which implements the
>>> threshold setting and interrupt generating functionalities of QCOM ADC_TM
>>> drivers, used to support thermal trip points.
>>>
>>> Signed-off-by: Jishnu Prakash <jishnu.prakash@oss.qualcomm.com>
> 
> ...
> 
>>> +
>>> +static irqreturn_t adctm5_gen3_isr(int irq, void *dev_id)
>>> +{
>>> +	struct adc_tm5_gen3_chip *adc_tm5 = dev_id;
>>> +	int ret, sdam_num;
>>> +	u8 tm_status[2];
>>> +	u8 status, val;
>>> +
>>> +	sdam_num = get_sdam_from_irq(adc_tm5, irq);
>>> +	if (sdam_num < 0) {
>>> +		dev_err(adc_tm5->dev, "adc irq %d not associated with an sdam\n",
>>> +			irq);
>>> +		return IRQ_HANDLED;
>>> +	}
>>> +
>>> +	ret = adc5_gen3_read(adc_tm5->dev_data, sdam_num, ADC5_GEN3_STATUS1,
>>> +			     &status, sizeof(status));
>>> +	if (ret) {
>>> +		dev_err(adc_tm5->dev, "adc read status1 failed with %d\n", ret);
>>> +		return IRQ_HANDLED;
>>> +	}
>>> +
>>> +	if (status & ADC5_GEN3_STATUS1_CONV_FAULT) {
>>> +		dev_err_ratelimited(adc_tm5->dev,
>>> +				    "Unexpected conversion fault, status:%#x\n",
>>> +				    status);
>>> +		val = ADC5_GEN3_CONV_ERR_CLR_REQ;
>>> +		adc5_gen3_status_clear(adc_tm5->dev_data, sdam_num,
>>> +				       ADC5_GEN3_CONV_ERR_CLR, &val, 1);
>>> +		return IRQ_HANDLED;
>>> +	}
>>> +
>>> +	ret = adc5_gen3_read(adc_tm5->dev_data, sdam_num, ADC5_GEN3_TM_HIGH_STS,
>>> +			     tm_status, sizeof(tm_status));
>>> +	if (ret) {
>>> +		dev_err(adc_tm5->dev, "adc read TM status failed with %d\n", ret);
>>> +		return IRQ_HANDLED;
>>> +	}
>>> +
>>> +	if (tm_status[0] || tm_status[1])
>>> +		schedule_work(&adc_tm5->tm_handler_work);
>>> +
>>> +	dev_dbg(adc_tm5->dev, "Interrupt status:%#x, high:%#x, low:%#x\n",
>>> +		status, tm_status[0], tm_status[1]);
>>> +
>>> +	return IRQ_HANDLED;
>>
>> This ISR routine should be revisited:
>>
>>   - no error message inside
> 
> I'll drop all the error messages, but does that also include the debug print at the end?
> In addition, the print for conversion fault is ratelimited and may be useful as it
> indicates a possible HW issue, can I keep that?

It is not good practice to print error messages from an ISR. If the
conversion fails, the thread blocked on the read will time out and
report the error itself.

>>   - use a shared interrupt to split what is handled by the ADC and the
>>      TM drivers
> 
> I'll make the required updates in the main ADC driver and this driver to share the first
> SDAM's interrupt.
> 
>>
>>   - do not return IRQ_HANDLED in case of error (cf. irqreturn.h doc)
>>
> 
> I'll replace IRQ_HANDLED with IRQ_NONE at places where errors are returned.
> But in the case of conversion fault, I think returning IRQ_HANDLED may be
> more appropriate because we do handle it by clearing the status, to
> allow subsequent conversion requests to be sent.
> 
> What do you think, is this fine?

It is a good point.

Actually, if get_sdam_from_irq() or adc5_gen3_read() fails, the handler
returns without clearing the interrupt flag, so we could potentially end
up in an infinite loop.

So the status should be cleared at the end and IRQ_HANDLED returned.
IRQ_NONE should be returned only if the interrupt is for another subsystem.
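
To make the return-code policy concrete, here is a rough userspace model,
not kernel code. Every name in it (model_sdam, model_isr, the irq_pending
flag) is hypothetical and only stands in for the SDAM status handling; the
point is the single exit path that always clears the flag:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only: named after the kernel's irqreturn values. */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Hypothetical stand-in for one SDAM's interrupt and TM status state. */
struct model_sdam {
	bool irq_pending;	/* latched interrupt flag */
	bool read_fails;	/* simulate a failing register read */
	uint8_t tm_high;
	uint8_t tm_low;
};

/* Simulated register read: fails with -EIO when read_fails is set. */
int model_read_status(const struct model_sdam *s, uint8_t *hi, uint8_t *lo)
{
	if (s->read_fails)
		return -EIO;
	*hi = s->tm_high;
	*lo = s->tm_low;
	return 0;
}

enum model_irqreturn model_isr(struct model_sdam *s, bool *violation)
{
	uint8_t hi = 0, lo = 0;

	assert(s && violation);
	*violation = false;

	/* Nothing latched for us: let another handler claim the interrupt. */
	if (!s->irq_pending)
		return MODEL_IRQ_NONE;

	/* A failed read is not reported here; it only means no violation. */
	if (model_read_status(s, &hi, &lo) == 0)
		*violation = hi || lo;

	/* Single exit: the flag is cleared even when the read failed. */
	s->irq_pending = false;
	return MODEL_IRQ_HANDLED;
}
```

With this shape a failing read can no longer leave the flag set, so the
handler cannot be re-entered forever for the same event.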

If you think there can be a significant number of errors in the handler,
maybe you should add statistics, but later, in an additional series, if
it makes sense.

[ ... ]

>>> +	adc_tm5 = prop->chip;
>>> +
>>> +	if (prop->last_temp_set) {
>>> +		pr_debug("last_temp: %d\n", prop->last_temp);
>>> +		prop->last_temp_set = false;
>>> +		*temp = prop->last_temp;
>>> +		return 0;
>>> +	}
>>
>> Why do you need to do that?
>>
>> The temperature should reflect the current situation even if the
>> reading was triggered by a thermal trip violation.
>>
> 
> This logic is needed to handle a corner case issue we have seen earlier.
> In this case, the ADC_TM threshold violation interrupt gets triggered ,
> but when get_temp() is subsequently called by the thermal framework, the
> temperature has fluctuated and the value read now lies within the thresholds,
> so the thresholds do not get updated by the thermal framework and the violation
> interrupts get repeated several times, until there is a get_temp() call
> which returns a temperature outside the threshold range.

Oh, that's clearly an issue with the thermal framework, not the driver.

> In order to avoid this issue, when the interrupt handler runs, we find the actual
> temperature read in ADC_TM that led to threshold violation by reading the ADC_TM
> data registers and we cache it and return it when get_temp() is called in the flow
> of thermal_zone_device_update(). Any subsequent calls to get_temp() would
> return the actual channel temperature at the time.
> 
> This is only done to avoid delaying thermal mitigation due to temperature
> fluctuations. Do you think this needs to be changed?

I think it is an interesting problem that most likely affects all thermal
sensors. It should be fixed in the thermal framework itself if possible.
Just drop this portion of code and let's handle that correctly in the
thermal framework.

[ ... ]

>>> +	dev_dbg(adc_tm5->dev, "channel:%s, low_temp(mdegC):%d, high_temp(mdegC):%d\n",
>>> +		prop->common_props.label, low_temp, high_temp);
>>> +
>>> +	guard(adc5_gen3)(adc_tm5);
>>> +	if (high_temp == INT_MAX && low_temp == -INT_MAX)
>>> +		return adc_tm5_gen3_disable_channel(prop);
>>
>> Why disable the channel instead of returning an errno ?
>>
> 
> This is the convention we follow in our existing ADC_TM driver at
> drivers/thermal/qcom/qcom-spmi-adc-tm5.c. If both upper and lower
> thresholds are meant to be disabled, we disable the channel fully
> in HW to save some power and it can be enabled later if this API
> is called for it with valid thresholds.
> 
> Is it considered invalid in the thermal framework to try to disable
> both thresholds? Should I both disable the channel and return some
> error from here?

Well, if the channel is disabled, then the temperature sensor of the
thermal zone is disabled. Consequently, the thermal zone is disabled from
a HW POV but still enabled from the kernel POV.

Why not add the 'change_mode' ops and then disable the thermal zone (+ 
pm_runtime) ?
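
As a rough illustration of the split I have in mind, here is a userspace
model, not driver code. The struct and function names are hypothetical; in
the kernel the hook would be the change_mode callback of struct
thermal_zone_device_ops, which takes the thermal zone and an
enum thermal_device_mode. The -EINVAL check on inverted thresholds is an
assumption of the model:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Model only: mirrors THERMAL_DEVICE_DISABLED / THERMAL_DEVICE_ENABLED. */
enum mode_thermal { MODE_THERMAL_DISABLED = 0, MODE_THERMAL_ENABLED };

/* Hypothetical per-channel state. */
struct mode_channel {
	bool hw_enabled;	/* channel powered in HW */
	int low_mdegc;
	int high_mdegc;
};

/* The only place where the channel is switched on or off. */
int mode_change(struct mode_channel *ch, enum mode_thermal mode)
{
	assert(ch);
	ch->hw_enabled = (mode == MODE_THERMAL_ENABLED);
	return 0;
}

/*
 * Threshold programming never touches hw_enabled, so HW and kernel
 * state cannot diverge. -INT_MAX / INT_MAX just mean "no threshold".
 */
int mode_set_trips(struct mode_channel *ch, int low, int high)
{
	assert(ch);
	if (low > high)
		return -EINVAL;
	ch->low_mdegc = low;
	ch->high_mdegc = high;
	return 0;
}
```

The design choice is that set_trips only programs thresholds, while power
state follows the thermal zone mode.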

[ ... ]

>>> +	/*
>>> +	 * Skipping first SDAM IRQ as it is requested in parent driver.
>>> +	 * If there is a TM violation on that IRQ, the parent driver calls
>>> +	 * the notifier (adctm_event_handler) exposed from this driver to handle it.
>>> +	 */
>>> +	for (i = 1; i < adc_tm5->dev_data->num_sdams; i++) {
>>> +		ret = devm_request_threaded_irq(dev,
>>> +						adc_tm5->dev_data->base[i].irq,
>>> +						NULL, adctm5_gen3_isr, IRQF_ONESHOT,
>>> +						adc_tm5->dev_data->base[i].irq_name,
>>> +						adc_tm5);
>>
>> The threaded interrupts set the isr in a thread and from the thread
>> handling the event, there is a work queue scheduled. Why not use the
>> top and bottom halves of the threaded interrupt ? Hopefully you should
>> be able to remove the lock.
> 
> Yes, I can use the top and bottom halves of the threaded interrupt as you
> suggested. But what exactly do you mean by removing the lock?
> 
> If you meant the mutex lock used in this driver, we cannot remove that.
> This is because the ADC_TM driver needs to write into several registers
> shared with the main ADC driver for setting new thresholds, so we
> have to share a mutex between the drivers to prevent concurrency issues.

When a workqueue touches registers while an interrupt handler is doing
the same, the lock is needed.

But if the workqueue is replaced by a threaded interrupt, the lock *may*
not be needed because the design may prevent race conditions.

That may not be true in this case; I did not dig deeper into the code to
figure it out. Let's see the next version.
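
For reference, the top/bottom half flow can be modelled in userspace like
this. It is only a sketch with hypothetical names (split_irq, split_chip,
split_dispatch), not the kernel implementation. In the kernel the two
halves are the handler and thread_fn arguments of
devm_request_threaded_irq(); with a NULL handler the default primary
handler just returns IRQ_WAKE_THREAD. Since the register reads here sleep,
a real top half could not check the status the way this model does:

```c
#include <assert.h>
#include <stddef.h>

/* Model only: named after IRQ_NONE / IRQ_HANDLED / IRQ_WAKE_THREAD. */
enum split_irqreturn {
	SPLIT_IRQ_NONE,
	SPLIT_IRQ_HANDLED,
	SPLIT_IRQ_WAKE_THREAD,
};

typedef enum split_irqreturn (*split_handler_t)(void *data);

/* Hypothetical registration record: one top half, one bottom half. */
struct split_irq {
	split_handler_t top;	/* hard-IRQ context, must not sleep */
	split_handler_t bottom;	/* thread context, may sleep */
	void *data;
};

/* Hypothetical chip state. */
struct split_chip {
	int pending;	/* latched by the "hardware" */
	int handled;	/* violations processed by the bottom half */
};

/* Top half: only decides whether the thread must run. */
enum split_irqreturn split_top(void *data)
{
	struct split_chip *chip = data;

	return chip->pending ? SPLIT_IRQ_WAKE_THREAD : SPLIT_IRQ_NONE;
}

/* Bottom half: does the work that previously lived in the workqueue. */
enum split_irqreturn split_bottom(void *data)
{
	struct split_chip *chip = data;

	chip->handled++;
	chip->pending = 0;
	return SPLIT_IRQ_HANDLED;
}

/* Stand-in for the IRQ core: run the thread only when the top half asks. */
enum split_irqreturn split_dispatch(const struct split_irq *irq)
{
	enum split_irqreturn ret;

	assert(irq && irq->top && irq->bottom);
	ret = irq->top(irq->data);
	if (ret == SPLIT_IRQ_WAKE_THREAD)
		ret = irq->bottom(irq->data);
	return ret;
}
```

Because the bottom half runs as part of the interrupt flow, there is no
separate work item touching the registers behind the handler's back.
Whether that is enough to drop the mutex shared with the main ADC driver
is a separate question.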

> I'll address all your other comments too in the next version of this patch.

Thanks

   -- Daniel
