From: Bjorn Andersson <bjorn.andersson@linaro.org>
To: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Andy Gross <agross@kernel.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
linux-arm-msm@vger.kernel.org, linux-pm@vger.kernel.org,
Thara Gopinath <thara.gopinath@gmail.com>
Subject: Re: [PATCH v2 2/4] cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
Date: Thu, 24 Mar 2022 09:00:03 -0700
Message-ID: <YjyVgx08RiFDYwT5@ripper>
In-Reply-To: <20220309223938.3819715-3-dmitry.baryshkov@linaro.org>

On Wed 09 Mar 14:39 PST 2022, Dmitry Baryshkov wrote:
> qcom_lmh_dcvs_poll() can be running while the CPU is being put
> offline. This results in the following warnings and an oops. The
> driver does disable the worker, but only close to the end of
> cpufreq_offline(). Change the locking in qcom_lmh_dcvs_poll() so that
> the worker cannot run in parallel with the cpufreq_offline() call.
>
> [ 55.650435] (NULL device *): dev_pm_opp_find_freq_floor: Invalid argument freq=00000000709ccbf9
> [ 55.659420] (NULL device *): Can't find the OPP for throttling: -EINVAL!
> [ 55.666329] Unable to handle kernel paging request at virtual address ffffadfba4bb6d81
> [ 55.674491] Mem abort info:
> [ 55.677363] ESR = 0x96000004
> [ 55.680527] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 55.686001] SET = 0, FnV = 0
> [ 55.689164] EA = 0, S1PTW = 0
> [ 55.692418] FSC = 0x04: level 0 translation fault
> [ 55.697449] Data abort info:
> [ 55.700426] ISV = 0, ISS = 0x00000004
> [ 55.704383] CM = 0, WnR = 0
> [ 55.707455] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000a98e9000
> [ 55.714354] [ffffadfba4bb6d81] pgd=0000000000000000, p4d=0000000000000000
> [ 55.721388] Internal error: Oops: 96000004 [#1] SMP
> [ 55.726397] Modules linked in:
> [ 55.729542] CPU: 7 PID: 162 Comm: kworker/7:1H Tainted: G S W 5.17.0-rc6-00100-g04890a1d9672 #724
> [ 55.746066] Workqueue: events_highpri qcom_lmh_dcvs_poll
> [ 55.751527] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 55.758669] pc : cpufreq_cpu_get_raw+0x20/0x44
> [ 55.763233] lr : qcom_cpufreq_hw_get+0x10/0x64
> [ 55.767799] sp : ffff800009983d10
> [ 55.771207] x29: ffff800009983d10 x28: ffffaa13a4f2b000 x27: ffff7b31329f9305
> [ 55.778530] x26: ffffaa13a4f30af8 x25: ffffaa13a4f4e4c8 x24: ffff7b2ec2eda000
> [ 55.785851] x23: ffffffffffffffea x22: ffff7b2ec2e9fc18 x21: ffff7b2ec2e9fc00
> [ 55.793170] x20: 0000000000000100 x19: ffff7b2ec2e9fcc0 x18: ffffffffffffffff
> [ 55.800490] x17: 726620746e656d75 x16: 6772612064696c61 x15: ffff8000899839c7
> [ 55.807812] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [ 55.815140] x11: ffff7b2ec2e9fc80 x10: ffffaa13a59a1a70 x9 : 0000000000000000
> [ 55.822468] x8 : ffff7b2eca6917c0 x7 : ffffaa13a528b000 x6 : 0000000000000001
> [ 55.829788] x5 : 0000000000040000 x4 : 000000000000024f x3 : 0000000000000000
> [ 55.837116] x2 : 0000000000000100 x1 : ffffaa13a4bb6d80 x0 : 000003e800000001
> [ 55.844439] Call trace:
> [ 55.846951] cpufreq_cpu_get_raw+0x20/0x44
While I don't have the line numbers, I presume this would be caused by
policy->cpus being empty, so that:

	int cpu = cpumask_first(policy->cpus);

returns >= nr_cpu_ids, which means that the get_cpu_device(cpu) on the
next line returns NULL and we then keep calling the dev_pm_opp_*() API
on that NULL device.

Seems like this is exactly the same mistake as I made wrt policy->cpus
vs policy->related_cpus: we don't actually need that specific CPU, we
just need some cpu device in the frequency domain. As such we should
do cpumask_first(policy->related_cpus) instead.
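
I.e. something like this (untested sketch, based on my reading of the
driver, so take the exact context lines with a grain of salt):

--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
-	int cpu = cpumask_first(policy->cpus);
+	int cpu = cpumask_first(policy->related_cpus);
 	struct device *dev = get_cpu_device(cpu);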
> [ 55.851155] qcom_lmh_dcvs_poll+0x104/0x160
> [ 55.855452] process_one_work+0x288/0x69c
> [ 55.859574] worker_thread+0x74/0x470
> [ 55.863336] kthread+0xfc/0x100
> [ 55.866568] ret_from_fork+0x10/0x20
> [ 55.870246] Code: b00065c1 91360021 d503233f f8625800 (f8616800)
> [ 55.876501] ---[ end trace 0000000000000000 ]---
>
> Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
> drivers/cpufreq/qcom-cpufreq-hw.c | 24 +++++++++++++++++++++---
> 1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index 44d46e52baea..7c1bb002e1c3 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -296,6 +296,23 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
> struct dev_pm_opp *opp;
> unsigned int freq;
>
> + /*
> + * Synchronize against CPU going offline.
> + * cpufreq_offline() will get the write lock on policy->rwsem.
> + */
> +retry:
> + if (unlikely(!down_read_trylock(&policy->rwsem))) {
> + mutex_lock(&data->throttle_lock);
> + if (data->cancel_throttle) {
> + mutex_unlock(&data->throttle_lock);
> + return;
> + }
> +
> + mutex_unlock(&data->throttle_lock);
> +
> + schedule();
> + goto retry;
> + }
And doing that instead would remove the need for this crazy locking
between the cpufreq core and the qcom driver.

The above change would be trivial and -rc material.
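
That would also let the worker keep the original throttle_lock scheme
around the cancel check, instead of taking policy->rwsem. A rough,
untested sketch of how qcom_lmh_dcvs_notify() could then look:

	struct cpufreq_policy *policy = data->policy;
	/* any cpu device in the frequency domain will do */
	int cpu = cpumask_first(policy->related_cpus);
	struct device *dev = get_cpu_device(cpu);
	...
	/* unchanged from before this patch */
	mutex_lock(&data->throttle_lock);
	if (data->cancel_throttle)
		goto out;
	...
out:
	mutex_unlock(&data->throttle_lock);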
Regards,
Bjorn
> /*
> * Get the h/w throttled frequency, normalize it using the
> * registered opp table and use it to calculate thermal pressure.
> @@ -314,9 +331,10 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>
> /*
> * In the unlikely case policy is unregistered do not enable
> - * polling or h/w interrupt
> + * polling or h/w interrupt.
> + * If we are here, we have the policy->rwsem read lock,
> + * cancel_throttle can be toggled only with the write lock.
> */
> - mutex_lock(&data->throttle_lock);
> if (data->cancel_throttle)
> goto out;
>
> @@ -331,7 +349,7 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
> msecs_to_jiffies(10));
>
> out:
> - mutex_unlock(&data->throttle_lock);
> + up_read(&policy->rwsem);
> }
>
> static void qcom_lmh_dcvs_poll(struct work_struct *work)
> --
> 2.34.1
>