Re: [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max

From: Sultan Alsawaf <sultan@kerneltoast.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux PM <linux-pm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Mario Limonciello <mario.limonciello@amd.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Christian Loehle <christian.loehle@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Valentin Schneider <vschneid@redhat.com>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max
Date: Sat, 19 Apr 2025 08:21:56 +1000
Message-ID: <aALQhEi609NQAV7S@sultan-box.localdomain>
In-Reply-To: <CAJZ5v0iAmutVUQtMP_yThRx2J39Ng96osv1BMHX0gRf-8oJ3TA@mail.gmail.com>

On Fri, Apr 18, 2025 at 09:42:15PM +0200, Rafael J. Wysocki wrote:
> On Fri, Apr 18, 2025 at 12:18 PM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> >
> > On Tue, Apr 15, 2025 at 12:04:21PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > Since cpufreq_driver_resolve_freq() can run in parallel with
> > > cpufreq_set_policy() and there is no synchronization between them,
> > > the former may access policy->min and policy->max while the latter
> > > is updating them and it may see intermediate values of them due
> > > to the way the update is carried out.  Also the compiler is free
> > > to apply any optimizations it wants both to the stores in
> > > cpufreq_set_policy() and to the loads in cpufreq_driver_resolve_freq()
> > > which may result in additional inconsistencies.
> > >
> > > To address this, use WRITE_ONCE() when updating policy->min and
> > > policy->max in cpufreq_set_policy() and use READ_ONCE() for reading
> > > them in cpufreq_driver_resolve_freq().  Moreover, rearrange the update
> > > in cpufreq_set_policy() to avoid storing intermediate values in
> > > policy->min and policy->max with the help of the observation that
> > > their new values are expected to be properly ordered upfront.
> > >
> > > Also modify cpufreq_driver_resolve_freq() to take the possible reverse
> > > ordering of policy->min and policy->max, which may happen depending on
> > > the ordering of operations when this function and cpufreq_set_policy()
> > > run concurrently, into account by always honoring the max when it
> > > turns out to be less than the min (in case it comes from thermal
> > > throttling or similar).
> > >
> > > Fixes: 151717690694 ("cpufreq: Make policy min/max hard requirements")
> > > Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >
> > > v1 -> v2: Minor edit in the subject
> > >
> > > ---
> > >  drivers/cpufreq/cpufreq.c |   46 ++++++++++++++++++++++++++++++++++++----------
> > >  1 file changed, 36 insertions(+), 10 deletions(-)
> > >
> > > --- a/drivers/cpufreq/cpufreq.c
> > > +++ b/drivers/cpufreq/cpufreq.c
> > > @@ -490,14 +490,12 @@
> > >  }
> > >  EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);
> > >
> > > -static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> > > -                                        unsigned int target_freq,
> > > -                                        unsigned int relation)
> > > +static unsigned int __resolve_freq(struct cpufreq_policy *policy,
> > > +                                unsigned int target_freq,
> > > +                                unsigned int relation)
> > >  {
> > >       unsigned int idx;
> > >
> > > -     target_freq = clamp_val(target_freq, policy->min, policy->max);
> > > -
> > >       if (!policy->freq_table)
> > >               return target_freq;
> > >
> > > @@ -507,6 +505,15 @@
> > >       return policy->freq_table[idx].frequency;
> > >  }
> > >
> > > +static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> > > +                                        unsigned int target_freq,
> > > +                                        unsigned int relation)
> > > +{
> > > +     target_freq = clamp_val(target_freq, policy->min, policy->max);
> > > +
> > > +     return __resolve_freq(policy, target_freq, relation);
> > > +}
> > > +
> > >  /**
> > >   * cpufreq_driver_resolve_freq - Map a target frequency to a driver-supported
> > >   * one.
> > > @@ -521,7 +528,22 @@
> > >  unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> > >                                        unsigned int target_freq)
> > >  {
> > > -     return clamp_and_resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE);
> > > +     unsigned int min = READ_ONCE(policy->min);
> > > +     unsigned int max = READ_ONCE(policy->max);
> > > +
> > > +     /*
> > > +      * If this function runs in parallel with cpufreq_set_policy(), it may
> > > +      * read policy->min before the update and policy->max after the update
> > > +      * or the other way around, so there is no ordering guarantee.
> > > +      *
> > > +      * Resolve this by always honoring the max (in case it comes from
> > > +      * thermal throttling or similar).
> > > +      */
> > > +     if (unlikely(min > max))
> > > +             min = max;
> > > +
> > > +     return __resolve_freq(policy, clamp_val(target_freq, min, max),
> > > +                           CPUFREQ_RELATION_LE);
> > >  }
> > >  EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
> > >
> > > @@ -2632,11 +2654,15 @@
> > >        * Resolve policy min/max to available frequencies. It ensures
> > >        * no frequency resolution will neither overshoot the requested maximum
> > >        * nor undershoot the requested minimum.
> > > +      *
> > > +      * Avoid storing intermediate values in policy->max or policy->min and
> > > +      * compiler optimizations around them because them may be accessed
> > > +      * concurrently by cpufreq_driver_resolve_freq() during the update.
> > >        */
> > > -     policy->min = new_data.min;
> > > -     policy->max = new_data.max;
> > > -     policy->min = clamp_and_resolve_freq(policy, policy->min, CPUFREQ_RELATION_L);
> > > -     policy->max = clamp_and_resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
> > > +     WRITE_ONCE(policy->max, __resolve_freq(policy, new_data.max, CPUFREQ_RELATION_H));
> > > +     new_data.min = __resolve_freq(policy, new_data.min, CPUFREQ_RELATION_L);
> > > +     WRITE_ONCE(policy->min, new_data.min > policy->max ? policy->max : new_data.min);
> >
> > I don't think this is sufficient, because this still permits an incoherent
> > policy->min and policy->max combination, which makes it possible for schedutil
> > to honor the incoherent limits; i.e., schedutil may observe old policy->min and
> > new policy->max or vice-versa.
> 
> Yes, it may, as stated in the new comment in cpufreq_driver_resolve_freq().

Thanks for pointing that out; I skipped that hunk while reviewing.

I skipped it because schedutil still reads policy->min/max unprotected via
cpufreq_policy_apply_limits() and __cpufreq_driver_target(), so the race still
affects those calls.

> > We also can't permit a wrong freq to be propagated to the driver and then send
> > the _right_ freq afterwards; IOW, we can't let a bogus freq slip through and
> > just correct it later.
> 
> The frequency is neither wrong nor bogus, it is only affected by one
> of the limits that were in effect previously or will be in effect
> going forward.  They are valid limits in either case.

I would argue that limits only make sense as a pair, not on their own. Checking
for min > max only covers the case where the new min exceeds the old max; this
means that, when min is raised without exceeding the old max, a thermal throttle
attempt could instead result in a raised frequency floor:

	1. policy->min == 100000, policy->max == 2500000
	2. Policy limit update request: new min of 400000, new max of 500000
	3. schedutil observes policy->min == 400000, policy->max == 2500000

Raising the min freq while lowering the max freq can be a valid thermal throttle
scheme. But it only makes sense if both limits are applied simultaneously.

> > How about using a seqlock?
> 
> This would mean extra overhead in the scheduler path pretty much for no gain.

Or there's the slightly cursed approach of using a union to facilitate an atomic
64-bit store of policy->min and max at the same time, since min/max are 32 bits.

Sultan

Thread overview: 18+ messages
2025-04-15  9:52 [PATCH v2 0/6] cpufreq/sched: Improve synchronization of policy limits updates with schedutil Rafael J. Wysocki
2025-04-15  9:58 ` [PATCH v2 1/6] cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS Rafael J. Wysocki
2025-04-16 11:35   ` Christian Loehle
2025-04-20  1:10   ` Sultan Alsawaf
2025-04-15  9:59 ` [PATCH v2 2/6] cpufreq/sched: Explicitly synchronize limits_changed flag handling Rafael J. Wysocki
2025-04-16 12:01   ` Christian Loehle
2025-04-16 12:28     ` Rafael J. Wysocki
2025-04-15 10:00 ` [PATCH v2 3/6] cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit() Rafael J. Wysocki
2025-04-16 12:26   ` Christian Loehle
2025-04-15 10:02 ` [PATCH v2 4/6] cpufreq: Rename __resolve_freq() to clamp_and_resolve_freq() Rafael J. Wysocki
2025-04-15 10:04 ` [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max Rafael J. Wysocki
2025-04-16 12:39   ` Christian Loehle
2025-04-16 12:50     ` Rafael J. Wysocki
2025-04-18 10:18   ` Sultan Alsawaf
2025-04-18 19:42     ` Rafael J. Wysocki
2025-04-18 22:21       ` Sultan Alsawaf [this message]
2025-04-19 10:39         ` Rafael J. Wysocki
2025-04-15 10:05 ` [PATCH v2 6/6] cpufreq: Eliminate clamp_and_resolve_freq() Rafael J. Wysocki
