From: Juri Lelli
Subject: Re: sched-freq locking
Date: Wed, 20 Jan 2016 15:58:29 +0000
Message-ID: <20160120155829.GV8573@e106622-lin>
In-Reply-To: <569F9EB3.60600@linaro.org>
To: Steve Muckle
Cc: Michael Turquette, Vincent Guittot, Patrick Bellasi,
 Morten Rasmussen, Dietmar Eggemann, Viresh Kumar,
 "linux-kernel@vger.kernel.org", "linux-pm@vger.kernel.org",
 Peter Zijlstra, "Rafael J. Wysocki", Javi Merino, Punit Agrawal

On 20/01/16 06:50, Steve Muckle wrote:
> On 01/20/2016 04:18 AM, Juri Lelli wrote:
> > I fear that caching could break thermal. If everybody were already
> > using the sched-freq interface to control frequency this probably
> > wouldn't be a problem, but we are not there yet :(. So, IIUC, by
> > caching policy->max, for example, we might affect what thermal
> > expects from cpufreq.
>
> I think we could probably just access policy->max without rwsem, as
> long as we ensure policy is valid. Cpufreq will take care of capping a
> frequency request to any upper limit put in by thermal anyway, so I
> don't think it's a problematic race. But I haven't spent much time
> thinking about it.
>

Mmm, can't it be a problem that the scheduler might think it can still
request a frequency above max? Anyway, thermal seems to use
cpufreq_set_policy() for changing max (if I got it right, Punit :)), so
we might be able to intercept such sites and properly update the cached
values.

> > ...
> >> Done (added linux-pm, PeterZ and Rafael as well).
> >>
> >
> > This discussion is pretty interesting, yes. I'm a bit afraid that
> > people who bump into this might have trouble understanding the
> > context, though. And I'm not sure how to give them that context;
> > maybe start a new thread summarizing what has been discussed so far?
>
> Yeah, that occurred to me as well. I wasn't sure whether to restart
> the thread or put in the locking I had in mind and just send it with
> the next schedfreq RFC series, which should be out in the next few
> weeks. I was going to do the latter, but here is a recap of the topic
> from my side:
>

Thanks a lot for writing this down!

> Currently schedfreq has to go through two stages of aggregation. The
> first is aggregating the capacity request votes from the different
> sched classes within a CPU. The second is aggregating the capacity
> request votes from the CPUs within a frequency domain. I'm looking to
> solve the locking issues with both of these stages, as well as those
> with calling into cpufreq in the fast path.
>
> For the first stage, currently we assume that the rq lock is held.
> This is the case for the existing calls into schedfreq (the load
> balancer required a minor change). There are a few more hooks that
> need to go into the migration paths in core.c, which will be slightly
> more painful, but they are IMO doable.
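
Right, and just to check we are reading the first stage the same way:
something like the sketch below is what I picture for the per-CPU
aggregation (illustrative only, all names are made up, and it assumes
we are inside kernel/sched/ with the rq lock held by the caller):

#include <linux/percpu.h>

/* None of these names exist in the tree; they are for illustration. */
enum vote_class { VOTE_CFS, VOTE_RT, VOTE_DL, NR_VOTE_CLASSES };

struct capacity_votes {
	unsigned long req[NR_VOTE_CLASSES];
};

static DEFINE_PER_CPU(struct capacity_votes, capacity_votes);

/*
 * Stage 1: record @req as @class's vote for @cpu and return the sum of
 * the votes. The read-modify-write is serialized by the caller holding
 * cpu_rq(cpu)->lock, so no extra locking is needed at this level.
 */
static unsigned long update_capacity_vote(int cpu, enum vote_class class,
					  unsigned long req)
{
	struct capacity_votes *v = &per_cpu(capacity_votes, cpu);
	unsigned long total = 0;
	int i;

	lockdep_assert_held(&cpu_rq(cpu)->lock);

	v->req[class] = req;
	for (i = 0; i < NR_VOTE_CLASSES; i++)
		total += v->req[i];

	return total;	/* this feeds the stage-2 (per-domain) aggregation */
}

If that matches what you have in mind, then the remaining work is
making sure every path that changes a vote (including the migration
hooks you mention) is covered while the right rq lock is held.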
> For the second stage I believe an internal schedfreq spinlock is
> required, for one thing to protect against competing updates within
> the same frequency domain from different CPUs, but also so that we
> can guarantee the cpufreq policy is valid, which can be done if the
> governor_start/stop callbacks take the spinlock as well.
>

Does this need to nest within the rq lock?

> As for accessing various things in cpufreq->policy and trying to take
> rwsem in the fast path, we should be able to either cache some of the
> items in the governor_start() path, eliminate the references, or
> access them without locking rwsem (as long as we know the policy is
> valid,

If we only need to guarantee existence of the policy, without any
direct updates, RCU might be a good fit.

> which we do by taking the spinlock in governor_start()). Here are the
> things we currently reference in the schedfreq set-capacity hook and
> my thoughts on each of them:
>
> policy->governor: this is currently tested to see if schedfreq is
> enabled, but it can be tracked internally to schedfreq and set in the
> governor_start/stop callbacks
>
> policy->governor_data: used to access schedfreq's data for the
> policy; it could be changed to an internal schedfreq percpu pointer
>
> policy->cpus, policy->freq_table: these can be cached internally to
> schedfreq on governor_start/stop
>
> policy->max: as mentioned above, I *think* we could get away with
> accessing this without locking rwsem
>
> policy->transition_ongoing: this isn't strictly required, as
> schedfreq could track internally whether it has a transition
> outstanding, but we should also be okay to look at it without
> policy->rwsem since schedfreq is the only one queuing transitions,
> again assuming we take care to ensure the policy is valid as above
>
> Finally, when actually requesting a frequency change in the fast
> path, we can trylock policy->rwsem and fall back to the slow path
> (kthread) if that fails.
>

OTOH, all the needed aggregation could make the fast path not so fast
in the end. So, maybe having both a fast and a slow path is always a
good thing?

Thanks,

- Juri
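
P.S.: just so we are sure we mean the same thing by that last point,
this is roughly how I read the trylock proposal (again a sketch only:
made-up names, no error handling, and it assumes a driver that can
change frequency without sleeping when called from this context):

#include <linux/cpufreq.h>
#include <linux/percpu.h>
#include <linux/rwsem.h>
#include <linux/sched.h>

/* Illustrative: pending request consumed by the slow-path kthread. */
static DEFINE_PER_CPU(unsigned int, pending_freq);
static struct task_struct *slowpath_task;

static void sched_freq_request(struct cpufreq_policy *policy,
			       unsigned int target_freq)
{
	/*
	 * Fast path: if we get policy->rwsem without blocking, the
	 * policy cannot go away under us and we can issue the request
	 * directly.
	 */
	if (down_write_trylock(&policy->rwsem)) {
		__cpufreq_driver_target(policy, target_freq,
					CPUFREQ_RELATION_L);
		up_write(&policy->rwsem);
		return;
	}

	/*
	 * Slow path: rwsem is contended; stash the request and let the
	 * kthread take the rwsem the normal (sleeping) way.
	 */
	per_cpu(pending_freq, policy->cpu) = target_freq;
	wake_up_process(slowpath_task);
}

This is also where my worry about the aggregation cost comes in: even
the "fast" branch above is not free, so keeping both paths around looks
sensible to me anyway.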