From mboxrd@z Thu Jan 1 00:00:00 1970 From: Juri Lelli Subject: Re: [RFC PATCH 15/19] cpufreq: remove useless usage of cpufreq_governor_mutex in __cpufreq_governor Date: Tue, 19 Jan 2016 16:49:41 +0000 Message-ID: <20160119164941.GI8573@e106622-lin> References: <1452533760-13787-1-git-send-email-juri.lelli@arm.com> <1452533760-13787-16-git-send-email-juri.lelli@arm.com> <20160112110658.GG1084@ubuntu> <20160115163031.GU18603@e106622-lin> <20160118055034.GC30762@vireshk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Received: from foss.arm.com ([217.140.101.70]:40856 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754881AbcASQtO (ORCPT ); Tue, 19 Jan 2016 11:49:14 -0500 Content-Disposition: inline In-Reply-To: <20160118055034.GC30762@vireshk> Sender: linux-pm-owner@vger.kernel.org List-Id: linux-pm@vger.kernel.org To: Viresh Kumar Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, peterz@infradead.org, rjw@rjwysocki.net, mturquette@baylibre.com, steve.muckle@linaro.org, vincent.guittot@linaro.org, morten.rasmussen@arm.com, dietmar.eggemann@arm.com On 18/01/16 11:20, Viresh Kumar wrote: > On 15-01-16, 16:30, Juri Lelli wrote: > > But governor_enabled seems to not be checked anymore outside cpufreq.c > > (see also 01/19), as it was in the commit you are referring to. > > Okay, I must have told you this earlier but anyway .. > > governor_enabled was introduced long back to keep governor state > changes serialized. Because of the complex cases we had in hand > (governor-per-policy or system wide governors, etc.), it failed to do > so. Though simple races were avoided with it, complex ones still came > back to haunt us. > > We fixed that by managing state changes within ondemand and > conservative governors instead and that worked very well. > > Then I wrote a patch to kill the stupid code around governor_enabled > thing, but I got into few races. Those races happened because of > userspace governor, which was getting into invalid states on some > extreme cases (These were caught using the test-suite I wrote and you > perhaps used it). > > And I never came back to fix those corner cases .. > OK, thanks for the explanation. > You can try that on ARM or x86 by running following command from my > test-suite (I remember that you are using it, right?): > Yep, I'm constantly running those on my boxes. > ./runme.sh -f sp1 or sp2 or sp3 > > Only one of sp1, sp2 or sp3 is required.. > I'm actually hitting this running sp2, on linux-pm/linux-next :/. ====================================================== [ INFO: possible circular locking dependency detected ] 4.4.0+ #445 Not tainted ------------------------------------------------------- trace.sh/1723 is trying to acquire lock: (s_active#48){++++.+}, at: [] kernfs_remove_by_name_ns+0x4c/0x94 but task is already holding lock: (od_dbs_cdata.mutex){+.+.+.}, at: [] cpufreq_governor_dbs+0x34/0x5d4 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (od_dbs_cdata.mutex){+.+.+.}: [] mutex_lock_nested+0x7c/0x434 [] cpufreq_governor_dbs+0x34/0x5d4 [] return_to_handler+0x0/0x18 -> #1 (&policy->rwsem){+++++.}: [] down_read+0x58/0x94 [] show+0x30/0x60 [] sysfs_kf_seq_show+0x90/0xfc [] kernfs_seq_show+0x34/0x38 [] seq_read+0x1e4/0x4e4 [] kernfs_fop_read+0x120/0x1a0 [] __vfs_read+0x3c/0xe0 [] vfs_read+0x98/0x104 [] SyS_read+0x50/0x90 [] ret_fast_syscall+0x0/0x1c -> #0 (s_active#48){++++.+}: [] lock_acquire+0xd4/0x20c [] __kernfs_remove+0x288/0x328 [] kernfs_remove_by_name_ns+0x4c/0x94 [] remove_files+0x44/0x88 [] sysfs_remove_group+0x50/0xa4 [] cpufreq_governor_dbs+0x3f0/0x5d4 [] return_to_handler+0x0/0x18 other info that might help us debug this: Chain exists of: s_active#48 --> &policy->rwsem --> od_dbs_cdata.mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(od_dbs_cdata.mutex); lock(&policy->rwsem); lock(od_dbs_cdata.mutex); lock(s_active#48); *** DEADLOCK *** 5 locks held by trace.sh/1723: #0: (sb_writers#6){.+.+.+}, at: [] __sb_start_write+0xb4/0xc0 #1: (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0x6c/0x1c8 #2: (s_active#35){.+.+.+}, at: [] kernfs_fop_write+0x74/0x1c8 #3: (cpu_hotplug.lock){++++++}, at: [] get_online_cpus+0x48/0xb8 #4: (od_dbs_cdata.mutex){+.+.+.}, at: [] cpufreq_governor_dbs+0x34/0x5d4 stack backtrace: CPU: 2 PID: 1723 Comm: trace.sh Not tainted 4.4.0+ #445 Hardware name: ARM-Versatile Express [] (unwind_backtrace) from [] (show_stack+0x20/0x24) [] (show_stack) from [] (dump_stack+0x80/0xb4) [] (dump_stack) from [] (print_circular_bug+0x29c/0x2f0) [] (print_circular_bug) from [] (__lock_acquire+0x163c/0x1d74) [] (__lock_acquire) from [] (lock_acquire+0xd4/0x20c) [] (lock_acquire) from [] (__kernfs_remove+0x288/0x328) [] (__kernfs_remove) from [] (kernfs_remove_by_name_ns+0x4c/0x94) [] (kernfs_remove_by_name_ns) from [] (remove_files+0x44/0x88) [] (remove_files) from [] (sysfs_remove_group+0x50/0xa4) [] (sysfs_remove_group) from [] (cpufreq_governor_dbs+0x3f0/0x5d4) [] (cpufreq_governor_dbs) from [] (return_to_handler+0x0/0x18) Now, I couldn't yet make sense of this, but it seems to be triggered by setting ondemand, printing its attributes and then switching to conservative (that's what sp2 does, right?). Also, s_active seems to come into play only when lockdep is enabled. Are you seeing this as well? > > Now that > > users of this should be holding policy->rwsem, so that should suffice > > for protecting governor_enabled, as governor_enabled is only changed > > inside here. > > If we can get rid of the rwsem dropping problem, then yeah this can be > killed for sure. > OK. > > I run some test on a x86 box I setup and didn't see anything related to > > this. I'll wait to get the first 0-day report anyway. > 0-day is setup. I didn't yet receive any major bad thing from it :). > Okay, so run the above test and make sure you have following enabled > in your configuration: > > CONFIG_LOCKDEP_SUPPORT=y > CONFIG_DEBUG_RT_MUTEXES=y > CONFIG_DEBUG_PI_LIST=y > CONFIG_DEBUG_SPINLOCK=y > CONFIG_DEBUG_MUTEXES=y > CONFIG_DEBUG_LOCK_ALLOC=y > CONFIG_PROVE_LOCKING=y > CONFIG_LOCKDEP=y > CONFIG_DEBUG_ATOMIC_SLEEP=y > Yep, that's what I normally use for developing. Thanks, - Juri > > > > - mutex_lock(&cpufreq_governor_mutex); > > > > if ((policy->governor_enabled && event == CPUFREQ_GOV_START) > > > > || (!policy->governor_enabled > > > > && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) { > > > > - mutex_unlock(&cpufreq_governor_mutex); > > > > return -EBUSY; > > > > } > > > > > > Actually the above checks should also be removed as the governors are > > > responsible for maintaining their state machines. But > > > userspace/powersave/performance don't have that support yet and so > > > these checks save them from going into undefined states. > > > > > > Over that, above and below checks are incomplete.. > > > > > > > You mean we need an additional patch that extends the checks performed? > > Yeah, we need to add some state-management code in > userspace/powersave/performance governors as well. > > -- > viresh >