From mboxrd@z Thu Jan 1 00:00:00 1970 From: Russell King - ARM Linux Subject: Re: 3.18: lockdep problems in cpufreq Date: Mon, 15 Dec 2014 13:46:36 +0000 Message-ID: <20141215134636.GI11285@n2100.arm.linux.org.uk> References: <20141214213655.GA11285@n2100.arm.linux.org.uk> <7573578.UE8ufgjWuX@vostro.rjw.lan> <002f01d0186b$2700b730$75022590$%brar@samsung.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: Received: from gw-1.arm.linux.org.uk ([78.32.30.217]:59751 "EHLO pandora.arm.linux.org.uk" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751022AbaLONqz (ORCPT ); Mon, 15 Dec 2014 08:46:55 -0500 Content-Disposition: inline In-Reply-To: <002f01d0186b$2700b730$75022590$%brar@samsung.com> Sender: linux-pm-owner@vger.kernel.org List-Id: linux-pm@vger.kernel.org To: Yadwinder Singh Brar Cc: 'Viresh Kumar' , "'Rafael J. Wysocki'" , linux-arm-kernel@lists.infradead.org, linux-pm@vger.kernel.org, 'Eduardo Valentin' On Mon, Dec 15, 2014 at 06:58:41PM +0530, Yadwinder Singh Brar wrote: > Unfortunately, I didn=E2=80=99t get any such warning though I tested > patch enabling CONFIG_PROVE_LOCKING before posting. It seems > Russell is trying to load imx_thermal as module and parallely > Changing cpufreq governor from userspace, which was not my > test case. No. Yes, imx_thermal is a module, which is loaded by udev very early in the userspace boot. Later on, in the rc.local, I poke the governor from userspace. This is evidenced by the bluetooth modules being loaded after imx_therm= al: Module Size Used by bnep 10574 2 rfcomm 33720 0 bluetooth 293598 10 bnep,rfcomm nfsd 90264 0 exportfs 3988 1 nfsd hid_cypress 1763 0 snd_soc_fsl_spdif 9753 2 imx_pcm_dma 1137 1 snd_soc_fsl_spdif imx_sdma 12885 2 imx2_wdt 3248 0 imx_thermal 5386 0 snd_soc_imx_spdif 1877 0 and the timestamp on the bluetooth message: [ 25.503739] Bluetooth: Core ver 2.19 vs the timestamp on the lockdep message: [ 29.499195] [ INFO: possible circular locking dependency detected ] My rc.local does this: # Configure cpufreq echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_gov= ernor > Anyways, after analyzing log and code,I think problem is not > in cpufreq_thermal_notifier which was modified in patch as > stated above. Actual problem is in __cpufreq_cooling_register > which is unnecessarily calling cpufreq_register_notifier() > inside section protected by cooling_cpufreq_lock. > Because cpufreq_policy_notifier_list).rwsem is already held > by store_scaling_governor when __cpufreq_cooling_register is > trying to cpufreq_policy_notifier_list while holding cooling_cpufreq_= lock.=20 >=20 > So I think following can fix the problem: >=20 > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cool= ing.c > index ad09e51..622ea40 100644 > --- a/drivers/thermal/cpu_cooling.c > +++ b/drivers/thermal/cpu_cooling.c > @@ -484,15 +484,15 @@ __cpufreq_cooling_register(struct device_node *= np, > cpufreq_dev->cpufreq_state =3D 0; > mutex_lock(&cooling_cpufreq_lock); > =20 > - /* Register the notifier for first cpufreq cooling device */ > - if (cpufreq_dev_count =3D=3D 0) > - cpufreq_register_notifier(&thermal_cpufreq_notifier_b= lock, > - CPUFREQ_POLICY_NOTIFIER); > cpufreq_dev_count++; > list_add(&cpufreq_dev->node, &cpufreq_dev_list); > =20 > mutex_unlock(&cooling_cpufreq_lock); > =20 > + /* Register the notifier for first cpufreq cooling device */ > + if (cpufreq_dev_count =3D=3D 0) > + cpufreq_register_notifier(&thermal_cpufreq_notifier_b= lock, > + CPUFREQ_POLICY_NOTIFIER); No, sorry, this patch is worse. 1. cpufreq_register_notifier() will never be called, even on the first caller to this code: at the point where it's tested for zero, it wil= l be incremented to one by the previous code. 2. What happens if two threads come here? The first thread succeeds in grabbing cooling_cpufreq_lock. The sec= ond thread stalls waiting for cooling_cpufreq_lock to be released. The first thread increments cpufreq_dev_count, adds to the list, and= then releases the lock. At this point, control may be passed to the seco= nd thread (since mutex_unlock() will wake it up.) The second thread ge= ts into the critical region, and increments cpufreq_dev_count again. By the time the first thread runs, cpufreq_dev_count may be two at t= his point. Unfortunately, you do need some kind of synchronisation here. If it's not important when cpufreq_register_notifier() gets called, maybe this can work: bool register; mutex_lock(&cooling_cpufreq_lock); register =3D cpufreq_dev_count++ =3D=3D 0; list_add(&cpufreq_dev->node, &cpufreq_dev_list); mutex_unlock(&cooling_cpufreq_lock); if (register) cpufreq_register_notifier(&thermal_cpufreq_notifier_block, CPUFREQ_POLICY_NOTIFIER); However, I suspect that may well be buggy if we have a cleanup path whi= ch unregisters the notifier when cpufreq_dev_count is decremented to zero.= =2E. which we seem to have in cpufreq_cooling_unregister(). The answer may well be to have finer grained locking here: - one lock to protect cpufreq_dev_list, which is only ever taken when adding or removing entries from it - a second lock to protect cpufreq_dev_count and the calls to cpufreq_register_notifier() and cpufreq_unregister_notifier() and you would /never/ take either of those two locks inside each other. In other words: mutex_lock(&cooling_list_lock); list_add(&cpufreq_dev->node, &cpufreq_dev_list); mutex_unlock(&cooling_list_lock); mutex_lock(&cooling_cpufreq_lock); if (cpufreq_dev_count++ =3D=3D 0) cpufreq_register_notifier(&thermal_cpufreq_notifier_block, CPUFREQ_POLICY_NOTIFIER); mutex_unlock(&cooling_cpufreq_lock); and similar in the cleanup path. The notifier itself would only ever take the cooling_list_lock. --=20 =46TTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up according to speedtest.net.