From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932300AbcDKPeE (ORCPT ); Mon, 11 Apr 2016 11:34:04 -0400 Received: from lists.s-osg.org ([54.187.51.154]:41785 "EHLO lists.s-osg.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755599AbcDKPdf (ORCPT ); Mon, 11 Apr 2016 11:33:35 -0400 Subject: Re: Odroid XU4 deadlock To: Krzysztof Kozlowski , Anand Moon References: <5707C673.9010404@osg.samsung.com> Cc: Viresh Kumar , Bartlomiej Zolnierkiewicz , Lukasz Majewski , "linux-samsung-soc@vger.kernel.org" , Linux Kernel From: Javier Martinez Canillas Message-ID: <570BC3C0.3020007@osg.samsung.com> Date: Mon, 11 Apr 2016 11:33:20 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Krzysztof, On 04/09/2016 07:00 AM, Krzysztof Kozlowski wrote: > On Fri, Apr 8, 2016 at 5:36 PM, Anand Moon wrote: >> hi Javier, >> >> On 8 April 2016 at 20:25, Javier Martinez Canillas >> wrote: >>> Hello Anand, >>> >>> On 04/08/2016 09:51 AM, Anand Moon wrote: >>>> Hi All, >>>> >>>> I am observing this deadlock or warning on my Odroid XU4. >>>> I thinks the is related to clk from exynos5_i2c_xfer .... clk_prepare_enable >>>> but I am not able to figure out which clk. >>>> >>> >>> By reading your logs, it seems the problem is a possible ABBA deadlock since >>> the S3C RTC driver grabs the prepare_lock and then the regmap->lock, and the >>> CPUFreq driver leads to the locks being grabbed in the inverse order. > > First of all this is not a deadlock. Yet. :) This is just lockdep Yes, that's why I said the problem seemed to be a possible deadlock :) > report of possible deadlock. This is deadlock: > http://www.krzk.eu/builders/boot-odroid-xu3-multi_v7/builds/13/steps/Boot%20odroid/logs/serial > > ...and I believe it is the same as the report from lockdep. Which > means that this report truly a predicted deadlock. There are minor > differences between deadlock on my board and report from Anand but > still the same devices are involved. In Anands report the involved > S2MPS11 regulator and S2MPS11 clock use the same regmap. In my case > one path comes from S5M RTC which uses different regmap than S2MPS11 > clock. This is the difference which puzzles me. > Yeah, that is puzzling indeed... > Anyway this is known issue. In case of my board mentioned deadlock > happens only on multi_v7 on around 15% of boots. I noticed it around > v4.4 and it happens still. > >>> >>>> [ 11.640221] ====================================================== >>>> [ 11.646343] [ INFO: possible circular locking dependency detected ] >>>> [ 11.652590] 4.6.0-rc2-xu4ml #35 Not tainted >>>> [ 11.656749] ------------------------------------------------------- >>>> [ 11.662992] kworker/1:1/85 is trying to acquire lock: >>>> [ 11.668010] (prepare_lock){+.+...}, at: [] >>>> clk_prepare_lock+0x50/0xf8 >>>> [ 11.675375] >>>> [ 11.675375] but task is already holding lock: >>>> [ 11.681185] (sec_core:428:(regmap)->lock){+.+...}, at: >>>> [] regmap_read+0x2c/0x5c >>>> [ 11.689417] >>>> [ 11.689417] which lock already depends on the new lock. >>>> [ 11.689417] >>>> [ 11.697561] >>>> [ 11.697561] the existing dependency chain (in reverse order) is: >>>> [ 11.705008] >>>> [ 11.705008] -> #1 (sec_core:428:(regmap)->lock){+.+...}: >>>> [ 11.710467] [] regmap_update_bits_base+0x30/0x70 >>>> [ 11.716545] [] s2mps11_clk_prepare+0x30/0x38 >>>> [ 11.722262] [] clk_core_prepare+0x98/0xbc >>>> [ 11.727711] [] clk_prepare+0x1c/0x2c >>>> [ 11.732734] [] s3c_rtc_probe+0x2d0/0x43c >>> >>> the S3C RTC driver tries to prepare the RTC source clock S2MPS11_CLK_AP >>> and that leads to the prepare_lock to be held and then the regmap->lock >>> since s2mps11_clk_prepare() calls regmap_update_bits(). >>> >>>> [ 11.738108] [] platform_drv_probe+0x4c/0xb0 >>>> [ 11.743749] [] driver_probe_device+0x20c/0x2b8 >>>> [ 11.749641] [] bus_for_each_drv+0x60/0x94 >>>> [ 11.755103] [] __device_attach+0xb4/0x118 >>>> [ 11.760558] [] bus_probe_device+0x88/0x90 >>>> [ 11.766020] [] deferred_probe_work_func+0x6c/0x9c >>>> [ 11.772169] [] process_one_work+0x1a8/0x514 >>>> [ 11.777806] [] worker_thread+0x38/0x56c >>>> [ 11.783082] [] kthread+0xf4/0x10c >>>> [ 11.787847] [] ret_from_fork+0x14/0x24 >>>> [ 11.793050] >>>> [ 11.793050] -> #0 (prepare_lock){+.+...}: >>>> [ 11.797212] [] mutex_lock_nested+0x78/0x4dc >>>> [ 11.802845] [] clk_prepare_lock+0x50/0xf8 >>>> [ 11.808304] [] clk_unprepare+0x1c/0x2c >>>> [ 11.813499] [] exynos5_i2c_xfer+0x1dc/0x304 >>> >>> and here the locks are grabbed in the inverse order, since the regulator >>> driver uses regmap read (grabbing the regmap->lock) and then a clock is >>> prepared in exynos5_i2c_xfer. >>> >>>> [ 11.819129] [] __i2c_transfer+0x13c/0x278 >>>> [ 11.824589] [] i2c_transfer+0x94/0xc4 >>>> [ 11.829701] [] regmap_i2c_read+0x48/0x64 >>>> [ 11.835074] [] _regmap_raw_read+0xb4/0x114 >>>> [ 11.840630] [] _regmap_bus_read+0x24/0x58 >>>> [ 11.846084] [] _regmap_read+0x60/0xb8 >>>> [ 11.851201] [] regmap_read+0x3c/0x5c >>>> [ 11.856224] [] regulator_get_voltage_sel_regmap+0x20/0x54 >>>> [ 11.863109] [] _regulator_get_voltage+0x20/0xb8 >>>> [ 11.863141] [] _regulator_do_set_voltage+0x240/0x370 >>>> [ 11.863173] [] regulator_set_voltage_unlocked+0xcc/0x230 >>>> [ 11.863205] [] regulator_set_voltage+0x28/0x54 >>>> [ 11.863234] [] _set_opp_voltage+0x30/0x98 >>>> [ 11.863264] [] dev_pm_opp_set_rate+0x1e0/0x540 >>>> [ 11.863337] [] __cpufreq_driver_target+0x168/0x290 >>>> [ 11.863375] [] od_dbs_timer+0xdc/0x164 >>>> [ 11.863407] [] dbs_work_handler+0x30/0x58 >>>> [ 11.863437] [] process_one_work+0x1a8/0x514 >>>> [ 11.863465] [] worker_thread+0x38/0x56c >>>> [ 11.863496] [] kthread+0xf4/0x10c >>>> [ 11.863528] [] ret_from_fork+0x14/0x24 >>>> [ 11.863538] >>>> [ 11.863538] other info that might help us debug this: >>>> [ 11.863538] >>>> [ 11.863548] Possible unsafe locking scenario: >>>> [ 11.863548] >>>> [ 11.863557] CPU0 CPU1 >>>> [ 11.863565] ---- ---- >>>> [ 11.863589] lock(sec_core:428:(regmap)->lock); >>>> [ 11.863612] lock(prepare_lock); >>>> [ 11.863634] lock(sec_core:428:(regmap)->lock); >>>> [ 11.863655] lock(prepare_lock); >>>> [ 11.863664] >>>> [ 11.863664] *** DEADLOCK *** >>>> [ 11.863664] >>> >>> I should take a look in deep to the regmap and clock subsystems to better >>> understand the issue on how to solve it. But AFAICT the problem is that >>> the prepare_lock is a global mutex and the same regmap is used for both >>> the s2mps11 clocks and regulators, so the ABBA deadlock described below >>> can happen. > > Exactly. One of ideas I remember was to make clocks mutex more > granular but I did not see any patches doing that. Another interesting > point is (in my case) deadlock itself (not lockdep report) happens > only on multi_v7, not on exynos defconfig. > I see that RTC_DRV_S5M is enabled as a module in multi_v7 but built-in in exynos_defconfig. Maybe that could cause the behavior difference? > BTW, Anand, if you report please provide more details for > reproduction: kernel config and reproduction steps. > I see that Anand provided his defconfig, so I'll try to reproduce the lockdep report and look at the issue deeper this week. > Best regards, > Krzysztof > Best regards, -- Javier Martinez Canillas Open Source Group Samsung Research America