From mboxrd@z Thu Jan 1 00:00:00 1970 From: skannan@codeaurora.org Subject: Re: [PATCH] clk: Fix race conditions between clk_set_parent() and clk_enable() Date: Wed, 16 May 2012 02:19:55 -0700 (PDT) Message-ID: <20357906133225932a45e9cefefac829.squirrel@www.codeaurora.org> References: <1336798797-8724-1-git-send-email-skannan@codeaurora.org> <20120515194245.GO30400@pengutronix.de> <4FB2B3AA.3010903@codeaurora.org> <20120515200040.GP30400@pengutronix.de> <4FB2B7DC.4070706@codeaurora.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: Received: from wolverine02.qualcomm.com ([199.106.114.251]:65279 "EHLO wolverine02.qualcomm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932936Ab2EPJTz (ORCPT ); Wed, 16 May 2012 05:19:55 -0400 In-Reply-To: Sender: linux-arm-msm-owner@vger.kernel.org List-Id: linux-arm-msm@vger.kernel.org To: "Turquette, Mike" Cc: Saravana Kannan , Sascha Hauer , Andrew Lunn , Paul Walmsley , Linus Walleij , Stephen Boyd , linux-arm-msm@vger.kernel.org, Mark Brown , Magnus Damm , linux-kernel@vger.kernel.org, Rob Herring , Richard Zhao , Grant Likely , Deepak Saxena , Thomas Gleixner , Shawn Guo , Amit Kucheria , Jamie Iles , Russell King , Arnd Bergman , Jeremy Kerr , linux-arm-kernel@lists.infradead.or > On Tue, May 15, 2012 at 1:09 PM, Saravana Kannan > wrote: >> On 05/15/2012 01:00 PM, Sascha Hauer wrote: >>> >>> On Tue, May 15, 2012 at 12:51:06PM -0700, Saravana Kannan wrote: >>>>>> >>>>>> =A0 =A0 =A0 =A0ret =3D clk->ops->set_parent(clk->hw, i); >>>>> >>>>> >>>>> You call ->set_parent while holding a spinlock. This won't work w= ith >>>>> i2c >>>>> clocks. >>>> >>>> >>>> I did account for that. I explained it in the commit text. Please >>>> let me know if any part of that is not clear or is not correct. >>>> >>> >>> I missed this part in the commit log. I have no idea whether we can >>> live >>> with this limitation though. >>> >>> Sascha >>> >> >> It's not really an artificial limitation of the patch. This has to b= e >> enforced if the clock is to be managed correctly while allowing >> .set_parent >> to NOT be atomic. >> >> There is no way to guarantee that the enable/disable is properly >> propagated >> to the parent clock if we can't guarantee mutual exclusion between >> changing >> parents and calling enable/disable. >> > > We know for sure that __clk_enable was propagated since it won't > return until it is done. The bigger issue here is the inherent > prepare_lock/enable_lock raciness, which this patch does not solve. > Your approach above is like putting a band-aid gunshot wound :-) No, I'm not trying to fix a race that's purely between prepare and enab= le. The operation of updating the HW state (calling .set_parent) and the operation of updating the SW state (setting clk->parent =3D new parent)= is not atomic with respect to other code that cares about the HW/SW state (enable/disable). This would be similar to holding the enable spinlock around incrementing the enable count but not holding it while calling =2Eenable on the clock -- that is just plain wrong. If you still claim that's this is nothing more than prepare/enable race= , then why are we implementing a half-baked enable/disable propagation up the parents? In it's current state, it's never guaranteed to be correct for ANY clock. Why give an illusion of correctness? Just don't do it -- that will make the common clock framework less useful, but it's better than something that pretends to be correct. Also, just because we pushed the enforcement of correctness between clk_prepare() and clk_enable() to the end user doesn't mean we should throw up our arms and push all the correctness enforcement to the end user. The API should try to limit the amount of burden of correctness t= hat we put on the end user. The only thing we should avoid is writing code that's correct for the end user only under some circumstances. It's relatively easy to implement device driver code that ensures that enable is only called on a prepared clock (since we have ref counting a= nd it's just a matter of matching enables with prepares). But it would be much harder to implement useful and power efficient device driver code that guarantees that enable/set parent is mutually exclusive. You also need to keep in mind the comment I made in the original email about set rates that need reparenting. set parent is just another way of setting = the rate and vice-versa (in many cases). This stance of yours means that an= y clock that needs to reparent during a set rate while the clock is enabl= ed can never be implemented correctly. A very simple example is a device driver that's trying to do power management by enabling/disabling clocks in interrupt context and also scaling the clock based on load on the device. Those drivers will fail horribly if the set rate needs reparenting. > What is really needed is a way to synchronize all of > the clock operations and drop these race conditions. Until that > problem is solved OR until it is proven impossible to fix with the > API's given to us I won't be taking this patch. It is invalid for th= e > set of clocks that don't set the CLK_SET_PARENT_GATE flag AND have > .set_parent callbacks which might sleep. This is clearly the opposite of the direction we (the community) decide= d to go with the design of the clock APIs -- clk_prepare/clk_enable split= =2E If we need to question that, that should be a separate discussion and n= ot part of a discussion where we try to fix issues in the implementation o= f those APIs. There is not much to prove here either -- you just can't synchronize between critical sections where some are protected by mutex and others by spinlocks. Giving this is a reason to not pick up patches that fix bugs is not well justified. Again, I'm not saying your questio= n is/isn't justified, just that it's a separate discussion. > It is invalid for the > set of clocks that don't set the CLK_SET_PARENT_GATE flag AND have > .set_parent callbacks which might sleep. Sure, but how does it affect the decision of whether we need to fix a r= ace condition? Any clock platform driver that implements this incorrectly c= an be easily caught in testing or code review. Every single call to that s= et parent is going to spew a warning. Thanks, Saravana -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora For= um.