From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751504AbdISGMv (ORCPT ); Tue, 19 Sep 2017 02:12:51 -0400
Received: from mout.gmx.net ([212.227.17.22]:56036 "EHLO mout.gmx.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751098AbdISGMt (ORCPT ); Tue, 19 Sep 2017 02:12:49 -0400
Message-ID: <1505801513.29698.10.camel@gmx.de>
Subject: Re: Query regarding synchronize_sched_expedited and resched_cpu
From: Mike Galbraith
To: Boqun Feng, "Paul E. McKenney"
Cc: Byungchul Park, Steven Rostedt, Neeraj Upadhyay,
	josh@joshtriplett.org, mathieu.desnoyers@efficios.com,
	jiangshanlai@gmail.com, linux-kernel@vger.kernel.org,
	sramana@codeaurora.org, prsood@codeaurora.org,
	pkondeti@codeaurora.org, markivx@codeaurora.org,
	peterz@infradead.org, kernel-team@lge.com
Date: Tue, 19 Sep 2017 08:11:53 +0200
In-Reply-To: <20170919053749.GA12412@tardis>
References: <20170918121213.312c82b0@gandalf.local.home>
	<20170918162412.GM3521@linux.vnet.ibm.com>
	<20170918122931.0e3341f3@gandalf.local.home>
	<20170918165527.GN3521@linux.vnet.ibm.com>
	<20170918235311.GA20177@linux.vnet.ibm.com>
	<20170919015027.GD5994@X58A-UD3R>
	<20170919020610.GF5994@X58A-UD3R>
	<20170919023329.GA3521@linux.vnet.ibm.com>
	<20170919024822.GG5994@X58A-UD3R>
	<20170919040456.GC3521@linux.vnet.ibm.com>
	<20170919053749.GA12412@tardis>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.20.5
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2017-09-19 at 13:37 +0800, Boqun Feng wrote:
> On Mon, Sep 18, 2017 at 09:04:56PM -0700, Paul E. McKenney wrote:
> > On Tue, Sep 19, 2017 at 11:48:22AM +0900, Byungchul Park wrote:
> > > On Mon, Sep 18, 2017 at 07:33:29PM -0700, Paul E. McKenney wrote:
> > > > > > Hello Paul and Steven,
> > > > > >
>
> So I think this is another false positive, and the reason is we use
> st->done for multiple purposes.
>
> > > > > > This is saying:
> > > > > >
> > > > > > Thread A
> > > > > > --------
> > > > > > takedown_cpu()
> > > > > >    irq_lock_sparse()
> > > > > >    wait_for_completion(&st->done) // Wait for completion of B
>
> Thread A wait for the idle task on the outgoing to set the st->state to
> CPUHP_AP_IDLE_DEAD(i.e. the corresponding complete() is the one in
> cpuhp_complete_idle_dead()), and it happens when we try to _offline_ a
> cpu.
>
> > > > > >    irq_unlock_sparse()
> > > > > >
> > > > > > Thread B
> > > > > > --------
> > > > > > cpuhp_invoke_callback()
> > > > > >    irq_lock_sparse() // Wait for A to irq_unlock_sparse()
>
> irq_affinity_online_cpu() is called here, so it happens when we try to
> _online_ a cpu.
>
> > > > > >    (on the way going to complete(&st->done))
>
> and we are going to complete(&st->done) in a hotplug thread context to
> indicate the hotplug thread has finished its job(i.e. this complete() is
> the one in cpuhp_thread_fun()).
>
>
> So even though the &st->done are the same instance, the deadlock could
> not happen, I think, as we could not up/down a same cpu at the same
> time?
>
> If I'm not missing something subtle. To fix this we can either
>
> 1)  have dedicated completion instances for different wait purposes
>     in cpuhp_cpu_state.
>
> or
>
> 2)  extend crossrelease to have the "subclass" concept, so that
>     callsite of complete() and wait_for_completion() for the same
>     completion instance but with different purposes could be
>     differed by lockdep.
>
> Thoughts?

https://lkml.org/lkml/2017/9/5/184

Peter's patches worked for me, but per tglx, additional (non-grasshopper
level) hotplug-fu is required.

	-Mike
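
P.S. For anyone following along, the false positive Boqun describes can be illustrated with a toy userspace model. This is only a sketch and is not how lockdep or crossrelease is implemented: the graph structure, the helper functions and the class names (ST_DONE, ST_DONE_UP, ST_DONE_DOWN, SPARSE_IRQ_LOCK) are all made up for illustration. It records "A before B" dependencies per class and looks for a cycle. With one class for st->done, the offline path (lock held, then wait) and the online path (lock needed before complete) form a cycle; with a dedicated class per purpose they do not.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical class ids, for illustration only. */
enum { SPARSE_IRQ_LOCK, ST_DONE, ST_DONE_UP, ST_DONE_DOWN };

/* Toy dependency graph: dep[a][b] means "a is held or awaited before b". */
struct dep_graph {
	bool dep[MAX_CLASSES][MAX_CLASSES];
};

static void graph_init(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
}

static void add_dependency(struct dep_graph *g, int before, int after)
{
	g->dep[before][after] = true;
}

/* Depth-first search: is 'target' reachable from 'from'? */
static bool reaches(const struct dep_graph *g, int from, int target, bool *seen)
{
	if (from == target)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reaches(g, next, target, seen))
			return true;
	return false;
}

/* A cycle exists if some edge a->b has a path leading from b back to a. */
static bool has_cycle(const struct dep_graph *g)
{
	for (int a = 0; a < MAX_CLASSES; a++)
		for (int b = 0; b < MAX_CLASSES; b++) {
			bool seen[MAX_CLASSES] = { false };

			if (g->dep[a][b] && reaches(g, b, a, seen))
				return true;
		}
	return false;
}

/* One class for st->done: offline and online paths get mixed together. */
bool shared_completion_reports_cycle(void)
{
	struct dep_graph g;

	graph_init(&g);
	/* Thread A (offline): holds the sparse irq lock, then waits on done. */
	add_dependency(&g, SPARSE_IRQ_LOCK, ST_DONE);
	/* Thread B (online): needs the lock before it can complete(done). */
	add_dependency(&g, ST_DONE, SPARSE_IRQ_LOCK);
	return has_cycle(&g);
}

/* Option 1 from the mail: a dedicated completion class per wait purpose. */
bool split_completions_report_cycle(void)
{
	struct dep_graph g;

	graph_init(&g);
	add_dependency(&g, SPARSE_IRQ_LOCK, ST_DONE_DOWN);	/* offline path */
	add_dependency(&g, ST_DONE_UP, SPARSE_IRQ_LOCK);	/* online path */
	return has_cycle(&g);
}
```

Calling shared_completion_reports_cycle() from a small main() should flag a cycle, while split_completions_report_cycle() should not, which is the whole argument for option 1 (or for teaching the checker about subclasses, option 2).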