From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754835AbcBPKvM (ORCPT ); Tue, 16 Feb 2016 05:51:12 -0500
Received: from mga03.intel.com ([134.134.136.65]:40402 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750927AbcBPKvK (ORCPT ); Tue, 16 Feb 2016 05:51:10 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.22,455,1449561600"; d="scan'208";a="885913966"
Message-ID: <1455619863.4977.29.camel@linux.intel.com>
Subject: Re: [PATCH] [RFC] kernel/cpu: Use lockref for online CPU reference counting
From: Joonas Lahtinen
To: Peter Zijlstra
Cc: Intel graphics driver community testing & development, Linux kernel development, Ingo Molnar, David Hildenbrand, "Paul E. McKenney", "Gautham R. Shenoy", Chris Wilson, Daniel Vetter
Date: Tue, 16 Feb 2016 12:51:03 +0200
In-Reply-To: <20160216091440.GT6357@twins.programming.kicks-ass.net>
References: <1455539803-13913-1-git-send-email-joonas.lahtinen@linux.intel.com> <20160215141755.GG6357@twins.programming.kicks-ass.net> <20160215170618.GL6375@twins.programming.kicks-ass.net> <1455612576.4977.11.camel@linux.intel.com> <20160216091440.GT6357@twins.programming.kicks-ass.net>
Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.18.4 (3.18.4-1.fc23)
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On ti, 2016-02-16 at 10:14 +0100, Peter Zijlstra wrote:
> On Tue, Feb 16, 2016 at 10:49:36AM +0200, Joonas Lahtinen wrote:
> > I originally thought of implementing this more similar to what you
> > specify, but then I came across a discussion in the mailing list where
> > it was NAKed adding more members to task_struct;
> >
> > http://comments.gmane.org/gmane.linux.kernel/970273
> >
> > Adding proper recursion (the way my initial implementation was going)
> > got ugly without modifying task_struct because get_online_cpus() is a
> > speed critical code path.
>
> Yeah, just don't let Linus hear you say that. get_online_cpus() is _not_
> considered performance critical.

Oh well, at least changes to it added quite noticeably to the bootup
time of a system.

>
> > So I'm all for fixing the current code in a different way if that will
> > then be merged.
>
> So I'm not sure why you're poking at this horror show to begin with.
> ISTR you mentioning a lockdep splat for SKL, but failed to provide
> detail.
>

Quoting my original patch;

"See the Bugzilla link for more details.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93294"

The improvement my patch implements is to use lockref for locked
reference counting (hotplug code previously rolled its own mutex +
atomic combo), which gets rid of the deadlock scenario described and
linked in the initial patch. Trace for the scenario;

https://bugs.freedesktop.org/attachment.cgi?id=121490

I think using lockref makes it substantially less special, lockref code
being a lot more battle-tested in the FS code than the previous
cpu_hotplug.lock mess.

> Making the hotplug lock _more_ special to fix that is just wrong. Fix
> the retarded locking that lead to it.
>

I do agree that it's still not pretty, but now it does correctly what
the previous code was trying to do with custom mutex + atomic.

I'm all for fixing the code further, but prior to proceeding there
needs to be some sort of an agreement on either making
get_online_cpus() slower (which does not seem like a good idea) or
adding more members to task_struct.

Regards, Joonas

>
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation