From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754558AbcBPItl (ORCPT <rfc822;w@1wt.eu>);
	Tue, 16 Feb 2016 03:49:41 -0500
Received: from mga11.intel.com ([192.55.52.93]:56900 "EHLO mga11.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754291AbcBPItk (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Tue, 16 Feb 2016 03:49:40 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.22,454,1449561600"; 
   d="scan'208";a="915834851"
Message-ID: <1455612576.4977.11.camel@linux.intel.com>
Subject: Re: [PATCH] [RFC] kernel/cpu: Use lockref for online CPU reference
 counting
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Intel graphics driver community testing & development 
	<intel-gfx@lists.freedesktop.org>,
        Linux kernel development <linux-kernel@vger.kernel.org>,
        Ingo Molnar <mingo@kernel.org>,
        David Hildenbrand <dahi@linux.vnet.ibm.com>,
        "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
        "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>,
        Chris Wilson <chris@chris-wilson.co.uk>,
        Daniel Vetter <daniel@ffwll.ch>
Date: Tue, 16 Feb 2016 10:49:36 +0200
In-Reply-To: <20160215170618.GL6375@twins.programming.kicks-ass.net>
References: <1455539803-13913-1-git-send-email-joonas.lahtinen@linux.intel.com>
	 <20160215141755.GG6357@twins.programming.kicks-ass.net>
	 <20160215170618.GL6375@twins.programming.kicks-ass.net>
Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.18.4 (3.18.4-1.fc23) 
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On ma, 2016-02-15 at 18:06 +0100, Peter Zijlstra wrote:
> On Mon, Feb 15, 2016 at 03:17:55PM +0100, Peter Zijlstra wrote:
> > On Mon, Feb 15, 2016 at 02:36:43PM +0200, Joonas Lahtinen wrote:
> > > Instead of implementing a custom locked reference counting, use lockref.
> > > 
> > > Current implementation leads to a deadlock splat on Intel SKL platforms
> > > when lockdep debugging is enabled.
> > > 
> > > This is due to few of CPUfreq drivers (including Intel P-state) having this;
> > > policy->rwsem is locked during driver initialization and the functions called
> > > during init that actually apply CPU limits use get_online_cpus (because they
> > > have other calling paths too), which will briefly lock cpu_hotplug.lock to
> > > increase cpu_hotplug.refcount.
> > > 
> > > On later calling path, when doing a suspend, when cpu_hotplug_begin() is called
> > > in disable_nonboot_cpus(), callbacks to CPUfreq functions get called after,
> > > which will lock policy->rwsem and cpu_hotplug.lock is already held by
> > > cpu_hotplug_begin() and we do have a potential deadlock scenario reported by
> > > our CI system (though it is a very unlikely one). See the Bugzilla link for more
> > > details.
> > 
> > I've been meaning to change the thing into a percpu-rwsem, I just
> > haven't had time to look into the lockdep splat that generated.
> 
> 
> The below has plenty lockdep issues because percpu-rwsem is
> reader-writer fair (like the regular rwsem), so it does throw up a fair
> number of very icky issues.
> 

I originally thought of implementing this more similar to what you
specify, but then I came across a discussion in the mailing list where
it was NAKed adding more members to task_struct;

http://comments.gmane.org/gmane.linux.kernel/970273

Adding proper recursion (the way my initial implementation was going)
got ugly without modifying task_struct because get_online_cpus() is a
speed critical code path.

So I'm all for fixing the current code in a different way if that will
then be merged.

Regards, Joonas

> If at all possible, I'd really rather fix those and have a 'saner'
> hotplug lock, rather than muddle on with open-coded horror lock we have
> now.
> 
> 

<SNIP>

-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation