From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758331AbcATJXV (ORCPT ); Wed, 20 Jan 2016 04:23:21 -0500 Received: from bombadil.infradead.org ([198.137.202.9]:50211 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751627AbcATJXL (ORCPT ); Wed, 20 Jan 2016 04:23:11 -0500 Date: Wed, 20 Jan 2016 10:22:44 +0100 From: Peter Zijlstra To: Huang Rui Cc: Borislav Petkov , Ingo Molnar , Andy Lutomirski , Thomas Gleixner , Robert Richter , Jacob Shin , John Stultz , =?utf-8?B?RnLvv71k77+9cmlj?= Weisbecker , linux-kernel@vger.kernel.org, spg_linux_kernel@amd.com, x86@kernel.org, Guenter Roeck , Andreas Herrmann , Suravee Suthikulpanit , Aravind Gopalakrishnan , Borislav Petkov , Fengguang Wu , Aaron Lu Subject: Re: [PATCH v2 5/5] perf/x86/amd/power: Add AMD accumulated power reporting mechanism Message-ID: <20160120092244.GH6357@twins.programming.kicks-ass.net> References: <1452739808-11871-1-git-send-email-ray.huang@amd.com> <1452739808-11871-6-git-send-email-ray.huang@amd.com> <20160119121250.GA6344@twins.programming.kicks-ass.net> <20160120044823.GA13477@hr-amur2> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160120044823.GA13477@hr-amur2> User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jan 20, 2016 at 12:48:24PM +0800, Huang Rui wrote: > Hi Peter, > > Thanks so much to your comments. > > On Tue, Jan 19, 2016 at 01:12:50PM +0100, Peter Zijlstra wrote: > > On Thu, Jan 14, 2016 at 10:50:08AM +0800, Huang Rui wrote: > > > +struct power_pmu { > > > + spinlock_t lock; > > > > This should be a raw_spinlock_t, as it'll be nested under other > > raw_spinlock_t's. > > > > Do you mean the following spinlock operations are in hardware > interrupts disabled case, so I need use raw_spinlock_t instead, right? mainline -rt raw_spinlock_t spin-waits spin-waits spinlock_t spin-waits blocks (rt-mutex) struct mutex blocks blocks (rt-mutex) since these functions are themselves called with raw_spinlock_t held (perf_event_context::lock for example, but also rq::lock), any lock nested inside them must also be raw_spinlock_t. I have a lockdep patch somewhere that checks these ordering things; I should rebase and post that again. > Use raw_spin_lock_irqsave/raw_spin_unlock_irqrestore? pmu::{start,stop,add,del} will be called with IRQs already disabled. > > > +static int power_cpu_init(int cpu) > > > +{ > > > + int i, cu, ret = 0; > > > + cpumask_var_t mask, dummy_mask; > > > + > > > + cu = cpu / cores_per_cu; > > > + > > > + if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) > > > + return -ENOMEM; > > > + > > > + if (!zalloc_cpumask_var(&dummy_mask, GFP_KERNEL)) { > > > + ret = -ENOMEM; > > > + goto out; > > > + } > > > + > > > + for (i = 0; i < cores_per_cu; i++) > > > + cpumask_set_cpu(i, mask); > > > + > > > + cpumask_shift_left(mask, mask, cu * cores_per_cu); > > > + > > > + if (!cpumask_and(dummy_mask, mask, &cpu_mask)) > > > + cpumask_set_cpu(cpu, &cpu_mask); > > > + > > > + free_cpumask_var(dummy_mask); > > > +out: > > > + free_cpumask_var(mask); > > > + > > > + return ret; > > > +} > > > > > +static int power_cpu_notifier(struct notifier_block *self, > > > + unsigned long action, void *hcpu) > > > +{ > > > + unsigned int cpu = (long)hcpu; > > > + > > > + switch (action & ~CPU_TASKS_FROZEN) { > > > + case CPU_UP_PREPARE: > > > + if (power_cpu_prepare(cpu)) > > > + return NOTIFY_BAD; > > > + break; > > > + case CPU_STARTING: > > > + if (power_cpu_init(cpu)) > > > + return NOTIFY_BAD; > > > > this is called with IRQs disabled, which makes those GFP_KERNEL allocs > > above a pretty bad idea. > > > > Right, so should I use GFP_ATOMIC to allocate cpumask here? One should not use GFP_ATOMIC if at all possible, also no, -rt cannot do _any_ allocations from this site. > > Also, note that -rt cannot actually do _any_ allocations/frees from > > STARTING. > > > > Please move the allocs/frees to PREPARE/ONLINE. > > > > How about add two cpumask_var_t at power_pmu structure? Then allocate > the two cpumask_var_t (pmu->mask, pmu->dummy_mask), and they can be > also used on power_cpu_init. That would work.