From mboxrd@z Thu Jan 1 00:00:00 1970
From: ethan zhao
Subject: Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
Date: Fri, 30 Jan 2015 10:21:47 +0800
Message-ID: <54CAEABB.4060508@oracle.com>
References: <54CADEAE.2090305@oracle.com> <54CAE80C.4060406@oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from aserp1040.oracle.com ([141.146.126.69]:28169 "EHLO
 aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1752880AbbA3CWD (ORCPT );
 Thu, 29 Jan 2015 21:22:03 -0500
In-Reply-To:
Sender: linux-pm-owner@vger.kernel.org
List-Id: linux-pm@vger.kernel.org
To: Viresh Kumar
Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
 "linux-pm@vger.kernel.org", "# 3.13.x"

On 2015/1/30 10:13, Viresh Kumar wrote:
> On 30 January 2015 at 07:40, ethan zhao wrote:
>> For a PPC notification and xen-bus thread race, could you tell me a way how
>> to reproduce it by trigger the PPC notification and xen-bus events manually
>> ?
>> You really want me write some code into a test kernel to flood the PPC and
>> xen-bus at the same time ? if we could analysis code and get the issue
>> clearly, we wouldn't wait the users to yell out.
> I thought you already have a test where you are hitting the issue you originally
> reported. Atleast Santosh did confirm that he is hitting 3/5 times in his kernel
> during boot..

As far as I know, a PPC notification only happens when power capping is
needed, for example when the server overheats. Once the cooling
condition recovers, you could not reproduce it either.

>
> My reasoning of why your observation doesn't fit here:
>
> Copying from your earlier mail..
>
> Thread A: Workqueue: kacpi_notify
>
> acpi_processor_notify()
>   acpi_processor_ppc_has_changed()
>     cpufreq_update_policy()
>       cpufreq_cpu_get()
>         kobject_get()
>
> This tries to increment the count and the warning you have mentioned
> happen because:
>
> WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
>
> i.e. even after incrementing the count, it is < 2. Which I believe will be
> 1. Which means that we have tried to do kobject_get() on a kobject
> for which kobject_put() is already done.
>
> Thread B: xenbus_thread()
>
> xenbus_thread()
>   msg->u.watch.handle->callback()
>     handle_vcpu_hotplug_event()
>       vcpu_hotplug()
>         cpu_down()
>           __cpu_notify(CPU_DOWN_PREPARE..)
>             cpufreq_cpu_callback()
>               __cpufreq_remove_dev_prepare()
>                 update_policy_cpu()
>                   kobject_move()
>
>
> Okay, where is the race or kobject_put() here ? We are just moving
> the kobject and it has nothing to do with the refcount of kobject.
>
> Why do you see its a race ?

I mean that policy->cpu has been changed and that CPU is about to go
down, yet Thread A continues to get and update the policy for it
blindly. That is what I call a 'race', not the refcount itself.

Thanks,
Ethan