Subject: Re: Possible deadlock related to CPU hotplug and kernfs
To: "Rafael J. Wysocki", Tejun Heo
References: <55E54FE2.7030601@linux.intel.com> <20150902161445.GI22326@mtj.duckdns.org> <12669324.bPBpI0mOPP@vostro.rjw.lan> <20150903161904.GC10394@mtj.duckdns.org>
Cc: "Rafael J. Wysocki", linux hotplug mailing, Linux Kernel Mailing List, ACPI Devel Maling List
From: Jiang Liu
Organization: Intel
Message-ID: <55E94642.6020707@linux.intel.com>
Date: Fri, 4 Sep 2015 15:20:34 +0800

On 2015/9/4 4:08, Rafael J. Wysocki wrote:
> Hi Tejun,
>
> On Thu, Sep 3, 2015 at 6:19 PM, Tejun Heo wrote:
>> Hello, Rafael.
>>
>> On Thu, Sep 03, 2015 at 02:58:16AM +0200, Rafael J. Wysocki wrote:
>>> So acpi_device_hotplug() calls lock_device_hotplug() which simply
>>> acquires device_hotplug_lock.  It is held throughout the entire
>>> hot-add/hot-remove code path.
>>>
>>> Writing anything to /sys/devices/system/cpu/cpux/online goes through
>>> online_store() in drivers/base/core.c and that does
>>> lock_device_hotplug_sysfs() which then attempts to acquire
>>> device_hotplug_lock using mutex_trylock().  And it only calls
>>> either device_online() or device_offline() if it ends up with the
>>> lock held.
>>>
>>> Quite frankly, I don't see how these particular two code paths can
>>> deadlock in any way.
>>>
>>> So either a third code path is involved which is not executed
>>> under device_hotplug_lock, or lockdep needs to be told to actually
>>> take device_hotplug_lock into account in this case IMO.
>>
>> Hmm... all sysfs rw functions are protected from removal.  i.e. by
>> default, removal of a sysfs file drains in-flight rw operations, so
>> the hot plug path grabs a lock and then tries to remove a file and
>> writing to the online file makes the file's write method try to
>> grab the same lock.  It deadlocks if the hotunplug path already has
>> the lock and is trying to drain the online file for removal.
>
> My point is that you cannot get into that situation.  If hotplug
> already holds device_hotplug_lock, the write to "online" will end up
> doing restart_syscall().
>
> If the "online" code path is holding the lock, hotplug cannot acquire
> it and cannot proceed.
>
> Am I missing anything?

Hi Rafael,
	I think you are right. lock_device_hotplug_sysfs() already
provides a solution for this deadlock scenario. There is also another
related code path at boot:
smp_init()
    ->cpu_up()
        ->cpu_hotplug_begin()

So it seems to be a false alarm. Is there any way to teach lockdep
about this, to get rid of the false alarm?
Thanks!
Gerry

>
> Thanks,
> Rafael
>
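
For reference, the trylock-and-back-out behaviour Rafael describes can be
sketched in userspace C. This is only an illustration, not the kernel code:
a pthread mutex stands in for device_hotplug_lock, returning -EAGAIN stands
in for restart_syscall(), and the names hotplug_path_lock(),
hotplug_path_unlock(), sysfs_path_trylock() and online_store_sketch() are
made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-in for the kernel's device_hotplug_lock (userspace sketch only). */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hot-add/hot-remove side: blocks until the lock is available. */
static void hotplug_path_lock(void)
{
    int rc = pthread_mutex_lock(&hotplug_lock);

    assert(rc == 0);
    (void)rc;
}

static void hotplug_path_unlock(void)
{
    int rc = pthread_mutex_unlock(&hotplug_lock);

    assert(rc == 0);
    (void)rc;
}

/*
 * sysfs-write side: never blocks on the lock.  Returns 0 with the lock
 * held, or -EAGAIN (assumed stand-in for restart_syscall()) when the
 * hotplug side already holds it.
 */
static int sysfs_path_trylock(void)
{
    if (pthread_mutex_trylock(&hotplug_lock) == 0)
        return 0;
    return -EAGAIN;
}

/*
 * Hypothetical "online" write handler.  It only changes state if it got
 * the lock; otherwise it returns at once, so it is never an in-flight
 * operation stuck waiting on a lock the hotplug side holds.
 */
static int online_store_sketch(int *online, int value)
{
    int rc = sysfs_path_trylock();

    if (rc)
        return rc;      /* back out; the caller retries the write later */
    *online = value;
    hotplug_path_unlock();
    return 0;
}
```

While the hotplug side holds the lock, online_store_sketch() backs out
with -EAGAIN instead of sleeping, so a file removal that drains in-flight
writers has nothing to wait for. Once the lock is released, the retried
write succeeds.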