From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 188E716DEDF for ; Fri, 24 Jan 2025 15:55:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.158.5 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737734154; cv=none; b=ufvUn/ZVriIEAtC5wx3AGUKq5LkimWbWNiTRQ83CXWQX5tzFFi7IouFmWqPahBEcmJEguKCb3weIKMohb75ZWrQoKRhPlNzLHNmwQgc5fx2IcIQN0SQrkbYsukOobhELp3irnw07og+7vEShaH+NZSu6dpHPvezm8iFqMAEIXaM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737734154; c=relaxed/simple; bh=pAxM33PdAxs66I5JgmnltDwx4ArxBGf/W2NPO7gWXOs=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=cST+6ugomkHCgCLCAwNu3x/vVXdKOGhhwRpdopkLwF8AIH0ggIwBuwNNykxxwx05xPrtIOxTx1sxY0YOjeTMeuDTtqi8PsKceh24K0X20kwZlQBlOvScmmgNPhPX4cToSeD6k3wNVlo20Veo1y04WNYPAHHgp0GzDRuM1uEVbuI= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com; spf=pass smtp.mailfrom=linux.ibm.com; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b=I0QnRfxq; arc=none smtp.client-ip=148.163.158.5 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="I0QnRfxq" Received: from pps.filterd (m0353725.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50O89Qvj008243; Fri, 24 Jan 2025 15:55:34 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=pp1; bh=ZdVAvy Tf8P1+X1WIRotgm+qO6BxbNYxWkPX6PBmD2GQ=; b=I0QnRfxqPPSOwiFo6OWOcZ oiv7+I+t8abuxh4651QIZAOUm0QHQuYC0SqD6qqLHbsSNEnVRVlxe3lvuu6/Kqak brm22XO67SF0sPLoZoxtf9CXT3WrP5BL6T3qC7DSfOzhWnwmZW3WtL+ai5m0UGEg Y5fG9Nsr4LXk61X6J1EuqhcV8Hatz8z6rD6JZusQBxJ70FVXjfSNUElmOJkLunPf aBEY67pxS97HJkrVjPhN2UUiwgYUK69pwKqs03P3GAjCPTDIFO+EjRIUB+SLvH6q Jm9iqCvuvGTv51zX6DivdXnLTOy+Gpnr/xpcQSrpllcmkxj7UmEq9LgyRgF/RPTw == Received: from ppma22.wdc07v.mail.ibm.com (5c.69.3da9.ip4.static.sl-reverse.com [169.61.105.92]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 44c1pyvcv9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 24 Jan 2025 15:55:34 +0000 (GMT) Received: from pps.filterd (ppma22.wdc07v.mail.ibm.com [127.0.0.1]) by ppma22.wdc07v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 50OC0apr024307; Fri, 24 Jan 2025 15:55:33 GMT Received: from smtprelay04.fra02v.mail.ibm.com ([9.218.2.228]) by ppma22.wdc07v.mail.ibm.com (PPS) with ESMTPS id 448q0ym8mg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 24 Jan 2025 15:55:33 +0000 Received: from smtpav07.fra02v.mail.ibm.com (smtpav07.fra02v.mail.ibm.com [10.20.54.106]) by smtprelay04.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 50OFtVKu11469300 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 15:55:31 GMT Received: from smtpav07.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3BEB320085; Fri, 24 Jan 2025 15:34:01 +0000 (GMT) Received: from smtpav07.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0356920087; Fri, 24 Jan 2025 15:34:00 +0000 (GMT) Received: from linux.ibm.com (unknown [9.109.245.203]) by smtpav07.fra02v.mail.ibm.com (Postfix) with ESMTPS; Fri, 24 Jan 2025 15:33:59 +0000 (GMT) Date: Fri, 24 Jan 2025 21:03:55 +0530 From: Vishal Chourasia To: Koichiro Den Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, peterz@infradead.org Subject: Re: [PATCH] cpu/hotplug: disallow writing any state in atomic AP section to sysfs target Message-ID: References: <20241220141538.4018232-1-koichiro.den@canonical.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20241220141538.4018232-1-koichiro.den@canonical.com> X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: S2bamSP27e0VbcVv8ezjnxoKZpZ9Mq4j X-Proofpoint-GUID: S2bamSP27e0VbcVv8ezjnxoKZpZ9Mq4j X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_06,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1011 phishscore=0 mlxlogscore=999 bulkscore=0 lowpriorityscore=0 spamscore=0 mlxscore=0 priorityscore=1501 adultscore=0 impostorscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2411120000 definitions=main-2501240109 On Fri, Dec 20, 2024 at 11:15:38PM +0900, Koichiro Den wrote: > When CONFIG_CPU_HOTPLUG_STATE_CONTROL=y, writing a state within the > atomic AP section to 'hotplug/target' file for a fully online cpu can > cause a kernel crash [1]. This occurs because take_cpu_down() disables > the CPU, but the state machine does not reach CPUHP_AP_OFFLINE. As a > result, when cpu stopper thread finishes its work and idle task takes > over, cpuhp_report_idle_dead() crashes on 'BUG_ON(st->state != > CPUHP_AP_OFFLINE)'. > > In the opposite direction, start_secondary() assumes all startup > callbacks have been invoked and transitions to CPUHP_AP_ONLINE_IDLE, > regardless of the written target. This can result in some callbacks in > the section being silently skipped. > > To address the issue, disable writing any state within the atomic AP > states to sysfs target. Additionally, set cant_stop to true for both > CPUHP_BP_KICK_AP (when CONFIG_HOTPLUG_SPLIT_STARTUP=y) and > CPUHP_AP_ONLINE since we do not automatically make the state machine > proceed to the other end of the atomic states. > > [1]: > > # grep 'tick:dying' /sys/devices/system/cpu/hotplug/states > 143: tick:dying > # cat /sys/devices/system/cpu/cpu7/hotplug/target > 238 # fully online > # echo 143 > /sys/devices/system/cpu/cpu7/hotplug/target After applying the patch, I don't see any system crash. Thank you for the patch! Tested-By: Vishal Chourasia > > [ 145.091832] ------------[ cut here ]------------ > [ 145.092928] kernel BUG at kernel/cpu.c:1365! > [ 145.093960] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > --(snip)-- > > Signed-off-by: Koichiro Den > --- > Previous attempt: > https://lore.kernel.org/all/20241207144721.2828390-1-koichiro.den@canonical.com/ > --- > kernel/cpu.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 34f1a09349fc..c877443f5888 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -2127,6 +2127,7 @@ static struct cpuhp_step cpuhp_hp_states[] = { > [CPUHP_BP_KICK_AP] = { > .name = "cpu:kick_ap", > .startup.single = cpuhp_kick_ap_alive, > + .cant_stop = true, > }, > > /* > @@ -2192,6 +2193,7 @@ static struct cpuhp_step cpuhp_hp_states[] = { > * state for synchronsization */ > [CPUHP_AP_ONLINE] = { > .name = "ap:online", > + .cant_stop = true, > }, > /* > * Handled on control processor until the plugged processor manages > @@ -2759,7 +2761,8 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr, > return ret; > > #ifdef CONFIG_CPU_HOTPLUG_STATE_CONTROL > - if (target < CPUHP_OFFLINE || target > CPUHP_ONLINE) > + if (target < CPUHP_OFFLINE || target > CPUHP_ONLINE || > + cpuhp_is_atomic_state(target)) > return -EINVAL; > #else > if (target != CPUHP_OFFLINE && target != CPUHP_ONLINE) pretty straight forward... While I agree that this is one way to patch things up and, at the very least, prevent a kernel crash, I wonder what infrastructure is needed to make CONFIG_CPU_HOTPLUG_STATE_CONTROL=y work for the prepare phase. IIUC: 1. The control CPU invokes `takedown_cpu()`, which sets the new target state as `CPUHP_AP_OFFLINE`. 2. `take_cpu_down()` is executed on the AP in the stopper thread context and starts invoking the prepare phase callbacks. 3. Suppose the user wants to stop at some state in the prepare phase; after executing that state's teardown callback, the stopper thread returns control back to `takedown_cpu()` on the control CPU. 4. Here’s the part I am not sure about: On CPU 6, after the stopper thread finishes, a context switch occurs, the idle thread starts executing, and it encounters `BUG_ON(st->state != CPUHP_AP_OFFLINE)` in `cpuhp_report_idle_dead()` → `do_idle()`. So, the problem arises from the stopper thread going away in the middle (depending on which step in the prepare phase was set as the target). The idle thread starts next and expects that, if the AP is marked as offline, its current cpuhp_state should be CPUHP_AP_OFFLINE. However, it won't be, causing a kernel crash in cpuhp_report_idle_dead(). vishalc > -- > 2.43.0 >