Date: Wed, 18 Jan 2023 22:37:08 +0000
From: Joel Fernandes
To: "Paul E. McKenney"
Cc: Zhouyi Zhou, "moderated list:ARM/STM32 ARCHITECTURE", Will Deacon,
	Marc Zyngier, Mark Rutland, Catalin Marinas, rcu, Frederic Weisbecker
Subject: Re: arm64 torture test hotplug failures (offlining causes -EBUSY)
In-Reply-To: <20230118040058.GV2948950@paulmck-ThinkPad-P17-Gen-1>

On Tue, Jan 17, 2023 at 08:00:58PM -0800, Paul E. McKenney wrote:
[...]
> > > > > Is there a plan to make CPU hotplug failures more frequent?
> > > >
> > > > I am not aware of such a plan but I was going by "There are quite some
> > > > reasons why a CPU-hotplug or a hot-unplug operation can fail, which is
> > > > not a fatal problem, really." in [1].
> > > >
> > > > What about an rcutorture to skip hotplug for a certain cpu id,
> > > > rcutorture.skip_hotplug_cpus="0". Can be a last resort. But we/I
> > > > should debug this issue more before getting to that.
> > >
> > > Yes, in fact there already are some checks along those lines, for example,
> > > the torture_offline() function's check of cpu_is_hotpluggable().
> > > So for example, as I understand it, a CONFIG_NO_HZ_FULL=y system should
> > > mark the housekeeping CPU as !cpu_is_hotpluggable().
> >
> > I don't think CONFIG_NO_HZ_FULL does any such marking (at least I am
> > not seeing it). Even on x86, if you enable
> > CONFIG_BOOTPARAM_HOTPLUG_CPU0=y and CONFIG_NO_HZ_FULL=y, and run
> > rcutorture with boot args:
> >
> > nohz_full=0-3 rcutorture.onoff_interval=100 rcutorture.onoff_holdoff=2
> > rcutorture.shutdown_secs=30
> >
> > You will see this in the kernel logs:
> > [    2.816022] rcu-torture:torture_onoff task: offline 0 failed: errno -16
> > [    2.975913] rcu-torture:torture_onoff task: offline 0 failed: errno -16
> >
> > So the RCU torture test clearly thought the CPUs were hot-pluggable, when
> > there was a chance for them to return -EBUSY (due to housekeeping and
> > whatnot). So this issue seems to be architecture independent, in that
> > sense.
> >
> > So the 2 ways forward I see are:
> > - Make the torture test aware of which CPUs are 'housekeeping'
> > - Make it possible to turn off CPU0 hotplugging on ARM64 by default
> >   (via CONFIG or boot option).
> >
> > Another option could be to forgive -EBUSY on CPU0 for
> > CONFIG_NO_HZ_FULL=y. Is it possible to assign a non-0 CPU id as a
> > housekeeping CPU?
>
> I would be happier to forgive failure to offline housekeeping CPUs than
> blanket forgiveness of CPU 0. Especially given that I recently got
> burned by a non-zero boot cpu. ;-)
>
> But wouldn't it be even better for cpu_is_hotpluggable() to know the
> NO_HZ_FULL rules of the road?

That's a great idea. I found a way to do that without having to do the
EXPORT_SYMBOL (like in Zhouyi's patch). Would the following be acceptable
(only build-tested)?
I can run more tests and submit a patch:

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 55405ebf23ab..f73bc520b70e 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -487,7 +487,8 @@ static const struct attribute_group *cpu_root_attr_groups[] = {
 bool cpu_is_hotpluggable(unsigned int cpu)
 {
 	struct device *dev = get_cpu_device(cpu);
-	return dev && container_of(dev, struct cpu, dev)->hotpluggable;
+	return dev && container_of(dev, struct cpu, dev)->hotpluggable
+		&& tick_nohz_cpu_hotpluggable(cpu);
 }
 EXPORT_SYMBOL_GPL(cpu_is_hotpluggable);
 
diff --git a/include/linux/tick.h b/include/linux/tick.h
index bfd571f18cfd..9459fef5b857 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -216,6 +216,7 @@ extern void tick_nohz_dep_set_signal(struct task_struct *tsk,
 				     enum tick_dep_bits bit);
 extern void tick_nohz_dep_clear_signal(struct signal_struct *signal,
 				       enum tick_dep_bits bit);
+extern bool tick_nohz_cpu_hotpluggable(unsigned int cpu);
 
 /*
  * The below are tick_nohz_[set,clear]_dep() wrappers that optimize off-cases
@@ -280,6 +281,7 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
 
 static inline void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit) { }
 static inline void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit) { }
+static inline bool tick_nohz_cpu_hotpluggable(unsigned int cpu) { return true; }
 
 static inline void tick_dep_set(enum tick_dep_bits bit) { }
 static inline void tick_dep_clear(enum tick_dep_bits bit) { }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9c6f661fb436..d1cc7525240e 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -522,6 +522,11 @@ static int tick_nohz_cpu_down(unsigned int cpu)
 	return 0;
 }
 
+bool tick_nohz_cpu_hotpluggable(unsigned int cpu)
+{
+	return tick_nohz_cpu_down(cpu) == 0;
+}
+
 void __init tick_nohz_init(void)
 {
 	int cpu, ret;
-- 
2.39.0.246.g2a6d74b583-goog
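
To make the intended behaviour easier to reason about, here is a rough
userspace model of the check, not kernel code. It assumes that
tick_nohz_cpu_down() returns -EBUSY for the CPU doing timekeeping duty when
nohz_full is active, which is what the "errno -16" lines quoted above suggest.
All model_* names, the NR_CPUS value and the state variables are hypothetical
stand-ins made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4	/* arbitrary size for the model */

/* Hypothetical stand-ins for kernel state (assumptions, not real symbols). */
bool model_nohz_full_running = true;	/* as if booted with nohz_full=... */
int model_tick_do_timer_cpu = 0;	/* CPU assumed to do timekeeping duty */
bool model_dev_hotpluggable[NR_CPUS] = { true, true, true, true };

/* Assumed behaviour of tick_nohz_cpu_down(): the timekeeper must stay online. */
int model_tick_nohz_cpu_down(unsigned int cpu)
{
	if (model_nohz_full_running && model_tick_do_timer_cpu == (int)cpu)
		return -EBUSY;
	return 0;
}

/* Mirrors the helper added by the patch: hotpluggable iff cpu_down would succeed. */
bool model_tick_nohz_cpu_hotpluggable(unsigned int cpu)
{
	return model_tick_nohz_cpu_down(cpu) == 0;
}

/* Mirrors cpu_is_hotpluggable(): device flag AND the nohz subsystem's verdict. */
bool model_cpu_is_hotpluggable(unsigned int cpu)
{
	return cpu < NR_CPUS && model_dev_hotpluggable[cpu]
		&& model_tick_nohz_cpu_hotpluggable(cpu);
}

/*
 * One torture-style pass over all CPUs: skip those reported as not
 * hotpluggable, try to "offline" the rest, and return how many attempts
 * failed.
 */
int model_offline_pass(void)
{
	int failures = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!model_cpu_is_hotpluggable(cpu)) {
			printf("cpu %u: skipped (not hotpluggable)\n", cpu);
			continue;
		}
		/* A CPU that passed the filter cannot be the timekeeper. */
		assert(!(model_nohz_full_running &&
			 model_tick_do_timer_cpu == (int)cpu));
		if (model_tick_nohz_cpu_down(cpu) != 0)
			failures++;
		else
			printf("cpu %u: offline ok\n", cpu);
	}
	return failures;
}
```

In this model the timekeeping CPU is filtered out before any offline attempt,
so a caller of model_offline_pass() sees no -EBUSY failures; a torture test
that consults cpu_is_hotpluggable() first would behave the same way if the
assumption about tick_nohz_cpu_down() holds.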