Date: Thu, 14 Mar 2024 13:57:54 +0000
From: Catalin Marinas
To: Marek Szyprowski
Cc: "Russell King (Oracle)", Sudeep Holla, "Christoph Lameter (Ampere)",
 Mark Rutland, "linux-pm@vger.kernel.org", "Rafael J. Wysocki",
 Viresh Kumar, Will Deacon, Jonathan.Cameron@huawei.com,
 Matteo.Carlini@arm.com, Valentin.Schneider@arm.com,
 akpm@linux-foundation.org, anshuman.khandual@arm.com, Eric Mackay,
 dave.kleikamp@oracle.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, robin.murphy@arm.com,
 vanshikonda@os.amperecomputing.com, yang@os.amperecomputing.com,
 Nishanth Menon, Stephen Boyd
Subject: Re: [PATCH v3] ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
References: <9352f410-9dad-ac89-181a-b3cfc86176b8@linux.com> <432c1980-b00f-4b07-9e24-0bec52ccb5d6@samsung.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1

On Thu, Mar 14, 2024 at 01:28:40PM +0100, Marek Szyprowski wrote:
> On 14.03.2024 09:39, Catalin Marinas wrote:
> > On Wed, Mar 13, 2024 at 05:13:33PM +0000, Russell King wrote:
> >> So, I wonder whether what you're seeing is a latent bug which is
> >> being tickled by the presence of the CPU masks being off-stack
> >> changing the kernel timing.
> >>
> >> I would suggest the printk debug approach may help here to see when
> >> the OPPs are begun to be parsed, when they're created etc and their
> >> timing relationship to being used. Given the suspicion, it's possible
> >> that the mere addition of printk() may "fix" the problem, which again
> >> would be another semi-useful data point.
> > It might be an init order problem. Passing "initcall_debug" on the
> > cmdline might help a bit.
> >
> > It would also be useful in dev_pm_opp_set_config(), in the WARN_ON
> > block, to print opp_table->opp_list.next to get an idea whether it looks
> > like a valid pointer or memory corruption.
>
> I've finally found some time to do the step-by-step printk-based
> debugging of this issue and finally found what's broken!
>
> Here is the fix:
>
> diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
> index 8bd6e5e8f121..2d83bbc65dd0 100644
> --- a/drivers/cpufreq/cpufreq-dt.c
> +++ b/drivers/cpufreq/cpufreq-dt.c
> @@ -208,7 +208,7 @@ static int dt_cpufreq_early_init(struct device *dev,
> int cpu)
>         if (!priv)
>                 return -ENOMEM;
>
> -       if (!alloc_cpumask_var(&priv->cpus, GFP_KERNEL))
> +       if (!zalloc_cpumask_var(&priv->cpus, GFP_KERNEL))
>                 return -ENOMEM;
>
>         cpumask_set_cpu(cpu, priv->cpus);
>
>
> It is really surprising that this didn't blow up for anyone else so
> far... This means that the $subject patch is fine.
>
> I will send a proper patch fixing this issue in a few minutes.

Nice. Many thanks for tracking this down. I'll revert the revert of the
CPUMASK_OFFSTACK in the second part of the merging window (I already
sent the pull request).

-- 
Catalin
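
A note on why the one-line change above matters: with CPUMASK_OFFSTACK
enabled, alloc_cpumask_var() hands back heap memory that is not cleared,
so a following cpumask_set_cpu() only ORs one bit into whatever was
there before. zalloc_cpumask_var() returns a cleared mask. The userspace
sketch below models that difference. It is only an illustration, not
kernel code: every demo_* name, NR_CPUS_DEMO, and the 0xAA fill standing
in for stale heap contents are assumptions made up for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size, chosen to mirror the 512-CPU limit in $subject. */
#define NR_CPUS_DEMO  512
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS    ((NR_CPUS_DEMO + BITS_PER_WORD - 1) / BITS_PER_WORD)

/*
 * Stand-in for alloc_cpumask_var() with off-stack masks: the memory is
 * not cleared. The 0xAA fill is an assumption that models stale heap
 * contents in a deterministic way.
 */
static unsigned long *demo_alloc_mask(void)
{
	unsigned long *mask = malloc(MASK_WORDS * sizeof(*mask));

	if (mask)
		memset(mask, 0xAA, MASK_WORDS * sizeof(*mask));
	return mask;
}

/* Stand-in for zalloc_cpumask_var(): the memory comes back zeroed. */
static unsigned long *demo_zalloc_mask(void)
{
	return calloc(MASK_WORDS, sizeof(unsigned long));
}

/* Stand-in for cpumask_set_cpu(): sets one bit, clears nothing. */
static void demo_set_cpu(unsigned int cpu, unsigned long *mask)
{
	assert(cpu < NR_CPUS_DEMO);
	mask[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

/* Stand-in for cpumask_weight(): counts the bits set in the mask. */
static unsigned int demo_weight(const unsigned long *mask)
{
	unsigned int count = 0;

	for (size_t w = 0; w < MASK_WORDS; w++)
		for (unsigned long v = mask[w]; v; v &= v - 1)
			count++;
	return count;
}
```

Setting CPU 0 in a mask from demo_zalloc_mask() leaves exactly one bit
set, while the same call on a mask from demo_alloc_mask() leaves the
stale bits in place as well, so later users of the mask would see CPUs
that were never added. Without CPUMASK_OFFSTACK the mask is embedded in
the containing structure rather than allocated separately, which is
presumably why the missing clear went unnoticed until now.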