Date: Fri, 23 Aug 2024 09:49:25 +0100
Message-ID: <86zfp3wrmy.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>, Kunkun Jiang <jiangkunkun@huawei.com>
Cc: Oliver Upton <oliver.upton@linux.dev>, James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>, Zenghui Yu <yuzenghui@huawei.com>,
	linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, wanghaibin.wang@huawei.com, nizhiqiang1@huawei.com,
	tangnianyao@huawei.com, wangzhou1@hisilicon.com
Subject: Re: [bug report] GICv4.1: multiple vpus execute vgic_v4_load at the same time will greatly increase the time consumption
In-Reply-To: <87o75kgspg.ffs@tglx>
References: <86msl6xhu2.wl-maz@kernel.org> <867cc9x8si.wl-maz@kernel.org> <864j7cybay.wl-maz@kernel.org> <87o75kgspg.ffs@tglx>

On Thu, 22 Aug 2024 22:20:43 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Thu, Aug 22 2024 at 13:47, Marc Zyngier wrote:
> > On Thu, 22 Aug 2024 11:59:50 +0100,
> > Kunkun Jiang <jiangkunkun@huawei.com> wrote:
> >> > but that will eat a significant portion of your stack if your kernel is
> >> > configured for a large number of CPUs.
> >>
> >> Currently CONFIG_NR_CPUS=4096, each `struct cpumask` occupies 512 bytes.
> >
> > This seems crazy. Why would you build a kernel with something *that*
> > big, especially considering that you have a lot less than 1k CPUs?
>
> That's why CONFIG_CPUMASK_OFFSTACK exists, but that does not help in
> that context. :)
>
> >> > The removal of this global lock is the only option in my opinion.
> >> > Either the cpumask becomes a stack variable, or it becomes a static
> >> > per-CPU variable. Both have drawbacks, but they are not a bottleneck
> >> > anymore.
> >>
> >> I also prefer to remove the global lock. Which variable do you think is
> >> better?
> >
> > Given the number of CPUs your system is configured for, there is no
> > good answer. An on-stack variable is dangerously large, and a per-CPU
> > cpumask results in 2MB being allocated, which I find insane.
> Only if there are actually 4096 CPUs enumerated. The per CPU magic is
> smart enough to limit the damage to the actual number of possible CPUs
> which are enumerated at boot time. It still will over-allocate due to
> NR_CPUS being insanely large but on a 4 CPU machine this boils down to
> 2k of memory waste unless Aaarg64 is stupid enough to allocate for
> NR_CPUS instead of num_possible_cpus()...

No difference between arm64 and xyz85.999 here.

> That said, on a real 4k CPU system 2M of memory should be the least of
> your worries.

Don't underestimate the general level of insanity!

> > You'll have to pick your own poison and convince Thomas of the
> > validity of your approach.
>
> As this is an operation which is really not suitable for on demand
> or large stack allocations the per CPU approach makes sense.

Right, so let's shoot for that.

Kunkun, can you please give the following hack a go with your workload?

Thanks,

	M.

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index dd53298ef1a5..b6aa259ac749 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -224,15 +224,16 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	struct irq_desc *desc = irq_data_to_desc(data);
 	struct irq_chip *chip = irq_data_get_irq_chip(data);
 	const struct cpumask *prog_mask;
+	struct cpumask *tmp_mask;
 	int ret;
 
-	static DEFINE_RAW_SPINLOCK(tmp_mask_lock);
-	static struct cpumask tmp_mask;
+	static DEFINE_PER_CPU(struct cpumask, __tmp_mask);
 
 	if (!chip || !chip->irq_set_affinity)
 		return -EINVAL;
 
-	raw_spin_lock(&tmp_mask_lock);
+	tmp_mask = this_cpu_ptr(&__tmp_mask);
+
 	/*
 	 * If this is a managed interrupt and housekeeping is enabled on
 	 * it check whether the requested affinity mask intersects with
@@ -258,11 +259,11 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 
 		hk_mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
 
-		cpumask_and(&tmp_mask, mask, hk_mask);
-		if (!cpumask_intersects(&tmp_mask, cpu_online_mask))
+		cpumask_and(tmp_mask, mask, hk_mask);
+		if (!cpumask_intersects(tmp_mask, cpu_online_mask))
 			prog_mask = mask;
 		else
-			prog_mask = &tmp_mask;
+			prog_mask = tmp_mask;
 	} else {
 		prog_mask = mask;
 	}
@@ -272,16 +273,14 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	 * unless we are being asked to force the affinity (in which
 	 * case we do as we are told).
 	 */
-	cpumask_and(&tmp_mask, prog_mask, cpu_online_mask);
-	if (!force && !cpumask_empty(&tmp_mask))
-		ret = chip->irq_set_affinity(data, &tmp_mask, force);
+	cpumask_and(tmp_mask, prog_mask, cpu_online_mask);
+	if (!force && !cpumask_empty(tmp_mask))
+		ret = chip->irq_set_affinity(data, tmp_mask, force);
 	else if (force)
 		ret = chip->irq_set_affinity(data, mask, force);
 	else
 		ret = -EINVAL;
 
-	raw_spin_unlock(&tmp_mask_lock);
-
 	switch (ret) {
 	case IRQ_SET_MASK_OK:
 	case IRQ_SET_MASK_OK_DONE:

-- 
Without deviation from the norm, progress is not possible.