Date: Thu, 22 Aug 2024 13:47:01 +0100
Message-ID: <864j7cybay.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Kunkun Jiang <jiangkunkun@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Oliver Upton <oliver.upton@linux.dev>,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	"open list:IRQ SUBSYSTEM" <linux-kernel@vger.kernel.org>,
	"moderated list:ARM SMMU DRIVERS" <linux-arm-kernel@lists.infradead.org>,
	kvmarm@lists.linux.dev,
	wanghaibin.wang@huawei.com,
	nizhiqiang1@huawei.com,
	tangnianyao@huawei.com,
	wangzhou1@hisilicon.com
Subject: Re: [bug report] GICv4.1: multiple vcpus execute vgic_v4_load at the same time will greatly increase the time consumption
References: <86msl6xhu2.wl-maz@kernel.org> <867cc9x8si.wl-maz@kernel.org>

On Thu, 22 Aug 2024 11:59:50 +0100,
Kunkun Jiang <jiangkunkun@huawei.com> wrote:
> 
> Hi Marc,
> 
> On 2024/8/22 16:26, Marc Zyngier wrote:
> >>>> According to analysis, this problem is due to the execution of vgic_v4_load.
> >>>> vcpu_load or kvm_sched_in
> >>>>     kvm_arch_vcpu_load
> >>>>     ...
> >>>>         vgic_v4_load
> >>>>             irq_set_affinity
> >>>>             ...
> >>>>                 irq_do_set_affinity
> >>>>                     raw_spin_lock(&tmp_mask_lock)
> >>>>                     chip->irq_set_affinity
> >>>>                     ...
> >>>>                       its_vpe_set_affinity
> >>>> 
> >>>> The tmp_mask_lock is the key. This is a global lock. I don't quite
> >>>> understand why tmp_mask_lock is needed here. I think there are two
> >>>> possible solutions here:
> >>>> 1. Remove this tmp_mask_lock
> >>> 
> >>> Maybe you could have a look at 33de0aa4bae98 (and 11ea68f553e24)? It
> >>> would allow you to understand the nature of the problem.
> >>> 
> >>> This can probably be replaced with a per-CPU cpumask, which would
> >>> avoid the locking, but potentially result in a larger memory usage.
> >> 
> >> Thanks, I will try it.
> > 
> > A simple alternative would be this:
> > 
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index dd53298ef1a5..0d11b74af38c 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -224,15 +224,12 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> >  	struct irq_desc *desc = irq_data_to_desc(data);
> >  	struct irq_chip *chip = irq_data_get_irq_chip(data);
> >  	const struct cpumask *prog_mask;
> > +	struct cpumask tmp_mask = {};
> >  	int ret;
> >  
> > -	static DEFINE_RAW_SPINLOCK(tmp_mask_lock);
> > -	static struct cpumask tmp_mask;
> > -
> >  	if (!chip || !chip->irq_set_affinity)
> >  		return -EINVAL;
> >  
> > -	raw_spin_lock(&tmp_mask_lock);
> >  	/*
> >  	 * If this is a managed interrupt and housekeeping is enabled on
> >  	 * it check whether the requested affinity mask intersects with
> > @@ -280,8 +277,6 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> >  	else
> >  		ret = -EINVAL;
> >  
> > -	raw_spin_unlock(&tmp_mask_lock);
> > -
> >  	switch (ret) {
> >  	case IRQ_SET_MASK_OK:
> >  	case IRQ_SET_MASK_OK_DONE:
> > 
> > but that will eat a significant portion of your stack if your kernel is
> > configured for a large number of CPUs.
> > 
> 
> Currently CONFIG_NR_CPUS=4096, so each `struct cpumask` occupies 512 bytes.

This seems crazy. Why would you build a kernel with something *that*
big, especially considering that you have a lot less than 1k CPUs?

[...]

> > The removal of this global lock is the only option in my opinion.
> > Either the cpumask becomes a stack variable, or it becomes a static
> > per-CPU variable. Both have drawbacks, but they are not a bottleneck
> > anymore.
> 
> I also prefer to remove the global lock. Which variable do you think is
> better?

Given the number of CPUs your system is configured for, there is no
good answer. An on-stack variable is dangerously large, and a per-CPU
cpumask results in 2MB being allocated, which I find insane.

You'll have to pick your own poison and convince Thomas of the
validity of your approach.

	M.

-- 
Without deviation from the norm, progress is not possible.
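
For reference, the static per-CPU flavour weighed against the on-stack
one above could look roughly like the sketch below. This is an untested
illustration, not a patch from the thread: DEFINE_PER_CPU() and
this_cpu_ptr() are the stock kernel per-CPU primitives, but the mask
computation is collapsed into one simplified cpumask_and(), and the
safety argument (callers hold the irq descriptor lock with interrupts
disabled, so the scratch mask needs no lock of its own) is an assumption
to verify against kernel/irq/manage.c.

static DEFINE_PER_CPU(struct cpumask, irq_tmp_mask);

int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
			bool force)
{
	struct irq_chip *chip = irq_data_get_irq_chip(data);
	/*
	 * Assumption: callers run with the descriptor lock held and
	 * interrupts off, so this CPU cannot re-enter this path and
	 * clobber its own scratch mask -- no tmp_mask_lock needed.
	 */
	struct cpumask *tmp_mask = this_cpu_ptr(&irq_tmp_mask);
	int ret;

	if (!chip || !chip->irq_set_affinity)
		return -EINVAL;

	/*
	 * Simplified stand-in for the managed-irq/housekeeping logic:
	 * build the mask to program into the CPU-private scratch area.
	 */
	cpumask_and(tmp_mask, mask, cpu_online_mask);

	ret = chip->irq_set_affinity(data, tmp_mask, force);
	return ret;
}

The 2MB figure above follows directly: with NR_CPUS=4096 a cpumask is
4096/8 = 512 bytes, so a per-CPU copy costs 512 bytes x 4096 possible
CPUs = 2MB of static storage, while the on-stack variant instead puts
those 512 bytes on every caller's stack.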