From mboxrd@z Thu Jan  1 00:00:00 1970
From: Thomas Gleixner <tglx@kernel.org>
To: Chuyi Zhou <zhouchuyi@bytedance.com>, mingo@redhat.com, luto@kernel.org,
 peterz@infradead.org, paulmck@kernel.org, muchun.song@linux.dev,
 bp@alien8.de, dave.hansen@linux.intel.com, pbonzini@redhat.com,
 bigeasy@linutronix.de, clrkwllms@kernel.org, rostedt@goodmis.org,
 nadav.amit@gmail.com, vkuznets@redhat.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v8 04/14] smp: Use task-local IPI cpumask in
 smp_call_function_many_cond()
In-Reply-To: <8d3587e6-e3a1-40f5-ba0d-65583a2f1ecb@bytedance.com>
References: <20260616111127.966468-1-zhouchuyi@bytedance.com>
 <20260616111127.966468-5-zhouchuyi@bytedance.com> <871pdtjryo.ffs@fw13>
 <8d3587e6-e3a1-40f5-ba0d-65583a2f1ecb@bytedance.com>
Date: Fri, 26 Jun 2026 21:07:02 +0200
Message-ID: <87a4shi0ix.ffs@fw13>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain

On Fri, Jun 26 2026 at 23:47, Chuyi Zhou wrote:
> On 2026-06-26 10:29 p.m., Thomas Gleixner wrote:
>>> -	err = scs_prepare(tsk, node);
>>> +	err = smp_task_ipi_mask_alloc(tsk);
>> 
>> Hrm. So we unconditionally allocate another per task CPU mask. How many
>> tasks actually utilize it?
>> 
>> We keep making task_struct and the related things larger every other
>> release without actually looking at the resulting overall memory
>> consumption.
>> 
>
> Thanks, this is a fair concern.
>
> The task-local cpumask approach came from the earlier discussion with
> Sebastian and Nadav. The problem we tried to solve there was the 
> lifetime of the wait mask once the later patch re-enables preemption 
> before csd_lock_wait(). At that point the wait mask can no longer be the 
> per-CPU cfd->cpumask: the task may be preempted or migrate while it is 
> still iterating the mask, and another task running on the original CPU 
> could enter smp_call_function_many_cond() and reuse that per-CPU mask.
>
> I agree that the memory cost needs to be called out explicitly. The
> current implementation trades one task-local cpumask for a stable mask
> lifetime and avoids adding allocation/failure handling to the generic 
> IPI path.
>
> I considered avoiding the fork-time allocation, but the alternatives do
> not look straightforward:
>
> - stack storage is not suitable for large NR_CPUS/CPUMASK_OFFSTACK 
> configurations;
>
> - per-CPU storage is exactly what becomes unsafe once the wait is made
> preemptible;
>
> - allocating the mask in smp_call_function_many_cond() would put an
> allocation in the generic IPI path. It also cannot rely on a sleeping
> allocation because this function is entered from contexts which have
> historically only required preemption to be disabled. Using GFP_ATOMIC
> would need a failure/fallback path, in which case the latency
> improvement becomes opportunistic rather than guaranteed.
>
> For the motivating x86 TLB flush paths, the users are also not a small
> static set of tasks. Ordinary tasks can hit this through exit, unmap,
> reclaim, etc., so I do not see a clean way to allocate this only for a
> pre-identifiable subset of tasks.

I understand that, but this all wants to be spelled out in the change
log and explained.
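
As a rough illustration of the kind of number the change log could
carry, the sketch below estimates the memory taken by one extra
CPU mask per task. It is a userspace approximation, not kernel code:
mask_bytes() and total_mask_bytes() are hypothetical helper names, the
mask is assumed to be a plain bitmap rounded up to whole longs, and
allocator rounding and per-object overhead are ignored.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Bytes needed for a bitmap with one bit per CPU, rounded up to a
 * whole number of unsigned longs. Hypothetical helper for estimation
 * only; it does not model allocator overhead.
 */
static size_t mask_bytes(unsigned int nr_cpus)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	size_t longs;

	assert(nr_cpus > 0);
	longs = (nr_cpus + bits_per_long - 1) / bits_per_long;
	return longs * sizeof(unsigned long);
}

/* Total bytes if every one of nr_tasks tasks carries one such mask. */
static size_t total_mask_bytes(unsigned int nr_cpus, size_t nr_tasks)
{
	return mask_bytes(nr_cpus) * nr_tasks;
}
```

Under these assumptions a mask covering 8192 CPUs takes 1024 bytes, so
100000 tasks would add roughly 100 MB. The real figure depends on the
configuration and on how the mask is actually allocated, which is
exactly what the change log should state.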