From: Thomas Gleixner
To: Chuyi Zhou, mingo@redhat.com, luto@kernel.org, peterz@infradead.org,
 paulmck@kernel.org, muchun.song@linux.dev, bp@alien8.de,
 dave.hansen@linux.intel.com, pbonzini@redhat.com, bigeasy@linutronix.de,
 clrkwllms@kernel.org, rostedt@goodmis.org, nadav.amit@gmail.com,
 vkuznets@redhat.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v8 04/14] smp: Use task-local IPI cpumask in smp_call_function_many_cond()
In-Reply-To: <8d3587e6-e3a1-40f5-ba0d-65583a2f1ecb@bytedance.com>
References: <20260616111127.966468-1-zhouchuyi@bytedance.com>
 <20260616111127.966468-5-zhouchuyi@bytedance.com>
 <871pdtjryo.ffs@fw13>
 <8d3587e6-e3a1-40f5-ba0d-65583a2f1ecb@bytedance.com>
Date: Fri, 26 Jun 2026 21:07:02 +0200
Message-ID: <87a4shi0ix.ffs@fw13>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain

On Fri, Jun 26 2026 at 23:47, Chuyi Zhou wrote:
> On 2026-06-26 10:29 p.m., Thomas Gleixner wrote:
>>> - err = scs_prepare(tsk, node);
>>> + err = smp_task_ipi_mask_alloc(tsk);
>>
>> Hrm. So we unconditionally allocate another per task CPU mask. How many
>> task actually utilize it?
>>
>> We keep making task_struct and the related things larger every other
>> release without actually looking at the resulting overall memory
>> consumption.
>>
>
> Thanks, this is a fair concern.
>
> The task-local cpumask approach came from the earlier discussion with
> Sebastian and Nadav. The problem we tried to solve there was the
> lifetime of the wait mask once the later patch re-enables preemption
> before csd_lock_wait(). At that point the wait mask can no longer be the
> per-CPU cfd->cpumask: the task may be preempted or migrate while it is
> still iterating the mask, and another task running on the original CPU
> could enter smp_call_function_many_cond() and reuse that per-CPU mask.
>
> I agree that the memory cost needs to be called out explicitly. The
> current implementation trades one task-local cpumask for a stable mask
> lifetime and avoids adding allocation/failure handling to the generic
> IPI path.
>
> I considered avoiding the fork-time allocation, but the alternatives do
> not look straightforward:
>
> - stack storage is not suitable for large NR_CPUS/CPUMASK_OFFSTACK
>   configurations;
>
> - per-CPU storage is exactly what becomes unsafe once the wait is made
>   preemptible;
>
> - allocating the mask in smp_call_function_many_cond() would put an
>   allocation in the generic IPI path. It also cannot rely on a sleeping
>   allocation because this function is entered from contexts which have
>   historically only required preemption to be disabled. Using GFP_ATOMIC
>   would need a failure/fallback path, in which case the latency
>   improvement becomes opportunistic rather than guaranteed.
>
> For the motivating x86 TLB flush paths, the users are also not a small
> static set of tasks. Ordinary tasks can hit this through exit, unmap,
> reclaim, etc., so I do not see a clean way to allocate this only for a
> pre-identifiable subset of tasks.

I understand that, but this all wants to be spelled out in the change log
and explained.