From: Thomas Gleixner
To: David Stevens, Pasha Tatashin, Linus Walleij, Will Deacon,
 Quentin Perret, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, "H. Peter Anvin", Andy Lutomirski, Xin Li,
 Peter Zijlstra, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
 "Liam R. Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
 Michal Hocko, Uladzislau Rezki, Kees Cook
Cc: David Stevens, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 10/13] fork: Store task pointer in unpopulated stack ptes
In-Reply-To: <87wlvkhqd0.ffs@fw13>
References: <20260424191456.2679717-1-stevensd@google.com>
 <20260424191456.2679717-11-stevensd@google.com> <87wlvkhqd0.ffs@fw13>
Date: Sun, 28 Jun 2026 01:11:20 +0200
Message-ID: <87qzlrh947.ffs@fw13>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Jun 27 2026 at 00:46, Thomas Gleixner wrote:
> On Fri, Apr 24 2026 at 12:14, David Stevens wrote:
>> Store the task pointer in the ptes of the unpopulated pages of dynamic
>> stacks, to allow the vm_struct pointer to be retrieved without relying
>> on any locks or current.
>
> You fail to explain why you can't use current. Changelogs have to
> describe the WHY and not the WHAT.

I obviously know why you can't use it. But the absence of a proper
explanation and my disgust for the implementation bothered me enough to
look deeper into it.

Let's look at the only problematic case:

schedule()
  ....
  switch_to(prev, next)
    switch_to_asm(prev, next)
1)    switch(RSP)
      __switch_to(prev, next)
2)      this_cpu_write(current_task, next);

There is obviously a hole between #1 and #2 where 'current_task' is not
giving the right answer.
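The window between #1 and #2 can be sketched in plain userspace C. This
is only an illustration under stated assumptions, not kernel code:
'struct task', 'struct cpu_state', stack_contains(), switch_rsp(),
publish_current() and current_matches_rsp() are hypothetical stand-ins
for the real task_struct, the per-CPU variables and the two steps of the
context switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for task_struct: only the stack range matters. */
struct task {
	uintptr_t stack_base;	/* lowest address of the stack area */
	size_t stack_size;	/* size of the stack area in bytes */
};

/* Hypothetical stand-in for the CPU-local state of a single CPU. */
struct cpu_state {
	const struct task *current_task;
	uintptr_t rsp;
};

/* True if addr lies inside the stack area of t. */
static bool stack_contains(uintptr_t addr, const struct task *t)
{
	assert(t != NULL);
	return addr >= t->stack_base && addr - t->stack_base < t->stack_size;
}

/* Does current_task alone attribute RSP to the right stack? */
static bool current_matches_rsp(const struct cpu_state *cpu)
{
	return stack_contains(cpu->rsp, cpu->current_task);
}

/* Step 1: the stack pointer moves to the stack of 'next'. */
static void switch_rsp(struct cpu_state *cpu, const struct task *next)
{
	cpu->rsp = next->stack_base + next->stack_size - 64;
}

/* Step 2: 'next' is published as the current task. */
static void publish_current(struct cpu_state *cpu, const struct task *next)
{
	cpu->current_task = next;
}
```

Calling switch_rsp() without publish_current() leaves the sketch in the
state where current_matches_rsp() returns false, which is the hole
described above.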
You work around that with this PTE storage magic, which is admittedly
smart, but completely overengineered and not necessary at all.

Why?

If you look thoroughly at the above condensed context switch logic
related to this problem, you'll notice that there are three sources of
information:

   - prev:	the task being scheduled out
   - next:	the task being scheduled in
   - RSP:	the stack pointer

Between #1 and #2 it cannot be determined whether RSP belongs to 'prev'
or 'next' because 'next' is not exposed to the fault handler.

But if it were exposed, it would answer the question of where RSP
belongs, no?

So the obvious _and_ simple solution _is_ to expose 'next':

schedule()
  ....
  switch_to(prev, next)
1)  raw_cpu_write(next_task, next);
    switch_to_asm(prev, next)
2)    switch(RSP)
      __switch_to(prev, next)
3)      raw_cpu_write(current_task, next);

With that the stack fault handler logic becomes:

	curr = raw_cpu_read(current_task);
	addr = fred_event_data(regs);

	if (within_task_stack(addr, curr))
		return handle_stack_fault(regs, curr);

	next = raw_cpu_read(next_task);
	if (curr != next && within_task_stack(addr, next))
		return handle_stack_fault(regs, next);

	return 0;

That is correct at any point in time, and the pattern works on _all_
supported architectures because it's all CPU local.

All you need is one extra store. When done right, that store ends up in
the same cache line which is dirtied by the context switch anyway
(i.e. the one holding current_task), so you won't even be able to
measure the overhead.

Thanks,

        tglx
---
Everything should be made as simple as possible, but not simpler.
					- Einstein
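
The two-step lookup from the mail can be exercised in userspace with a
minimal sketch. The assumptions are loud ones: 'struct task' and
'struct cpu_state' are hypothetical stand-ins for task_struct and the
per-CPU variables, within_task_stack() is given a guessed range-check
implementation, stack_fault_owner() is a made-up name that returns the
owning task instead of calling handle_stack_fault(), and next_task is
assumed to equal current_task outside of a context switch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for task_struct: only the stack range matters. */
struct task {
	uintptr_t stack_base;	/* lowest address of the stack area */
	size_t stack_size;	/* size of the stack area in bytes */
};

/* Hypothetical stand-in for the two CPU-local task pointers. */
struct cpu_state {
	const struct task *current_task;
	const struct task *next_task;
};

/* Assumed implementation: true if addr lies inside the stack area of t. */
static bool within_task_stack(uintptr_t addr, const struct task *t)
{
	return t && addr >= t->stack_base &&
	       addr - t->stack_base < t->stack_size;
}

/*
 * Mirror of the lookup order in the mail: try current_task first, then
 * next_task. Returns the owning task or NULL if neither stack matches.
 */
static const struct task *stack_fault_owner(const struct cpu_state *cpu,
					    uintptr_t addr)
{
	const struct task *curr = cpu->current_task;
	const struct task *next = cpu->next_task;

	if (within_task_stack(addr, curr))
		return curr;

	if (curr != next && within_task_stack(addr, next))
		return next;

	return NULL;
}
```

Walking a 'struct cpu_state' through the three numbered steps (store
next_task, switch the stack, store current_task) and querying an
address on either stack after each step shows which task the lookup
picks at that point.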