From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-dl1-f73.google.com (mail-dl1-f73.google.com [74.125.82.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CB1183DC4BE for ; Fri, 24 Apr 2026 19:17:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=74.125.82.73 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777058232; cv=none; b=MHjf8vmeYZlNSYieDFJBJdKwrsbYGelM7PNQ7QEN16K6SFq/UXw54Pv8edv4MrWyYL7+Rqi8QUkj4IVC+L3KY3nwB6UdwRnLUZURdRs9e2p9XIp3tQVm7XTnNXeAODIGffsml+1vBSmY8GdfvyWqHrR9rLaCB57YaW4q+s43xVU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777058232; c=relaxed/simple; bh=8CwW+GpF5BAaieLCinbMNp90S6l0U17fNDo9y9xNgcA=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ZzBUuqKC+8nAe9m+I69ssb3iUJXEl9BL7MNgrKvdJJ/6vky51kj5FJvQON2o7SEGBnm+bU/Wg1vCcoTw0diXOe4bd8h5sWZhCBy5GyrlYy9DeCSiaMLkuGmmEjbG+NuHd0+lwIkQ6BfQU5t7O9ckQjpaXZpjK+qD+dUL6i8rLEc= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--stevensd.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=jr4mBAMg; arc=none smtp.client-ip=74.125.82.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--stevensd.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="jr4mBAMg" Received: by mail-dl1-f73.google.com with SMTP id a92af1059eb24-12dba1e866dso2820951c88.1 for ; Fri, 24 Apr 2026 12:17:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1777058228; x=1777663028; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=LVunvLNRu5wU/C5dXUrSh/20awEeZOSQNdDKwzJM0Dk=; b=jr4mBAMg2ncalSL0odGZ6BGOa7uuq3zkItyKWt+dHXb6eOHW4n6n7XAw8WvZQhX37U VZHdEyaFWB/9Q3XT0RaYI0wnfuqp3ka+ZNuV/kunMazjQiEfGUgSqPEOSqNY4Mc4BFrr 3XvHesXQohVlJ1Xpb3O/nAs1fUbpxAWkP1AsU2QbMoPcDK9gWPmyhwT6UrpJD/7f/dGd fDd7ezm1x4sWub1Y2Zuy5GdxvMwRGICXKiQ8DeObuGTmeUWNm9qqj66cHWgx/HqbqfcZ IdFt/engqfZGmQ1BkbHR4jMwznHf4l5lh+rNZMTWjgCH9IMHQ/1R3LC/ZjHFRqjT/9xd +s7Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777058228; x=1777663028; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=LVunvLNRu5wU/C5dXUrSh/20awEeZOSQNdDKwzJM0Dk=; b=FRUWuRAklf8TPirqrWSLaJwR3sibGwDF5FhKw9MV+yHWZPftVX+0OXrZHa6RnfLemP KBedJIlAteS/IZinq39yH6tchH3cqpOYyvqcM+jaIcFsh5A7oz5a/YEg1dePyeQqFg2Z w707l7W46GOAYoFW87ku57kDaud6i377iGBiFt2TyPm8LPXJpUi39rfjaA0b2bEyCAHI DsJ2k96da+SejpWBfHNNVe1mODFpZGr3vEcMvBMmgYL4V7+I5UZqxwkRPJ6R4h7FqaMW R4oFYFkEtjeWXMQ+UMGRKHSJmbcjU5Z1sab2vaUecK9TNBGRbvyVOWjMFZKJpyOORP6E K60g== X-Forwarded-Encrypted: i=1; AFNElJ8OwPvg/7u3ZfD5rC8hZawvfGAu6N2SMxkYQyaO9DYFqH7UIDDxc2kIUIUmZzSrO4Z0SWsxI4RIKdWc6Ig=@vger.kernel.org X-Gm-Message-State: AOJu0Yx7RWrfotKEKc5Kpx4L2W9c4sUTh669iA57BgeHX/+j6dpTSecV 967LNqRpALluiET7/hMrC7zMDHM141wOb7P8xzeX09r+abUXyMppgKUmj9xQEcNGib07T5FVsw7 7XjhrOKfomW8Z2A== X-Received: from dybnj5.prod.google.com ([2002:a05:7300:d085:b0:2dd:4573:2897]) (user=stevensd job=prod-delivery.src-stubby-dispatcher) by 2002:a05:7023:b0d:b0:12c:8eb:80b9 with SMTP id a92af1059eb24-12c73afa081mr12946449c88.6.1777058227667; Fri, 24 Apr 2026 12:17:07 -0700 (PDT) Date: Fri, 24 Apr 2026 12:14:53 -0700 In-Reply-To: <20260424191456.2679717-1-stevensd@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260424191456.2679717-1-stevensd@google.com> X-Mailer: git-send-email 2.54.0.rc2.544.gc7ae2d5bb8-goog Message-ID: <20260424191456.2679717-11-stevensd@google.com> Subject: [PATCH v2 10/13] fork: Store task pointer in unpopulated stack ptes From: David Stevens To: Pasha Tatashin , Linus Walleij , Will Deacon , Quentin Perret , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Andy Lutomirski , Xin Li , Peter Zijlstra , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Uladzislau Rezki , Kees Cook Cc: David Stevens , linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" Store the task pointer in the ptes of the unpopulated pages of dynamic stacks, to allow the vm_struct pointer to be retrieved without relying on any locks or current. This relies on being able to pack the struct task_struct pointer into a pte. Since the struct is 64 byte aligned, that gives 5 bits of leeway, which should be viable on most architectures. Any architecture which enables dynamic thread stacks must provide make_data_kpte() and unpack_data_kpte(), which pack/unpack a right shifted pointer value into/from a pte. Signed-off-by: David Stevens --- include/linux/sched/task_stack.h | 1 + kernel/fork.c | 74 +++++++++++++++++++++++++++++--- mm/vmalloc.c | 2 +- 3 files changed, 69 insertions(+), 8 deletions(-) diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h index 7dcff2836d7e..7cf00ce97f7c 100644 --- a/include/linux/sched/task_stack.h +++ b/include/linux/sched/task_stack.h @@ -105,6 +105,7 @@ void exit_task_stack_account(struct task_struct *tsk); void dynamic_stack_refill_pages(void); unsigned long dynamic_stack_accounting(struct task_struct *tsk, bool finalize); bool dynamic_stack_fault(struct task_struct *tsk, unsigned long address, bool *on_stack); +struct task_struct *task_from_stack_address(unsigned long address); /* * Refill and charge for the used pages. diff --git a/kernel/fork.c b/kernel/fork.c index 9ac9d23f5f4b..733fc1f58b8b 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -296,16 +296,40 @@ static bool try_release_thread_stack_to_cache(struct vm_struct *vm_area) static DEFINE_PER_CPU(struct page *, dynamic_stack_pages[DYNSTK_PAGE_POOL_NR]); +#define TASK_PTR_SHIFT (ilog2(__alignof__(struct task_struct))) + static void link_vmap_stack_to_task(struct task_struct *tsk, struct vm_struct *vm_area) { + int i; + unsigned long addr; + pte_t *ptep, pte; + + pte = make_data_kpte(((unsigned long)tsk) >> TASK_PTR_SHIFT); + tsk->stack_vm_area = vm_area; tsk->packed_stack = (unsigned long)kasan_reset_tag(vm_area->addr); + + addr = (unsigned long)vm_area->addr; + ptep = virt_to_kpte(addr); + for (i = vm_area->nr_pages; i < THREAD_SIZE >> PAGE_SHIFT; + i++, addr += PAGE_SIZE, ptep++) + set_pte_at(&init_mm, addr, ptep, pte); } -static void free_vmap_stack(struct vm_struct *vm_area) +static void free_vmap_stack(struct vm_struct *vm_area, bool was_mapped) { int i; + /* Clear data kptes since vunmap expects present or none. */ + if (was_mapped) { + unsigned long addr = (unsigned long)vm_area->addr; + pte_t *ptep = virt_to_kpte(addr); + unsigned int nr_to_clear = (THREAD_SIZE >> PAGE_SHIFT) - vm_area->nr_pages; + + if (nr_to_clear) + clear_ptes(&init_mm, addr, ptep, nr_to_clear); + } + remove_vm_area(vm_area->addr); for (i = 0; i < vm_area->nr_pages; i++) @@ -354,7 +378,7 @@ static struct vm_struct *alloc_vmap_stack(int node) return vm_area; cleanup_err: - free_vmap_stack(vm_area); + free_vmap_stack(vm_area, false); return NULL; } @@ -477,6 +501,42 @@ unsigned long dynamic_stack_accounting(struct task_struct *tsk, bool finalize) return i; } +noinstr struct task_struct *task_from_stack_address(unsigned long address) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + + BUILD_BUG_ON((BITS_PER_LONG - TASK_PTR_SHIFT) > KPTE_AVAILABLE_DATA_BITS); + + if (!is_vmalloc_addr((void *)address)) + return NULL; + + pgd = pgd_offset_k(address); + if (pgd_none(*pgd) || pgd_leaf(*pgd)) + return NULL; + + p4d = p4d_offset(pgd, address); + if (p4d_none(*p4d) || p4d_leaf(*p4d)) + return NULL; + + pud = pud_offset(p4d, address); + if (pud_none(*pud) || pud_leaf(*pud)) + return NULL; + + pmd = pmd_offset(pud, address); + if (pmd_none(*pmd) || pmd_leaf(*pmd)) + return NULL; + + pte = pte_offset_kernel(pmd, address); + if (pte_present(*pte) || pte_none(*pte)) + return NULL; + + return (struct task_struct *)(unpack_data_kpte(*pte) << TASK_PTR_SHIFT); +} + bool noinstr dynamic_stack_fault(struct task_struct *tsk, unsigned long address, bool *on_stack) { unsigned long stack, hole_end, addr; @@ -570,7 +630,7 @@ static inline struct vm_struct *alloc_vmap_stack(int node) return stack ? find_vm_area(stack) : NULL; } -static inline void free_vmap_stack(struct vm_struct *vm_area) +static inline void free_vmap_stack(struct vm_struct *vm_area, bool was_mapped) { vfree(vm_area->addr); } @@ -590,7 +650,7 @@ static void thread_stack_free_work(struct work_struct *work) if (try_release_thread_stack_to_cache(vm_stack->stack_vm_area)) return; - free_vmap_stack(vm_area); + free_vmap_stack(vm_area, true); } static void thread_stack_delayed_free(struct task_struct *tsk) @@ -618,7 +678,7 @@ static int free_vm_stack_cache(unsigned int cpu) if (!vm_area) continue; - free_vmap_stack(vm_area); + free_vmap_stack(vm_area, true); cached_vm_stack_areas[i] = NULL; } @@ -653,7 +713,7 @@ static int alloc_thread_stack_node(struct task_struct *tsk, int node) unsigned long memset_offset = 0; if (memcg_charge_kernel_stack(vm_area)) { - free_vmap_stack(vm_area); + free_vmap_stack(vm_area, true); return -ENOMEM; } @@ -674,7 +734,7 @@ static int alloc_thread_stack_node(struct task_struct *tsk, int node) return -ENOMEM; if (memcg_charge_kernel_stack(vm_area)) { - free_vmap_stack(vm_area); + free_vmap_stack(vm_area, true); return -ENOMEM; } link_vmap_stack_to_task(tsk, vm_area); diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 39b7e118cbce..76955c101180 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -76,7 +76,7 @@ early_param("nohugevmalloc", set_nohugevmalloc); static const bool vmap_allow_huge = false; #endif /* CONFIG_HAVE_ARCH_HUGE_VMALLOC */ -bool is_vmalloc_addr(const void *x) +noinstr bool is_vmalloc_addr(const void *x) { unsigned long addr = (unsigned long)kasan_reset_tag(x); -- 2.54.0.rc2.544.gc7ae2d5bb8-goog