Date: Fri, 24 Apr 2026 23:06:18 +0000
From: Pasha Tatashin
To: David Laight
Cc: Pasha Tatashin, Dave Hansen, David Stevens, Linus Walleij,
 Will Deacon, Quentin Perret, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin",
 Andy Lutomirski, Xin Li, Peter Zijlstra, Andrew Morton,
 David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
 Uladzislau Rezki, Kees Cook, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, willy@infradead.org
Subject: Re: [PATCH v2 00/13] Dynamic Kernel Stacks
References: <20260424191456.2679717-1-stevensd@google.com>
 <20260424232637.054f15dd@pumpkin>
In-Reply-To: <20260424232637.054f15dd@pumpkin>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04-24 23:26, David Laight wrote:
> On Fri, 24 Apr 2026 21:35:20 +0000
> Pasha Tatashin wrote:
>
> > On 04-24 12:41, Dave Hansen wrote:
> > > On 4/24/26 12:14, David Stevens wrote:
> > > > The question is then: is this approach something that is fundamentally
> > > > untenable in the kernel
> > >
> > > Yes. Fundamentally untenable.
> > >
> > > Not allowing stack faults has been a wonderful simplification.
> > > It's one of those things that just plain makes the kernel easier to
> > > maintain. Saving low single digits of system memory is not exactly
> > > making me eager to go back to the harder-to-maintain days.
> > >
> > > I seriously doubt that this 1% is the lowest hanging fruit for memory
> > > bloat on these systems. ;)
> >
> > This is true until, in a fleet of millions of machines, you encounter a
> > one-in-a-billion chance of a stack overflow. You are then forced to
> > double the statically allocated kernel stacks on every machine, paying a
> > memory tax even though 99.999..% of threads never exceed 4K. This
> > overhead accumulates to petabytes of wasted capacity.
>
> And then you hit a stack fault in some path where you can't sleep and
> there isn't any available kernel memory.

Well, at least if we hit this rare case, we can simply double a buffer
of pre-reserved stack memory per CPU. This still saves significant
memory compared to wasting it on every single thread.

> An alternative idea is to arrange for some system calls to sleep in
> userspace, so when the thread is woken it re-executes the system call.
> It then makes sense to assign the kernel stack to the process when
> it enters the kernel.
> That might mean that you don't need a kernel stack for all the threads
> sleeping in futex() - it might even be possible to do the retry in
> userspace saving the second kernel entry most of the time.
> It is all 'hard and difficult' though.

I was thinking about a similar approach as well, sort of multiplexing
the kernel stacks. But honestly, when trying to cover all the edge
cases, I didn't find it to be any better or easier than just using
dynamic kernel stacks.

An alternative approach, which was proposed at LSFMM by Willy, is to add
explicit deep stack calls. When we enter a path that we know is
exceptionally deep, only then do we extend the stack, keeping the
default (say, 8K) everywhere else.

> The easier solution is to rewrite the system code so it doesn't have
> 1000s of threads :-)

That ship sailed in the early 90s of the previous millennium. Nowadays,
we have high-end workstations with almost 200 hardware threads.
Rewriting system code to reduce thread counts simply isn't an option for
our storage machines, which have millions of threads per unit.

+CC Matthew Wilcox
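
For a rough sense of scale, here is a back-of-envelope sketch comparing
the two costs discussed above: doubling every thread's stack versus
doubling a per-CPU reserve. Every number in it is an illustrative
assumption (a fleet of one million machines, one million threads and 200
CPUs per machine, an 8K default stack, and an 8K per-CPU reserve), not a
measurement, and the helper names are made up for this example.

```python
KIB = 1024

def extra_bytes_static(threads: int, stack_bytes: int) -> int:
    """Extra memory on one machine if every thread's stack is doubled."""
    return threads * stack_bytes

def extra_bytes_per_cpu(cpus: int, reserve_bytes: int) -> int:
    """Extra memory on one machine if only a per-CPU reserve is doubled."""
    return cpus * reserve_bytes

# Illustrative assumptions only; none of these are measured values.
MACHINES = 1_000_000
THREADS_PER_MACHINE = 1_000_000
CPUS_PER_MACHINE = 200
STACK_BYTES = 8 * KIB      # assumed default stack size
RESERVE_BYTES = 8 * KIB    # assumed per-CPU reserve size

static_fleet = MACHINES * extra_bytes_static(THREADS_PER_MACHINE, STACK_BYTES)
percpu_fleet = MACHINES * extra_bytes_per_cpu(CPUS_PER_MACHINE, RESERVE_BYTES)

print(f"double every thread stack: {static_fleet / 1e15:.1f} PB across the fleet")
print(f"double per-CPU reserve:    {percpu_fleet / 1e12:.1f} TB across the fleet")
print(f"ratio: {static_fleet // percpu_fleet}x")
```

Under these assumptions the per-thread doubling lands in the petabyte
range while the per-CPU reserve stays in the terabyte range. The ratio
is simply threads per machine divided by CPUs per machine, so it holds
for any fleet size as long as the stack and reserve sizes match.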