Date: Thu, 14 Mar 2024 19:57:22 +0000
From: Matthew Wilcox
To: Kent Overstreet
Cc: "H. Peter Anvin", Pasha Tatashin, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, akpm@linux-foundation.org, x86@kernel.org,
 bp@alien8.de, brauner@kernel.org, bristot@redhat.com, bsegall@google.com,
 dave.hansen@linux.intel.com, dianders@chromium.org,
 dietmar.eggemann@arm.com, eric.devolder@oracle.com, hca@linux.ibm.com,
 hch@infradead.org, jacob.jun.pan@linux.intel.com, jgg@ziepe.ca,
 jpoimboe@kernel.org, jroedel@suse.de, juri.lelli@redhat.com,
 kinseyho@google.com, kirill.shutemov@linux.intel.com, lstoakes@gmail.com,
 luto@kernel.org, mgorman@suse.de, mic@digikod.net,
 michael.christie@oracle.com, mingo@redhat.com, mjguzik@gmail.com,
 mst@redhat.com, npiggin@gmail.com, peterz@infradead.org, pmladek@suse.com,
 rick.p.edgecombe@intel.com, rostedt@goodmis.org, surenb@google.com,
 tglx@linutronix.de, urezki@gmail.com, vincent.guittot@linaro.org,
 vschneid@redhat.com
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks

On Thu, Mar 14, 2024 at 03:53:39PM -0400, Kent Overstreet wrote:
> On Thu, Mar 14, 2024 at 07:43:06PM +0000, Matthew Wilcox wrote:
> > On Tue, Mar 12, 2024 at 10:18:10AM -0700, H. Peter Anvin wrote:
> > > Second, non-dynamic kernel memory is one of the core design decisions in
> > > Linux from early on. This means there are lot of deeply embedded assumptions
> > > which would have to be untangled.
> >
> > I think there are other ways of getting the benefit that Pasha is seeking
> > without moving to dynamically allocated kernel memory.  One icky thing
> > that XFS does is punt work over to a kernel thread in order to use more
> > stack!  That breaks a number of things including lockdep (because the
> > kernel thread doesn't own the lock, the thread waiting for the kernel
> > thread owns the lock).
> >
> > If we had segmented stacks, XFS could say "I need at least 6kB of stack",
> > and if less than that was available, we could allocate a temporary
> > stack and switch to it.  I suspect Google would also be able to use this
> > API for their rare cases when they need more than 8kB of kernel stack.
> > Who knows, we might all be able to use such a thing.
> >
> > I'd been thinking about this from the point of view of allocating more
> > stack elsewhere in kernel space, but combining what Pasha has done here
> > with this idea might lead to a hybrid approach that works better; allocate
> > 32kB of vmap space per kernel thread, put 12kB of memory at the top of it,
> > rely on people using this "I need more stack" API correctly, and free the
> > excess pages on return to userspace.  No complicated "switch stacks" API
> > needed, just an "ensure we have at least N bytes of stack remaining" API.
>
> Why would we need an "I need more stack" API? Pasha's approach seems
> like everything we need for what you're talking about.

Because double faults are hard, possibly impossible, and the FRED approach
Peter described has extra overhead?  This was all described up-thread.
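
To make the arithmetic of the hybrid idea quoted above concrete, here is a
minimal user-space sketch of the bookkeeping only.  It is not kernel code and
not anything from Pasha's series: struct stack_model, stack_ensure() and
stack_release_excess() are hypothetical names, and the 4kB page size is an
assumption.  It models a 32kB reservation with 12kB backed at the top, an
"ensure we have at least N bytes of stack remaining" call that backs more
pages ahead of time (so nothing has to fault), and the release of the excess
pages on return to userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ      4096u          /* assumed page size */
#define RESERVED_SZ  (32u * 1024u)  /* vmap space reserved per thread */
#define BASELINE_SZ  (12u * 1024u)  /* memory always present at the top */

/*
 * Hypothetical per-thread bookkeeping.  Both fields count bytes downward
 * from the top of the reservation, because the stack grows down.
 */
struct stack_model {
	size_t used;    /* depth of the current stack pointer */
	size_t backed;  /* bytes backed by memory, a multiple of PAGE_SZ */
};

static size_t round_up_page(size_t n)
{
	return (n + PAGE_SZ - 1) / PAGE_SZ * PAGE_SZ;
}

/*
 * "Ensure we have at least N bytes of stack remaining": back enough pages
 * in advance that the next `need` bytes below the stack pointer are usable.
 * Returns false, changing nothing, if that would exceed the reservation.
 */
static bool stack_ensure(struct stack_model *s, size_t need)
{
	size_t want = s->used + need;

	if (want > RESERVED_SZ)
		return false;
	if (want > s->backed)
		s->backed = round_up_page(want);
	return true;
}

/*
 * Models return to userspace: the stack is empty again, so everything
 * beyond the baseline can be freed.  Returns the number of bytes freed.
 */
static size_t stack_release_excess(struct stack_model *s)
{
	size_t freed;

	assert(s->backed >= BASELINE_SZ);
	freed = s->backed - BASELINE_SZ;
	s->used = 0;
	s->backed = BASELINE_SZ;
	return freed;
}
```

In this model a caller sitting 10kB deep that asks for 6kB more ends up with
16kB backed, and the extra 4kB page goes away on the next return to
userspace.  Callers that never ask stay within the 12kB baseline, which is
why the scheme relies on people using the API correctly.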