Date: Tue, 13 Jun 2023 00:34:11 +0300
From: Mike Rapoport
To: Song Liu
Cc: Mark Rutland, Kent Overstreet, linux-kernel@vger.kernel.org,
	Andrew Morton, Catalin Marinas, Christophe Leroy, "David S. Miller",
	Dinh Nguyen, Heiko Carstens, Helge Deller, Huacai Chen,
	Luis Chamberlain, Michael Ellerman, "Naveen N. Rao", Palmer Dabbelt,
	Russell King, Steven Rostedt, Thomas Bogendoerfer, Thomas Gleixner,
	Will Deacon, bpf@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-mips@vger.kernel.org,
	linux-mm@kvack.org, linux-modules@vger.kernel.org,
	linux-parisc@vger.kernel.org, linux-riscv@lists.infradead.org,
	linux-s390@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev,
	netdev@vger.kernel.org, sparclinux@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH 00/13] mm: jit/text allocator
Message-ID: <20230612213411.GP52412@kernel.org>
References: <20230601101257.530867-1-rppt@kernel.org>
	<20230605092040.GB3460@kernel.org>
	<20230608184116.GJ52412@kernel.org>

On Fri, Jun 09, 2023 at 10:02:16AM -0700, Song Liu wrote:
> On Thu, Jun 8, 2023 at 11:41 AM Mike Rapoport wrote:
> >
> > On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> > > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland wrote:
> > >
> > > [...]
> > >
> > > > > > > Can you give more detail on what parameters you need?
> > > > > > > If the only extra parameter is just "does this allocation need
> > > > > > > to live close to kernel text", that's not that big of a deal.
> > > > > >
> > > > > > My thinking was that we at least need the start + end for each caller. That
> > > > > > might be it, tbh.
> > > > >
> > > > > Do you mean that modules will have something like
> > > > >
> > > > >         jit_text_alloc(size, MODULES_START, MODULES_END);
> > > > >
> > > > > and kprobes will have
> > > > >
> > > > >         jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > > > ?
> > > >
> > > > Yes.
> > >
> > > How about we start with two APIs:
> > >      jit_text_alloc(size);
> > >      jit_text_alloc_range(size, start, end);
> > >
> > > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> > > not quite convinced it is needed.
> >
> > Right now arm64 and riscv override bpf and kprobes allocations to use the
> > entire vmalloc address space, but having the ability to allocate generated
> > code outside of modules area may be useful for other architectures.
> >
> > Still the start + end for the callers feels backwards to me because the
> > callers do not define the ranges, but rather the architectures, so we still
> > need a way for architectures to define how they want to allocate memory for
> > the generated code.
>
> Yeah, this makes sense.
>
> >
> > > > > It still can be achieved with a single jit_alloc_arch_params(), just by
> > > > > adding enum jit_type parameter to jit_text_alloc().
> > > >
> > > > That feels backwards to me; it centralizes a bunch of information about
> > > > distinct users to be able to shove that into a static array, when the callsites
> > > > can pass that information.
> > >
> > > I think we only have two types of users: module and everything else (ftrace,
> > > kprobe, bpf stuff). The key differences are:
> > >
> > >   1. module uses text and data; while everything else only uses text.
> > >   2. module code is generated by the compiler, and thus has stronger
> > >   requirements in address ranges; everything else is generated via some
> > >   JIT or manually written assembly, so they are more flexible with address
> > >   ranges (in JIT, we can avoid using instructions that require a specific
> > >   address range).
> > >
> > > The next question is, can we have the two types of users share the same
> > > address ranges? If not, we can reserve the preferred range for modules,
> > > and let everything else use the other range. I don't see reasons to further
> > > separate users in the "everything else" group.
> >
> > I agree that we can define only two types: modules and everything else and
> > let the architectures define if they need different ranges for these two
> > types, or want the same range for everything.
> >
> > With only two types we can have two API calls for alloc, and a single
> > structure that defines the ranges etc from the architecture side rather
> > than spread all over.
> >
> > Like something along these lines:
> >
> >         struct execmem_range {
> >                 unsigned long   start;
> >                 unsigned long   end;
> >                 unsigned long   fallback_start;
> >                 unsigned long   fallback_end;
> >                 pgprot_t        pgprot;
> >                 unsigned int    alignment;
> >         };
> >
> >         struct execmem_modules_range {
> >                 enum execmem_module_flags flags;
> >                 struct execmem_range text;
> >                 struct execmem_range data;
> >         };
> >
> >         struct execmem_jit_range {
> >                 struct execmem_range text;
> >         };
> >
> >         struct execmem_params {
> >                 struct execmem_modules_range    modules;
> >                 struct execmem_jit_range        jit;
> >         };
> >
> >         struct execmem_params *execmem_arch_params(void);
> >
> >         void *execmem_text_alloc(size_t size);
> >         void *execmem_data_alloc(size_t size);
> >         void execmem_free(void *ptr);
>
> With the jit variation, maybe we can just call these
> module_[text|data]_alloc()?

I was thinking about "execmem_*_alloc()" for allocations that must be close
to the kernel image, like modules, ftrace on x86 and s390 and maybe
something else in the future. And jit_text_alloc() for allocations that can
reside anywhere.

I tried to find a different name for 'struct execmem_modules_range' but
couldn't think of anything better than 'struct execmem_close_to_kernel', so
I've left modules in the name.

> btw: Depending on the implementation of the allocator, we may also
> need separate free()s for text and data.
>
> >
> >         void *jit_text_alloc(size_t size);
> >         void jit_free(void *ptr);
> >

Let's just add jit_free() for completeness even if it will be the same as
execmem_free() for now.

> [...]
>
> How should we move ahead from here?
>
> AFAICT, all these changes can be easily extended and refactored
> in the future, so we don't have to make it perfect the first time.
> OTOH, having the interface committed (either this set or my
> module_alloc_type version) can unblock works in the binpack
> allocator and the users side. Therefore, I think we can move
> relatively fast here?

Once the interface and the architecture abstraction are ready we can work
on the allocator and the users. We also need to update
text_poking/alternatives on architectures that would allocate executable
memory as ROX. I did some quick tests and with these patches 'modprobe xfs'
takes tens of times longer than before.

> Thanks,
> Song

-- 
Sincerely yours,
Mike.
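
To make the split between "close to the kernel image" and "anywhere"
allocations more concrete, here is a rough userspace mock of the
execmem_params idea quoted above. It is only a sketch under loud
assumptions: nothing is mapped, the allocators return plain numbers instead
of pointers, the MOCK_* ranges and the bump-cursor helpers are made up for
illustration, and pgprot, flags, the data range and the free side are left
out. It is not the proposed kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* All MOCK_* values are invented; a real architecture would supply its own. */
#define MOCK_MODULES_START	0x10000000UL
#define MOCK_MODULES_END	0x10002000UL	/* tiny, so the fallback is easy to hit */
#define MOCK_FALLBACK_START	0x40000000UL
#define MOCK_FALLBACK_END	0x50000000UL
#define MOCK_JIT_START		0x50000000UL
#define MOCK_JIT_END		0x60000000UL

/* Simplified from the proposal: no pgprot_t, no flags, no data range. */
struct execmem_range {
	unsigned long start;
	unsigned long end;
	unsigned long fallback_start;
	unsigned long fallback_end;
	unsigned int alignment;		/* assumed to be a power of two */
};

struct execmem_modules_range { struct execmem_range text; };
struct execmem_jit_range { struct execmem_range text; };

struct execmem_params {
	struct execmem_modules_range modules;
	struct execmem_jit_range jit;
};

/* The one place where the (mock) architecture describes its ranges. */
static struct execmem_params mock_params = {
	.modules.text = { MOCK_MODULES_START, MOCK_MODULES_END,
			  MOCK_FALLBACK_START, MOCK_FALLBACK_END, 4096 },
	.jit.text = { MOCK_JIT_START, MOCK_JIT_END, 0, 0, 64 },
};

struct execmem_params *execmem_arch_params(void)
{
	return &mock_params;
}

/* Hypothetical bookkeeping: one bump cursor per range and per fallback. */
struct mock_cursor { unsigned long next, fallback_next; };
static struct mock_cursor modules_cursor = { MOCK_MODULES_START, MOCK_FALLBACK_START };
static struct mock_cursor jit_cursor = { MOCK_JIT_START, 0 };

static unsigned long bump(unsigned long *next, unsigned long end,
			  unsigned long align, size_t size)
{
	unsigned long addr = (*next + align - 1) & ~(align - 1);

	if (addr >= end || size > end - addr)
		return 0;		/* range exhausted */
	*next = addr + size;
	return addr;
}

static unsigned long range_alloc(const struct execmem_range *r,
				 struct mock_cursor *c, size_t size)
{
	unsigned long addr = bump(&c->next, r->end, r->alignment, size);

	/* Only try the fallback if the architecture defined one. */
	if (!addr && r->fallback_start < r->fallback_end)
		addr = bump(&c->fallback_next, r->fallback_end,
			    r->alignment, size);
	return addr;
}

/* Must be close to the kernel image: modules range, then its fallback. */
unsigned long execmem_text_alloc(size_t size)
{
	return range_alloc(&execmem_arch_params()->modules.text,
			   &modules_cursor, size);
}

/* Can reside anywhere the architecture allows generated code. */
unsigned long jit_text_alloc(size_t size)
{
	return range_alloc(&execmem_arch_params()->jit.text,
			   &jit_cursor, size);
}
```

With these made-up ranges, two 4096-byte execmem_text_alloc() calls fill the
mock modules range and a third one lands in the fallback range, while
jit_text_alloc() never touches either. The callers pass only a size; the
ranges stay on the architecture side.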