Date: Sat, 20 Apr 2024 18:11:21 +0900
From: Masami Hiramatsu (Google)
To: Mike Rapoport
Cc: Song Liu, Mark Rutland, Peter Zijlstra, linux-kernel@vger.kernel.org,
 Alexandre Ghiti, Andrew Morton, Bjorn Topel, Catalin Marinas,
 Christophe Leroy, "David S. Miller", Dinh Nguyen, Donald Dutile,
 Eric Chanudet, Heiko Carstens, Helge Deller, Huacai Chen,
 Kent Overstreet, Luis Chamberlain, Michael Ellerman, Nadav Amit,
 Palmer Dabbelt, Puranjay Mohan, Rick Edgecombe, Russell King,
 Steven Rostedt, Thomas Bogendoerfer, Thomas Gleixner, Will Deacon,
 bpf@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-mips@vger.kernel.org,
 linux-mm@kvack.org, linux-modules@vger.kernel.org,
 linux-parisc@vger.kernel.org, linux-riscv@lists.infradead.org,
 linux-s390@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev,
 netdev@vger.kernel.org, sparclinux@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
Message-Id: <20240420181121.d6c7be11a6f98dc2462f8b41@kernel.org>

On Sat, 20 Apr 2024 07:22:50 +0300
Mike Rapoport wrote:

> On Fri, Apr 19, 2024 at 02:42:16PM -0700, Song Liu wrote:
> > On Fri, Apr 19, 2024 at 1:00 PM Mike Rapoport wrote:
> > >
> > > On Fri, Apr 19, 2024 at 10:32:39AM -0700, Song Liu wrote:
> > > > On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport wrote:
> > > > [...]
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/all/20240411160526.2093408-1-rppt@kernel.org
> > > > > >
> > > > > > For the ROX to work, we need different users (module text, kprobe, etc.) to have
> > > > > > the same execmem_range. From [1]:
> > > > > >
> > > > > > static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
> > > > > > {
> > > > > > ...
> > > > > >         p = __execmem_cache_alloc(size);
> > > > > >         if (p)
> > > > > >                 return p;
> > > > > >         err = execmem_cache_populate(range, size);
> > > > > > ...
> > > > > > }
> > > > > >
> > > > > > We are calling __execmem_cache_alloc() without range. For this to work,
> > > > > > we can only call execmem_cache_alloc() with one execmem_range.
> > > > >
> > > > > Actually, on x86 this will "just work" because everything shares the same
> > > > > address space :)
> > > > >
> > > > > The 2M pages in the cache will be in the modules space, so
> > > > > __execmem_cache_alloc() will always return memory from that address space.
> > > > >
> > > > > For other architectures this indeed needs to be fixed with passing the
> > > > > range to __execmem_cache_alloc() and limiting search in the cache for that
> > > > > range.
> > > >
> > > > I think we at least need the "map to" concept (initially proposed by Thomas)
> > > > to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE
> > > > maps to EXECMEM_MODULE_TEXT, so that all these actually share
> > > > the same range.
> > >
> > > Why?
> >
> > IIUC, we need to update __execmem_cache_alloc() to take a range pointer as
> > input. module text will use "range" for EXECMEM_MODULE_TEXT, while kprobe
> > will use "range" for EXECMEM_KPROBE. Without "map to" concept or sharing
> > the "range" object, we will have to compare different range parameters to check
> > we can share cached pages between module text and kprobe, which is not
> > efficient. Did I miss something?

Song, thanks for trying to explain.
I think I need to explain why I used module_alloc() originally.

This depends on how kprobe features are implemented on each
architecture, and how many kprobe features are supported there.

Kprobe jump optimization and kprobe jump-back optimization use a jump
instruction to jump into the trampoline and to jump back from the
trampoline directly. If the architecture's jmp instruction only
supports a +-2GB range, like x86, the trampoline buffer needs to be
allocated inside that address range. This requirement is similar to the
one for modules (because module functions need to call other functions
in the kernel, etc.), so at least kprobes on x86 used module_alloc().

However, if an architecture only supports breakpoint/trap based
kprobes, it does not need to care where the execmem is allocated.

>
> We can always share large ROX pages as long as they are within the correct
> address space. The permissions for them are ROX and the alignment
> differences are due to KASAN and this is handled during allocation of the
> large page to refill the cache. __execmem_cache_alloc() only needs to limit
> the search for the address space of the range.

So I don't think EXECMEM_KPROBE is always the same as
EXECMEM_MODULE_TEXT; it should be configured for each arch.
In particular, if the range is only used as a search parameter, that
looks OK to me.

Thank you,

>
> And regardless, they way we deal with sharing of the cache can be sorted
> out later.
>
> > Thanks,
> > Song
>
> --
> Sincerely yours,
> Mike.
>

--
Masami Hiramatsu (Google)
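
P.S. To make the +-2GB point and the "limit the search to the range"
point above a bit more concrete, here is a rough userspace sketch. It is
not kernel code and not taken from the patch set: struct range_sketch,
struct cached_area, rel32_reachable() and cache_find() are hypothetical
names invented for illustration, and the only hardware fact assumed is
that an x86 "jmp rel32" is 5 bytes long with a signed 32-bit
displacement counted from the end of the instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an execmem range: just an address window. */
struct range_sketch {
	uint64_t start;
	uint64_t end;		/* exclusive */
};

/* Hypothetical free area sitting in a shared ROX cache. */
struct cached_area {
	uint64_t addr;
	uint64_t size;
};

/*
 * Can a 5-byte x86 "jmp rel32" placed at insn_addr reach target?
 * The displacement is relative to the next instruction and must fit
 * in a signed 32-bit value.
 */
bool rel32_reachable(uint64_t insn_addr, uint64_t target)
{
	uint64_t next = insn_addr + 5;

	if (target >= next)
		return target - next <= (uint64_t)INT32_MAX;
	return next - target <= (uint64_t)INT32_MAX + 1;
}

/*
 * First cached area that can hold size bytes entirely inside range.
 * Areas outside the caller's range are skipped, so users with
 * different ranges can share one cache without comparing range objects.
 */
const struct cached_area *cache_find(const struct cached_area *areas,
				     size_t nr,
				     const struct range_sketch *range,
				     uint64_t size)
{
	for (size_t i = 0; i < nr; i++) {
		const struct cached_area *a = &areas[i];

		if (a->size < size)
			continue;
		if (a->addr < range->start || a->addr + size > range->end)
			continue;
		return a;
	}
	return NULL;
}
```

With this, an arch that only needs breakpoint/trap based kprobes could
pass a wide range and accept any cached area, while an arch with
jump-optimized kprobes would pass the narrower window that
rel32_reachable() (or the arch equivalent) allows.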