Date: Thu, 14 Jul 2022 12:10:36 +0200
From: Peter Zijlstra
To: Song Liu
Cc: Song Liu, bpf, lkml, Linux-MM,
    "linux-modules@vger.kernel.org", Luis Chamberlain, Steven Rostedt,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, Masami Hiramatsu,
    "naveen.n.rao@linux.ibm.com", "davem@davemloft.net",
    "anil.s.keshavamurthy@intel.com", "keescook@chromium.org",
    "hch@infradead.org", "dave@stgolabs.net", "daniel@iogearbox.net",
    Kernel Team, "x86@kernel.org", "dave.hansen@linux.intel.com",
    "rick.p.edgecombe@intel.com", "akpm@linux-foundation.org"
Subject: Re: [PATCH bpf-next 1/3] mm/vmalloc: introduce vmalloc_exec which allocates RO+X memory
References: <20220713071846.3286727-1-song@kernel.org>
    <20220713071846.3286727-2-song@kernel.org>
    <7C927986-3665-4BD6-A339-D3FE4A71E3D4@fb.com>
    <78A18945-0841-4CCE-8A33-6C09ECBFF7E1@fb.com>
In-Reply-To: <78A18945-0841-4CCE-8A33-6C09ECBFF7E1@fb.com>

On Wed, Jul 13, 2022 at 09:20:55PM +0000, Song Liu wrote:
> > On Jul 13, 2022, at 1:26 PM, Peter Zijlstra wrote:
> > 
> > On Wed, Jul 13, 2022 at 03:48:35PM +0000, Song Liu wrote:
> > 
> >>> So how about instead we separate them? Then much of the problem goes
> >>> away, you don't need to track these 2M chunks at all.
> >> 
> >> If we manage the memory in < 2MiB granularity, either 4kB or smaller,
> >> we still need some way to track which parts are being used, no? I mean
> >> the bitmap.
> > 
> > I was thinking the vmalloc vmap_area tree could help out there.
> 
> Interesting. vmap_area tree indeed keeps a lot of useful information. 
> 
> Currently, powerpc supports CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC,

Only PPC32, and that's due to a constraint in their MMU regarding page
protections.

> which leaves module_alloc just for module text. If this works, we get
> separation between RO+X and RW memory. What would it take to enable
> CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC for x86_64?

The VM_TOPDOWN_VMAP flag and ensuring the data and code regions never
overlap.
Once you have that, you can enable it.

Specifically, the problem is that data needs to be within the s32
immediate range, just like code, so we're constrained to the module
range. Given that constraint, the easiest solution is to allocate code
and data from opposite ends of that range.
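A minimal userspace sketch of the "opposite ends" idea, under stated assumptions: the names two_ended_range, alloc_code, alloc_data and s32_reachable are hypothetical, and this is a plain bump allocator over one fixed range, not anything built on vmalloc, the vmap_area tree or VM_TOPDOWN_VMAP. Code grows up from the bottom of the range and data grows down from the top, so the two regions can meet but never overlap. s32_reachable() checks whether one address can reach another through a signed 32-bit displacement, which is the constraint that keeps both kinds of allocation inside the module range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical two-ended allocator over one fixed virtual range (standing
 * in for the module range). Code is carved bottom-up from `start`, data
 * top-down from `end`, so the two regions can meet but never overlap.
 * Illustration only; not the kernel's vmalloc/vmap_area implementation.
 */
struct two_ended_range {
	uint64_t start;     /* lowest address in the range */
	uint64_t end;       /* one past the highest address */
	uint64_t code_next; /* next free address for code (grows up) */
	uint64_t data_next; /* lowest address handed out for data (grows down) */
};

static void range_init(struct two_ended_range *r, uint64_t start, uint64_t end)
{
	assert(start <= end);
	r->start = start;
	r->end = end;
	r->code_next = start;
	r->data_next = end;
}

/* Bytes still free between the top of code and the bottom of data. */
static uint64_t range_free(const struct two_ended_range *r)
{
	assert(r->code_next <= r->data_next); /* the no-overlap invariant */
	return r->data_next - r->code_next;
}

/* Bottom-up allocation for code; fails instead of running into data. */
static bool alloc_code(struct two_ended_range *r, uint64_t size, uint64_t *out)
{
	if (size > range_free(r))
		return false;
	*out = r->code_next;
	r->code_next += size;
	return true;
}

/* Top-down allocation for data; fails instead of running into code. */
static bool alloc_data(struct two_ended_range *r, uint64_t size, uint64_t *out)
{
	if (size > range_free(r))
		return false;
	r->data_next -= size;
	*out = r->data_next;
	return true;
}

/* Can `target` be reached as `from + disp` with disp in [INT32_MIN, INT32_MAX]? */
static bool s32_reachable(uint64_t from, uint64_t target)
{
	if (target >= from)
		return target - from <= (uint64_t)INT32_MAX;
	return from - target <= (uint64_t)INT32_MAX + 1;
}
```

For example, with a made-up 1 GiB range, a 2 MiB code allocation lands at the bottom and a 4 KiB data allocation at the top, and the two stay within s32 reach of each other because the whole range is smaller than 2 GiB. A bump pointer cannot free anything, so a real allocator would presumably need something like the vmap_area tree to track holes.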