From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AC994C77B7E for ; Tue, 2 May 2023 11:32:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=8tiaFmzasRWncAGnAj+gqaGtIDM2g8B0SNA84xBP1Os=; b=3DiFFKRop78alf 1qRR7cbMQbxdQwizZlHAk+pkJxlHSOhDlq4gKQNSpvYFELW+hrJkyH+8Y6kvOQSAPV6d2n+aPZ822 5S4ox+Utw2zfzkLCxPEvw0iC7cUBvgb0HSL3aJ5ycoEsLTYVKh1fGNl6hyqNFqJR9JpdC8QO2sJxw GoTNZ0UBnkcv5yxwzPIFu0GOV4l8N+cfHzk4fyj2FYpKs45HlRAZ+TMPCSmj6qz++e7n30NhsVQs+ LP7zd6P4bac0g8ils6+R/ri3hmsvdxHf/nWIaSSo8tCN7h44086TLsyQ5npCcFvmXOc8rY7j1Lybq 03WGadyr7l4Zeuj8unKA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1ptoEb-001FFJ-1N; Tue, 02 May 2023 11:31:53 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1ptoEY-001FDv-03 for linux-arm-kernel@lists.infradead.org; Tue, 02 May 2023 11:31:52 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6A21EC14; Tue, 2 May 2023 04:32:28 -0700 (PDT) Received: from FVFF77S0Q05N (unknown [10.57.23.31]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 168FE3F64C; Tue, 2 May 2023 04:31:42 -0700 (PDT) Date: Tue, 2 May 2023 12:31:37 +0100 From: Mark Rutland To: Ard Biesheuvel Cc: linux-arm-kernel@lists.infradead.org, will@kernel.org, catalin.marinas@arm.com, broonie@kernel.org, Shanker Donthineni Subject: Re: [PATCH] arm64: module: Widen module region to 2 GiB Message-ID: References: <20230404135437.2744866-1-ardb@kernel.org> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230502_043150_171216_DAD7A2B4 X-CRM114-Status: GOOD ( 45.65 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Wed, Apr 05, 2023 at 03:16:25PM +0200, Ard Biesheuvel wrote: > On Wed, 5 Apr 2023 at 13:38, Mark Rutland wrote: > > > > On Tue, Apr 04, 2023 at 06:07:04PM +0200, Ard Biesheuvel wrote: > > > On Tue, 4 Apr 2023 at 17:49, Mark Rutland wrote: > > > > > > > > On Tue, Apr 04, 2023 at 03:54:37PM +0200, Ard Biesheuvel wrote: > ... > > > > > #ifdef CONFIG_RANDOMIZE_BASE > > > > > -extern u64 module_alloc_base; > > > > > +extern u64 module_alloc_limit; > > > > > #else > > > > > -#define module_alloc_base ((u64)_etext - MODULES_VSIZE) > > > > > +#define module_alloc_limit MODULE_REF_END > > > > > +#endif > > > > > + > > > > > +#ifdef CONFIG_ARM64_MODULE_PLTS > > > > > +#define MODULE_REF_END ((u64)_end) > > > > > +#else > > > > > +#define MODULE_REF_END ((u64)_etext) > > > > > #endif > > > > > > > > I was initially a bit confused by this. I think it's a bit misleading for the > > > > !CONFIG_ARM64_MODULE_PLTS case, since modules can still have data references > > > > between _etext and _end, it's just that we're (hopefully) unlikely to have the > > > > text be <128M with >2G of subsequent data. > > > > > > > > > > So the reason for this is that otherwise, with PLTs disabled, we lose > > > (_end - _etext) worth of module space for no good reason, and this > > > could potentially be a substantial chunk of space. However, when PLTs > > > are enabled, we cannot safely use _etext as the upper bound, as _etext > > > - SZ_2G may produce an address that is out of range for a PREL32 > > > relocation. > > > > I understood those two constraints, which are: > > > > (1) For PREL32 references, the module must be within 2GiB of _end regardless of > > whether we use PLTs. > > > > (2) For direct branches without PLTs, the module must be within 128MiB of > > _etext. > > > > To be pedantic, let's define it as > > (1) for PREL32 references, the module region must be at most 2 GiB in > size and include the kernel range [_stext, _end), so that PREL32 > references from any module to any other module or the kernel are > always within -/+ 2 GiB > > (2) for CALL26 references, the module region must be at most 128 MiB > in size and include the kernel range [_stext, _etext) so that CALL26 > references from any module to any other module or the kernel are > always within -/+ 128 MiB Yes, agreed. As discussed in-person at Linaro connect, I'd prefer if we could have the module logic in one place, and explicitly handle the (rare/unlikely) cases where the kernel is too big to allow us to select a 128M or 2G region. I've pushed a branch (atop v6.3 for now) to: https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/module-space-rework Which does that (and I think handles the randomization aspects correctly). I intend to rebase and post that come v6.4-rc1. Thanks, Mark. > > What I find confusing is having a single conditional MODULE_REF_END definition, > > as it implicitly relies on other properties being maintained elsewhere, when we > > try to allocate modules relative to this, and I don't think those always align > > as we'd prefer. > > > > Consider a config: > > > > CONFIG_MODULE_PLTS=y > > RANDOMIZE_BASE=n > > CONFIG_RANDOMIZE_MODULE_REGION_FULL=n > > > > In that config, with your patch we'd have: > > > > #define module_alloc_limit MODULE_REF_END > > #define MODULE_REF_END ((u64)_end) > > > > In module alloc(), our first attempt at allocating the module would be: > > > > p = __vmalloc_node_range(size, MODULE_ALIGN, > > module_alloc_limit - SZ_128M, > > module_alloc_limit, gfp_mask, PAGE_KERNEL, > > VM_DEFER_KMEMLEAK, NUMA_NO_NODE, > > __builtin_return_address(0)); > > > > In this case, module_alloc_limit is '(u64)_end', so if there's a signficiant > > quantity of data between _etext and _end we will fail to allocate in the > > preferred 128M region that avoids PLTs more often than necessary, before > > falling back to the 2G region that may require PLTs. > > > > That's not a functional problem since we'll fall back to using PLTs, but I > > don't think that's as intended. > > > > The allocations occur bottom up, so we will fall back earlier than > strictly necessary, but only after exhausting a significant chunk of > the module region. I don't see that as a problem. > > > > Consider another config with: > > > > CONFIG_MODULE_PLTS=n > > RANDOMIZE_BASE=n > > CONFIG_RANDOMIZE_MODULE_REGION_FULL=n > > > > In that config we'd have: > > > > #define module_alloc_limit MODULE_REF_END > > #define MODULE_REF_END ((u64)_etext) > > > > In module alloc(), our only attempt at allocating the module would be: > > > > p = __vmalloc_node_range(size, MODULE_ALIGN, > > module_alloc_limit - SZ_128M, > > module_alloc_limit, gfp_mask, PAGE_KERNEL, > > VM_DEFER_KMEMLEAK, NUMA_NO_NODE, > > __builtin_return_address(0)); > > > > Say I've built an incredibly unlikely kernel with 64M of text and 2G-96M of > > data between _etext and _end. In this case, 'module_alloc_limit - SZ_128M' > > would be 32M below '_end - SZ_2G', so PREL32 relocations could be out-of-range. > > > > I think that ~2 GiB kernel images have their own special set of > challenges, and this is probably not the nastiest one. > > > > However, in that case, we can tolerate the waste, so we can just use _end > > > instead. > > > > I get the idea, but as above, I don't think that's always correct. > > > > > > I'd find this clearer if we could express the two constaints separately. e.g. > > > > have something like: > > > > > > > > #define MODULE_DATA_REF_END ((u64)_end) > > > > #define MODULE_TEXT_REF_END ((u64)_etext) > > > > > > > > That'd allow us to do something like the below, which I think would be clearer. > > > > > > > > u64 module_alloc_end; > > > > u64 module_alloc_base_noplt; > > > > u64 module_alloc_base_plt; > > > > > > > > /* > > > > * Call this somewhere after choosing hte KASLR limits > > > > */ > > > > void module_limits_init(void) > > > > { > > > > module_alloc_end = (u64)_stext; > > > > > > > > /* > > > > * 32-bit relative data references must always fall within 2G of the > > > > * end of the kernel image. > > > > */ > > > > module_alloc_base_plt = max(MODULE_VADDR, MODULE_DATA_REF_END - SZ_2G); > > > > > > > > /* > > > > * Direct branches must be within 128M of the end of the kernel text. > > > > */ > > > > module_alloc_base_noplt = max(module_alloc_base_plt, > > > > MODULE_TEXT_REF_END - SZ_128M); > > > > } > > > > > > > > > > Currently, the randomization of the module region is independent from > > > the randomization of the kernel itself, and once we incorporate that > > > here, I don't think it will be any clearer tbh. > > > > Ok; I've clearly missed that aspect, and I'll have to go page that in. > > > > ok _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel