From: Ryan Roberts
To: Neal Gompa
Cc: Nick Chan, Eric Curtin, Andrew Morton, Anshuman Khandual,
 Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Greg Marsden,
 Ivan Ivanov, Kalesh Singh, Marc Zyngier, Mark Rutland, Matthias Brugger,
 Miroslav Benes, Will Deacon, Hector Martin,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, asahi@lists.linux.dev
Subject: Re: [RFC PATCH v1 00/57] Boot-time page size selection for arm64
Date: Thu, 24 Oct 2024 11:34:18 +0100
References: <20241014105514.3206191-1-ryan.roberts@arm.com>
 <4623805.lGaqSPkdTl@skuld-framework>
 <09e480d7-3ef6-4352-a484-91733ad7d231@arm.com>
 <649d7aa6-4163-4969-ba14-777f0e9cddb1@arm.com>
 <872f1c9c-9fb2-4372-810d-abe5419c4bd8@arm.com>
 <2174ff43-3ab6-409b-a8a8-bd319a134d86@gmail.com>
 <997f1826-ec45-4d47-ad94-33c0d194b5a4@arm.com>

On 22/10/2024 18:30, Neal Gompa wrote:

[...]

>>>>>>>>>>
>>>>>>>>>> This is a generally very exciting patch set! I'm looking forward to seeing it
>>>>>>>>>> land so I can take advantage of it for Fedora ARM and Fedora Asahi Remix.
>>>>>>>>>>
>>>>>>>>>> That said, I have a couple of questions:
>>>>>>>>>>
>>>>>>>>>> * Going forward, how would we handle drivers/modules that require a particular
>>>>>>>>>> page size?
>>>>>>>>>> For example, the Apple Silicon IOMMU driver code requires the
>>>>>>>>>> kernel to operate in 16k page size mode, and it would need to be disabled in
>>>>>>>>>> other page sizes.
>>>>>>>>>
>>>>>>>>> I think these drivers would want to check PAGE_SIZE at probe time and fail if an
>>>>>>>>> unsupported page size is in use. Do you see any issue with that?
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> * How would we handle an invalid selection at boot?
>>>>>>>>>
>>>>>>>>> What do you mean by invalid here? The current policy validates that the
>>>>>>>>> requested page size is supported by the HW by checking mmfr0. If no page size is
>>>>>>>>> passed on the command line, or the passed value is not supported by the HW, then
>>>>>>>>> we default to the largest page size supported by the HW (so for Apple
>>>>>>>>> Silicon that would be 16k since the HW doesn't support 64k). Although I think it
>>>>>>>>> may be better to change that policy to use the smallest page size in this case;
>>>>>>>>> 4k is the safer bet for compat and will waste much less memory than 64k.
>>>>>>>>>
>>>>>>>>>> Can we program in a
>>>>>>>>>> fallback when the "wrong" mode is selected for a chip or something similar?
>>>>>>>>>
>>>>>>>>> Do you mean effectively add a mechanism to force 16k if the detected HW is Apple
>>>>>>>>> Silicon? The trouble is that we need to select the page size very early in
>>>>>>>>> boot, before start_kernel() is called, so we really only have generic arch code
>>>>>>>>> and the command line with which to make the decision.
>>>>>>>>
>>>>>>>> Yes... I think a build-time CONFIG for default page size, which can be
>>>>>>>> overridden by a karg makes sense... Even on platforms like Apple
>>>>>>>> Silicon you may want to test very specific things in 4k by overriding
>>>>>>>> with a karg.
>>>>>>>
>>>>>>> Ahh, yes, that would certainly work. I'll work it into the next version.
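
For illustration only, here is a minimal user-space sketch of the selection
order discussed above: a karg wins if the HW supports it, otherwise the
build-time default is used if the HW supports it, otherwise the smallest
supported size. It is not code from the series. The names PGSZ_*,
select_page_size() and the hw_mask parameter are hypothetical; in the real
series the supported sizes come from mmfr0, whereas here they are simply
passed in. The final "smallest supported" fallback is an assumption based on
the suggestion above; the current policy falls back to the largest.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bitmask of page sizes; one bit per granule. */
enum {
    PGSZ_4K  = 1u << 0,
    PGSZ_16K = 1u << 1,
    PGSZ_64K = 1u << 2,
    PGSZ_ALL = PGSZ_4K | PGSZ_16K | PGSZ_64K,
};

/* True if v names exactly one of the three page sizes. */
static bool is_single_size(unsigned int v)
{
    return v == PGSZ_4K || v == PGSZ_16K || v == PGSZ_64K;
}

/* Smallest size present in hw_mask (caller guarantees it is non-empty). */
static unsigned int smallest_supported(unsigned int hw_mask)
{
    if (hw_mask & PGSZ_4K)
        return PGSZ_4K;
    if (hw_mask & PGSZ_16K)
        return PGSZ_16K;
    return PGSZ_64K;
}

/*
 * karg:          size requested on the command line, 0 if none was given.
 * build_default: size chosen at build time (stand-in for a CONFIG option).
 * hw_mask:       sizes the HW reports as supported (stand-in for mmfr0).
 */
unsigned int select_page_size(unsigned int karg, unsigned int build_default,
                              unsigned int hw_mask)
{
    /* The HW must support at least one page size. */
    assert((hw_mask & PGSZ_ALL) != 0);

    /* A valid, HW-supported karg overrides everything else. */
    if (is_single_size(karg) && (hw_mask & karg))
        return karg;

    /* Otherwise use the build-time default if the HW supports it. */
    if (is_single_size(build_default) && (hw_mask & build_default))
        return build_default;

    /* Assumed last resort: the smallest size the HW supports. */
    return smallest_supported(hw_mask);
}
```

With hw_mask set to PGSZ_4K | PGSZ_16K (no 64k, as on Apple Silicon), a karg
of PGSZ_64K is rejected and the build-time default is used instead.
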
>>>>>>>
>>>>>>
>>>>>> Could we maybe extend to have some kind of way to include a table of
>>>>>> SoC IDs that certain modes are disabled (e.g. 64k on Apple Silicon)
>>>>>
>>>>> 64k is already disabled on Apple Silicon because mmfr0 reports that 64k is not
>>>>> supported.
>>>>>
>>>>>> and preferred modes when no arg is set (16k for Apple Silicon)? That
>>>>>
>>>>> And it's not obvious that we should hard-code a page size preference to a SoC
>>>>> ID. If the CPU can support multiple page sizes, it should be up to the SW stack
>>>>> to decide, not the SoC.
>>>>>
>>>>> I'm guessing your desire is to have a single kernel build that will boot 16k by
>>>>> default on Apple Silicon and 4k by default on other systems, all without needing
>>>>> to modify the command line? Personally I think it's cleaner to just require
>>>>> setting the page size on the command line in these cases.
>>>>>
>>>>>> way it'd work something like this:
>>>>>>
>>>>>> 1. Table identification of 4/16/64 depending on identified SoC
>>>>> So I'd prefer not to have this
>>>>>
>>>>>> 2. Unidentified ones follow build-time default
>>>>>> 3. karg forces a mode regardless
>>>>> But keep these 2.
>>>>>
>>>>
>>> Since we are talking about Apple Silicon and page size, I would like to
>>> add that on the Apple Silicon SoCs I am working on, the situation is like
>>> this:
>>>
>>> Apple A7 (s5l8960x), A8 (T7000), A8X (T7001): CPU MMU supports 4K and 64K
>>> page sizes.
>>>
>>> Apple A9 (s8000/s8003), A9X (s8001), A10 (t8010), A10X (t8011), A11 (t8015):
>>> CPU MMU supports 16K and 64K page sizes.
>>>
>>> However, all of them have 4K page DART IOMMUs.
>>>
>>>> I think it makes sense to have it, because it's not just Apple Silicon
>>>> where such a preference/requirement may be necessary. Apple Silicon
>>>> technically works at 4k, but is completely broken at 4k because Linux
>>>> cannot do 16k IOMMU with 4k everything else, so being able to at least
>>>> prefer 16k out of the box is important.
>>>> And SoCs like the NVIDIA Grace
>>>> Hopper platform prefer 64k over other options (though I am unaware of
>>>> a gross incompatibility that effectively requires it like Apple
>>>> Silicon has).
>>>>
>>>> When we're trying to get to "single generic image that works
>>>> everywhere", stuff like this matters and I would really like you to
>>>> consider it from the lens of "we want things to work as automagic as
>>>> they do on x86".
>>> For me, in order to get to this level of automagic, there does need to be
>>> a table of which SoC should use which page size table.
>>
>> OK, but it's not clear to me that this table needs to be in the kernel. Could it
>> not be something in user space (e.g. during installation) that configures the
>> kernel command line?
>>
>
> This is not compatible with using things like ISOs with UEFI+ACPI
> enabled desktop/server systems. We need to be able to safely,
> automatically, and correctly boot up and support hardware. The only
> place to do that early enough is in the kernel. But this can wait
> until the core stuff is in.

OK got it.

>
>> Regardless, the hard work here is getting the boot-time page size selection
>> mechanism in place. Once that's there, follow up patches can add the desired
>> policy. I'd rather leave it out for now to avoid anything slowing down the core
>> work.
>>
>
> Sure, this can be done afterward.

Thanks! I understand the problem a bit better now. I'm sure we can find a
solution once we have landed the core mechanism.

Thanks,
Ryan
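
For reference, the three-step precedence proposed in the quoted thread (karg
forces a mode, an identified SoC uses its table entry, everything else
follows the build-time default) could look roughly like the user-space
sketch below. It is only an illustration of the proposal under discussion,
not an agreed design and not code from the series. The struct soc_pref type,
the resolve_page_size() function and the "example,..." SoC IDs are all
hypothetical, and validation against the sizes the HW actually supports is
left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: SoC identifier -> preferred page size in bytes. */
struct soc_pref {
    const char *soc_id;
    unsigned long page_size;
};

/* Made-up entries; a real table would key on whatever SoC ID is available. */
static const struct soc_pref soc_prefs[] = {
    { "example,soc-a", 16384 },
    { "example,soc-b", 65536 },
};

/* Returns the preferred size for soc_id, or 0 if the SoC is not listed. */
static unsigned long soc_preferred_size(const char *soc_id)
{
    size_t i;

    if (soc_id == NULL)
        return 0;

    for (i = 0; i < sizeof(soc_prefs) / sizeof(soc_prefs[0]); i++) {
        if (strcmp(soc_prefs[i].soc_id, soc_id) == 0)
            return soc_prefs[i].page_size;
    }
    return 0;
}

/*
 * karg:          page size from the command line in bytes, 0 if not given.
 * soc_id:        identified SoC, or NULL if unknown.
 * build_default: build-time default page size in bytes (must be non-zero).
 */
unsigned long resolve_page_size(unsigned long karg, const char *soc_id,
                                unsigned long build_default)
{
    unsigned long pref;

    assert(build_default != 0);

    /* Step 3 in the thread: a karg forces a mode regardless. */
    if (karg != 0)
        return karg;

    /* Step 1: an identified SoC uses its table entry. */
    pref = soc_preferred_size(soc_id);
    if (pref != 0)
        return pref;

    /* Step 2: unidentified SoCs follow the build-time default. */
    return build_default;
}
```

Dropping step 1, as preferred in the replies above, reduces this to "karg if
given, else build-time default".
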