Date: Tue, 30 Mar 2021 14:56:06 +0100
Message-ID: <87h7ksrbih.wl-maz@kernel.org>
From: Marc Zyngier
To: Ard Biesheuvel
Cc: Linux ARM, Will Deacon, Android Kernel Team, Anshuman Khandual, Steve Capper, Catalin Marinas, kvmarm, Quentin Perret
Subject: Re: [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE
References: <20210330112126.463336-1-ardb@kernel.org> <87lfa4rety.wl-maz@kernel.org> <87k0pordvw.wl-maz@kernel.org>

On Tue, 30 Mar 2021 14:15:19 +0100,
Ard Biesheuvel wrote:
>
> On Tue, 30 Mar 2021 at 15:04, Marc Zyngier wrote:
> >
> > On Tue, 30 Mar 2021 13:49:18 +0100,
> > Ard Biesheuvel wrote:
> > >
> > > On Tue, 30 Mar 2021 at 14:44, Marc Zyngier wrote:
> > > >
> > > > On Tue, 30 Mar 2021 12:21:26 +0100,
> > > > Ard Biesheuvel wrote:
> > > > >
> > > > > Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA
> > > > > configurations") introduced a new layout for the 52-bit VA space, in
> > > > > order to maximize the space available to the linear region. After this
> > > > > change, the kernel VA space is no longer split 1:1 down the middle, and
> > > > > as it turns out, this violates an assumption in the KVM init code when
> > > > > it chooses the layout for the nVHE EL2 mapping.
> > > > >
> > > > > Given that EFI does not support 52-bit VA addressing (as it only
> > > > > supports 4k pages), and that in general, loaders cannot assume that the
> > > > > kernel being loaded supports 52-bit VA/PA addressing in the first place,
> > > > > we can safely assume that the kernel, and therefore the .idmap section,
> > > > > will be 48-bit addressable on 52-bit VA capable systems.
> > > > >
> > > > > So in this case, organize the nVHE EL2 address space as a 2^48 byte
> > > > > window starting at address 0x0, containing the ID map and the
> > > > > hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 byte
> > > > > linear region. (Note that EL1's linear region is 2^52 - 2^47 bytes in
> > > > > size, so it is slightly larger, but this only matters on systems where
> > > > > the DRAM footprint in the physical memory map exceeds 3968 TB)
> > > >
> > > > So if I have memory in the [2^52 - 2^48, 2^52 - 2^47] range, not
> > > > necessarily because I have that much memory, but because my system has
> > > > multiple memory banks, one of which lands on that spot, I cannot map
> > > > such memory at EL2. We'll explode at run time.
> > > >
> > > > Can we keep the private mapping to 47 bits and restore the missing
> > > > chunk to the linear mapping? Of course, it means that the linear map
> > > > is now potentially no longer linear, so we'd have to guarantee that the
> > > > kernel lives in the first 2^47 bits instead. Crap.
> > > >
> > > >
> > > Yeah. The linear region needs to be contiguous. Alternatively, we
> > > could restrict the upper address limit for loading the kernel to 47
> > > bits.
> >
> > Is that something we can do retroactively? We could mandate it for
> > LVA systems only, but that's a bit odd.
> >
>
> Yeah, especially given the fact that LVA systems will be VHE capable
> and may therefore not care in the first place.
>
> On systems that have memory that high, EFI is likely to load the
> kernel there, as it usually allocates from the top down, and it tries
> to avoid having to move it around unless asked to (via KASLR), in
> which case it will currently randomize over the entire available
> memory space.
>
> So it is going to add a special case for a corner^2 case, i.e., nVHE
> on 52-bit/64k pages with more than 3968 TB distance between the start
> and end of DRAM.

Ugh. Yeah.
I'd rather we ignore that memory altogether, but I don't think we can.

> It seems to me that the only way to solve this is to permit the idmap
> and the hyp linear region to overlap, and use the 2^47 byte window at
> the top of the address space for the hyp private mappings instead of
> the one at the bottom.

But that's the hard problem I want to avoid thinking of. We need to
ensure that there is no EL1 VA that is congruent with the idmap over
the kern_hyp_va() transformation. It means imposing restrictions over
the EL1 linear map, and preventing any allocation that would result in
this overlap (and that includes text).

How do we do that?

Frankly, I think we need to start looking into enabling VHE for the
nVHE /behaviour/. Having a single TTBR on these systems is just
insane.

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
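
The region sizes discussed in the thread can be checked with a few lines of Python. This is only a sketch of the arithmetic, not kernel code: the helper names (linear_region_bytes, mappable_at_el2) are made up for illustration, and it assumes that the linear region is indexed by offset from the start of DRAM and that "TB" in the thread means 2^40 bytes.

```python
# Arithmetic sketch only; the names below are hypothetical, not kernel symbols.
TIB = 1 << 40          # assumption: "TB" in the thread means 2^40 bytes
VA_BITS = 52           # 52-bit VA configuration discussed in the thread


def linear_region_bytes(window_bits: int, va_bits: int = VA_BITS) -> int:
    """Bytes left for a linear region after carving out a 2^window_bits window."""
    return (1 << va_bits) - (1 << window_bits)


# Proposed nVHE EL2 layout: 2^48 byte window, then the linear region.
el2_linear = linear_region_bytes(48)
# EL1 layout as described in the thread: linear region of 2^52 - 2^47 bytes.
el1_linear = linear_region_bytes(47)


def mappable_at_el2(offset: int) -> bool:
    """True if a byte at this offset from the start of DRAM fits the EL2 linear region."""
    return 0 <= offset < el2_linear


if __name__ == "__main__":
    print("EL2 linear region:", el2_linear // TIB, "TiB")
    print("EL1 linear region:", el1_linear // TIB, "TiB")
    print("EL1-only range:   ", (el1_linear - el2_linear) // TIB, "TiB")
```

By this arithmetic the proposed EL2 linear region covers 3840 TiB and EL1's covers 3968 TiB, so the [2^52 - 2^48, 2^52 - 2^47] range raised in the reply is the 128 TiB that EL1 can map and EL2 cannot.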