Date: Thu, 25 Jun 2026 21:43:54 +0100
Message-ID: <86cxxeqrjp.wl-maz@kernel.org>
From: Marc Zyngier
To: Oliver Upton
Cc: sashiko-reviews@lists.linux.dev, kvmarm@lists.linux.dev
Subject: Re: [PATCH v2 1/2] KVM: arm64: Only consider S1PTW a write fault if HA is set
References: <20260624202446.1698535-1-oupton@kernel.org>
 <20260624202446.1698535-2-oupton@kernel.org>
 <20260624204025.519861F000E9@smtp.kernel.org>
 <86echur5g9.wl-maz@kernel.org>

On Thu, 25 Jun 2026 20:34:46 +0100,
Oliver Upton wrote:
>
> On Thu, Jun 25, 2026 at 04:43:34PM +0100, Marc Zyngier wrote:
> > On Wed, 24 Jun 2026 22:00:43 +0100,
> > Oliver Upton wrote:
> > >
> > > On Wed, Jun 24, 2026 at 08:40:24PM +0000, sashiko-bot@kernel.org wrote:
> > > > > +        /*
> > > > > +         * The architecture sucks; assume that the S1PTW fetched for write if
> > > > > +         * HA is enabled at stage-1. Note that hardware updates to dirty state
> > > > > +         * and table AF are predicated on HA=1 (DDI0487 M.a D24.2.194; R_SNVTX).
> > > > > +         */
> > > > > +        if (kvm_vcpu_abt_iss1tw(vcpu))
> > > > > +                return effective_tcr_ha(vcpu);
> > > >
> > > > [Severity: High]
> > > > Does unconditionally treating S1PTW faults as writes when HA is enabled break
> > > > guests that use pre-populated page tables in read-only memslots?
> > > >
> > > > If a guest populates its page tables with the Access Flag already set, places
> > > > them in an RO memslot, and enables HA in its TCR, the hardware only needs read
> > > > access during a walk.
> >
> > Correct, this is a consequence of R_HDTRB.
>
> While not _directly_ related, this does seem at odds with the
> implications of R_JCXVS.
>
> I.e. when HD is set, the PTW can speculatively fetch the S2 translation for
> write before knowing if the S1 descriptor is actually subject to an
> update per R_HDTRB.

I agree that this is a bit odd, and has a taste of speculative writes
(always a good idea... not!).

>
> It obviously all still fits together (permission checks are later down
> the line), just weird is all. Anyway...
>
> > I've been chewing on that one for a bit, and came up with the
> > following argument:
> >
> > - We're missing one of Read or Write because of L1's doing, and L1
> >   needs to do *something* about it. We don't need to find out about HA
> >   in the guest, we just need to forward the fault (and it is probably
> >   enough to check that we're in a nested context).
>
> We still need to account for host-induced permission faults, e.g. dirty
> tracking or an RO memslot getting mapped into the L2. So I think we
> still need to evaluate R+W before forwarding to the L1.

Hmm. I had forgotten about this indeed. Ultimately, we need to keep
track of why a S2 entry is RO in the L1 IPA space. We can either use
more SW bits in the PTE (not that many left), or wait until Wei-Lin is
done with his reverse + direct map tracking structure.

> Looking ahead to HAFDBS, for this to work we will need to use a liberal
> interpretation of R_JCXVS at the time of the initial translation fault
> and always walk with intent for write.

Yeah, I'm not too precious about that, and we might as well take
advantage of the architecture.

>
> Basically, there seems to be a subtle difference arising between writes
> as observed from the nested MMU and writes as observed at the virtual
> endpoint (memslot). Funny how something as straightforward as the access
> flag can be so headache inducing :)

Well, you knew NV was just a sorry hack, didn't you? ;-) It's just
another case of "SW will sort it out eventually...".

>
> > - We have checked that HA==1, derived from that that we need R+W, and
> >   we're missing the Write permission because the page is marked RO
> >   (assuming that KVM still maps with Read permission by default):
> >
> >   - either it is "opportunistically" RO (dirty tracking, page never
> >     written to), and we flip it to RW, rince, repeat.
> >
> >   - or it is hard-wired to RO at the memslot level, and the only
> >     possibility is that the PTW is trying to perform an atomic access
> >     (as per R_HDTRB), which we should be able to reject with an
> >     "Unsupported atomic hardware update fault".
> >
> > Thoughts?
>
> All aboard :) I've just been staring at dirty state crap for too long. I
> really like the unsupported atomic fault over our current behavior of
> injecting an SEA.
>
> I'll fold all of this into v3.

Sounds good. Thank you!

        M.

-- 
Without deviation from the norm, progress is not possible.
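
[Editorial illustration, not part of the original message.] The decision tree from the second bullet of the quoted argument (HA==1 implies R+W is needed, then distinguish an opportunistically RO page from an RO memslot) could be sketched roughly as below. This is a standalone toy under stated assumptions, not KVM code: the enum, the struct, and classify_s1ptw_perm_fault() are hypothetical names invented here, and the two booleans stand in for state that the real code would have to derive from the guest's TCR and from the memslot. It deliberately ignores the nested (L1-induced) case discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes; none of these names exist in KVM. */
enum s1ptw_action {
	S1PTW_HANDLE_AS_READ,      /* HA clear: the walk only needs Read */
	S1PTW_MAKE_WRITABLE,       /* opportunistically RO: flip to RW, retry */
	S1PTW_UNSUPPORTED_ATOMIC,  /* RO memslot: reject the atomic update */
};

/* Hypothetical inputs describing one stage-2 permission fault on a S1PTW. */
struct s1ptw_fault {
	bool stage1_ha;         /* assumed: guest's effective stage-1 HA bit */
	bool memslot_readonly;  /* assumed: backing memslot is hard-wired RO */
};

static enum s1ptw_action classify_s1ptw_perm_fault(const struct s1ptw_fault *f)
{
	/* Without HA the walker never updates descriptors, so treat as a read. */
	if (!f->stage1_ha)
		return S1PTW_HANDLE_AS_READ;

	/* HA set: the walk is assumed to need R+W, but the memslot forbids W. */
	if (f->memslot_readonly)
		return S1PTW_UNSUPPORTED_ATOMIC;

	/* HA set and the page is only RO for dirty tracking: make it RW. */
	return S1PTW_MAKE_WRITABLE;
}
```

In this sketch the RO-memslot check comes before the dirty-tracking path, so a walk that wants to update a descriptor in a read-only memslot is never retried; whether the real ordering would look like this is an assumption.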