Date: Mon, 11 May 2026 11:24:14 +0000
From: Mostafa Saleh
To: Jason Gunthorpe
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    kvmarm@lists.linux.dev, iommu@lists.linux.dev, catalin.marinas@arm.com,
    will@kernel.org, maz@kernel.org, oliver.upton@linux.dev,
    joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com,
    joro@8bytes.org, jean-philippe@linaro.org, mark.rutland@arm.com,
    qperret@google.com, tabba@google.com, vdonnefort@google.com,
    sebastianene@google.com, keirf@google.com
Subject: Re: [PATCH v6 08/25] KVM: arm64: iommu: Shadow host stage-2 page table
References: <20260501111928.259252-1-smostafa@google.com>
    <20260501111928.259252-9-smostafa@google.com>
    <20260501130006.GF6912@ziepe.ca>
    <20260509232714.GI9285@ziepe.ca>
In-Reply-To: <20260509232714.GI9285@ziepe.ca>

On Sat, May 09, 2026 at 08:27:14PM -0300, Jason Gunthorpe wrote:
> On Mon, May 04, 2026 at 12:28:55PM +0000, Mostafa Saleh wrote:
> > So far this is the list of requirements/changes needed to share the
> > stage-2 page table (besides the obvious: same page table format,
> > granularity, endianness...)
> >
> > 1) HW BBM is not supported in the hypervisor page table, that’s
> >    because it can generate TLB conflict aborts, which the hypervisor
> >    cannot handle because of the limited syndrome information.
> >    We can rely on FEAT_BBML3 which was newly introduced to work
> >    around that, it’s quite niche and not supported in KVM yet, or
> >    have an allow list similar to the kernel
> >    (as in cpu_supports_bbml2_noabort()) which also limits the number
> >    of CPUs that can run this.
>
> Do you think pkvm will need BBM? Hitless replace of a PTE is already a
> pretty advanced feature and the SMMU has its own support matrix there
> too. Is it for shared/private conversion?

Yes, we can break a block mapping on memory donation, which is a
transfer of ownership to the hypervisor or a guest.

>
> > 2) Handling page faults, devices must be able to stall and let the
> >    hypervisor handle the page fault (which has to proxy through the
> >    kernel as the hypervisor doesn’t handle interrupts), this also
> >    includes IO page faults which are hard to get right from the HW
> >    and may lead to system stability issues or lockups.
>
> No.. once you turn on IO like this you don't have page faults
> anymore. Everything must be permanently mapped into the SMMU view, it
> can never be made non-present and you must run without page
> faults. That's what you have in the io-pgtable constructed table,
> right?

Exactly, but the CPU page table doesn’t guarantee that, so we either
have to handle page faults in the IOMMU, or completely change how KVM
deals with stage-2 if we want to share the page table with the CPU.

>
> > Alternatively, we can pin the stage-2 pages, that would require some
> > hypercalls, hacks to the driver/IOMMU API and possibly new semantics
> > in the DMA-API for IDENTITY devices as they will still need to pin
> > the pages as they are actually in stage-2 translation and not bypass.
>
> ?? Then how does this series work?

This series works fine as it shadows the page table and doesn't share it
with the CPU, so it fully populates the address space.

>
> > 3) SMMUv3 must be coherent.
>
> Yes for sure.
>
> > 4) Support BTM/DVM for TLB invalidation, otherwise some hooks are
> >    still required (although not io-pgtable-arm)
>
> SW needs to forward invalidations, BTM is rare..
>
> > IMO, 1, 2 are the most tricky parts. It's more work and runs on very
> > limited systems. However, it can be implemented as an optimization,
> > which is my plan.
>
> I think unless you can do it without these HW features (excluding 3)
> don't bother.

I am looking into this now, but as I mentioned that will be a separate
RFC following this one as an optimization for advanced HW.

Thanks,
Mostafa

>
> > I am not sure how CCA deals with that, I’d expect they have a lot of
> > constraints on CPUs/SMMUs and DMA capable devices on those systems.
>
> 3 is not supported. The entire S2 is permanently mapped and doesn't
> really change a lot at runtime. No page faults, not sure if the RMM
> private/shared conversion would require BBM..
>
> Jason
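
For readers following the thread, the difference between shadowing and
sharing discussed above can be sketched as a toy model. This is only an
illustration and is not code from the series: every name here
(toy_hyp, toy_pgtable, toy_donate, ...) is hypothetical, the "page
tables" are flat arrays, and the lazy fill of the CPU stage-2 on fault
is an assumption of the model. It shows a shadow table that is fully
populated up front, so device accesses never need a fault handler, and
a donation that unmaps both views with a SW-forwarded invalidation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 16 /* toy address space: 16 identity-mapped pages */

/* Toy page table: valid[i] means page i is mapped. */
struct toy_pgtable {
	bool valid[NR_PAGES];
};

/* Hypothetical hypervisor state, for illustration only. */
struct toy_hyp {
	bool host_owned[NR_PAGES];       /* ownership tracked by the hypervisor */
	struct toy_pgtable cpu_s2;       /* CPU stage-2, filled lazily (assumption) */
	struct toy_pgtable iommu_shadow; /* shadow seen by the SMMU, filled eagerly */
	unsigned int cpu_faults;
	unsigned int iommu_invalidations;
};

static void toy_hyp_init(struct toy_hyp *hyp)
{
	for (size_t i = 0; i < NR_PAGES; i++) {
		hyp->host_owned[i] = true;
		hyp->cpu_s2.valid[i] = false;      /* mapped on first CPU fault */
		hyp->iommu_shadow.valid[i] = true; /* fully populated up front */
	}
	hyp->cpu_faults = 0;
	hyp->iommu_invalidations = 0;
}

/* CPU access: a missing entry is a fault the hypervisor can fix up. */
static bool toy_cpu_access(struct toy_hyp *hyp, size_t page)
{
	assert(page < NR_PAGES);
	if (!hyp->host_owned[page])
		return false;
	if (!hyp->cpu_s2.valid[page]) {
		hyp->cpu_faults++;
		hyp->cpu_s2.valid[page] = true;
	}
	return true;
}

/* Device access: no fault handling, the entry must already be present. */
static bool toy_dma_access(const struct toy_hyp *hyp, size_t page)
{
	assert(page < NR_PAGES);
	return hyp->iommu_shadow.valid[page];
}

/*
 * Donation: ownership leaves the host, so both views are unmapped and
 * the invalidation is forwarded to the IOMMU in software.
 */
static void toy_donate(struct toy_hyp *hyp, size_t page)
{
	assert(page < NR_PAGES);
	hyp->host_owned[page] = false;
	hyp->cpu_s2.valid[page] = false;
	hyp->iommu_shadow.valid[page] = false;
	hyp->iommu_invalidations++;
}
```

In this model, sharing the CPU table with the device would mean
toy_dma_access() reading cpu_s2 instead, which is exactly where a
non-present entry would turn into an IO page fault.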