Date: Mon, 11 May 2026 11:22:32 -0300
From: Jason Gunthorpe
To: Mostafa Saleh
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    kvmarm@lists.linux.dev, iommu@lists.linux.dev, catalin.marinas@arm.com,
    will@kernel.org, maz@kernel.org, oliver.upton@linux.dev,
    joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com,
    joro@8bytes.org, jean-philippe@linaro.org, mark.rutland@arm.com,
    qperret@google.com, tabba@google.com, vdonnefort@google.com,
    sebastianene@google.com, keirf@google.com
Subject: Re: [PATCH v6 08/25] KVM: arm64: iommu: Shadow host stage-2 page table
Message-ID: <20260511142232.GP9285@ziepe.ca>
References: <20260501111928.259252-1-smostafa@google.com>
    <20260501111928.259252-9-smostafa@google.com>
    <20260501130006.GF6912@ziepe.ca>
    <20260509232714.GI9285@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 11, 2026 at 11:24:14AM +0000, Mostafa Saleh wrote:
> On Sat, May 09, 2026 at 08:27:14PM -0300, Jason Gunthorpe wrote:
> > On Mon, May 04, 2026 at 12:28:55PM +0000, Mostafa Saleh
wrote:
> > > So far this is the list of requirements/changes needed to share the
> > > stage-2 page table (besides the obvious: same page table format,
> > > granularity, endianness...)
> > >
> > > 1) HW BBM is not supported in the hypervisor page table, that’s
> > > because it can generate TLB conflict aborts, which the hypervisor
> > > can not handle because of the limited syndrome information.
> > > We can rely on FEAT_BBML3 which was newly introduced to work
> > > around that, it’s quite niche and not supported in KVM yet or
> > > have an allow list similar to the kernel
> > > (as in cpu_supports_bbml2_noabort()) which also limits the number
> > > of CPUs that can run this.
> >
> > Do you think pkvm will need BBM? Hitless replace of a PTE is already a
> > pretty advanced feature and the SMMU has its own support matrix there
> > too. Is it for shared/private conversion?
>
> Yes, we can break block on memory donation which is transfer of
> ownership to the hypervisor or a guest.

So you need BBM support on the SMMU too? That is probably a big
problem because the SMMU is often mismatched to the CPU :\

Also io-pgtable arm cannot trigger BBM behaviors, so how do you
implement it?

> > No.. once you turn on IO like this you don't have page faults
> > anymore. Everything must be permanently mapped into the SMMU view, it
> > can never be made non-present and you must run without page
> > faults. That's what you have in the io-pgtable constructed table,
> > right?
>
> Exactly, but the CPU page table doesn’t guarantee that, so we either
> have to handle page faults in the IOMMU, or completely change how KVM
> deals with stage-2 if we want to share the page table with the CPU.

So that's the real explanation, KVM cannot manage the S2 in the right
way so you can't share it. RMM/etc are managing the S2 without
pointless page faults so they can share it.
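
As an aside, the distinction being argued here can be sketched as a toy
model in C. This is only an illustration under stated assumptions, not
code from the series or from KVM: struct toy_pgtable and every toy_*
name are hypothetical, and a flat array of present bits stands in for a
real multi-level table. The CPU-style stage-2 starts empty and is
populated from a fault handler, while the shadow table is fully
populated up front because the device has no fault path to fall back on.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 8 /* hypothetical toy address space size, in pages */

/* One flat "page table": present[i] says whether page i is mapped. */
struct toy_pgtable {
	bool present[TOY_NR_PAGES];
};

/* CPU-style stage-2: starts empty and is filled in on demand. */
static void toy_cpu_s2_init(struct toy_pgtable *pt)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		pt->present[i] = false;
}

/* Shadow table for the IOMMU: every page is identity mapped up front. */
static void toy_iommu_shadow_init(struct toy_pgtable *pt)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		pt->present[i] = true;
}

/*
 * CPU access: a miss is a recoverable fault, the handler maps the page
 * and the access is retried. Returns the number of faults taken (0 or 1).
 */
static int toy_cpu_access(struct toy_pgtable *pt, size_t page)
{
	if (pt->present[page])
		return 0;
	pt->present[page] = true; /* fault handler populates the entry */
	return 1;
}

/*
 * Device DMA: there is no fault handler in this model, so a miss is a
 * failed transaction. Returns true only if the page is already mapped.
 */
static bool toy_dma_access(const struct toy_pgtable *pt, size_t page)
{
	return pt->present[page];
}
```

In this model toy_dma_access() fails on the lazily populated table
until the CPU has touched the page, and always succeeds on the shadow
table, which is the property the SMMU view needs.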

> > > Alternatively, we can pin the stage-2 pages, that would require some
> > > hypercalls, hacks to the driver/IOMMU API and possibly new semantics
> > > in the DMA-API for IDENTITY devices as they will still need to pin
> > > the pages as they are actually in stage-2 translation and not bypass.
> >
> > ?? Then how does this series work?
>
> This series works fine as it shadows the page table and doesn't share it
> with the CPU, so it fully populates the address space.

Which is why it is so weird that KVM is using a partially populated S2
when there is, and must be, a fully populated one for the SMMU. But I
understand there are reasons for this.

Jason