Date: Fri, 24 Apr 2026 13:39:21 -0300
From: Jason Gunthorpe
To: Will Deacon
Cc: Evangelos Petrongonas, Robin Murphy, Joerg Roedel, Nicolin Chen,
    Pranjal Shrivastava, Lu Baolu, linux-arm-kernel@lists.infradead.org,
    iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
    nh-open-source@amazon.com, Zeev Zilberman
Subject: Re: [PATCH] iommu/arm-smmu-v3: Allow disabling Stage 1 translation
Message-ID: <20260424163921.GG3611611@ziepe.ca>
References: <20260422064431.GA49867@dev-dsk-epetron-1c-1d4d9719.eu-west-1.amazon.com>
    <20260422162351.GK3611611@ziepe.ca>
    <20260423142326.GP3611611@ziepe.ca>
    <20260423223716.GS3611611@ziepe.ca>
    <20260424154256.GF3611611@ziepe.ca>

On Fri, Apr 24, 2026 at 05:01:27PM +0100, Will Deacon wrote:
> On Fri, Apr 24, 2026 at 12:42:56PM -0300, Jason Gunthorpe wrote:
> > On Fri, Apr 24, 2026 at 04:16:17PM +0100, Will Deacon wrote:
> > > > > > STE/CD is pretty simple now, there is only one place to put the CMO
> > > > > > and the ordering is all handled with that shared code.
> > > > > > We no longer care about ordering beyond all the writes must be
> > > > > > visible to HW before issuing the CMDQ invalidation command - which
> > > > > > is the same environment as the pagetable.
> > > > >
> > > > > You presumably rely on 64-bit single-copy atomicity for hitless updates,
> > > > > no?
> > > >
> > > > Yes, just like the page table does..
> > > >
> > > > I hope that's not a problem or we have an issue with the PTW :)
> > >
> > > You trimmed the part from my reply where I think we _do_ have an issue
> > > with the PTW. Here it is again:
> > >
> > > The non-coherent case looks more fragile, because I don't _think_ the
> > > architecture provides any ordering or atomicity guarantees about cache
> > > cleaning to the PoC. Presumably, the correct sequence would be to write
> > > the PTE with the valid bit clear, do the CMO (with completion barrier),
> > > *then* write the bottom byte with the valid bit set and do another CMO.
> >
> > I wasn't sure if you are being serious.
> >
> > CMO + barriers must provide an ordering guarantee about cache cleaning
> > to PoC, otherwise the entire Linux DMA API is broken. dma_sync must
> > order with following device DMA. IMHO that's not negotiable for Linux.
>
> The problem is with concurrent DMA (from the page-table walker) and I
> don't see anything that guarantees that in the CPU architecture. I don't
> think the streaming DMA API pretends to handle that case, does it? It
> relies on a pretty rigid ownership concept from what I understand.

I think you pointed out two things: ordering and tearing.

Ordering is OK. If I write a PTE, dma_sync, and then command a device to
use that IOVA, the PTW must observe the new PTE value. Otherwise dma_sync
isn't doing what Linux requires.

Tearing is a different issue: if the device uses the IOVA and races with
the PTE write that is changing it, then you say it may mis-read the PTE
due to tearing.
However, this race only happens if the PTE is currently non-valid or
being changed to non-valid, meaning you would be getting a random invalid
IOVA event anyway.

In non-coherent mode we don't allow SVA and we don't allow VFIO, only the
DMA API and drivers that open-code things.

For VFIO and SVA, yes, we need the HW to work properly: userspace can
trigger invalid IOVAs, so we can't tolerate a corrupted PTE.

In embedded, I suppose you could argue that you don't care, since an
invalid IOVA would have to be caused by a buggy kernel driver. It should
never happen, so this is really a debug feature. If the race will never
be hit in a working system, maybe it is fine to leave it as is. It would
be good to document this detail :)

> Of course I'd rather that the architecture said that our current code
> is fine, but if it doesn't then I don't have much choice, really. At the
> very least, we should minimise the number of places where we rely on
> non-architected behaviour and so keeping the CDs and STEs non-cacheable
> remains my preference.

So, I am convinced: the PTW has the escape above, which doesn't apply to
STE/CD. Those can truly be accessed at any time, and we can't ever leave
a 64-bit value in a strange state.

Jason