From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3DCD2C52D6F for ; Wed, 21 Aug 2024 10:11:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:In-Reply-To: Content-Transfer-Encoding:Content-Type:MIME-Version:References:Message-ID: Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=shPtI4yPMxhwNt4uha1HbLqCMGDAgRxC+dkJsclhuHA=; b=WAOstvhV/YKNofn7C9wsQtIlVv sNqThwKnc7dLl928N2HcJXkMjwZRYpaVC6Nwt2tdrjwkU7Y4eU6cQkn1gt9FRArNkEUdzSV4Lwfth Pu6k2G0kzxJbWOexM235NTFJcAxa0JwHGh5RWIlaQyGH5Wx8d/eQajvKTbb9mCkdl8CCgNra4d/V9 xSancBEZ+8qidgRD+fUceP5F4nuaP9qV1zF28sGROqU/W6ffLjnsViRa9fO77CEGhZ+5oq/9TSyJZ NOK4uwVg7mRmHd4GCNncJDlKJ3g58PBijb0MUgv2Gvg8tCbZNULOz8/MBZDvaDXsBbtBcPVLYU6Wd C1aDRi5w==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1sgiJT-00000008N1N-3I7X; Wed, 21 Aug 2024 10:11:35 +0000 Received: from mail-wm1-x329.google.com ([2a00:1450:4864:20::329]) by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux)) id 1sgi29-00000008Ja1-1riu for linux-arm-kernel@lists.infradead.org; Wed, 21 Aug 2024 09:53:43 +0000 Received: by mail-wm1-x329.google.com with SMTP id 5b1f17b1804b1-428e12f6e56so51955e9.0 for ; Wed, 21 Aug 2024 02:53:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1724234019; x=1724838819; darn=lists.infradead.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=shPtI4yPMxhwNt4uha1HbLqCMGDAgRxC+dkJsclhuHA=; b=jAICwRcSOk8MmwaKord59gpbyL/QD2WLSSZHEzEeAUltGmez16iWuWlSDl1hy+D5X0 rdDFVYag1Kj1r3H3JKDGV9pHVRUoamkotLWxz32QCwt+L59gq26hEhewkSdO1vz3MxAw RmIFKqksHonfVQ6qfNymJOrmWzAWkjnGkvV0gOwW0AV9t3EezU5ULv0fpLNTJ4zuNpxB SkqMqv2yzTCiX7oCuvsHwiPG9ajlJu26BhmSjccU223e3CiPn1lOrstFUDo3ExSjcSSW /L/chyfEuUTOYzETTAJeS15vwEHZ3oA/1jSyPO38rHe3Y2mK9NzCQhmj15hPuxZ73vL7 VwFA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724234019; x=1724838819; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=shPtI4yPMxhwNt4uha1HbLqCMGDAgRxC+dkJsclhuHA=; b=Vs8wUKse5U9OYnruASGZQynf1s18FXmK9S9rZedp6SgqLiWOPhxSjzd9UB8c2Slv1B fahR3ptyk/r4SEU2DOckIqYZuZRbdhrWf92+ORmlAzMzDHqz8NIOJ9HMaUdS0L2DOpZR qjAQx1u04UQkZndqpgpwLv88FVNbIGaQSaF1Be7lIKtwKEVmE3b/CJsyIJqgvhJDMslI U7a2f4ZgqwT9LWkjzo3Jrku1hwcsN/S7QYKkSmnS/rEXovevwhAvogiEkGyKUdZBYhKz 0eiR1J+JXkcnECcmpjgsMbzdTF33+JfnYmFQlqLXuS7MHgWV7EuxrRkPJG51PGoMiQnx +MrQ== X-Forwarded-Encrypted: i=1; AJvYcCV9zQcNA7cHdL8VxRmn8JDJMRU58puJXn3p+D2McCMLrtcQcd0UFjWQV+oj3xcQ/huRfgAWdbt1qhc5k2BxHg1p@lists.infradead.org X-Gm-Message-State: AOJu0Yz0/WZeKkLQQlEEUjLHhEIct5Jt6bGrM3gzQL9wq6wvyhOyLHzc NEYJLv6cp/iiFubqnUvaHuIORl70xnOaGcgASFiAtORuu7FbWlGHBeGx7arf5A== X-Google-Smtp-Source: AGHT+IHUAEyavgkTbRh5dHjOmtVDPm5Mvfn00nNuswalwfrfossBr6xCiGgGPEQlL0lIyjnw6dvHOA== X-Received: by 2002:a05:600c:1d9a:b0:426:6413:b681 with SMTP id 5b1f17b1804b1-42ab615a8acmr2142795e9.6.1724234018435; Wed, 21 Aug 2024 02:53:38 -0700 (PDT) Received: from google.com (203.75.199.104.bc.googleusercontent.com. [104.199.75.203]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-371a937a5f4sm10928107f8f.51.2024.08.21.02.53.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 21 Aug 2024 02:53:37 -0700 (PDT) Date: Wed, 21 Aug 2024 09:53:33 +0000 From: Mostafa Saleh To: Jason Gunthorpe Cc: acpica-devel@lists.linux.dev, Alex Williamson , Hanjun Guo , iommu@lists.linux.dev, Joerg Roedel , Kevin Tian , kvm@vger.kernel.org, Len Brown , linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Lorenzo Pieralisi , "Rafael J. Wysocki" , Robert Moore , Robin Murphy , Sudeep Holla , Will Deacon , Eric Auger , Jean-Philippe Brucker , Moritz Fischer , Michael Shavit , Nicolin Chen , patches@lists.linux.dev, Shameerali Kolothum Thodi , Marc Zyngier Subject: Re: [PATCH 2/8] iommu/arm-smmu-v3: Use S2FWB when available Message-ID: References: <0-v1-54e734311a7f+14f72-smmuv3_nesting_jgg@nvidia.com> <2-v1-54e734311a7f+14f72-smmuv3_nesting_jgg@nvidia.com> <20240820120102.GB3773488@nvidia.com> <20240820202138.GH3773488@nvidia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20240820202138.GH3773488@nvidia.com> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240821_025341_536927_543FEC41 X-CRM114-Status: GOOD ( 55.85 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Tue, Aug 20, 2024 at 05:21:38PM -0300, Jason Gunthorpe wrote: > On Tue, Aug 20, 2024 at 07:52:53PM +0000, Mostafa Saleh wrote: > > On Tue, Aug 20, 2024 at 09:01:02AM -0300, Jason Gunthorpe wrote: > > > On Tue, Aug 20, 2024 at 08:30:05AM +0000, Mostafa Saleh wrote: > > > > Hi Jason, > > > > > > > > On Tue, Aug 06, 2024 at 08:41:15PM -0300, Jason Gunthorpe wrote: > > > > > Force Write Back (FWB) changes how the S2 IOPTE's MemAttr field > > > > > works. When S2FWB is supported and enabled the IOPTE will force cachable > > > > > access to IOMMU_CACHE memory and deny cachable access otherwise. > > > > > > > > > > This is not especially meaningful for simple S2 domains, it apparently > > > > > doesn't even force PCI no-snoop access to be coherent. > > > > > > > > > > However, when used with a nested S1, FWB has the effect of preventing the > > > > > guest from choosing a MemAttr that would cause ordinary DMA to bypass the > > > > > cache. Consistent with KVM we wish to deny the guest the ability to become > > > > > incoherent with cached memory the hypervisor believes is cachable so we > > > > > don't have to flush it. > > > > > > > > > > Turn on S2FWB whenever the SMMU supports it and use it for all S2 > > > > > mappings. > > > > > > > > I have been looking into this recently from the KVM side as it will > > > > use FWB for the CPU stage-2 unconditionally for guests(if supported), > > > > however that breaks for non-coherent devices when assigned, and > > > > limiting assigned devices to be coherent seems too restrictive. > > > > > > kvm's CPU S2 doesn't care about non-DMA-coherent devices though? That > > > concept is only relevant to the SMMU. > > > > Why not? That would be a problem if a device is not dma coherent, > > and the VM knows that and maps it’s DMA memory as non cacheable. > > But it would be overridden by FWB in stage-2 to be cacheable, > > it would lead to coherency issues. > > Oh, from that perspective yes, but the entire point of S2FWB is that > VM's can not create non-coherent access so it is a bit nonsense to ask > for both S2FWB and try to assign a non-DMA coherent device. Yes, but KVM sets FWB unconditionally and would use cacheable mapping for stage-2, and I expect the same for the nested SMMU. > > > Yes, that also breaks (although I think this is an easier problem to > > solve) > > Well, it is easy to solve, just don't use S2FWB and manually flush the > caches before the hypervisor touches any memory. :) Yes, although that means virtualized devices would have worse performance :/ but I guess there is nothing more to do here. I have some ideas about that, I can send patches to the kvm list as an RFC. > > > What I mean is the master itself not the SMMU (the SID basically), > > so in that case the STE shouldn’t have FWB enabled. > > That doesn't matter, those cases will not pass in IOMMU_CACHE and they > will work fine with S2FWB turned on. > But that won’t be the case in nested? Otherwise why we use FWB in the first place. > > > Also bear in mind VFIO won't run unless ARM_SMMU_FEAT_COHERENCY is set > > > so we won't even get a chance to ask for a S2 domain. > > > > Oh, I think that is only for the SMMU, not for the master, the > > SMMU can be coherent (for pte, ste …) but the master can still be > > non coherent. Looking at how VFIO uses it, that seems to be a bug? > > If there are mixes of SMMU feature and dev_is_dma_coherent() then it > would be a bug yes.. > I think there is a bug, I was able to assign a “non-coherent” device with VFIO with no issues, and it allows it as long as the SMMU is coherent. > I recall we started out trying to use dev_is_dma_coherent() but > Christoph explained it doesn't work that generally: > > https://lore.kernel.org/kvm/20220406135150.GA21532@lst.de/ > > Seems we sort of gave up on it, too complicated. Robin had a nice > observation of the complexity: > > Disregarding the complete disaster of PCIe No Snoop on Arm-Based > systems, there's the more interesting effectively-opposite scenario > where an SMMU bridges non-coherent devices to a coherent interconnect. > It's not something we take advantage of yet in Linux, and it can only be > properly described in ACPI, but there do exist situations where > IOMMU_CACHE is capable of making the device's traffic snoop, but > dev_is_dma_coherent() - and device_get_dma_attr() for external users - > would still say non-coherent because they can't assume that the SMMU is > enabled and programmed in just the right way. > > Anyhow, for the purposes of KVM and VFIO, devices that don't work with > IOMMU_CACHE are not allowed. From an API perspective > IOMMU_CAP_CACHE_COHERENCY is supposed to return if the struct device > can use IOMMU_CACHE. > > The corner case where we have a ARM_SMMU_FEAT_COHERENCY SMMU but > somehow specific devices don't support IOMMU_CACHE is not properly > reflected in IOMMU_CAP_CACHE_COHERENCY. I don't know how to fix that, > and we've been ignoring it for a long time now :) Thanks a lot for the extra context! Maybe the SMMUv3 .capable, should be changed to check if the device is coherent (instead of using dev_is_dma_coherent, it can use lower level functions from the supported buses) Also, I think supporting IOMMU_CACHE is not enough, as the SMMU can support it but the device is still not coherent. Thanks, Mostafa > > Jason