Date: Fri, 22 Nov 2024 11:33:40 -0400
From: Jason Gunthorpe
To: Andrew Jones
Cc: iommu@lists.linux.dev, kvm-riscv@lists.infradead.org,
 kvm@vger.kernel.org, linux-riscv@lists.infradead.org,
 linux-kernel@vger.kernel.org, tjeznach@rivosinc.com, zong.li@sifive.com,
 joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
 anup@brainfault.org, atishp@atishpatra.org, tglx@linutronix.de,
 alex.williamson@redhat.com, paul.walmsley@sifive.com, palmer@dabbelt.com,
 aou@eecs.berkeley.edu
Subject: Re: [RFC PATCH 08/15] iommu/riscv: Add IRQ domain for interrupt remapping
Message-ID: <20241122153340.GC773835@ziepe.ca>
References: <20241114161845.502027-17-ajones@ventanamicro.com>
 <20241114161845.502027-25-ajones@ventanamicro.com>
 <20241118184336.GB559636@ziepe.ca>
 <20241119-62ff49fc1eedba051838dba2@orel>
 <20241119140047.GC559636@ziepe.ca>
 <20241119153622.GD559636@ziepe.ca>
 <20241121-4e637c492d554280dec3b077@orel>
In-Reply-To: <20241121-4e637c492d554280dec3b077@orel>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Nov 22, 2024 at 04:11:36PM +0100, Andrew Jones wrote:

> The reason is that the RISC-V IOMMU only checks the MSI table, i.e.
> enables its support for MSI remapping, when the g-stage (second-stage)
> page table is in use.
> However, the expected virtual memory scheme for an
> OS to use for DMA would be to have s-stage (first-stage) in use and the
> g-stage set to 'Bare' (not in use).

That isn't really a technical reason.

> OIOW, it doesn't appear the spec authors expected MSI remapping to
> be enabled for the host DMA use case. That does make some sense,
> since it's actually not necessary. For the host DMA use case,
> providing mappings for each s-mode interrupt file which the device
> is allowed to write to in the s-stage page table sufficiently
> enables MSIs to be delivered.

Well, that seems to be the main problem here. You are grappling with a
spec design that doesn't match the SW expectations. Since it has
deviated from what everyone else has done, you now have extra
challenges to resolve in some way.

Just always using interrupt remapping if the HW is capable of
interrupt remapping, and ignoring the spec "expectation", is a nice and
simple way to make things work with existing Linux.

> If "default VFIO" means VFIO without irqbypass, then it would work the
> same as the DMA API, assuming all mappings for all necessary s-mode
> interrupt files are created (something the DMA API needs as well).
> However, VFIO would also need 'vfio_iommu_type1.allow_unsafe_interrupts=1'
> to be set for this no-irqbypass configuration.

Which isn't what anyone wants; you need to make the DMA API domain
fully functional so that VFIO works.

> > That isn't ideal, the translation under the IRQs shouldn't really be
> > changing as the translation under the IOMMU changes.
>
> Unless the device is assigned to a guest, then the IRQ domain wouldn't
> do anything at all (it'd just sit between the device and the device's
> old MSI parent domain), but it also wouldn't come and go, risking issues
> with anything sensitive to changes in the IRQ domain hierarchy.

VFIO isn't restricted to such a simple use model.
You have to support all the generality, which includes fully
supporting changing the iommu translation on the fly.

> > Further, VFIO assumes iommu_group_has_isolated_msi(), ie
> > IRQ_DOMAIN_FLAG_ISOLATED_MSI, is fixed while it is bound. Will that
> > be true if the iommu is flapping all about? What will you do when VFIO
> > has it attached to a blocked domain?
> >
> > It just doesn't make sense to change something so fundamental as the
> > interrupt path on an iommu domain attachment. :\
>
> Yes, it does appear I should be doing this at iommu device probe time
> instead. It won't provide any additional functionality to use cases which
> aren't assigning devices to guests, but it also won't hurt, and it should
> avoid the risks you point out.

Even if you statically create the domain, you can't change the value of
IRQ_DOMAIN_FLAG_ISOLATED_MSI depending on what is currently attached
to the IOMMU.

What you are trying to do is not supported by the software stack right
now. You need to make much bigger, more intrusive changes if you
really want to make interrupt remapping dynamic.

Jason
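
To make the point about a fixed IRQ_DOMAIN_FLAG_ISOLATED_MSI concrete,
here is a minimal userspace sketch, not kernel code. Every name in it
(model_group, model_user, model_attach, model_assumption_stale) is
hypothetical and exists only for illustration. The one assumption it
encodes is the one stated above: a VFIO-like user samples the
isolated-MSI property once, when it binds, and relies on it staying
fixed afterwards.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a device group; not a kernel structure. */
struct model_group {
	/* Models whether IRQ_DOMAIN_FLAG_ISOLATED_MSI is currently reported. */
	bool isolated_msi;
};

/* Hypothetical stand-in for a VFIO-like user bound to a group. */
struct model_user {
	const struct model_group *group;
	bool checked_isolated;	/* value sampled once, at attach time */
};

/*
 * Attach succeeds only if MSIs are isolated, or if the caller opts in
 * to unsafe interrupts (modelling allow_unsafe_interrupts=1).
 * Returns 0 on success, -1 on refusal.
 */
int model_attach(struct model_user *u, const struct model_group *g,
		 bool allow_unsafe_interrupts)
{
	if (!g->isolated_msi && !allow_unsafe_interrupts)
		return -1;

	u->group = g;
	u->checked_isolated = g->isolated_msi;
	return 0;
}

/* True if the property sampled at attach no longer matches the group. */
bool model_assumption_stale(const struct model_user *u)
{
	return u->checked_isolated != u->group->isolated_msi;
}
```

In this model, if the group's isolated_msi value flips after
model_attach() has succeeded, model_assumption_stale() becomes true
while the user is still bound. Nothing re-runs the check, which is the
situation a dynamically changing flag would create.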