From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C190C433E6 for ; Wed, 20 Jan 2021 02:10:05 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AD2ED2245C for ; Wed, 20 Jan 2021 02:10:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AD2ED2245C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Type: Content-Transfer-Encoding:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:Date:Message-ID:From: References:To:Subject:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=Fo/FVOjrLP4T6onTl2LEhvuINAo4E8N0Spcp9DF7f9M=; b=hEGqcDeEJM7DWAot2smwYey0A JAjdf4AtSlfJyAIzYWvb5aQzGrjySHc1A2VZ2iEtjvcc2qO0Xoje0Z9kmV9tHp8ErWnUJBH2EMBOv Sc1hX7TZx71CBM17gtQRTeaI97lz1h13Rsev37Ed89OiwWHRI+MNXxt/hEUJhdID96MToblMjIZ0M m//lSTS6YsxHeYE936DvSRHvNeUu42BLcn31F4TRbaAOkJicLz056fLpctozxBwW+613VcHVmLJ07 BRMjkJ1a2iAyGbsEVJ9WQegA191vPFRBzxhc2fOZc/jDxLpUe9UzU71uFTwoGtH7cST2jQZj4g8G3 3Y9gnY7BQ==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1l22un-0001DG-KZ; Wed, 20 Jan 2021 02:08:09 +0000 Received: from mga06.intel.com ([134.134.136.31]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1l22uk-00018m-09 for linux-arm-kernel@lists.infradead.org; Wed, 20 Jan 2021 02:08:07 +0000 IronPort-SDR: GBmC3oeheKA7p2hF2TC2hI2hDPClIYkC4ZVPD+ZpUSiWEtFPDhPVZmtv60r2Ia98pzIo439S0Q U8s+xqcNbVDg== X-IronPort-AV: E=McAfee;i="6000,8403,9869"; a="240569531" X-IronPort-AV: E=Sophos;i="5.79,359,1602572400"; d="scan'208";a="240569531" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jan 2021 18:06:00 -0800 IronPort-SDR: W6eM+jIfTZwk0PBH1IIFJWnO3U5jYa/3pifRN/NWz3Ibe/PpfkfPsHjwZ4SUqihavlEC755cX7 eI+tl5wEn0GQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,359,1602572400"; d="scan'208";a="466917375" Received: from allen-box.sh.intel.com (HELO [10.239.159.28]) ([10.239.159.28]) by fmsmga001.fm.intel.com with ESMTP; 19 Jan 2021 18:05:54 -0800 Subject: Re: [PATCH v9 03/10] iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA To: Jean-Philippe Brucker , "Tian, Kevin" References: <20210108145217.2254447-1-jean-philippe@linaro.org> <20210108145217.2254447-4-jean-philippe@linaro.org> <4de8ef03-a2ed-316e-d3e3-6b8474e20113@linux.intel.com> <636814a9-7dea-06f6-03ec-6a98dd30b7e3@linux.intel.com> From: Lu Baolu Message-ID: <5e563a0b-e853-bb72-2cec-13c94ab4d554@linux.intel.com> Date: Wed, 20 Jan 2021 09:57:42 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210119_210806_206813_9E49040B X-CRM114-Status: GOOD ( 35.91 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Greg Kroah-Hartman , "vivek.gautam@arm.com" , "guohanjun@huawei.com" , "will@kernel.org" , "lorenzo.pieralisi@arm.com" , "joro@8bytes.org" , Zhou Wang , "linux-acpi@vger.kernel.org" , "zhangfei.gao@linaro.org" , "lenb@kernel.org" , "devicetree@vger.kernel.org" , Arnd Bergmann , "eric.auger@redhat.com" , "robh+dt@kernel.org" , "Jonathan.Cameron@huawei.com" , "linux-arm-kernel@lists.infradead.org" , David Woodhouse , "rjw@rjwysocki.net" , "shameerali.kolothum.thodi@huawei.com" , "iommu@lists.linux-foundation.org" , "sudeep.holla@arm.com" , "robin.murphy@arm.com" , "linux-accelerators@lists.ozlabs.org" , baolu.lu@linux.intel.com Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="us-ascii"; Format="flowed" Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Hi Jean, On 1/19/21 6:16 PM, Jean-Philippe Brucker wrote: > On Mon, Jan 18, 2021 at 06:54:28AM +0000, Tian, Kevin wrote: >>> From: Lu Baolu >>> Sent: Saturday, January 16, 2021 11:54 AM >>> >>> Hi Jean, >>> >>> On 2021/1/15 0:41, Jean-Philippe Brucker wrote: >>>> I guess detailing what's needed for nested IOPF can help the discussion, >>>> although I haven't seen any concrete plan about implementing it, and it >>>> still seems a couple of years away. There are two important steps with >>>> nested IOPF: >>>> >>>> (1) Figuring out whether a fault comes from L1 or L2. A SMMU stall event >>>> comes with this information, but a PRI page request doesn't. The >>> IOMMU >>>> driver has to first translate the IOVA to a GPA, injecting the fault >>>> into the guest if this translation fails by using the usual >>>> iommu_report_device_fault(). >> >> The IOMMU driver can walk the page tables to find out the level information. >> If the walk terminates at the 1st level, inject to the guest. Otherwise fix the >> mm fault at 2nd level. It's not efficient compared to hardware-provided info, >> but it's doable and actual overhead needs to be measured (optimization exists >> e.g. having fault client to hint no 2nd level fault expected when registering fault >> handler in pinned case). >> >>>> >>>> (2) Translating the faulting GPA to a HVA that can be fed to >>>> handle_mm_fault(). That requires help from KVM, so another interface - >>>> either KVM registering GPA->HVA translation tables or IOMMU driver >>>> querying each translation. Either way it should be reusable by device >>>> drivers that implement IOPF themselves. >> >> Or just leave to the fault client (say VFIO here) to figure it out. VFIO has the >> information about GPA->HPA and can then call handle_mm_fault to fix the >> received fault. The IOMMU driver just exports an interface for the device drivers >> which implement IOPF themselves to report a fault which is then handled by >> the IOMMU core by reusing the same faulting path. >> >>>> >>>> (1) could be enabled with iommu_dev_enable_feature(). (2) requires a >>> more >>>> complex interface. (2) alone might also be desirable - demand-paging for >>>> level 2 only, no SVA for level 1. >> >> Yes, this is what we want to point out. A general FEAT_IOPF implies more than >> what this patch intended to address. >> >>>> >>>> Anyway, back to this patch. What I'm trying to convey is "can the IOMMU >>>> receive incoming I/O page faults for this device and, when SVA is enabled, >>>> feed them to the mm subsystem? Enable that or return an error." I'm stuck >>>> on the name. IOPF alone is too vague. Not IOPF_L1 as Kevin noted, since L1 >>>> is also used in virtualization. IOPF_BIND and IOPF_SVA could also mean (2) >>>> above. IOMMU_DEV_FEAT_IOPF_FLAT? >>>> >>>> That leaves space for the nested extensions. (1) above could be >>>> IOMMU_FEAT_IOPF_NESTED, and (2) requires some new interfacing with >>> KVM (or >>>> just an external fault handler) and could be used with either IOPF_FLAT or >>>> IOPF_NESTED. We can figure out the details later. What do you think? >>> >>> I agree that we can define IOPF_ for current usage and leave space for >>> future extensions. >>> >>> IOPF_FLAT represents IOPF on first-level translation, currently first >>> level translation could be used in below cases. >>> >>> 1) FL w/ internal Page Table: Kernel IOVA; >>> 2) FL w/ external Page Table: VFIO passthrough; >>> 3) FL w/ shared CPU page table: SVA >>> >>> We don't need to support IOPF for case 1). Let's put it aside. >>> >>> IOPF handling of 2) and 3) are different. Do we need to define different >>> names to distinguish these two cases? >>> >> >> Defining feature names according to various use cases does not sound a >> clean way. In an ideal way we should have just a general FEAT_IOPF since >> the hardware (at least VT-d) does support fault in either 1st-level, 2nd- >> level or nested configurations. We are entering this trouble just because >> there is difficulty for the software evolving to enable full hardware cap >> in one batch. My last proposal was sort of keeping FEAT_IOPF as a general >> capability for whether delivering fault through the IOMMU or the ad-hoc >> device, and then having a separate interface for whether IOPF reporting >> is available under a specific configuration. The former is about the path >> between the IOMMU and the device, while the latter is about the interface >> between the IOMMU driver and its faulting client. >> >> The reporting capability can be checked when the fault client is registering >> its fault handler, and at this time the IOMMU driver knows how the related >> mapping is configured (1st, 2nd, or nested) and whether fault reporting is >> supported in such configuration. We may introduce IOPF_REPORT_FLAT and >> IOPF_REPORT_NESTED respectively. while IOPF_REPORT_FLAT detection is >> straightforward (2 and 3 can be differentiated internally based on configured >> level), IOPF_REPORT_NESTED needs additional info to indicate which level is >> concerned since the vendor driver may not support fault reporting in both >> levels or the fault client may be interested in only one level (e.g. with 2nd >> level pinned). > > I agree with this plan (provided I understood it correctly this time): > have IOMMU_DEV_FEAT_IOPF describing the IOPF interface between device and > IOMMU. Enabling it on its own doesn't do anything visible to the driver, > it just probes for capabilities and enables PRI if necessary. For host > SVA, since there is no additional communication between IOMMU and device > driver, enabling IOMMU_DEV_FEAT_SVA in addition to IOPF is sufficient. > Then when implementing nested we'll extend iommu_register_fault_handler() > with flags and parameters. That will also enable advanced dispatching (1). > > Will it be necessary to enable FEAT_IOPF when doing VFIO passthrough > (injecting to the guest or handling it with external page tables)? > I think that would be better. Currently a device driver registering a > fault handler doesn't know if it will get recoverable page faults or only > unrecoverable ones. > > So I don't think this patch needs any change. Baolu, are you ok with > keeping this and patch 4? It sounds good to me. Keep FEAT_IOPF as the IOMMU capability of generating I/O page fault and differentiate different I/O page faults by extending the fault handler register interface. > > Thanks, > Jean > Best regards, baolu _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel