From: Yi Liu <yi.l.liu@intel.com>
To: "Tian, Kevin" <kevin.tian@intel.com>, Nicolin Chen <nicolinc@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
"joro@8bytes.org" <joro@8bytes.org>,
"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
"robin.murphy@arm.com" <robin.murphy@arm.com>,
"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
"cohuck@redhat.com" <cohuck@redhat.com>,
"eric.auger@redhat.com" <eric.auger@redhat.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"mjrosato@linux.ibm.com" <mjrosato@linux.ibm.com>,
"chao.p.peng@linux.intel.com" <chao.p.peng@linux.intel.com>,
"yi.y.sun@linux.intel.com" <yi.y.sun@linux.intel.com>,
"peterx@redhat.com" <peterx@redhat.com>,
"jasowang@redhat.com" <jasowang@redhat.com>,
"shameerali.kolothum.thodi@huawei.com"
<shameerali.kolothum.thodi@huawei.com>,
"lulu@redhat.com" <lulu@redhat.com>,
"suravee.suthikulpanit@amd.com" <suravee.suthikulpanit@amd.com>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
"Duan, Zhenzhong" <zhenzhong.duan@intel.com>,
"joao.m.martins@oracle.com" <joao.m.martins@oracle.com>,
"Zeng, Xin" <xin.zeng@intel.com>,
"Zhao, Yan Y" <yan.y.zhao@intel.com>
Subject: Re: [PATCH v6 2/6] iommufd: Add IOMMU_HWPT_INVALIDATE
Date: Fri, 1 Dec 2023 17:08:50 +0800 [thread overview]
Message-ID: <f29ac3f9-0ab8-48e5-addd-82592c55838c@intel.com> (raw)
In-Reply-To: <BN9PR11MB5276CF2C6BD4163634F42D478C81A@BN9PR11MB5276.namprd11.prod.outlook.com>
On 2023/12/1 15:10, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Friday, December 1, 2023 3:05 PM
>>
>> On 2023/12/1 13:19, Tian, Kevin wrote:
>>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>>> Sent: Friday, December 1, 2023 12:50 PM
>>>>
>>>> On Fri, Dec 01, 2023 at 11:51:26AM +0800, Yi Liu wrote:
>>>>> On 2023/11/29 08:57, Jason Gunthorpe wrote:
>>>>>> On Tue, Nov 28, 2023 at 04:51:21PM -0800, Nicolin Chen wrote:
>>>>>>>>> I also thought about making this out_driver_error_code per HW.
>>>>>>>>> Yet, an error can be either per array or per entry/quest. The
>>>>>>>>> array-related error should be reported in the array structure
>>>>>>>>> that is a core uAPI, v.s. the per-HW entry structure. Though
>>>>>>>>> we could still report an array error in the entry structure
>>>>>>>>> at the first entry (or indexed by "array->entry_num")?
>>>>>>>>>
>>>>>>>>
>>>>>>>> why would there be an array error? array is just a software
>>>>>>>> entity containing actual HW invalidation cmds. If there is
>>>>>>>> any error with the array itself it should be reported via
>>>>>>>> ioctl errno.
>>>>>>>
>>>>>>> User array reading is a software operation, but kernel array
>>>>>>> reading is a hardware operation that can raise an error when
>>>>>>> the memory location to the array is incorrect or so.
>>>>>>
>>>>>> Well, we shouldn't get into a situation like that.. By the time the HW
>>>>>> got the address it should be valid.
>>>>>>
>>>>>>> With that being said, I think errno (-EIO) could do the job,
>>>>>>> as you suggested too.
>>>>>>
>>>>>> Do we have any idea what HW failures can be generated by the
>>>> commands
>>>>>> this will execture? IIRC I don't remember seeing any smmu specific
>>>>>> codes related to invalid invalidation? Everything is a valid input?
>>>>>>
>>>>>> Can vt-d fail single commands? What about AMD?
>>>>>
>>>>> Intel VT-d side, after each invalidation request, there is a wait
>>>>> descriptor which either provide an interrupt or an address for the
>>>>> hw to notify software the request before the wait descriptor has been
>>>>> completed. While, if there is error happened on the invalidation request,
>>>>> a flag (IQE, ICE, ITE) would be set in the Fault Status Register, and some
>>>>> detailed information would be recorded in the Invalidation Queue Error
>>>>> Record Register. So an invalidation request may be failed with some
>> error
>>>>> reported. If no error, will return completion via the wait descriptor. Is
>>>>> this what you mean by "fail a single command"?
>>>>
>>>> I see the current VT-d series marking those as "REVISIT". How
>>>> will it report an error to the user space from those register?
>>>>
>>>> Are they global status registers so that it might be difficult
>>>> to direct the error to the nested domain for an event fd?
>>>>
>>>
>>> They are global registers but invalidation queue is also the global
>>> resource. intel-iommu driver polls the status register after queueing
>>> new invalidation descriptors. The submission is serialized.
>>>
>>> If the error is related to a descriptor itself (e.g. format issue) then
>>> the head register points to the problematic descriptor so software
>>> can direct it to the related domain.
>>>
>>> If the error is related to device tlb invalidation (e.g. timeout) there
>>> is no way to associate the error with a specific descriptor by current
>>> spec. But intel-iommu driver batches descriptors per domain so
>>> we can still direct the error to the nested domain.
>>>
>>> But I don't see the need of doing it via eventfd.
>>>
>>> The poll semantics in intel-iommu driver is essentially a sync model.
>>> vt-d spec does allow software to optionally enable notification upon
>>> those errors but it's not used so far.
>>>
>>> With that I still prefer to having driver-specific error code defined
>>> in the entry. If ARM is an event-driven model then we can define
>>> that field at least in vtd specific data structure.
>>>
>>> btw given vtd doesn't use native format in uAPI it doesn't make
>>> sense to forward descriptor formatting errors back to userspace.
>>> Those, if happen, are driver's own problem. intel-iommu driver
>>> should verify the uAPI structure and return -EINVAL or proper
>>> errno to userspace purely in software.
>>>
>>> With that Yi please just define error codes for device tlb related
>>> errors for vtd.
>>
>> hmmm, this sounds like customized error code. is it? So far, VT-d
>
> yes. there is no need to replicate hardware registers/bits if most
> of them are irrelevant to userspace.
>
>> spec has two errors (ICE and ITE). ITE is valuable to let userspace
>> know. For ICE, looks like no much value. Intel iommu driver should
>> be responsible to submit a valid device-tlb invalidation to device.
>
> it's an invalid completion message from the device which could be
> caused by various reasons (not exactly due to the invalidation
> request by iommu driver). so it still makes sense to forward.
ok. so we may need to define a field to forward the detailed info to
user as well. This data is error-code specific. @Nic, are we aligned
that the error_code field and error data reporting should be moved
to the driver-specific part since it is different between vendors?
--
Regards,
Yi Liu
next prev parent reply other threads:[~2023-12-01 9:06 UTC|newest]
Thread overview: 93+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-17 13:07 [PATCH v6 0/6] iommufd: Add nesting infrastructure (part 2/2) Yi Liu
2023-11-17 13:07 ` [PATCH v6 1/6] iommu: Add cache_invalidate_user op Yi Liu
2023-11-20 7:53 ` Tian, Kevin
2023-12-06 18:32 ` Jason Gunthorpe
2023-12-06 18:43 ` Nicolin Chen
2023-12-06 18:50 ` Jason Gunthorpe
2023-12-07 6:53 ` Yi Liu
2024-01-08 7:32 ` Binbin Wu
2023-11-17 13:07 ` [PATCH v6 2/6] iommufd: Add IOMMU_HWPT_INVALIDATE Yi Liu
2023-11-20 8:09 ` Tian, Kevin
2023-11-20 8:29 ` Yi Liu
2023-11-20 8:34 ` Tian, Kevin
2023-11-20 17:36 ` Nicolin Chen
2023-11-21 2:50 ` Tian, Kevin
2023-11-21 5:24 ` Nicolin Chen
2023-11-24 2:36 ` Tian, Kevin
2023-11-27 19:53 ` Nicolin Chen
2023-11-28 6:01 ` Yi Liu
2023-11-29 0:54 ` Nicolin Chen
2023-11-28 8:03 ` Tian, Kevin
2023-11-29 0:51 ` Nicolin Chen
2023-11-29 0:57 ` Jason Gunthorpe
2023-11-29 1:09 ` Nicolin Chen
2023-11-29 19:58 ` Jason Gunthorpe
2023-11-29 22:07 ` Nicolin Chen
2023-11-30 0:08 ` Jason Gunthorpe
2023-11-30 20:41 ` Nicolin Chen
2023-12-01 0:45 ` Jason Gunthorpe
2023-12-01 4:29 ` Nicolin Chen
2023-12-01 12:55 ` Jason Gunthorpe
2023-12-01 19:58 ` Nicolin Chen
2023-12-01 20:43 ` Jason Gunthorpe
2023-12-01 22:12 ` Nicolin Chen
2023-12-04 14:48 ` Jason Gunthorpe
2023-12-05 17:33 ` Nicolin Chen
2023-12-06 12:48 ` Jason Gunthorpe
2023-12-01 3:51 ` Yi Liu
2023-12-01 4:50 ` Nicolin Chen
2023-12-01 5:19 ` Tian, Kevin
2023-12-01 7:05 ` Yi Liu
2023-12-01 7:10 ` Tian, Kevin
2023-12-01 9:08 ` Yi Liu [this message]
2023-11-21 5:02 ` Baolu Lu
2023-11-21 5:19 ` Nicolin Chen
2023-11-28 5:54 ` Yi Liu
2023-12-06 18:33 ` Jason Gunthorpe
2023-12-07 6:59 ` Yi Liu
2023-12-07 9:04 ` Tian, Kevin
2023-12-07 14:42 ` Jason Gunthorpe
2023-12-11 7:53 ` Yi Liu
2023-12-11 13:21 ` Jason Gunthorpe
2023-12-12 13:45 ` Liu, Yi L
2023-12-12 14:40 ` Jason Gunthorpe
2023-12-13 13:47 ` Liu, Yi L
2023-12-13 14:11 ` Jason Gunthorpe
2023-12-11 7:49 ` Yi Liu
2023-11-17 13:07 ` [PATCH v6 3/6] iommu: Add iommu_copy_struct_from_user_array helper Yi Liu
2023-11-20 8:17 ` Tian, Kevin
2023-11-20 17:25 ` Nicolin Chen
2023-11-21 2:48 ` Tian, Kevin
2024-01-08 8:37 ` Binbin Wu
2023-11-17 13:07 ` [PATCH v6 4/6] iommufd/selftest: Add mock_domain_cache_invalidate_user support Yi Liu
2023-12-06 18:16 ` Jason Gunthorpe
2023-12-11 11:21 ` Yi Liu
2023-11-17 13:07 ` [PATCH v6 5/6] iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op Yi Liu
2023-11-17 13:07 ` [PATCH v6 6/6] iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl Yi Liu
2023-12-06 18:19 ` Jason Gunthorpe
2023-12-11 11:28 ` Yi Liu
2023-12-11 13:06 ` Jason Gunthorpe
2023-12-09 1:47 ` [PATCH v6 0/6] iommufd: Add nesting infrastructure (part 2/2) Jason Gunthorpe
2023-12-11 2:29 ` Tian, Kevin
2023-12-11 12:36 ` Yi Liu
2023-12-11 13:05 ` Jason Gunthorpe
2023-12-11 15:34 ` Suthikulpanit, Suravee
2023-12-11 16:06 ` Jason Gunthorpe
2023-12-11 12:35 ` Yi Liu
2023-12-11 13:20 ` Jason Gunthorpe
2023-12-11 20:11 ` Nicolin Chen
2023-12-11 21:48 ` Jason Gunthorpe
2023-12-11 17:35 ` Suthikulpanit, Suravee
2023-12-11 17:45 ` Jason Gunthorpe
2023-12-11 21:27 ` Nicolin Chen
2023-12-11 21:57 ` Jason Gunthorpe
2023-12-12 7:30 ` Nicolin Chen
2023-12-12 14:44 ` Jason Gunthorpe
2023-12-12 19:13 ` Nicolin Chen
2023-12-12 19:21 ` Jason Gunthorpe
2023-12-12 20:05 ` Nicolin Chen
2023-12-13 12:40 ` Jason Gunthorpe
2023-12-13 19:54 ` Nicolin Chen
[not found] ` <CGME20231217215720eucas1p2a590aca62ce8eb5ba81df6bc8b1a785d@eucas1p2.samsung.com>
2023-12-17 11:21 ` Joel Granados
2023-12-19 9:26 ` Yi Liu
2023-12-20 11:23 ` Joel Granados
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f29ac3f9-0ab8-48e5-addd-82592c55838c@intel.com \
--to=yi.l.liu@intel.com \
--cc=alex.williamson@redhat.com \
--cc=baolu.lu@linux.intel.com \
--cc=chao.p.peng@linux.intel.com \
--cc=cohuck@redhat.com \
--cc=eric.auger@redhat.com \
--cc=iommu@lists.linux.dev \
--cc=jasowang@redhat.com \
--cc=jgg@nvidia.com \
--cc=joao.m.martins@oracle.com \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=lulu@redhat.com \
--cc=mjrosato@linux.ibm.com \
--cc=nicolinc@nvidia.com \
--cc=peterx@redhat.com \
--cc=robin.murphy@arm.com \
--cc=shameerali.kolothum.thodi@huawei.com \
--cc=suravee.suthikulpanit@amd.com \
--cc=xin.zeng@intel.com \
--cc=yan.y.zhao@intel.com \
--cc=yi.y.sun@linux.intel.com \
--cc=zhenzhong.duan@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).