kvm.vger.kernel.org archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: Nicolin Chen <nicolinc@nvidia.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"mjrosato@linux.ibm.com" <mjrosato@linux.ibm.com>,
	"chao.p.peng@linux.intel.com" <chao.p.peng@linux.intel.com>,
	"yi.y.sun@linux.intel.com" <yi.y.sun@linux.intel.com>,
	"peterx@redhat.com" <peterx@redhat.com>,
	"jasowang@redhat.com" <jasowang@redhat.com>,
	"shameerali.kolothum.thodi@huawei.com"
	<shameerali.kolothum.thodi@huawei.com>,
	"lulu@redhat.com" <lulu@redhat.com>,
	"suravee.suthikulpanit@amd.com" <suravee.suthikulpanit@amd.com>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-kselftest@vger.kernel.org"
	<linux-kselftest@vger.kernel.org>,
	"Duan, Zhenzhong" <zhenzhong.duan@intel.com>,
	"joao.m.martins@oracle.com" <joao.m.martins@oracle.com>,
	"Zeng, Xin" <xin.zeng@intel.com>,
	"Zhao, Yan Y" <yan.y.zhao@intel.com>
Subject: Re: [PATCH v6 2/6] iommufd: Add IOMMU_HWPT_INVALIDATE
Date: Mon, 4 Dec 2023 10:48:50 -0400
Message-ID: <20231204144850.GC1493156@nvidia.com>
In-Reply-To: <ZWpaTD9dVge+suyv@Asurada-Nvidia>

On Fri, Dec 01, 2023 at 02:12:28PM -0800, Nicolin Chen wrote:
> > Why is timeout linked to these two? Or rather, it doesn't have to be
> > linked like that. Any gerror is effectively synchronous because it
> > halts the queue and allows SW time to inspect which command failed and
> > record the gerror flags. So each and every command can get an error
> > indication.
> > 
> > Restarting the queue is done by putting sync in there to effectively
> > nop the failed command and we hope for the best and let it rip.
> 
> I see that the SMMU driver only restarts the queue when dealing with
> CERROR_ILL. So only CERROR_ABT or CERROR_ATC_INV would result in
> -ETIMEDOUT.

I'm not sure that is the best thing to do. ABT basically means the
machine caught fire, so sure, there is no recovery from that.

But ATC_INV could be recovered from, and should ideally be canceled and
then forwarded to the VM.
 
> > > As you remarked, we can't block the global CMDQ, so we have
> > > to let a real CERROR_ILL go. Yet, we can make sure commands are
> > > fully sanitized before being issued, as we should immediately
> > > reject faulty commands anyway, for errors such as unsupported
> > > opcodes, non-zero reserved fields, and unlinked vSIDs. This can
> > > at least largely reduce the probability of a real CERROR_ILL.
> > 
> > I'm a little more concerned with ATC_INV, as a malfunctioning
> > device can trigger this.
> 
> How about making sure that the invalidate handler always issues
> one CMD_ATC_INV at a time, so each arm_smmu_cmdq_issue_cmdlist()
> call has a chance to time out? Then, we can simply know which one
> in the user array failed.

That sounds slow.

> > > So, combining these two, we can still have a basic synchronous
> > > way by returning an errno to the invalidate ioctl? I see Kevin
> > > replied something similar too.
> > 
> > It isn't enough information: with just a simple errno you don't
> > know which gerror bits to set, and you don't know what cons index to
> > report to indicate the command that triggered the error.
> >
> > It does need to return a bunch of data to get it all right.
> 
> The array structure returns req_num to indicate the index. This
> works, even if the command consumption stops in the middle:
>  * @req_num: Input the number of cache invalidation requests in the array.
>  *           Output the number of requests successfully handled by kernel.
> 
> So we only need an error code of CERROR_ABT/ILL/ATC_INV.

Yes
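
A minimal sketch of the req_num convention quoted above, assuming a
simplified request layout. The names inv_req, issue_one(),
invalidate_array() and the CERR_* values are hypothetical illustrations,
not the uAPI proposed in this series or the SMMU driver's code. The loop
stops at the first failing request, writes back how many requests were
handled, and reports a single error code for the one that failed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes, loosely modelled on the CERROR names above. */
enum { CERR_NONE = 0, CERR_ILL = 1, CERR_ABT = 2, CERR_ATC_INV = 3 };

/* Hypothetical, simplified invalidation request. */
struct inv_req {
	uint32_t opcode;
	uint32_t reserved;	/* must be zero */
};

/* Stand-in for issuing one command; a real driver would talk to hardware. */
static int issue_one(const struct inv_req *req)
{
	if (req->opcode == 0 || req->reserved != 0)
		return CERR_ILL;
	return CERR_NONE;
}

/*
 * @req_num: in:  number of requests in the array
 *           out: number of requests handled before the first failure
 * @hw_error: out: CERR_* code of the failing request, or CERR_NONE
 */
static int invalidate_array(const struct inv_req *reqs, uint32_t *req_num,
			    uint32_t *hw_error)
{
	uint32_t n, i;

	assert(reqs != NULL && req_num != NULL && hw_error != NULL);

	n = *req_num;
	*hw_error = CERR_NONE;
	for (i = 0; i < n; i++) {
		int err = issue_one(&reqs[i]);

		if (err != CERR_NONE) {
			*hw_error = (uint32_t)err;
			break;
		}
	}
	*req_num = i;	/* index of the failing request, or n on success */
	return *hw_error == CERR_NONE ? 0 : -1;
}
```

With this shape, a caller that passes three requests and gets back
req_num == 1 knows that the second entry is the one that failed.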

> Or am I missing some point here?

It sounds OK, we just have to understand what userspace should be
doing and how much of this the kernel should implement.

It seems to me that the error code should return the gerror and the
req_num should indicate the halted cons. The VMM should relay both
into the virtual registers.
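
A rough sketch of what that relay might look like on the VMM side, under
stated assumptions. The struct layouts, vcmdq_relay() and
VGERROR_CMDQ_ERR are hypothetical names for illustration, the consumer
index is a plain modulo counter without a wrap bit, and the error bit is
simply set rather than following any real register protocol:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical error codes, loosely modelled on the CERROR names above. */
enum vcmdq_cerror {
	VCERR_NONE = 0,
	VCERR_ILL = 1,
	VCERR_ABT = 2,
	VCERR_ATC_INV = 3,
};

/* Hypothetical result of one invalidate call (not the real uAPI). */
struct inv_result {
	uint32_t req_num;	/* requests handled before the queue halted */
	uint32_t cerror;	/* VCERR_* of the failing request, if any */
};

/* Hypothetical virtual command queue state kept by the VMM. */
struct vcmdq {
	uint32_t cons;		/* virtual consumer index */
	uint32_t cons_err;	/* error code reported alongside cons */
	uint32_t gerror;	/* virtual global error bits */
	uint32_t nents;		/* number of entries in the queue */
};

#define VGERROR_CMDQ_ERR (1u << 0)	/* assumed bit position */

/* Relay the outcome of one invalidate call into the virtual registers. */
static void vcmdq_relay(struct vcmdq *q, const struct inv_result *res)
{
	assert(q->nents > 0);

	/* Advance cons past the handled commands; it now points at the
	 * failing command, or past the batch if everything succeeded. */
	q->cons = (q->cons + res->req_num) % q->nents;

	if (res->cerror != VCERR_NONE) {
		q->cons_err = res->cerror;
		q->gerror |= VGERROR_CMDQ_ERR;
	}
}
```

The point of carrying both values is that the guest driver can then find
the halted command from cons and the failure reason from the error field.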

Jason
