From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jacob Pan Subject: Re: [PATCH v4 14/22] iommu: handle page response timeout Date: Mon, 30 Apr 2018 10:54:24 -0700 Message-ID: <20180430105424.3f12f792@jacob-builder> References: <1523915351-54415-1-git-send-email-jacob.jun.pan@linux.intel.com> <1523915351-54415-15-git-send-email-jacob.jun.pan@linux.intel.com> <20180423153622.GC38106@ostrya.localdomain> <20180425083711.222202e7@jacob-builder> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: iommu-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org Errors-To: iommu-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org To: Jean-Philippe Brucker Cc: Raj Ashok , Greg Kroah-Hartman , Rafael Wysocki , "iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org" , LKML , David Woodhouse List-Id: iommu@lists.linux-foundation.org On Mon, 30 Apr 2018 11:58:10 +0100 Jean-Philippe Brucker wrote: > On 25/04/18 16:37, Jacob Pan wrote: > >> In the other cases (unsupported PRI or rogue guest) then disabling > >> PRI using a FAILURE status might be the right thing to do. However, > >> assuming the device follows the PCI spec it will stop sending page > >> requests once there are as many PPRs in flight as the allocated > >> credit. > >> > > Agreed, here I am not taking any actions. There may be need to drain > > in-fly requests. > > Right, as long as we first ensure that no new fault is generated (by > using a Response Failure). Though in my opinion not taking action > might be the safest option :) > > Another thought: currently the comment in iommu.h says > "@IOMMU_FAULT_STATUS_FAILURE: General error. Drop all subsequent > faults from this device if possible. This is "Response Failure" in > PCI PRI." > > I wonder if we should simply say "Drop all subsequent faults from the > device". Even if the PCI device doesn't properly implement PRI, the > IOMMU driver should set a "PRI disabled" bit in the device data that > prevents it from from reporting new faults and flooding the queue. > Anyway, it's a small detail that could go in a future patch series. > right, we should disable PRI and let future PRQ response pending on re-enabling of PRI on the device. I will leave that to future enhancement. > >> If there isn't any possibility of memory leak or abusing > >> resources, I don't think it's our problem that the guest is > >> excessively slow at handling page requests. Setting an upper bound > >> to page request latency might do more harm than good. Ensuring > >> that devices respect the number of allocated in-flight PPRs is > >> more important in my opinion. > > How about we have a really long timeout, e.g. 1 min similar to > > device invalidate response timeout in ATS spec., just for basic > > safety and diagnosis. Optionally, we could have quota in parallel. > > I agree that for development a timeout is useful. It might be worth > adding it as an option to the IOMMU module instead of a define. > Perhaps a number of seconds, 10 being the default and 0 disabling the > timeout? Otherwise we would probably end up with a succession of > patches incrementing the timeout by arbitrary values, if people find > it inconvenient. > make sense. will do.