From: Jacob Pan <jacob.jun.pan@linux.intel.com>
To: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org,
LKML <linux-kernel@vger.kernel.org>,
Joerg Roedel <joro@8bytes.org>,
David Woodhouse <dwmw2@infradead.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Alex Williamson <alex.williamson@redhat.com>,
Jean-Philippe Brucker <jean-philippe.brucker@arm.com>,
Rafael Wysocki <rafael.j.wysocki@intel.com>,
"Liu, Yi L" <yi.l.liu@intel.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
Raj Ashok <ashok.raj@intel.com>,
Jean Delvare <khali@linux-fr.org>,
Christoph Hellwig <hch@infradead.org>,
jacob.jun.pan@linux.intel.com
Subject: Re: [PATCH v5 15/23] iommu: handle page response timeout
Date: Tue, 29 May 2018 09:20:58 -0700
Message-ID: <20180529092058.1942b223@jacob-builder>
In-Reply-To: <5AF93E3A.2040902@linux.intel.com>
On Mon, 14 May 2018 15:43:54 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:
> Hi,
>
> On 05/12/2018 04:54 AM, Jacob Pan wrote:
> > When IO page faults are reported outside IOMMU subsystem, the page
> > request handler may fail for various reasons. E.g. a guest received
> > page requests but did not have a chance to run for a long time. The
> > irresponsive behavior could hold off limited resources on the
> > pending device.
> > There can be hardware or credit based software solutions as
> > suggested in the PCI ATS Ch-4. To provide a basic safety net this
> > patch introduces a per device deferrable timer which monitors the
> > longest pending page fault that requires a response. Proper action
> > such as sending failure response code could be taken when timer
> > expires but not included in this patch. We need to consider the
> > life cycle of page group ID to prevent confusion with reused group
> > ID by a device. For now, a warning message provides clue of such
> > failure.
> >
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> > ---
> >  drivers/iommu/iommu.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/iommu.h |  4 ++++
> >  2 files changed, 57 insertions(+)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 02fed3e..1f2f49e 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -827,6 +827,37 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
> >  }
> > EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
> >
> > +static void iommu_dev_fault_timer_fn(struct timer_list *t)
> > +{
> > +	struct iommu_fault_param *fparam = from_timer(fparam, t, timer);
> > +	struct iommu_fault_event *evt;
> > +
> > +	u64 now;
> > +
> > +	now = get_jiffies_64();
> > +
> > +	/* The goal is to ensure driver or guest page fault handler(via vfio)
> > +	 * send page response on time. Otherwise, limited queue resources
> > +	 * may be occupied by some irresponsive guests or drivers.
> > +	 * When per device pending fault list is not empty, we periodically checks
> > +	 * if any anticipated page response time has expired.
> > +	 *
> > +	 * TODO:
> > +	 * We could do the following if response time expires:
> > +	 * 1. send page response code FAILURE to all pending PRQ
> > +	 * 2. inform device driver or vfio
> > +	 * 3. drain in-flight page requests and responses for this device
> > +	 * 4. clear pending fault list such that driver can unregister fault
> > +	 *    handler(otherwise blocked when pending faults are present).
> > +	 */
> > +	list_for_each_entry(evt, &fparam->faults, list) {
> > +		if (time_after64(now, evt->expire))
> > +			pr_err("Page response time expired!, pasid %d gid %d exp %llu now %llu\n",
> > +				evt->pasid, evt->page_req_group_id, evt->expire, now);
> > +	}
> > +	mod_timer(t, now + prq_timeout);
> > +}
> > +
>
> This timer scheme is very rough.
>
Yes, the timer is a rough safety net for misbehaving PRQ handlers, such
as a guest.
> The timer expires every 10 seconds (by default).
>
> 0               10              20              30              40
> +---------------+---------------+---------------+---------------+
> ^  ^  ^  ^                      ^
> |  |  |  |                      |
> F0 F1 F2 F3                     (F1,F2,F3 will not be handled until here!)
>
> F0, F1, F2, F3 are four page faults that happen during the [0, 10s) time
> window. The F1, F2, F3 timeouts won't be handled until the timer expires
> again at 20s. That means a fault might be pending there until about
> (2 * prq_timeout) seconds later.
>
Correct. It could be 2x in the worst case. I should explain that in the
comments.
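To make the 2x bound concrete, here is a minimal userspace sketch (not
kernel code). It assumes the timer fires at exact multiples of the
timeout and uses the same strict "now is after expire" test as
time_after64() in the patch. The function names are hypothetical, the
time units are arbitrary, and a deferrable timer may in practice fire
later than modeled here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: the periodic timer fires at 0, T, 2T, ... where
 * T == timeout. A fault reported at fault_time gets
 * expire = fault_time + timeout and is flagged at the first timer tick
 * that is strictly later than expire.
 */
static uint64_t first_detect_tick(uint64_t fault_time, uint64_t timeout)
{
	uint64_t expire;

	assert(timeout > 0);
	expire = fault_time + timeout;

	/* Smallest multiple of timeout strictly greater than expire. */
	return (expire / timeout + 1) * timeout;
}

/* Time between the fault being reported and the timeout being noticed. */
static uint64_t detect_latency(uint64_t fault_time, uint64_t timeout)
{
	return first_detect_tick(fault_time, timeout) - fault_time;
}
```

With timeout = 10, a fault reported at t = 1 is first flagged at the
t = 20 tick, i.e. 19 time units after it was reported. In this model the
latency never exceeds 2 * timeout.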
> Out of curiosity, why not add a timer in iommu_fault_event,
> start it in iommu_report_device_fault() and remove it in
> iommu_page_response()?
>
I thought about that as well, but since we are just trying to provide a
broad, rough safety net (in addition to a potential HW mechanism or
credit-based solution), my thought was that a per-device timer is more
economical than a per-event one.
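A rough way to state the trade-off, as a userspace sketch rather than
anything from the patch: the struct and function names below are
hypothetical, and the per-event numbers assume each one-shot timer fires
exactly at its expiry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what each scheme costs and guarantees. */
struct scheme_cost {
	size_t timers;          /* timers armed at the same time */
	uint64_t worst_latency; /* bound on fault-to-detection time */
};

/* Per-device: one periodic timer scans the whole pending fault list. */
static struct scheme_cost per_device_cost(size_t pending, uint64_t timeout)
{
	struct scheme_cost c;

	assert(timeout <= UINT64_MAX / 2); /* keep 2 * timeout from wrapping */
	c.timers = pending ? 1 : 0;
	c.worst_latency = 2 * timeout;
	return c;
}

/* Per-event: every pending fault carries its own one-shot timer. */
static struct scheme_cost per_event_cost(size_t pending, uint64_t timeout)
{
	struct scheme_cost c;

	c.timers = pending;
	c.worst_latency = timeout;
	return c;
}
```

For four pending faults and a timeout of 10, the per-device model arms
one timer with a bound of 20, while the per-event model arms four timers
with a bound of 10.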
Thanks for the in-depth check!
> Best regards,
> Lu Baolu
>
>
> [...]
>
[Jacob Pan]