iommu.lists.linux-foundation.org archive mirror
From: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
To: Alex Williamson
	<alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Lan Tianyu <tianyu.lan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	"Tian,
	Kevin" <kevin.tian-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	Jean Delvare <khali-PUYAD+kWke1g9hUCZPvPmw@public.gmane.org>,
	David Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Subject: Re: [RFC 5/9] iommu: Introduce fault notifier API
Date: Mon, 26 Jun 2017 08:27:52 -0700
Message-ID: <20170626082752.464c278d@jacob-builder>
In-Reply-To: <20170623131551.6aeb9af7-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>

On Fri, 23 Jun 2017 13:15:51 -0600
Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Fri, 23 Jun 2017 11:59:28 -0700
> Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org> wrote:
> 
> > On Thu, 22 Jun 2017 16:53:17 -0600
> > Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> >   
> > > On Wed, 14 Jun 2017 15:22:59 -0700
> > > Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org> wrote:
> > >     
> > > > Traditionally, device specific faults are detected and handled
> > > > within their own device drivers. When IOMMU is enabled, faults
> > > > such as DMA related transactions are detected by IOMMU. There
> > > > is no generic reporting mechanism to report faults back to the
> > > > in-kernel device driver or the guest OS in case of assigned
> > > > devices.
> > > > 
> > > > Faults detected by the IOMMU are based on the transaction's
> > > > source ID, which can be reported on a per-device basis,
> > > > regardless of whether the device is a PCI device or not.
> > > > 
> > > > The fault types include recoverable (e.g. page request) and
> > > > unrecoverable faults (e.g. invalid context). In most cases,
> > > > faults can be handled by IOMMU drivers. However, there are use
> > > > cases that require fault processing outside the IOMMU driver, e.g.
> > > > 
> > > > 1. page request fault originated from an SVM capable device
> > > > that is assigned to guest via vIOMMU. In this case, the first
> > > > level page tables are owned by the guest. Page request must be
> > > > propagated to the guest to let guest OS fault in the pages then
> > > > send page response. In this mechanism, the direct receiver of
> > > > IOMMU fault notification is VFIO, which can relay notification
> > > > events to QEMU or other user space software.
> > > > 
> > > > 2. faults that need more subtle handling by device drivers.
> > > > Rather than simply invoking a reset function, the device
> > > > driver may need to handle the fault with a smaller impact.
> > > > 
> > > > This patchset is intended to create a generic fault notification
> > > > API such that it can scale as follows:
> > > > - all IOMMU types
> > > > - PCI and non-PCI devices
> > > > - recoverable and unrecoverable faults
> > > > - VFIO and other in-kernel users
> > > > - DMA & IRQ remapping (TBD)
> > > > 
> > > > The event data contains both generic and raw architectural data
> > > > such that performance is not compromised as the data
> > > > propagation may involve many layers.
> > > > 
> > > > Signed-off-by: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > > > Signed-off-by: Ashok Raj <ashok.raj-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > > > ---
> > > >  drivers/iommu/iommu.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  include/linux/iommu.h | 54 +++++++++++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 117 insertions(+)
> > > > 
> > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > > index c786370..04c73f3 100644
> > > > --- a/drivers/iommu/iommu.c
> > > > +++ b/drivers/iommu/iommu.c
> > > > @@ -48,6 +48,7 @@ struct iommu_group {
> > > >  	struct list_head devices;
> > > >  	struct mutex mutex;
> > > >  	struct blocking_notifier_head notifier;
> > > > +	struct blocking_notifier_head fault_notifier;
> > > >  	void *iommu_data;
> > > >  	void (*iommu_data_release)(void *iommu_data);
> > > >  	char *name;
> > > > @@ -345,6 +346,7 @@ struct iommu_group *iommu_group_alloc(void)
> > > >  	mutex_init(&group->mutex);
> > > >  	INIT_LIST_HEAD(&group->devices);
> > > >  	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
> > > > +	BLOCKING_INIT_NOTIFIER_HEAD(&group->fault_notifier);
> > > >  
> > > >  	ret = ida_simple_get(&iommu_group_ida, 0, 0, GFP_KERNEL);
> > > >  	if (ret < 0) {
> > > > @@ -790,6 +792,67 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
> > > >  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
> > > >  
> > > >  /**
> > > > + * iommu_register_fault_notifier - Register a notifier for fault reporting
> > > > + * @dev: device to notify fault events
> > > > + * @nb: notifier block to signal
> > > > + *
> > > > + */
> > > > +int iommu_register_fault_notifier(struct device *dev,
> > > > +				struct notifier_block *nb)
> > > > +{
> > > > +	int ret;
> > > > +	struct iommu_group *group = iommu_group_get(dev);
> > > > +
> > > > +	if (!group)
> > > > +		return -EINVAL;
> > > > +
> > > > +	ret = blocking_notifier_chain_register(&group->fault_notifier, nb);
> > > > +	iommu_group_put(group);
> > > > +
> > > > +	return ret;
> > > > +
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(iommu_register_fault_notifier);
> > > > +
> > > > +/**
> > > > + * iommu_unregister_fault_notifier - Unregister a notifier for fault reporting
> > > > + * @dev: device whose fault notifier is being removed
> > > > + * @nb: notifier block to signal
> > > > + *
> > > > + */
> > > > +int iommu_unregister_fault_notifier(struct device *dev,
> > > > +				  struct notifier_block *nb)
> > > > +{
> > > > +	int ret;
> > > > +	struct iommu_group *group = iommu_group_get(dev);
> > > > +
> > > > +	if (!group)
> > > > +		return -EINVAL;
> > > > +
> > > > +	ret = blocking_notifier_chain_unregister(&group->fault_notifier, nb);
> > > > +	iommu_group_put(group);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(iommu_unregister_fault_notifier);      
> > > 
> > > 
> > > If the call chains are on the group, why do we register with a
> > > device? If I registered with a group, I'd know that I only need
> > > to register once per group, that's not clear here and may lead to
> > > callers of this getting multiple notifications, one for each
> > > device in a group. 
> > My thought was that since iommu_group is already part of struct
> > device, it is more efficient to share group level notifier than
> > introducing per device fault callbacks or notifiers.  
> 
> That's fine, but the interface should reflect the same.
>  
> > So you do need to register per device not just once for the group.
> > In most cases device with SVM has ACS, it will be 1:1 between
> > device and group.  
> 
> ACS at the endpoint is only one factor in determining grouping, we
> need only look at Intel Core processors to find root ports lacking
> ACS which will result in multiple devices per group as a very common
> case. 
> > When the fault event occurs, IOMMU can identify the offending device
> > and struct device is embedded in the event data sent to the device
> > driver or VFIO, so the receiver can filter out unwanted events.  
> 
> I think this is a poor design choice.  If the notification list is on
> the group then make the registration be on the group.  Otherwise the
> number of notifies seen by the caller is automatically multiplied by
> the number of devices they're using in the group.  The caller not only
> needs to disregard unintended devices, but they need to make sure they
> only act on a notification for the device on the correct notifier.
> This is designing in unnecessary complexity and overhead.
> 
If the design were for high-frequency events, I would totally agree with
you that having the notifier on the group does not scale. But since this
is for fault notification, which implies infrequent use, is it still
worthwhile to tax struct device with a per-device notifier?
Even with recoverable faults, i.e. page requests, a properly configured
use case would pre-fault pages to minimize PRQs.

I was referring to this earlier discussion by David about the need for a
per-device fault handler (https://lwn.net/Articles/608914/). Perhaps I
misinterpreted your suggestion for an alternative solution.
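
To make the trade-off under discussion concrete, here is a minimal
userspace sketch of what happens when callers register per device but the
listeners land on one group-wide chain, as in the patch. Everything named
mock_* is hypothetical and exists only for this illustration; it is not
the kernel notifier API and not code from this series. It assumes one
listener per device and a single fault raised by the first device.

```c
#include <stddef.h>

/* Userspace model only: none of these types are the kernel's. */
#define MOCK_MAX_LISTENERS 8

struct mock_device { int id; };

struct mock_fault_event { struct mock_device *dev; };

struct mock_listener {
	struct mock_device *wanted;	/* device this registration cares about */
	int seen;			/* callbacks received */
	int handled;			/* callbacks that matched ->wanted */
};

struct mock_group {
	struct mock_listener *chain[MOCK_MAX_LISTENERS];
	size_t count;
};

/* The caller thinks per device, but the listener joins the group-wide chain. */
static int mock_register(struct mock_group *grp, struct mock_listener *l)
{
	if (grp->count >= MOCK_MAX_LISTENERS)
		return -1;
	grp->chain[grp->count++] = l;
	return 0;
}

/* Every listener on the chain is called; each must filter on the event's device. */
static void mock_call_chain(struct mock_group *grp, struct mock_fault_event *evt)
{
	for (size_t i = 0; i < grp->count; i++) {
		struct mock_listener *l = grp->chain[i];

		l->seen++;
		if (l->wanted == evt->dev)
			l->handled++;
	}
}

/*
 * Put ndev devices in one group, register one listener per device, raise a
 * single fault on device 0. Returns the total number of callbacks invoked
 * and stores how many of them were for the faulting device.
 */
static int mock_callbacks_per_fault(int ndev, int *handled_out)
{
	struct mock_device devs[MOCK_MAX_LISTENERS];
	struct mock_listener ls[MOCK_MAX_LISTENERS];
	struct mock_group grp = { .count = 0 };
	struct mock_fault_event evt;
	int total = 0, handled = 0;

	if (ndev < 1 || ndev > MOCK_MAX_LISTENERS)
		return -1;

	for (int i = 0; i < ndev; i++) {
		devs[i].id = i;
		ls[i] = (struct mock_listener){ .wanted = &devs[i] };
		mock_register(&grp, &ls[i]);
	}

	evt.dev = &devs[0];
	mock_call_chain(&grp, &evt);

	for (int i = 0; i < ndev; i++) {
		total += ls[i].seen;
		handled += ls[i].handled;
	}
	if (handled_out)
		*handled_out = handled;
	return total;
}
```

In this model, with three devices in one group, one fault produces three
callbacks, of which only one belongs to the faulting device; the other two
have to be filtered out by the receivers. With a 1:1 device-to-group
mapping the two counts are equal.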

Thanks,

Jacob

> > > > +
> > > > +int iommu_fault_notifier_call_chain(struct iommu_fault_event *event)
> > > > +{
> > > > +	int ret;
> > > > +	struct iommu_group *group = iommu_group_get(event->dev);
> > > > +
> > > > +	if (!group)
> > > > +		return -EINVAL;
> > > > +	/* caller provide generic data related to the event, TBD */
> > > > +	ret = (blocking_notifier_call_chain(&group->fault_notifier, 0, (void *)event)
> > > > +		== NOTIFY_BAD) ? -EINVAL : 0;
> > > > +	iommu_group_put(group);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +EXPORT_SYMBOL(iommu_fault_notifier_call_chain);
> > > > +
> > > > +/**
> > > >   * iommu_group_id - Return ID for a group
> > > >   * @group: the group to ID
> > > >   *
> > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > > index 2cdbaa3..fe89e88 100644
> > > > --- a/include/linux/iommu.h
> > > > +++ b/include/linux/iommu.h
> > > > @@ -42,6 +42,7 @@
> > > >   * if the IOMMU page table format is equivalent.
> > > >   */
> > > >  #define IOMMU_PRIV	(1 << 5)
> > > > +#define IOMMU_EXEC	(1 << 6)      
> > > 
> > > Irrelevant change?
> > >     
> > yes, it should be separate. used later in svm patch.  
> > > >  
> > > >  struct iommu_ops;
> > > >  struct iommu_group;
> > > > @@ -97,6 +98,36 @@ struct iommu_domain {
> > > >  	void *iova_cookie;
> > > >  };
> > > >  
> > > > +/*
> > > > + * Generic fault event notification data, used by all IOMMU architectures.
> > > > + *
> > > > + * - PCI and non-PCI devices
> > > > + * - Recoverable faults (e.g. page request) & un-recoverable faults
> > > > + * - DMA remapping and IRQ remapping faults
> > > > + *
> > > > + * @dev The device which faults are reported by IOMMU
> > > > + * @addr tells the offending address
> > > > + * @pasid contains process address space ID, used in shared virtual memory (SVM)
> > > > + * @prot page access protection flag, e.g. IOMMU_READ, IOMMU_WRITE
> > > > + * @flags contains fault type, etc.
> > > > + * @length tells the size of the buf
> > > > + * @buf contains any raw or arch specific data
> > > > + *
> > > > + */
> > > > +struct iommu_fault_event {
> > > > +	struct device *dev;
> > > > +	__u64 addr;
> > > > +	__u32 pasid;
> > > > +	__u32 prot;
> > > > +	__u32 flags;
> > > > +#define IOMMU_FAULT_PAGE_REQ	BIT(0)
> > > > +#define IOMMU_FAULT_UNRECOV	BIT(1)
> > > > +#define IOMMU_FAULT_IRQ_REMAP	BIT(2)
> > > > +#define IOMMU_FAULT_INVAL	BIT(3)      
> > > 
> > > Details of each of these are defined where?
> > >     
> > good point, will add description.  
> > > > +	__u32 length;
> > > > +	__u8  buf[];
> > > > +};      
> > > 
> > > This is not UAPI, so I'll be curious how vfio is supposed to
> > > expose this to userspace.
> > >     
> > But this also has other in-kernel users.  
> 
> 
> Any point where you expect vfio to pass through one of these model
> specific data structures needs to be fully specified in a uapi header.
> No magic opaque data please.  Thanks,
> 
> Alex

[Jacob Pan]
