Linux IOMMU Development
* Re: [PATCH 2/2] iommu/amd: use handle_mm_fault directly v2
       [not found]   ` <1415830228-7844-2-git-send-email-jbarnes-Y1mF5jBUw70BENJcbMCuUQ@public.gmane.org>
@ 2015-01-25 13:16     ` Oded Gabbay
       [not found]       ` <54C4ECBC.5070301-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 2+ messages in thread
From: Oded Gabbay @ 2015-01-25 13:16 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: jroedel-l3A5Bk7waGM@public.gmane.org, Bridgman, John,
	Elifaz, Dana,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org



On 11/13/2014 12:10 AM, Jesse Barnes wrote:
> This could be useful for debug in the future if we want to track
> major/minor faults more closely, and also avoids the put_page trick we
> used with gup.
>
> In order to do this, we also track the task struct in the PASID state
> structure.  This lets us update the appropriate task stats after the
> fault has been handled, and may aid with debug in the future as well.
>
> v2: drop task accounting; GPU activity may have been submitted by a
>      different thread than the one binding the PASID (Joerg)
>
> Tested-by: Oded Gabbay<oded.gabbay-5C7GfCeVMHo@public.gmane.org>
> Signed-off-by: Jesse Barnes<jbarnes-Y1mF5jBUw70BENJcbMCuUQ@public.gmane.org>

Hi Jesse,

I know I tested your patch a few months ago, but we have a new feature (still
internal) in the driver, which conflicts with this patch.

Our feature basically does "exception handling" by registering a callback
function (inv_ppr_cb) with the iommu driver.

Now, with the old code (we used 3.17.2 until a few days ago), this callback 
function was called in, at least, three use-cases (which we are testing):

(1) Writing to a "bad" system memory address, which is *not* in the process's 
memory address space.

(2) Writing to a read-only page, which is inside the process's memory address space

(3) Reading from a page without permissions, which is inside the process's 
memory address space

With the new code (3.19-rc5), this callback is only called in the first
use-case, while (2) and (3) are handled in handle_mm_fault(), which is now
called from do_fault(). The return value of handle_mm_fault() is 0, so
handle_fault_error() is not called and amdkfd doesn't get a notification, hence
our test fails.

This is a problem for us as we want to propagate these exceptions to the
user-space HSA runtime, so that it can handle them.

I have 2 questions:

1. Why don't we call inv_ppr_cb() in any case ?
2. How come handle_mm_fault() returns 0 in cases (2) and (3) ? Or in other 
words, what is considered to be a success in handle_mm_fault() and is it visible 
to the user-space process ?

Thanks,

	Oded


* Re: [PATCH 2/2] iommu/amd: use handle_mm_fault directly v2
       [not found]       ` <54C4ECBC.5070301-5C7GfCeVMHo@public.gmane.org>
@ 2015-01-26 23:01         ` Jesse Barnes
  0 siblings, 0 replies; 2+ messages in thread
From: Jesse Barnes @ 2015-01-26 23:01 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: jroedel-l3A5Bk7waGM@public.gmane.org, Bridgman, John,
	Elifaz, Dana,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org

On Sun, 25 Jan 2015 15:16:44 +0200
Oded Gabbay <oded.gabbay-5C7GfCeVMHo@public.gmane.org> wrote:

> 
> 
> On 11/13/2014 12:10 AM, Jesse Barnes wrote:
> > This could be useful for debug in the future if we want to track
> > major/minor faults more closely, and also avoids the put_page trick we
> > used with gup.
> >
> > In order to do this, we also track the task struct in the PASID state
> > structure.  This lets us update the appropriate task stats after the
> > fault has been handled, and may aid with debug in the future as well.
> >
> > v2: drop task accounting; GPU activity may have been submitted by a
> >      different thread than the one binding the PASID (Joerg)
> >
> > Tested-by: Oded Gabbay<oded.gabbay-5C7GfCeVMHo@public.gmane.org>
> > Signed-off-by: Jesse Barnes<jbarnes-Y1mF5jBUw70BENJcbMCuUQ@public.gmane.org>
> 
> Hi Jesse,
> 
> I know I tested your patch a few months ago, but we have a new feature (still
> internal) in the driver, which conflicts with this patch.
> 
> Our feature basically does "exception handling" by registering a callback
> function (inv_ppr_cb) with the iommu driver.
> 
> Now, with the old code (we used 3.17.2 until a few days ago), this callback 
> function was called in, at least, three use-cases (which we are testing):
> 
> (1) Writing to a "bad" system memory address, which is *not* in the process's 
> memory address space.
> 
> (2) Writing to a read-only page, which is inside the process's memory address space
> 
> (3) Reading from a page without permissions, which is inside the process's 
> memory address space
> 
> With the new code (3.19-rc5), this callback is only called in the first
> use-case, while (2) and (3) are handled in handle_mm_fault(), which is now
> called from do_fault(). The return value of handle_mm_fault() is 0, so
> handle_fault_error() is not called and amdkfd doesn't get a notification, hence
> our test fails.
> 
> This is a problem for us as we want to propagate these exceptions to the
> user-space HSA runtime, so that it can handle them.
> 
> I have 2 questions:
> 
> 1. Why don't we call inv_ppr_cb() in any case ?

We do if we fail to find a vma for the address or it's in the wrong
location, but we could extend the do_fault() handling to do it in more cases.

> 2. How come handle_mm_fault() returns 0 in cases (2) and (3) ? Or in other 
> words, what is considered to be a success in handle_mm_fault() and is it visible 
> to the user-space process ?

handle_mm_fault() is a fairly low-level function.  We can catch
more cases in our own do_fault() code if we need to.  The x86
__do_page_fault() is probably a good reference.  I mainly tried to match
existing behavior when I added the handle_mm_fault() call, but may have
missed stuff.  As I said, we can extend our do_fault() to handle all
the cases we want prior to calling handle_mm_fault().

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center
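
The pre-check suggested in the reply above (catching more cases in do_fault()
before handle_mm_fault() is called) can be sketched roughly as follows. This
is a user-space model only, not kernel code and not the actual patch: every
name prefixed with model_ or MODEL_ is a hypothetical stand-in, and the real
driver's structures, flag values and locking are left out. It only illustrates
comparing the access the device requested against the permissions of the
mapping, so that cases (1), (2) and (3) from the report all reach the
invalid-request path.

```c
/* User-space model; all model_/MODEL_ names are hypothetical stand-ins. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the permission bits of a mapping. */
#define MODEL_VM_READ   0x1u
#define MODEL_VM_WRITE  0x2u
#define MODEL_VM_EXEC   0x4u

/* Stand-ins for the access type carried by a device page request. */
#define MODEL_FAULT_READ   0x1u
#define MODEL_FAULT_WRITE  0x2u
#define MODEL_FAULT_EXEC   0x4u

/* Minimal stand-in for a mapping: [start, end) plus permission bits. */
struct model_vma {
	unsigned long start;
	unsigned long end;
	unsigned int flags;
};

enum model_outcome {
	MODEL_REPORT_INVALID,	/* driver callback would be notified */
	MODEL_HANDLE_FAULT	/* fault would be passed on for servicing */
};

/* True if the request asks for an access the mapping does not permit. */
static bool model_access_error(const struct model_vma *vma,
			       unsigned int fault_flags)
{
	unsigned int requested = 0;

	if (fault_flags & MODEL_FAULT_READ)
		requested |= MODEL_VM_READ;
	if (fault_flags & MODEL_FAULT_WRITE)
		requested |= MODEL_VM_WRITE;
	if (fault_flags & MODEL_FAULT_EXEC)
		requested |= MODEL_VM_EXEC;

	return (requested & ~vma->flags) != 0;
}

/* Decide what to do with a fault before any low-level handling runs. */
static enum model_outcome model_do_fault(const struct model_vma *vma,
					 unsigned long addr,
					 unsigned int fault_flags)
{
	/* Case (1): no mapping, or the address is outside of it. */
	if (vma == NULL || addr < vma->start || addr >= vma->end)
		return MODEL_REPORT_INVALID;

	/* Cases (2) and (3): mapping exists but forbids this access. */
	if (model_access_error(vma, fault_flags))
		return MODEL_REPORT_INVALID;

	return MODEL_HANDLE_FAULT;
}

/* Usage: the three reported cases plus one permitted access. */
static void model_usage(void)
{
	struct model_vma ro = { 0x1000, 0x2000, MODEL_VM_READ };
	struct model_vma none = { 0x1000, 0x2000, 0 };

	assert(model_do_fault(NULL, 0x9000, MODEL_FAULT_WRITE) == MODEL_REPORT_INVALID);
	assert(model_do_fault(&ro, 0x1800, MODEL_FAULT_WRITE) == MODEL_REPORT_INVALID);
	assert(model_do_fault(&none, 0x1800, MODEL_FAULT_READ) == MODEL_REPORT_INVALID);
	assert(model_do_fault(&ro, 0x1800, MODEL_FAULT_READ) == MODEL_HANDLE_FAULT);
}
```

In this model the permission check runs before the fault is handed on, so a
zero return from the low-level handler no longer decides whether the callback
is notified. Whether the real do_fault() should be structured this way is for
the patch discussion to settle.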
