[PATCH 02/15] mm/hmm: fix header file if/else/endif maze v2

stable.vger.kernel.org archive mirror

* [PATCH 02/15] mm/hmm: fix header file if/else/endif maze v2
       [not found] <20180320020038.3360-1-jglisse@redhat.com>
@ 2018-03-20  2:00 ` jglisse
  2018-03-20  2:00 ` [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse
  2018-03-20  2:00 ` [PATCH 05/15] mm/hmm: hmm_pfns_bad() was accessing wrong struct jglisse
  2 siblings, 0 replies; 11+ messages in thread
From: jglisse @ 2018-03-20  2:00 UTC
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse, stable,
	Ralph Campbell, John Hubbard, Evgeny Baskakov

From: Jérôme Glisse <jglisse@redhat.com>

The #if/#else/#endif nesting for IS_ENABLED(CONFIG_HMM) was wrong. Because
of this, including the header more than once produced multiple definitions
of both hmm_mm_init() and hmm_mm_destroy(), leading to a build failure when
HMM was enabled (CONFIG_HMM set).

Changed since v1:
  - Fix the maze when CONFIG_HMM is disabled, not just when it is
    enabled. This fixes the bot build failure.
  - Improved commit message.
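
For illustration only, the layout this patch aims for can be reduced to a
single #if/#else/#endif pair sitting entirely inside the include guard. The
sketch below is standalone userspace C, not the kernel header: DEMO_CONFIG_HMM
is an assumed stand-in for IS_ENABLED(CONFIG_HMM), DEMO_LINUX_HMM_H for the
real guard, and struct mm_struct is cut down to one field. In the kernel
hmm_mm_destroy() is only declared in the header; it gets a placeholder body
here so the example links on its own.

```c
#include <stddef.h>

/* Assumption: stand-in for IS_ENABLED(CONFIG_HMM); build with 0 to get the stubs. */
#ifndef DEMO_CONFIG_HMM
#define DEMO_CONFIG_HMM 1
#endif

#ifndef DEMO_LINUX_HMM_H
#define DEMO_LINUX_HMM_H

/* Cut-down stand-in for the kernel's struct mm_struct. */
struct mm_struct {
	void *hmm;
};

#if DEMO_CONFIG_HMM
/* Real versions: defined exactly once when HMM is enabled. */
static inline void hmm_mm_init(struct mm_struct *mm)
{
	mm->hmm = NULL;
}

static inline void hmm_mm_destroy(struct mm_struct *mm)
{
	mm->hmm = NULL; /* placeholder body, for the demo only */
}
#else /* DEMO_CONFIG_HMM */
/* Stubs: defined exactly once when HMM is disabled. */
static inline void hmm_mm_destroy(struct mm_struct *mm) { (void)mm; }
static inline void hmm_mm_init(struct mm_struct *mm) { (void)mm; }
#endif /* DEMO_CONFIG_HMM */

#endif /* DEMO_LINUX_HMM_H */
```

Because every #else and #endif pairs with the #if directly above it, a second
inclusion is skipped as a whole by the guard and neither branch can be seen
twice.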

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
---
 include/linux/hmm.h | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 325017ad9311..36dd21fe5caf 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -498,23 +498,16 @@ struct hmm_device {
 struct hmm_device *hmm_device_new(void *drvdata);
 void hmm_device_put(struct hmm_device *hmm_device);
 #endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
-#endif /* IS_ENABLED(CONFIG_HMM) */
 
 /* Below are for HMM internal use only! Not to be used by device driver! */
-#if IS_ENABLED(CONFIG_HMM_MIRROR)
 void hmm_mm_destroy(struct mm_struct *mm);
 
 static inline void hmm_mm_init(struct mm_struct *mm)
 {
 	mm->hmm = NULL;
 }
-#else /* IS_ENABLED(CONFIG_HMM_MIRROR) */
-static inline void hmm_mm_destroy(struct mm_struct *mm) {}
-static inline void hmm_mm_init(struct mm_struct *mm) {}
-#endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */
-
-
 #else /* IS_ENABLED(CONFIG_HMM) */
 static inline void hmm_mm_destroy(struct mm_struct *mm) {}
 static inline void hmm_mm_init(struct mm_struct *mm) {}
+#endif /* IS_ENABLED(CONFIG_HMM) */
 #endif /* LINUX_HMM_H */
-- 
2.14.3

* [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2
       [not found] <20180320020038.3360-1-jglisse@redhat.com>
  2018-03-20  2:00 ` [PATCH 02/15] mm/hmm: fix header file if/else/endif maze v2 jglisse
@ 2018-03-20  2:00 ` jglisse
  2018-03-21  4:14   ` John Hubbard
  2018-03-20  2:00 ` [PATCH 05/15] mm/hmm: hmm_pfns_bad() was accessing wrong struct jglisse
  2 siblings, 1 reply; 11+ messages in thread
From: jglisse @ 2018-03-20  2:00 UTC
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Ralph Campbell,
	Jérôme Glisse, stable, Evgeny Baskakov, Mark Hairgrove,
	John Hubbard

From: Ralph Campbell <rcampbell@nvidia.com>

The hmm_mirror_register() function registers a callback for when
the CPU pagetable is modified. Normally, the device driver will
call hmm_mirror_unregister() when the process using the device is
finished. However, if the process exits uncleanly, the mm_struct
can be destroyed with no warning to the device driver.

Changed since v1:
  - dropped VM_BUG_ON()
  - cc stable

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/hmm.h | 10 ++++++++++
 mm/hmm.c            | 18 +++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 36dd21fe5caf..fa7b51f65905 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -218,6 +218,16 @@ enum hmm_update_type {
  * @update: callback to update range on a device
  */
 struct hmm_mirror_ops {
+	/* release() - release hmm_mirror
+	 *
+	 * @mirror: pointer to struct hmm_mirror
+	 *
+	 * This is called when the mm_struct is being released.
+	 * The callback should make sure no references to the mirror occur
+	 * after the callback returns.
+	 */
+	void (*release)(struct hmm_mirror *mirror);
+
 	/* sync_cpu_device_pagetables() - synchronize page tables
 	 *
 	 * @mirror: pointer to struct hmm_mirror
diff --git a/mm/hmm.c b/mm/hmm.c
index 320545b98ff5..6088fa6ed137 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -160,6 +160,21 @@ static void hmm_invalidate_range(struct hmm *hmm,
 	up_read(&hmm->mirrors_sem);
 }
 
+static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct hmm *hmm = mm->hmm;
+	struct hmm_mirror *mirror;
+	struct hmm_mirror *mirror_next;
+
+	down_write(&hmm->mirrors_sem);
+	list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) {
+		list_del_init(&mirror->list);
+		if (mirror->ops->release)
+			mirror->ops->release(mirror);
+	}
+	up_write(&hmm->mirrors_sem);
+}
+
 static void hmm_invalidate_range_start(struct mmu_notifier *mn,
 				       struct mm_struct *mm,
 				       unsigned long start,
@@ -185,6 +200,7 @@ static void hmm_invalidate_range_end(struct mmu_notifier *mn,
 }
 
 static const struct mmu_notifier_ops hmm_mmu_notifier_ops = {
+	.release		= hmm_release,
 	.invalidate_range_start	= hmm_invalidate_range_start,
 	.invalidate_range_end	= hmm_invalidate_range_end,
 };
@@ -230,7 +246,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror)
 	struct hmm *hmm = mirror->hmm;
 
 	down_write(&hmm->mirrors_sem);
-	list_del(&mirror->list);
+	list_del_init(&mirror->list);
 	up_write(&hmm->mirrors_sem);
 }
 EXPORT_SYMBOL(hmm_mirror_unregister);
-- 
2.14.3

* [PATCH 05/15] mm/hmm: hmm_pfns_bad() was accessing wrong struct
       [not found] <20180320020038.3360-1-jglisse@redhat.com>
  2018-03-20  2:00 ` [PATCH 02/15] mm/hmm: fix header file if/else/endif maze v2 jglisse
  2018-03-20  2:00 ` [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse
@ 2018-03-20  2:00 ` jglisse
  2 siblings, 0 replies; 11+ messages in thread
From: jglisse @ 2018-03-20  2:00 UTC
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse, stable,
	Evgeny Baskakov, Ralph Campbell, Mark Hairgrove, John Hubbard

From: Jérôme Glisse <jglisse@redhat.com>

The private field of struct mm_walk points to an hmm_vma_walk struct, not
to the hmm_range struct that hmm_pfns_bad() expects. Fix hmm_pfns_bad() to
fetch the range through hmm_vma_walk->range.
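
A minimal userspace sketch of the corrected lookup, assuming cut-down
versions of the three structs. The demo_ names, DEMO_PAGE_SHIFT and the
DEMO_PFN_ERROR value are hypothetical and only loosely follow the kernel
function; the point is the two-step dereference through walk->private.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t hmm_pfn_t;                 /* stand-in for the kernel typedef */
#define DEMO_PFN_ERROR ((hmm_pfn_t)1 << 3)  /* hypothetical error flag value */
#define DEMO_PAGE_SHIFT 12                  /* assumes 4 KiB pages */

/* Cut-down stand-ins for the kernel structures. */
struct hmm_range {
	unsigned long start;
	hmm_pfn_t *pfns;
};

struct hmm_vma_walk {
	struct hmm_range *range;
};

struct mm_walk {
	void *private;
};

/* Mark every page in [addr, end) as bad in the range's pfns array. */
static int demo_pfns_bad(unsigned long addr, unsigned long end,
			 struct mm_walk *walk)
{
	/* private holds the hmm_vma_walk wrapper, not the range itself. */
	struct hmm_vma_walk *hmm_vma_walk = walk->private;
	struct hmm_range *range = hmm_vma_walk->range;
	hmm_pfn_t *pfns = range->pfns;
	unsigned long i = (addr - range->start) >> DEMO_PAGE_SHIFT;

	for (; addr < end; addr += 1UL << DEMO_PAGE_SHIFT, i++)
		pfns[i] = DEMO_PFN_ERROR;

	return 0;
}
```

Casting walk->private straight to struct hmm_range * would compile just as
well, since private is a void pointer, which is presumably why the original
mistake went unnoticed.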

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
---
 mm/hmm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 667944630dc9..f5631e1a7319 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -312,7 +312,8 @@ static int hmm_pfns_bad(unsigned long addr,
 			unsigned long end,
 			struct mm_walk *walk)
 {
-	struct hmm_range *range = walk->private;
+	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
 	hmm_pfn_t *pfns = range->pfns;
 	unsigned long i;
 
-- 
2.14.3

* Re: [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-20  2:00 ` [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse
@ 2018-03-21  4:14   ` John Hubbard
  2018-03-21 18:03     ` Jerome Glisse
  0 siblings, 1 reply; 11+ messages in thread
From: John Hubbard @ 2018-03-21  4:14 UTC
  To: jglisse, linux-mm
  Cc: Andrew Morton, linux-kernel, Ralph Campbell, stable,
	Evgeny Baskakov, Mark Hairgrove

On 03/19/2018 07:00 PM, jglisse@redhat.com wrote:
> From: Ralph Campbell <rcampbell@nvidia.com>
> 
> The hmm_mirror_register() function registers a callback for when
> the CPU pagetable is modified. Normally, the device driver will
> call hmm_mirror_unregister() when the process using the device is
> finished. However, if the process exits uncleanly, the mm_struct
> can be destroyed with no warning to the device driver.
> 
> Changed since v1:
>   - dropped VM_BUG_ON()
>   - cc stable
> 
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: stable@vger.kernel.org
> Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
> Cc: Mark Hairgrove <mhairgrove@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> ---
>  include/linux/hmm.h | 10 ++++++++++
>  mm/hmm.c            | 18 +++++++++++++++++-
>  2 files changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 36dd21fe5caf..fa7b51f65905 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -218,6 +218,16 @@ enum hmm_update_type {
>   * @update: callback to update range on a device
>   */
>  struct hmm_mirror_ops {
> +	/* release() - release hmm_mirror
> +	 *
> +	 * @mirror: pointer to struct hmm_mirror
> +	 *
> +	 * This is called when the mm_struct is being released.
> +	 * The callback should make sure no references to the mirror occur
> +	 * after the callback returns.
> +	 */
> +	void (*release)(struct hmm_mirror *mirror);
> +
>  	/* sync_cpu_device_pagetables() - synchronize page tables
>  	 *
>  	 * @mirror: pointer to struct hmm_mirror
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 320545b98ff5..6088fa6ed137 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -160,6 +160,21 @@ static void hmm_invalidate_range(struct hmm *hmm,
>  	up_read(&hmm->mirrors_sem);
>  }
>  
> +static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
> +{
> +	struct hmm *hmm = mm->hmm;
> +	struct hmm_mirror *mirror;
> +	struct hmm_mirror *mirror_next;
> +
> +	down_write(&hmm->mirrors_sem);
> +	list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) {
> +		list_del_init(&mirror->list);
> +		if (mirror->ops->release)
> +			mirror->ops->release(mirror);

Hi Jerome,

This presents a deadlock problem (details below). As for solution ideas, 
Mark Hairgrove points out that the MMU notifiers had to solve the
same sort of problem, and part of the solution involves "avoid
holding locks when issuing these callbacks". That's not an entire 
solution description, of course, but it seems like a good start.

Anyway, for the deadlock problem:

Each of these ->release callbacks potentially has to wait for the 
hmm_invalidate_range() callbacks to finish. That is not shown in any
code directly, but it's because: when a device driver is processing 
the above ->release callback, it has to allow any in-progress operations 
to finish up (as specified clearly in your comment documentation above). 

Some of those operations will invariably need to do things that result 
in page invalidations, thus triggering the hmm_invalidate_range() callback.
Then, the hmm_invalidate_range() callback tries to acquire the same 
hmm->mirrors_sem lock, thus leading to deadlock:

hmm_invalidate_range():
// ...
	down_read(&hmm->mirrors_sem);
	list_for_each_entry(mirror, &hmm->mirrors, list)
		mirror->ops->sync_cpu_device_pagetables(mirror, action,
							start, end);
	up_read(&hmm->mirrors_sem);
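
The lock dependency can be modelled in a few lines of userspace C. This is
only a sketch under simplifying assumptions: struct demo_rwsem is a toy,
single-threaded stand-in for the kernel rwsem, the trylock helpers report
"would block" instead of sleeping, and the in-flight driver operation that
->release waits for is collapsed into a direct call. All demo_ names are
hypothetical.

```c
#include <stdbool.h>

/* Toy model of a reader/writer semaphore (not the kernel rwsem). */
struct demo_rwsem {
	int readers;
	bool writer;
};

static bool demo_down_write_trylock(struct demo_rwsem *sem)
{
	if (sem->writer || sem->readers > 0)
		return false;
	sem->writer = true;
	return true;
}

static void demo_up_write(struct demo_rwsem *sem)
{
	sem->writer = false;
}

static bool demo_down_read_trylock(struct demo_rwsem *sem)
{
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

static void demo_up_read(struct demo_rwsem *sem)
{
	sem->readers--;
}

/* Models hmm_invalidate_range(): needs the lock for reading. */
static bool demo_invalidate_range(struct demo_rwsem *sem)
{
	if (!demo_down_read_trylock(sem))
		return false; /* the real down_read() would sleep here */
	demo_up_read(sem);
	return true;
}

/* Models a ->release callback waiting on work that needs an invalidation. */
static bool demo_release_callback(struct demo_rwsem *sem)
{
	return demo_invalidate_range(sem);
}

/* Models hmm_release() as posted: the callback runs under the write lock. */
static bool demo_release_locked(struct demo_rwsem *sem)
{
	bool progressed;

	if (!demo_down_write_trylock(sem))
		return false;
	progressed = demo_release_callback(sem);
	demo_up_write(sem);
	return progressed;
}
```

In this model demo_release_locked() reports that the callback could not make
progress, which corresponds to the invalidation sleeping on mirrors_sem while
the release path waits for it.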

thanks,
--
John Hubbard
NVIDIA

* Re: [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-21  4:14   ` John Hubbard
@ 2018-03-21 18:03     ` Jerome Glisse
  2018-03-21 22:16       ` John Hubbard
  0 siblings, 1 reply; 11+ messages in thread
From: Jerome Glisse @ 2018-03-21 18:03 UTC
  To: John Hubbard
  Cc: linux-mm, Andrew Morton, linux-kernel, Ralph Campbell, stable,
	Evgeny Baskakov, Mark Hairgrove

On Tue, Mar 20, 2018 at 09:14:34PM -0700, John Hubbard wrote:
> On 03/19/2018 07:00 PM, jglisse@redhat.com wrote:
> > From: Ralph Campbell <rcampbell@nvidia.com>
> > 
> > The hmm_mirror_register() function registers a callback for when
> > the CPU pagetable is modified. Normally, the device driver will
> > call hmm_mirror_unregister() when the process using the device is
> > finished. However, if the process exits uncleanly, the mm_struct
> > can be destroyed with no warning to the device driver.
> > 
> > Changed since v1:
> >   - dropped VM_BUG_ON()
> >   - cc stable
> > 
> > Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> > Cc: stable@vger.kernel.org
> > Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
> > Cc: Mark Hairgrove <mhairgrove@nvidia.com>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > ---
> >  include/linux/hmm.h | 10 ++++++++++
> >  mm/hmm.c            | 18 +++++++++++++++++-
> >  2 files changed, 27 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> > index 36dd21fe5caf..fa7b51f65905 100644
> > --- a/include/linux/hmm.h
> > +++ b/include/linux/hmm.h
> > @@ -218,6 +218,16 @@ enum hmm_update_type {
> >   * @update: callback to update range on a device
> >   */
> >  struct hmm_mirror_ops {
> > +	/* release() - release hmm_mirror
> > +	 *
> > +	 * @mirror: pointer to struct hmm_mirror
> > +	 *
> > +	 * This is called when the mm_struct is being released.
> > +	 * The callback should make sure no references to the mirror occur
> > +	 * after the callback returns.
> > +	 */
> > +	void (*release)(struct hmm_mirror *mirror);
> > +
> >  	/* sync_cpu_device_pagetables() - synchronize page tables
> >  	 *
> >  	 * @mirror: pointer to struct hmm_mirror
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index 320545b98ff5..6088fa6ed137 100644
> > --- a/mm/hmm.c
> > +++ b/mm/hmm.c
> > @@ -160,6 +160,21 @@ static void hmm_invalidate_range(struct hmm *hmm,
> >  	up_read(&hmm->mirrors_sem);
> >  }
> >  
> > +static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
> > +{
> > +	struct hmm *hmm = mm->hmm;
> > +	struct hmm_mirror *mirror;
> > +	struct hmm_mirror *mirror_next;
> > +
> > +	down_write(&hmm->mirrors_sem);
> > +	list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) {
> > +		list_del_init(&mirror->list);
> > +		if (mirror->ops->release)
> > +			mirror->ops->release(mirror);
> 
> Hi Jerome,
> 
> This presents a deadlock problem (details below). As for solution ideas, 
> Mark Hairgrove points out that the MMU notifiers had to solve the
> same sort of problem, and part of the solution involves "avoid
> holding locks when issuing these callbacks". That's not an entire 
> solution description, of course, but it seems like a good start.
> 
> Anyway, for the deadlock problem:
> 
> Each of these ->release callbacks potentially has to wait for the 
> hmm_invalidate_range() callbacks to finish. That is not shown in any
> code directly, but it's because: when a device driver is processing 
> the above ->release callback, it has to allow any in-progress operations 
> to finish up (as specified clearly in your comment documentation above). 
> 
> Some of those operations will invariably need to do things that result 
> in page invalidations, thus triggering the hmm_invalidate_range() callback.
> Then, the hmm_invalidate_range() callback tries to acquire the same 
> hmm->mirrors_sem lock, thus leading to deadlock:
> 
> hmm_invalidate_range():
> // ...
> 	down_read(&hmm->mirrors_sem);
> 	list_for_each_entry(mirror, &hmm->mirrors, list)
> 		mirror->ops->sync_cpu_device_pagetables(mirror, action,
> 							start, end);
> 	up_read(&hmm->mirrors_sem);

That is just illegal: the release callback is not allowed to trigger
invalidation. All it does is kill all of the device's threads and stop
device page faults from happening, so there is no deadlock issue. I can
reinforce the comment some more (see [1] for an example of what it
should be).

Also, it is illegal for the sync callback to trigger any mmu_notifier
callback. I thought this was obvious. The sync callback should only
update the device page table and do _nothing else_. There is no way to
make this re-entrant.

Anonymous private memory migrated to device memory is freed shortly
after the release callback (see exit_mmap()). For shared memory you
might want to migrate back to regular memory, but that will be fine as
you will not get any more mmu_notifier callbacks.

So I don't see any deadlock here.

Cheers,
Jérôme

[1] https://cgit.freedesktop.org/~glisse/linux/commit/?h=nouveau-hmm&id=93adb3e6b4f39d5d146b6a8afb4175d37bdd4890

* Re: [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-21 18:03     ` Jerome Glisse
@ 2018-03-21 22:16       ` John Hubbard
  2018-03-21 22:46         ` Jerome Glisse
  0 siblings, 1 reply; 11+ messages in thread
From: John Hubbard @ 2018-03-21 22:16 UTC
  To: Jerome Glisse
  Cc: linux-mm, Andrew Morton, linux-kernel, Ralph Campbell, stable,
	Evgeny Baskakov, Mark Hairgrove

On 03/21/2018 11:03 AM, Jerome Glisse wrote:
> On Tue, Mar 20, 2018 at 09:14:34PM -0700, John Hubbard wrote:
>> On 03/19/2018 07:00 PM, jglisse@redhat.com wrote:
>>> From: Ralph Campbell <rcampbell@nvidia.com>

<snip>

>> Hi Jerome,
>>
>> This presents a deadlock problem (details below). As for solution ideas, 
>> Mark Hairgrove points out that the MMU notifiers had to solve the
>> same sort of problem, and part of the solution involves "avoid
>> holding locks when issuing these callbacks". That's not an entire 
>> solution description, of course, but it seems like a good start.
>>
>> Anyway, for the deadlock problem:
>>
>> Each of these ->release callbacks potentially has to wait for the 
>> hmm_invalidate_range() callbacks to finish. That is not shown in any
>> code directly, but it's because: when a device driver is processing 
>> the above ->release callback, it has to allow any in-progress operations 
>> to finish up (as specified clearly in your comment documentation above). 
>>
>> Some of those operations will invariably need to do things that result 
>> in page invalidations, thus triggering the hmm_invalidate_range() callback.
>> Then, the hmm_invalidate_range() callback tries to acquire the same 
>> hmm->mirrors_sem lock, thus leading to deadlock:
>>
>> hmm_invalidate_range():
>> // ...
>> 	down_read(&hmm->mirrors_sem);
>> 	list_for_each_entry(mirror, &hmm->mirrors, list)
>> 		mirror->ops->sync_cpu_device_pagetables(mirror, action,
>> 							start, end);
>> 	up_read(&hmm->mirrors_sem);
> 
> That is just illegal, the release callback is not allowed to trigger
> invalidation all it does is kill all device's threads and stop device
> page fault from happening. So there is no deadlock issues. I can re-
> inforce the comment some more (see [1] for example on what it should
> be).

That rule is fine, and it is true that the .release callback will not 
directly trigger any invalidations. However, the problem is in letting 
any *existing* outstanding operations finish up. We have to let 
existing operations "drain", in order to meet the requirement that 
everything is done when .release returns.

For example, if a device driver thread is in the middle of working through
its fault buffer, it will call migrate_vma(), which will in turn unmap
pages. That will cause an hmm_invalidate_range() callback, which tries
to take hmm->mirrors_sem, and we deadlock.

There's no way to "kill" such a thread while it's in the middle of
migrate_vma(), you have to let it finish up.

> 
> Also it is illegal for the sync callback to trigger any mmu_notifier
> callback. I thought this was obvious. The sync callback should only
> update device page table and do _nothing else_. No way to make this
> re-entrant.

That is obvious, yes. I am not trying to say there is any problem with
that rule. It's the "drain outstanding operations during .release", 
above, that is the real problem.

thanks,
-- 
John Hubbard
NVIDIA

> 
> For anonymous private memory migrated to device memory it is freed
> shortly after the release callback (see exit_mmap()). For share memory
> you might want to migrate back to regular memory but that will be fine
> as you will not get mmu_notifier callback any more.
> 
> So i don't see any deadlock here.
> 
> Cheers,
> Jérôme
> 
> [1] https://cgit.freedesktop.org/~glisse/linux/commit/?h=nouveau-hmm&id=93adb3e6b4f39d5d146b6a8afb4175d37bdd4890
> 

* Re: [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-21 22:16       ` John Hubbard
@ 2018-03-21 22:46         ` Jerome Glisse
  2018-03-21 23:10           ` John Hubbard
  0 siblings, 1 reply; 11+ messages in thread
From: Jerome Glisse @ 2018-03-21 22:46 UTC
  To: John Hubbard
  Cc: linux-mm, Andrew Morton, linux-kernel, Ralph Campbell, stable,
	Evgeny Baskakov, Mark Hairgrove

On Wed, Mar 21, 2018 at 03:16:04PM -0700, John Hubbard wrote:
> On 03/21/2018 11:03 AM, Jerome Glisse wrote:
> > On Tue, Mar 20, 2018 at 09:14:34PM -0700, John Hubbard wrote:
> >> On 03/19/2018 07:00 PM, jglisse@redhat.com wrote:
> >>> From: Ralph Campbell <rcampbell@nvidia.com>
> 
> <snip>
> 
> >> Hi Jerome,
> >>
> >> This presents a deadlock problem (details below). As for solution ideas, 
> >> Mark Hairgrove points out that the MMU notifiers had to solve the
> >> same sort of problem, and part of the solution involves "avoid
> >> holding locks when issuing these callbacks". That's not an entire 
> >> solution description, of course, but it seems like a good start.
> >>
> >> Anyway, for the deadlock problem:
> >>
> >> Each of these ->release callbacks potentially has to wait for the 
> >> hmm_invalidate_range() callbacks to finish. That is not shown in any
> >> code directly, but it's because: when a device driver is processing 
> >> the above ->release callback, it has to allow any in-progress operations 
> >> to finish up (as specified clearly in your comment documentation above). 
> >>
> >> Some of those operations will invariably need to do things that result 
> >> in page invalidations, thus triggering the hmm_invalidate_range() callback.
> >> Then, the hmm_invalidate_range() callback tries to acquire the same 
> >> hmm->mirrors_sem lock, thus leading to deadlock:
> >>
> >> hmm_invalidate_range():
> >> // ...
> >> 	down_read(&hmm->mirrors_sem);
> >> 	list_for_each_entry(mirror, &hmm->mirrors, list)
> >> 		mirror->ops->sync_cpu_device_pagetables(mirror, action,
> >> 							start, end);
> >> 	up_read(&hmm->mirrors_sem);
> > 
> > That is just illegal, the release callback is not allowed to trigger
> > invalidation all it does is kill all device's threads and stop device
> > page fault from happening. So there is no deadlock issues. I can re-
> > inforce the comment some more (see [1] for example on what it should
> > be).
> 
> That rule is fine, and it is true that the .release callback will not 
> directly trigger any invalidations. However, the problem is in letting 
> any *existing* outstanding operations finish up. We have to let 
> existing operations "drain", in order to meet the requirement that 
> everything is done when .release returns.
> 
> For example, if a device driver thread is in the middle of working through
> its fault buffer, it will call migrate_vma(), which will in turn unmap
> pages. That will cause an hmm_invalidate_range() callback, which tries
> to take hmm->mirrors_sem, and we deadlock.
> 
> There's no way to "kill" such a thread while it's in the middle of
> migrate_vma(), you have to let it finish up.
>
> > Also it is illegal for the sync callback to trigger any mmu_notifier
> > callback. I thought this was obvious. The sync callback should only
> > update device page table and do _nothing else_. No way to make this
> > re-entrant.
> 
> That is obvious, yes. I am not trying to say there is any problem with
> that rule. It's the "drain outstanding operations during .release", 
> above, that is the real problem.

Maybe just relax the release callback wording: it should stop any
further processing of the fault buffer but not wait for it to finish.
In the nouveau code I kill things but I do not wait, hence I don't
deadlock.

What matters is to stop any further processing. Yes, some faults might
be in flight, but they will serialize on various locks. So just do not
wait in the release callback; kill things. I might have a bug where I
still fill in the GPU page table in nouveau; I will check the nouveau
code for that.
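
The "stop, but do not wait" behaviour described above might look roughly
like the following userspace sketch. It is not taken from nouveau: the
struct, its fields and the demo_ functions are hypothetical, and the check
followed by the increment in demo_fault_begin() is not race-free, which a
real driver would have to address with its own locking.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-mirror driver state. */
struct demo_mirror_state {
	atomic_bool dead;     /* set once by the release callback */
	atomic_int in_flight; /* fault handlers currently running */
};

/* Release callback: mark the mirror dead and return without waiting. */
static void demo_release(struct demo_mirror_state *st)
{
	atomic_store(&st->dead, true);
}

/* Fault handler entry: refuse new work once the mirror is dead. */
static bool demo_fault_begin(struct demo_mirror_state *st)
{
	if (atomic_load(&st->dead))
		return false;
	atomic_fetch_add(&st->in_flight, 1);
	return true;
}

/* Fault handler exit. */
static void demo_fault_end(struct demo_mirror_state *st)
{
	atomic_fetch_sub(&st->in_flight, 1);
}
```

A handler that passed demo_fault_begin() before demo_release() ran is still
counted in in_flight after release returns; only new work is refused.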

Killing things should also kill the channel (I don't do that in nouveau
because I am waiting on some channel patchset), but I am not sure the
hardware likes it if we kill the channel before stopping fault
notification.

Cheers,
Jérôme

* Re: [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-21 22:46         ` Jerome Glisse
@ 2018-03-21 23:10           ` John Hubbard
  2018-03-21 23:37             ` Jerome Glisse
  0 siblings, 1 reply; 11+ messages in thread
From: John Hubbard @ 2018-03-21 23:10 UTC
  To: Jerome Glisse
  Cc: linux-mm, Andrew Morton, linux-kernel, Ralph Campbell, stable,
	Evgeny Baskakov, Mark Hairgrove

On 03/21/2018 03:46 PM, Jerome Glisse wrote:
> On Wed, Mar 21, 2018 at 03:16:04PM -0700, John Hubbard wrote:
>> On 03/21/2018 11:03 AM, Jerome Glisse wrote:
>>> On Tue, Mar 20, 2018 at 09:14:34PM -0700, John Hubbard wrote:
>>>> On 03/19/2018 07:00 PM, jglisse@redhat.com wrote:
>>>>> From: Ralph Campbell <rcampbell@nvidia.com>
>>
>> <snip>
>>
>>>> Hi Jerome,
>>>>
>>>> This presents a deadlock problem (details below). As for solution ideas, 
>>>> Mark Hairgrove points out that the MMU notifiers had to solve the
>>>> same sort of problem, and part of the solution involves "avoid
>>>> holding locks when issuing these callbacks". That's not an entire 
>>>> solution description, of course, but it seems like a good start.
>>>>
>>>> Anyway, for the deadlock problem:
>>>>
>>>> Each of these ->release callbacks potentially has to wait for the 
>>>> hmm_invalidate_range() callbacks to finish. That is not shown in any
>>>> code directly, but it's because: when a device driver is processing 
>>>> the above ->release callback, it has to allow any in-progress operations 
>>>> to finish up (as specified clearly in your comment documentation above). 
>>>>
>>>> Some of those operations will invariably need to do things that result 
>>>> in page invalidations, thus triggering the hmm_invalidate_range() callback.
>>>> Then, the hmm_invalidate_range() callback tries to acquire the same 
>>>> hmm->mirrors_sem lock, thus leading to deadlock:
>>>>
>>>> hmm_invalidate_range():
>>>> // ...
>>>> 	down_read(&hmm->mirrors_sem);
>>>> 	list_for_each_entry(mirror, &hmm->mirrors, list)
>>>> 		mirror->ops->sync_cpu_device_pagetables(mirror, action,
>>>> 							start, end);
>>>> 	up_read(&hmm->mirrors_sem);
>>>
>>> That is just illegal, the release callback is not allowed to trigger
>>> invalidation all it does is kill all device's threads and stop device
>>> page fault from happening. So there is no deadlock issues. I can re-
>>> inforce the comment some more (see [1] for example on what it should
>>> be).
>>
>> That rule is fine, and it is true that the .release callback will not 
>> directly trigger any invalidations. However, the problem is in letting 
>> any *existing* outstanding operations finish up. We have to let 
>> existing operations "drain", in order to meet the requirement that 
>> everything is done when .release returns.
>>
>> For example, if a device driver thread is in the middle of working through
>> its fault buffer, it will call migrate_vma(), which will in turn unmap
>> pages. That will cause an hmm_invalidate_range() callback, which tries
>> to take hmm->mirrors_sem, and we deadlock.
>>
>> There's no way to "kill" such a thread while it's in the middle of
>> migrate_vma(), you have to let it finish up.
>>
>>> Also it is illegal for the sync callback to trigger any mmu_notifier
>>> callback. I thought this was obvious. The sync callback should only
>>> update device page table and do _nothing else_. No way to make this
>>> re-entrant.
>>
>> That is obvious, yes. I am not trying to say there is any problem with
>> that rule. It's the "drain outstanding operations during .release", 
>> above, that is the real problem.
> 
> Maybe just relax the release callback wording, it should stop any
> more processing of fault buffer but not wait for it to finish. In
> nouveau code i kill thing but i do not wait hence i don't deadlock.

But you may crash, because that approach allows .release to finish
up, thus removing the mm entirely, out from under (for example)
a migrate_vma call--or any other call that refers to the mm.

It doesn't seem too hard to avoid the problem, though: maybe we
can just drop the lock while doing the mirror->ops->release callback.
There are a few ways to do this, but one example is: 

    -- take the lock,
        -- copy the list to a local list, deleting entries as you go,
    -- drop the lock, 
    -- iterate through the local list copy and 
        -- issue the mirror->ops->release callbacks.

At this point, more items could have been added to the list, so repeat
the above until the original list is empty. 

This is subject to a limited starvation case if mirrors keep getting 
registered, but I think we can ignore that, because it only lasts as long as 
mirrors keep getting added, and then it finishes up.
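
The steps above can be sketched in standalone C as follows. This is a toy
model, not a proposed patch: the lock is a plain flag with assertions in
place of mirrors_sem, the list is a hand-rolled singly linked list rather
than list_head, and the whole list is detached in one pointer swap instead
of entry by entry. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_hmm;

struct demo_mirror {
	struct demo_mirror *next;
	struct demo_hmm *hmm;
	void (*release)(struct demo_mirror *mirror);
	bool released;
};

struct demo_hmm {
	bool locked;                 /* toy stand-in for mirrors_sem */
	struct demo_mirror *mirrors; /* toy stand-in for the mirrors list */
};

static void demo_lock(struct demo_hmm *hmm)
{
	assert(!hmm->locked); /* taking it twice would be a deadlock */
	hmm->locked = true;
}

static void demo_unlock(struct demo_hmm *hmm)
{
	assert(hmm->locked);
	hmm->locked = false;
}

/* Example callback that itself needs the lock, as an invalidation would. */
static void demo_driver_release(struct demo_mirror *mirror)
{
	demo_lock(mirror->hmm);
	mirror->released = true;
	demo_unlock(mirror->hmm);
}

static void demo_release_all(struct demo_hmm *hmm)
{
	for (;;) {
		struct demo_mirror *local;

		/* Take the lock, move all entries to a local list, drop it. */
		demo_lock(hmm);
		local = hmm->mirrors;
		hmm->mirrors = NULL;
		demo_unlock(hmm);

		if (!local)
			break; /* nothing was registered since the last pass */

		/* Issue the callbacks with no lock held. */
		while (local) {
			struct demo_mirror *mirror = local;

			local = mirror->next;
			mirror->next = NULL;
			if (mirror->release)
				mirror->release(mirror);
		}
	}
}
```

Since demo_driver_release() takes the lock itself, calling it from inside the
locked region would trip the assertion in demo_lock(); with the lock dropped
first it completes normally.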

> 
> What matter is to stop any further processing. Yes some fault might
> be in flight but they will serialize on various lock. 

Those faults in flight could already be at a point where they have taken
whatever locks they need, so we don't dare let the mm get destroyed while
such fault handling is in progress.


> So just do not
> wait in the release callback, kill thing. I might have a bug where i
> still fill in GPU page table in nouveau, i will check nouveau code
> for that.

Again, we can't "kill" a thread of execution (this would often be an
interrupt bottom half context, btw) while it is, for example,
in the middle of migrate_vma.

I really don't believe there is a safe way to do this without draining
the existing operations before .release returns, and for that, we'll need to 
issue the .release callbacks while not holding locks.

thanks,
-- 
John Hubbard
NVIDIA

> 
> Kill thing should also kill the channel (i don't do that in nouveau
> because i am waiting on some channel patchset) but i am not sure if
> hardware like it if we kill channel before stoping fault notification.
> 
> Cheers,
> Jérôme
> 

* Re: [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-21 23:10           ` John Hubbard
@ 2018-03-21 23:37             ` Jerome Glisse
  2018-03-22  0:11               ` John Hubbard
  0 siblings, 1 reply; 11+ messages in thread
From: Jerome Glisse @ 2018-03-21 23:37 UTC
  To: John Hubbard
  Cc: linux-mm, Andrew Morton, linux-kernel, Ralph Campbell, stable,
	Evgeny Baskakov, Mark Hairgrove

On Wed, Mar 21, 2018 at 04:10:32PM -0700, John Hubbard wrote:
> On 03/21/2018 03:46 PM, Jerome Glisse wrote:
> > On Wed, Mar 21, 2018 at 03:16:04PM -0700, John Hubbard wrote:
> >> On 03/21/2018 11:03 AM, Jerome Glisse wrote:
> >>> On Tue, Mar 20, 2018 at 09:14:34PM -0700, John Hubbard wrote:
> >>>> On 03/19/2018 07:00 PM, jglisse@redhat.com wrote:
> >>>>> From: Ralph Campbell <rcampbell@nvidia.com>

[...]

> >>> That is just illegal, the release callback is not allowed to trigger
> >>> invalidation all it does is kill all device's threads and stop device
> >>> page fault from happening. So there is no deadlock issues. I can re-
> >>> inforce the comment some more (see [1] for example on what it should
> >>> be).
> >>
> >> That rule is fine, and it is true that the .release callback will not 
> >> directly trigger any invalidations. However, the problem is in letting 
> >> any *existing* outstanding operations finish up. We have to let 
> >> existing operations "drain", in order to meet the requirement that 
> >> everything is done when .release returns.
> >>
> >> For example, if a device driver thread is in the middle of working through
> >> its fault buffer, it will call migrate_vma(), which will in turn unmap
> >> pages. That will cause an hmm_invalidate_range() callback, which tries
> >> to take hmm->mirrors_sems, and we deadlock.
> >>
> >> There's no way to "kill" such a thread while it's in the middle of
> >> migrate_vma(), you have to let it finish up.
> >>
> >>> Also it is illegal for the sync callback to trigger any mmu_notifier
> >>> callback. I thought this was obvious. The sync callback should only
> >>> update device page table and do _nothing else_. No way to make this
> >>> re-entrant.
> >>
> >> That is obvious, yes. I am not trying to say there is any problem with
> >> that rule. It's the "drain outstanding operations during .release", 
> >> above, that is the real problem.
> > 
> > Maybe just relax the release callback wording, it should stop any
> > more processing of fault buffer but not wait for it to finish. In
> > nouveau code i kill thing but i do not wait hence i don't deadlock.
> 
> But you may crash, because that approach allows .release to finish
> up, thus removing the mm entirely, out from under (for example)
> a migrate_vma call--or any other call that refers to the mm.

No, you can not crash on the mm, as it will not vanish before you are done
with it: the mm will not be freed before you call hmm_unregister(), and
you should not call that from release, nor should you call it before
everything is flushed. However, the vma structs might vanish ... i might have
assumed wrongly that the down_write() always happens in exit_mmap().
This might be a solution to force serialization.


> 
> It doesn't seem too hard to avoid the problem, though: maybe we
> can just drop the lock while doing the mirror->ops->release callback.
> There are a few ways to do this, but one example is: 
> 
>     -- take the lock,
>         -- copy the list to a local list, deleting entries as you go,
>     -- drop the lock, 
>     -- iterate through the local list copy and 
>         -- issue the mirror->ops->release callbacks.
> 
> At this point, more items could have been added to the list, so repeat
> the above until the original list is empty. 
> 
> This is subject to a limited starvation case if mirror keep getting 
> registered, but I think we can ignore that, because it only lasts as long as 
> mirrors keep getting added, and then it finishes up.

The down_write is a better solution, and easier: just 2 lines of code.
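
For comparison, the list-draining alternative quoted above (take the lock,
move the entries to a local list, drop the lock, issue the callbacks, repeat
until nothing is left) could look roughly like the userspace sketch below.
This is only an illustration under stated assumptions: struct mirror,
struct mirror_list, mirror_add(), release_all() and example_release() are
hypothetical names, a pthread mutex stands in for the kernel lock, and none
of this is the actual HMM code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct mirror_list;

/* Hypothetical stand-in for struct hmm_mirror; not the kernel type. */
struct mirror {
	struct mirror *next;
	struct mirror_list *owner;
	void (*release)(struct mirror *m);
	int released;
};

/* Hypothetical stand-in for the per-mm list of mirrors and its lock. */
struct mirror_list {
	pthread_mutex_t lock;
	struct mirror *head;
};

/* Register a mirror. This may race with release_all(), hence its outer loop. */
void mirror_add(struct mirror_list *l, struct mirror *m)
{
	pthread_mutex_lock(&l->lock);
	m->owner = l;
	m->next = l->head;
	l->head = m;
	pthread_mutex_unlock(&l->lock);
}

/* Issue every release callback with the lock dropped; returns how many ran. */
int release_all(struct mirror_list *l)
{
	int released = 0;

	for (;;) {
		/* Take the lock and move the whole list to a local copy. */
		pthread_mutex_lock(&l->lock);
		struct mirror *local = l->head;
		l->head = NULL;
		pthread_mutex_unlock(&l->lock);

		if (local == NULL)
			break;	/* nothing was added since the last pass */

		/* The lock is not held here, so a callback may take it. */
		while (local != NULL) {
			struct mirror *m = local;

			local = m->next;
			m->next = NULL;
			assert(m->release != NULL);
			m->release(m);
			released++;
		}
	}
	return released;
}

/* Example callback: takes the list lock itself, which is only safe
 * because release_all() has dropped it before calling us. */
void example_release(struct mirror *m)
{
	pthread_mutex_lock(&m->owner->lock);
	m->released = 1;
	pthread_mutex_unlock(&m->owner->lock);
}
```

The outer loop is what handles mirrors registered while callbacks run, and it
is also where the limited starvation case mentioned above would show up.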

> 
> > 
> > What matter is to stop any further processing. Yes some fault might
> > be in flight but they will serialize on various lock. 
> 
> Those faults in flight could already be at a point where they have taken
> whatever locks they need, so we don't dare let the mm get destroyed while
> such fault handling is in progress.

The mm can not vanish until hmm_unregister() is called; the vmas will vanish
before that.

> So just do not
> > wait in the release callback, kill thing. I might have a bug where i
> > still fill in GPU page table in nouveau, i will check nouveau code
> > for that.
> 
> Again, we can't "kill" a thread of execution (this would often be an
> interrupt bottom half context, btw) while it is, for example,
> in the middle of migrate_vma.

You should not call migrate from a bottom half! Only call it from a work
queue, like nouveau does.

> 
> I really don't believe there is a safe way to do this without draining
> the existing operations before .release returns, and for that, we'll need to 
> issue the .release callbacks while not holding locks.

down_write on mmap_sem would force serialization. I am not sure we want
to make this change now. It can wait, as it is definitely not an issue for
nouveau yet. Taking mmap_sem in write (see oom in exit_mmap()) in release
makes me nervous.

Cheers,
Jérôme


* Re: [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-21 23:37             ` Jerome Glisse
@ 2018-03-22  0:11               ` John Hubbard
  2018-03-22  1:32                 ` Jerome Glisse
  0 siblings, 1 reply; 11+ messages in thread
From: John Hubbard @ 2018-03-22  0:11 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: linux-mm, Andrew Morton, linux-kernel, Ralph Campbell, stable,
	Evgeny Baskakov, Mark Hairgrove

On 03/21/2018 04:37 PM, Jerome Glisse wrote:
> On Wed, Mar 21, 2018 at 04:10:32PM -0700, John Hubbard wrote:
>> On 03/21/2018 03:46 PM, Jerome Glisse wrote:
>>> On Wed, Mar 21, 2018 at 03:16:04PM -0700, John Hubbard wrote:
>>>> On 03/21/2018 11:03 AM, Jerome Glisse wrote:
>>>>> On Tue, Mar 20, 2018 at 09:14:34PM -0700, John Hubbard wrote:
>>>>>> On 03/19/2018 07:00 PM, jglisse@redhat.com wrote:
>>>>>>> From: Ralph Campbell <rcampbell@nvidia.com>
> 
> [...]
> 
>>>>> That is just illegal, the release callback is not allowed to trigger
>>>>> invalidation all it does is kill all device's threads and stop device
>>>>> page fault from happening. So there is no deadlock issues. I can re-
>>>>> inforce the comment some more (see [1] for example on what it should
>>>>> be).
>>>>
>>>> That rule is fine, and it is true that the .release callback will not 
>>>> directly trigger any invalidations. However, the problem is in letting 
>>>> any *existing* outstanding operations finish up. We have to let 
>>>> existing operations "drain", in order to meet the requirement that 
>>>> everything is done when .release returns.
>>>>
>>>> For example, if a device driver thread is in the middle of working through
>>>> its fault buffer, it will call migrate_vma(), which will in turn unmap
>>>> pages. That will cause an hmm_invalidate_range() callback, which tries
>>>> to take hmm->mirrors_sems, and we deadlock.
>>>>
>>>> There's no way to "kill" such a thread while it's in the middle of
>>>> migrate_vma(), you have to let it finish up.
>>>>
>>>>> Also it is illegal for the sync callback to trigger any mmu_notifier
>>>>> callback. I thought this was obvious. The sync callback should only
>>>>> update device page table and do _nothing else_. No way to make this
>>>>> re-entrant.
>>>>
>>>> That is obvious, yes. I am not trying to say there is any problem with
>>>> that rule. It's the "drain outstanding operations during .release", 
>>>> above, that is the real problem.
>>>
>>> Maybe just relax the release callback wording, it should stop any
>>> more processing of fault buffer but not wait for it to finish. In
>>> nouveau code i kill thing but i do not wait hence i don't deadlock.
>>
>> But you may crash, because that approach allows .release to finish
>> up, thus removing the mm entirely, out from under (for example)
>> a migrate_vma call--or any other call that refers to the mm.
> 
> No you can not crash on mm as it will not vanish before you are done
> with it as mm will not be freed before you call hmm_unregister() and
> you should not call that from release, nor should you call it before
> everything is flush. However vma struct might vanish ... i might have
> assume wrongly about the down_write() always happening in exit_mmap()
> This might be a solution to force serialization.
> 
 
OK. My details on mm destruction were inaccurate, but we do agree now
that the whole virtual address space is being torn down at the same 
time as we're trying to use it, so I think we're on the same page now.

>>
>> It doesn't seem too hard to avoid the problem, though: maybe we
>> can just drop the lock while doing the mirror->ops->release callback.
>> There are a few ways to do this, but one example is: 
>>
>>     -- take the lock,
>>         -- copy the list to a local list, deleting entries as you go,
>>     -- drop the lock, 
>>     -- iterate through the local list copy and 
>>         -- issue the mirror->ops->release callbacks.
>>
>> At this point, more items could have been added to the list, so repeat
>> the above until the original list is empty. 
>>
>> This is subject to a limited starvation case if mirror keep getting 
>> registered, but I think we can ignore that, because it only lasts as long as 
>> mirrors keep getting added, and then it finishes up.
> 
> The down_write is better solution and easier just 2 line of code.

OK. I'll have a better idea when I see it.

> 
>>
>>>
>>> What matter is to stop any further processing. Yes some fault might
>>> be in flight but they will serialize on various lock. 
>>
>> Those faults in flight could already be at a point where they have taken
>> whatever locks they need, so we don't dare let the mm get destroyed while
>> such fault handling is in progress.
> 
> mm can not vanish until hmm_unregister() is call, vma will vanish before.

OK, yes. And we agree that vma vanishing is a problem. 

> 
>> So just do not
>>> wait in the release callback, kill thing. I might have a bug where i
>>> still fill in GPU page table in nouveau, i will check nouveau code
>>> for that.
>>
>> Again, we can't "kill" a thread of execution (this would often be an
>> interrupt bottom half context, btw) while it is, for example,
>> in the middle of migrate_vma.
> 
> You should not call migrate from bottom half ! Only call this from work
> queue like nouveau.

By "bottom half", I mean the kthread that we have running to handle work
that was handed off from the top half ISR. So we are in process context.
And we will need to do migrate_vma() from there.

> 
>>
>> I really don't believe there is a safe way to do this without draining
>> the existing operations before .release returns, and for that, we'll need to 
>> issue the .release callbacks while not holding locks.
> 
> down_write on mmap_sem would force serialization. I am not sure we want
> to do this change now. It can wait as it is definitly not an issue for
> nouveau yet. Taking mmap_sem in write (see oom in exit_mmap()) in release
> make me nervous.
> 

I'm not going to lose any sleep about when various fixes are made, as long as
we agree on problems and solution approaches, and fix them at some point.
I will note that our downstream driver will not be...well, completely usable, 
until we fix this, though.

thanks,
-- 
John Hubbard
NVIDIA
 


* Re: [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-22  0:11               ` John Hubbard
@ 2018-03-22  1:32                 ` Jerome Glisse
  0 siblings, 0 replies; 11+ messages in thread
From: Jerome Glisse @ 2018-03-22  1:32 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-mm, Andrew Morton, linux-kernel, Ralph Campbell, stable,
	Evgeny Baskakov, Mark Hairgrove

On Wed, Mar 21, 2018 at 05:11:10PM -0700, John Hubbard wrote:
> On 03/21/2018 04:37 PM, Jerome Glisse wrote:
> > On Wed, Mar 21, 2018 at 04:10:32PM -0700, John Hubbard wrote:
> >> On 03/21/2018 03:46 PM, Jerome Glisse wrote:
> >>> On Wed, Mar 21, 2018 at 03:16:04PM -0700, John Hubbard wrote:
> >>>> On 03/21/2018 11:03 AM, Jerome Glisse wrote:
> >>>>> On Tue, Mar 20, 2018 at 09:14:34PM -0700, John Hubbard wrote:
> >>>>>> On 03/19/2018 07:00 PM, jglisse@redhat.com wrote:
> >>>>>>> From: Ralph Campbell <rcampbell@nvidia.com>
> > 
> > [...]
> > 
> >>>>> That is just illegal, the release callback is not allowed to trigger
> >>>>> invalidation all it does is kill all device's threads and stop device
> >>>>> page fault from happening. So there is no deadlock issues. I can re-
> >>>>> inforce the comment some more (see [1] for example on what it should
> >>>>> be).
> >>>>
> >>>> That rule is fine, and it is true that the .release callback will not 
> >>>> directly trigger any invalidations. However, the problem is in letting 
> >>>> any *existing* outstanding operations finish up. We have to let 
> >>>> existing operations "drain", in order to meet the requirement that 
> >>>> everything is done when .release returns.
> >>>>
> >>>> For example, if a device driver thread is in the middle of working through
> >>>> its fault buffer, it will call migrate_vma(), which will in turn unmap
> >>>> pages. That will cause an hmm_invalidate_range() callback, which tries
> >>>> to take hmm->mirrors_sems, and we deadlock.
> >>>>
> >>>> There's no way to "kill" such a thread while it's in the middle of
> >>>> migrate_vma(), you have to let it finish up.
> >>>>
> >>>>> Also it is illegal for the sync callback to trigger any mmu_notifier
> >>>>> callback. I thought this was obvious. The sync callback should only
> >>>>> update device page table and do _nothing else_. No way to make this
> >>>>> re-entrant.
> >>>>
> >>>> That is obvious, yes. I am not trying to say there is any problem with
> >>>> that rule. It's the "drain outstanding operations during .release", 
> >>>> above, that is the real problem.
> >>>
> >>> Maybe just relax the release callback wording, it should stop any
> >>> more processing of fault buffer but not wait for it to finish. In
> >>> nouveau code i kill thing but i do not wait hence i don't deadlock.
> >>
> >> But you may crash, because that approach allows .release to finish
> >> up, thus removing the mm entirely, out from under (for example)
> >> a migrate_vma call--or any other call that refers to the mm.
> > 
> > No you can not crash on mm as it will not vanish before you are done
> > with it as mm will not be freed before you call hmm_unregister() and
> > you should not call that from release, nor should you call it before
> > everything is flush. However vma struct might vanish ... i might have
> > assume wrongly about the down_write() always happening in exit_mmap()
> > This might be a solution to force serialization.
> > 
>  
> OK. My details on mm destruction were inaccurate, but we do agree now
> that that the whole virtual address space is being torn down at the same 
> time as we're trying to use it, so I think we're on the same page now.
> 
> >>
> >> It doesn't seem too hard to avoid the problem, though: maybe we
> >> can just drop the lock while doing the mirror->ops->release callback.
> >> There are a few ways to do this, but one example is: 
> >>
> >>     -- take the lock,
> >>         -- copy the list to a local list, deleting entries as you go,
> >>     -- drop the lock, 
> >>     -- iterate through the local list copy and 
> >>         -- issue the mirror->ops->release callbacks.
> >>
> >> At this point, more items could have been added to the list, so repeat
> >> the above until the original list is empty. 
> >>
> >> This is subject to a limited starvation case if mirror keep getting 
> >> registered, but I think we can ignore that, because it only lasts as long as 
> >> mirrors keep getting added, and then it finishes up.
> > 
> > The down_write is better solution and easier just 2 line of code.
> 
> OK. I'll have a better idea when I see it.
> 
> > 
> >>
> >>>
> >>> What matter is to stop any further processing. Yes some fault might
> >>> be in flight but they will serialize on various lock. 
> >>
> >> Those faults in flight could already be at a point where they have taken
> >> whatever locks they need, so we don't dare let the mm get destroyed while
> >> such fault handling is in progress.
> > 
> > mm can not vanish until hmm_unregister() is call, vma will vanish before.
> 
> OK, yes. And we agree that vma vanishing is a problem. 
> 
> > 
> >> So just do not
> >>> wait in the release callback, kill thing. I might have a bug where i
> >>> still fill in GPU page table in nouveau, i will check nouveau code
> >>> for that.
> >>
> >> Again, we can't "kill" a thread of execution (this would often be an
> >> interrupt bottom half context, btw) while it is, for example,
> >> in the middle of migrate_vma.
> > 
> > You should not call migrate from bottom half ! Only call this from work
> > queue like nouveau.
> 
> By "bottom half", I mean the kthread that we have running to handle work
> that was handed off from the top half ISR. So we are in process context.
> And we will need to do migrate_vma() from there.
> 
> > 
> >>
> >> I really don't believe there is a safe way to do this without draining
> >> the existing operations before .release returns, and for that, we'll need to 
> >> issue the .release callbacks while not holding locks.
> > 
> > down_write on mmap_sem would force serialization. I am not sure we want
> > to do this change now. It can wait as it is definitly not an issue for
> > nouveau yet. Taking mmap_sem in write (see oom in exit_mmap()) in release
> > make me nervous.
> > 
> 
> I'm not going to lose any sleep about when various fixes are made, as long as
> we agree on problems and solution approaches, and fix them at some point.
> I will note that our downstreamdriver will not be...well, completely usable, 
> until we fix this, though.
> 

So i posted updated patches for 3 and 4 that should address your concern.
Testing done with them and nouveau seems to work ok. I am hoping this
addresses all your concerns.

Cheers,
Jérôme


Thread overview: 11+ messages
     [not found] <20180320020038.3360-1-jglisse@redhat.com>
2018-03-20  2:00 ` [PATCH 02/15] mm/hmm: fix header file if/else/endif maze v2 jglisse
2018-03-20  2:00 ` [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse
2018-03-21  4:14   ` John Hubbard
2018-03-21 18:03     ` Jerome Glisse
2018-03-21 22:16       ` John Hubbard
2018-03-21 22:46         ` Jerome Glisse
2018-03-21 23:10           ` John Hubbard
2018-03-21 23:37             ` Jerome Glisse
2018-03-22  0:11               ` John Hubbard
2018-03-22  1:32                 ` Jerome Glisse
2018-03-20  2:00 ` [PATCH 05/15] mm/hmm: hmm_pfns_bad() was accessing wrong struct jglisse
