[PATCH 02/14] mm/hmm: fix header file if/else/endif maze

Linux kernel -stable discussions
 help / color / mirror / Atom feed

* [PATCH 02/14] mm/hmm: fix header file if/else/endif maze
       [not found] <20180316191414.3223-1-jglisse@redhat.com>
@ 2018-03-16 19:14 ` jglisse
  2018-03-16 21:09   ` Andrew Morton
  2018-03-17  1:20   ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze v2 jglisse
  2018-03-16 19:14 ` [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse
  2018-03-16 19:14 ` [PATCH 04/14] mm/hmm: hmm_pfns_bad() was accessing wrong struct jglisse
  2 siblings, 2 replies; 15+ messages in thread
From: jglisse @ 2018-03-16 19:14 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse, stable,
	Ralph Campbell, John Hubbard, Evgeny Baskakov

From: Jérôme Glisse <jglisse@redhat.com>

The #if/#else/#endif for IS_ENABLED(CONFIG_HMM) were wrong.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
---
 include/linux/hmm.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 325017ad9311..ef6044d08cc5 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -498,6 +498,9 @@ struct hmm_device {
 struct hmm_device *hmm_device_new(void *drvdata);
 void hmm_device_put(struct hmm_device *hmm_device);
 #endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
+#else /* IS_ENABLED(CONFIG_HMM) */
+static inline void hmm_mm_destroy(struct mm_struct *mm) {}
+static inline void hmm_mm_init(struct mm_struct *mm) {}
 #endif /* IS_ENABLED(CONFIG_HMM) */
 
 /* Below are for HMM internal use only! Not to be used by device driver! */
@@ -513,8 +516,4 @@ static inline void hmm_mm_destroy(struct mm_struct *mm) {}
 static inline void hmm_mm_init(struct mm_struct *mm) {}
 #endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */
 
-
-#else /* IS_ENABLED(CONFIG_HMM) */
-static inline void hmm_mm_destroy(struct mm_struct *mm) {}
-static inline void hmm_mm_init(struct mm_struct *mm) {}
 #endif /* LINUX_HMM_H */
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 02/14] mm/hmm: fix header file if/else/endif maze
  2018-03-16 19:14 ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze jglisse
@ 2018-03-16 21:09   ` Andrew Morton
  2018-03-16 21:18     ` Jerome Glisse
  2018-03-17  1:20   ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze v2 jglisse
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2018-03-16 21:09 UTC (permalink / raw)
  To: jglisse
  Cc: linux-mm, linux-kernel, stable, Ralph Campbell, John Hubbard,
	Evgeny Baskakov

On Fri, 16 Mar 2018 15:14:07 -0400 jglisse@redhat.com wrote:

> From: Jérôme Glisse <jglisse@redhat.com>
> 
> The #if/#else/#endif for IS_ENABLED(CONFIG_HMM) were wrong.

"were wrong" is not a sufficient explanation of the problem, especially
if we're requesting a -stable backport.  Please fully describe the
effects of a bug when fixing it?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 02/14] mm/hmm: fix header file if/else/endif maze
  2018-03-16 21:09   ` Andrew Morton
@ 2018-03-16 21:18     ` Jerome Glisse
  2018-03-16 21:35       ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Jerome Glisse @ 2018-03-16 21:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, stable, Ralph Campbell, John Hubbard,
	Evgeny Baskakov

On Fri, Mar 16, 2018 at 02:09:59PM -0700, Andrew Morton wrote:
> On Fri, 16 Mar 2018 15:14:07 -0400 jglisse@redhat.com wrote:
> 
> > From: Jï¿½rï¿½me Glisse <jglisse@redhat.com>
> > 
> > The #if/#else/#endif for IS_ENABLED(CONFIG_HMM) were wrong.
> 
> "were wrong" is not a sufficient explanation of the problem, especially
> if we're requesting a -stable backport.  Please fully describe the
> effects of a bug when fixing it?

Build issue (compilation failure) if you have multiple includes of
hmm.h through different headers is the most obvious issue. So it
will be very obvious with any big driver that include the file in
different headers.

I can respin with that. Sorry again for not being more explanatory
it is always hard for me to figure what is not obvious to others.

Cheers,
Jï¿½rï¿½me

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 02/14] mm/hmm: fix header file if/else/endif maze
  2018-03-16 21:18     ` Jerome Glisse
@ 2018-03-16 21:35       ` Andrew Morton
  2018-03-16 21:40         ` John Hubbard
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2018-03-16 21:35 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: linux-mm, linux-kernel, stable, Ralph Campbell, John Hubbard,
	Evgeny Baskakov

On Fri, 16 Mar 2018 17:18:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote:

> On Fri, Mar 16, 2018 at 02:09:59PM -0700, Andrew Morton wrote:
> > On Fri, 16 Mar 2018 15:14:07 -0400 jglisse@redhat.com wrote:
> > 
> > > From: Jérôme Glisse <jglisse@redhat.com>
> > > 
> > > The #if/#else/#endif for IS_ENABLED(CONFIG_HMM) were wrong.
> > 
> > "were wrong" is not a sufficient explanation of the problem, especially
> > if we're requesting a -stable backport.  Please fully describe the
> > effects of a bug when fixing it?
> 
> Build issue (compilation failure) if you have multiple includes of
> hmm.h through different headers is the most obvious issue. So it
> will be very obvious with any big driver that include the file in
> different headers.

That doesn't seem to warrant a -stable backport?  The developer of such
a driver will simply fix the headers?

> I can respin with that. Sorry again for not being more explanatory
> it is always hard for me to figure what is not obvious to others.

I updated the changelog, no respin needed.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 02/14] mm/hmm: fix header file if/else/endif maze
  2018-03-16 21:35       ` Andrew Morton
@ 2018-03-16 21:40         ` John Hubbard
  0 siblings, 0 replies; 15+ messages in thread
From: John Hubbard @ 2018-03-16 21:40 UTC (permalink / raw)
  To: Andrew Morton, Jerome Glisse
  Cc: linux-mm, linux-kernel, stable, Ralph Campbell, Evgeny Baskakov

On 03/16/2018 02:35 PM, Andrew Morton wrote:
> On Fri, 16 Mar 2018 17:18:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote:
> 
>> On Fri, Mar 16, 2018 at 02:09:59PM -0700, Andrew Morton wrote:
>>> On Fri, 16 Mar 2018 15:14:07 -0400 jglisse@redhat.com wrote:
>>>
>>>> From: Jérôme Glisse <jglisse@redhat.com>
>>>>
>>>> The #if/#else/#endif for IS_ENABLED(CONFIG_HMM) were wrong.
>>>
>>> "were wrong" is not a sufficient explanation of the problem, especially
>>> if we're requesting a -stable backport.  Please fully describe the
>>> effects of a bug when fixing it?
>>
>> Build issue (compilation failure) if you have multiple includes of
>> hmm.h through different headers is the most obvious issue. So it
>> will be very obvious with any big driver that include the file in
>> different headers.
> 
> That doesn't seem to warrant a -stable backport?  The developer of such
> a driver will simply fix the headers?

Right. For this patch, I would strongly request a -stable backport.  It's 
really going to cause problems if anyone tries to use -stable with HMM,
without this fix.

thanks,
-- 
John Hubbard
NVIDIA

> 
>> I can respin with that. Sorry again for not being more explanatory
>> it is always hard for me to figure what is not obvious to others.
> 
> I updated the changelog, no respin needed.
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 02/14] mm/hmm: fix header file if/else/endif maze v2
  2018-03-16 19:14 ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze jglisse
  2018-03-16 21:09   ` Andrew Morton
@ 2018-03-17  1:20   ` jglisse
  1 sibling, 0 replies; 15+ messages in thread
From: jglisse @ 2018-03-17  1:20 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse, stable,
	Ralph Campbell, John Hubbard, Evgeny Baskakov

From: Jérôme Glisse <jglisse@redhat.com>

The #if/#else/#endif for IS_ENABLED(CONFIG_HMM) were wrong. Because
of this after multiple include there was multiple definition of both
hmm_mm_init() and hmm_mm_destroy() leading to build failure if HMM
was enabled (CONFIG_HMM set).

Changed since v1:
  - Fix the maze when CONFIG_HMM is disabled not just when it is
    enabled. This fix bot build failure.
  - Improved commit message.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
---
 include/linux/hmm.h | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 325017ad9311..36dd21fe5caf 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -498,23 +498,16 @@ struct hmm_device {
 struct hmm_device *hmm_device_new(void *drvdata);
 void hmm_device_put(struct hmm_device *hmm_device);
 #endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
-#endif /* IS_ENABLED(CONFIG_HMM) */
 
 /* Below are for HMM internal use only! Not to be used by device driver! */
-#if IS_ENABLED(CONFIG_HMM_MIRROR)
 void hmm_mm_destroy(struct mm_struct *mm);
 
 static inline void hmm_mm_init(struct mm_struct *mm)
 {
 	mm->hmm = NULL;
 }
-#else /* IS_ENABLED(CONFIG_HMM_MIRROR) */
-static inline void hmm_mm_destroy(struct mm_struct *mm) {}
-static inline void hmm_mm_init(struct mm_struct *mm) {}
-#endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */
-
-
 #else /* IS_ENABLED(CONFIG_HMM) */
 static inline void hmm_mm_destroy(struct mm_struct *mm) {}
 static inline void hmm_mm_init(struct mm_struct *mm) {}
+#endif /* IS_ENABLED(CONFIG_HMM) */
 #endif /* LINUX_HMM_H */
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2
       [not found] <20180316191414.3223-1-jglisse@redhat.com>
  2018-03-16 19:14 ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze jglisse
@ 2018-03-16 19:14 ` jglisse
  2018-03-16 21:12   ` Andrew Morton
  2018-03-17  2:36   ` John Hubbard
  2018-03-16 19:14 ` [PATCH 04/14] mm/hmm: hmm_pfns_bad() was accessing wrong struct jglisse
  2 siblings, 2 replies; 15+ messages in thread
From: jglisse @ 2018-03-16 19:14 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Ralph Campbell,
	Jérôme Glisse, stable, Evgeny Baskakov, Mark Hairgrove,
	John Hubbard

From: Ralph Campbell <rcampbell@nvidia.com>

The hmm_mirror_register() function registers a callback for when
the CPU pagetable is modified. Normally, the device driver will
call hmm_mirror_unregister() when the process using the device is
finished. However, if the process exits uncleanly, the struct_mm
can be destroyed with no warning to the device driver.

Changed since v1:
  - dropped VM_BUG_ON()
  - cc stable

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/hmm.h | 10 ++++++++++
 mm/hmm.c            | 18 +++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index ef6044d08cc5..61b0e1c05ee1 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -218,6 +218,16 @@ enum hmm_update_type {
  * @update: callback to update range on a device
  */
 struct hmm_mirror_ops {
+	/* release() - release hmm_mirror
+	 *
+	 * @mirror: pointer to struct hmm_mirror
+	 *
+	 * This is called when the mm_struct is being released.
+	 * The callback should make sure no references to the mirror occur
+	 * after the callback returns.
+	 */
+	void (*release)(struct hmm_mirror *mirror);
+
 	/* sync_cpu_device_pagetables() - synchronize page tables
 	 *
 	 * @mirror: pointer to struct hmm_mirror
diff --git a/mm/hmm.c b/mm/hmm.c
index 320545b98ff5..6088fa6ed137 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -160,6 +160,21 @@ static void hmm_invalidate_range(struct hmm *hmm,
 	up_read(&hmm->mirrors_sem);
 }
 
+static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct hmm *hmm = mm->hmm;
+	struct hmm_mirror *mirror;
+	struct hmm_mirror *mirror_next;
+
+	down_write(&hmm->mirrors_sem);
+	list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) {
+		list_del_init(&mirror->list);
+		if (mirror->ops->release)
+			mirror->ops->release(mirror);
+	}
+	up_write(&hmm->mirrors_sem);
+}
+
 static void hmm_invalidate_range_start(struct mmu_notifier *mn,
 				       struct mm_struct *mm,
 				       unsigned long start,
@@ -185,6 +200,7 @@ static void hmm_invalidate_range_end(struct mmu_notifier *mn,
 }
 
 static const struct mmu_notifier_ops hmm_mmu_notifier_ops = {
+	.release		= hmm_release,
 	.invalidate_range_start	= hmm_invalidate_range_start,
 	.invalidate_range_end	= hmm_invalidate_range_end,
 };
@@ -230,7 +246,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror)
 	struct hmm *hmm = mirror->hmm;
 
 	down_write(&hmm->mirrors_sem);
-	list_del(&mirror->list);
+	list_del_init(&mirror->list);
 	up_write(&hmm->mirrors_sem);
 }
 EXPORT_SYMBOL(hmm_mirror_unregister);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-16 19:14 ` [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse
@ 2018-03-16 21:12   ` Andrew Morton
  2018-03-16 21:26     ` Jerome Glisse
  2018-03-17  2:36   ` John Hubbard
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2018-03-16 21:12 UTC (permalink / raw)
  To: jglisse
  Cc: linux-mm, linux-kernel, Ralph Campbell, stable, Evgeny Baskakov,
	Mark Hairgrove, John Hubbard

On Fri, 16 Mar 2018 15:14:08 -0400 jglisse@redhat.com wrote:

> The hmm_mirror_register() function registers a callback for when
> the CPU pagetable is modified. Normally, the device driver will
> call hmm_mirror_unregister() when the process using the device is
> finished. However, if the process exits uncleanly, the struct_mm
> can be destroyed with no warning to the device driver.

Again, what are the user-visible effects of the bug?  Such info is
needed when others review our request for a -stable backport.  And the
many people who review -stable patches for integration into their own
kernel trees will want to understand the benefit of the patch to their
users.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-16 21:12   ` Andrew Morton
@ 2018-03-16 21:26     ` Jerome Glisse
  2018-03-16 21:37       ` John Hubbard
  0 siblings, 1 reply; 15+ messages in thread
From: Jerome Glisse @ 2018-03-16 21:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Ralph Campbell, stable, Evgeny Baskakov,
	Mark Hairgrove, John Hubbard

On Fri, Mar 16, 2018 at 02:12:21PM -0700, Andrew Morton wrote:
> On Fri, 16 Mar 2018 15:14:08 -0400 jglisse@redhat.com wrote:
> 
> > The hmm_mirror_register() function registers a callback for when
> > the CPU pagetable is modified. Normally, the device driver will
> > call hmm_mirror_unregister() when the process using the device is
> > finished. However, if the process exits uncleanly, the struct_mm
> > can be destroyed with no warning to the device driver.
> 
> Again, what are the user-visible effects of the bug?  Such info is
> needed when others review our request for a -stable backport.  And the
> many people who review -stable patches for integration into their own
> kernel trees will want to understand the benefit of the patch to their
> users.

I have not had any issues in any of my own testing but nouveau driver
is not as advance as the NVidia closed driver in respect to HMM inte-
gration yet.

If any issues they will happen between exit_mm() and exit_files() in
do_exit() (kernel/exit.c) exit_mm() tear down the mm struct but without
this callback the device driver might still be handling page fault and
thus might potentialy tries to handle them against a dead mm_struct.

So i am not sure what are the symptoms. To be fair there is no public
driver using that part of HMM beside nouveau rfc patches. So at this
point the impact on anybody is non existent. If anyone want to back-
port nouveau HMM support once it make it upstream it will probably
have to backport more things along the way. This is why i am not that
aggressive on ccing stable so far.

Cheers,
Jï¿½rï¿½me

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-16 21:26     ` Jerome Glisse
@ 2018-03-16 21:37       ` John Hubbard
  0 siblings, 0 replies; 15+ messages in thread
From: John Hubbard @ 2018-03-16 21:37 UTC (permalink / raw)
  To: Jerome Glisse, Andrew Morton
  Cc: linux-mm, linux-kernel, Ralph Campbell, stable, Evgeny Baskakov,
	Mark Hairgrove

On 03/16/2018 02:26 PM, Jerome Glisse wrote:
> On Fri, Mar 16, 2018 at 02:12:21PM -0700, Andrew Morton wrote:
>> On Fri, 16 Mar 2018 15:14:08 -0400 jglisse@redhat.com wrote:
>>
>>> The hmm_mirror_register() function registers a callback for when
>>> the CPU pagetable is modified. Normally, the device driver will
>>> call hmm_mirror_unregister() when the process using the device is
>>> finished. However, if the process exits uncleanly, the struct_mm
>>> can be destroyed with no warning to the device driver.
>>
>> Again, what are the user-visible effects of the bug?  Such info is
>> needed when others review our request for a -stable backport.  And the
>> many people who review -stable patches for integration into their own
>> kernel trees will want to understand the benefit of the patch to their
>> users.
> 
> I have not had any issues in any of my own testing but nouveau driver
> is not as advance as the NVidia closed driver in respect to HMM inte-
> gration yet.
> 
> If any issues they will happen between exit_mm() and exit_files() in
> do_exit() (kernel/exit.c) exit_mm() tear down the mm struct but without
> this callback the device driver might still be handling page fault and
> thus might potentialy tries to handle them against a dead mm_struct.
> 
> So i am not sure what are the symptoms. To be fair there is no public
> driver using that part of HMM beside nouveau rfc patches. So at this
> point the impact on anybody is non existent. If anyone want to back-
> port nouveau HMM support once it make it upstream it will probably
> have to backport more things along the way. This is why i am not that
> aggressive on ccing stable so far.

The problem I'd like to avoid is: having a version of HMM in stable that
is missing this new callback. And without it, once the driver starts doing
actual concurrent operations, we can expect that the race condition will
happen.

It just seems unfortunate to have stable versions out there that would
be exposed to this, when it only require a small patch to avoid it.

On the other hand, it's also reasonable to claim that this is part of the
evolving HMM feature, and as such, this new feature does not belong in
stable.  I'm not sure which argument carries more weight here.

thanks,
-- 
John Hubbard
NVIDIA

> 
> Cheers,
> Jérôme
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-16 19:14 ` [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse
  2018-03-16 21:12   ` Andrew Morton
@ 2018-03-17  2:36   ` John Hubbard
  2018-03-17  3:47     ` John Hubbard
  1 sibling, 1 reply; 15+ messages in thread
From: John Hubbard @ 2018-03-17  2:36 UTC (permalink / raw)
  To: jglisse, linux-mm
  Cc: Andrew Morton, linux-kernel, Ralph Campbell, stable,
	Evgeny Baskakov, Mark Hairgrove

On 03/16/2018 12:14 PM, jglisse@redhat.com wrote:
> From: Ralph Campbell <rcampbell@nvidia.com>
> 

<snip>

> +static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
> +{
> +	struct hmm *hmm = mm->hmm;
> +	struct hmm_mirror *mirror;
> +	struct hmm_mirror *mirror_next;
> +
> +	down_write(&hmm->mirrors_sem);
> +	list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) {
> +		list_del_init(&mirror->list);
> +		if (mirror->ops->release)
> +			mirror->ops->release(mirror);
> +	}
> +	up_write(&hmm->mirrors_sem);
> +}
> +

OK, as for actual code review:

This part of the locking looks good. However, I think it can race against
hmm_mirror_register(), because hmm_mirror_register() will just add a new 
mirror regardless.

So:

thread 1                                      thread 2
--------------                                -----------------
hmm_release                                   hmm_mirror_register 
    down_write(&hmm->mirrors_sem);                <blocked: waiting for sem>
        // deletes all list items
    up_write
                                                  unblocked: adds new mirror
                                              

...so I think we need a way to back out of any pending hmm_mirror_register()
calls, as part of the .release steps, right? It seems hard for the device driver,
which could be inside of hmm_mirror_register(), to handle that. Especially considering
that right now, hmm_mirror_register() will return success in this case--so
there is no indication that anything is wrong.

Maybe hmm_mirror_register() could return an error (and not add to the mirror list),
in such a situation, how's that sound?

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-17  2:36   ` John Hubbard
@ 2018-03-17  3:47     ` John Hubbard
  2018-03-17  4:39       ` John Hubbard
  0 siblings, 1 reply; 15+ messages in thread
From: John Hubbard @ 2018-03-17  3:47 UTC (permalink / raw)
  To: jglisse, linux-mm
  Cc: Andrew Morton, linux-kernel, Ralph Campbell, stable,
	Evgeny Baskakov, Mark Hairgrove

On 03/16/2018 07:36 PM, John Hubbard wrote:
> On 03/16/2018 12:14 PM, jglisse@redhat.com wrote:
>> From: Ralph Campbell <rcampbell@nvidia.com>
>>
> 
> <snip>
> 
>> +static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>> +{
>> +	struct hmm *hmm = mm->hmm;
>> +	struct hmm_mirror *mirror;
>> +	struct hmm_mirror *mirror_next;
>> +
>> +	down_write(&hmm->mirrors_sem);
>> +	list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) {
>> +		list_del_init(&mirror->list);
>> +		if (mirror->ops->release)
>> +			mirror->ops->release(mirror);
>> +	}
>> +	up_write(&hmm->mirrors_sem);
>> +}
>> +
> 
> OK, as for actual code review:
> 
> This part of the locking looks good. However, I think it can race against
> hmm_mirror_register(), because hmm_mirror_register() will just add a new 
> mirror regardless.
> 
> So:
> 
> thread 1                                      thread 2
> --------------                                -----------------
> hmm_release                                   hmm_mirror_register 
>     down_write(&hmm->mirrors_sem);                <blocked: waiting for sem>
>         // deletes all list items
>     up_write
>                                                   unblocked: adds new mirror
>                                               
> 
> ...so I think we need a way to back out of any pending hmm_mirror_register()
> calls, as part of the .release steps, right? It seems hard for the device driver,
> which could be inside of hmm_mirror_register(), to handle that. Especially considering
> that right now, hmm_mirror_register() will return success in this case--so
> there is no indication that anything is wrong.
> 
> Maybe hmm_mirror_register() could return an error (and not add to the mirror list),
> in such a situation, how's that sound?
> 

In other words, I think this would help (not tested yet beyond a quick compile,
but it's pretty simple):

diff --git a/mm/hmm.c b/mm/hmm.c
index 7ccca5478ea1..da39f8522dca 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -66,6 +66,7 @@ struct hmm {
        struct list_head        mirrors;
        struct mmu_notifier     mmu_notifier;
        struct rw_semaphore     mirrors_sem;
+       bool                    shutting_down;
 };
 
 /*
@@ -99,6 +100,7 @@ static struct hmm *hmm_register(struct mm_struct *mm)
        INIT_LIST_HEAD(&hmm->ranges);
        spin_lock_init(&hmm->lock);
        hmm->mm = mm;
+       hmm->shutting_down = false;
 
        /*
         * We should only get here if hold the mmap_sem in write mode ie on
@@ -167,6 +169,7 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
        struct hmm_mirror *mirror_next;
 
        down_write(&hmm->mirrors_sem);
+       hmm->shutting_down = true;
        list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) {
                list_del_init(&mirror->list);
                if (mirror->ops->release)
@@ -227,6 +230,10 @@ int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm)
                return -ENOMEM;
 
        down_write(&mirror->hmm->mirrors_sem);
+       if (mirror->hmm->shutting_down) {
+               up_write(&mirror->hmm->mirrors_sem);
+               return -ESRCH;
+       }
        list_add(&mirror->list, &mirror->hmm->mirrors);
        up_write(&mirror->hmm->mirrors_sem);


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2
  2018-03-17  3:47     ` John Hubbard
@ 2018-03-17  4:39       ` John Hubbard
  0 siblings, 0 replies; 15+ messages in thread
From: John Hubbard @ 2018-03-17  4:39 UTC (permalink / raw)
  To: jglisse, linux-mm
  Cc: Andrew Morton, linux-kernel, Ralph Campbell, stable,
	Evgeny Baskakov, Mark Hairgrove

On 03/16/2018 08:47 PM, John Hubbard wrote:
> On 03/16/2018 07:36 PM, John Hubbard wrote:
>> On 03/16/2018 12:14 PM, jglisse@redhat.com wrote:
>>> From: Ralph Campbell <rcampbell@nvidia.com>
>>>
>>
>> <snip>
>>
>>> +static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>>> +{
>>> +	struct hmm *hmm = mm->hmm;
>>> +	struct hmm_mirror *mirror;
>>> +	struct hmm_mirror *mirror_next;
>>> +
>>> +	down_write(&hmm->mirrors_sem);
>>> +	list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) {
>>> +		list_del_init(&mirror->list);
>>> +		if (mirror->ops->release)
>>> +			mirror->ops->release(mirror);
>>> +	}
>>> +	up_write(&hmm->mirrors_sem);
>>> +}
>>> +
>>
>> OK, as for actual code review:
>>
>> This part of the locking looks good. However, I think it can race against
>> hmm_mirror_register(), because hmm_mirror_register() will just add a new 
>> mirror regardless.
>>
>> So:
>>
>> thread 1                                      thread 2
>> --------------                                -----------------
>> hmm_release                                   hmm_mirror_register 
>>     down_write(&hmm->mirrors_sem);                <blocked: waiting for sem>
>>         // deletes all list items
>>     up_write
>>                                                   unblocked: adds new mirror
>>                                               
>>

Mark Hairgrove just pointed out some more fun facts:

1. Because hmm_mirror_register() needs to be called with an mm that has a non-zero
refcount, you generally cannot get an hmm_release callback, so the above race should
not happen.

2. We looked around, and the code is missing a call to mmu_notifier_unregister().
That means that it is going to leak memory and not let the mm get released either.

Maybe having each mirror have its own mmu notifier callback is a possible way
to solve this.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 04/14] mm/hmm: hmm_pfns_bad() was accessing wrong struct
       [not found] <20180316191414.3223-1-jglisse@redhat.com>
  2018-03-16 19:14 ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze jglisse
  2018-03-16 19:14 ` [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse
@ 2018-03-16 19:14 ` jglisse
  2018-03-17  2:04   ` John Hubbard
  2 siblings, 1 reply; 15+ messages in thread
From: jglisse @ 2018-03-16 19:14 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse, stable,
	Evgeny Baskakov, Ralph Campbell, Mark Hairgrove, John Hubbard

From: Jérôme Glisse <jglisse@redhat.com>

The private field of mm_walk struct point to an hmm_vma_walk struct and
not to the hmm_range struct desired. Fix to get proper struct pointer.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
---
 mm/hmm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 6088fa6ed137..64d9e7dae712 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -293,7 +293,8 @@ static int hmm_pfns_bad(unsigned long addr,
 			unsigned long end,
 			struct mm_walk *walk)
 {
-	struct hmm_range *range = walk->private;
+	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
 	hmm_pfn_t *pfns = range->pfns;
 	unsigned long i;
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 04/14] mm/hmm: hmm_pfns_bad() was accessing wrong struct
  2018-03-16 19:14 ` [PATCH 04/14] mm/hmm: hmm_pfns_bad() was accessing wrong struct jglisse
@ 2018-03-17  2:04   ` John Hubbard
  0 siblings, 0 replies; 15+ messages in thread
From: John Hubbard @ 2018-03-17  2:04 UTC (permalink / raw)
  To: jglisse, linux-mm
  Cc: Andrew Morton, linux-kernel, stable, Evgeny Baskakov,
	Ralph Campbell, Mark Hairgrove

On 03/16/2018 12:14 PM, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> The private field of mm_walk struct point to an hmm_vma_walk struct and
> not to the hmm_range struct desired. Fix to get proper struct pointer.
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: stable@vger.kernel.org
> Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: Mark Hairgrove <mhairgrove@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> ---
>  mm/hmm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 6088fa6ed137..64d9e7dae712 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -293,7 +293,8 @@ static int hmm_pfns_bad(unsigned long addr,
>  			unsigned long end,
>  			struct mm_walk *walk)
>  {
> -	struct hmm_range *range = walk->private;
> +	struct hmm_vma_walk *hmm_vma_walk = walk->private;
> +	struct hmm_range *range = hmm_vma_walk->range;
>  	hmm_pfn_t *pfns = range->pfns;
>  	unsigned long i;
>  

This fix looks good. I also checked the other uses of walk->private, of course, 
but it was only this one that was wrong.

I think this patch also belongs in -stable, because it is a simple bug fix.

For the description, well...actually, because ->range is the first element in
struct hmm_vma_walk, you probably end up with the same pointer value, both
before and after this fix. So maybe there are no symptoms to see. Maybe that's
an argument for *not* putting it in -stable, too. I'll leave that question
to more experienced people.

Either way, you can add: 

Reviewed by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-03-17  4:39 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20180316191414.3223-1-jglisse@redhat.com>
2018-03-16 19:14 ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze jglisse
2018-03-16 21:09   ` Andrew Morton
2018-03-16 21:18     ` Jerome Glisse
2018-03-16 21:35       ` Andrew Morton
2018-03-16 21:40         ` John Hubbard
2018-03-17  1:20   ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze v2 jglisse
2018-03-16 19:14 ` [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse
2018-03-16 21:12   ` Andrew Morton
2018-03-16 21:26     ` Jerome Glisse
2018-03-16 21:37       ` John Hubbard
2018-03-17  2:36   ` John Hubbard
2018-03-17  3:47     ` John Hubbard
2018-03-17  4:39       ` John Hubbard
2018-03-16 19:14 ` [PATCH 04/14] mm/hmm: hmm_pfns_bad() was accessing wrong struct jglisse
2018-03-17  2:04   ` John Hubbard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox