* [PATCH 02/14] mm/hmm: fix header file if/else/endif maze [not found] <20180316191414.3223-1-jglisse@redhat.com> @ 2018-03-16 19:14 ` jglisse 2018-03-16 21:09 ` Andrew Morton 2018-03-17 1:20 ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze v2 jglisse 2018-03-16 19:14 ` [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse 2018-03-16 19:14 ` [PATCH 04/14] mm/hmm: hmm_pfns_bad() was accessing wrong struct jglisse 2 siblings, 2 replies; 15+ messages in thread From: jglisse @ 2018-03-16 19:14 UTC (permalink / raw) To: linux-mm Cc: Andrew Morton, linux-kernel, Jérôme Glisse, stable, Ralph Campbell, John Hubbard, Evgeny Baskakov From: Jérôme Glisse <jglisse@redhat.com> The #if/#else/#endif for IS_ENABLED(CONFIG_HMM) were wrong. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> --- include/linux/hmm.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 325017ad9311..ef6044d08cc5 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -498,6 +498,9 @@ struct hmm_device { struct hmm_device *hmm_device_new(void *drvdata); void hmm_device_put(struct hmm_device *hmm_device); #endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */ +#else /* IS_ENABLED(CONFIG_HMM) */ +static inline void hmm_mm_destroy(struct mm_struct *mm) {} +static inline void hmm_mm_init(struct mm_struct *mm) {} #endif /* IS_ENABLED(CONFIG_HMM) */ /* Below are for HMM internal use only! Not to be used by device driver! */ @@ -513,8 +516,4 @@ static inline void hmm_mm_destroy(struct mm_struct *mm) {} static inline void hmm_mm_init(struct mm_struct *mm) {} #endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */ - -#else /* IS_ENABLED(CONFIG_HMM) */ -static inline void hmm_mm_destroy(struct mm_struct *mm) {} -static inline void hmm_mm_init(struct mm_struct *mm) {} #endif /* LINUX_HMM_H */ -- 2.14.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 02/14] mm/hmm: fix header file if/else/endif maze 2018-03-16 19:14 ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze jglisse @ 2018-03-16 21:09 ` Andrew Morton 2018-03-16 21:18 ` Jerome Glisse 2018-03-17 1:20 ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze v2 jglisse 1 sibling, 1 reply; 15+ messages in thread From: Andrew Morton @ 2018-03-16 21:09 UTC (permalink / raw) To: jglisse Cc: linux-mm, linux-kernel, stable, Ralph Campbell, John Hubbard, Evgeny Baskakov On Fri, 16 Mar 2018 15:14:07 -0400 jglisse@redhat.com wrote: > From: Jérôme Glisse <jglisse@redhat.com> > > The #if/#else/#endif for IS_ENABLED(CONFIG_HMM) were wrong. "were wrong" is not a sufficient explanation of the problem, especially if we're requesting a -stable backport. Please fully describe the effects of a bug when fixing it? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 02/14] mm/hmm: fix header file if/else/endif maze 2018-03-16 21:09 ` Andrew Morton @ 2018-03-16 21:18 ` Jerome Glisse 2018-03-16 21:35 ` Andrew Morton 0 siblings, 1 reply; 15+ messages in thread From: Jerome Glisse @ 2018-03-16 21:18 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, stable, Ralph Campbell, John Hubbard, Evgeny Baskakov On Fri, Mar 16, 2018 at 02:09:59PM -0700, Andrew Morton wrote: > On Fri, 16 Mar 2018 15:14:07 -0400 jglisse@redhat.com wrote: > > > From: J�r�me Glisse <jglisse@redhat.com> > > > > The #if/#else/#endif for IS_ENABLED(CONFIG_HMM) were wrong. > > "were wrong" is not a sufficient explanation of the problem, especially > if we're requesting a -stable backport. Please fully describe the > effects of a bug when fixing it? Build issue (compilation failure) if you have multiple includes of hmm.h through different headers is the most obvious issue. So it will be very obvious with any big driver that include the file in different headers. I can respin with that. Sorry again for not being more explanatory it is always hard for me to figure what is not obvious to others. Cheers, J�r�me ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 02/14] mm/hmm: fix header file if/else/endif maze 2018-03-16 21:18 ` Jerome Glisse @ 2018-03-16 21:35 ` Andrew Morton 2018-03-16 21:40 ` John Hubbard 0 siblings, 1 reply; 15+ messages in thread From: Andrew Morton @ 2018-03-16 21:35 UTC (permalink / raw) To: Jerome Glisse Cc: linux-mm, linux-kernel, stable, Ralph Campbell, John Hubbard, Evgeny Baskakov On Fri, 16 Mar 2018 17:18:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote: > On Fri, Mar 16, 2018 at 02:09:59PM -0700, Andrew Morton wrote: > > On Fri, 16 Mar 2018 15:14:07 -0400 jglisse@redhat.com wrote: > > > > > From: Jérôme Glisse <jglisse@redhat.com> > > > > > > The #if/#else/#endif for IS_ENABLED(CONFIG_HMM) were wrong. > > > > "were wrong" is not a sufficient explanation of the problem, especially > > if we're requesting a -stable backport. Please fully describe the > > effects of a bug when fixing it? > > Build issue (compilation failure) if you have multiple includes of > hmm.h through different headers is the most obvious issue. So it > will be very obvious with any big driver that include the file in > different headers. That doesn't seem to warrant a -stable backport? The developer of such a driver will simply fix the headers? > I can respin with that. Sorry again for not being more explanatory > it is always hard for me to figure what is not obvious to others. I updated the changelog, no respin needed. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 02/14] mm/hmm: fix header file if/else/endif maze 2018-03-16 21:35 ` Andrew Morton @ 2018-03-16 21:40 ` John Hubbard 0 siblings, 0 replies; 15+ messages in thread From: John Hubbard @ 2018-03-16 21:40 UTC (permalink / raw) To: Andrew Morton, Jerome Glisse Cc: linux-mm, linux-kernel, stable, Ralph Campbell, Evgeny Baskakov On 03/16/2018 02:35 PM, Andrew Morton wrote: > On Fri, 16 Mar 2018 17:18:02 -0400 Jerome Glisse <jglisse@redhat.com> wrote: > >> On Fri, Mar 16, 2018 at 02:09:59PM -0700, Andrew Morton wrote: >>> On Fri, 16 Mar 2018 15:14:07 -0400 jglisse@redhat.com wrote: >>> >>>> From: Jérôme Glisse <jglisse@redhat.com> >>>> >>>> The #if/#else/#endif for IS_ENABLED(CONFIG_HMM) were wrong. >>> >>> "were wrong" is not a sufficient explanation of the problem, especially >>> if we're requesting a -stable backport. Please fully describe the >>> effects of a bug when fixing it? >> >> Build issue (compilation failure) if you have multiple includes of >> hmm.h through different headers is the most obvious issue. So it >> will be very obvious with any big driver that include the file in >> different headers. > > That doesn't seem to warrant a -stable backport? The developer of such > a driver will simply fix the headers? Right. For this patch, I would strongly request a -stable backport. It's really going to cause problems if anyone tries to use -stable with HMM, without this fix. thanks, -- John Hubbard NVIDIA > >> I can respin with that. Sorry again for not being more explanatory >> it is always hard for me to figure what is not obvious to others. > > I updated the changelog, no respin needed. > ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 02/14] mm/hmm: fix header file if/else/endif maze v2 2018-03-16 19:14 ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze jglisse 2018-03-16 21:09 ` Andrew Morton @ 2018-03-17 1:20 ` jglisse 1 sibling, 0 replies; 15+ messages in thread From: jglisse @ 2018-03-17 1:20 UTC (permalink / raw) To: linux-mm Cc: Andrew Morton, linux-kernel, Jérôme Glisse, stable, Ralph Campbell, John Hubbard, Evgeny Baskakov From: Jérôme Glisse <jglisse@redhat.com> The #if/#else/#endif for IS_ENABLED(CONFIG_HMM) were wrong. Because of this after multiple include there was multiple definition of both hmm_mm_init() and hmm_mm_destroy() leading to build failure if HMM was enabled (CONFIG_HMM set). Changed since v1: - Fix the maze when CONFIG_HMM is disabled not just when it is enabled. This fix bot build failure. - Improved commit message. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> --- include/linux/hmm.h | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 325017ad9311..36dd21fe5caf 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -498,23 +498,16 @@ struct hmm_device { struct hmm_device *hmm_device_new(void *drvdata); void hmm_device_put(struct hmm_device *hmm_device); #endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */ -#endif /* IS_ENABLED(CONFIG_HMM) */ /* Below are for HMM internal use only! Not to be used by device driver! */ -#if IS_ENABLED(CONFIG_HMM_MIRROR) void hmm_mm_destroy(struct mm_struct *mm); static inline void hmm_mm_init(struct mm_struct *mm) { mm->hmm = NULL; } -#else /* IS_ENABLED(CONFIG_HMM_MIRROR) */ -static inline void hmm_mm_destroy(struct mm_struct *mm) {} -static inline void hmm_mm_init(struct mm_struct *mm) {} -#endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */ - - #else /* IS_ENABLED(CONFIG_HMM) */ static inline void hmm_mm_destroy(struct mm_struct *mm) {} static inline void hmm_mm_init(struct mm_struct *mm) {} +#endif /* IS_ENABLED(CONFIG_HMM) */ #endif /* LINUX_HMM_H */ -- 2.14.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 [not found] <20180316191414.3223-1-jglisse@redhat.com> 2018-03-16 19:14 ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze jglisse @ 2018-03-16 19:14 ` jglisse 2018-03-16 21:12 ` Andrew Morton 2018-03-17 2:36 ` John Hubbard 2018-03-16 19:14 ` [PATCH 04/14] mm/hmm: hmm_pfns_bad() was accessing wrong struct jglisse 2 siblings, 2 replies; 15+ messages in thread From: jglisse @ 2018-03-16 19:14 UTC (permalink / raw) To: linux-mm Cc: Andrew Morton, linux-kernel, Ralph Campbell, Jérôme Glisse, stable, Evgeny Baskakov, Mark Hairgrove, John Hubbard From: Ralph Campbell <rcampbell@nvidia.com> The hmm_mirror_register() function registers a callback for when the CPU pagetable is modified. Normally, the device driver will call hmm_mirror_unregister() when the process using the device is finished. However, if the process exits uncleanly, the struct_mm can be destroyed with no warning to the device driver. Changed since v1: - dropped VM_BUG_ON() - cc stable Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: stable@vger.kernel.org Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> --- include/linux/hmm.h | 10 ++++++++++ mm/hmm.c | 18 +++++++++++++++++- 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index ef6044d08cc5..61b0e1c05ee1 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -218,6 +218,16 @@ enum hmm_update_type { * @update: callback to update range on a device */ struct hmm_mirror_ops { + /* release() - release hmm_mirror + * + * @mirror: pointer to struct hmm_mirror + * + * This is called when the mm_struct is being released. + * The callback should make sure no references to the mirror occur + * after the callback returns. + */ + void (*release)(struct hmm_mirror *mirror); + /* sync_cpu_device_pagetables() - synchronize page tables * * @mirror: pointer to struct hmm_mirror diff --git a/mm/hmm.c b/mm/hmm.c index 320545b98ff5..6088fa6ed137 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -160,6 +160,21 @@ static void hmm_invalidate_range(struct hmm *hmm, up_read(&hmm->mirrors_sem); } +static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) +{ + struct hmm *hmm = mm->hmm; + struct hmm_mirror *mirror; + struct hmm_mirror *mirror_next; + + down_write(&hmm->mirrors_sem); + list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) { + list_del_init(&mirror->list); + if (mirror->ops->release) + mirror->ops->release(mirror); + } + up_write(&hmm->mirrors_sem); +} + static void hmm_invalidate_range_start(struct mmu_notifier *mn, struct mm_struct *mm, unsigned long start, @@ -185,6 +200,7 @@ static void hmm_invalidate_range_end(struct mmu_notifier *mn, } static const struct mmu_notifier_ops hmm_mmu_notifier_ops = { + .release = hmm_release, .invalidate_range_start = hmm_invalidate_range_start, .invalidate_range_end = hmm_invalidate_range_end, }; @@ -230,7 +246,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror) struct hmm *hmm = mirror->hmm; down_write(&hmm->mirrors_sem); - list_del(&mirror->list); + list_del_init(&mirror->list); up_write(&hmm->mirrors_sem); } EXPORT_SYMBOL(hmm_mirror_unregister); -- 2.14.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 2018-03-16 19:14 ` [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse @ 2018-03-16 21:12 ` Andrew Morton 2018-03-16 21:26 ` Jerome Glisse 2018-03-17 2:36 ` John Hubbard 1 sibling, 1 reply; 15+ messages in thread From: Andrew Morton @ 2018-03-16 21:12 UTC (permalink / raw) To: jglisse Cc: linux-mm, linux-kernel, Ralph Campbell, stable, Evgeny Baskakov, Mark Hairgrove, John Hubbard On Fri, 16 Mar 2018 15:14:08 -0400 jglisse@redhat.com wrote: > The hmm_mirror_register() function registers a callback for when > the CPU pagetable is modified. Normally, the device driver will > call hmm_mirror_unregister() when the process using the device is > finished. However, if the process exits uncleanly, the struct_mm > can be destroyed with no warning to the device driver. Again, what are the user-visible effects of the bug? Such info is needed when others review our request for a -stable backport. And the many people who review -stable patches for integration into their own kernel trees will want to understand the benefit of the patch to their users. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 2018-03-16 21:12 ` Andrew Morton @ 2018-03-16 21:26 ` Jerome Glisse 2018-03-16 21:37 ` John Hubbard 0 siblings, 1 reply; 15+ messages in thread From: Jerome Glisse @ 2018-03-16 21:26 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, Ralph Campbell, stable, Evgeny Baskakov, Mark Hairgrove, John Hubbard On Fri, Mar 16, 2018 at 02:12:21PM -0700, Andrew Morton wrote: > On Fri, 16 Mar 2018 15:14:08 -0400 jglisse@redhat.com wrote: > > > The hmm_mirror_register() function registers a callback for when > > the CPU pagetable is modified. Normally, the device driver will > > call hmm_mirror_unregister() when the process using the device is > > finished. However, if the process exits uncleanly, the struct_mm > > can be destroyed with no warning to the device driver. > > Again, what are the user-visible effects of the bug? Such info is > needed when others review our request for a -stable backport. And the > many people who review -stable patches for integration into their own > kernel trees will want to understand the benefit of the patch to their > users. I have not had any issues in any of my own testing but nouveau driver is not as advance as the NVidia closed driver in respect to HMM inte- gration yet. If any issues they will happen between exit_mm() and exit_files() in do_exit() (kernel/exit.c) exit_mm() tear down the mm struct but without this callback the device driver might still be handling page fault and thus might potentialy tries to handle them against a dead mm_struct. So i am not sure what are the symptoms. To be fair there is no public driver using that part of HMM beside nouveau rfc patches. So at this point the impact on anybody is non existent. If anyone want to back- port nouveau HMM support once it make it upstream it will probably have to backport more things along the way. This is why i am not that aggressive on ccing stable so far. Cheers, J�r�me ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 2018-03-16 21:26 ` Jerome Glisse @ 2018-03-16 21:37 ` John Hubbard 0 siblings, 0 replies; 15+ messages in thread From: John Hubbard @ 2018-03-16 21:37 UTC (permalink / raw) To: Jerome Glisse, Andrew Morton Cc: linux-mm, linux-kernel, Ralph Campbell, stable, Evgeny Baskakov, Mark Hairgrove On 03/16/2018 02:26 PM, Jerome Glisse wrote: > On Fri, Mar 16, 2018 at 02:12:21PM -0700, Andrew Morton wrote: >> On Fri, 16 Mar 2018 15:14:08 -0400 jglisse@redhat.com wrote: >> >>> The hmm_mirror_register() function registers a callback for when >>> the CPU pagetable is modified. Normally, the device driver will >>> call hmm_mirror_unregister() when the process using the device is >>> finished. However, if the process exits uncleanly, the struct_mm >>> can be destroyed with no warning to the device driver. >> >> Again, what are the user-visible effects of the bug? Such info is >> needed when others review our request for a -stable backport. And the >> many people who review -stable patches for integration into their own >> kernel trees will want to understand the benefit of the patch to their >> users. > > I have not had any issues in any of my own testing but nouveau driver > is not as advance as the NVidia closed driver in respect to HMM inte- > gration yet. > > If any issues they will happen between exit_mm() and exit_files() in > do_exit() (kernel/exit.c) exit_mm() tear down the mm struct but without > this callback the device driver might still be handling page fault and > thus might potentialy tries to handle them against a dead mm_struct. > > So i am not sure what are the symptoms. To be fair there is no public > driver using that part of HMM beside nouveau rfc patches. So at this > point the impact on anybody is non existent. If anyone want to back- > port nouveau HMM support once it make it upstream it will probably > have to backport more things along the way. This is why i am not that > aggressive on ccing stable so far. The problem I'd like to avoid is: having a version of HMM in stable that is missing this new callback. And without it, once the driver starts doing actual concurrent operations, we can expect that the race condition will happen. It just seems unfortunate to have stable versions out there that would be exposed to this, when it only require a small patch to avoid it. On the other hand, it's also reasonable to claim that this is part of the evolving HMM feature, and as such, this new feature does not belong in stable. I'm not sure which argument carries more weight here. thanks, -- John Hubbard NVIDIA > > Cheers, > Jérôme > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 2018-03-16 19:14 ` [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse 2018-03-16 21:12 ` Andrew Morton @ 2018-03-17 2:36 ` John Hubbard 2018-03-17 3:47 ` John Hubbard 1 sibling, 1 reply; 15+ messages in thread From: John Hubbard @ 2018-03-17 2:36 UTC (permalink / raw) To: jglisse, linux-mm Cc: Andrew Morton, linux-kernel, Ralph Campbell, stable, Evgeny Baskakov, Mark Hairgrove On 03/16/2018 12:14 PM, jglisse@redhat.com wrote: > From: Ralph Campbell <rcampbell@nvidia.com> > <snip> > +static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) > +{ > + struct hmm *hmm = mm->hmm; > + struct hmm_mirror *mirror; > + struct hmm_mirror *mirror_next; > + > + down_write(&hmm->mirrors_sem); > + list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) { > + list_del_init(&mirror->list); > + if (mirror->ops->release) > + mirror->ops->release(mirror); > + } > + up_write(&hmm->mirrors_sem); > +} > + OK, as for actual code review: This part of the locking looks good. However, I think it can race against hmm_mirror_register(), because hmm_mirror_register() will just add a new mirror regardless. So: thread 1 thread 2 -------------- ----------------- hmm_release hmm_mirror_register down_write(&hmm->mirrors_sem); <blocked: waiting for sem> // deletes all list items up_write unblocked: adds new mirror ...so I think we need a way to back out of any pending hmm_mirror_register() calls, as part of the .release steps, right? It seems hard for the device driver, which could be inside of hmm_mirror_register(), to handle that. Especially considering that right now, hmm_mirror_register() will return success in this case--so there is no indication that anything is wrong. Maybe hmm_mirror_register() could return an error (and not add to the mirror list), in such a situation, how's that sound? thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 2018-03-17 2:36 ` John Hubbard @ 2018-03-17 3:47 ` John Hubbard 2018-03-17 4:39 ` John Hubbard 0 siblings, 1 reply; 15+ messages in thread From: John Hubbard @ 2018-03-17 3:47 UTC (permalink / raw) To: jglisse, linux-mm Cc: Andrew Morton, linux-kernel, Ralph Campbell, stable, Evgeny Baskakov, Mark Hairgrove On 03/16/2018 07:36 PM, John Hubbard wrote: > On 03/16/2018 12:14 PM, jglisse@redhat.com wrote: >> From: Ralph Campbell <rcampbell@nvidia.com> >> > > <snip> > >> +static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) >> +{ >> + struct hmm *hmm = mm->hmm; >> + struct hmm_mirror *mirror; >> + struct hmm_mirror *mirror_next; >> + >> + down_write(&hmm->mirrors_sem); >> + list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) { >> + list_del_init(&mirror->list); >> + if (mirror->ops->release) >> + mirror->ops->release(mirror); >> + } >> + up_write(&hmm->mirrors_sem); >> +} >> + > > OK, as for actual code review: > > This part of the locking looks good. However, I think it can race against > hmm_mirror_register(), because hmm_mirror_register() will just add a new > mirror regardless. > > So: > > thread 1 thread 2 > -------------- ----------------- > hmm_release hmm_mirror_register > down_write(&hmm->mirrors_sem); <blocked: waiting for sem> > // deletes all list items > up_write > unblocked: adds new mirror > > > ...so I think we need a way to back out of any pending hmm_mirror_register() > calls, as part of the .release steps, right? It seems hard for the device driver, > which could be inside of hmm_mirror_register(), to handle that. Especially considering > that right now, hmm_mirror_register() will return success in this case--so > there is no indication that anything is wrong. > > Maybe hmm_mirror_register() could return an error (and not add to the mirror list), > in such a situation, how's that sound? > In other words, I think this would help (not tested yet beyond a quick compile, but it's pretty simple): diff --git a/mm/hmm.c b/mm/hmm.c index 7ccca5478ea1..da39f8522dca 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -66,6 +66,7 @@ struct hmm { struct list_head mirrors; struct mmu_notifier mmu_notifier; struct rw_semaphore mirrors_sem; + bool shutting_down; }; /* @@ -99,6 +100,7 @@ static struct hmm *hmm_register(struct mm_struct *mm) INIT_LIST_HEAD(&hmm->ranges); spin_lock_init(&hmm->lock); hmm->mm = mm; + hmm->shutting_down = false; /* * We should only get here if hold the mmap_sem in write mode ie on @@ -167,6 +169,7 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) struct hmm_mirror *mirror_next; down_write(&hmm->mirrors_sem); + hmm->shutting_down = true; list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) { list_del_init(&mirror->list); if (mirror->ops->release) @@ -227,6 +230,10 @@ int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm) return -ENOMEM; down_write(&mirror->hmm->mirrors_sem); + if (mirror->hmm->shutting_down) { + up_write(&mirror->hmm->mirrors_sem); + return -ESRCH; + } list_add(&mirror->list, &mirror->hmm->mirrors); up_write(&mirror->hmm->mirrors_sem); thanks, -- John Hubbard NVIDIA ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 2018-03-17 3:47 ` John Hubbard @ 2018-03-17 4:39 ` John Hubbard 0 siblings, 0 replies; 15+ messages in thread From: John Hubbard @ 2018-03-17 4:39 UTC (permalink / raw) To: jglisse, linux-mm Cc: Andrew Morton, linux-kernel, Ralph Campbell, stable, Evgeny Baskakov, Mark Hairgrove On 03/16/2018 08:47 PM, John Hubbard wrote: > On 03/16/2018 07:36 PM, John Hubbard wrote: >> On 03/16/2018 12:14 PM, jglisse@redhat.com wrote: >>> From: Ralph Campbell <rcampbell@nvidia.com> >>> >> >> <snip> >> >>> +static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) >>> +{ >>> + struct hmm *hmm = mm->hmm; >>> + struct hmm_mirror *mirror; >>> + struct hmm_mirror *mirror_next; >>> + >>> + down_write(&hmm->mirrors_sem); >>> + list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) { >>> + list_del_init(&mirror->list); >>> + if (mirror->ops->release) >>> + mirror->ops->release(mirror); >>> + } >>> + up_write(&hmm->mirrors_sem); >>> +} >>> + >> >> OK, as for actual code review: >> >> This part of the locking looks good. However, I think it can race against >> hmm_mirror_register(), because hmm_mirror_register() will just add a new >> mirror regardless. >> >> So: >> >> thread 1 thread 2 >> -------------- ----------------- >> hmm_release hmm_mirror_register >> down_write(&hmm->mirrors_sem); <blocked: waiting for sem> >> // deletes all list items >> up_write >> unblocked: adds new mirror >> >> Mark Hairgrove just pointed out some more fun facts: 1. Because hmm_mirror_register() needs to be called with an mm that has a non-zero refcount, you generally cannot get an hmm_release callback, so the above race should not happen. 2. We looked around, and the code is missing a call to mmu_notifier_unregister(). That means that it is going to leak memory and not let the mm get released either. Maybe having each mirror have its own mmu notifier callback is a possible way to solve this. thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 04/14] mm/hmm: hmm_pfns_bad() was accessing wrong struct [not found] <20180316191414.3223-1-jglisse@redhat.com> 2018-03-16 19:14 ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze jglisse 2018-03-16 19:14 ` [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse @ 2018-03-16 19:14 ` jglisse 2018-03-17 2:04 ` John Hubbard 2 siblings, 1 reply; 15+ messages in thread From: jglisse @ 2018-03-16 19:14 UTC (permalink / raw) To: linux-mm Cc: Andrew Morton, linux-kernel, Jérôme Glisse, stable, Evgeny Baskakov, Ralph Campbell, Mark Hairgrove, John Hubbard From: Jérôme Glisse <jglisse@redhat.com> The private field of mm_walk struct point to an hmm_vma_walk struct and not to the hmm_range struct desired. Fix to get proper struct pointer. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: stable@vger.kernel.org Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> --- mm/hmm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/hmm.c b/mm/hmm.c index 6088fa6ed137..64d9e7dae712 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -293,7 +293,8 @@ static int hmm_pfns_bad(unsigned long addr, unsigned long end, struct mm_walk *walk) { - struct hmm_range *range = walk->private; + struct hmm_vma_walk *hmm_vma_walk = walk->private; + struct hmm_range *range = hmm_vma_walk->range; hmm_pfn_t *pfns = range->pfns; unsigned long i; -- 2.14.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 04/14] mm/hmm: hmm_pfns_bad() was accessing wrong struct 2018-03-16 19:14 ` [PATCH 04/14] mm/hmm: hmm_pfns_bad() was accessing wrong struct jglisse @ 2018-03-17 2:04 ` John Hubbard 0 siblings, 0 replies; 15+ messages in thread From: John Hubbard @ 2018-03-17 2:04 UTC (permalink / raw) To: jglisse, linux-mm Cc: Andrew Morton, linux-kernel, stable, Evgeny Baskakov, Ralph Campbell, Mark Hairgrove On 03/16/2018 12:14 PM, jglisse@redhat.com wrote: > From: Jérôme Glisse <jglisse@redhat.com> > > The private field of mm_walk struct point to an hmm_vma_walk struct and > not to the hmm_range struct desired. Fix to get proper struct pointer. > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com> > Cc: stable@vger.kernel.org > Cc: Evgeny Baskakov <ebaskakov@nvidia.com> > Cc: Ralph Campbell <rcampbell@nvidia.com> > Cc: Mark Hairgrove <mhairgrove@nvidia.com> > Cc: John Hubbard <jhubbard@nvidia.com> > --- > mm/hmm.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/mm/hmm.c b/mm/hmm.c > index 6088fa6ed137..64d9e7dae712 100644 > --- a/mm/hmm.c > +++ b/mm/hmm.c > @@ -293,7 +293,8 @@ static int hmm_pfns_bad(unsigned long addr, > unsigned long end, > struct mm_walk *walk) > { > - struct hmm_range *range = walk->private; > + struct hmm_vma_walk *hmm_vma_walk = walk->private; > + struct hmm_range *range = hmm_vma_walk->range; > hmm_pfn_t *pfns = range->pfns; > unsigned long i; > This fix looks good. I also checked the other uses of walk->private, of course, but it was only this one that was wrong. I think this patch also belongs in -stable, because it is a simple bug fix. For the description, well...actually, because ->range is the first element in struct hmm_vma_walk, you probably end up with the same pointer value, both before and after this fix. So maybe there are no symptoms to see. Maybe that's an argument for *not* putting it in -stable, too. I'll leave that question to more experienced people. Either way, you can add: Reviewed by: John Hubbard <jhubbard@nvidia.com> thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2018-03-17 4:39 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20180316191414.3223-1-jglisse@redhat.com>
2018-03-16 19:14 ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze jglisse
2018-03-16 21:09 ` Andrew Morton
2018-03-16 21:18 ` Jerome Glisse
2018-03-16 21:35 ` Andrew Morton
2018-03-16 21:40 ` John Hubbard
2018-03-17 1:20 ` [PATCH 02/14] mm/hmm: fix header file if/else/endif maze v2 jglisse
2018-03-16 19:14 ` [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2 jglisse
2018-03-16 21:12 ` Andrew Morton
2018-03-16 21:26 ` Jerome Glisse
2018-03-16 21:37 ` John Hubbard
2018-03-17 2:36 ` John Hubbard
2018-03-17 3:47 ` John Hubbard
2018-03-17 4:39 ` John Hubbard
2018-03-16 19:14 ` [PATCH 04/14] mm/hmm: hmm_pfns_bad() was accessing wrong struct jglisse
2018-03-17 2:04 ` John Hubbard
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox