* [PATCH] x86/mm: pgds getting out of sync after memory hot remove @ 2017-05-19 18:01 Jérôme Glisse 2017-05-19 18:01 ` [PATCH] x86/mm: synchronize pgd in vmemmap_free() Jérôme Glisse 2017-05-22 13:12 ` [PATCH] x86/mm: pgds getting out of sync after memory hot remove Kirill A. Shutemov 0 siblings, 2 replies; 6+ messages in thread From: Jérôme Glisse @ 2017-05-19 18:01 UTC (permalink / raw) To: linux-kernel, linux-mm Cc: Jérôme Glisse, Kirill A. Shutemov, Andrew Morton, Ingo Molnar, Michal Hocko, Mel Gorman After memory hot remove it seems we do not synchronize pgds for kernel virtual memory range (on vmemmap_free()). This seems bogus to me as it means we are left with stall entry for process with mm != mm_init Yet i am puzzle by the fact that i am only now hitting this issue. It never was an issue with 4.12 or before ie HMM never triggered following BUG_ON inside sync_global_pgds(): if (!p4d_none(*p4d_ref) && !p4d_none(*p4d)) BUG_ON(p4d_page_vaddr(*p4d) != p4d_page_vaddr(*p4d_ref)); It seems that Kirill 5 level page table changes play a role in this behavior change. I could not bisect because HMM is painfull to rebase for each bisection step so that is just my best guess. Am i missing something here ? Am i wrong in assuming that should sync pgd on vmemmap_free() ? If so anyone have a good guess on why i am now seeing the above BUG_ON ? Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@suse.de> JA(C)rA'me Glisse (1): x86/mm: synchronize pgd in vmemmap_free() arch/x86/mm/init_64.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) -- 2.4.11 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] x86/mm: synchronize pgd in vmemmap_free() 2017-05-19 18:01 [PATCH] x86/mm: pgds getting out of sync after memory hot remove Jérôme Glisse @ 2017-05-19 18:01 ` Jérôme Glisse 2017-05-20 0:34 ` John Hubbard 2017-05-22 13:12 ` [PATCH] x86/mm: pgds getting out of sync after memory hot remove Kirill A. Shutemov 1 sibling, 1 reply; 6+ messages in thread From: Jérôme Glisse @ 2017-05-19 18:01 UTC (permalink / raw) To: linux-kernel, linux-mm Cc: Jérôme Glisse, Kirill A. Shutemov, Andrew Morton, Ingo Molnar, Michal Hocko, Mel Gorman When we free kernel virtual map we should synchronize p4d/pud for all the pgds to avoid any stall entry in non canonical pgd. Signed-off-by: JA(C)rA'me Glisse <jglisse@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@suse.de> --- arch/x86/mm/init_64.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index ff95fe8..df753f8 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -108,8 +108,6 @@ void sync_global_pgds(unsigned long start, unsigned long end) BUILD_BUG_ON(pgd_none(*pgd_ref)); p4d_ref = p4d_offset(pgd_ref, address); - if (p4d_none(*p4d_ref)) - continue; spin_lock(&pgd_lock); list_for_each_entry(page, &pgd_list, lru) { @@ -123,12 +121,16 @@ void sync_global_pgds(unsigned long start, unsigned long end) pgt_lock = &pgd_page_get_mm(page)->page_table_lock; spin_lock(pgt_lock); - if (!p4d_none(*p4d_ref) && !p4d_none(*p4d)) - BUG_ON(p4d_page_vaddr(*p4d) - != p4d_page_vaddr(*p4d_ref)); - - if (p4d_none(*p4d)) + if (p4d_none(*p4d_ref)) { set_p4d(p4d, *p4d_ref); + } else { + if (!p4d_none(*p4d_ref) && !p4d_none(*p4d)) + BUG_ON(p4d_page_vaddr(*p4d) + != p4d_page_vaddr(*p4d_ref)); + + if (p4d_none(*p4d)) + set_p4d(p4d, *p4d_ref); + } spin_unlock(pgt_lock); } @@ -1024,6 +1026,7 @@ remove_pagetable(unsigned long start, unsigned long end, bool direct) void __ref vmemmap_free(unsigned long start, unsigned long end) { remove_pagetable(start, end, false); + sync_global_pgds(start, end - 1); } #ifdef CONFIG_MEMORY_HOTREMOVE -- 2.4.11 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm: synchronize pgd in vmemmap_free() 2017-05-19 18:01 ` [PATCH] x86/mm: synchronize pgd in vmemmap_free() Jérôme Glisse @ 2017-05-20 0:34 ` John Hubbard 0 siblings, 0 replies; 6+ messages in thread From: John Hubbard @ 2017-05-20 0:34 UTC (permalink / raw) To: Jérôme Glisse, linux-kernel, linux-mm Cc: Kirill A. Shutemov, Andrew Morton, Ingo Molnar, Michal Hocko, Mel Gorman Hi Jerome, On 05/19/2017 11:01 AM, Jérôme Glisse wrote: > When we free kernel virtual map we should synchronize p4d/pud for > all the pgds to avoid any stall entry in non canonical pgd. "any stale entry in the non-canonical pgd", is what I think you meant to type there. Also, it would be nice to clarify that commit description a bit: I'm not sure what is meant here by a "non-canonical pgd". Also, it seems like the reshuffling of the internals of sync_global_pgds() deserves at least some mention here. More below. > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com> > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Ingo Molnar <mingo@kernel.org> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Mel Gorman <mgorman@suse.de> > --- > arch/x86/mm/init_64.c | 17 ++++++++++------- > 1 file changed, 10 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c > index ff95fe8..df753f8 100644 > --- a/arch/x86/mm/init_64.c > +++ b/arch/x86/mm/init_64.c > @@ -108,8 +108,6 @@ void sync_global_pgds(unsigned long start, unsigned long end) > BUILD_BUG_ON(pgd_none(*pgd_ref)); > p4d_ref = p4d_offset(pgd_ref, address); > > - if (p4d_none(*p4d_ref)) > - continue; > > spin_lock(&pgd_lock); > list_for_each_entry(page, &pgd_list, lru) { > @@ -123,12 +121,16 @@ void sync_global_pgds(unsigned long start, unsigned long end) > pgt_lock = &pgd_page_get_mm(page)->page_table_lock; > spin_lock(pgt_lock); > > - if (!p4d_none(*p4d_ref) && !p4d_none(*p4d)) > - BUG_ON(p4d_page_vaddr(*p4d) > - != p4d_page_vaddr(*p4d_ref)); > - > - if (p4d_none(*p4d)) > + if (p4d_none(*p4d_ref)) { > set_p4d(p4d, *p4d_ref); Is the intention really to set p4d to a zeroed *p4d_ref, or is that a mistake? > + } else { > + if (!p4d_none(*p4d_ref) && !p4d_none(*p4d)) I think the code needs to be somewhat restructured, but as it stands, the above !p4d_none(*p4d_ref) will always be true, because first part of the if/else checked for the opposite case: p4d_none(*p4d_ref). This is a side effect of moving that block of code. > + BUG_ON(p4d_page_vaddr(*p4d) > + != p4d_page_vaddr(*p4d_ref)); > + > + if (p4d_none(*p4d)) > + set_p4d(p4d, *p4d_ref); > + } > > spin_unlock(pgt_lock); > } > @@ -1024,6 +1026,7 @@ remove_pagetable(unsigned long start, unsigned long end, bool direct) > void __ref vmemmap_free(unsigned long start, unsigned long end) > { > remove_pagetable(start, end, false); > + sync_global_pgds(start, end - 1); This does fix the HMM crash that I was seeing in hmm-next. thanks, John Hubbard NVIDIA > } > > #ifdef CONFIG_MEMORY_HOTREMOVE > -- > 2.4.11 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm: pgds getting out of sync after memory hot remove 2017-05-19 18:01 [PATCH] x86/mm: pgds getting out of sync after memory hot remove Jérôme Glisse 2017-05-19 18:01 ` [PATCH] x86/mm: synchronize pgd in vmemmap_free() Jérôme Glisse @ 2017-05-22 13:12 ` Kirill A. Shutemov 2017-05-22 14:11 ` Jerome Glisse 1 sibling, 1 reply; 6+ messages in thread From: Kirill A. Shutemov @ 2017-05-22 13:12 UTC (permalink / raw) To: Jérôme Glisse Cc: linux-kernel, linux-mm, Kirill A. Shutemov, Andrew Morton, Ingo Molnar, Michal Hocko, Mel Gorman On Fri, May 19, 2017 at 02:01:26PM -0400, Jerome Glisse wrote: > After memory hot remove it seems we do not synchronize pgds for kernel > virtual memory range (on vmemmap_free()). This seems bogus to me as it > means we are left with stall entry for process with mm != mm_init > > Yet i am puzzle by the fact that i am only now hitting this issue. It > never was an issue with 4.12 or before ie HMM never triggered following > BUG_ON inside sync_global_pgds(): > > if (!p4d_none(*p4d_ref) && !p4d_none(*p4d)) > BUG_ON(p4d_page_vaddr(*p4d) != p4d_page_vaddr(*p4d_ref)); > > > It seems that Kirill 5 level page table changes play a role in this > behavior change. I could not bisect because HMM is painfull to rebase > for each bisection step so that is just my best guess. > > > Am i missing something here ? Am i wrong in assuming that should sync > pgd on vmemmap_free() ? If so anyone have a good guess on why i am now > seeing the above BUG_ON ? What would we gain by syncing pgd on free? Stale pgds are fine as long as they are not referenced (use-after-free case). Syncing is addtional work. See af2cf278ef4f ("x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()") and 5372e155a28f ("x86/mm: Drop unused argument 'removed' from sync_global_pgds()"). -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm: pgds getting out of sync after memory hot remove 2017-05-22 13:12 ` [PATCH] x86/mm: pgds getting out of sync after memory hot remove Kirill A. Shutemov @ 2017-05-22 14:11 ` Jerome Glisse 2017-05-22 14:29 ` Kirill A. Shutemov 0 siblings, 1 reply; 6+ messages in thread From: Jerome Glisse @ 2017-05-22 14:11 UTC (permalink / raw) To: Kirill A. Shutemov Cc: linux-kernel, linux-mm, Kirill A. Shutemov, Andrew Morton, Ingo Molnar, Michal Hocko, Mel Gorman On Mon, May 22, 2017 at 04:12:15PM +0300, Kirill A. Shutemov wrote: > On Fri, May 19, 2017 at 02:01:26PM -0400, Jerome Glisse wrote: > > After memory hot remove it seems we do not synchronize pgds for kernel > > virtual memory range (on vmemmap_free()). This seems bogus to me as it > > means we are left with stall entry for process with mm != mm_init > > > > Yet i am puzzle by the fact that i am only now hitting this issue. It > > never was an issue with 4.12 or before ie HMM never triggered following > > BUG_ON inside sync_global_pgds(): > > > > if (!p4d_none(*p4d_ref) && !p4d_none(*p4d)) > > BUG_ON(p4d_page_vaddr(*p4d) != p4d_page_vaddr(*p4d_ref)); > > > > > > It seems that Kirill 5 level page table changes play a role in this > > behavior change. I could not bisect because HMM is painfull to rebase > > for each bisection step so that is just my best guess. > > > > > > Am i missing something here ? Am i wrong in assuming that should sync > > pgd on vmemmap_free() ? If so anyone have a good guess on why i am now > > seeing the above BUG_ON ? > > What would we gain by syncing pgd on free? Stale pgds are fine as long as > they are not referenced (use-after-free case). Syncing is addtional work. Well then how do i avoid the BUG_ON above ? Because the init_mm pgd is clear but none of the stall entry in any other mm. So if i unplug memory and replug memory at exact same address it tries to allocate new p4d/pud for struct page area and then when sync_global_pgds() is call it goes over the list of pgd and BUG_ON() : if (!p4d_none(*p4d_ref) && !p4d_none(*p4d)) BUG_ON(p4d_page_vaddr(*p4d) != p4d_page_vaddr(*p4d_ref)); So to me either above check need to go and we should overwritte pgd no matter what or we should restore previous behavior. I don't mind either one. Cheers, Jerome -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm: pgds getting out of sync after memory hot remove 2017-05-22 14:11 ` Jerome Glisse @ 2017-05-22 14:29 ` Kirill A. Shutemov 0 siblings, 0 replies; 6+ messages in thread From: Kirill A. Shutemov @ 2017-05-22 14:29 UTC (permalink / raw) Cc: linux-kernel, linux-mm, Kirill A. Shutemov, Andrew Morton, Michal Hocko, Mel Gorman On Mon, May 22, 2017 at 10:11:51AM -0400, Jerome Glisse wrote: > On Mon, May 22, 2017 at 04:12:15PM +0300, Kirill A. Shutemov wrote: > > On Fri, May 19, 2017 at 02:01:26PM -0400, Jerome Glisse wrote: > > > After memory hot remove it seems we do not synchronize pgds for kernel > > > virtual memory range (on vmemmap_free()). This seems bogus to me as it > > > means we are left with stall entry for process with mm != mm_init > > > > > > Yet i am puzzle by the fact that i am only now hitting this issue. It > > > never was an issue with 4.12 or before ie HMM never triggered following > > > BUG_ON inside sync_global_pgds(): > > > > > > if (!p4d_none(*p4d_ref) && !p4d_none(*p4d)) > > > BUG_ON(p4d_page_vaddr(*p4d) != p4d_page_vaddr(*p4d_ref)); > > > > > > > > > It seems that Kirill 5 level page table changes play a role in this > > > behavior change. I could not bisect because HMM is painfull to rebase > > > for each bisection step so that is just my best guess. > > > > > > > > > Am i missing something here ? Am i wrong in assuming that should sync > > > pgd on vmemmap_free() ? If so anyone have a good guess on why i am now > > > seeing the above BUG_ON ? > > > > What would we gain by syncing pgd on free? Stale pgds are fine as long as > > they are not referenced (use-after-free case). Syncing is addtional work. > > Well then how do i avoid the BUG_ON above ? Because the init_mm pgd is > clear but none of the stall entry in any other mm. So if i unplug memory > and replug memory at exact same address it tries to allocate new p4d/pud > for struct page area and then when sync_global_pgds() is call it goes > over the list of pgd and BUG_ON() : > > if (!p4d_none(*p4d_ref) && !p4d_none(*p4d)) > BUG_ON(p4d_page_vaddr(*p4d) != p4d_page_vaddr(*p4d_ref)); > > > So to me either above check need to go and we should overwritte pgd no > matter what or we should restore previous behavior. I don't mind either > one. I would prefer to drop the BUG_ON. Ingo? -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-05-22 14:35 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-05-19 18:01 [PATCH] x86/mm: pgds getting out of sync after memory hot remove Jérôme Glisse 2017-05-19 18:01 ` [PATCH] x86/mm: synchronize pgd in vmemmap_free() Jérôme Glisse 2017-05-20 0:34 ` John Hubbard 2017-05-22 13:12 ` [PATCH] x86/mm: pgds getting out of sync after memory hot remove Kirill A. Shutemov 2017-05-22 14:11 ` Jerome Glisse 2017-05-22 14:29 ` Kirill A. Shutemov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).