From: Jerome Glisse <jglisse@redhat.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	Ingo Molnar <mingo@kernel.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH] x86/mm: do not BUG_ON() on stall pgd entries
Date: Thu, 1 Jun 2017 11:11:09 -0400
Message-ID: <20170601151107.GC3961@redhat.com>
In-Reply-To: <CALCETrWhehYF-2pPxH5S6px=hi=MaTeO6OC7_Ro3MgfKpyBhxg@mail.gmail.com>

On Thu, Jun 01, 2017 at 07:38:25AM -0700, Andy Lutomirski wrote:
> On Thu, Jun 1, 2017 at 7:33 AM, Jerome Glisse <jglisse@redhat.com> wrote:
> > On Thu, Jun 01, 2017 at 06:59:04AM -0700, Andy Lutomirski wrote:
> >> On Wed, May 31, 2017 at 8:03 AM, Jerome Glisse <jglisse@redhat.com> wrote:
> >> > Since af2cf278ef4f ("Don't remove PGD entries in remove_pagetable()")
> >> > we no longer clean up stale pgd entries and thus the BUG_ON() inside
> >> > sync_global_pgds() is wrong.
> >> >
> >> > This patch removes the BUG_ON() and unconditionally updates stale pgd
> >> > entries.
> >> >
> >> > Signed-off-by: Jerome Glisse <jglisse@redhat.com>
> >> > Cc: Ingo Molnar <mingo@kernel.org>
> >> > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >> > ---
> >> >  arch/x86/mm/init_64.c | 7 +------
> >> >  1 file changed, 1 insertion(+), 6 deletions(-)
> >> >
> >> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> >> > index ff95fe8..36b9020 100644
> >> > --- a/arch/x86/mm/init_64.c
> >> > +++ b/arch/x86/mm/init_64.c
> >> > @@ -123,12 +123,7 @@ void sync_global_pgds(unsigned long start, unsigned long end)
> >> >                         pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
> >> >                         spin_lock(pgt_lock);
> >> >
> >> > -                       if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
> >> > -                               BUG_ON(p4d_page_vaddr(*p4d)
> >> > -                                      != p4d_page_vaddr(*p4d_ref));
> >> > -
> >> > -                       if (p4d_none(*p4d))
> >> > -                               set_p4d(p4d, *p4d_ref);
> >> > +                       set_p4d(p4d, *p4d_ref);
> >>
> >> If we have a mismatch in the vmalloc range, vmalloc_fault is going to
> >> screw up and we'll end up using incorrect page tables.
> >>
> >> What's causing the mismatch?  If you're hitting this BUG in practice,
> >> I suspect we have a bug elsewhere.
> >
> > No bug elsewhere. Simply hotplug memory, hot-remove the same memory you
> > just hotplugged, then hotplug it again and you will trigger this. On
> > the first hotplug we allocate p4d/pud for the struct pages area; on
> > hot remove we free that memory and clear the p4d/pud in the init_mm pgd
> > but not in any of the other pgds.
> 
> That sounds like a bug to me.  Either we should remove the stale
> entries and fix all the attendant races, or we should unconditionally
> allocate second-highest-level kernel page tables in unremovable memory
> and never free them.  I prefer the latter even though it's slightly
> slower.
> 
> > So at that point the next hotplug
> > will trigger the BUG because of stale entries from the first hotplug.
> 
> By the time we have a pgd with an entry pointing off into the woods,
> we've already lost.  Removing the BUG just hides the problem.

Then why did you sign off on af2cf278ef4f9289f88504c3e03cb12f76027575?
That commit is what introduced this change in behavior.

If I understand you correctly, you want to avoid deallocating the p4d/pud
directory pages when hot remove happens? But this happens in common,
non-arch-specific code, vmemmap_populate_basepages(), though we could
make the x86 vmemmap_populate() code behave differently.

So I am not sure how to proceed here. My first attempt was to undo
af2cf278ef4f9289f88504c3e03cb12f76027575 so that we keep all pgds
synchronized. Now, if we want to special-case p4d/pud allocation, that's
a different approach altogether.
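
To make the sequence concrete, here is a minimal userspace sketch of the
hotplug / hot remove / hotplug case. It is not kernel code: the toy_*
names, the slot arrays and the integer "table ids" are all made up for
illustration, and it only mimics the shape of the check that the patch
removes from sync_global_pgds().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical, not kernel code): a "pgd" is an array of
 * slots, each holding the id of the lower-level table it points to.
 * An id of 0 stands for "none". */
#define NR_SLOTS 4
#define NR_PGDS  3	/* pgds[0] plays the role of the reference pgd */

struct toy_pgd {
	unsigned long slot[NR_SLOTS];
};

static struct toy_pgd pgds[NR_PGDS];
static unsigned long next_table_id = 1;

/* Hotplug: install a freshly allocated table in the reference pgd only. */
void toy_hotplug(size_t idx)
{
	assert(idx < NR_SLOTS);
	pgds[0].slot[idx] = next_table_id++;
}

/* Hot remove: clear the reference entry; the other pgds keep their
 * now stale entry, which is the behavior described in this thread. */
void toy_hotremove(size_t idx)
{
	assert(idx < NR_SLOTS);
	pgds[0].slot[idx] = 0;
}

/* Strict sync, shaped like the pre-patch logic: returns false where
 * the real code would hit the BUG_ON(), otherwise fills empty slots. */
bool toy_sync_strict(size_t idx)
{
	unsigned long ref;
	size_t i;

	assert(idx < NR_SLOTS);
	ref = pgds[0].slot[idx];
	for (i = 1; i < NR_PGDS; i++) {
		unsigned long *p = &pgds[i].slot[idx];

		if (ref != 0 && *p != 0 && *p != ref)
			return false;	/* stale entry disagrees with reference */
		if (*p == 0)
			*p = ref;
	}
	return true;
}

/* Unconditional sync, shaped like the patched logic: always overwrite. */
void toy_sync_overwrite(size_t idx)
{
	size_t i;

	assert(idx < NR_SLOTS);
	for (i = 1; i < NR_PGDS; i++)
		pgds[i].slot[idx] = pgds[0].slot[idx];
}
```

In this model, toy_hotplug(1) followed by toy_sync_strict(1) succeeds.
After toy_hotremove(1) and a second toy_hotplug(1), the next
toy_sync_strict(1) returns false because the other pgds still hold the
id from the first hotplug. toy_sync_overwrite(1) brings them back in
line with the reference. The model says nothing about whether anything
still uses the stale table in between, which is the concern raised above.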

Cheers,
Jerome


Thread overview: 7+ messages
2017-05-31 15:03 [PATCH] x86/mm: do not BUG_ON() on stall pgd entries Jérôme Glisse
2017-06-01  9:57 ` Kirill A. Shutemov
2017-06-01 13:59 ` Andy Lutomirski
2017-06-01 14:33   ` Jerome Glisse
2017-06-01 14:38     ` Andy Lutomirski
2017-06-01 15:11       ` Jerome Glisse [this message]
2017-06-01 18:38         ` Andy Lutomirski
