linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jerome Glisse <jglisse@redhat.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: akpm@linux-foundation.org,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andy Lutomirski <luto@kernel.org>,
	Logan Gunthorpe <logang@deltatee.com>
Subject: Re: [PATCH] x86/mm/hotplug: fix BUG_ON() after hotremove by not freeing pud v2
Date: Sat, 24 Jun 2017 14:05:22 -0400	[thread overview]
Message-ID: <20170624180521.GA2830@redhat.com> (raw)
In-Reply-To: <20170624064559.3upsr2temhjlw2jb@gmail.com>

On Sat, Jun 24, 2017 at 08:45:59AM +0200, Ingo Molnar wrote:
> 
> * Jerome Glisse <jglisse@redhat.com> wrote:
> 
> > On Wed, Jun 07, 2017 at 09:17:06PM +0300, Kirill A. Shutemov wrote:
> > > On Wed, Jun 07, 2017 at 01:38:00PM -0400, Jerome Glisse wrote:
> > > > > On Wed, Jun 07, 2017 at 08:03:25PM +0300, Kirill A. Shutemov wrote:
> > > > > > On Wed, Jun 07, 2017 at 10:46:20AM -0400, jglisse@redhat.com wrote:
> > > > > > > From: Jerome Glisse <jglisse@redhat.com>
> > > > > > > 
> > > > > > > With commit af2cf278ef4f we no longer free pud so that we do not
> > > > > > > have synchronize all pgd on hotremove/vfree. But the new 5 level
> > > > > > > page table patchset reverted that for 4 level page table.
> > > > > > > 
> > > > > > > This patch restore af2cf278ef4f and disable free_pud() if we are
> > > > > > > in the 4 level page table case thus avoiding BUG_ON() after hot-
> > > > > > > remove.
> > > > > > > 
> > > > > > > af2cf278ef4f x86/mm/hotplug: Don't remove PGD entries in
> > > > > > > remove_pagetable()
> > > > > > > 
> > > > > > > Changed since v1:
> > > > > > >   - make free_pud() conditional on the number of page table
> > > > > > >     level
> > > > > > >   - improved commit message
> > > > > > > 
> > > > > > > Signed-off-by: Jerome Glisse <jglisse@redhat.com>
> > > > > > > Cc: Andy Lutomirski <luto@kernel.org>
> > > > > > > Cc: Ingo Molnar <mingo@kernel.org>
> > > > > > > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > > > > Cc: Logan Gunthorpe <logang@deltatee.com>
> > > > > > > > thus we now trigger a BUG_ON() l128 in sync_global_pgds()
> > > > > > > >
> > > > > > > > This patch remove free_pud() like in af2cf278ef4f
> > > > > > > ---
> > > > > > >  arch/x86/mm/init_64.c | 11 +++++++++++
> > > > > > >  1 file changed, 11 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> > > > > > > index 95651dc..61028bc 100644
> > > > > > > --- a/arch/x86/mm/init_64.c
> > > > > > > +++ b/arch/x86/mm/init_64.c
> > > > > > > @@ -771,6 +771,16 @@ static void __meminit free_pmd_table(pmd_t
> > > > > > > *pmd_start, pud_t *pud)
> > > > > > >  	spin_unlock(&init_mm.page_table_lock);
> > > > > > >  }
> > > > > > >  
> > > > > > > +/*
> > > > > > > + * For 4 levels page table we do not want to free puds but for 5 levels
> > > > > > > + * we should free them. This code also need to change to adapt for boot
> > > > > > > + * time switching between 4 and 5 level.
> > > > > > > + */
> > > > > > > +#if CONFIG_PGTABLE_LEVELS == 4
> > > > > > > +static inline void free_pud_table(pud_t *pud_start, p4d_t *p4d)
> > > > > > > +{
> > > > > > > +}
> > > > > > 
> > > > > > Just "if (CONFIG_PGTABLE_LEVELS > 4)" before calling free_pud_table(), but
> > > > > > okay -- I'll rework it anyway for boot-time switching.
> > > > > 
> > > > > Err. "if (CONFIG_PGTABLE_LEVELS == 4)" obviously.
> > > > 
> > > > You want me to respawn a v3 or is that good enough until you finish
> > > > boot time 5 level page table ?
> > > 
> > > It doesn't matter for me. Upto Ingo.
> > 
> > Andrew any news on this ? This fix a regression in 4.12 so it would be nice to
> > have this fix or similar in. I can repost a v3 without inline ie directly ifdefing
> > the callsite.
> > 
> > Note that Kyrill will rework that but i think this is 4.13 material.
> 
> Please don't #ifdef the call site or tweak the inlines - isn't what Kirill 
> suggested:
> 
> 	if (CONFIG_PGTABLE_LEVELS == 4)
> 
> at the call site enough to fix the bug?

Right solution is if (CONFIG_PGTABLE_LEVELS == 5) at call site. I will spawn
a v3 with that instead of inline #if/#else

> 
> BTW., how can this be a regression, if in v4.12 CONFIG_PGTABLE_LEVELS is always 4?

So in af2cf278ef4f we no longer free pud and no longer synchronize pgd
because if we don't free pud that is pointless. With Kirill 5 level page
table code we need to free pud when in 5 level page table but not free
p4d. The thing is Kirill didn't make the free_pud conditional on 5 level
page table. So on 4 level page table with his patches that are now in 4.12
it frees the pud ie the pgd entry and because we no longer synchronize
pgd it can trigger the BUG_ON() after hotremove as reported by few peoples
so far.

So yes this is a regression and yes people see that regression in the
not so common case of hotremove freeing a pud and then a latter hotplug
trying to add a new pud for same kernel virtual address range.


> For CONFIG_PGTABLE_LEVELS == 5 it won't work - but we don't have 
> CONFIG_PGTABLE_LEVELS == 5 upstream yet.

In 4.12 there is already 5 level page table code and that is what regressed
this whole pgd entries get out of sync and trigger BUG_ON()

See Kirill f2a6a7050109e0a5c7a84c70aa6010f682b2f1ee for guilty patch.

Cheers,
Jerome

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

      reply	other threads:[~2017-06-24 18:05 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-07 14:46 [PATCH] x86/mm/hotplug: fix BUG_ON() after hotremove by not freeing pud v2 jglisse
2017-06-07 16:14 ` Logan Gunthorpe
2017-06-07 17:03 ` Kirill A. Shutemov
2017-06-07 17:06   ` Kirill A. Shutemov
2017-06-07 17:38     ` Jerome Glisse
2017-06-07 18:17       ` Kirill A. Shutemov
2017-06-23 19:48         ` Jerome Glisse
2017-06-24  6:45           ` Ingo Molnar
2017-06-24 18:05             ` Jerome Glisse [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170624180521.GA2830@redhat.com \
    --to=jglisse@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=logang@deltatee.com \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).