All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, vbabka@suse.com, david@redhat.com,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH STABLE 4.9] mm: Avoid calling build_all_zonelists_init under hotplug context
Date: Thu, 20 Aug 2020 10:14:26 +0200	[thread overview]
Message-ID: <20200820081426.GF4049659@kroah.com> (raw)
In-Reply-To: <20200818122446.GA15067@dhcp22.suse.cz>

On Tue, Aug 18, 2020 at 02:24:46PM +0200, Michal Hocko wrote:
> On Tue 18-08-20 13:00:46, Oscar Salvador wrote:
> > Recently a customer of ours experienced a crash when booting the
> > system while enabling memory-hotplug.
> > 
> > The problem is that Normal zones on different nodes don't get their private
> > zone->pageset allocated, and keep sharing the initial boot_pageset.
> > The sharing between zones is normally safe as explained by the comment for
> > boot_pageset - it's a percpu structure, and manipulations are done with
> > disabled interrupts, and boot_pageset is set up in a way that any page placed
> > on its pcplist is immediately flushed to shared zone's freelist, because
> > pcp->high == 1.
> > However, the hotplug operation updates pcp->high to a higher value as it
> > expects to be operating on a private pageset.
> > 
> > The problem is in build_all_zonelists(), which is called when the first range
> > of pages is onlined for the Normal zone of node X or Y:
> > 
> > 	if (system_state == SYSTEM_BOOTING) {
> > 		build_all_zonelists_init();
> > 	} else {
> > 	#ifdef CONFIG_MEMORY_HOTPLUG
> > 		if (zone)
> > 			setup_zone_pageset(zone);
> > 	#endif
> > 		/* we have to stop all cpus to guarantee there is no user
> > 		of zonelist */
> > 		stop_machine(__build_all_zonelists, pgdat, NULL);
> > 		/* cpuset refresh routine should be here */
> > 	}
> > 
> > When called during hotplug, it should execute the setup_zone_pageset(zone)
> > which allocates the private pageset.
> > However, with memhp_default_state=online, this happens early while
> > system_state == SYSTEM_BOOTING is still true, hence this step is skipped.
> > (and build_all_zonelists_init() is probably unsafe anyway at this point).
> > 
> > Another hotplug operation on the same zone then leads to zone_pcp_update(zone)
> > called from online_pages(), which updates the pcp->high for the shared
> > boot_pageset to a value higher than 1.
> > At that point, pages freed from Node X and Y Normal zones can end up on the same
> > pcplist and from there they can be freed to the wrong zone's freelist,
> > leading to the corruption and crashes.
> > 
> > Please, note that upstream has fixed that differently (and unintentionally) by
> > adding another boot state (SYSTEM_SCHEDULING), which is set before smp_init().
> > That should happen before memory hotplug events even with memhp_default_state=online.
> > Backporting that would be too intrusive.
> > 
> > Signed-off-by: Oscar Salvador <osalvador@suse.de>
> > Debugged-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Yes, I believe this is the easiest and the least scary way to fix the
> issue for stable kernel users. Feel free to add
> Acked-by: Michal Hocko <mhocko@suse.com> # for stable trees
> 
> for that purpose.

Now queued up, thanks!

greg k-h


  reply	other threads:[~2020-08-20  8:14 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-18 11:00 [PATCH STABLE 4.9] mm: Avoid calling build_all_zonelists_init under hotplug context Oscar Salvador
2020-08-18 12:24 ` Michal Hocko
2020-08-20  8:14   ` Greg KH [this message]
2020-08-18 12:26 ` David Hildenbrand
2020-08-20  8:39 ` Patch "mm: Avoid calling build_all_zonelists_init under hotplug context" has been added to the 4.9-stable tree gregkh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200820081426.GF4049659@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.