From: Mel Gorman <mgorman@suse.de>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Milton Miller <miltonm@bga.com>,
	Gilad Ben-Yossef <gilad@benyossef.com>,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	Russell King - ARM Linux <linux@arm.linux.org.uk>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	mszeredi@novell.com, ebiederm@xmission.com,
	Greg Kroah-Hartman <gregkh@suse.de>,
	gong.chen@intel.com, Tony Luck <tony.luck@intel.com>,
	Borislav Petkov <bp@amd64.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"x86@kernel.org" <x86@kernel.org>,
	linux-edac@vger.kernel.org, Andi Kleen <andi@firstfloor.org>
Subject: Re: [PATCH 2/2] mm: page allocator: Do not drain per-cpu lists via IPI from page allocator context
Date: Fri, 20 Jan 2012 08:48:40 +0000
Message-ID: <20120120084840.GG3143@suse.de>
In-Reply-To: <4F188F52.1060303@linux.vnet.ibm.com>

On Fri, Jan 20, 2012 at 03:16:58AM +0530, Srivatsa S. Bhat wrote:
> [Reinstating the original Cc list]
> 
> On 01/19/2012 09:50 PM, Mel Gorman wrote:
> 
> > On a different x86-64 machine with an Intel-specific MCE, I have
> > also noted that the value of num_online_cpus() can change while
> > stop_machine() is running.
> 
> 
> That is expected and intentional right? Meaning, it is during the
> stop_machine() thing itself that a CPU is actually taken offline.
> And at the same time, it is removed from the cpu_online_mask.
> 

It's intentional sometimes, but not always. Sometimes the machine halts
and stays there.
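The quoted point, that the online count legitimately changes while stop_machine() runs, can be seen in a small userspace model. It is a sketch only: the `model_*` names and the plain `unsigned int` bitmask are hypothetical stand-ins for the kernel's `cpu_online_mask`, `set_cpu_online()` and `num_online_cpus()`, and no real stop_machine() is involved. It shows that a count sampled before the dying CPU runs the `__cpu_disable()` path quoted below differs from one sampled afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one bit per CPU, standing in for cpu_online_mask. */
#define MODEL_NR_CPUS 8u

static unsigned int model_online_mask;

/* Stand-in for set_cpu_online(cpu, online): set or clear the CPU's bit. */
static void model_set_cpu_online(unsigned int cpu, bool online)
{
	assert(cpu < MODEL_NR_CPUS);
	if (online)
		model_online_mask |= 1u << cpu;
	else
		model_online_mask &= ~(1u << cpu);
}

/* Stand-in for num_online_cpus(): count the set bits in the mask. */
static unsigned int model_num_online_cpus(void)
{
	unsigned int n = 0;

	for (unsigned int m = model_online_mask; m != 0; m &= m - 1)
		n++; /* each iteration clears the lowest set bit */
	return n;
}

/*
 * Bring nr_cpus CPUs online and sample the count, then clear the dying
 * CPU's bit as the __cpu_disable() path does inside the stop_machine()
 * callback, and sample the count again.
 */
static void model_take_cpu_down(unsigned int nr_cpus, unsigned int dying,
				unsigned int *before, unsigned int *after)
{
	model_online_mask = 0;
	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
		model_set_cpu_online(cpu, true);

	*before = model_num_online_cpus();
	model_set_cpu_online(dying, false);
	*after = model_num_online_cpus();
}
```

With four CPUs and CPU 2 going down, a caller that cached the count before the disable step holds 4 while the mask now reports 3.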

> On Intel boxes, essentially, the following gets executed on the dying
> CPU, as set up by the stop_machine stuff.
> 
> __cpu_disable()
>     native_cpu_disable()
>         cpu_disable_common()
>             remove_cpu_from_maps()
>                 set_cpu_online(cpu, false)
> 			^^^^^^
> So, set_cpu_online will remove this CPU from the cpu_online_mask.
> And all this runs while still under the stop_machine context.
> And this is exactly what we want right?
> 

We don't want it to sit in stop_machine() forever, waiting for
acknowledgements that never arrive, until the NMI handler fires.
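The stall comes from the rendezvous stop_machine() performs: assuming the usual design, every participating CPU acknowledges each state, and nothing advances until the last acknowledgement arrives. The sketch below models only that counting scheme. The `model_*` names and the state list are hypothetical, loosely patterned on kernel/stop_machine.c rather than copied from it, and there are no real threads. If one CPU never calls `model_ack()`, `state` never changes, which in the real code leaves the other CPUs waiting.

```c
#include <assert.h>

/*
 * Hypothetical model of a stop_machine()-style rendezvous: every CPU
 * must acknowledge the current state before anyone moves to the next.
 */
enum model_state { MODEL_PREPARE, MODEL_DISABLE_IRQ, MODEL_RUN, MODEL_EXIT };

struct model_rendezvous {
	enum model_state state;
	unsigned int nr_cpus;
	unsigned int pending_acks; /* CPUs yet to acknowledge 'state' */
};

static void model_init(struct model_rendezvous *r, unsigned int nr_cpus)
{
	r->state = MODEL_PREPARE;
	r->nr_cpus = nr_cpus;
	r->pending_acks = nr_cpus;
}

/* Each CPU calls this once it has finished its work for r->state. */
static void model_ack(struct model_rendezvous *r)
{
	assert(r->pending_acks > 0);
	r->pending_acks--;
	if (r->pending_acks == 0 && r->state != MODEL_EXIT) {
		/* The last acknowledgement advances everyone together. */
		r->state = (enum model_state)(r->state + 1);
		r->pending_acks = r->nr_cpus;
	}
}
```

With four CPUs and only three acknowledgements, the model stays in `MODEL_PREPARE`; the fourth call moves all of them to `MODEL_DISABLE_IRQ` at once.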

> > This is sensitive to timing and part of
> > the problem seems to be due to cmci_rediscover() running without the
> > CPU hotplug mutex held. This is not related to the IPI mess and is
> > unrelated to memory pressure but is just to note that CPU hotplug in
> > general can be fragile in parts.
> > 
> 
> 
> For the cmci_rediscover() part, I feel a simple get/put_online_cpus()
> around it should work.
> 

Yeah, that's the first thing I tried too. It doesn't work, though.

-- 
Mel Gorman
SUSE Labs
