* [PATCH] CPU hotplug: Slow down hotplug operations
From: Borislav Petkov @ 2014-05-07 19:57 UTC
To: LKML
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Mel Gorman, Steven Rostedt, Mike Galbraith, Andrew Morton,
Linus Torvalds
From: Borislav Petkov <bp@suse.de>

We have all those eager tester dudes who scratch together a dirty script
to pound on CPU hotplug senselessly and then report the bugs they've
managed to trigger.

Well, first of all, most, if not all, of the bugs they trigger are CPU
hotplug related anyway. But we know hotplug is full of duct tape and
brown paper bags, so we end up clearly wasting too much time dealing
with a mechanism we know is b0rked in the first place.

Oh, and I would understand if that pounding were close to some real
usage pattern, but I've yet to receive a justification for toggling
cores on- and offline senselessly.

In any case, before this gets rewritten properly (I'm being told we
might get lucky after all), let's slow down hotplugging on purpose and
thus make it uninteresting, as a temporary brown paper bag solution
until the real thing gets done.

This way we'll save ourselves a lot of time and effort chasing the
wrong bugs.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <mgalbraith@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/base/cpu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 006b1bc5297d..615c7af767ed 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -40,6 +40,11 @@ static void change_cpu_under_node(struct cpu *cpu,
 	cpu->node_id = to_nid;
 }
 
+static void delay_hotplug(void)
+{
+	schedule_timeout_uninterruptible(msecs_to_jiffies(MSEC_PER_SEC));
+}
+
 static int __ref cpu_subsys_online(struct device *dev)
 {
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
@@ -47,6 +52,8 @@ static int __ref cpu_subsys_online(struct device *dev)
 	int from_nid, to_nid;
 	int ret;
 
+	delay_hotplug();
+
 	from_nid = cpu_to_node(cpuid);
 	if (from_nid == NUMA_NO_NODE)
 		return -ENODEV;
@@ -65,6 +72,8 @@ static int __ref cpu_subsys_online(struct device *dev)
 
 static int cpu_subsys_offline(struct device *dev)
 {
+	delay_hotplug();
+
 	return cpu_down(dev->id);
 }
 
--
1.9.0
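
For context on what the patch does: `delay_hotplug()` calls `schedule_timeout_uninterruptible(msecs_to_jiffies(MSEC_PER_SEC))`, and since `MSEC_PER_SEC` is 1000, every request that reaches `cpu_subsys_online()` or `cpu_subsys_offline()` sleeps for roughly one second before doing any work, whatever `HZ` the kernel was built with; being uninterruptible, the sleep is not cut short by signals. The sketch below is a rough userspace model of that arithmetic, not kernel code; `msecs_to_jiffies_model()` and `delay_hotplug_seconds()` are hypothetical names, and the plain round-up conversion is a simplification of the real `msecs_to_jiffies()`.

```c
#include <assert.h>

#define MSEC_PER_SEC 1000UL

/*
 * Hypothetical userspace model of msecs_to_jiffies(): convert a duration
 * in milliseconds to timer ticks at a given HZ, rounding up so that a
 * timeout never expires early. The kernel's real implementation has
 * HZ-specific fast paths and overflow clamping that are left out here.
 */
unsigned long msecs_to_jiffies_model(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (ms * hz + MSEC_PER_SEC - 1) / MSEC_PER_SEC;
}

/* Seconds that the patch's delay_hotplug() would sleep at a given HZ. */
double delay_hotplug_seconds(unsigned long hz)
{
	return (double)msecs_to_jiffies_model(MSEC_PER_SEC, hz) / (double)hz;
}
```

For example, `msecs_to_jiffies_model(1000, 250)` yields 250 ticks, i.e. one second on an HZ=250 kernel, while rounding up turns a 1 ms request into one full tick.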
* Re: [PATCH] CPU hotplug: Slow down hotplug operations
From: Andrew Morton @ 2014-05-07 20:06 UTC
To: Borislav Petkov
Cc: LKML, H. Peter Anvin, Ingo Molnar, Thomas Gleixner,
Peter Zijlstra, Mel Gorman, Steven Rostedt, Mike Galbraith,
Linus Torvalds, Srivatsa S. Bhat
On Wed, 7 May 2014 21:57:41 +0200 Borislav Petkov <bp@alien8.de> wrote:
> We have all those eager tester dudes which scratch up a dirty script to
> pound on CPU hotplug senselessly and then report bugs they've managed to
> trigger.
>
> Well, first of all, most, if not all, bugs they trigger are CPU hotplug
> related anyway. But we know hotplug is full of duct tape and brown
> paper bags. So we end up clearly wasting too much time dealing with a
> mechanism we know it is b0rked in the first place.
>
> Oh, and I would understand if that pounding were close to some real
> usage patterns but I've yet to receive a justification for toggling
> cores on- and offline senselessly.
>
> In any case, before this gets rewritten properly (I'm being told we
> might get lucky after all) let's slow down hotplugging on purpose and
> thus make it uninteresting, as a temporary brown paper bag solution
> until the real thing gets done.
>
> This way we'll save us a lot of time and efforts in chasing the wrong
> bugs.
Well, I only yesterday merged Srivatsa's `CPU hotplug, stop-machine:
plug race-window that leads to "IPI-to-offline-CPU"' bugfix. That bug
presumably wouldn't have been fixed if this patch was in place.
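
The kind of "pound on CPU hotplug" loop discussed here (the one the patch description complains about, and which per Andrew's reply also shakes out real races) boils down to repeatedly writing to a CPU's sysfs `online` attribute, which in this kernel ends up in `cpu_subsys_offline()`/`cpu_subsys_online()`. A minimal sketch, assuming the standard `/sys/devices/system/cpu/cpuN/online` interface; `set_cpu_online()` and `hotplug_stress()` are hypothetical helpers, not code posted in the thread.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "1" (online) or "0" (offline) to a CPU's
 * sysfs "online" file, e.g. /sys/devices/system/cpu/cpu1/online.
 * Returns 0 on success, -1 on failure. The data is buffered by stdio and
 * written when fclose() flushes it, so a write error reported by the
 * kernel (e.g. EBUSY) shows up as an fclose() failure.
 */
int set_cpu_online(const char *path, int online)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(online ? "1" : "0", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Hypothetical stress loop: take the CPU offline and bring it back online
 * `cycles` times, stopping at the first error. On a real system this needs
 * root and a CPU that supports hotplug.
 */
int hotplug_stress(const char *path, unsigned int cycles)
{
	for (unsigned int i = 0; i < cycles; i++) {
		if (set_cpu_online(path, 0) != 0 || set_cpu_online(path, 1) != 0)
			return -1;
	}
	return 0;
}
```

With the proposed patch applied, each of those writes would first sleep for about a second, so `hotplug_stress(path, 1000)` would take at least 2000 seconds (over half an hour); that is the "make it uninteresting" effect the patch is after, and also why such a loop would be much less likely to hit narrow race windows.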
* Re: [PATCH] CPU hotplug: Slow down hotplug operations
From: Thomas Gleixner @ 2014-05-07 20:22 UTC
To: Andrew Morton
Cc: Borislav Petkov, LKML, H. Peter Anvin, Ingo Molnar,
Peter Zijlstra, Mel Gorman, Steven Rostedt, Mike Galbraith,
Linus Torvalds, Srivatsa S. Bhat
On Wed, 7 May 2014, Andrew Morton wrote:
> On Wed, 7 May 2014 21:57:41 +0200 Borislav Petkov <bp@alien8.de> wrote:
>
> > [...]
>
> Well, I only yesterday merged Srivatsa's `CPU hotplug, stop-machine:
> plug race-window that leads to "IPI-to-offline-CPU"' bugfix. That bug
> presumably wouldn't have been fixed if this patch was in place.
True.

OTOH, if people had spent the same amount of time rewriting the
hotplug mess, we would have a far bigger benefit. But no, we prefer to
add more layers of duct tape and bandaid hackery to it.

I tried a redesign and ran out of cycles, but the patches are out
there and none of the folks who promised to complete them ever
delivered. If nothing fundamental changes, I'm going to spend some
serious time on it in the next couple of months.

Thanks,
tglx
* Re: [PATCH] CPU hotplug: Slow down hotplug operations
From: Borislav Petkov @ 2014-05-07 20:26 UTC
To: Thomas Gleixner
Cc: Andrew Morton, LKML, H. Peter Anvin, Ingo Molnar, Peter Zijlstra,
Mel Gorman, Steven Rostedt, Mike Galbraith, Linus Torvalds,
Srivatsa S. Bhat
On Wed, May 07, 2014 at 10:22:33PM +0200, Thomas Gleixner wrote:
> On Wed, 7 May 2014, Andrew Morton wrote:
> > On Wed, 7 May 2014 21:57:41 +0200 Borislav Petkov <bp@alien8.de> wrote:
> >
> > > [...]
> >
> > Well, I only yesterday merged Srivatsa's `CPU hotplug, stop-machine:
> > plug race-window that leads to "IPI-to-offline-CPU"' bugfix. That bug
> > presumably wouldn't have been fixed if this patch was in place.
>
> True.
>
> OTOH, if people would have spent the same amount of time to rewrite
> the hotplug mess, we would have a way bigger benefit. But no, we
> prefer to add more layers of duct tape and bandaid hackery to it.
>
> I tried a redesign and run out of cycles, but the patches are out
> there and none of the folks who promised to complete them ever
> delivered. If nothing fundamental changes, I'm going to spend some
> serious time on it in the next couple of month.

... and in the interim, we could slow down the duct tape and bandaid
hackery until it gets rewritten properly. The bugfixes, ever increasing
in complexity, say exactly that: it needs a long, hard look and a
rewrite.

--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
* Re: [PATCH] CPU hotplug: Slow down hotplug operations
From: Srivatsa S. Bhat @ 2014-05-08 4:31 UTC
To: Thomas Gleixner
Cc: Andrew Morton, Borislav Petkov, LKML, H. Peter Anvin, Ingo Molnar,
Peter Zijlstra, Mel Gorman, Steven Rostedt, Mike Galbraith,
Linus Torvalds
On 05/08/2014 01:52 AM, Thomas Gleixner wrote:
> On Wed, 7 May 2014, Andrew Morton wrote:
>> On Wed, 7 May 2014 21:57:41 +0200 Borislav Petkov <bp@alien8.de> wrote:
>>
>>> [...]
>>
>> Well, I only yesterday merged Srivatsa's `CPU hotplug, stop-machine:
>> plug race-window that leads to "IPI-to-offline-CPU"' bugfix. That bug
>> presumably wouldn't have been fixed if this patch was in place.
>
> True.
>
> OTOH, if people would have spent the same amount of time to rewrite
> the hotplug mess, we would have a way bigger benefit. But no, we
> prefer to add more layers of duct tape and bandaid hackery to it.
>
> I tried a redesign and run out of cycles, but the patches are out
> there and none of the folks who promised to complete them ever
> delivered. If nothing fundamental changes, I'm going to spend some
> serious time on it in the next couple of month.
>

Yeah, that's quite unfortunate. Several of my own attempts to fix some
of the chronic issues of CPU hotplug (such as removing CPU hotplug's
dependency on stop-machine, or consolidating all the duplicated and
buggy CPU hotplug code in the various architectures) met a similar
fate. Initially there was some amount of consensus on these patchsets
and designs, but eventually they got nowhere due to a lack of further
feedback or any sign of upstream acceptance.

Stop-machine()-free CPU hotplug, v6:
http://lwn.net/Articles/538819/
With performance improvements:
http://article.gmane.org/gmane.linux.kernel/1435249
Attempt to upstream that patchset in parts, v3:
http://lwn.net/Articles/556727/
Generic SMP boot/cpu-hotplug framework to consolidate arch/ code:
https://lwn.net/Articles/500185/

But luckily, the recent work to fix the notifier deadlock mess actually
went upstream fairly quickly. So we have one less CPU hotplug problem
to fix! :-)
https://lkml.org/lkml/2014/3/10/522

Regards,
Srivatsa S. Bhat
* Re: [PATCH] CPU hotplug: Slow down hotplug operations
From: Pavel Machek @ 2014-05-11 17:02 UTC
To: Borislav Petkov
Cc: Thomas Gleixner, Andrew Morton, LKML, H. Peter Anvin, Ingo Molnar,
Peter Zijlstra, Mel Gorman, Steven Rostedt, Mike Galbraith,
Linus Torvalds, Srivatsa S. Bhat
On Wed 2014-05-07 22:26:55, Borislav Petkov wrote:
> On Wed, May 07, 2014 at 10:22:33PM +0200, Thomas Gleixner wrote:
> > On Wed, 7 May 2014, Andrew Morton wrote:
> > > On Wed, 7 May 2014 21:57:41 +0200 Borislav Petkov <bp@alien8.de> wrote:
> > >
> > > > [...]
> > >
> > > Well, I only yesterday merged Srivatsa's `CPU hotplug, stop-machine:
> > > plug race-window that leads to "IPI-to-offline-CPU"' bugfix. That bug
> > > presumably wouldn't have been fixed if this patch was in place.
> >
> > True.
> >
> > OTOH, if people would have spent the same amount of time to rewrite
> > the hotplug mess, we would have a way bigger benefit. But no, we
> > prefer to add more layers of duct tape and bandaid hackery to it.
> >
> > I tried a redesign and run out of cycles, but the patches are out
> > there and none of the folks who promised to complete them ever
> > delivered. If nothing fundamental changes, I'm going to spend some
> > serious time on it in the next couple of month.
>
> ... and in the interim, we could slow down the duct tape and bandaid
> hackery until it gets rewritten properly. The ever increasing in
> complexity bugfixes say exactly that - it needs a long hard look and a
> rewrite.

Well. If you add the delay, you'll mask real problems and cause
regressions when the delay is removed, because fixing hotplug will
probably take time to get right.

Bad idea, AFAICT.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH] CPU hotplug: Slow down hotplug operations
From: Thomas Gleixner @ 2014-05-11 18:29 UTC
To: Pavel Machek
Cc: Borislav Petkov, Andrew Morton, LKML, H. Peter Anvin, Ingo Molnar,
Peter Zijlstra, Mel Gorman, Steven Rostedt, Mike Galbraith,
Linus Torvalds, Srivatsa S. Bhat
On Sun, 11 May 2014, Pavel Machek wrote:
> Well. If you add the delay, you'll mask real problems and cause regressions when
> the delay is removed -- because fix-hotplug will probably take time to get right.
>
> Bad idea, AFAICT.

Agreed, but Boris is right that the current duct tape hackery needs to
stop. And the delay was a desperate attempt to wake people up and get
them to focus on replacing the current mess instead of making it more
entangled.

Thanks,
tglx
* Re: [PATCH] CPU hotplug: Slow down hotplug operations
From: Borislav Petkov @ 2014-05-11 18:48 UTC
To: Thomas Gleixner
Cc: Pavel Machek, Andrew Morton, LKML, H. Peter Anvin, Ingo Molnar,
Peter Zijlstra, Mel Gorman, Steven Rostedt, Mike Galbraith,
Linus Torvalds, Srivatsa S. Bhat
On Sun, May 11, 2014 at 08:29:30PM +0200, Thomas Gleixner wrote:
> On Sun, 11 May 2014, Pavel Machek wrote:
> > Well. If you add the delay, you'll mask real problems and cause regressions when
> > the delay is removed -- because fix-hotplug will probably take time to get right.
> >
> > Bad idea, AFAICT.
>
> Agreed, but Boris is right, that the current duct tape hackery needs
> to stop. And the delay was the desperate attempt to wake up people to
> focus on replacing the current mess instead of making it more
> entangled.

I think Pavel is missing the point: the delay will be removed with the
rewrite of CPU hotplug, once it actually works reliably. Then we won't
need the delay anyway.

--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
* Re: [PATCH] CPU hotplug: Slow down hotplug operations
From: Pavel Machek @ 2014-05-12 20:56 UTC
To: Borislav Petkov
Cc: Thomas Gleixner, Andrew Morton, LKML, H. Peter Anvin, Ingo Molnar,
Peter Zijlstra, Mel Gorman, Steven Rostedt, Mike Galbraith,
Linus Torvalds, Srivatsa S. Bhat
On Sun 2014-05-11 20:48:08, Borislav Petkov wrote:
> On Sun, May 11, 2014 at 08:29:30PM +0200, Thomas Gleixner wrote:
> > On Sun, 11 May 2014, Pavel Machek wrote:
> > > Well. If you add the delay, you'll mask real problems and cause regressions when
> > > the delay is removed -- because fix-hotplug will probably take time to get right.
> > >
> > > Bad idea, AFAICT.
> >
> > Agreed, but Boris is right, that the current duct tape hackery needs
> > to stop. And the delay was the desperate attempt to wake up people to
> > focus on replacing the current mess instead of making it more
> > entangled.
>
> I think Pavel is missing the point: the delay will be removed with the
> rewrite of cpu hotplug, after it actually works reliably. Then we won't
> need delay anyway.

Well... and if you don't get it right on the first try, you'll get a
regression and a bisect pointing at the delay removal. Also, with the
delay in place, you won't notice if it gets less reliable.

A printk() or a comment should deter people from filing bug reports...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
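
Pavel's alternative, a one-time message instead of a delay, would in the kernel most likely be a `printk_once()` call in the online/offline path. The userspace model below only illustrates the print-once pattern that macro relies on (a static flag per call site); `hotplug_notice()` and its message text are hypothetical and were not posted in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of Pavel's suggestion: emit a notice the
 * first time a hotplug request comes in, instead of sleeping. The static
 * flag makes it fire once per call site, which is roughly how the
 * kernel's printk_once() behaves. Returns true if the notice was printed.
 */
bool hotplug_notice(FILE *out)
{
	static bool printed;

	assert(out != NULL);
	if (printed)
		return false;
	printed = true;
	/* Hypothetical wording; nothing like this was posted in the thread. */
	fprintf(out, "cpu hotplug: known to be fragile, rewrite pending\n");
	return true;
}
```

Unlike the one-second sleep, a print-once notice adds essentially no latency after the first call, so timing-dependent races would presumably stay as reproducible as before, which is the point Pavel raises about the delay masking real problems.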