[PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
@ 2008-12-18  1:09 Frederic Weisbecker
  2008-12-18 10:34 ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2008-12-18  1:09 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Steven Rostedt, Thomas Gleixner, Andrew Morton, Linux Kernel

Impact: fix a system hang on slow systems

While testing the function graph tracer on VirtualBox, I had a system hang
immediatly after enabling the tracer.
If hrtimer is enabled on kernel, a slow system can spend too much time during
tracing the hrtimer_interrupt which will do eternal loops, assuming it always
have to retry its process because too much time elapsed during its time
update.
Now we provide a feature which lurks at the number of retries on
hrtimer_interrupt. After 10 retries, the function graph tracer will definetly stop
its tracing.

Changes in v2 (Andrew's comments):

_ Moved ftrace_graph_hrtimer_*() from linux/ftrace.h to kernel/trace/trace.c, also
  uninlined them.
_ Turn ftrace_graph_hrtimer_loop into static
_ Fix my evil english :-)
_ Very small other fixes...

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/ftrace.h |   10 ++++++++++
 kernel/hrtimer.c       |    5 +++++
 kernel/trace/ftrace.c  |   29 +++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 286af82..4874782 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -408,6 +408,13 @@ extern void unregister_ftrace_graph(void);
 extern void ftrace_graph_init_task(struct task_struct *t);
 extern void ftrace_graph_exit_task(struct task_struct *t);
 
+/*
+ * Prevent infinite loops on hrtimer_interrupt()
+ * due to tracing on slow systems.
+ */
+extern void ftrace_graph_hrtimer_enter(void);
+extern void ftrace_graph_hrtimer_retry(void);
+
 static inline int task_curr_ret_stack(struct task_struct *t)
 {
 	return t->curr_ret_stack;
@@ -437,6 +444,9 @@ static inline int task_curr_ret_stack(struct task_struct *tsk)
 
 static inline void pause_graph_tracing(void) { }
 static inline void unpause_graph_tracing(void) { }
+
+static inline void ftrace_graph_hrtimer_enter(void) { }
+static inline void ftrace_graph_hrtimer_retry(void) { }
 #endif
 
 #ifdef CONFIG_TRACING
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index b741f85..12fa4d9 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -44,6 +44,7 @@
 #include <linux/seq_file.h>
 #include <linux/err.h>
 #include <linux/debugobjects.h>
+#include <linux/ftrace.h>
 
 #include <asm/uaccess.h>
 
@@ -1186,7 +1187,11 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 	cpu_base->nr_events++;
 	dev->next_event.tv64 = KTIME_MAX;
 
+	ftrace_graph_hrtimer_enter();
  retry:
+	/* Prevent from infinite loop on slow systems while tracing */
+	ftrace_graph_hrtimer_retry();
+
 	now = ktime_get();
 
 	expires_next.tv64 = KTIME_MAX;
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index a12f80e..0b3a7a8 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1955,6 +1955,9 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
 
 static atomic_t ftrace_graph_active;
 
+/* Count number of retries attempts in hrtimer_interrupt() */
+static DEFINE_PER_CPU(int, ftrace_graph_hrtimer_loop);
+
 int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
 {
 	return 0;
@@ -2095,6 +2098,32 @@ void ftrace_graph_exit_task(struct task_struct *t)
 	kfree(ret_stack);
 }
 
+/*
+ * Those two functions prevent infinite loops on hrtimer_interrupt
+ * because the function graph tracer can consume too much time processing
+ * on slow systems, and the hrtimer_interrupt can relaunch its processing
+ * if too much time elapsed during its previous time update.
+ *
+ * Note that we don't need an atomic counter for loops here because
+ * interrupts are disabled in hrtimer_interrupt.
+ */
+
+void ftrace_graph_hrtimer_enter(void)
+{
+	__get_cpu_var(ftrace_graph_hrtimer_loop) = 0;
+}
+
+void ftrace_graph_hrtimer_retry(void)
+{
+	__get_cpu_var(ftrace_graph_hrtimer_loop)++;
+
+	if (__get_cpu_var(ftrace_graph_hrtimer_loop) == 10) {
+		ftrace_graph_stop();
+		printk(KERN_WARNING "function-graph-tracer: hrtimer_interrupt exceeded"
+				" 10 loops. Stopping tracing\n");
+	}
+}
+
 void ftrace_graph_stop(void)
 {
 	ftrace_stop();
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-18  1:09 [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop Frederic Weisbecker
@ 2008-12-18 10:34 ` Ingo Molnar
  2008-12-18 10:48   ` Thomas Gleixner
  2008-12-18 10:51   ` Frédéric Weisbecker
  0 siblings, 2 replies; 16+ messages in thread
From: Ingo Molnar @ 2008-12-18 10:34 UTC (permalink / raw)
  To: Frederic Weisbecker, Thomas Gleixner
  Cc: Steven Rostedt, Andrew Morton, Linux Kernel


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Impact: fix a system hang on slow systems
> 
> While testing the function graph tracer on VirtualBox, I had a system hang
> immediatly after enabling the tracer.
>
> If hrtimer is enabled on kernel, a slow system can spend too much time 
> during tracing the hrtimer_interrupt which will do eternal loops, 
> assuming it always have to retry its process because too much time 
> elapsed during its time update. Now we provide a feature which lurks at 
> the number of retries on hrtimer_interrupt. After 10 retries, the 
> function graph tracer will definetly stop its tracing.

hm, i dont really like this solution - it just works around the problem by 
'speeding up' the system. If we have a _real_ slow system, there's no such 
way for us to speed it up.

Thomas, what do you think - would you expect this lockup to happen on 
really slow systems? If yes, is there a way we could avoid it from 
happening - by driving some sort of 'mandatory interval', that is doubled 
in size every time we detect such a bad hrtimer loop?

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-18 10:34 ` Ingo Molnar
@ 2008-12-18 10:48   ` Thomas Gleixner
  2008-12-18 10:56     ` Frédéric Weisbecker
  2008-12-18 11:22     ` Ingo Molnar
  2008-12-18 10:51   ` Frédéric Weisbecker
  1 sibling, 2 replies; 16+ messages in thread
From: Thomas Gleixner @ 2008-12-18 10:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Steven Rostedt, Andrew Morton, Linux Kernel

On Thu, 18 Dec 2008, Ingo Molnar wrote:
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > Impact: fix a system hang on slow systems
> > 
> > While testing the function graph tracer on VirtualBox, I had a system hang
> > immediatly after enabling the tracer.
> >
> > If hrtimer is enabled on kernel, a slow system can spend too much time 
> > during tracing the hrtimer_interrupt which will do eternal loops, 
> > assuming it always have to retry its process because too much time 
> > elapsed during its time update. Now we provide a feature which lurks at 
> > the number of retries on hrtimer_interrupt. After 10 retries, the 
> > function graph tracer will definetly stop its tracing.
> 
> hm, i dont really like this solution - it just works around the problem by 
> 'speeding up' the system. If we have a _real_ slow system, there's no such 
> way for us to speed it up.
> 
> Thomas, what do you think - would you expect this lockup to happen on 
> really slow systems? If yes, is there a way we could avoid it from 
> happening - by driving some sort of 'mandatory interval', that is doubled 
> in size every time we detect such a bad hrtimer loop?

In reality I have not seen such a problem yet, even on an old real
slow P1 which I tricked to do highres, but of course if we add such
time consuming debugs and make it slow enough the system will spend
all the time running the tick timer :)

We should at least warn once about such a loop. 

I'm not sure about the mandatory interval though: 

Try the same test with HZ=1000 periodic mode (HIGHRES/NOHZ=off) and I
bet you see the same problem, just not in hrtimer_interrupt().

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-18 10:34 ` Ingo Molnar
  2008-12-18 10:48   ` Thomas Gleixner
@ 2008-12-18 10:51   ` Frédéric Weisbecker
  2008-12-18 11:16     ` Ingo Molnar
  1 sibling, 1 reply; 16+ messages in thread
From: Frédéric Weisbecker @ 2008-12-18 10:51 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, Steven Rostedt, Andrew Morton, Linux Kernel

2008/12/18 Ingo Molnar <mingo@elte.hu>:
>
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
>> Impact: fix a system hang on slow systems
>>
>> While testing the function graph tracer on VirtualBox, I had a system hang
>> immediatly after enabling the tracer.
>>
>> If hrtimer is enabled on kernel, a slow system can spend too much time
>> during tracing the hrtimer_interrupt which will do eternal loops,
>> assuming it always have to retry its process because too much time
>> elapsed during its time update. Now we provide a feature which lurks at
>> the number of retries on hrtimer_interrupt. After 10 retries, the
>> function graph tracer will definetly stop its tracing.
>
> hm, i dont really like this solution - it just works around the problem by
> 'speeding up' the system. If we have a _real_ slow system, there's no such
> way for us to speed it up.


It doesn' speed up the system actually. That's not the goal of this patch.
This patch disables completely the tracing on infinite loop in hrtimer, and
tracing will not be reenabled after it. That's too dangerous.

I'm not really proud of this patch, because it is a tricky hack... But
this is the most lightweight solution I've found....

> Thomas, what do you think - would you expect this lockup to happen on
> really slow systems? If yes, is there a way we could avoid it from
> happening - by driving some sort of 'mandatory interval', that is doubled
> in size every time we detect such a bad hrtimer loop?
>
>        Ingo
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-18 10:48   ` Thomas Gleixner
@ 2008-12-18 10:56     ` Frédéric Weisbecker
  2008-12-18 11:22     ` Ingo Molnar
  1 sibling, 0 replies; 16+ messages in thread
From: Frédéric Weisbecker @ 2008-12-18 10:56 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linux Kernel

2008/12/18 Thomas Gleixner <tglx@linutronix.de>:
> On Thu, 18 Dec 2008, Ingo Molnar wrote:
>> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
>>
>> > Impact: fix a system hang on slow systems
>> >
>> > While testing the function graph tracer on VirtualBox, I had a system hang
>> > immediatly after enabling the tracer.
>> >
>> > If hrtimer is enabled on kernel, a slow system can spend too much time
>> > during tracing the hrtimer_interrupt which will do eternal loops,
>> > assuming it always have to retry its process because too much time
>> > elapsed during its time update. Now we provide a feature which lurks at
>> > the number of retries on hrtimer_interrupt. After 10 retries, the
>> > function graph tracer will definetly stop its tracing.
>>
>> hm, i dont really like this solution - it just works around the problem by
>> 'speeding up' the system. If we have a _real_ slow system, there's no such
>> way for us to speed it up.
>>
>> Thomas, what do you think - would you expect this lockup to happen on
>> really slow systems? If yes, is there a way we could avoid it from
>> happening - by driving some sort of 'mandatory interval', that is doubled
>> in size every time we detect such a bad hrtimer loop?
>
> In reality I have not seen such a problem yet, even on an old real
> slow P1 which I tricked to do highres, but of course if we add such
> time consuming debugs and make it slow enough the system will spend
> all the time running the tick timer :)
>
> We should at least warn once about such a loop.
>
> I'm not sure about the mandatory interval though:
>
> Try the same test with HZ=1000 periodic mode (HIGHRES/NOHZ=off) and I
> bet you see the same problem, just not in hrtimer_interrupt().
>
> Thanks,
>
>        tglx
>

That's right. I hesitate to propagate this patch more globally for all
interrupts....

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-18 10:51   ` Frédéric Weisbecker
@ 2008-12-18 11:16     ` Ingo Molnar
  0 siblings, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2008-12-18 11:16 UTC (permalink / raw)
  To: Frédéric Weisbecker
  Cc: Thomas Gleixner, Steven Rostedt, Andrew Morton, Linux Kernel


* Frédéric Weisbecker <fweisbec@gmail.com> wrote:

> 2008/12/18 Ingo Molnar <mingo@elte.hu>:
> >
> > * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >
> >> Impact: fix a system hang on slow systems
> >>
> >> While testing the function graph tracer on VirtualBox, I had a system hang
> >> immediatly after enabling the tracer.
> >>
> >> If hrtimer is enabled on kernel, a slow system can spend too much time
> >> during tracing the hrtimer_interrupt which will do eternal loops,
> >> assuming it always have to retry its process because too much time
> >> elapsed during its time update. Now we provide a feature which lurks at
> >> the number of retries on hrtimer_interrupt. After 10 retries, the
> >> function graph tracer will definetly stop its tracing.
> >
> > hm, i dont really like this solution - it just works around the problem by
> > 'speeding up' the system. If we have a _real_ slow system, there's no such
> > way for us to speed it up.
> 
> 
> It doesn' speed up the system actually. That's not the goal of this 
> patch.

i meant that abstractly. The "solution" here is that your patch turns off 
the function-graph-tracer. The practical effect of that is: the system 
gets much faster at processing hrtimer IRQs and effectively "speeds up".

ok?

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-18 10:48   ` Thomas Gleixner
  2008-12-18 10:56     ` Frédéric Weisbecker
@ 2008-12-18 11:22     ` Ingo Molnar
  2008-12-18 21:07       ` Frédéric Weisbecker
  1 sibling, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2008-12-18 11:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, Steven Rostedt, Andrew Morton, Linux Kernel


* Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 18 Dec 2008, Ingo Molnar wrote:
> > * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > 
> > > Impact: fix a system hang on slow systems
> > > 
> > > While testing the function graph tracer on VirtualBox, I had a system hang
> > > immediatly after enabling the tracer.
> > >
> > > If hrtimer is enabled on kernel, a slow system can spend too much time 
> > > during tracing the hrtimer_interrupt which will do eternal loops, 
> > > assuming it always have to retry its process because too much time 
> > > elapsed during its time update. Now we provide a feature which lurks at 
> > > the number of retries on hrtimer_interrupt. After 10 retries, the 
> > > function graph tracer will definetly stop its tracing.
> > 
> > hm, i dont really like this solution - it just works around the problem by 
> > 'speeding up' the system. If we have a _real_ slow system, there's no such 
> > way for us to speed it up.
> > 
> > Thomas, what do you think - would you expect this lockup to happen on 
> > really slow systems? If yes, is there a way we could avoid it from 
> > happening - by driving some sort of 'mandatory interval', that is doubled 
> > in size every time we detect such a bad hrtimer loop?
> 
> In reality I have not seen such a problem yet, even on an old real slow 
> P1 which I tricked to do highres, but of course if we add such time 
> consuming debugs and make it slow enough the system will spend all the 
> time running the tick timer :)
> 
> We should at least warn once about such a loop.
> 
> I'm not sure about the mandatory interval though:
> 
> Try the same test with HZ=1000 periodic mode (HIGHRES/NOHZ=off) and I 
> bet you see the same problem, just not in hrtimer_interrupt().

that would be important to double-check. Frederic, does the system lock up 
with a periodic 1khz HZ tick just as much? I.e. does the processing of a 
single timer interrupt take more than 1 milliseconds?

Granted, if the system is too slow to process the system clock, it's not 
useful.

But that's my point: instead of just randomly disabling functionality 
until the system gets 'fast enough' to process timer IRQs, how about 
dynamically and adaptively extending the required minimal timeout between 
hr-timer IRQs?

That will in essence self-tune the system into some minimally working 
state - instead of locking it up. Note that such a method would work with 
any source of timer IRQ slowness - not just tracing.

( And maybe the lockup is somehow hrtimer IRQ induced. If a 1khz clock
  still works for Frederic then that angle has to be investigated. )

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-18 11:22     ` Ingo Molnar
@ 2008-12-18 21:07       ` Frédéric Weisbecker
  2008-12-18 21:16         ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: Frédéric Weisbecker @ 2008-12-18 21:07 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, Steven Rostedt, Andrew Morton, Linux Kernel

2008/12/18 Ingo Molnar <mingo@elte.hu>:
> i meant that abstractly. The "solution" here is that your patch turns off
> the function-graph-tracer. The practical effect of that is: the system
> gets much faster at processing hrtimer IRQs and effectively "speeds up".
>
> ok?
>
>        Ingo
>

Ok :-)

> that would be important to double-check. Frederic, does the system lock up
> with a periodic 1khz HZ tick just as much? I.e. does the processing of a
> single timer interrupt take more than 1 milliseconds?
>
> Granted, if the system is too slow to process the system clock, it's not
> useful.
>
> But that's my point: instead of just randomly disabling functionality
> until the system gets 'fast enough' to process timer IRQs, how about
> dynamically and adaptively extending the required minimal timeout between
> hr-timer IRQs?
>
> That will in essence self-tune the system into some minimally working
> state - instead of locking it up. Note that such a method would work with
> any source of timer IRQ slowness - not just tracing.
>
> ( And maybe the lockup is somehow hrtimer IRQ induced. If a 1khz clock
>  still works for Frederic then that angle has to be investigated. )
>
>        Ingo
>

Hmm. No, I just tested by setting 1000 HZ and disabled NOHZ, the
system doesn't lockup unless I turn on the
function graph tracer.
That's strange.
Perhaps I should look at the time elapsed between two retries on
hrtimer_interrupts because it takes almost more than one
second to disable tracing after I enabled it (that really too much for
10 interations inside hrtimer_interrupt, even with
the time of tracing process).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-18 21:07       ` Frédéric Weisbecker
@ 2008-12-18 21:16         ` Ingo Molnar
  2008-12-18 21:36           ` Frédéric Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2008-12-18 21:16 UTC (permalink / raw)
  To: Frédéric Weisbecker
  Cc: Thomas Gleixner, Steven Rostedt, Andrew Morton, Linux Kernel


* Frédéric Weisbecker <fweisbec@gmail.com> wrote:

> Hmm. No, I just tested by setting 1000 HZ and disabled NOHZ, the system 
> doesn't lockup unless I turn on the function graph tracer.

but that's what we are interested in: the system locks up with HZ=1000 and 
with the tracer on - right?

Which means that it's not some hrtimer problem, but simply the traced 
timer tick takes more than 1 millisecond to execute under this 
virtualization.

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-18 21:16         ` Ingo Molnar
@ 2008-12-18 21:36           ` Frédéric Weisbecker
  2008-12-18 21:44             ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: Frédéric Weisbecker @ 2008-12-18 21:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, Steven Rostedt, Andrew Morton, Linux Kernel

2008/12/18 Ingo Molnar <mingo@elte.hu>:
>
> * Frédéric Weisbecker <fweisbec@gmail.com> wrote:
>
>> Hmm. No, I just tested by setting 1000 HZ and disabled NOHZ, the system
>> doesn't lockup unless I turn on the function graph tracer.
>
> but that's what we are interested in: the system locks up with HZ=1000 and
> with the tracer on - right?

Yes.

> Which means that it's not some hrtimer problem, but simply the traced
> timer tick takes more than 1 millisecond to execute under this
> virtualization.
>
>        Ingo
>

Oh ok I see. Sorry I'm a bit slow today...
So the solution would be to adapt dynamically the timeout between
hrtimer irq. But I don't know that much hrtimer
to implement such a feature...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-18 21:36           ` Frédéric Weisbecker
@ 2008-12-18 21:44             ` Ingo Molnar
  2008-12-18 22:00               ` Frédéric Weisbecker
  2008-12-21 10:00               ` Ingo Molnar
  0 siblings, 2 replies; 16+ messages in thread
From: Ingo Molnar @ 2008-12-18 21:44 UTC (permalink / raw)
  To: Frédéric Weisbecker
  Cc: Thomas Gleixner, Steven Rostedt, Andrew Morton, Linux Kernel


* Frédéric Weisbecker <fweisbec@gmail.com> wrote:

> > Which means that it's not some hrtimer problem, but simply the traced 
> > timer tick takes more than 1 millisecond to execute under this 
> > virtualization.
> >
> >        Ingo
> >
> 
> Oh ok I see. Sorry I'm a bit slow today... So the solution would be to 
> adapt dynamically the timeout between hrtimer irq. But I don't know that 
> much hrtimer to implement such a feature...

just if you want this lockup to go away. I think you did the hardest bit 
already: to detect the situation reliably, without false positives. Now 
the 'action' needs to change: instead of 'turning off ftrace' (which is 
brutal - ftrace was just the last drop of water that pushed the system 
over the edge), we can instead do 'double the minimum clockevent delta 
threshold'.

there's already such code in kernel/time/tick-oneshot.c:

                /*
                 * We tried 2 times to program the device with the given
                 * min_delta_ns. If that's not working then we double it
                 * and emit a warning.
                 */
                if (++i > 2) {
                        /* Increase the min. delta and try again */
                        if (!dev->min_delta_ns)
                                dev->min_delta_ns = 5000;
                        else
                                dev->min_delta_ns += dev->min_delta_ns >> 1;

what would be needed is to simply double ->min_delta_ns on every such 
situation you detect? Once you do that, it takes effect on the next tick 
automatically.

Or something like that. In theory. :-)

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-18 21:44             ` Ingo Molnar
@ 2008-12-18 22:00               ` Frédéric Weisbecker
  2008-12-21 10:00               ` Ingo Molnar
  1 sibling, 0 replies; 16+ messages in thread
From: Frédéric Weisbecker @ 2008-12-18 22:00 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, Steven Rostedt, Andrew Morton, Linux Kernel

2008/12/18 Ingo Molnar <mingo@elte.hu>:
>
> * Frédéric Weisbecker <fweisbec@gmail.com> wrote:
>
>> > Which means that it's not some hrtimer problem, but simply the traced
>> > timer tick takes more than 1 millisecond to execute under this
>> > virtualization.
>> >
>> >        Ingo
>> >
>>
>> Oh ok I see. Sorry I'm a bit slow today... So the solution would be to
>> adapt dynamically the timeout between hrtimer irq. But I don't know that
>> much hrtimer to implement such a feature...
>
> just if you want this lockup to go away. I think you did the hardest bit
> already: to detect the situation reliably, without false positives. Now
> the 'action' needs to change: instead of 'turning off ftrace' (which is
> brutal - ftrace was just the last drop of water that pushed the system
> over the edge), we can instead do 'double the minimum clockevent delta
> threshold'.
>
> there's already such code in kernel/time/tick-oneshot.c:
>
>                /*
>                 * We tried 2 times to program the device with the given
>                 * min_delta_ns. If that's not working then we double it
>                 * and emit a warning.
>                 */
>                if (++i > 2) {
>                        /* Increase the min. delta and try again */
>                        if (!dev->min_delta_ns)
>                                dev->min_delta_ns = 5000;
>                        else
>                                dev->min_delta_ns += dev->min_delta_ns >> 1;
>
> what would be needed is to simply double ->min_delta_ns on every such
> situation you detect? Once you do that, it takes effect on the next tick
> automatically.
>
> Or something like that. In theory. :-)
>
>        Ingo
>

Ok. I understand. Thanks for these pointers, that seems more easy than
I thought.
I will send a patch in a few days....

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-18 21:44             ` Ingo Molnar
  2008-12-18 22:00               ` Frédéric Weisbecker
@ 2008-12-21 10:00               ` Ingo Molnar
  2008-12-21 10:12                 ` Ingo Molnar
  1 sibling, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2008-12-21 10:00 UTC (permalink / raw)
  To: Frédéric Weisbecker
  Cc: Thomas Gleixner, Steven Rostedt, Andrew Morton, Linux Kernel


btw., a graph-tracer buglet i noticed:

  3)               |  perf_counter_task_tick() {
  3)               |    perf_counter_task_sched_out() {
  3)               |      /* c:ffff88049b184000, u:0: 0000000020ca1ab0 + 0000000000000000 = 0000000020ca1ab0
 */
  3)               |      /* sched-out counter ffff88049b184000, state: 0
 */
  3)   5.024 us    |    }
  3)               |    perf_counter_task_sched_in() {
  3)               |      /* c:ffff88049b184000, u:0: 0000000020ca1ab0 + 0000000000000000 = 0000000020ca1ab0
 */
  3)               |      /* sched-in counter ffff88049b184000, state: 1
 */
  3)   2.605 us    |    }
  3)   9.714 us    |  }
  3)               |  perf_counter_task_tick() {

the bug is that the '*/' comment closing text is printed in the next line 
- that's not a linewrap, it happened like this in the trace. 

The reason is that the ftrace_printk did:

	ftrace_printk("c:%p, u:%d: %016Lx + %016Lx = %016Lx\n",

the closing \n is pretty natural - especially for kernel hackers who might 
just turn printk()s into ftrace_printk(), to embedd printouts in function 
traces. So it would be nice if ftrace skipped over \n's when printing them 
out as embedded C comments.

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-21 10:00               ` Ingo Molnar
@ 2008-12-21 10:12                 ` Ingo Molnar
  2008-12-21 12:21                   ` Frédéric Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2008-12-21 10:12 UTC (permalink / raw)
  To: Frédéric Weisbecker
  Cc: Thomas Gleixner, Steven Rostedt, Andrew Morton, Linux Kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> btw., a graph-tracer buglet i noticed:

another thing. I turned off all the options that come ahead of the 'meat' 
of the trace data:

# cat /debug/tracing/trace_options 
print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock 
nostacktrace nosched-tree ftrace_printk noftrace_preempt nobranch annotate 
nouserstacktrace nosym-userobj printk-msg-only nofuncgraph-overrun 
nofuncgraph-cpu nofuncgraph-overhead nofuncgraph-proc 

but i still got the 'duration' field printed:

# tracer: function_graph
#
# DURATION            FUNCTION CALLS
#     |               |   |   |   |

0.865 us    |    }
            |    alloc_page_vma() {
0.295 us    |      get_vma_policy();
0.268 us    |      policy_zonelist();
            |      __alloc_pages_internal() {
0.257 us    |        __might_sleep();
            |        get_page_from_freelist() {


i'd have expected to only see the flow of execution as it happens.

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-21 10:12                 ` Ingo Molnar
@ 2008-12-21 12:21                   ` Frédéric Weisbecker
  2008-12-22 10:46                     ` Frédéric Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Frédéric Weisbecker @ 2008-12-21 12:21 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, Steven Rostedt, Andrew Morton, Linux Kernel

2008/12/21 Ingo Molnar <mingo@elte.hu>:
>
> btw., a graph-tracer buglet i noticed:
>
>  3)               |  perf_counter_task_tick() {
>  3)               |    perf_counter_task_sched_out() {
>  3)               |      /* c:ffff88049b184000, u:0: 0000000020ca1ab0 + 0000000000000000 = 0000000020ca1ab0
>  */
>  3)               |      /* sched-out counter ffff88049b184000, state: 0
>  */
>  3)   5.024 us    |    }
>  3)               |    perf_counter_task_sched_in() {
>  3)               |      /* c:ffff88049b184000, u:0: 0000000020ca1ab0 + 0000000000000000 = 0000000020ca1ab0
>  */
>  3)               |      /* sched-in counter ffff88049b184000, state: 1
>  */
>  3)   2.605 us    |    }
>  3)   9.714 us    |  }
>  3)               |  perf_counter_task_tick() {
>
> the bug is that the '*/' comment closing text is printed in the next line
> - that's not a linewrap, it happened like this in the trace.
>
> The reason is that the ftrace_printk did:
>
>        ftrace_printk("c:%p, u:%d: %016Lx + %016Lx = %016Lx\n",
>
> the closing \n is pretty natural - especially for kernel hackers who might
> just turn printk()s into ftrace_printk(), to embedd printouts in function
> traces. So it would be nice if ftrace skipped over \n's when printing them
> out as embedded C comments.
>
>        Ingo
>


Yes I was afraid of the \n from the user for these comments. You're
right, I will skip the newlines
from the user.


2008/12/21 Ingo Molnar <mingo@elte.hu>:
>
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>> btw., a graph-tracer buglet i noticed:
>
> another thing. I turned off all the options that come ahead of the 'meat'
> of the trace data:
>
> # cat /debug/tracing/trace_options
> print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock
> nostacktrace nosched-tree ftrace_printk noftrace_preempt nobranch annotate
> nouserstacktrace nosym-userobj printk-msg-only nofuncgraph-overrun
> nofuncgraph-cpu nofuncgraph-overhead nofuncgraph-proc
>
> but i still got the 'duration' field printed:
>
> # tracer: function_graph
> #
> # DURATION            FUNCTION CALLS
> #     |               |   |   |   |
>
> 0.865 us    |    }
>            |    alloc_page_vma() {
> 0.295 us    |      get_vma_policy();
> 0.268 us    |      policy_zonelist();
>            |      __alloc_pages_internal() {
> 0.257 us    |        __might_sleep();
>            |        get_page_from_freelist() {
>
>
> i'd have expected to only see the flow of execution as it happens.
>
>        Ingo
>


Strange. I will test that.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop
  2008-12-21 12:21                   ` Frédéric Weisbecker
@ 2008-12-22 10:46                     ` Frédéric Weisbecker
  0 siblings, 0 replies; 16+ messages in thread
From: Frédéric Weisbecker @ 2008-12-22 10:46 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, Steven Rostedt, Andrew Morton, Linux Kernel

2008/12/21 Frédéric Weisbecker <fweisbec@gmail.com>:
> 2008/12/21 Ingo Molnar <mingo@elte.hu>:
>>
>> * Ingo Molnar <mingo@elte.hu> wrote:
>>
>>> btw., a graph-tracer buglet i noticed:
>>
>> another thing. I turned off all the options that come ahead of the 'meat'
>> of the trace data:
>>
>> # cat /debug/tracing/trace_options
>> print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock
>> nostacktrace nosched-tree ftrace_printk noftrace_preempt nobranch annotate
>> nouserstacktrace nosym-userobj printk-msg-only nofuncgraph-overrun
>> nofuncgraph-cpu nofuncgraph-overhead nofuncgraph-proc
>>
>> but i still got the 'duration' field printed:
>>
>> # tracer: function_graph
>> #
>> # DURATION            FUNCTION CALLS
>> #     |               |   |   |   |
>>
>> 0.865 us    |    }
>>            |    alloc_page_vma() {
>> 0.295 us    |      get_vma_policy();
>> 0.268 us    |      policy_zonelist();
>>            |      __alloc_pages_internal() {
>> 0.257 us    |        __might_sleep();
>>            |        get_page_from_freelist() {
>>
>>
>> i'd have expected to only see the flow of execution as it happens.
>>
>>        Ingo
>>
>
>
> Strange. I will test that.
>

Did I say it was strange? No it isn't, there is no option to turn off
the duration :-)
I will add one.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2008-12-22 10:46 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-18  1:09 [PATCH v2] tracing/function-graph-tracer: prevent from hrtimer interrupt infinite loop Frederic Weisbecker
2008-12-18 10:34 ` Ingo Molnar
2008-12-18 10:48   ` Thomas Gleixner
2008-12-18 10:56     ` Frédéric Weisbecker
2008-12-18 11:22     ` Ingo Molnar
2008-12-18 21:07       ` Frédéric Weisbecker
2008-12-18 21:16         ` Ingo Molnar
2008-12-18 21:36           ` Frédéric Weisbecker
2008-12-18 21:44             ` Ingo Molnar
2008-12-18 22:00               ` Frédéric Weisbecker
2008-12-21 10:00               ` Ingo Molnar
2008-12-21 10:12                 ` Ingo Molnar
2008-12-21 12:21                   ` Frédéric Weisbecker
2008-12-22 10:46                     ` Frédéric Weisbecker
2008-12-18 10:51   ` Frédéric Weisbecker
2008-12-18 11:16     ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox