public inbox for linux-kernel@vger.kernel.org
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Kevin Shanahan <kmshanah@ucwb.org.au>
Cc: Avi Kivity <avi@redhat.com>, "Rafael J. Wysocki" <rjw@sisk.pl>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Kernel Testers List <kernel-testers@vger.kernel.org>,
	Ingo Molnar <mingo@elte.hu>, Mike Galbraith <efault@gmx.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
Date: Tue, 24 Mar 2009 12:47:15 +0100
Message-ID: <20090324114715.GC6058@nowhere>
In-Reply-To: <20090324114409.GB6058@nowhere>

On Tue, Mar 24, 2009 at 12:44:12PM +0100, Frederic Weisbecker wrote:
> On Sat, Mar 21, 2009 at 03:30:39PM +1030, Kevin Shanahan wrote:
> > On Thu, 2009-03-19 at 07:54 +1030, Kevin Shanahan wrote:
> > > On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> > > > On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > > > > Ok, I've made a small script based on yours which could do this job.
> > > > > You will just have to set yourself a threshold of latency
> > > > > that you consider as buggy. I don't remember the latency you observed.
> > > > > About 5 secs right?
> > > > > 
> > > > > It's the "thres" variable in the script.
> > > > > 
> > > > > The resulting trace should be a mixup of the function graph traces
> > > > > and scheduler events which look like this:
> > > > > 
> > > > >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> > > > >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> > > > >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> > > > >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > > > > 
> > > > > + is a wakeup and ==> is a context switch.
> > > > > 
> > > > > The script will loop trying some pings and will only keep the trace that matches
> > > > > the latency threshold you defined.
> > > > > 
> > > > > Tell me if the following script works for you.
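
As a rough illustration of reading that output, the sketch below counts
wakeup and context-switch lines in a saved trace. The function name
count_sched_events and its file argument are hypothetical, not part of the
script in this thread, and the patterns are only a heuristic keyed on the
" + [" and " ==> " markers shown in the sample lines.

```shell
# Hypothetical helper: count wakeup ("+") and context-switch ("==>") lines
# in a trace file that uses the sched event format quoted above.
# Heuristic only: it matches the literal markers " + [" and " ==> ".
count_sched_events() {
    awk '
        / ==> /  { switches++ }
        / \+ \[/ { wakeups++ }
        END { printf "wakeups=%d switches=%d\n", wakeups, switches }
    ' "$1"
}
```

Usage would be something like: count_sched_events /tmp/saved-trace.txt
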
> > > 
> > > ...
> > > 
> > > > Either way, I'll try to get some results in my maintenance window
> > > > tonight.
> > > 
> > > Testing did not go so well. I compiled and booted
> > > 2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
> > > load when I tried to start tracing - it shot up to around 16-20 or so. I
> > > started shutting down VMs to try and get it under control, but before I
> > > got back to tracing again the machine disappeared off the network -
> > > unresponsive to ping.
> > > 
> > > When I got in this morning, there was nothing on the console, nothing in
> > > the logs to show what went wrong. I will try again, but my next chance
> > > will probably be Saturday. Stay tuned.
> > 
> > Okay, new set of traces have been uploaded to:
> > 
> >   http://disenchant.net/tmp/bug-12465/trace-3/
> > 
> > These were done on the latest tip, which I pulled down this morning:
> > 2.6.29-rc8-tip-02744-gd9937cb.
> > 
> > The system load was very high again when I first tried to trace with
> > several guests running, so I ended up only having the one guest running
> > and thankfully the bug was still reproducible that way.
> > 
> > Fingers crossed this set of traces is able to tell us something.
> > 
> > Regards,
> > Kevin.
> > 
> > 
> 
> Sorry for the late answer.
> As I explained in my previous mail, your trace is only
> a snapshot covering about 10 msec.
> 
> I experimented with different sizes for the ring buffer, but even
> a 1 second trace requires 20 MB of memory, and such a huge trace
> would be impractical.
> 
> I think we should keep the trace filters we had previously.
> If you don't mind, could you please retest the following updated
> patch against latest -tip? I added the filters, fixed the python
> subshell and also flushed the buffer more cleanly, using
> a recent feature in -tip:
> 
> echo > trace 
> 
> instead of switching to nop.
> You will need to pull latest -tip again.
> 
> Thanks a lot Kevin!


Ah you will also need to increase the size of your buffer.
See below:
 
> 
> #!/bin/bash
> 
> # Switch off all CPUs except for one to simplify the trace
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 0 > /sys/devices/system/cpu/cpu2/online
> echo 0 > /sys/devices/system/cpu/cpu3/online
> 
> 
> # Make sure debugfs has been mounted
> if [ ! -d /sys/kernel/debug/tracing ]; then
>     mount -t debugfs debugfs /sys/kernel/debug
> fi
> 
> # Set up the trace parameters
> pushd /sys/kernel/debug/tracing || exit 1
> echo 0 > tracing_enabled
> echo function_graph > current_tracer
> echo funcgraph-abstime > trace_options
> echo funcgraph-proc    > trace_options
> 
> # Set here the kvm IP addr
> addr="hermes-old"
> 
> # Set here a threshold of latency in msec (ping reports rtt in ms)
> thres="5000"
> found="False"
> lat=0
> prefix=/sys/kernel/debug/tracing
> 
> echo 1 > $prefix/events/sched/sched_wakeup/enable
> echo 1 > $prefix/events/sched/sched_switch/enable
> 
> # Set the filter for functions to trace
> echo ''         > set_ftrace_filter  # clear filter functions
> echo '*sched*' >> set_ftrace_filter 
> echo '*wake*'  >> set_ftrace_filter
> echo '*kvm*'   >> set_ftrace_filter
> 
> # Reset the function_graph tracer
> echo function_graph > $prefix/current_tracer

Add the following line here:

echo 20000 > $prefix/buffer_size_kb

so that we will have enough space (hopefully).

Thanks!
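
For what it's worth, here is a minimal sketch of setting the size and
checking that it took effect. The helper name set_buffer_kb is hypothetical.
It assumes the tracing directory exposes buffer_size_kb as in the line
above, and that the value is in kilobytes for each per-CPU buffer (an
assumption worth checking against the ftrace documentation in your tree).

```shell
# Hypothetical helper (not part of the original script): write a ring-buffer
# size in KB to <tracing_dir>/buffer_size_kb and print the value read back.
# Returns 1 if the file is missing or not writable.
set_buffer_kb() {
    if [ ! -w "$1/buffer_size_kb" ]; then
        echo "cannot write $1/buffer_size_kb" >&2
        return 1
    fi
    echo "$2" > "$1/buffer_size_kb" || return 1
    # Read the value back so the caller can see what was stored
    cat "$1/buffer_size_kb"
}

# Example (needs root and a mounted debugfs):
#   set_buffer_kb /sys/kernel/debug/tracing 20000
```
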

> 
> while [ "$found" != "True" ]
> do
>         # Flush the previous buffer
>         echo > $prefix/trace
> 
>         echo 1 > $prefix/tracing_enabled
>         lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
>         echo 0 > $prefix/tracing_enabled
> 
>         echo $lat
>         # Prints "True" once the measured latency exceeds the threshold
>         found=$(python -c "print float(str($lat).strip()) > $thres")
>         sleep 0.01
> done
> 
> echo 0 > $prefix/events/sched/sched_wakeup/enable
> echo 0 > $prefix/events/sched/sched_switch/enable
> 
> 
> echo "Found buggy latency: $lat"
> echo "Please send the trace you will find on $prefix/trace"
> 
> 
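
The threshold test in the loop could also be done without python. A minimal
sketch, assuming awk is available; latency_exceeds is a hypothetical helper,
and both values are taken to be in milliseconds as printed by ping. An empty
or non-numeric latency (for example when the ping got no reply) is treated
as "False" here, which is a design choice of this sketch and not something
the script in this thread does.

```shell
# Hypothetical helper: print "True" when the latency in $1 is strictly
# greater than the threshold in $2, "False" otherwise.
# Empty or non-numeric latency values yield "False".
latency_exceeds() {
    awk -v lat="$1" -v thres="$2" 'BEGIN {
        if (lat ~ /^ *[0-9]+(\.[0-9]+)? *$/ && lat + 0 > thres + 0)
            print "True"
        else
            print "False"
    }'
}
```

In the loop it would be used as: found=$(latency_exceeds "$lat" "$thres")
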

