public inbox for linux-kernel@vger.kernel.org
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Kevin Shanahan <kmshanah@ucwb.org.au>
Cc: Avi Kivity <avi@redhat.com>, "Rafael J. Wysocki" <rjw@sisk.pl>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Kernel Testers List <kernel-testers@vger.kernel.org>,
	Ingo Molnar <mingo@elte.hu>, Mike Galbraith <efault@gmx.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
Date: Tue, 24 Mar 2009 12:44:12 +0100
Message-ID: <20090324114409.GB6058@nowhere>
In-Reply-To: <1237611639.4933.4.camel@kulgan.wumi.org.au>

On Sat, Mar 21, 2009 at 03:30:39PM +1030, Kevin Shanahan wrote:
> On Thu, 2009-03-19 at 07:54 +1030, Kevin Shanahan wrote:
> > On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> > > On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > > > Ok, I've made a small script based on yours which could do this job.
> > > > You will just have to set yourself a threshold of latency
> > > > that you consider as buggy. I don't remember the latency you observed.
> > > > About 5 secs right?
> > > > 
> > > > It's the "thres" variable in the script.
> > > > 
> > > > The resulting trace should be a mixup of the function graph traces
> > > > and scheduler events which look like this:
> > > > 
> > > >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> > > >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> > > >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> > > >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > > > 
> > > > + is a wakeup and ==> is a context switch.
> > > > 
> > > > The script will loop trying some pings and will only keep the trace that matches
> > > > the latency threshold you defined.
> > > > 
> > > > Tell me if the following script works for you.
> > 
> > ...
> > 
> > > Either way, I'll try to get some results in my maintenance window
> > > tonight.
> > 
> > Testing did not go so well. I compiled and booted
> > 2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
> > load when I tried to start tracing - it shot up to around 16-20 or so. I
> > started shutting down VMs to try and get it under control, but before I
> > got back to tracing again the machine disappeared off the network -
> > unresponsive to ping.
> > 
> > When I got in this morning, there was nothing on the console, nothing in
> > the logs to show what went wrong. I will try again, but my next chance
> > will probably be Saturday. Stay tuned.
> 
> Okay, new set of traces have been uploaded to:
> 
>   http://disenchant.net/tmp/bug-12465/trace-3/
> 
> These were done on the latest tip, which I pulled down this morning:
> 2.6.29-rc8-tip-02744-gd9937cb.
> 
> The system load was very high again when I first tried to trace with
> several guests running, so I ended up only having the one guest running
> and thankfully the bug was still reproducible that way.
> 
> Fingers crossed this set of traces is able to tell us something.
> 
> Regards,
> Kevin.
> 
> 

Sorry for the late answer.
As I explained in my previous mail, your trace is only
a snapshot covering about 10 msec.

I experimented with different sizes for the ring buffer, but even
a 1 second trace requires 20 MB of memory, and such a huge trace
would be impractical.
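
To put a number on it, here is a rough back-of-the-envelope sketch. It assumes the trace grows linearly at the rate measured for 1 second (the linear scaling is an assumption, not something measured), and `trace_size_mb` is a hypothetical helper name:

```bash
#!/bin/bash

# Hypothetical helper: estimate the trace size in MB for a stall of a
# given length, assuming the trace grows linearly with time.
#   $1 = trace rate in MB per second
#   $2 = stall duration in seconds
trace_size_mb() {
    echo $(( $1 * $2 ))
}

# Example call: 20 MB per second for a stall of about 5 seconds
# trace_size_mb 20 5
```

With the roughly 5 second stalls reported in this thread, that already means on the order of a hundred MB of unfiltered trace.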

I think we should keep the trace filters we had previously.
If you don't mind, could you please retest the following updated
script against latest -tip? I added the filters, fixed the python
subshell and also made the buffer flush cleaner by using
a recent feature in -tip:

echo > trace 

instead of switching to nop.
You will need to pull latest -tip again.
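
The two ways of flushing can be sketched side by side as follows. This is only an illustration: `flush_trace` and `flush_trace_via_nop` are hypothetical helper names, the tracing directory is passed as an argument and defaults to the debugfs path used in the script, and the first variant assumes a kernel where writing to the trace file clears the buffer.

```bash
#!/bin/bash

# Hypothetical helper: flush the ring buffer by writing an empty line
# to the "trace" file (needs a kernel that supports this).
flush_trace() {
    local dir="${1:-/sys/kernel/debug/tracing}"
    echo > "$dir/trace"
}

# Hypothetical helper: the older approach, which switches to the nop
# tracer and then restores whichever tracer was selected before.
flush_trace_via_nop() {
    local dir="${1:-/sys/kernel/debug/tracing}"
    local tracer
    tracer=$(cat "$dir/current_tracer")
    echo nop > "$dir/current_tracer"
    echo "$tracer" > "$dir/current_tracer"
}
```

The first form leaves the current tracer untouched, so the loop does not need to set function_graph again on every iteration.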

Thanks a lot Kevin!


#!/bin/bash

# Switch off all CPUs except for one to simplify the trace
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online


# Make sure debugfs has been mounted
if [ ! -d /sys/kernel/debug/tracing ]; then
    mount -t debugfs debugfs /sys/kernel/debug
fi

# Set up the trace parameters
pushd /sys/kernel/debug/tracing || exit 1
echo 0 > tracing_enabled
echo function_graph > current_tracer
echo funcgraph-abstime > trace_options
echo funcgraph-proc    > trace_options

# Set here the KVM guest address (hostname or IP)
addr="hermes-old"

# Set here the latency threshold in msec (ping reports rtt in ms)
thres="5000"
found="False"
lat=0
prefix=/sys/kernel/debug/tracing

echo 1 > $prefix/events/sched/sched_wakeup/enable
echo 1 > $prefix/events/sched/sched_switch/enable

# Set the filter for functions to trace
echo ''         > set_ftrace_filter  # clear filter functions
echo '*sched*' >> set_ftrace_filter 
echo '*wake*'  >> set_ftrace_filter
echo '*kvm*'   >> set_ftrace_filter

# Reset the function_graph tracer
echo function_graph > $prefix/current_tracer

while [ "$found" != "True" ]
do
        # Flush the previous buffer
        echo > $prefix/trace

        echo 1 > $prefix/tracing_enabled
        lat=$(ping -c 1 "$addr" | grep rtt | grep -Eo " [0-9]+\.[0-9]+")
        echo 0 > $prefix/tracing_enabled

        echo $lat
        # "True" once the measured latency exceeds the threshold
        found=$(python -c "print(float(str($lat).strip()) > $thres)")
        sleep 0.01
done

echo 0 > $prefix/events/sched/sched_wakeup/enable
echo 0 > $prefix/events/sched/sched_switch/enable


echo "Found buggy latency: $lat"
echo "Please send the trace you will find on $prefix/trace"
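
For reference, the latency check inside the loop can also be sketched without the Python subshell. This is only a sketch, not what the script above does: `ping_rtt_ms` and `exceeds_threshold` are hypothetical helper names, and it assumes the iputils ping summary line that starts with "rtt" and an awk in the PATH.

```bash
#!/bin/bash

# Hypothetical helper: read ping output on stdin and print the first
# value of the "rtt min/avg/max/mdev" summary line (the minimum, which
# is the only sample when ping runs with -c 1).
ping_rtt_ms() {
    grep rtt | grep -Eo '[0-9]+\.[0-9]+' | head -n 1
}

# Hypothetical helper: succeed (exit status 0) if the latency in $1
# is strictly greater than the threshold in $2, both in msec.
exceeds_threshold() {
    local lat="$1" thres="$2"
    # An empty latency (e.g. ping got no reply) never matches
    [ -n "$lat" ] || return 1
    awk -v l="$lat" -v t="$thres" 'BEGIN { exit !(l + 0 > t + 0) }'
}

# Possible use inside the loop:
#   lat=$(ping -c 1 "$addr" | ping_rtt_ms)
#   exceeds_threshold "$lat" "$thres" && found="True"
```

Doing the comparison in awk keeps the floating point handling that plain bash arithmetic lacks, and a failed ping simply lets the loop continue.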


