linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Dave Jones <davej@redhat.com>,
	Johannes Weiner <jweiner@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>, Jan Kara <jack@suse.cz>,
	Andy Isaacson <adi@hexapodia.org>,
	linux-kernel@vger.kernel.org, linux-mm@vger.kernel.org,
	kernel-team@fedoraproject.org
Subject: Re: long sleep_on_page delays writing to slow storage
Date: Fri, 18 Nov 2011 11:11:40 +0000	[thread overview]
Message-ID: <20111118111140.GA20840@suse.de> (raw)
In-Reply-To: <20111117194719.GA23213@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 2592 bytes --]

On Thu, Nov 17, 2011 at 02:47:20PM -0500, Dave Jones wrote:
> On Tue, Nov 15, 2011 at 10:13:13AM +0000, Mel Gorman wrote:
>  
>  > If they are still experiencing major stalls, I have an experimental
>  > script that may be able to capture stack traces of processes stalled
>  > for more than 1 second. I've had some success with it locally so
>  > maybe they could try it out to identify if it's THP or something else.
>  
> I'm not sure if it's the same problem, but I'd be interested in trying
> that script.
> 

Monitor script is attached as watch-dstate.pl. Run it as

watch-dstate.pl -o logfile

I'm also attaching a post-processing script stap-dstate-frequency.

cat logfile | stap-dstate-frequency

will report on unique stack traces, what got stuck in them and
for how long. Unfortunately, this does require a working systemtap
installation because it had to work on systems without ftrace. Usually
systemtap is a case of installing debugging symbols and its package
but milage varies.

I ran this for a few days on my own desktop but found that the worst
stalls for firefox and evolution were in futex_wait with the second
worst in

[<ffffffffa018e3c5>] ext4_sync_file+0x225/0x290 [ext4]
[<ffffffff81178250>] do_fsync+0x50/0x80
[<ffffffff8117852e>] sys_fdatasync+0xe/0x20
[<ffffffff81448592>] system_call_fastpath+0x16/0x1b

The stall timing is approximate at best. If you find the stall figures
are way too high or unrealistic, try running with --accurate-stall. The
stall figures will be much more accurate but depending on your
kernel version, the stack traces may be one line long ending with
kretprobe_trampoline.

> When I build a kernel on my laptop, when it gets to the final link stage,
> and there's a ton of IO, my entire X session wedges for a few seconds.
> This may be unrelated, because this is on an SSD, which shouldn't suffer
> from the slow IO of the USB devices mentioned in this thread.
> 

I have a vague suspicion that there are some interactivity issues
around SSDs but I don't know why that is. I'm basing this on some
complaints of audio skipping with heavy kernel compiles on machines
very similar to my own other than mine uses rotary storage. It's on
the Christmas list to by myself a SSD to take a closer look.

> (This is even with that patch applied btw, perhaps adding further fuel to
> the idea that it's unrelated).
> 

I suspect it's not compaction related in that case but the script may be
able to tell for sure. If it does not catch anything, alter this line to
have a smaller threshold

global stall_threshold = 1000

-- 
Mel Gorman
SUSE Labs

[-- Attachment #2: watch-dstate.pl --]
[-- Type: application/x-perl, Size: 10807 bytes --]

[-- Attachment #3: stap-dstate-frequency --]
[-- Type: text/plain, Size: 2432 bytes --]

#!/usr/bin/perl
# This script reads the output from the dstate monitor and reports how
# many unique stack traces there were and what the stall times were.
# The objective is to identify the worst sources of stalls.
#
# Copyright Mel Gorman 2011
use strict;

my $line;
my %unique_event_counts;
my %unique_event_stall;
my %unique_event_stall_details;
my $total_stalled;

my ($process, $stalltime, $function, $event);
my ($stall_details, $trace, $first_trace, $reading_trace, $skip);
while ($line = <>) {
	# Watch for the beginning of a new event
	if ($line =~ /^time ([0-9]*): ([0-9]*) \((.*)\) Stalled: ([-0-9]*) ms: (.*)/) {

		# Skip uninteresting traces
		if (!$skip) {
			# Record the last event
			$unique_event_counts{$trace}++;
			$unique_event_stall_details{$trace} .= $event;
			if ($stalltime >= 0) {
				$unique_event_stall{$trace} += $stalltime;
				$total_stalled += $stalltime;
			}
		}

		# Start the next event
		$event = sprintf "%-20s %-20s %6d ms\n", $3, $5, $4;
		$reading_trace = 0;
		$stalltime = $4;

		if ($stalltime == 0) {
			print "DEBUG $line";
		}

		$first_trace = "";
		$trace = "";
	}

	# If we have reached a trace, blindly read it
	if ($reading_trace) {
		# Ignore traces that look like they are from user space
		if ($line =~ /^\[<0/) {
			$reading_trace = 0;
			next;
		}
		$trace .= $line;
		if ($first_trace eq "") {
			$first_trace = $line;
			$skip = 1;

			# Skip uninteresting traces
			if ($first_trace !~ / do_poll\+/ &&
				$first_trace !~ / kthread\+/ &&
				$first_trace !~ / khugepaged\+/ &&
				$first_trace !~ / sys_epoll_wait\+/ &&
				$first_trace !~ / kswapd\+/) {
				$skip = 0;
			}
		}
		next;
	}

	if ($line =~ /^\[<f/) {
		$reading_trace = 1;
		next;
	}
}

print "Overall stalled time: $total_stalled ms\n\n";
foreach my $trace (sort {$unique_event_stall{$b} <=> $unique_event_stall{$a}} keys %unique_event_stall) {
	#printf "Event $short_event us count %4d\n", $unique_event_counts{$event};
	#print $unique_event_stall_details{$event};
	printf "Time stalled in this event: %8d ms\n", $unique_event_stall{$trace};
	printf "Event count:                %8d\n", $unique_event_counts{$trace};
	print $unique_event_stall_details{$trace};
	print "$trace\n";
}

#print "\nDetails\n=======\n";
#foreach my $event (sort {$unique_event_stall{$b} <=> $unique_event_stall{$a}} keys %unique_event_stall) {
#	print "Event $event us count $unique_event_counts{$event}\n";
#	print "\n";
#}

  parent reply	other threads:[~2011-11-18 11:11 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-07  4:59 long sleep_on_page delays writing to slow storage Andy Isaacson
2011-11-09 17:00 ` Jan Kara
2011-11-09 17:52   ` Mel Gorman
2011-11-09 18:06     ` Andrea Arcangeli
2011-11-10  0:53       ` Mel Gorman
2011-11-10  1:54         ` Andrea Arcangeli
2011-11-10  9:34       ` Johannes Weiner
2011-11-14 18:47         ` Dave Jones
2011-11-15 10:13           ` Mel Gorman
2011-11-17 19:47             ` Dave Jones
2011-11-17 22:53               ` Andrea Arcangeli
2011-11-18 11:11               ` Mel Gorman [this message]
2011-11-18 12:19                 ` Josh Boyer
2011-11-21  9:18               ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111118111140.GA20840@suse.de \
    --to=mgorman@suse.de \
    --cc=aarcange@redhat.com \
    --cc=adi@hexapodia.org \
    --cc=davej@redhat.com \
    --cc=jack@suse.cz \
    --cc=jweiner@redhat.com \
    --cc=kernel-team@fedoraproject.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).