All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Dave Airlie <airlied@redhat.com>
Cc: mikko.cal@gmail.com, thellstrom@vmware.com, devnull@plzk.org,
	bugzilla-daemon@bugzilla.kernel.org,
	dri-devel@lists.freedesktop.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [Bug 16148] page allocation failure. order:1, mode:0x50d0
Date: Wed, 16 Jun 2010 14:54:03 +0100	[thread overview]
Message-ID: <20100616135403.GA3034@csn.ul.ie> (raw)
In-Reply-To: <1276642629.18293.33.camel@clockmaker-el6>

Hi Dave,

On Wed, Jun 16, 2010 at 08:57:09AM +1000, Dave Airlie wrote:
> > (switching back to email, actually)
> > 
> > On Sun, 13 Jun 2010 13:01:57 GMT
> > bugzilla-daemon@bugzilla.kernel.org wrote:
> > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=16148
> > >
> > >
> > > Mikko C. <mikko.cal@gmail.com> changed:
> > >
> > >            What    |Removed                     |Added
> > > ----------------------------------------------------------------------------
> > >                  CC|                            |mikko.cal@gmail.com
> > >
> > >
> > >
> > >
> > > --- Comment #8 from Mikko C. <mikko.cal@gmail.com>  2010-06-13 13:01:53 ---
> > > I have been getting this with 2.6.35-rc2 and rc3.
> > > Could it be the same problem?
> > >
> > >
> > > X: page allocation failure. order:0, mode:0x4
> > > Pid: 1514, comm: X Not tainted 2.6.35-rc3 #1
> > > Call Trace:
> > >  [<ffffffff8108ce49>] ? __alloc_pages_nodemask+0x629/0x680
> > >  [<ffffffff8108c920>] ? __alloc_pages_nodemask+0x100/0x680
> > >  [<ffffffffa00db8f3>] ? ttm_get_pages+0x2c3/0x448 [ttm]
> > >  [<ffffffffa00d4658>] ? __ttm_tt_get_page+0x98/0xc0 [ttm]
> > >  [<ffffffffa00d4988>] ? ttm_tt_populate+0x48/0x90 [ttm]
> > >  [<ffffffffa00d4a26>] ? ttm_tt_bind+0x56/0xa0 [ttm]
> > >  [<ffffffffa00d5230>] ? ttm_bo_handle_move_mem+0x1d0/0x430 [ttm]
> > >  [<ffffffffa00d76d6>] ? ttm_bo_move_buffer+0x166/0x180 [ttm]
> > >  [<ffffffffa00b9736>] ? drm_mm_kmalloc+0x26/0xc0 [drm]
> > >  [<ffffffff81030ea9>] ? get_parent_ip+0x9/0x20
> > >  [<ffffffffa00d7786>] ? ttm_bo_validate+0x96/0x130 [ttm]
> > >  [<ffffffffa00d7b35>] ? ttm_bo_init+0x315/0x390 [ttm]
> > >  [<ffffffffa0122eb8>] ? radeon_bo_create+0x118/0x210 [radeon]
> > >  [<ffffffffa0122fb0>] ? radeon_ttm_bo_destroy+0x0/0xb0 [radeon]
> > >  [<ffffffffa013704c>] ? radeon_gem_object_create+0x8c/0x110 [radeon]
> > >  [<ffffffffa013711f>] ? radeon_gem_create_ioctl+0x4f/0xe0 [radeon]
> > >  [<ffffffffa00b10e6>] ? drm_ioctl+0x3d6/0x470 [drm]
> > >  [<ffffffffa01370d0>] ? radeon_gem_create_ioctl+0x0/0xe0 [radeon]
> > >  [<ffffffff810b965f>] ? do_sync_read+0xbf/0x100
> > >  [<ffffffff810c8965>] ? vfs_ioctl+0x35/0xd0
> > >  [<ffffffff810c8b28>] ? do_vfs_ioctl+0x88/0x530
> > >  [<ffffffff81031ed7>] ? sub_preempt_count+0x87/0xb0
> > >  [<ffffffff810c9019>] ? sys_ioctl+0x49/0x80
> > >  [<ffffffff810ba4fe>] ? sys_read+0x4e/0x90
> > >  [<ffffffff810024ab>] ? system_call_fastpath+0x16/0x1b
> > 
> > That's different.  ttm_get_pages() looks pretty busted to me.  It's not
> > using __GFP_WAIT and it's not using __GFP_FS.  It's using a plain
> > GFP_DMA32 so it's using atomic allocations even though it doesn't need
> > to.  IOW, it's shooting itself in the head.
> > 

This is an accurate assessment.

> > Given that it will sometimes use GFP_HIGHUSER which includes __GFP_FS
> > and __GFP_WAIT, I assume it can always include __GFP_FS and __GFP_WAIT.
> > If so, it should very much do so.  If not then the function is
> > misdesigned and should be altered to take a gfp_t argument so the
> > caller can tell ttm_get_pages() which is the strongest allocation mode
> > which it may use.
> > 
> > > [TTM] Unable to allocate page.
> > > radeon 0000:01:05.0: object_init failed for (7827456, 0x00000002)
> > > [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (7827456,
> > > 2, 4096, -12)
> > 
> > This bug actually broke stuff for you.
> > 
> > Something like this:
> > 
> > --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c~a
> > +++ a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> > @@ -677,7 +677,7 @@ int ttm_get_pages(struct list_head *page
> >  	/* No pool for cached pages */
> >  	if (pool == NULL) {
> >  		if (flags & TTM_PAGE_FLAG_DMA32)
> > -			gfp_flags |= GFP_DMA32;
> > +			gfp_flags |= GFP_KERNEL|GFP_DMA32;
> >  		else
> >  			gfp_flags |= GFP_HIGHUSER;
> >  
> > _
> > 
> > although I wonder whether it should be using pool->gfp_flags.
> > 
> > 
> > It's a shame that this code was developed and merged in secret :( Had
> > we known, we could have looked at enhancing mempools to cover the
> > requirement, or at implementing this in some generic fashion rather
> > than hiding it down in drivers/gpu/drm.
> 
> Its been post to lkml at least once or twice over the past few years
> though not as much as it was posted to dri-devel, but that was because
> we had never seen anyone show any interest in it outside of kernel
> hackers. Originally I was going to use the generic allocator stuff ia64
> uses for uncached allocations but it allocates memory ranges not pages
> so it wasn't useful. I also suggested getting a page flag for uncached
> allocator stuff, I was told to go write the code in my own corner and
> prove it was required. So I did, and it was cleaned up and worked on by
> others and I merged it. So can we lay off with the "in secret", the
> original code is nearly 2 years old at this point and just because -mm
> hackers choose to ignore it isn't our fault. Patches welcome.
> 

No comment on this aspect of things.

> So now back to the bug:
> 
> So the page pools are setup with gfp flags, in the normal case, 4 pools,
> one WC GFP_HIGHUSER pages, one UC HIGHUSER pages, one WC GFP_USER|
> GFP_DMA32, one UC GFP_USER|GFP_DMA32, so the pools are all fine, the
> problem here is the same as before we added the pools, which is the
> normal page allocation path, which needs the GFP_USER added instead of
> GFP_KERNEL.
> 
> That said I've noticed a lot more page allocation failure reports in
> 2.6.35-rcX than we've gotten for a long time, in code that hasn't
> changed (the AGP ones the other day for example) has something in the
> core MM regressed (again... didn't this happen back in 2.6.31 or
> something).
> 
> (cc'ing Mel who tracked these down before).
> 

Ok, so there has been very little changed in the page allocator that
might have caused this. The only possibility I can think of is
[6dda9d: page allocator: reduce fragmentation in buddy allocator by
adding buddies that are merging to the tail of the free lists] but I
would have expected the patch to help, not hinder, this situation.

In the last few kernel releases when there has been a spike in high-order
allocation, it has been largely due to an increased source of high-order
callers from somewhere. I am 99% sure the last time I encountered this, I used
the following script to debug it even though it's a bit of a hack. It uses
ftrace to track high-order allocators and reports the number of high-order
allocations (either atomic or normal) and the count of unique backtraces
(to point the finger at bad callers)

Unfortunately, I don't have access to a suitable machine to test this with
right now (my laptop isn't making high-order allocations to confirm yet)
but I'm pretty sure it's the right one!

If you do some fixed set of tests with 2.6.34 and 2.6.35-rc2 with this script
running and compare the reports, it might point the finger at a new source
of frequent high-order allocations that might be leading to more failures.
Do something like

./watch-highorder.pl -o highorder-2.6.34-report.txt
PID=$!
Do your tests
# Two sigints in quick succession to make the monitor exit
kill $PID
kill $PID 

==== CUT HERE ====
#!/usr/bin/perl
# This is a tool that analyses the trace output related to page allocation,
# sums up the number of high-order allocations taking place and who the
# callers are
#
# Example usage: trace-pagealloc-highorder.pl -o output-report.txt
# options
# -o, --output	Where to store the report
#
# Copyright (c) IBM Corporation 2009
# Author: Mel Gorman <mel@csn.ul.ie>
use strict;
use Getopt::Long;

# Tracepoint events
use constant MM_PAGE_ALLOC		=> 1;
use constant EVENT_UNKNOWN		=> 7;

use constant HIGH_NORMAL_HIGHORDER_ALLOC	=> 10;
use constant HIGH_ATOMIC_HIGHORDER_ALLOC	=> 11;

my $opt_output;
my %stats;
my %unique_events;
my $last_updated = 0;

$stats{HIGH_NORMAL_HIGHORDER_ALLOC} = 0;
$stats{HIGH_ATOMIC_HIGHORDER_ALLOC} = 0;

# Catch sigint and exit on request
my $sigint_report = 0;
my $sigint_exit = 0;
my $sigint_pending = 0;
my $sigint_received = 0;
sub sigint_handler {
	my $current_time = time;
	if ($current_time - 2 > $sigint_received) {
		print "SIGINT received, report pending. Hit ctrl-c again to exit\n";
		$sigint_report = 1;
	} else {
		if (!$sigint_exit) {
			print "Second SIGINT received quickly, exiting\n";
		}
		$sigint_exit++;
	}

	if ($sigint_exit > 3) {
		print "Many SIGINTs received, exiting now without report\n";
		exit;
	}

	$sigint_received = $current_time;
	$sigint_pending = 1;
}
$SIG{INT} = "sigint_handler";

# Parse command line options
GetOptions(
	'output=s'    => \$opt_output,
);

# Defaults for dynamically discovered regex's
my $regex_pagealloc_default = 'page=([0-9a-f]*) pfn=([0-9]*) order=([-0-9]*) migratetype=([-0-9]*) gfp_flags=([A-Z_|]*)';

# Dyanically discovered regex
my $regex_pagealloc;

# Static regex used. Specified like this for readability and for use with /o
#                      (process_pid)     (cpus      )   ( time  )   (tpoint    ) (details)
my $regex_traceevent = '\s*([a-zA-Z0-9-]*)\s*(\[[0-9]*\])\s*([0-9.]*):\s*([a-zA-Z_]*):\s*(.*)';
my $regex_statname = '[-0-9]*\s\((.*)\).*';
my $regex_statppid = '[-0-9]*\s\(.*\)\s[A-Za-z]\s([0-9]*).*';

sub generate_traceevent_regex {
	my $event = shift;
	my $default = shift;
	my $regex;

	# Read the event format or use the default
	if (!open (FORMAT, "/sys/kernel/debug/tracing/events/$event/format")) {
		$regex = $default;
	} else {
		my $line;
		while (!eof(FORMAT)) {
			$line = <FORMAT>;
			if ($line =~ /^print fmt:\s"(.*)",.*/) {
				$regex = $1;
				$regex =~ s/%p/\([0-9a-f]*\)/g;
				$regex =~ s/%d/\([-0-9]*\)/g;
				$regex =~ s/%lu/\([0-9]*\)/g;
				$regex =~ s/%s/\([A-Z_|]*\)/g;
				$regex =~ s/\(REC->gfp_flags\).*/REC->gfp_flags/;
				$regex =~ s/\",.*//;
			}
		}
	}

	# Verify fields are in the right order
	my $tuple;
	foreach $tuple (split /\s/, $regex) {
		my ($key, $value) = split(/=/, $tuple);
		my $expected = shift;
		if ($key ne $expected) {
			print("WARNING: Format not as expected '$key' != '$expected'");
			$regex =~ s/$key=\((.*)\)/$key=$1/;
		}
	}

	if (defined shift) {
		die("Fewer fields than expected in format");
	}

	return $regex;
}
$regex_pagealloc = generate_traceevent_regex("kmem/mm_page_alloc",
			$regex_pagealloc_default,
			"page", "pfn",
			"order", "migratetype", "gfp_flags");

sub process_events {
	my $traceevent;
	my $process_pid = 0;
	my $cpus;
	my $timestamp;
	my $tracepoint;
	my $details;
	my $statline;
	my $nextline = 1;

	# Read each line of the event log
EVENT_PROCESS:
	while (($traceevent = <TRACING>) && !$sigint_exit) {
SKIP_LINEREAD:

		if ($traceevent eq "") {
			last EVENT_PROCSS;
		}

		if ($traceevent =~ /$regex_traceevent/o) {
			$process_pid = $1;
			$tracepoint = $4;
		} else {
			next;
		}

		# Perl Switch() sucks majorly
		if ($tracepoint eq "mm_page_alloc") {
			my ($page, $order, $gfp_flags, $type);
			my ($atomic);
			my $details = $5;

			if ($details !~ /$regex_pagealloc/o) {
				print "WARNING: Failed to parse mm_page_alloc as expected\n";
				print "$details\n";
				print "$regex_pagealloc\n";
				print "\n";
				next;
			}
			$page = $1;
			$order = $3;
			$gfp_flags = $5;

			# Only concerned with high-order allocs
			if ($order == 0) {
				next;
			}

			$stats{MM_PAGE_ALLOC}++;

			if ($gfp_flags =~ /ATOMIC/) {
				$stats{HIGH_ATOMIC_HIGHORDER_ALLOC}++;
				$type = "atomic";
			} else {
				$stats{HIGH_NORMAL_HIGHORDER_ALLOC}++;
				$type = "normal";
			}

			# Record the stack trace
			$traceevent = <TRACING>;
			if ($traceevent !~ /stack trace/) {
				goto SKIP_LINEREAD;
			}
			my $event = "order=$order $type gfp_flags=$gfp_flags\n";;
			while ($traceevent = <TRACING>) {
				if ($traceevent !~ /^ =>/) {
					$unique_events{$event}++;
					my $current = time;

					if ($current - $last_updated > 60) {
						$last_updated = $current;
						update_report();
					}
					goto SKIP_LINEREAD;
				}
				$event .= $traceevent;
			}
		} else {
			$stats{EVENT_UNKNOWN}++;
		}

		if ($sigint_pending) {
			last EVENT_PROCESS;
		}
	}
}

sub update_report() {
	my $event;
	open (REPORT, ">$opt_output") || die ("Failed to open $opt_output for writing");

	foreach $event (keys %unique_events) {
		print REPORT $unique_events{$event} . " instances $event\n";
	}
	print REPORT "High-order normal allocations: " . $stats{HIGH_NORMAL_HIGHORDER_ALLOC} . "\n";
	print REPORT "High-order atomic allocations: " . $stats{HIGH_ATOMIC_HIGHORDER_ALLOC} . "\n";

	close REPORT;
}

sub print_report() {
	print "\nReport\n";
	open (REPORT, $opt_output) || die ("Failed to open $opt_output for reading");
	while (<REPORT>) {
		print $_;
	}
	close REPORT;
}

# Process events or signals until neither is available
sub signal_loop() {
	my $sigint_processed;
	do {
		$sigint_processed = 0;
		process_events();

		# Handle pending signals if any
		if ($sigint_pending) {
			my $current_time = time;

			if ($sigint_exit) {
				print "Received exit signal\n";
				$sigint_pending = 0;
			}
			if ($sigint_report) {
				if ($current_time >= $sigint_received + 2) {
					update_report();
					print_report();
					$sigint_report = 0;
					$sigint_pending = 0;
					$sigint_processed = 1;
					$sigint_exit = 0;
				}
			}
		}
	} while ($sigint_pending || $sigint_processed);
}

sub set_traceoption($) {
	my $option = shift;

	open(TRACEOPT, ">/sys/kernel/debug/tracing/trace_options") || die("Failed to open trace_options");
	print TRACEOPT $option;
	close TRACEOPT;
}

sub enable_tracing($) {
	my $tracing = shift;

	open(TRACING, ">/sys/kernel/debug/tracing/events/$tracing/enable") || die("Failed to open tracing event $tracing");
	print TRACING "1";
	close TRACING;
}

sub disable_tracing($) {
	my $tracing = shift;

	open(TRACING, ">/sys/kernel/debug/tracing/events/$tracing/enable") || die("Failed to open tracing event $tracing");
	print TRACING "0";
	close TRACING;

}

set_traceoption("stacktrace");
set_traceoption("sym-offset");
set_traceoption("sym-addr");
enable_tracing("kmem/mm_page_alloc");
open(TRACING, "/sys/kernel/debug/tracing/trace_pipe") || die("Failed to open trace_pipe");
signal_loop();
close TRACING;
disable_tracing("kmem/mm_page_alloc");
update_report();
print_report();

      reply	other threads:[~2010-06-16 14:03 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <bug-16148-27@https.bugzilla.kernel.org/>
2010-06-10 22:38 ` [Bug 16148] New: page allocation failure. order:1, mode:0x50d0 Andrew Morton
2010-06-10 23:15   ` Dave Airlie
2010-06-11  8:46     ` Thomas Hellstrom
2010-06-11 17:24       ` Andrew Morton
2010-06-11 20:22         ` Thomas Hellstrom
2010-06-11 20:37           ` Andrew Morton
2010-06-11 21:39             ` Thomas Hellstrom
     [not found] ` <201006131301.o5DD1vqb002755@demeter.kernel.org>
2010-06-15 22:41   ` [Bug 16148] " Andrew Morton
2010-06-15 22:57     ` Dave Airlie
2010-06-16 13:54       ` Mel Gorman [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100616135403.GA3034@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=airlied@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bugzilla-daemon@bugzilla.kernel.org \
    --cc=devnull@plzk.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=mikko.cal@gmail.com \
    --cc=thellstrom@vmware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.