From: Tejun Heo <tj@kernel.org>
To: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>
Subject: [REGRESSION] funny sched_domain build failure during resume
Date: Fri, 9 May 2014 12:04:55 -0400
Message-ID: <20140509160455.GA4486@htj.dyndns.org>

Hello, guys.

So, after resuming from suspend, I found that my build jobs could not
migrate away from the CPU they started on and were thus using only a
single core.  It turns out the scheduler failed to build sched domains
due to an order-3 allocation failure.

 systemd-sleep: page allocation failure: order:3, mode:0x104010
 CPU: 0 PID: 11648 Comm: systemd-sleep Not tainted 3.14.2-200.fc20.x86_64 #1
 Hardware name: System manufacturer System Product Name/P8Z68-V LX, BIOS 4105 07/01/2013
  0000000000000000 000000001bc36890 ffff88009c2d5958 ffffffff816eec92
  0000000000104010 ffff88009c2d59e8 ffffffff8117a32a 0000000000000000
  ffff88021efe6b00 0000000000000003 0000000000104010 ffff88009c2d59e8
 Call Trace:
  [<ffffffff816eec92>] dump_stack+0x45/0x56
  [<ffffffff8117a32a>] warn_alloc_failed+0xfa/0x170
  [<ffffffff8117e8f5>] __alloc_pages_nodemask+0x8e5/0xb00
  [<ffffffff811c0ce3>] alloc_pages_current+0xa3/0x170
  [<ffffffff811796a4>] __get_free_pages+0x14/0x50
  [<ffffffff8119823e>] kmalloc_order_trace+0x2e/0xa0
  [<ffffffff810c033f>] build_sched_domains+0x1ff/0xcc0
  [<ffffffff810c123e>] partition_sched_domains+0x35e/0x3d0
  [<ffffffff811168e7>] cpuset_update_active_cpus+0x17/0x40
  [<ffffffff810c130a>] cpuset_cpu_active+0x5a/0x70
  [<ffffffff816f9f4c>] notifier_call_chain+0x4c/0x70
  [<ffffffff810b2a1e>] __raw_notifier_call_chain+0xe/0x10
  [<ffffffff8108a413>] cpu_notify+0x23/0x50
  [<ffffffff8108a678>] _cpu_up+0x188/0x1a0
  [<ffffffff816e1783>] enable_nonboot_cpus+0x93/0xf0
  [<ffffffff810d9d45>] suspend_devices_and_enter+0x325/0x450
  [<ffffffff810d9fe8>] pm_suspend+0x178/0x260
  [<ffffffff810d8e79>] state_store+0x79/0xf0
  [<ffffffff81355bdf>] kobj_attr_store+0xf/0x20
  [<ffffffff81262c4d>] sysfs_kf_write+0x3d/0x50
  [<ffffffff81266b12>] kernfs_fop_write+0xd2/0x140
  [<ffffffff811e964a>] vfs_write+0xba/0x1e0
  [<ffffffff811ea0a5>] SyS_write+0x55/0xd0
  [<ffffffff816ff029>] system_call_fastpath+0x16/0x1b

The allocation is from alloc_rootdomain().

	struct root_domain *rd;

	rd = kmalloc(sizeof(*rd), GFP_KERNEL);

The thing is the system has plenty of reclaimable memory and shouldn't
have any trouble satisfying one GFP_KERNEL order-3 allocation.
However, this happens during resume, before the devices have been
woken up, so pm_restrict_gfp_mask() punches out GFP_IOFS from all
allocation masks.  That leaves the page allocator with just
__GFP_WAIT to work with and, with enough bad luck, it fails, as one
would expect.
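
To make the mask arithmetic concrete, here is a small userspace sketch
that rebuilds the reported mode:0x104010.  The SK_* names and helpers
are made up for illustration.  The bit values are what I recall from
include/linux/gfp.h in 3.14, and the extra __GFP_COMP and __GFP_KMEMCG
bits are assumed to come from the large kmalloc path, so please check
both against the actual tree.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed 3.14 flag values; verify against include/linux/gfp.h. */
#define SK_GFP_WAIT	0x10u
#define SK_GFP_IO	0x40u
#define SK_GFP_FS	0x80u
#define SK_GFP_COMP	0x4000u
#define SK_GFP_KMEMCG	0x100000u

#define SK_GFP_IOFS	(SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_KERNEL	(SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

/* Models the effect of pm_restrict_gfp_mask(): IO and FS are masked out. */
static unsigned int sk_pm_restrict(unsigned int gfp)
{
	return gfp & ~SK_GFP_IOFS;
}

/* Nonzero if reclaim may start I/O or call into filesystems. */
static int sk_can_do_io_or_fs(unsigned int gfp)
{
	return (gfp & SK_GFP_IOFS) != 0;
}

/* Formats the mask in the style of the allocation failure message. */
static int sk_describe(unsigned int gfp, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "mode:0x%x wait=%d io=%d fs=%d", gfp,
			!!(gfp & SK_GFP_WAIT), !!(gfp & SK_GFP_IO),
			!!(gfp & SK_GFP_FS));
}
```

With these values, sk_pm_restrict(SK_GFP_KERNEL) | SK_GFP_COMP |
SK_GFP_KMEMCG comes out as 0x104010, which matches the trace: the
WAIT bit is set, the IO and FS bits are not.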

The problem has always been there but seems to have been exposed by
the addition of deadline scheduler support, which added cpudl to
root_domain, making it larger by around 20k bytes on my setup and
thus making an order-3 allocation necessary during CPU online.
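
For reference, a rough userspace sketch of why ~20k lands on order 3
with 4k pages.  sk_order_for_size() is a made-up helper that mimics
the idea behind the kernel's get_order(), not the kernel code itself,
and the exact sizeof(struct root_domain) on my setup isn't shown here.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE_SIZE 4096u	/* base page size on x86_64 */

/* Smallest order such that (SK_PAGE_SIZE << order) >= size. */
static unsigned int sk_order_for_size(size_t size)
{
	unsigned int order = 0;
	size_t span = SK_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}
```

Anything above 16384 and up to 32768 bytes needs eight contiguous
pages, so a structure that is even slightly over 16k is already an
order-3 request.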

It looks like the allocation is for a temp buffer and there are also
percpu allocations going on.  Maybe just allocate the buffers on boot
and keep them around?
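
Something along these lines is what I have in mind, shown as a
userspace sketch only.  struct sk_scratch and the sk_scratch_*()
helpers are hypothetical stand-ins, malloc() stands in for a
boot-time allocation, and real code would need proper locking rather
than a bare busy flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a large temp buffer (around 20k). */
struct sk_scratch {
	unsigned char bytes[20 * 1024];
};

static struct sk_scratch *sk_prealloc;	/* allocated once, kept around */
static int sk_prealloc_busy;

/* Call once early, while large allocations are still easy to satisfy. */
static int sk_scratch_init(void)
{
	sk_prealloc = malloc(sizeof(*sk_prealloc));
	return sk_prealloc ? 0 : -1;
}

/* Hand out the preallocated buffer; no allocation on this path. */
static struct sk_scratch *sk_scratch_get(void)
{
	if (!sk_prealloc || sk_prealloc_busy)
		return NULL;
	sk_prealloc_busy = 1;
	memset(sk_prealloc, 0, sizeof(*sk_prealloc));
	return sk_prealloc;
}

/* Return the buffer so the next caller can reuse it. */
static void sk_scratch_put(struct sk_scratch *s)
{
	assert(s == sk_prealloc);
	sk_prealloc_busy = 0;
}
```

The point is just that the resume path would only call the get/put
pair and never hit the page allocator for the big buffer.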

Kudos to Johannes for helping decipher the mm debug messages.

Thanks.

-- 
tejun

