From: Daniel Lezcano <daniel.lezcano@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: raistlin@linux.it, juri.lelli@gmail.com,
	Ingo Molnar <mingo@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting
Date: Thu, 16 Jan 2014 14:48:51 +0100
Message-ID: <52D7E343.40909@linaro.org>
In-Reply-To: <20140115120418.GD31570@twins.programming.kicks-ass.net>

On 01/15/2014 01:04 PM, Peter Zijlstra wrote:
> On Wed, Jan 15, 2014 at 09:27:34AM +0100, Daniel Lezcano wrote:
>>
>> Hi all,
>>
>> I use the tip/sched/core branch.
>>
>> After git pulling yesterday, my host is unresponsive after booting the OS.
>>
>>   * It boots normally
>>   * It sends info to the console
>>   * The graphics does not work
>>   * The terminals show the prompt, I can enter the username but after
>> pressing enter, it does not give the password prompt
>>   * sysrq works more or less, I can't get the process stack but it receives
>> the command
>>
>> It is like no new process can be created.
>>
>> I have a dual Xeon processor E5325 (2 x 4 cores).
>>
>> After git bisecting, the following patch seems to introduce the bug.
>>
>> commit d50dde5a10f305253cbc3855307f608f8a3c5f73
>
> OK, so my headless WSM-EP boots just fine. Obviously it cannot confirm
> if graphics works, but I can ssh in and work on it without bother.
>
> I can even log in on the serial console without problems.
>
> I tried both tip/master and tip/sched/core.
>
> Would you happen to have a .config for me to try?

I was able to reduce the scope and reproduce the issue.

AFAICT, this happens with rsyslogd. When logging in on a tty, the login
command sends a message through /dev/log. But rsyslogd is never woken up
and stays blocked in poll_schedule_timeout. The login process is blocked in
unix_wait_for_peer.
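
As a rough illustration of why a writer to /dev/log can get stuck when the
reader never runs, here is a minimal user-space sketch. It is not taken from
rsyslogd or login: the helper name and the abstract socket name are made up,
and the point where sends stop being accepted depends on kernel settings.
A bound AF_UNIX datagram socket is never read, and a non-blocking sender
writes to it until the kernel refuses with EAGAIN. A blocking sender would
sleep at that point instead, which is consistent with login sitting in
unix_wait_for_peer.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a datagram receiver that never reads, then
 * send to it without blocking until the kernel refuses. Returns the number
 * of datagrams accepted before EAGAIN, or -1 on any other outcome.
 */
int fill_unread_dgram_socket(void)
{
	struct sockaddr_un addr;
	socklen_t len;
	int rx, tx, sent = 0, err = 0;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	/* Abstract address (leading NUL byte); the name is made up. */
	snprintf(addr.sun_path + 1, sizeof(addr.sun_path) - 1,
		 "sketch-dev-log-%ld", (long)getpid());
	len = offsetof(struct sockaddr_un, sun_path) + 1 +
	      strlen(addr.sun_path + 1);

	rx = socket(AF_UNIX, SOCK_DGRAM, 0);
	tx = socket(AF_UNIX, SOCK_DGRAM, 0);
	int ok = rx >= 0 && tx >= 0 &&
		 bind(rx, (struct sockaddr *)&addr, len) == 0 &&
		 fcntl(tx, F_SETFL, O_NONBLOCK) == 0;

	/* Nobody ever reads from rx, so sends must eventually be refused. */
	while (ok && sent < 100000) {
		if (sendto(tx, "x", 1, 0, (struct sockaddr *)&addr, len) < 0) {
			err = errno;
			break;
		}
		sent++;
	}

	if (rx >= 0)
		close(rx);
	if (tx >= 0)
		close(tx);
	return (err == EAGAIN || err == EWOULDBLOCK) ? sent : -1;
}
```

Calling fill_unread_dgram_socket() from a small main() and printing the
result shows how many datagrams were queued before the sender would have
had to wait.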

I can strace rsyslogd at startup. The last two sched_setscheduler calls
fail.

 > grep sched trace.out

3570  sched_getparam(3570, { 0 })       = 0
3570  sched_getscheduler(3570)          = 0 (SCHED_OTHER)
3570  sched_get_priority_min(SCHED_OTHER) = 0
3570  sched_get_priority_max(SCHED_OTHER) = 0
3571  sched_get_priority_min(SCHED_OTHER) = 0
3571  sched_get_priority_max(SCHED_OTHER) = 0
3571  sched_get_priority_min(SCHED_OTHER) = 0
3571  sched_get_priority_max(SCHED_OTHER) = 0
3571  sched_setscheduler(3572, SCHED_OTHER, { 0 } <unfinished ...>
3571  <... sched_setscheduler resumed> ) = 0
3571  sched_get_priority_min(SCHED_OTHER <unfinished ...>
3571  <... sched_get_priority_min resumed> ) = 0
3571  sched_get_priority_max(SCHED_OTHER <unfinished ...>
3571  <... sched_get_priority_max resumed> ) = 0
3571  sched_setscheduler(3573, SCHED_OTHER, { 0 } <unfinished ...>
3571  <... sched_setscheduler resumed> ) = -1 EPERM (Operation not permitted)
3571  sched_get_priority_min(SCHED_OTHER <unfinished ...>
3571  <... sched_get_priority_min resumed> ) = 0
3571  sched_get_priority_max(SCHED_OTHER <unfinished ...>
3571  <... sched_get_priority_max resumed> ) = 0
3571  sched_setscheduler(3574, SCHED_OTHER, { 0 } <unfinished ...>
3571  <... sched_setscheduler resumed> ) = -1 EPERM (Operation not permitted)

Here is the same strace on a kernel which does not hang. The calls to
sched_setscheduler do not fail.

3292  sched_getparam(3292, { 0 })       = 0
3292  sched_getscheduler(3292)          = 0 (SCHED_OTHER)
3292  sched_get_priority_min(SCHED_OTHER) = 0
3292  sched_get_priority_max(SCHED_OTHER) = 0
3293  sched_get_priority_min(SCHED_OTHER) = 0
3293  sched_get_priority_max(SCHED_OTHER) = 0
3293  sched_get_priority_min(SCHED_OTHER) = 0
3293  sched_get_priority_max(SCHED_OTHER) = 0
3293  sched_setscheduler(3294, SCHED_OTHER, { 0 } <unfinished ...>
3293  <... sched_setscheduler resumed> ) = 0
3293  sched_get_priority_min(SCHED_OTHER <unfinished ...>
3293  <... sched_get_priority_min resumed> ) = 0
3293  sched_get_priority_max(SCHED_OTHER <unfinished ...>
3293  <... sched_get_priority_max resumed> ) = 0
3293  sched_setscheduler(3295, SCHED_OTHER, { 0 } <unfinished ...>
3293  <... sched_setscheduler resumed> ) = 0
3293  sched_get_priority_min(SCHED_OTHER <unfinished ...>
3293  <... sched_get_priority_min resumed> ) = 0
3293  sched_get_priority_max(SCHED_OTHER <unfinished ...>
3293  <... sched_get_priority_max resumed> ) = 0
3293  sched_setscheduler(3296, SCHED_OTHER, { 0 } <unfinished ...>
3293  <... sched_setscheduler resumed> ) = 0
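
The call that differs between the two traces can be tried in isolation.
The sketch below is only an assumption about how to reproduce it, not code
from rsyslogd: set_sched_other() is a made-up wrapper that issues the same
request as in the traces, sched_setscheduler(pid, SCHED_OTHER, { 0 }), and
returns the errno value instead of -1.

```c
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Hypothetical helper: issue sched_setscheduler(pid, SCHED_OTHER, { 0 })
 * as seen in the traces. Returns 0 on success or the errno value on
 * failure. A pid of 0 means the calling thread.
 */
int set_sched_other(pid_t pid)
{
	struct sched_param param = { .sched_priority = 0 };

	if (sched_setscheduler(pid, SCHED_OTHER, &param) == -1)
		return errno;
	return 0;
}
```

On a kernel behaving like the second trace, set_sched_other(0) from an
ordinary SCHED_OTHER process is expected to return 0. A return value of
EPERM would match the failing trace.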

The EPERM error comes from kernel/sched/core.c:3303

...
		if (fair_policy(policy)) {
			if (!can_nice(p, attr->sched_nice))
				return -EPERM;
		}
...
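
To make the effect of that test concrete, here is a small user-space model
of it. It rests on an assumption that is not shown in this mail: that
can_nice() converts the nice value from the range [19, -20] to an
rlimit-style value in [1, 40], compares it with RLIMIT_NICE, and lets
CAP_SYS_NICE override the limit. The function and parameter names are made
up for the sketch.

```c
/*
 * User-space model of the permission test behind can_nice(), under the
 * assumption stated above. Returns 1 if the request would be allowed,
 * 0 if it would be refused (EPERM in the excerpt above).
 */
int can_nice_model(int nice, unsigned long rlimit_nice, int has_cap_sys_nice)
{
	/* Convert nice [19, -20] to an rlimit-style value [1, 40]. */
	int nice_rlim = 20 - nice;

	return (unsigned long)nice_rlim <= rlimit_nice || has_cap_sys_nice;
}
```

Under this model a request for nice 0 maps to 20, so a task with
RLIMIT_NICE set to 0 and without CAP_SYS_NICE is refused even though the
request does not raise its priority: can_nice_model(0, 0, 0) is 0, while
can_nice_model(0, 0, 1) and can_nice_model(0, 20, 0) are both 1.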


But I don't know why this leads to a process being blocked, or why
rsyslogd is not woken up by a packet coming in on the af_unix socket.

I hope that helps

   -- Daniel



