From: Daniel Lezcano <daniel.lezcano@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: raistlin@linux.it, juri.lelli@gmail.com,
	Ingo Molnar <mingo@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting
Date: Thu, 16 Jan 2014 14:48:51 +0100
Message-ID: <52D7E343.40909@linaro.org>
In-Reply-To: <20140115120418.GD31570@twins.programming.kicks-ass.net>

On 01/15/2014 01:04 PM, Peter Zijlstra wrote:
> On Wed, Jan 15, 2014 at 09:27:34AM +0100, Daniel Lezcano wrote:
>>
>> Hi all,
>>
>> I use the tip/sched/core branch.
>>
>> After git pulling yesterday, my host is unresponsive after booting the OS.
>>
>>   * It boots normally
>>   * It sends info to the console
>>   * The graphics does not work
>>   * The terminals show the prompt, I can enter the username but after
>> pressing enter, it does not give the password prompt
>>   * sysrq works more or less, I can't get the process stack but it receives
>> the command
>>
>> It is like no new process can be created.
>>
>> I have a dual Xeon processor E5325 (2 x 4 cores).
>>
>> After git bisecting, the following patch seems to introduce the bug.
>>
>> commit d50dde5a10f305253cbc3855307f608f8a3c5f73
>
> OK, so my headless WSM-EP boots just fine. Obviously it cannot confirm
> if graphics works, but I can ssh in and work on it without bother.
>
> I can even log in on the serial console without problems.
>
> I tried both tip/master and tip/sched/core.
>
> Would you happen to have a .config for me to try?

I was able to reduce the scope and reproduce the issue.

AFAICT, it happens with rsyslogd. When logging in on a tty, the login
command sends a message through /dev/log. But rsyslogd is never woken up
and stays blocked in poll_schedule_timeout, so the login process stays
blocked in unix_wait_for_peer.

I can strace rsyslogd at startup. The last two sched_setscheduler calls
fail with EPERM.

 > grep sched trace.out

3570  sched_getparam(3570, { 0 })       = 0
3570  sched_getscheduler(3570)          = 0 (SCHED_OTHER)
3570  sched_get_priority_min(SCHED_OTHER) = 0
3570  sched_get_priority_max(SCHED_OTHER) = 0
3571  sched_get_priority_min(SCHED_OTHER) = 0
3571  sched_get_priority_max(SCHED_OTHER) = 0
3571  sched_get_priority_min(SCHED_OTHER) = 0
3571  sched_get_priority_max(SCHED_OTHER) = 0
3571  sched_setscheduler(3572, SCHED_OTHER, { 0 } <unfinished ...>
3571  <... sched_setscheduler resumed> ) = 0
3571  sched_get_priority_min(SCHED_OTHER <unfinished ...>
3571  <... sched_get_priority_min resumed> ) = 0
3571  sched_get_priority_max(SCHED_OTHER <unfinished ...>
3571  <... sched_get_priority_max resumed> ) = 0
3571  sched_setscheduler(3573, SCHED_OTHER, { 0 } <unfinished ...>
3571  <... sched_setscheduler resumed> ) = -1 EPERM (Operation not permitted)
3571  sched_get_priority_min(SCHED_OTHER <unfinished ...>
3571  <... sched_get_priority_min resumed> ) = 0
3571  sched_get_priority_max(SCHED_OTHER <unfinished ...>
3571  <... sched_get_priority_max resumed> ) = 0
3571  sched_setscheduler(3574, SCHED_OTHER, { 0 } <unfinished ...>
3571  <... sched_setscheduler resumed> ) = -1 EPERM (Operation not permitted)

Here is the same strace on a kernel which does not hang. The calls to
sched_setscheduler do not fail.

3292  sched_getparam(3292, { 0 })       = 0
3292  sched_getscheduler(3292)          = 0 (SCHED_OTHER)
3292  sched_get_priority_min(SCHED_OTHER) = 0
3292  sched_get_priority_max(SCHED_OTHER) = 0
3293  sched_get_priority_min(SCHED_OTHER) = 0
3293  sched_get_priority_max(SCHED_OTHER) = 0
3293  sched_get_priority_min(SCHED_OTHER) = 0
3293  sched_get_priority_max(SCHED_OTHER) = 0
3293  sched_setscheduler(3294, SCHED_OTHER, { 0 } <unfinished ...>
3293  <... sched_setscheduler resumed> ) = 0
3293  sched_get_priority_min(SCHED_OTHER <unfinished ...>
3293  <... sched_get_priority_min resumed> ) = 0
3293  sched_get_priority_max(SCHED_OTHER <unfinished ...>
3293  <... sched_get_priority_max resumed> ) = 0
3293  sched_setscheduler(3295, SCHED_OTHER, { 0 } <unfinished ...>
3293  <... sched_setscheduler resumed> ) = 0
3293  sched_get_priority_min(SCHED_OTHER <unfinished ...>
3293  <... sched_get_priority_min resumed> ) = 0
3293  sched_get_priority_max(SCHED_OTHER <unfinished ...>
3293  <... sched_get_priority_max resumed> ) = 0
3293  sched_setscheduler(3296, SCHED_OTHER, { 0 } <unfinished ...>
3293  <... sched_setscheduler resumed> ) = 0
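
The call that differs between the two traces can be tried in isolation.
The following is only a minimal sketch, not what rsyslogd actually does
internally: the helper name try_set_sched_other() is hypothetical, and it
simply issues the same sched_setscheduler(pid, SCHED_OTHER, { 0 }) request
seen in the strace output and reports the errno value.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask the kernel to apply SCHED_OTHER with
 * priority 0 to `pid` (0 means the calling thread), as in the traces.
 * Returns 0 on success, or the errno value on failure (EPERM in the
 * failing trace).
 */
int try_set_sched_other(pid_t pid)
{
	struct sched_param param = { .sched_priority = 0 };

	assert(pid >= 0);	/* negative pids are not meaningful here */
	if (sched_setscheduler(pid, SCHED_OTHER, &param) == -1)
		return errno;
	return 0;
}
```

Calling try_set_sched_other(0) from an unprivileged process should be
enough to see whether a given kernel returns 0 or EPERM for this request.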

The EPERM error comes from kernel/sched/core.c:3303:

...
		if (fair_policy(policy)) {
			if (!can_nice(p, attr->sched_nice))
				return -EPERM;
		}
...
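
To show how this check can return EPERM even for a request that leaves the
nice value at 0, here is a userspace model of the test. It is a sketch
under assumptions: it assumes can_nice() converts the nice value to the
rlimit scale as 20 - nice, compares it against RLIMIT_NICE, and otherwise
falls back to CAP_SYS_NICE. The function model_can_nice() and its
parameters are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model (assumption) of the can_nice() test:
 *   nice             requested nice value, in [-20, 19]
 *   rlimit_nice      the task's RLIMIT_NICE soft limit (0 by default)
 *   has_cap_sys_nice whether the task holds CAP_SYS_NICE
 */
bool model_can_nice(int nice, long rlimit_nice, bool has_cap_sys_nice)
{
	assert(nice >= -20 && nice <= 19);

	/* convert nice value [19,-20] to rlimit style value [1,40] */
	int nice_rlim = 20 - nice;

	return nice_rlim <= rlimit_nice || has_cap_sys_nice;
}
```

With the default RLIMIT_NICE of 0 and no CAP_SYS_NICE, the model rejects
every nice value, including 0, which would be consistent with the EPERM
seen once the calling thread is no longer privileged.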


But I don't know why this leads to a process being blocked, or to
rsyslogd not being woken up when a packet comes in on the af_unix socket.

I hope that helps

   -- Daniel


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

