From: Onkalo Samu <samu.p.onkalo@nokia.com>
To: mingo@elte.hu, peterz@infradead.org
Cc: "Onkalo Samu.P" <samu.p.onkalo@nokia.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Bug in scheduler when using rt_mutex
Date: Mon, 17 Jan 2011 16:42:45 +0200
Message-ID: <1295275365.12840.13.camel@kolo>


Hi

I believe there is a problem in the scheduler when the following
happens:
- A normal-priority process locks an rt_mutex and sleeps while holding
it.
- An RT-priority process blocks on the rt_mutex while the
normal-priority process is sleeping.

This sequence can occur with I2C access when both a normal-priority
thread and an irq-thread access the same I2C bus. The I2C core
contains an rt_mutex, and I2C drivers can sleep in wait_for_completion.


I have seen the following failure (also with 2.6.37):

A user process accesses a device handle or sysfs entry, which ends up
making an I2C access. The I2C core uses an rt_mutex to protect against
parallel access. Sometimes, after the rt_mutex is unlocked, the user
process does not run for a long time (several minutes). This can occur
when only a small number of user processes are running. In my test
cases there was only cat /dev/zero > /dev/null running in the
background while another process was accessing a sysfs entry.

Example:

cat /dev/zero > /dev/null &
while [ 1 ] ; do
cat /sys/devices/platform/lis3lv02d/selftest
done

The selftest causes I2C accesses from both the user process and the
irq-thread.

Based on my debugging, the following sequence occurs (single-CPU
system):

1) Some user process is running in the background (like
cat /dev/zero..)
2) The user process reads a sysfs entry, which causes an I2C access
3) The user process locks the rt_mutex in the I2C core
4) The user process sleeps while holding the rt_mutex
(wait_for_completion in the I2C transfer function)
5) The irq-thread is kicked to run
6) The irq-thread tries to take the rt_mutex, which is already locked
by the user process
7) The sleeping user process is promoted to the irq-thread priority
(RT class)
8) The user process is woken up by the completion mechanism and
finishes its job
9) The user process unlocks the rt_mutex and is changed back to its old
priority and scheduling class
10) The irq-thread continues as expected

The user process gets stuck at phase 9: the scheduler may skip it for a
long time.

Based on my analysis, the vruntime calculation fails for the user
process. At phase 9, the vruntime of its sched_entity is much bigger
than that of the other processes, so it is not scheduled for a long
time.

The problem is that at phase 7) the user process is sleeping, so the
rt_mutex priority change is applied to a sleeping task. se.vruntime is
not modified, and when the user process continues running, se.vruntime
contains about twice the cfs_rq.min_vruntime value.
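
To illustrate why a doubled vruntime can mean minutes of starvation,
here is a minimal userspace sketch. It is not kernel code:
toy_ticks_until_picked() and its parameters are hypothetical names, and
it assumes only two runnable nice-0 entities, a fair-class picker that
always runs the entity with the smaller vruntime, and a fixed tick that
is charged to the running entity. Wakeup preemption and all other
scheduler details are ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model (assumption, not kernel code): the "hog" (e.g. cat /dev/zero)
 * and the "late" task are the only runnable entities. The picker runs
 * whichever has the smaller vruntime and charges it one tick.
 * Returns how many ticks the hog runs before the late task is picked.
 */
static uint64_t toy_ticks_until_picked(uint64_t hog_vruntime,
				       uint64_t late_vruntime,
				       uint64_t tick)
{
	uint64_t ticks = 0;

	assert(tick > 0);	/* a zero tick would never make progress */

	/* The hog keeps winning while its vruntime is the smaller one. */
	while (hog_vruntime < late_vruntime) {
		hog_vruntime += tick;
		ticks++;
	}
	return ticks;
}
```

With vruntime in nanoseconds, a hog at 120 s of vruntime, a stale task
at 240 s and a 1 ms tick, the function returns 120000 ticks, i.e. about
two minutes before the stale task is picked again under these
assumptions.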

Success case:
- the user process locks the rt_mutex
- the irq-thread causes the user process to be promoted to RT level
while the user process is running and in the "on_rq == 1" state
-> dequeue_task is called, which modifies se.vruntime
In the dequeue_entity function:

	if (!(flags & DEQUEUE_SLEEP))
		se->vruntime -= cfs_rq->min_vruntime;

When the process is moved back from RT to normal priority, enqueue_task
updates vruntime again to the correct value.
In enqueue_entity:
	if (!(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_WAKING))
		se->vruntime += cfs_rq->min_vruntime;


Failure case:
- the user process locks the rt_mutex
- and goes to sleep (wait_for_completion etc.)
- the user process is dequeued to the sleep state
-> vruntime is not updated in dequeue_entity (DEQUEUE_SLEEP is set)

- the irq-thread blocks on the rt_mutex and the user process is
promoted to RT priority
- the user process wakes up and continues until it releases the rt_mutex
-> the user process is moved from the rt queue to the cfs queue. The
WAKEUP / WAKING flags are not set, so min_vruntime is added to a
vruntime that was never made relative, and the result is incorrect.
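
The two cases can be replayed with a small userspace sketch of the
bookkeeping. This is a toy model, not kernel code: every toy_* name and
TOY_* flag is hypothetical and only mirrors the two conditions quoted
above. It also assumes that min_vruntime does not move between the
dequeue and the enqueue, which keeps the arithmetic easy to follow.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for cfs_rq and sched_entity (assumption, not kernel types). */
struct toy_cfs_rq { uint64_t min_vruntime; };
struct toy_se     { uint64_t vruntime; };

#define TOY_DEQUEUE_SLEEP  0x1
#define TOY_ENQUEUE_WAKEUP 0x1
#define TOY_ENQUEUE_WAKING 0x2

/* Mirrors the dequeue_entity condition quoted in the mail. */
static void toy_dequeue_entity(const struct toy_cfs_rq *rq,
			       struct toy_se *se, int flags)
{
	if (!(flags & TOY_DEQUEUE_SLEEP))
		se->vruntime -= rq->min_vruntime;	/* make it relative */
}

/* Mirrors the enqueue_entity condition quoted in the mail. */
static void toy_enqueue_entity(const struct toy_cfs_rq *rq,
			       struct toy_se *se, int flags)
{
	if (!(flags & TOY_ENQUEUE_WAKEUP) || (flags & TOY_ENQUEUE_WAKING))
		se->vruntime += rq->min_vruntime;	/* make it absolute */
}

/* Promotion happens while the task is on the runqueue. */
static uint64_t toy_success_case(uint64_t min_vruntime, uint64_t lag)
{
	struct toy_cfs_rq rq = { .min_vruntime = min_vruntime };
	struct toy_se se = { .vruntime = min_vruntime + lag };

	toy_dequeue_entity(&rq, &se, 0);	/* leaves CFS for the RT class */
	toy_enqueue_entity(&rq, &se, 0);	/* returns to CFS at unlock */
	assert(se.vruntime == min_vruntime + lag);	/* round trip is neutral */
	return se.vruntime;
}

/* Promotion happens while the task sleeps with the rt_mutex held. */
static uint64_t toy_failure_case(uint64_t min_vruntime, uint64_t lag)
{
	struct toy_cfs_rq rq = { .min_vruntime = min_vruntime };
	struct toy_se se = { .vruntime = min_vruntime + lag };

	/* Goes to sleep: vruntime stays absolute. */
	toy_dequeue_entity(&rq, &se, TOY_DEQUEUE_SLEEP);
	/* Promoted, woken and run in the RT class: no CFS call is modelled. */
	/* Demoted at unlock without WAKEUP/WAKING: min_vruntime added again. */
	toy_enqueue_entity(&rq, &se, 0);
	return se.vruntime;
}
```

With min_vruntime = 1000 and a lag of 5, toy_success_case() returns
1005, while toy_failure_case() returns 2005, which is roughly twice
min_vruntime.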

I have a simple dummy driver which demonstrates the case. It has been
tested on a single-CPU embedded system running 2.6.37.
I also have a proposal for a fix, but it is quite possible that there
is a better way to do this, and I may be missing some case entirely.
The scheduler is quite a complex thing. I'll send patches for the test
case and for the proposal.

Br, Samu Onkalo







