From: Thomas Lange <thomas@corelatus.se>
To: mingo@redhat.com, peterz@infradead.org
Cc: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org
Subject: [BUG] sched: clock wrap bug in 2.6.35-stable kills scheduling
Date: Sun, 24 Jun 2012 19:32:11 +0200
Message-ID: <4FE74F1B.6070803@corelatus.se>

Commit 305e683 introduced a wrap bug that causes task scheduling to fail
after sched_clock() wraps. On a 1000 HZ system with 32-bit jiffies, this
happens after 2^32 ticks, i.e. after 49.7 days.

The bug was introduced in 2.6.35.12 and is still present in linux-2.6.35.y HEAD.

Symptoms include one task getting all available CPU time while the others
get _none_. Setting niceness seems to make things even worse. Running the
following code in a new process after the wrap completely locks up user
space, which triggers a watchdog reboot:
#include <unistd.h>
int main(void) { nice(1); while (1); }

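To reproduce the bug in a reasonable time, one can raise HZ: with HZ=16000,
the bug occurs after 3.1 days. Modifying sched_clock() to wrap when jiffies
itself wraps triggers the bug after 5 minutes, since 32-bit jiffies starts
at INITIAL_JIFFIES, 300 seconds before its own wrap.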

The basic problem seems to be that rq->clock_task gets stuck forever at a
very high value when rq->clock starts over from 0.

This fix solves that problem:

diff --git a/kernel/sched.c b/kernel/sched.c
index d40d662..883448f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -657,6 +657,8 @@ inline void update_rq_clock(struct rq *rq)
        if (!rq->skip_clock_update)
                rq->clock = sched_clock_cpu(cpu_of(rq));
        irq_time = irq_time_cpu(cpu);
+       if (rq->clock < rq->clock_task)
+               rq->clock_task = 0;
        if (rq->clock - irq_time > rq->clock_task)
                rq->clock_task = rq->clock - irq_time;

I can create a proper patch if the above is acceptable.

A more appropriate solution might be to pull additional sched commits into
the stable branch, such as fe44d62 and related commits. I don't know enough
about the scheduler internals to tell.

All tests were performed on mips32 systems, but all systems with 32-bit
jiffies should be affected.

/Thomas

Thread overview: 6+ messages
2012-06-24 17:32 Thomas Lange [this message]
2012-06-24 17:52 ` [BUG] sched: clock wrap bug in 2.6.35-stable kills scheduling Greg KH
2012-06-25  9:45 ` Peter Zijlstra
2012-06-25 10:38   ` Peter Zijlstra
2012-06-25 18:49     ` Thomas Lange
2012-06-25 19:33       ` Peter Zijlstra
