All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Galbraith <efault@gmx.de>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: "Török Edwin" <edwintorok@gmail.com>,
	"Ingo Molnar" <mingo@elte.hu>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Linux Kernel" <linux-kernel@vger.kernel.org>
Subject: Re: scheduler bug: process running since 5124095h
Date: Tue, 30 Mar 2010 10:07:15 +0200	[thread overview]
Message-ID: <1269936435.6699.21.camel@marge.simson.net> (raw)
In-Reply-To: <4BB1989D.4050809@jp.fujitsu.com>

On Tue, 2010-03-30 at 15:22 +0900, Hidetoshi Seto wrote:

> Quick report:
> 
> The reason why this commit have bisected is because it changed
> the type of time values from signed clock_t to unsigned cputime_t,
> so that the following if-block become to be always taken:
> 
>    > -       stime = nsec_to_clock_t(p->se.sum_exec_runtime) -
>    > -                       cputime_to_clock_t(task_utime(p));
>    > +       stime = nsecs_to_cputime(p->se.sum_exec_runtime) - task_utime(p);
>    > 
> >> >         if (stime >= 0)
>    > -               p->prev_stime = max(p->prev_stime, clock_t_to_cputime(stime));
>    > +               p->prev_stime = max(p->prev_stime, stime);
>    > 
>    >         return p->prev_stime;
> 
> >From strace of latancytop, it does write to /proc/<pid>/sched:
> 
>    5891  open("/proc/1/sched", O_RDWR)     = 5
>    5891  fstat(5, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>    5891  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
>     0x7fc6668f3000
>    5891  read(5, "init (1, #threads: 1)\n----------"..., 1024) = 776
>    5891  read(5, "", 1024)                 = 0
> >> 5891  write(5, "erase", 5)              = 5
>    5891  close(5)                          = 0
> 
> It results in:
> 
> [kernel/sched_debug.c]
> void proc_sched_set_task(struct task_struct *p)
> {
>  :
>         p->se.sum_exec_runtime                  = 0;
>         p->se.prev_sum_exec_runtime             = 0;
>         p->nvcsw                                = 0;
>         p->nivcsw                               = 0;
> }

Hm.  Dunno why that (evilness) exists.

> So soon some task will have great (in fact negative) stime.
> 
> There would be no doubt that this initialize in sched_debug.c
> will break monotonicity of sum_exec_runtime.  I confirmed that
> the issue is disappeared by comment-out of lines above.
> 
> Reverting the bisected commit is wrong solution, because it
> will bring another issue, i.e. lost of runtime, and u/stime
> seems to be frozen because these values restart from 0 so
> prev_* is used for a while.
> 
> How to fix?  Is this a bug of latencytop? Kernel?
> Please comment.

I looked at it in tip this morning (where task_stime doesn't exist), and
did the below to swat the underflow.  Seems ok, but I haven't yet
enabled CONFIG_LATENCYTOP.

diff --git a/kernel/sched.c b/kernel/sched.c
index 81575f1..cdf4fa4 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3366,6 +3366,7 @@ void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
 	 * Compare with previous values, to keep monotonicity:
 	 */
 	p->prev_utime = max(p->prev_utime, utime);
+	rtime = max(rtime, p->prev_utime);
 	p->prev_stime = max(p->prev_stime, cputime_sub(rtime, p->prev_utime));
 
 	*ut = p->prev_utime;
@@ -3396,6 +3397,7 @@ void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
 		utime = rtime;
 
 	sig->prev_utime = max(sig->prev_utime, utime);
+	rtime = max(rtime, sig->prev_utime);
 	sig->prev_stime = max(sig->prev_stime,
 			      cputime_sub(rtime, sig->prev_utime));
 



  reply	other threads:[~2010-03-30  8:07 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <4BADD408.8080609@gmail.com>
2010-03-28  8:49 ` scheduler bug: process running since 5124095h Török Edwin
2010-03-29 10:52   ` Mike Galbraith
2010-03-29 10:54     ` Peter Zijlstra
2010-03-29 11:33       ` Mike Galbraith
2010-03-29 12:04     ` Hidetoshi Seto
2010-03-30  6:22       ` Hidetoshi Seto
2010-03-30  8:07         ` Mike Galbraith [this message]
2010-03-30  9:09           ` Mike Galbraith
2010-03-30  9:01         ` Peter Zijlstra
2010-03-30  9:12           ` Mike Galbraith
2010-03-30  9:34             ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1269936435.6699.21.camel@marge.simson.net \
    --to=efault@gmx.de \
    --cc=edwintorok@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=seto.hidetoshi@jp.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.