public inbox for linux-kernel@vger.kernel.org
* Divide-by-zero in the 2.6.23 scheduler code
@ 2007-11-14  1:14 Chuck Ebbert
  2007-11-14 13:27 ` Peter Zijlstra
  0 siblings, 1 reply; 4+ messages in thread
From: Chuck Ebbert @ 2007-11-14  1:14 UTC
  To: linux-kernel; +Cc: Ingo Molnar, Peter Zijlstra

https://bugzilla.redhat.com/show_bug.cgi?id=340161

The problem code has been removed in 2.6.24. The below patch disables
SCHED_FEAT_PRECISE_CPU_LOAD which causes the offending code to be skipped
but does not prevent the user from enabling it.
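
As a rough illustration of why changing a "*1" to "*0" only alters the
default, here is a minimal user-space sketch of a feature bitmask. The
flag values, the variable sched_features and both helper functions are
hypothetical names made up for this example; the real bit assignments
and the sysctl plumbing are not shown in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this illustration. */
enum {
    FEAT_FAIR_SLEEPERS    = 1 << 0,
    FEAT_PRECISE_CPU_LOAD = 1 << 1,
    FEAT_START_DEBIT      = 1 << 2,
    FEAT_ALL              = FEAT_FAIR_SLEEPERS |
                            FEAT_PRECISE_CPU_LOAD |
                            FEAT_START_DEBIT,
};

/* "FLAG * 1" keeps a bit in the default mask, "FLAG * 0" drops it. */
unsigned int sched_features =
    FEAT_FAIR_SLEEPERS    * 1 |
    FEAT_PRECISE_CPU_LOAD * 0 |
    FEAT_START_DEBIT      * 1;

/* True when the division-based "precise" path would be taken. */
bool precise_path_enabled(void)
{
    return (sched_features & FEAT_PRECISE_CPU_LOAD) != 0;
}

/* Stand-in for a privileged user writing a new mask at run time. */
void set_features(unsigned int mask)
{
    assert((mask & ~(unsigned int)FEAT_ALL) == 0); /* known bits only */
    sched_features = mask;
}
```

With the default mask the precise path is skipped, but a later
set_features(sched_features | FEAT_PRECISE_CPU_LOAD) turns it back on,
which is the residual exposure described above.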

The divide-by-zero is here in kernel/sched.c:

static void update_cpu_load(struct rq *this_rq)
{
	u64 fair_delta64, exec_delta64, idle_delta64, sample_interval64, tmp64;
	unsigned long total_load = this_rq->ls.load.weight;
	unsigned long this_load =  total_load;
	struct load_stat *ls = &this_rq->ls;
	int i, scale;

	this_rq->nr_load_updates++;
	if (unlikely(!(sysctl_sched_features & SCHED_FEAT_PRECISE_CPU_LOAD)))
		goto do_avg;

	/* Update delta_fair/delta_exec fields first */
	update_curr_load(this_rq);

	fair_delta64 = ls->delta_fair + 1;
	ls->delta_fair = 0;

	exec_delta64 = ls->delta_exec + 1;
	ls->delta_exec = 0;

	sample_interval64 = this_rq->clock - ls->load_update_last;
	ls->load_update_last = this_rq->clock;

	if ((s64)sample_interval64 < (s64)TICK_NSEC)
		sample_interval64 = TICK_NSEC;

	if (exec_delta64 > sample_interval64)
		exec_delta64 = sample_interval64;

	idle_delta64 = sample_interval64 - exec_delta64;

======>	tmp64 = div64_64(SCHED_LOAD_SCALE * exec_delta64, fair_delta64);
	tmp64 = div64_64(tmp64 * exec_delta64, sample_interval64);

	this_load = (unsigned long)tmp64;

do_avg:

	/* Update our load: */
	for (i = 0, scale = 1; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
		unsigned long old_load, new_load;

		/* scale is effectively 1 << i now, and >> i divides by scale */

		old_load = this_rq->cpu_load[i];
		new_load = this_load;

		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) >> i;
	}
}
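
For readers unfamiliar with the averaging loop at the end of the
function, the following user-space sketch isolates it. The names
decay_cpu_load and LOAD_IDX_MAX are invented for this example, and the
value 5 is an assumption about CPU_LOAD_IDX_MAX rather than something
stated in this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed number of load indices (stand-in for CPU_LOAD_IDX_MAX). */
#define LOAD_IDX_MAX 5

/*
 * Fold one new load sample into every per-index average.
 * Index i keeps (2^i - 1)/2^i of the old value and takes 1/2^i of the
 * new sample, so higher indices react more slowly to changes.
 */
void decay_cpu_load(unsigned long cpu_load[], size_t n,
                    unsigned long this_load)
{
    unsigned long scale = 1;

    assert(n <= LOAD_IDX_MAX);
    for (size_t i = 0; i < n; i++, scale += scale) {
        /* scale == 1UL << i here, so ">> i" divides by scale */
        cpu_load[i] = (cpu_load[i] * (scale - 1) + this_load) >> i;
    }
}
```

Starting from all zeros, one sample of 1024 yields 1024, 512, 256, 128
and 64 for indices 0 to 4. Note that this part of the function involves
no division, which is why jumping straight to do_avg sidesteps the
crash.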


---
 kernel/sched_fair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.23.noarch.orig/kernel/sched_fair.c
+++ linux-2.6.23.noarch/kernel/sched_fair.c
@@ -93,7 +93,7 @@ unsigned int sysctl_sched_features __rea
 		SCHED_FEAT_FAIR_SLEEPERS	*1 |
 		SCHED_FEAT_SLEEPER_AVG		*0 |
 		SCHED_FEAT_SLEEPER_LOAD_AVG	*1 |
-		SCHED_FEAT_PRECISE_CPU_LOAD	*1 |
+		SCHED_FEAT_PRECISE_CPU_LOAD	*0 |
 		SCHED_FEAT_START_DEBIT		*1 |
 		SCHED_FEAT_SKIP_INITIAL		*0;
 


* Re: Divide-by-zero in the 2.6.23 scheduler code
  2007-11-14  1:14 Divide-by-zero in the 2.6.23 scheduler code Chuck Ebbert
@ 2007-11-14 13:27 ` Peter Zijlstra
  2007-11-14 13:56   ` Ingo Molnar
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2007-11-14 13:27 UTC
  To: Chuck Ebbert; +Cc: linux-kernel, Ingo Molnar



On Tue, 2007-11-13 at 20:14 -0500, Chuck Ebbert wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=340161

While I see the user hit a divide-by-zero, I don't understand how it can happen.

> The problem code has been removed in 2.6.24. The below patch disables
> SCHED_FEAT_PRECISE_CPU_LOAD which causes the offending code to be skipped
> but does not prevent the user from enabling it.
> 
> The divide-by-zero is here in kernel/sched.c:
> 
> static void update_cpu_load(struct rq *this_rq)
> {
> 	u64 fair_delta64, exec_delta64, idle_delta64, sample_interval64, tmp64;
> 	unsigned long total_load = this_rq->ls.load.weight;
> 	unsigned long this_load =  total_load;
> 	struct load_stat *ls = &this_rq->ls;
> 	int i, scale;
> 
> 	this_rq->nr_load_updates++;
> 	if (unlikely(!(sysctl_sched_features & SCHED_FEAT_PRECISE_CPU_LOAD)))
> 		goto do_avg;
> 
> 	/* Update delta_fair/delta_exec fields first */
> 	update_curr_load(this_rq);
> 
> 	fair_delta64 = ls->delta_fair + 1;

Shouldn't that +1 prevent fair_delta64 from being 0?

> 	ls->delta_fair = 0;
> 
> 	exec_delta64 = ls->delta_exec + 1;
> 	ls->delta_exec = 0;
> 
> 	sample_interval64 = this_rq->clock - ls->load_update_last;
> 	ls->load_update_last = this_rq->clock;
> 
> 	if ((s64)sample_interval64 < (s64)TICK_NSEC)
> 		sample_interval64 = TICK_NSEC;

This prevents sample_interval64 from being 0.

> 	if (exec_delta64 > sample_interval64)
> 		exec_delta64 = sample_interval64;
> 
> 	idle_delta64 = sample_interval64 - exec_delta64;
> 
> ======>	tmp64 = div64_64(SCHED_LOAD_SCALE * exec_delta64, fair_delta64);
> 	tmp64 = div64_64(tmp64 * exec_delta64, sample_interval64);
> 
> 	this_load = (unsigned long)tmp64;
> 
> do_avg:
> 
> 	/* Update our load: */
> 	for (i = 0, scale = 1; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
> 		unsigned long old_load, new_load;
> 
> 		/* scale is effectively 1 << i now, and >> i divides by scale */
> 
> 		old_load = this_rq->cpu_load[i];
> 		new_load = this_load;
> 
> 		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) >> i;
> 	}
> }
> 

As for the patch, better to just rip out the entire feature..
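
The guard on the second divisor noted above can be checked in
isolation. This is only a sketch: the function name
clamp_sample_interval is made up, and TICK_NS assumes HZ=250 (one tick
is 4,000,000 ns), which this thread does not confirm for the reporter's
kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: HZ = 250, so one tick lasts 4,000,000 ns. */
#define TICK_NS 4000000ULL

/*
 * Clamp the elapsed time to at least one tick. The signed comparison
 * also catches a clock that went backwards, where the unsigned
 * difference would otherwise look like a huge positive interval.
 */
uint64_t clamp_sample_interval(uint64_t now, uint64_t last)
{
    uint64_t interval = now - last;

    if ((int64_t)interval < (int64_t)TICK_NS)
        interval = TICK_NS;

    assert(interval != 0); /* safe to use as a divisor */
    return interval;
}
```

Whatever the two clock readings are, the result is at least one tick,
so the second div64_64() call cannot be the one that traps. That leaves
fair_delta64 as the only candidate divisor.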



* Re: Divide-by-zero in the 2.6.23 scheduler code
  2007-11-14 13:27 ` Peter Zijlstra
@ 2007-11-14 13:56   ` Ingo Molnar
  2007-11-14 14:06     ` Dmitry Adamushko
  0 siblings, 1 reply; 4+ messages in thread
From: Ingo Molnar @ 2007-11-14 13:56 UTC
  To: Peter Zijlstra; +Cc: Chuck Ebbert, linux-kernel, Dmitry Adamushko


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> As for the patch, better to just rip out the entire feature..

for -stable it's safer to have smaller patches - so this patch is 
perfectly fine. A user can turn it back on under SCHED_DEBUG and by 
tweaking a debug flag - but that's not a big issue, you can mess up the 
system in numerous ways via /proc/sys anyway. So i'd suggest Chuck's 
patch for -stable.

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo



* Re: Divide-by-zero in the 2.6.23 scheduler code
  2007-11-14 13:56   ` Ingo Molnar
@ 2007-11-14 14:06     ` Dmitry Adamushko
  0 siblings, 0 replies; 4+ messages in thread
From: Dmitry Adamushko @ 2007-11-14 14:06 UTC
  To: Ingo Molnar; +Cc: Peter Zijlstra, Chuck Ebbert, linux-kernel

[ forwarded to the list ]

so far, just a brief inspection below...

>
> The divide-by-zero is here in kernel/sched.c:
> [ ... ]
>
>         fair_delta64 = ls->delta_fair + 1;
>         ls->delta_fair = 0;
>
>         exec_delta64 = ls->delta_exec + 1;
>         ls->delta_exec = 0;
>
>         sample_interval64 = this_rq->clock - ls->load_update_last;
>         ls->load_update_last = this_rq->clock;
>
>         if ((s64)sample_interval64 < (s64)TICK_NSEC)
>                 sample_interval64 = TICK_NSEC;
>
>         if (exec_delta64 > sample_interval64)
>                 exec_delta64 = sample_interval64;
>
>         idle_delta64 = sample_interval64 - exec_delta64;
>
> ======> tmp64 = div64_64(SCHED_LOAD_SCALE * exec_delta64, fair_delta64);

fair_delta64 == 0
and
fair_delta64 == ls->delta_fair + 1;

so obviously ls->delta_fair == -1, i.e. ULONG_MAX. delta_fair is of type
'unsigned long' and is calculated in __update_curr_load() by means of
calc_delta_mine().

calc_delta_mine() does in the very end:

return (unsigned long)min(tmp, (u64)(unsigned long)LONG_MAX);

(*) so that means, we likely got 'tmp' > (unsigned long)LONG_MAX in
calc_delta_mine()...
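
To make the quoted return statement concrete, here is a small sketch of
the same clamp-then-narrow step. It models a 32-bit 'unsigned long'
with uint32_t and LONG_MAX with INT32_MAX; both are assumptions for
illustration, and the function name clamp_to_long_max32 is made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Saturate a 64-bit intermediate at the 32-bit LONG_MAX before
 * narrowing it, instead of letting the cast silently drop high bits.
 */
uint32_t clamp_to_long_max32(uint64_t tmp)
{
    uint64_t limit = (uint64_t)INT32_MAX;
    uint32_t result = (uint32_t)(tmp < limit ? tmp : limit);

    assert(result <= (uint32_t)INT32_MAX);
    return result;
}
```

An oversized intermediate such as 0x100000000 therefore comes back as
0x7fffffff, while small values pass through unchanged.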

btw.,

- fair_delta64 = ls->delta_fair + 1;
+ fair_delta64 = (u64)ls->delta_fair + 1;

in update_cpu_load() would avoid the problem, I guess (and is perhaps
legitimate, logically).
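
The effect of the cast can be reproduced in user space. The sketch
below assumes a 32-bit kernel, where 'unsigned long' is 32 bits wide,
and models that with uint32_t so the result does not depend on the
build host; the type and function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for a 32-bit kernel's 'unsigned long'. */
typedef uint32_t ulong32;

/* Add first, widen second: the sum wraps in 32 bits. */
uint64_t widen_after_add(ulong32 delta_fair)
{
    ulong32 sum = delta_fair + 1u; /* wraps to 0 for UINT32_MAX */
    return sum;
}

/* Widen first, add second: the sum is computed in 64 bits. */
uint64_t widen_before_add(ulong32 delta_fair)
{
    return (uint64_t)delta_fair + 1;
}

/* With the cast in place the divisor can never be zero. */
uint64_t safe_divisor(ulong32 delta_fair)
{
    uint64_t d = widen_before_add(delta_fair);

    assert(d != 0);
    return d;
}
```

For an input of UINT32_MAX the first variant returns 0 and the second
returns 4294967296. On a 64-bit build, where 'unsigned long' is already
64 bits wide, the cast would make no difference.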

maybe on a system with a low HZ value (I can't immediately see the kernel
config on the bugzilla page) and a task niced to the lowest priority (is
the 'kjournald' mentioned in the report running at a lower prio?)
running for a full tick, 'tmp' can get that big... hmm?


-- 
Best regards,
Dmitry Adamushko
