* problem with nice values and cpu consumption in 2.6.11-5
@ 2005-05-03 14:24 Carlos Carvalho
2005-05-04 11:52 ` Kirill Korotaev
2005-05-04 12:12 ` Con Kolivas
0 siblings, 2 replies; 3+ messages in thread
From: Carlos Carvalho @ 2005-05-03 14:24 UTC (permalink / raw)
To: linux-kernel
Look at this cpu usage in a two-processor machine:
  PID USER   PR  NI  VIRT   RES  SHR S %CPU %MEM     TIME+ COMMAND
  893 user1  39  19  7212  5892  492 R 99.7  1.1   3694:29 mi41
 1118 user2  25   0  155m   61m  624 R 50.0 12.3 857:54.18 b170-se.x
 1186 user3  25   0  155m   62m  640 R 50.2 12.3 103:25.22 b170-se.x
The job with nice 19 seems to be using 100% of cpu time while the
other two nice 0 jobs share a single processor with 50% only. This is
persistent, not a transient. I did a kill -STOP to the nice 19 job and
a kill -CONT, and for a while it decreased the cpu usage but later
returned to the above.
This is with kernel 2.6.11-5 and top 3.2.5. What's the reason for this
(apparent??) mis-behavior and how can I correct it? This is important
because the machine is used for number-crunching and users get really
upset when they don't get the expected share of cpu time...
* Re: problem with nice values and cpu consumption in 2.6.11-5
From: Kirill Korotaev @ 2005-05-04 11:52 UTC (permalink / raw)
To: Carlos Carvalho; +Cc: linux-kernel
This is a real problem with the O(1) scheduler in 2.6... :(
The only workaround for you right now is to run 2.4 or move to some kind
of virtualization solution with a fair CPU scheduler...
Kirill
> Look at this cpu usage in a two-processor machine:
>
>   PID USER   PR  NI  VIRT   RES  SHR S %CPU %MEM     TIME+ COMMAND
>   893 user1  39  19  7212  5892  492 R 99.7  1.1   3694:29 mi41
>  1118 user2  25   0  155m   61m  624 R 50.0 12.3 857:54.18 b170-se.x
>  1186 user3  25   0  155m   62m  640 R 50.2 12.3 103:25.22 b170-se.x
>
> The job with nice 19 seems to be using 100% of cpu time while the
> other two nice 0 jobs share a single processor with 50% only. This is
> persistent, not a transient. I did a kill -STOP to the nice 19 job and
> a kill -CONT, and for a while it decreased the cpu usage but later
> returned to the above.
>
> This is with kernel 2.6.11-5 and top 3.2.5. What's the reason for this
> (apparent??) mis-behavior and how can I correct it? This is important
> because the machine is used for number-crunching and users get really
> upset when they don't get the expected share of cpu time...
* Re: problem with nice values and cpu consumption in 2.6.11-5
From: Con Kolivas @ 2005-05-04 12:12 UTC (permalink / raw)
To: Carlos Carvalho; +Cc: linux-kernel, Ingo Molnar, Andrew Morton
On Wed, 4 May 2005 00:24, Carlos Carvalho wrote:
> Look at this cpu usage in a two-processor machine:
>
>   PID USER   PR  NI  VIRT   RES  SHR S %CPU %MEM     TIME+ COMMAND
>   893 user1  39  19  7212  5892  492 R 99.7  1.1   3694:29 mi41
>  1118 user2  25   0  155m   61m  624 R 50.0 12.3 857:54.18 b170-se.x
>  1186 user3  25   0  155m   62m  640 R 50.2 12.3 103:25.22 b170-se.x
>
> The job with nice 19 seems to be using 100% of cpu time while the
> other two nice 0 jobs share a single processor with 50% only. This is
> persistent, not a transient. I did a kill -STOP to the nice 19 job and
> a kill -CONT, and for a while it decreased the cpu usage but later
> returned to the above.
>
> This is with kernel 2.6.11-5 and top 3.2.5. What's the reason for this
> (apparent??) mis-behavior and how can I correct it? This is important
> because the machine is used for number-crunching and users get really
> upset when they don't get the expected share of cpu time...
We currently do not have "nice"-aware SMP balancing. The balancing is designed
purely with throughput in mind, and something about the behaviour of the
tasks you are running leads the scheduler to balance them in this way.
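
A toy model may help show why the reported numbers follow from a balancer that ignores nice levels. It is only a sketch, not the scheduler's actual algorithm: it assumes each CPU's time is split among the tasks placed on it in proportion to a per-task weight, and the weights used here (100 for nice 0, 5 for nice 19) are illustrative assumptions. The function name `cpu_shares` is hypothetical; the PIDs come from the top output quoted above.

```python
def cpu_shares(placement, weight):
    """Return the percentage of one CPU each task receives.

    placement: {cpu_id: [pid, ...]}  which tasks sit on which CPU
    weight:    {pid: number}         assumed per-task weight (illustrative)
    """
    shares = {}
    for cpu, pids in placement.items():
        total = sum(weight[p] for p in pids)
        for p in pids:
            # Tasks only compete with others on the same CPU.
            shares[p] = 100.0 * weight[p] / total
    return shares


# Assumed weights: nice 0 -> 100, nice 19 -> 5 (not taken from the kernel).
weight = {893: 5, 1118: 100, 1186: 100}

# Placement matching the report: the nice 19 task alone on one CPU.
observed = {0: [893], 1: [1118, 1186]}

# Alternative placement: the nice 19 task shares a CPU with one nice 0 task.
alternative = {0: [1118, 893], 1: [1186]}

if __name__ == "__main__":
    print(cpu_shares(observed, weight))
    print(cpu_shares(alternative, weight))
```

Under these assumptions the observed placement gives the nice 19 task a whole CPU and each nice 0 task half of one, matching the report. In the alternative placement the nice 19 task drops to roughly 5% and both nice 0 tasks get most or all of a CPU. A balancer that only evens out the number of runnable tasks per CPU has no reason to prefer the second placement over the first.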
The only way around this is to use affinities to bind tasks to cpus. The only
cross-cpu "nice" awareness we currently have is between hyperthread (SMT)
logical siblings, and not true physical cores.
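
As a rough illustration of the affinity workaround, the sketch below uses Python's `os.sched_setaffinity` (available on Linux only) to bind tasks to CPUs. The helper names `plan_pinning` and `pin_to_cpu` are hypothetical, and the policy of spreading the least-niced tasks across CPUs first is an assumption about what a sensible manual placement would look like, not something prescribed in this thread. Changing another user's process affinity normally requires appropriate privileges.

```python
import os


def plan_pinning(nice_by_pid, cpus):
    """Assign each pid to one CPU, handing out CPUs to the least-niced
    tasks first so that heavily niced tasks end up sharing.

    nice_by_pid: {pid: nice_value}
    cpus:        list of CPU ids to use
    Returns {pid: cpu_id}.
    """
    ordered = sorted(nice_by_pid, key=lambda pid: (nice_by_pid[pid], pid))
    return {pid: cpus[i % len(cpus)] for i, pid in enumerate(ordered)}


def pin_to_cpu(pid, cpu):
    """Restrict pid to a single CPU and return its resulting affinity set."""
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


# PIDs and nice values from the top output in this thread.
plan = plan_pinning({893: 19, 1118: 0, 1186: 0}, [0, 1])

# To apply the plan (commented out because it modifies live processes):
# for pid, cpu in plan.items():
#     pin_to_cpu(pid, cpu)
```

With this plan each nice 0 job gets its own CPU and the nice 19 job is bound alongside one of them, so it competes on a single CPU's run queue, where nice levels are taken into account, rather than being left alone on a processor.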
I've been experimenting with code to make the SMP balancing "nice" aware but
the balancing design in the 2.6 scheduler changes every 3 minutes for some
apparent gain somewhere (it is getting impossible to track these) and there
is no baseline for me to work off, so I have, for the moment, given up on
that idea.
Cheers,
Con