* 2.6.0test9 + 2 * P IV Xeon 2.4GHz with HT + SATA + RAID1 = scheduler problems
@ 2003-11-13 11:12 Catalin BOIE
  2003-11-13 11:42 ` Nick Piggin
  0 siblings, 1 reply; 5+ messages in thread
From: Catalin BOIE @ 2003-11-13 11:12 UTC
  To: linux-kernel

Hi!

I want to tell you that 2.6.0-test gets better and better. It works very
very well on several systems. Thank you very much, guys.

I have a server (as described in the subject). The problem is that the scheduler
seems to behave weirdly. Sometimes a program just does nothing. There is no
disk activity, interrupts are a little over 1000, no disk requests,
context switches are ~40. The system is idle but it has work to do!
Can I provide more info?

I tried booting with elevator=deadline and things seem worse.

If I'm not mistaken, the processes are in D state. But I'm not sure, I must
check again and right now I can't.

Also I suspect that the scheduler doesn't pay special attention to virtual
(HT) processors. Is this true?

I have not seen this problem on other machines.

Thank you.

---
Catalin(ux) BOIE
catab@deuroconsult.ro


* Re: 2.6.0test9 + 2 * P IV Xeon 2.4GHz with HT + SATA + RAID1 = scheduler problems
  2003-11-13 11:12 2.6.0test9 + 2 * P IV Xeon 2.4GHz with HT + SATA + RAID1 = scheduler problems Catalin BOIE
@ 2003-11-13 11:42 ` Nick Piggin
  2003-11-13 11:48   ` Catalin BOIE
  0 siblings, 1 reply; 5+ messages in thread
From: Nick Piggin @ 2003-11-13 11:42 UTC
  To: Catalin BOIE; +Cc: linux-kernel



Catalin BOIE wrote:

>Hi!
>
>I want to tell you that 2.6.0-test gets better and better. It works very
>very well on several systems. Thank you very much, guys.
>
>I have a server (as described in the subject). The problem is that the scheduler
>seems to behave weirdly. Sometimes a program just does nothing. There is no
>disk activity, interrupts are a little over 1000, no disk requests,
>context switches are ~40. The system is idle but it has work to do!
>Can I provide more info?
>
>I tried booting with elevator=deadline and things seem worse.
>
>If I'm not mistaken, the processes are in D state. But I'm not sure, I must
>check again and right now I can't.
>

Hi,
Please capture a Ctrl + Scroll Lock dump when you get processes stuck in
D state.
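
As a complement to the console dump, a minimal sketch (an illustration only, assuming a mounted /proc and a Python interpreter on the box) that lists which tasks are in D state at a given moment. The function names are hypothetical. It reports only the pid and command name, not the kernel stacks that the Ctrl + Scroll Lock dump prints.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_str), comm, state


def tasks_in_d_state(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in D state."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            yield pid, comm
```

Running `for pid, comm in tasks_in_d_state(): print(pid, comm)` a few times during a stall would show whether the same tasks stay blocked.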

>
>Also I suspect that the scheduler doesn't pay special attention to virtual
>(HT) processors. Is this true?
>

This is correct. Are you seeing any problems with HT? I think Linus
was hoping the NUMA / SMP scheduler could be generalised a bit more
so that HT would just fall into place. This might not happen before
2.7, so the shared runqueue approach might be the next best thing
(I like it).




* Re: 2.6.0test9 + 2 * P IV Xeon 2.4GHz with HT + SATA + RAID1 = scheduler problems
  2003-11-13 11:42 ` Nick Piggin
@ 2003-11-13 11:48   ` Catalin BOIE
  2003-11-13 12:03     ` Nick Piggin
  0 siblings, 1 reply; 5+ messages in thread
From: Catalin BOIE @ 2003-11-13 11:48 UTC
  To: Nick Piggin; +Cc: linux-kernel

> Hi,
Hi!

> Please capture a Ctrl + Scroll Lock dump when you get processes stuck in
> D state.
I will.

> >Also I suspect that the scheduler doesn't pay special attention to virtual
> >(HT) processors. Is this true?
> >
>
> This is correct. Are you seeing any problems with HT? I think Linus

Do you think that disabling HT (how do I do it? noht?) will make things work
better? I suspect that a process is scheduled on a virtual processor that
doesn't get many chances to execute anything. I don't know.

> was hoping the NUMA / SMP scheduler could be generalised a bit more
> so that HT would just fall into place. This might not happen before
> 2.7, so the shared runqueue approach might be the next best thing
> (I like it).

The problem with HT is the one that I describe here. From time to time a
process (mc, bash) is stuck for 2-6 seconds and then comes back. In test8
this was more visible.

Thank you very much, Nick!

---
Catalin(ux) BOIE
catab@deuroconsult.ro


* Re: 2.6.0test9 + 2 * P IV Xeon 2.4GHz with HT + SATA + RAID1 = scheduler problems
  2003-11-13 11:48   ` Catalin BOIE
@ 2003-11-13 12:03     ` Nick Piggin
  2003-11-13 12:12       ` Catalin BOIE
  0 siblings, 1 reply; 5+ messages in thread
From: Nick Piggin @ 2003-11-13 12:03 UTC
  To: Catalin BOIE; +Cc: linux-kernel



Catalin BOIE wrote:

>>Hi,
>>
>Hi!
>
>
>>Please capture a Ctrl + Scroll Lock dump when you get processes stuck in
>>D state.
>>
>I will.
>
>
>>>Also I suspect that the scheduler doesn't pay special attention to virtual
>>>(HT) processors. Is this true?
>>>
>>>
>>This is correct. Are you seeing any problems with HT? I think Linus
>>
>
>Do you think that disabling HT (how do I do it? noht?) will make things work
>better? I suspect that a process is scheduled on a virtual processor that
>doesn't get many chances to execute anything. I don't know.
>

I can't see an option to just disable HT with a quick grep. acpi=off
should do it though.
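
To confirm whether HT really went away after a reboot, here is a rough sketch (not a definitive check) that counts logical processors per physical package from the text of /proc/cpuinfo. It assumes the kernel exports a "physical id" field there, which not every kernel version does; the function name is hypothetical.

```python
from collections import Counter


def logical_cpus_per_package(cpuinfo_text):
    """Map each 'physical id' value to the number of logical CPUs reporting it."""
    counts = Counter()
    # /proc/cpuinfo describes each logical CPU in a block separated by a blank line.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            # Blocks without a 'physical id' field are grouped under "unknown".
            counts[fields.get("physical id", "unknown")] += 1
    return dict(counts)
```

On a single-core CPU such as this P4 Xeon, two logical CPUs sharing one physical id would suggest HT is still on, and one per package would suggest it is off. Feed it the file contents, e.g. `logical_cpus_per_package(open("/proc/cpuinfo").read())`.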

The virtual processors should get a roughly even amount of work done
AFAIK. I don't think the P4 allows any sort of control over priorities.

>
>>was hoping the NUMA / SMP scheduler could be generalised a bit more
>>so that HT would just fall into place. This might not happen before
>>2.7, so the shared runqueue approach might be the next best thing
>>(I like it).
>>
>
>The problem with HT is the one that I describe here. From time to time a
>process (mc, bash) is stuck for 2-6 seconds and then comes back. In test8
>this was more visible.
>
>Thank you very much, Nick!
>

Oh, so it is not any sort of disk IO work that is getting stuck? Then
don't worry about getting the Ctrl Scroll Lock trace...

OK, well yes turn HT off and see if that helps. One other thing which
springs to mind is that there is some CPU scheduler code that increases
timeslice granularity as the CPU count increases. It seems a bit unlikely
that this is your problem though.




* Re: 2.6.0test9 + 2 * P IV Xeon 2.4GHz with HT + SATA + RAID1 = scheduler problems
  2003-11-13 12:03     ` Nick Piggin
@ 2003-11-13 12:12       ` Catalin BOIE
  0 siblings, 0 replies; 5+ messages in thread
From: Catalin BOIE @ 2003-11-13 12:12 UTC
  To: Nick Piggin; +Cc: linux-kernel

> I can't see an option to just disable HT with a quick grep. acpi=off
> should do it though.
Do you think that disabling ACPI will cause problems with SMP?
I will try with "acpi=off".

> The virtual processors should get a roughly even amount of work done
> AFAIK. I don't think the P4 allows any sort of control over priorities.
But it is stuck because some resources are blocked by the other virtual
CPU, right? So, maybe this is the problem.

> >The problem with HT is the one that I describe here. From time to time a
> >process (mc, bash) is stuck for 2-6 seconds and then comes back. In test8
> >this was more visible.
> Oh, so it is not any sort of disk IO work that is getting stuck? Then
> don't worry about getting the Ctrl Scroll Lock trace...
I will come back with more info.

>
> OK, well yes turn HT off and see if that helps. One other thing which
> springs to mind is that there is some CPU scheduler code that increases
> timeslice granularity as the CPU count increases. It seems a bit unlikely
> that this is your problem though.
Context switches are very low (~40) and there are ~40-50 processes
running. Only 2-4 processes are cpu-intensive (postgresql).
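
For reference, figures like these can be reproduced from the cumulative "intr" and "ctxt" counters in /proc/stat. The following is a minimal sketch under that assumption (helper names are hypothetical), taking two snapshots and dividing the difference by the interval.

```python
import time


def read_counters(stat_text):
    """Extract the cumulative interrupt and context-switch totals from /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        parts = line.split()
        # The first number on the 'intr' line is the total across all interrupts.
        if parts and parts[0] in ("intr", "ctxt"):
            counters[parts[0]] = int(parts[1])
    return counters


def rates(before, after, seconds):
    """Per-second rate for every counter present in both snapshots."""
    return {k: (after[k] - before[k]) / seconds for k in before if k in after}


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return per-second rates."""
    with open(path) as f:
        first = read_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_counters(f.read())
    return rates(first, second, interval)
```

Calling `sample(5.0)` during a stall and again during normal operation would give comparable numbers to attach to a follow-up report.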


Thank you!
---
Catalin(ux) BOIE
catab@deuroconsult.ro

