2.6.X, NPTL, SCHED_FIFO and JACK

public inbox for linux-kernel@vger.kernel.org

* 2.6.X, NPTL, SCHED_FIFO and JACK
@ 2004-06-30 13:41 Paul Davis
  2004-06-30 15:04 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Paul Davis @ 2004-06-30 13:41 UTC (permalink / raw)
  To: linux-kernel

JACK is the de-facto standard for low latency audio and
inter-application audio routing on Linux (it's also widely appreciated
on OS X). It makes heavy use of threads to provide the
functionality relied on by more than 2 dozen serious Linux audio
applications. For many users, it's a requirement to use SCHED_FIFO and
mlockall() with audio applications, because of the realtime, low
latency nature of their configurations/goals.

Because of the recognition by kernel developers that 2.6 does not
perform as well as 2.4+lowlat (the Andrew Morton patches) when it
comes to scheduling latency, most audio developers and users have
remained with 2.4. Recently however, several brave souls have
attempted to test 2.6. The results have been mixed.

On the one hand, it does seem possible to get performance from an
unpatched 2.6 kernel that is pretty close to the 2.4+lowlat
numbers. Using the CKolivas patches for 2.6 only improves things
further. 

However, the ONLY way to get even vaguely reasonable performance in
this area is to disable the use of NPTL using LD_ASSUME_KERNEL. With
NPTL in use, there is a series of apparently interlocking problems
with scheduler parameter inheritance, scheduler performance and
decision making. It's more or less impossible to run JACK-enabled audio
systems on 2.6 with NPTL. A series of ugly kludges is beginning to
emerge within the Linux audio community, and I think it's time we cut
them off before things get out of hand.

The JACK group is entirely open to the idea that we have made an error
in our use of the pthreads API, and that NPTL is simply exposing our
mistake. We can't see the error, however, and so for the moment, we
are working on the assumption that there are genuine kernel+glibc
errors. 

The first and most visible issue is with inheritance of SCHED_FIFO
scheduling. Although there are other mechanisms available under 2.6,
many people use the "jackstart" helper application which runs setuid
root and uses capabilities to start up JACK with the required caps to
allow use of SCHED_FIFO and mlockall(). This has worked very well in
2.4 for about 2 years, but in 2.6 JACK fails to get its threads to be
in the SCHED_FIFO scheduling class without a bunch of nasty kludges.

Things work correctly as soon as LD_ASSUME_KERNEL is used. 

We also see apparently impossible thread scheduling, where a thread
that should run immediately is delayed by a significant time, and the
thread that woke the first one up (and should be waiting for it to
execute) runs again, apparently without ever having blocked. Once
more, it all works correctly if LD_ASSUME_KERNEL is used to avoid
NPTL.

Are there known issues with the implementation of NPTL that might give
rise to this behaviour? What can we do to help understand and debug
it?

thanks,

Paul Davis <paul@linuxaudiosystems.com>                 Bala Cynwyd, PA, USA
Linux Audio Systems                                             610-667-4807
----------------------------------------------------------------------------
hybrid rather than pure; compromising rather than clean;
distorted rather than straightforward; ambiguous rather than
articulated; both-and rather than either-or; the difficult
unity of inclusion rather than the easy unity of exclusion.   Robert Venturi
----------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-06-30 13:41 2.6.X, NPTL, SCHED_FIFO and JACK Paul Davis
@ 2004-06-30 15:04 ` Ingo Molnar
  2004-06-30 15:18   ` Ingo Molnar
  2004-06-30 15:26   ` Jakub Jelinek
  2004-06-30 15:05 ` Ingo Molnar
  2004-07-01 18:03 ` Matt Mackall
  2 siblings, 2 replies; 24+ messages in thread
From: Ingo Molnar @ 2004-06-30 15:04 UTC (permalink / raw)
  To: Paul Davis; +Cc: linux-kernel


* Paul Davis <paul@linuxaudiosystems.com> wrote:

> The first and most visible issue is with inheritance of SCHED_FIFO
> scheduling. Although there are other mechanisms available under 2.6,
> many people use the "jackstart" helper application which runs setuid
> root and uses capabilities to start up JACK with the required caps to
> allow use of SCHED_FIFO and mlockall(). This has worked very well in
> 2.4 for about 2 years, but in 2.6 JACK fails to get its threads to be
> in the SCHED_FIFO scheduling class without a bunch of nasty kludges.
> 
> Things work correctly as soon as LD_ASSUME_KERNEL is used. 

A simple "strace -f" should show whether the setscheduler() call
succeeds or not. Does 'jackstart' do anything with glibc internals?

> We also see apparently impossible thread scheduling, where a thread
> that should run immediately is delayed by a significant time, and the
> thread that woke the first one up (and should be waiting for it to
> execute) runs again, apparently without ever having blocked. Once
> more, it all works correctly is LD_ASSUME_KERNEL is used to avoid
> NPTL.

there was a SCHED_FIFO bug in all 2.6 kernels prior to 2.6.5, causing
erratic scheduling. Have you tried 2.6.6 or 2.6.7?

> Are there known issues with the implementation of NPTL that might give
> rise to this behaviour? What can we do to help understand and debug
> it?

there's nothing special about NPTL, scheduling-wise. But if SCHED_FIFO
is not properly set for all JACK threads that could explain the
symptoms. You talked about kludges that are necessary to make all
threads SCHED_FIFO - are you 100% sure that all JACK threads are indeed
SCHED_FIFO after these kludges are applied? If yes and you are running a
later kernel then it's something new and probably NPTL-unrelated.
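
One way to answer the "are you 100% sure" question from inside the program is to have each thread report its own policy. The following is a minimal sketch, not code from JACK or from this thread; the helper names policy_name() and current_thread_sched() are made up for illustration, and only the standard pthread_getschedparam() call is relied on.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Return a printable name for the common POSIX scheduling policies. */
static const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    case SCHED_OTHER: return "SCHED_OTHER";
    default:          return "unknown";
    }
}

/* Query the policy and priority of the calling thread.
 * Returns 0 on success, or the error code from pthread_getschedparam(). */
static int current_thread_sched(int *policy, int *priority)
{
    struct sched_param param;
    int err;

    assert(policy != NULL && priority != NULL);
    err = pthread_getschedparam(pthread_self(), policy, &param);
    if (err == 0)
        *priority = param.sched_priority;
    return err;
}
```

Calling current_thread_sched() at the top of every realtime thread and logging policy_name(policy) together with the priority would show directly which threads ended up outside SCHED_FIFO.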

	Ingo


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-06-30 13:41 2.6.X, NPTL, SCHED_FIFO and JACK Paul Davis
  2004-06-30 15:04 ` Ingo Molnar
@ 2004-06-30 15:05 ` Ingo Molnar
  2004-06-30 16:12   ` Paul Davis
  2004-07-01 18:03 ` Matt Mackall
  2 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2004-06-30 15:05 UTC (permalink / raw)
  To: Paul Davis; +Cc: linux-kernel


another question: do all JACK threads run at SCHED_FIFO, and do they all 
have the same rt_priority value?

	Ingo


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-06-30 15:04 ` Ingo Molnar
@ 2004-06-30 15:18   ` Ingo Molnar
  2004-06-30 15:26   ` Jakub Jelinek
  1 sibling, 0 replies; 24+ messages in thread
From: Ingo Molnar @ 2004-06-30 15:18 UTC (permalink / raw)
  To: Paul Davis; +Cc: linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> A simple "strace -f" should show whether the setscheduler() call
> succeeds or not. Does 'jackstart' do anything with glibc internals?

it seems part of the problem is that the setscheduler() calls 'succeed',
but the policy is not changed to SCHED_FIFO. The question here is, 
are the correct PIDs used?

	Ingo


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-06-30 15:04 ` Ingo Molnar
  2004-06-30 15:18   ` Ingo Molnar
@ 2004-06-30 15:26   ` Jakub Jelinek
  2004-06-30 16:32     ` Paul Davis
  1 sibling, 1 reply; 24+ messages in thread
From: Jakub Jelinek @ 2004-06-30 15:26 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Paul Davis, linux-kernel

On Wed, Jun 30, 2004 at 05:04:30PM +0200, Ingo Molnar wrote:
> > Are there known issues with the implementation of NPTL that might give
> > rise to this behaviour? What can we do to help understand and debug
> > it?
> 
> there's nothing special about NPTL, scheduling-wise. But if SCHED_FIFO
> is not properly set for all JACK threads that could explain the
> symptoms. You talked about kludges that are necessary to make all
> threads SCHED_FIFO - are you 100% sure that all JACK threads are indeed
> SCHED_FIFO after these kludges are applied? If yes and you are running a
> later kernel then it's something new and probably NPTL-unrelated.

One thing to note is that NPTL defaults to PTHREAD_INHERIT_SCHED
while LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED.
So, if you care about what scheduling created threads will have
and want it to work with both NPTL and LinuxThreads, you want
pthread_attr_setinheritsched (&attr, PTHREAD_*_SCHED);
explicitly.
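
As a rough illustration (a sketch only, with made-up helper names default_inheritsched() and init_explicit_attr()), the default a given thread library uses can be read back from a freshly initialised attribute object, and the explicit setting can be requested up front so the code does not depend on that default:

```c
#include <assert.h>
#include <pthread.h>

/* Store the inheritsched value of a freshly initialised attribute object
 * in *out. Returns 0 on success or a pthread error code. */
static int default_inheritsched(int *out)
{
    pthread_attr_t attr;
    int err;

    assert(out != NULL);
    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;
    err = pthread_attr_getinheritsched(&attr, out);
    pthread_attr_destroy(&attr);
    return err;
}

/* Initialise attr and request PTHREAD_EXPLICIT_SCHED regardless of the
 * library default. On success the caller owns attr and must destroy it. */
static int init_explicit_attr(pthread_attr_t *attr)
{
    int err;

    assert(attr != NULL);
    err = pthread_attr_init(attr);
    if (err != 0)
        return err;
    err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (err != 0)
        pthread_attr_destroy(attr);
    return err;
}
```

Comparing the value from default_inheritsched() against PTHREAD_INHERIT_SCHED and PTHREAD_EXPLICIT_SCHED shows which default is in effect on the system being tested.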

	Jakub


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-06-30 15:05 ` Ingo Molnar
@ 2004-06-30 16:12   ` Paul Davis
  2004-06-30 17:07     ` Ulrich Drepper
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Davis @ 2004-06-30 16:12 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

>another question: do all JACK threads run at SCHED_FIFO, and do they all 
>have the same rt_priority value?

They don't all run SCHED_FIFO. Just two threads in the server (one is
a watchdog designed to prevent system lockups) and at least one in
each client (there may be more depending on what the client does, but
it's not created by JACK and JACK doesn't know about it). The client
threads run at 1 level lower priority than the server's main thread,
and that runs 1 level lower than the watchdog.
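
The relationship between the three levels can be written down as a small sketch. This is not JACK's code; struct rt_priority_layout and layout_rt_priorities() are hypothetical names, and the only assumption is the ordering described above (watchdog one above the server, clients one below it).

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical container for the three priority levels described above. */
struct rt_priority_layout {
    int watchdog;  /* one level above the server's main thread */
    int server;    /* the server's main realtime thread */
    int client;    /* one level below the server's main thread */
};

/* Derive the layout from the server priority. Returns 0 on success, or -1
 * if any of the three levels would fall outside the SCHED_FIFO range. */
static int layout_rt_priorities(int server_priority,
                                struct rt_priority_layout *out)
{
    int min = sched_get_priority_min(SCHED_FIFO);
    int max = sched_get_priority_max(SCHED_FIFO);

    assert(out != NULL);
    if (min == -1 || max == -1)
        return -1;
    if (server_priority - 1 < min || server_priority + 1 > max)
        return -1;

    out->watchdog = server_priority + 1;
    out->server = server_priority;
    out->client = server_priority - 1;
    return 0;
}
```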

but ...

>it seems part of the problem is that the setscheduler() calls 'succeed',
>but the policy is not changed to SCHED_FIFO. The question here is,
>are the correct PIDs used?

this has me thinking. one of the major changes with NPTL is that all
threads share the same PID. so how in the world do we ever set the
scheduling policy of a single thread (as opposed to something
identified by a pid_t) to SCHED_FIFO?

--p


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-06-30 15:26   ` Jakub Jelinek
@ 2004-06-30 16:32     ` Paul Davis
  2004-06-30 16:57       ` Jakub Jelinek
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Davis @ 2004-06-30 16:32 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Ingo Molnar, linux-kernel

>One thing to note is that NPTL defaults to PTHREAD_INHERIT_SCHED
>while LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED.
>So, if you care about what scheduling created threads will have
>and want it to work with both NPTL and LinuxThreads, you want
>pthread_attr_setinheritsched (&attr, PTHREAD_*_SCHED);
>explicitely.

But since we always set the scheduling class explicitly, should the
inherited scheduler class make any difference?

--p


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-06-30 16:32     ` Paul Davis
@ 2004-06-30 16:57       ` Jakub Jelinek
  2004-06-30 17:52         ` Paul Davis
  0 siblings, 1 reply; 24+ messages in thread
From: Jakub Jelinek @ 2004-06-30 16:57 UTC (permalink / raw)
  To: Paul Davis; +Cc: Ingo Molnar, linux-kernel

On Wed, Jun 30, 2004 at 12:32:03PM -0400, Paul Davis wrote:
> >One thing to note is that NPTL defaults to PTHREAD_INHERIT_SCHED
> >while LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED.
> >So, if you care about what scheduling created threads will have
> >and want it to work with both NPTL and LinuxThreads, you want
> >pthread_attr_setinheritsched (&attr, PTHREAD_*_SCHED);
> >explicitely.
> 
> But since we always set the scheduling class explicitly, should the
> inherited scheduler class make any difference?

Of course.
If you say
pthread_attr_init (&attr);
pthread_attr_setschedpolicy (&attr, SCHED_FIFO);
pthread_attr_setschedparam (&attr, &param);
pthread_create (&th, &attr, fn, arg);
then with LinuxThreads the thread will have FIFO policy while with
NPTL it won't unless the current thread has it.
If you:
pthread_attr_init (&attr);
pthread_attr_setschedpolicy (&attr, SCHED_FIFO);
pthread_attr_setschedparam (&attr, &param);
pthread_attr_setinheritsched (&attr, PTHREAD_INHERIT_SCHED);
pthread_create (&th, &attr, fn, arg);
then the thread will inherit scheduling parameters from current thread,
so unless it has FIFO the fn thread will not have FIFO policy.
If you:
pthread_attr_init (&attr);
pthread_attr_setschedpolicy (&attr, SCHED_FIFO);
pthread_attr_setschedparam (&attr, &param);
pthread_attr_setinheritsched (&attr, PTHREAD_EXPLICIT_SCHED);
pthread_create (&th, &attr, fn, arg);
then the thread will have FIFO policy in both NPTL and LinuxThreads.
For details see
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_attr_getinheritsched.html

The reason why LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED and
NPTL defaults to PTHREAD_INHERIT_SCHED is that those are the cheaper
variants.  LinuxThreads has a manager thread which creates the child
threads, so for INHERIT_SCHED it needs to issue some syscalls to query
scheduling parameters of the thread which called pthread_create.
In addition to this, no matter what inheritsched setting was, if the
desired sched parameters are different from the initial thread, it
needs to issue a system call to set it for the new thread.
NPTL doesn't have a manager thread and a child thread inherits parent
thread's settings without any syscalls anywhere.  For
PTHREAD_EXPLICIT_SCHED, it needs to issue a system call to set scheduling
params to the desired ones.
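
Putting the third variant into a complete function might look like the sketch below. It is an illustration under stated assumptions rather than code from this message: create_explicit_thread() and record_policy() are made-up names, and the function simply reports whatever error pthread_create() returns (for SCHED_FIFO without sufficient privilege that is typically EPERM).

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Create a thread whose policy and priority come from the attribute object
 * rather than from the creating thread. Returns 0 or a pthread error code. */
static int create_explicit_thread(pthread_t *th, int policy, int priority,
                                  void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    struct sched_param param = { .sched_priority = priority };
    int err;

    assert(th != NULL && fn != NULL);
    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    /* Ask for the attribute values to be used, whatever the default is. */
    err = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    /* Set the policy before the priority so the priority can be range-checked
     * against the intended policy. */
    if (err == 0)
        err = pthread_attr_setschedpolicy(&attr, policy);
    if (err == 0)
        err = pthread_attr_setschedparam(&attr, &param);
    if (err == 0)
        err = pthread_create(th, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return err;
}

/* Thread body: records the policy the new thread actually runs with
 * into the int pointed to by arg (-1 if the query fails). */
static void *record_policy(void *arg)
{
    struct sched_param param;
    int *out = arg;

    if (pthread_getschedparam(pthread_self(), out, &param) != 0)
        *out = -1;
    return NULL;
}
```

A caller would pass SCHED_FIFO and a priority in the valid range, check the return value, and join the thread; record_policy() is only there to confirm what the new thread ended up with.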

	Jakub


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-06-30 16:12   ` Paul Davis
@ 2004-06-30 17:07     ` Ulrich Drepper
  2004-06-30 17:50       ` Paul Davis
  0 siblings, 1 reply; 24+ messages in thread
From: Ulrich Drepper @ 2004-06-30 17:07 UTC (permalink / raw)
  To: Paul Davis; +Cc: Ingo Molnar, linux-kernel

Paul Davis wrote:

> this has me thinking. one of the major changes with NPTL is that all
> threads share the same PID. so how in the world do we ever set the
> scheduling policy of a single thread (as opposed to something
> identified by a pid_t) to SCHED_FIFO?

If you have to ask this question then it's no wonder you get erratic
behavior.  It means you haven't looked at the pthread interface at all.

Define a pthread_attr_t with the appropriate setting (with
pthread_attr_setschedparam etc) and create the thread (and use
pthread_attr_setinheritsched correctly).  Alternatively use
pthread_setschedparam on already running threads.

And use a recent enough nptl version.   Very early versions didn't have
any of the scheduler handling implemented.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-06-30 17:07     ` Ulrich Drepper
@ 2004-06-30 17:50       ` Paul Davis
  0 siblings, 0 replies; 24+ messages in thread
From: Paul Davis @ 2004-06-30 17:50 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Ingo Molnar, linux-kernel

>> this has me thinking. one of the major changes with NPTL is that all
>> threads share the same PID. so how in the world do we ever set the
>> scheduling policy of a single thread (as opposed to something
>> identified by a pid_t) to SCHED_FIFO?
>
>If you have to ask this question than it's no wonder you get erratic
>behavior.  It means you haven't looked at the pthread interface at all.

thanks, i appreciate the ad hominem remarks. you think we could ever
get SCHED_FIFO if we were not familiar with these calls? this is
really unnecessary...

my question wasn't about the pthread API. it was about what kernel API
was used to implement it. the simple answer would have been that we
use the TID, not the PID, or to have just pointed me at the source.
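
For reference, a sketch of what "use the TID" means at the system call level. On Linux the scheduling policy is a per-thread attribute, and the value returned by gettid(2) can be passed where sched_getscheduler(2) and sched_setscheduler(2) take a pid. The helper names below are invented for illustration, and this is not a description of how glibc implements pthread_setschedparam().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the calling thread, obtained via the raw system call. */
static pid_t current_tid(void)
{
    pid_t tid = (pid_t)syscall(SYS_gettid);

    assert(tid > 0);
    return tid;
}

/* Ask the kernel which policy one specific thread runs under.
 * Returns the policy, or -1 with errno set. */
static int tid_policy(pid_t tid)
{
    return sched_getscheduler(tid);
}

/* Try to switch one specific thread to SCHED_FIFO.
 * Returns 0 on success, or -1 with errno set (EPERM without privilege). */
static int tid_set_fifo(pid_t tid, int priority)
{
    struct sched_param param = { .sched_priority = priority };

    return sched_setscheduler(tid, SCHED_FIFO, &param);
}
```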

>And use a recent enough nptl version.   Very early versions didn't have
>any of the scheduler handling implemented.

we already discovered that. the people testing this stuff are using
the most recent "stable" release of glibc, for the most part.

--p


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-06-30 16:57       ` Jakub Jelinek
@ 2004-06-30 17:52         ` Paul Davis
  0 siblings, 0 replies; 24+ messages in thread
From: Paul Davis @ 2004-06-30 17:52 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Ingo Molnar, linux-kernel

>On Wed, Jun 30, 2004 at 12:32:03PM -0400, Paul Davis wrote:
>> >One thing to note is that NPTL defaults to PTHREAD_INHERIT_SCHED
>> >while LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED.
>> >So, if you care about what scheduling created threads will have
>> >and want it to work with both NPTL and LinuxThreads, you want
>> >pthread_attr_setinheritsched (&attr, PTHREAD_*_SCHED);
>> >explicitely.
>> 
>> But since we always set the scheduling class explicitly, should the
>> inherited scheduler class make any difference?
>
>Of course.

i understand that in the context of "pthread_attr_*; pthread_create();",
but we use pthread_create() and then set scheduling class/priority
within the new thread. Why would INHERIT_SCHED affect that? Does it?
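
The pattern in question, a plain pthread_create() followed by the new thread changing its own scheduling, could be sketched as follows. This is an illustration only, not JACK's actual code; struct self_sched_request and both function names are hypothetical. It shows the pattern and records the result, without settling whether the inheritsched default affects it.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical request/result record shared with the new thread. */
struct self_sched_request {
    int policy;    /* requested policy, e.g. SCHED_FIFO */
    int priority;  /* requested priority for that policy */
    int result;    /* 0, or the error code from pthread_setschedparam() */
};

/* Thread body: the thread sets its own policy before doing any work. */
static void *self_scheduling_thread(void *arg)
{
    struct self_sched_request *req = arg;
    struct sched_param param = { .sched_priority = req->priority };

    req->result = pthread_setschedparam(pthread_self(), req->policy, &param);
    /* Real work would start here, ideally only if req->result == 0. */
    return NULL;
}

/* Create the thread with default attributes and wait for it to finish.
 * Returns 0 if creation and join succeeded; the outcome of the policy
 * change itself is reported in req->result. */
static int run_self_scheduling_thread(struct self_sched_request *req)
{
    pthread_t th;
    int err;

    assert(req != NULL);
    err = pthread_create(&th, NULL, self_scheduling_thread, req);
    if (err != 0)
        return err;
    return pthread_join(th, NULL);
}
```

Checking req->result after the join (or inside the thread) distinguishes a call that was refused from one that succeeded but appears to have had no effect.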


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-06-30 13:41 2.6.X, NPTL, SCHED_FIFO and JACK Paul Davis
  2004-06-30 15:04 ` Ingo Molnar
  2004-06-30 15:05 ` Ingo Molnar
@ 2004-07-01 18:03 ` Matt Mackall
  2004-07-01 18:14   ` William Lee Irwin III
  2 siblings, 1 reply; 24+ messages in thread
From: Matt Mackall @ 2004-07-01 18:03 UTC (permalink / raw)
  To: Paul Davis; +Cc: linux-kernel

On Wed, Jun 30, 2004 at 09:41:46AM -0400, Paul Davis wrote:
> Because of the recognition by kernel developers that 2.6 does not
> perform as well as 2.4+lowlat (the Andrew Morton patches) when it
> comes to scheduling latency, most audio developers and users have
> remained with 2.4. Recently however, several brave souls have
> attempted to test 2.6. The results have been mixed.

I'm afraid these "brave souls" have shown up to the baby shower after
the child's been accepted to college. Developers getting around to
testing 2.6 after multiple vendors are shipping it should not be
characterized as courageous.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-07-01 18:03 ` Matt Mackall
@ 2004-07-01 18:14   ` William Lee Irwin III
  2004-07-01 22:45     ` Andrew Morton
  2004-07-02  3:27     ` Paul Davis
  0 siblings, 2 replies; 24+ messages in thread
From: William Lee Irwin III @ 2004-07-01 18:14 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Paul Davis, linux-kernel

On Wed, Jun 30, 2004 at 09:41:46AM -0400, Paul Davis wrote:
>> Because of the recognition by kernel developers that 2.6 does not
>> perform as well as 2.4+lowlat (the Andrew Morton patches) when it
>> comes to scheduling latency, most audio developers and users have
>> remained with 2.4. Recently however, several brave souls have
>> attempted to test 2.6. The results have been mixed.

On Thu, Jul 01, 2004 at 01:03:56PM -0500, Matt Mackall wrote:
> I'm afraid these "brave souls" have shown up to the baby shower after
> the child's been accepted to college. Developers getting around to
> testing 2.6 after multiple vendors are shipping it should not be
> characterized as courageous.

I appear to have nuked the thread you're replying to in disgust over
this precise issue.


-- wli


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-07-01 18:14   ` William Lee Irwin III
@ 2004-07-01 22:45     ` Andrew Morton
  2004-07-02  0:45       ` William Lee Irwin III
  2004-07-02  3:27     ` Paul Davis
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2004-07-01 22:45 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: mpm, paul, linux-kernel

William Lee Irwin III <wli@holomorphy.com> wrote:
>
> On Thu, Jul 01, 2004 at 01:03:56PM -0500, Matt Mackall wrote:
> > I'm afraid these "brave souls" have shown up to the baby shower after
> > the child's been accepted to college. Developers getting around to
> > testing 2.6 after multiple vendors are shipping it should not be
> > characterized as courageous.
> 
> I appear to have nuked the thread you're replying to in disgust over
> this precise issue.

In fairness, the CPU scheduler has been spinning like a top for a
couple of years, and it still ain't settled.

That's just the one in Linus's tree, let alone the umpteen rewrites which are
floating about.


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-07-01 22:45     ` Andrew Morton
@ 2004-07-02  0:45       ` William Lee Irwin III
  2004-07-02  1:38         ` Peter Williams
  2004-07-02  3:03         ` Con Kolivas
  0 siblings, 2 replies; 24+ messages in thread
From: William Lee Irwin III @ 2004-07-02  0:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mpm, paul, linux-kernel

On Thu, Jul 01, 2004 at 03:45:54PM -0700, Andrew Morton wrote:
> In fairness, the CPU scheduler has been spinning like a top for a
> couple of years, and it still ain't settled.
> That's just the one in Linus's tree, let alone the umpteen rewrites
> which are floating about.

I've not seen much deep material there. Policy tweaks seem to be
what's gone on in mainline, and frankly most of the purported rewrites
are just that. I guess the ones that nuked the duelling queue silliness
are trying to qualify but even they're leaving the load balancer untouched
and are carrying over large fractions of their predecessors unaltered.
The stuff that's gone around looks minor. It's not like they're teaching
sched.c to play cpu tetris for gang scheduling or Kalman filtering
profiling feedback to stripe tasks using different cpu resources across
SMT siblings or playing graph games to meet RT deadlines, so it doesn't
look like very much at all is going on to me.

It's pretty obvious why everyone and their brother is grinding out
purported scheduler rewrites: the code is self-contained, however,
nothing interesting is coming of all this. Never been for have so many
patches been written against the same file, accomplishing so little.

-- wli


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-07-02  0:45       ` William Lee Irwin III
@ 2004-07-02  1:38         ` Peter Williams
  2004-07-02  2:53           ` William Lee Irwin III
  2004-07-02  3:03         ` Con Kolivas
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Williams @ 2004-07-02  1:38 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andrew Morton, mpm, paul, linux-kernel

William Lee Irwin III wrote:
> On Thu, Jul 01, 2004 at 03:45:54PM -0700, Andrew Morton wrote:
> 
>>In fairness, the CPU scheduler has been spinning like a top for a
>>couple of years, and it still ain't settled.
>>That's just the one in Linus's tree, let alone the umpteen rewrites
>>which are floating about.
> 
> 
> I've not seen much deep material there. Policy tweaks seem to be
> what's gone on in mainline, and frankly most of the purported rewrites
> are just that. I guess the ones that nuked the duelling queue silliness
> are trying qualify but even they're leaving the load balancer untouched
> and are carrying over large fractions of their predecessors unaltered.

That's because it's not all bad (or the problems are minor and can wait 
until later).

> The stuff that's gone around looks minor. It's not like they're teaching
> sched.c to play cpu tetris for gang scheduling or Kalman filtering
> profiling feedback to stripe tasks using different cpu resources across
> SMT siblings or playing graph games to meet RT deadlines, so it doesn't
> look like very much at all is going on to me.

To my mind, scheduling and load balancing are ALMOST orthogonal 
concepts.  Scheduling is concerned with doing a useful job within a 
single CPU and load balancing is about distributing tasks/load among the 
available CPUs.  To a large extent these are independent and are being 
worked on separately.  I am one of those fiddling with the schedulers 
but I'm leaving load balancing alone as it seems to me that the NUMA and 
hyper threading developers are the main players for that component.

To my mind the only contribution the scheduler component MAY want to 
make to load balancing would be to have some say in which tasks are 
chosen for migration.  I don't think that any of the currently proposed 
schedulers have a strong need to change the current mechanism(s) for 
selecting which tasks get migrated.  If you think otherwise please share 
your thoughts?

> 
> It's pretty obvious why everyone and their brother is grinding out
> purported scheduler rewrites: the code is self-contained,

The main reason is that the standard scheduler is a bit of a mess. The 
fact that the code is self contained just makes it easier to modify 
without touching lots of files. It's not the reason why the changes are 
being tried.

> however,
> nothing interesting is coming of all this. Never been for have so many
> patches been written against the same file, accomplishing so little.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-07-02  1:38         ` Peter Williams
@ 2004-07-02  2:53           ` William Lee Irwin III
  0 siblings, 0 replies; 24+ messages in thread
From: William Lee Irwin III @ 2004-07-02  2:53 UTC (permalink / raw)
  To: Peter Williams; +Cc: Andrew Morton, mpm, paul, linux-kernel

William Lee Irwin III wrote:
>> I've not seen much deep material there. Policy tweaks seem to be
>> what's gone on in mainline, and frankly most of the purported rewrites
>> are just that. I guess the ones that nuked the duelling queue silliness
>> are trying qualify but even they're leaving the load balancer untouched
>> and are carrying over large fractions of their predecessors unaltered.

On Fri, Jul 02, 2004 at 11:38:17AM +1000, Peter Williams wrote:
> That's because it's not all bad (or the problems are minor and can wait 
> until later).

Whatever that has to do with it, it doesn't really make the fiddling
around that's going on even noticeable. Hell, I do end-luserish crap too
(amazing, I actually appear to need luserspace to get code written), and
I've yet to see a visible change in scheduler behavior in that context
across all of 2.4 and 2.5 (and in fact since the earliest Linux kernels
I've ever run) apart from a reduction in cpu time spent in the scheduler
itself associated with (you guessed it) the merge of the incremental
epoch expiry stuff mingo did around early 2.5 (or at least that's the
best description I can come up with for the algorithm, as it doesn't
resemble any of the normal algorithms). I suspect widespread placebo
effects.

William Lee Irwin III wrote:
>> The stuff that's gone around looks minor. It's not like they're teaching
>> sched.c to play cpu tetris for gang scheduling or Kalman filtering
>> profiling feedback to stripe tasks using different cpu resources across
>> SMT siblings or playing graph games to meet RT deadlines, so it doesn't
>> look like very much at all is going on to me.

On Fri, Jul 02, 2004 at 11:38:17AM +1000, Peter Williams wrote:
> To my mind, scheduling and load balancing are ALMOST orthogonal 
> concepts.  Scheduling is concerned with doing a useful job within a 
> single CPU and load balancing is about distributing tasks/load among the 
> available CPUs.  To a large extent these are independent and are being 
> worked on separately.  I am one of those fiddling with the schedulers 
> but I'm leaving load balancing alone as it seems to me that the NUMA and 
> hyper threading developers are the main players for that component.
> To my mind the only contribution the scheduler component MAY want to 
> make to load balancing would be to have some say in which tasks are 
> chosen for migration.  I don't think that any of the currently proposed 
> schedulers have a strong need to change the current mechanism(s) for 
> selecting which tasks get migrated.  If you think otherwise please share 
> your thoughts?

That's an expedient program structure. There is no independence. Those
are examples of things that would have qualified as having been remotely
visible changes and not myriads of infinitesimal intra-queue twiddlings.

No, I don't want to touch scheduling policy (or anything else infested
with such massive quantities of holy penguin pee) with a 10-foot pole.

William Lee Irwin III wrote:
>> It's pretty obvious why everyone and their brother is grinding out
>> purported scheduler rewrites: the code is self-contained,

On Fri, Jul 02, 2004 at 11:38:17AM +1000, Peter Williams wrote:
> The main reason is that the standard scheduler is a bit of a mess. The 
> fact that the code is self contained just makes it easier to modify 
> without touching lots of files. It's not the reason why the changes are 
> being tried.

It means the barrier to entry is very low.

William Lee Irwin III wrote:
>> however,
>> nothing interesting is coming of all this. Never been for have so many
>> patches been written against the same file, accomplishing so little.

s/been for/before/

I wonder why I've started making homophone errors only in the past 5 years
where beforehand they were very rare. It's not like I started sounding out
words when I read or anything idiotic like that.

-- wli


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-07-02  0:45       ` William Lee Irwin III
  2004-07-02  1:38         ` Peter Williams
@ 2004-07-02  3:03         ` Con Kolivas
  2004-07-02  3:05           ` William Lee Irwin III
  1 sibling, 1 reply; 24+ messages in thread
From: Con Kolivas @ 2004-07-02  3:03 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andrew Morton, mpm, paul, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

William Lee Irwin III wrote:
| On Thu, Jul 01, 2004 at 03:45:54PM -0700, Andrew Morton wrote:
|
|>In fairness, the CPU scheduler has been spinning like a top for a
|>couple of years, and it still ain't settled.
|>That's just the one in Linus's tree, let alone the umpteen rewrites
|>which are floating about.
|
|
| I've not seen much deep material there. Policy tweaks seem to be
| what's gone on in mainline, and frankly most of the purported rewrites
| are just that. I guess the ones that nuked the duelling queue silliness
| are trying qualify but even they're leaving the load balancer untouched
| and are carrying over large fractions of their predecessors unaltered.
| The stuff that's gone around looks minor. It's not like they're teaching
| sched.c to play cpu tetris for gang scheduling or Kalman filtering
| profiling feedback to stripe tasks using different cpu resources across
| SMT siblings or playing graph games to meet RT deadlines, so it doesn't
| look like very much at all is going on to me.

My impetus for doing a policy rewrite was the recurring complaint that
the 2.6 scheduler is currently too complicated for even basic
scheduling. I see no point in trying to implement other changes until
the framework for normal policies is in place that can be built on. I
don't see even the policy rewrites as being appropriate for 2.6, let
alone anything fancier. If we have something in place that more people
than not agree is satisfactory for normal scheduling, then more can be
added for 2.7+ development.

Con

| It's pretty obvious why everyone and their brother is grinding out
| purported scheduler rewrites: the code is self-contained, however,
| nothing interesting is coming of all this. Never been for have so many
| patches been written against the same file, accomplishing so little.
|
| -- wli
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFA5NCKZUg7+tp6mRURAj/JAJ4qJzKxXWCUOT+LDBoGs0MEMi21owCfZqGo
S8scT9Ro6DbvumUt060ctOU=
=6I3d
-----END PGP SIGNATURE-----


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-07-02  3:03         ` Con Kolivas
@ 2004-07-02  3:05           ` William Lee Irwin III
  0 siblings, 0 replies; 24+ messages in thread
From: William Lee Irwin III @ 2004-07-02  3:05 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, mpm, paul, linux-kernel

On Fri, Jul 02, 2004 at 01:03:39PM +1000, Con Kolivas wrote:
> My impetus for doing a policy rewrite was the recurring complaint that
> the 2.6 scheduler is currently too complicated for even basic
> scheduling. I see no point in trying to implement other changes until
> the framework for normal policies is in place that can be built on. I
> don't see even the policy rewrites as being appropriate for 2.6, let
> alone anything fancier. If we have something in place that more people
> than not agree is satisfactory for normal scheduling, then more can be
> added for 2.7+ development.

The point I had was really that what's going on is very minor.


-- wli


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-07-01 18:14   ` William Lee Irwin III
  2004-07-01 22:45     ` Andrew Morton
@ 2004-07-02  3:27     ` Paul Davis
  2004-07-02  7:37       ` William Lee Irwin III
  1 sibling, 1 reply; 24+ messages in thread
From: Paul Davis @ 2004-07-02  3:27 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Matt Mackall, linux-kernel

>On Wed, Jun 30, 2004 at 09:41:46AM -0400, Paul Davis wrote:
>>> Because of the recognition by kernel developers that 2.6 does not
>>> perform as well as 2.4+lowlat (the Andrew Morton patches) when it
>>> comes to scheduling latency, most audio developers and users have
>>> remained with 2.4. Recently however, several brave souls have
>>> attempted to test 2.6. The results have been mixed.
>
>On Thu, Jul 01, 2004 at 01:03:56PM -0500, Matt Mackall wrote:
>> I'm afraid these "brave souls" have shown up to the baby shower after
>> the child's been accepted to college. Developers getting around to
>> testing 2.6 after multiple vendors are shipping it should not be
>> characterized as courageous.

I call BS on this response.

We were told by A(ndrew)M(orton) and several other people that 2.6
would not be as good as 2.4 for low latency real time audio. It was
made clear that the preemption patches were considered more
appropriate even though they did not do anywhere near as reliable an
improvement as AM's lowlat patches. We found out (and I mean no
discredit to AM whatsoever - he did an amazing job on the 2.4 lowlat
patches) that the author of the premier lowlat patches for 2.4 would
not be maintaining a similar set for 2.6. We also found during the
development of 2.5 that there were a number of areas of real concern,
(the VM subsystem and the scheduler and the disk subsystems) but that
many notable kernel developers were not particularly interested in our
needs - we were considered odd, edge case studies.

So we just punted and said "ah, it's OK, we still have 2.4 and that
works really, really well". I spent a lot of time debugging, testing,
measuring and playing with 2.3 and 2.4. I even tested the
HRT patches with great anticipation (they didn't work very well at
all, and I didn't have time to spend tracking that down then). I'm
terribly sorry, but I don't have time to do full-scale kernel
debugging and also develop applications that have already taken 4+
years to get to "useful". Frankly, the mess of dealing with the
development process for 2.3/2.4, with a VM subsystem that took a year
to stabilize into a situation where we could reliably stream realistic
audio workloads didn't make me feel too good when I started reading
about similar issues in 2.5 before it was even half-done. I tested
just about every MM patch from andrea and rik that came out for
2.3/2.4 - I did not have time to do that with 2.5.

And 2.4.19+ does work really well. The problem is that users are now
booting up 2.6 and finding out that (1) the deep changes in the thread
system have not been fully tested with real time thread applications
and (2) the scheduler, VM and disk subsystems appear to be conspiring
to prevent performance equivalent to 2.4+lowlat. Are we surprised? No,
we knew this would be the case. Are we complaining? Not really. Are we
asking for help? Are we offering to try to help as best we can? Yes,
we certainly are.

Courageous? Yes, because they are willing to start testing a kernel
that has been developed with an open admission by the kernel
development group that our needs are not considered particularly
important or relevant (and there is nothing wrong with that, just to
be clear about it). Linus made it clear 2 years ago that we weren't
going to get what we needed any time soon, and personally, I am
entirely happy with telling people to use 2.4+lowlat instead. There
are several distributions of Linux that build precisely this kernel
for users, and those users are very happy with it.

But NPTL has muddied the situation considerably. People did test NPTL
when it came out. It seemed to work perfectly OK. So we just assumed
that it would always work perfectly OK. It turns out, however, that it
no longer does. And therefore I wrote to try to find out what we could
do to figure it out.

>I appear to have nuked the thread you're replying to in disgust over
>this precise issue.

Disgust? Thanks for sharing.

--p


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-07-02  3:27     ` Paul Davis
@ 2004-07-02  7:37       ` William Lee Irwin III
  2004-07-02 10:40         ` Takashi Iwai
  2004-07-02 14:42         ` Paul Davis
  0 siblings, 2 replies; 24+ messages in thread
From: William Lee Irwin III @ 2004-07-02  7:37 UTC (permalink / raw)
  To: Paul Davis; +Cc: Matt Mackall, linux-kernel

On Thu, Jul 01, 2004 at 01:03:56PM -0500, Matt Mackall wrote:
>>> I'm afraid these "brave souls" have shown up to the baby shower after
>>> the child's been accepted to college. Developers getting around to
>>> testing 2.6 after multiple vendors are shipping it should not be
>>> characterized as courageous.

On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote:
> I call BS on this response.
> We were told by A(ndrew)M(orton) and several other people that 2.6
> would not be as good as 2.4 for low latency real time audio. It was
> made clear that the preemption patches were considered more
> appropriate even though they did not do anywhere near as reliable an
> improvement as AM's lowlat patches. We found out (and I mean no
> discredit to AM whatsoever - he did an amazing job on the 2.4 lowlat
> patches) that the author of the premier lowlat patches for 2.4 would
> not be maintaining a similar set for 2.6. We also found during the
> development of 2.5 that there were a number of areas of real concern,
> (the VM subsystem and the scheduler and the disk subsystems) but that
> many notable kernel developers were not particularly interested in our
> needs - we were considered odd, edge case studies.

Not only are lowlat-alike changes in mainline 2.6, the algorithms where
lowlat found explicit preemption points were necessary have been changed
in a number of cases to be asymptotically faster.

So you gave no feedback. What do you expect us to do? There are
enough other bugreports to keep us busy without testing the known
universe on behalf of you or anyone else sitting around waiting
silently for their needs to magically be addressed.

On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote:
> So we just punted and said "ah, it's OK, we still have 2.4 and that
> works really, really well". I spent a lot of time debugging, testing,
> measuring and playing with 2.3 and 2.4. I even tested the
> HRT patches with great anticipation (they didn't work very well at
> all, and I didn't have time to spend tracking that down then). I'm
> terribly sorry, but I don't have time to do full-scale kernel
> debugging and also develop applications that have already taken 4+
> years to get to "useful". Frankly, the mess of dealing with the
> development process for 2.3/2.4, with a VM subsystem that took a year
> to stabilize into a situation where we could reliably stream realistic
> audio workloads didn't make me feel too good when I started reading
> about similar issues in 2.5 before it was even half-done. I tested
> just about every MM patch from andrea and rik that came out for
> 2.3/2.4 - I did not have time to do that with 2.5.

This level of participation is by no means a requirement. Just show
up, say, "I've got a problem, latency sucked $HERE while doing $THIS",
and it will be quashed in a manner similar to other performance and
functional issues when they're properly reported.

	At some point in the past, you wrote:
	>>> However, the ONLY way to get even vaguely reasonable
	>>> performance in this area is to disable the use of NPTL
	>>> using LD_ASSUME_KERNEL. With NPTL in use, there are a
	>>> series of apparently interlocking problems with scheduler
	>>> parameter inheritance, scheduler performance and decision
	>>> making. It's more or less impossible to run JACK-enabled audio
	>>> systems on 2.6 with NPTL. A series of ugly kludges are
	>>> beginning to emerge within the Linux audio community, and
	>>> I think it's time we cut them off before things get out of hand.

The thing that went wrong here is that the report is very non-specific.
mingo, jakub, and uli had to go diving into your app's source etc.
hunting for bugs in your app, which is very nice of them to do, but not
really the way things are supposed to work. Narrowing the presumed
kernel issue down to a small enough userspace testcase or section of
code that you can reasonably post it is pretty much a burden you should
have taken on.

For one, the description of the nasty kludges or code that worked in
2.4 but not 2.6 should have been up-front. e.g. "I'm trying to get an
app to SCHED_FIFO, $FOO isn't working in 2.6 but does in 2.4" and bonus
points for "and my workaround to get it set up in 2.6 is $BAR" and so on.

On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote:
> And 2.4.19+ does work really well. The problem is that users are now
> booting up 2.6 and finding out that (1) the deep changes in the thread
> system have not been fully tested with real time thread applications
> and (2) the scheduler, VM and disk subsystems appear to be conspiring
> to prevent performance equivalent to 2.4+lowlat. Are we surprised? No,
> we knew this would be the case. Are we complaining? Not really. Are we
> asking for help? Are we offering to try to help as best we can? Yes,
> we certainly are.

The RT threading bits sounded largely like a userspace API change that
broke the app's initialization sequence, and that appears to be getting
fielded by mingo, jakub, and uli.

On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote:
> Courageous? Yes, because they are willing to start testing a kernel
> that has been developed with an open admission by the kernel
> development group that our needs are not considered particularly
> important or relevant (and there is nothing wrong with that, just to
> be clear about it). Linus made it clear 2 years ago that we weren't
> going to get what we needed any time soon, and personally, I am
> entirely happy with telling people to use 2.4+lowlat instead. There
> are several distributions of Linux that build precisely this kernel
> for users, and those users are very happy with it.

The userbase is so broad no one user group's needs are particularly
dominant. Surprise! You're coexisting with everyone else.

On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote:
> But NPTL has muddied the situation considerably. People did test NPTL
> when it came out. It seemed to work perfectly OK. So we just assumed
> that it would always work perfectly OK. It turns out, however, that it
> no longer does. And therefore I wrote to try to find out what we could
> do to figure it out.

This is too vague to do anything with; write up a coherent bug/problem
report for glibc and/or kernel maintainers to do something about.
"LD_ASSUME_KERNEL mysteriously makes app run smoother" is really
something you should have determined a proximal cause for before broad
sweeping statements about 2.6 ignoring the needs of whatever category
of apps this is in some misguided attempt to motivate someone to
discover the root cause and repair it on your behalf. Otherwise, if
LD_ASSUME_KERNEL fixes it for you, why would we care?

At some point in the past, I wrote:
>> I appear to have nuked the thread you're replying to in disgust over
>> this precise issue.

On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote:
> Disgust? Thanks for sharing.

Yes, disgust. You presume you are an isolated case, or potentially
"special". This is not so. There are people crawling out of the woodwork
all the time complaining about vague "$BIZARRE_ANCIENT_KERNEL did better
than -CURRENT" issues. Thus far your postings are indistinguishable
from those, and whether you like it or not, they're being classified
right alongside those due to their lack of specificity. Everyone's got
some kind of substance hidden somewhere. Presentation matters. We're
pulled in too many different directions to play guessing games and dive
into every userspace app whose author screams "regression!" that comes along.

In summary:
(1) please try to present adequate information directly
	-- describe your situation directly instead of needing people
	-- to debug your apps for you
(2) please avoid vague generalizations like "2.6 is ignoring RT audio"
	-- they're noninformative and inflammatory
(3) please test major kernel versions promptly after release
	-- this doesn't require particularly much effort, major kernel
	-- versions are infrequently released, and we don't actually
	-- need intense debugging/etc. from you, merely self-contained
	-- examples or descriptions of whatever is going wrong in
	-- userspace. Your description (not example) was not self-contained.

-- wli


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-07-02  7:37       ` William Lee Irwin III
@ 2004-07-02 10:40         ` Takashi Iwai
  2004-07-06  0:48           ` Peter Williams
  2004-07-02 14:42         ` Paul Davis
  1 sibling, 1 reply; 24+ messages in thread
From: Takashi Iwai @ 2004-07-02 10:40 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Paul Davis, Matt Mackall, linux-kernel

At Fri, 2 Jul 2004 00:37:49 -0700,
William Lee Irwin III wrote:
> 
> On Thu, Jul 01, 2004 at 01:03:56PM -0500, Matt Mackall wrote:
> >>> I'm afraid these "brave souls" have shown up to the baby shower after
> >>> the child's been accepted to college. Developers getting around to
> >>> testing 2.6 after multiple vendors are shipping it should not be
> >>> characterized as courageous.
> 
> On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote:
> > I call BS on this response.
> > We were told by A(ndrew)M(orton) and several other people that 2.6
> > would not be as good as 2.4 for low latency real time audio. It was
> > made clear that the preemption patches were considered more
> > appropriate even though they did not do anywhere near as reliable an
> > improvement as AM's lowlat patches. We found out (and I mean no
> > discredit to AM whatsoever - he did an amazing job on the 2.4 lowlat
> > patches) that the author of the premier lowlat patches for 2.4 would
> > not be maintaining a similar set for 2.6. We also found during the
> > development of 2.5 that there were a number of areas of real concern,
> > (the VM subsystem and the scheduler and the disk subsystems) but that
> > many notable kernel developers were not particularly interested in our
> > needs - we were considered odd, edge case studies.
> 
> Not only are lowlat-alike changes in mainline 2.6, the algorithms where
> lowlat found explicit preemption points were necessary have been changed
> in a number of cases to be asymptotically faster.
> 
> So you gave no feedback. What do you expect us to do? There are
> enough other bugreports to keep us busy without testing the known
> universe on behalf of you or anyone else sitting around waiting
> silently for their needs to magically be addressed.

Well, the point is that no kernel developer is watching and working on
low-latency fixes regularly for 2.6 kernels, as Andrew did for every
2.4 release.  And the users can't easily report what goes wrong.
(If the report were something like '2.6.x worked but 2.6.y not', it
 would be easy to figure out, but many users experience this problem
 between 2.4 and 2.6...)

Maybe this situation can be improved by enabling the xrun_debug proc
switch on ALSA, which shows the stack trace when a buffer
over/underrun happens.  Also, running a latencytest program would
help to spot the problem.
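
For the latency side, a rough sketch of the idea in C follows.  It is
only an illustration and not the actual latencytest program: the
function name worst_wakeup_overshoot_ns() and its parameters are
invented for this example.  It sleeps to absolute deadlines on
CLOCK_MONOTONIC and records how late the worst wakeup was.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical helper: wake up `iterations` times, once every
 * `period_ns` nanoseconds, and return the worst observed lateness
 * (actual wakeup time minus requested deadline) in nanoseconds.
 * Returns -1 if a clock call fails.
 */
static int64_t worst_wakeup_overshoot_ns(long period_ns, int iterations)
{
    struct timespec next, now;
    int64_t worst = 0;
    int i, rc;

    assert(period_ns > 0 && period_ns < 1000000000L && iterations > 0);

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (i = 0; i < iterations; i++) {
        int64_t late;

        /* Advance the absolute deadline by one period. */
        next.tv_nsec += period_ns;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }

        /* clock_nanosleep() returns the error number directly. */
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return -1;

        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        late = ts_to_ns(&now) - ts_to_ns(&next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

The numbers only mean something for this discussion if the calling
thread runs SCHED_FIFO with its memory locked and the machine is under
the disk or VM load that triggers the xruns; a period close to the
audio period (for example 1000000 ns) would be a reasonable starting
assumption.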

BTW, 2.6 kernel works pretty well on my system.  Perhaps it's because
I run jackd directly as root.

I've also heard some people complaining after switching to 2.6,
but I believe it's either a driver-specific problem or a bug caused
by the NPTL incompatibility reported in this thread.
AFAIK, there are still some problematic parts, for example, a long
lock in shrink_dcache_parent(), and too-long RCU jobs in a tasklet,
but they are relatively minor. 

> In summary:
> (1) please try to present adequate information directly
> 	-- describe your situation directly instead of needing people
> 	-- to debug your apps for you

The problem is the incompatibility between NPTL and LinuxThreads.
As Paul pointed out, if calling pthread_setschedparam() has no effect
_after_ creating the thread, it sounds like a bug to me.  This might
be a problem in glibc, not in the kernel.  We don't even know that.

Anyway, we'll need a small testcase to reproduce this problem...
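
A minimal sketch of what such a testcase might look like, in C.  It is
an illustration only, not code from JACK or from anyone on this
thread: probe_setschedparam_after_create(), struct probe and the
PROBE_* results are names made up for the example.  It creates a
thread with default attributes, calls pthread_setschedparam() on it
afterwards, and then has the thread ask the kernel what it is really
running with (sched_getscheduler(0), which on Linux refers to the
calling thread).  SCHED_FIFO needs root or an equivalent privilege, so
a failed set call is reported separately from a call that succeeded
but had no effect.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum probe_result {
    PROBE_OK,           /* kernel reports SCHED_FIFO at the requested priority */
    PROBE_NOT_APPLIED,  /* set call returned 0 but the thread runs with other values */
    PROBE_SET_FAILED,   /* pthread_setschedparam() returned an error (e.g. EPERM) */
    PROBE_SETUP_FAILED  /* the thread could not be created at all */
};

struct probe {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int go;        /* set once the creator has tried to change the policy */
    int policy;    /* policy the kernel reports for the worker */
    int priority;  /* priority the kernel reports for the worker */
};

static void *probe_worker(void *arg)
{
    struct probe *p = arg;
    struct sched_param sp;

    /* Wait until the creating thread has called pthread_setschedparam(). */
    pthread_mutex_lock(&p->lock);
    while (!p->go)
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_mutex_unlock(&p->lock);

    /* Ask the kernel directly, bypassing anything the thread library caches. */
    memset(&sp, 0, sizeof sp);
    p->policy = sched_getscheduler(0);
    if (sched_getparam(0, &sp) == 0)
        p->priority = sp.sched_priority;
    return NULL;
}

/* On PROBE_SET_FAILED, *set_err holds pthread_setschedparam()'s return value. */
static enum probe_result probe_setschedparam_after_create(int priority, int *set_err)
{
    struct probe p;
    struct sched_param sp;
    pthread_t tid;
    int err;

    assert(set_err != NULL);
    *set_err = 0;

    p.go = 0;
    p.policy = -1;
    p.priority = -1;
    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.cond, NULL);

    if (pthread_create(&tid, NULL, probe_worker, &p) != 0) {
        pthread_cond_destroy(&p.cond);
        pthread_mutex_destroy(&p.lock);
        return PROBE_SETUP_FAILED;
    }

    /* The step under test: change the policy _after_ thread creation. */
    memset(&sp, 0, sizeof sp);
    sp.sched_priority = priority;
    err = pthread_setschedparam(tid, SCHED_FIFO, &sp);

    /* Release the worker whether or not the call succeeded. */
    pthread_mutex_lock(&p.lock);
    p.go = 1;
    pthread_cond_signal(&p.cond);
    pthread_mutex_unlock(&p.lock);
    pthread_join(tid, NULL);

    pthread_cond_destroy(&p.cond);
    pthread_mutex_destroy(&p.lock);

    if (err != 0) {
        *set_err = err;
        return PROBE_SET_FAILED;
    }
    if (p.policy == SCHED_FIFO && p.priority == priority)
        return PROBE_OK;
    return PROBE_NOT_APPLIED;
}
```

Running something like probe_setschedparam_after_create(10, &err) as
root, once with NPTL and once with LinuxThreads selected through
LD_ASSUME_KERNEL, and comparing the two results would at least tell us
whether the call has any effect.  Whether this matches what JACK
actually does at startup is an assumption on my side.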

--
Takashi Iwai <tiwai@suse.de>		ALSA Developer - www.alsa-project.org


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-07-02  7:37       ` William Lee Irwin III
  2004-07-02 10:40         ` Takashi Iwai
@ 2004-07-02 14:42         ` Paul Davis
  1 sibling, 0 replies; 24+ messages in thread
From: Paul Davis @ 2004-07-02 14:42 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel

>Not only are lowlat-alike changes in mainline 2.6, the algorithms where
>lowlat found explicit preemption points were necessary have been changed
>in a number of cases to be asymptotically faster.

*Some* of the algorithms. 

>So you gave no feedback. What do you expect us to do? There are

Actually, the linux audio community gave quite a lot of feedback early
in the life of 2.5, most of it directly to andrew, ingo and robert
(love). The situation wasn't good at all. It wasn't all the scheduler
(although that was pretty bad) and explicit preemption (which was
basically missing in spite of the preemption patch) - the VM system was
hosed for massive disk streaming, for example - and the feedback we
got, while sympathetic, was basically of the form "2.{5,6} isn't going
the lowlat route, please wait and see what we come up with". So we
waited.

>> about similar issues in 2.5 before it was even half-done. I tested
>> just about every MM patch from andrea and rik that came out for
>> 2.3/2.4 - I did not have time to do that with 2.5.
>
>This level of participation is by no means a requirement. Just show

Given my sporadic observations of the kernel mailing list over the
last five years, I'd say that it often is a requirement, especially if
you are dealing with workloads and application behaviour that is
fundamentally different to the usual "linux stuff" and cannot be
reduced to simple test cases. And I've been happy to provide it when
there is some indication that the resulting feedback will make a
difference. Andrea, Ingo and Andrew all provided that sense of purpose
for 2.4.

>The thing that went wrong here is that the report is very non-specific.

We don't have anything very specific to report. Sometimes, the most
helpful bug reports start life as someone asking "this doesn't seem to
work very well under conditions X, Y but it's OK with Z". Someone turns
around and says "oh, duh!" and the problem is fixed. Apparently, in
this situation, that may not be the case. No problem. We'll come back
with more specifics.

>really the way things are supposed to work. Narrowing the presumed
>kernel issue down to a small enough userspace testcase or section of
>code that you can reasonably post it is pretty much a burden you should
>have taken on.

And I/we're willing to do that (and have been doing that) once it's
clear that this is the right path.

>For one, the description of the nasty kludges or code that worked in
>2.4 but not 2.6 should have been up-front. e.g. "I'm trying to get an

There were *no* nasty kludges in JACK for 2.4 unless you refer to a
technique recommended by many *nix programming books and wizards over
the last 20 years to deal with the rather limited security model that
Linux was offering in 2.4 along with its POSIX cousins. And I note in
passing that within a week or two of us discovering the security
module system in 2.6, someone in the audio community immediately wrote
a very nice kernel module to remove the need for jackstart.

It would also be nice if you could at least implicitly acknowledge
that one of the major reasons (mvista being the other) that the
latency performance of linux has improved in the last 4 years is
because us RT audio guys have done such a nasty, fucked up, useless,
pathetic job of requesting and collaborating on efforts to improve
it. The preemption patch came from a different direction, and didn't
accomplish the same thing - hardly anyone on the kernel list seemed to
care that the kernel was filled with 50ms interrupt masks until we
started explaining how it made linux unusable for certain things that
worked OK on windows + macos; this then led to many of us helping ingo
and andrew in their incredible attempts to fix things.

That doesn't let us off the hook of decent bug reporting, but if you
could at least quit the adult-lecturing-recalcitrant-adolescent tone,
there would be more useful exchanges going on.

--p


* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
  2004-07-02 10:40         ` Takashi Iwai
@ 2004-07-06  0:48           ` Peter Williams
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Williams @ 2004-07-06  0:48 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Paul Davis, Matt Mackall, linux-kernel

Takashi Iwai wrote:
> At Fri, 2 Jul 2004 00:37:49 -0700,
> William Lee Irwin III wrote:
> 
>>On Thu, Jul 01, 2004 at 01:03:56PM -0500, Matt Mackall wrote:
>>
>>>>>I'm afraid these "brave souls" have shown up to the baby shower after
>>>>>the child's been accepted to college. Developers getting around to
>>>>>testing 2.6 after multiple vendors are shipping it should not be
>>>>>characterized as courageous.
>>
>>On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote:
>>
>>>I call BS on this response.
>>>We were told by A(ndrew)M(orton) and several other people that 2.6
>>>would not be as good as 2.4 for low latency real time audio. It was
>>>made clear that the preemption patches were considered more
>>>appropriate even though they did not do anywhere near as reliable an
>>>improvement as AM's lowlat patches. We found out (and I mean no
>>>discredit to AM whatsoever - he did an amazing job on the 2.4 lowlat
>>>patches) that the author of the premier lowlat patches for 2.4 would
>>>not be maintaining a similar set for 2.6. We also found during the
>>>development of 2.5 that there were a number of areas of real concern,
>>>(the VM subsystem and the scheduler and the disk subsystems) but that
>>>many notable kernel developers were not particularly interested in our
>>>needs - we were considered odd, edge case studies.
>>
>>Not only are lowlat-alike changes in mainline 2.6, the algorithms where
>>lowlat found explicit preemption points were necessary have been changed
>>in a number of cases to be asymptotically faster.
>>
>>So you gave no feedback. What do you expect us to do? There are
>>enough other bugreports to keep us busy without testing the known
>>universe on behalf of you or anyone else sitting around waiting
>>silently for their needs to magically be addressed.
> 
> 
> Well, the point is that no kernel developer is watching and working on
> low-latency fixes regularly for 2.6 kernels, as Andrew did for every
> 2.4 release.  And the users can't easily report what goes wrong.
> (If the report were something like '2.6.x worked but 2.6.y not', it
>  would be easy to figure out, but many users experience this problem
>  between 2.4 and 2.6...)
> 
> Maybe this situation can be improved by enabling the xrun_debug proc
> switch on ALSA, which shows the stack trace when a buffer
> over/underrun happens.  Also, running a latencytest program would
> help to spot the problem.
> 
> 
> BTW, 2.6 kernel works pretty well on my system.  Perhaps it's because
> I run jackd directly as root.
> 
> I've also heard some people complaining after switching to 2.6,
> but I believe it's either a driver-specific problem or a bug caused
> by the NPTL incompatibility reported in this thread.
> AFAIK, there are still some problematic parts, for example, a long
> lock in shrink_dcache_parent(), and too-long RCU jobs in a tasklet,
> but they are relatively minor. 
> 
> 
> 
>>In summary:
>>(1) please try to present adequate information directly
>>	-- describe your situation directly instead of needing people
>>	-- to debug your apps for you
> 
> 
> The problem is the incompatibility between NPTL and LinuxThreads.
> As Paul pointed out, if calling pthread_setschedparam() has no effect
> _after_ creating the thread, it sounds like a bug to me.  This might
> be a problem in glibc, not in the kernel.  We don't even know that.
> 
> Anyway, we'll need a small testcase to reproduce this problem...

Version 1.4 of the various SPA schedulers (for 2.6.7) is available for 
download at <https://sourceforge.net/projects/cpuse/>.  In this 
modification I have attempted to minimize the scheduling overhead costs 
for SCHED_FIFO tasks.  I would appreciate any feedback on how successful 
I have been.

Thanks
Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce



end of thread, other threads:[~2004-07-06  3:20 UTC | newest]

Thread overview: 24+ messages
2004-06-30 13:41 2.6.X, NPTL, SCHED_FIFO and JACK Paul Davis
2004-06-30 15:04 ` Ingo Molnar
2004-06-30 15:18   ` Ingo Molnar
2004-06-30 15:26   ` Jakub Jelinek
2004-06-30 16:32     ` Paul Davis
2004-06-30 16:57       ` Jakub Jelinek
2004-06-30 17:52         ` Paul Davis
2004-06-30 15:05 ` Ingo Molnar
2004-06-30 16:12   ` Paul Davis
2004-06-30 17:07     ` Ulrich Drepper
2004-06-30 17:50       ` Paul Davis
2004-07-01 18:03 ` Matt Mackall
2004-07-01 18:14   ` William Lee Irwin III
2004-07-01 22:45     ` Andrew Morton
2004-07-02  0:45       ` William Lee Irwin III
2004-07-02  1:38         ` Peter Williams
2004-07-02  2:53           ` William Lee Irwin III
2004-07-02  3:03         ` Con Kolivas
2004-07-02  3:05           ` William Lee Irwin III
2004-07-02  3:27     ` Paul Davis
2004-07-02  7:37       ` William Lee Irwin III
2004-07-02 10:40         ` Takashi Iwai
2004-07-06  0:48           ` Peter Williams
2004-07-02 14:42         ` Paul Davis
