* 2.6.X, NPTL, SCHED_FIFO and JACK
@ 2004-06-30 13:41 Paul Davis
2004-06-30 15:04 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 24+ messages in thread
From: Paul Davis @ 2004-06-30 13:41 UTC (permalink / raw)
To: linux-kernel
JACK is the de-facto standard for low latency audio and
inter-application audio routing on Linux (it's also widely appreciated
on OS X). It makes heavy use of threads to provide the
functionality relied on by more than 2 dozen serious Linux audio
applications. For many users, it's a requirement to use SCHED_FIFO and
mlockall() with audio applications, because of the realtime, low
latency nature of their configurations/goals.
Because of the recognition by kernel developers that 2.6 does not
perform as well as 2.4+lowlat (the Andrew Morton patches) when it
comes to scheduling latency, most audio developers and users have
remained with 2.4. Recently however, several brave souls have
attempted to test 2.6. The results have been mixed.
On the one hand, it does seem possible to get performance from an
unpatched 2.6 kernel that is pretty close to the 2.4+lowlat
numbers. Using the CKolivas patches for 2.6 only improves things
further.
However, the ONLY way to get even vaguely reasonable performance in
this area is to disable the use of NPTL using LD_ASSUME_KERNEL. With
NPTL in use, there is a series of apparently interlocking problems
with scheduler parameter inheritance, scheduler performance and
decision making. It's more or less impossible to run JACK-enabled audio
systems on 2.6 with NPTL. A series of ugly kludges is beginning to
emerge within the Linux audio community, and I think it's time we cut
them off before things get out of hand.
The JACK group is entirely open to the idea that we have made an error
in our use of the pthreads API, and that NPTL is simply exposing our
mistake. We can't see the error, however, and so for the moment, we
are working on the assumption that there are genuine kernel+glibc
errors.
The first and most visible issue is with inheritance of SCHED_FIFO
scheduling. Although there are other mechanisms available under 2.6,
many people use the "jackstart" helper application which runs setuid
root and uses capabilities to start up JACK with the required caps to
allow use of SCHED_FIFO and mlockall(). This has worked very well in
2.4 for about 2 years, but in 2.6 JACK fails to get its threads to be
in the SCHED_FIFO scheduling class without a bunch of nasty kludges.
Things work correctly as soon as LD_ASSUME_KERNEL is used.
We also see apparently impossible thread scheduling, where a thread
that should run immediately is delayed by a significant time, and the
thread that woke the first one up (and should be waiting for it to
execute) runs again, apparently without ever having blocked. Once
more, it all works correctly if LD_ASSUME_KERNEL is used to avoid
NPTL.
Are there known issues with the implementation of NPTL that might give
rise to this behaviour? What can we do to help understand and debug
it?
thanks,
Paul Davis <paul@linuxaudiosystems.com> Bala Cynwyd, PA, USA
Linux Audio Systems 610-667-4807
----------------------------------------------------------------------------
hybrid rather than pure; compromising rather than clean;
distorted rather than straightforward; ambiguous rather than
articulated; both-and rather than either-or; the difficult
unity of inclusion rather than the easy unity of exclusion. Robert Venturi
----------------------------------------------------------------------------
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-06-30 13:41 2.6.X, NPTL, SCHED_FIFO and JACK Paul Davis
@ 2004-06-30 15:04 ` Ingo Molnar
2004-06-30 15:18 ` Ingo Molnar
2004-06-30 15:26 ` Jakub Jelinek
2004-06-30 15:05 ` Ingo Molnar
2004-07-01 18:03 ` Matt Mackall
2 siblings, 2 replies; 24+ messages in thread
From: Ingo Molnar @ 2004-06-30 15:04 UTC (permalink / raw)
To: Paul Davis; +Cc: linux-kernel

* Paul Davis <paul@linuxaudiosystems.com> wrote:

> The first and most visible issue is with inheritance of SCHED_FIFO
> scheduling. Although there are other mechanisms available under 2.6,
> many people use the "jackstart" helper application which runs setuid
> root and uses capabilities to start up JACK with the required caps to
> allow use of SCHED_FIFO and mlockall(). This has worked very well in
> 2.4 for about 2 years, but in 2.6 JACK fails to get its threads to be
> in the SCHED_FIFO scheduling class without a bunch of nasty kludges.
>
> Things work correctly as soon as LD_ASSUME_KERNEL is used.

A simple "strace -f" should show whether the setscheduler() call
succeeds or not. Does 'jackstart' do anything with glibc internals?

> We also see apparently impossible thread scheduling, where a thread
> that should run immediately is delayed by a significant time, and the
> thread that woke the first one up (and should be waiting for it to
> execute) runs again, apparently without ever having blocked. Once
> more, it all works correctly if LD_ASSUME_KERNEL is used to avoid
> NPTL.

there was a SCHED_FIFO bug in all 2.6 kernels prior to 2.6.5, causing
erratic scheduling. Have you tried 2.6.6 or 2.6.7?

> Are there known issues with the implementation of NPTL that might give
> rise to this behaviour? What can we do to help understand and debug
> it?

there's nothing special about NPTL, scheduling-wise. But if SCHED_FIFO
is not properly set for all JACK threads that could explain the
symptoms. You talked about kludges that are necessary to make all
threads SCHED_FIFO - are you 100% sure that all JACK threads are indeed
SCHED_FIFO after these kludges are applied? If yes and you are running a
later kernel then it's something new and probably NPTL-unrelated.

Ingo
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-06-30 15:04 ` Ingo Molnar
@ 2004-06-30 15:18 ` Ingo Molnar
2004-06-30 15:26 ` Jakub Jelinek
1 sibling, 0 replies; 24+ messages in thread
From: Ingo Molnar @ 2004-06-30 15:18 UTC (permalink / raw)
To: Paul Davis; +Cc: linux-kernel

* Ingo Molnar <mingo@elte.hu> wrote:

> A simple "strace -f" should show whether the setscheduler() call
> succeeds or not. Does 'jackstart' do anything with glibc internals?

it seems part of the problem is that the setscheduler() calls 'succeed',
but the policy is not changed to SCHED_FIFO. The question here is,
are the correct PIDs used?

Ingo
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-06-30 15:04 ` Ingo Molnar
2004-06-30 15:18 ` Ingo Molnar
@ 2004-06-30 15:26 ` Jakub Jelinek
2004-06-30 16:32 ` Paul Davis
1 sibling, 1 reply; 24+ messages in thread
From: Jakub Jelinek @ 2004-06-30 15:26 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Davis, linux-kernel

On Wed, Jun 30, 2004 at 05:04:30PM +0200, Ingo Molnar wrote:
> > Are there known issues with the implementation of NPTL that might give
> > rise to this behaviour? What can we do to help understand and debug
> > it?
>
> there's nothing special about NPTL, scheduling-wise. But if SCHED_FIFO
> is not properly set for all JACK threads that could explain the
> symptoms. You talked about kludges that are necessary to make all
> threads SCHED_FIFO - are you 100% sure that all JACK threads are indeed
> SCHED_FIFO after these kludges are applied? If yes and you are running a
> later kernel then it's something new and probably NPTL-unrelated.

One thing to note is that NPTL defaults to PTHREAD_INHERIT_SCHED
while LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED.
So, if you care about what scheduling created threads will have
and want it to work with both NPTL and LinuxThreads, you want
pthread_attr_setinheritsched (&attr, PTHREAD_*_SCHED);
explicitly.

Jakub
^ permalink raw reply [flat|nested] 24+ messages in thread
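The default Jakub describes can be checked on a given system by querying a
freshly initialised attribute object. The following is only a minimal sketch:
the helper names default_inheritsched() and inheritsched_name() are made up
for illustration, and the result depends on the thread library actually in
use.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: store the inherit-scheduler setting of a freshly
 * initialised pthread_attr_t in *inherit_out. Returns 0 or an errno value. */
static int default_inheritsched(int *inherit_out)
{
    pthread_attr_t attr;
    int rc;

    assert(inherit_out != NULL);
    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_getinheritsched(&attr, inherit_out);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Hypothetical helper: map the setting to a printable name. */
static const char *inheritsched_name(int inherit)
{
    switch (inherit) {
    case PTHREAD_INHERIT_SCHED:
        return "PTHREAD_INHERIT_SCHED";
    case PTHREAD_EXPLICIT_SCHED:
        return "PTHREAD_EXPLICIT_SCHED";
    default:
        return "unknown";
    }
}
```

Printing inheritsched_name() of the value returned by default_inheritsched()
once with and once without LD_ASSUME_KERNEL set would be one way to compare
the two thread libraries on the same machine.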
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-06-30 15:26 ` Jakub Jelinek
@ 2004-06-30 16:32 ` Paul Davis
2004-06-30 16:57 ` Jakub Jelinek
0 siblings, 1 reply; 24+ messages in thread
From: Paul Davis @ 2004-06-30 16:32 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Ingo Molnar, linux-kernel

>One thing to note is that NPTL defaults to PTHREAD_INHERIT_SCHED
>while LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED.
>So, if you care about what scheduling created threads will have
>and want it to work with both NPTL and LinuxThreads, you want
>pthread_attr_setinheritsched (&attr, PTHREAD_*_SCHED);
>explicitly.

But since we always set the scheduling class explicitly, should the
inherited scheduler class make any difference?

--p
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-06-30 16:32 ` Paul Davis
@ 2004-06-30 16:57 ` Jakub Jelinek
2004-06-30 17:52 ` Paul Davis
0 siblings, 1 reply; 24+ messages in thread
From: Jakub Jelinek @ 2004-06-30 16:57 UTC (permalink / raw)
To: Paul Davis; +Cc: Ingo Molnar, linux-kernel

On Wed, Jun 30, 2004 at 12:32:03PM -0400, Paul Davis wrote:
> >One thing to note is that NPTL defaults to PTHREAD_INHERIT_SCHED
> >while LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED.
> >So, if you care about what scheduling created threads will have
> >and want it to work with both NPTL and LinuxThreads, you want
> >pthread_attr_setinheritsched (&attr, PTHREAD_*_SCHED);
> >explicitly.
>
> But since we always set the scheduling class explicitly, should the
> inherited scheduler class make any difference?

Of course.
If you say
  pthread_attr_init (&attr);
  pthread_attr_setschedpolicy (&attr, SCHED_FIFO);
  pthread_attr_setschedparam (&attr, &param);
  pthread_create (&th, &attr, fn, arg);
then with LinuxThreads the thread will have FIFO policy while with
NPTL it won't unless the current thread has it.
If you:
  pthread_attr_init (&attr);
  pthread_attr_setschedpolicy (&attr, SCHED_FIFO);
  pthread_attr_setschedparam (&attr, &param);
  pthread_attr_setinheritsched (&attr, PTHREAD_INHERIT_SCHED);
  pthread_create (&th, &attr, fn, arg);
then the thread will inherit scheduling parameters from current thread,
so unless it has FIFO the fn thread will not have FIFO policy.
If you:
  pthread_attr_init (&attr);
  pthread_attr_setschedpolicy (&attr, SCHED_FIFO);
  pthread_attr_setschedparam (&attr, &param);
  pthread_attr_setinheritsched (&attr, PTHREAD_EXPLICIT_SCHED);
  pthread_create (&th, &attr, fn, arg);
then thread will have FIFO policy in both NPTL and LinuxThreads.

For details see
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_attr_getinheritsched.html

The reason why LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED and NPTL
defaults to PTHREAD_INHERIT_SCHED is that those are the cheaper variants.
LinuxThreads has a manager thread which creates the child threads,
so for INHERIT_SCHED it needs to issue some syscalls to query scheduling
parameters of the thread which called pthread_create. In addition to this,
no matter what inheritsched setting was, if the desired sched parameters
are different from the initial thread, it needs to issue a system call
to set it for the new thread.
NPTL doesn't have a manager thread and a child thread inherits
parent thread's settings without any syscalls anywhere.
For PTHREAD_EXPLICIT_SCHED, it needs to issue a system call to set
scheduling params to the desired ones.

Jakub
^ permalink raw reply [flat|nested] 24+ messages in thread
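The third variant above (PTHREAD_EXPLICIT_SCHED) can be wrapped in a small
helper with error checking. This is a sketch under assumptions, not JACK's
actual code: create_fifo_thread() and report_policy() are hypothetical names,
and without the right privileges pthread_create() is expected to fail with
EPERM rather than produce a SCHED_FIFO thread.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical thread body: record the policy the kernel reports for the
 * calling thread (pid 0 means "the caller" for sched_getscheduler()). */
static void *report_policy(void *arg)
{
    int *policy_out = arg;

    *policy_out = sched_getscheduler(0);
    return NULL;
}

/* Hypothetical helper: create a SCHED_FIFO thread with explicit scheduling
 * attributes. Returns 0 on success or an errno value such as EPERM. */
static int create_fifo_thread(pthread_t *th, int priority,
                              void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    struct sched_param param;
    int rc;

    assert(th != NULL && fn != NULL);
    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    /* Do not inherit from the creating thread; use the attributes below. */
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    if (rc == 0)
        rc = pthread_attr_setschedparam(&attr, &param);
    if (rc == 0)
        rc = pthread_create(th, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Checking every return value matters here: a failed attribute call or a
pthread_create() that returns EPERM is easy to miss if the results are
ignored.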
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-06-30 16:57 ` Jakub Jelinek
@ 2004-06-30 17:52 ` Paul Davis
0 siblings, 0 replies; 24+ messages in thread
From: Paul Davis @ 2004-06-30 17:52 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Ingo Molnar, linux-kernel

>On Wed, Jun 30, 2004 at 12:32:03PM -0400, Paul Davis wrote:
>> >One thing to note is that NPTL defaults to PTHREAD_INHERIT_SCHED
>> >while LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED.
>> >So, if you care about what scheduling created threads will have
>> >and want it to work with both NPTL and LinuxThreads, you want
>> >pthread_attr_setinheritsched (&attr, PTHREAD_*_SCHED);
>> >explicitly.
>>
>> But since we always set the scheduling class explicitly, should the
>> inherited scheduler class make any difference?
>
>Of course.

i understand that in the context of "pthread_attr_*;
pthread_create();", but we use pthread_create() and then set
scheduling class/priority within the new thread. Why would
INHERIT_SCHED affect that? Does it?
^ permalink raw reply [flat|nested] 24+ messages in thread
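The pattern Paul describes, where a thread changes its own scheduling class
after it has started, might look roughly like the sketch below. It is an
illustration only: make_self_fifo() is a hypothetical name, not a JACK
function. Reading the policy back from the kernel afterwards is one way to
catch the case Ingo raised earlier, where a call appears to succeed but the
policy is not what was requested.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical helper: switch the calling thread to SCHED_FIFO at the given
 * priority, then ask the kernel which policy the thread really has.
 * Returns 0 on success or an errno value (EPERM without privileges).
 * *policy_out receives the kernel's answer, or -1 if the query failed. */
static int make_self_fifo(int priority, int *policy_out)
{
    struct sched_param param;
    int rc;

    assert(policy_out != NULL);
    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;

    rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
    /* On Linux, pid 0 refers to the calling thread. */
    *policy_out = sched_getscheduler(0);
    return rc;
}
```

A caller would compare *policy_out against SCHED_FIFO instead of trusting the
return value alone.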
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-06-30 13:41 2.6.X, NPTL, SCHED_FIFO and JACK Paul Davis
2004-06-30 15:04 ` Ingo Molnar
@ 2004-06-30 15:05 ` Ingo Molnar
2004-06-30 16:12 ` Paul Davis
2004-07-01 18:03 ` Matt Mackall
2 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2004-06-30 15:05 UTC (permalink / raw)
To: Paul Davis; +Cc: linux-kernel

another question: do all JACK threads run at SCHED_FIFO, and do they all
have the same rt_priority value?

Ingo
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-06-30 15:05 ` Ingo Molnar
@ 2004-06-30 16:12 ` Paul Davis
2004-06-30 17:07 ` Ulrich Drepper
0 siblings, 1 reply; 24+ messages in thread
From: Paul Davis @ 2004-06-30 16:12 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

>another question: do all JACK threads run at SCHED_FIFO, and do they all
>have the same rt_priority value?

They don't all run SCHED_FIFO. Just two threads in the server (one is
a watchdog designed to prevent system lockups) and at least one in
each client (there may be more depending on what the client does, but
it's not created by JACK and JACK doesn't know about it). The client
threads run at 1 level lower priority than the server's main thread,
and that runs 1 level lower than the watchdog.

but ...

>it seems part of the problem is that the setscheduler() calls 'succeed',
>but the policy is not changed to SCHED_FIFO. The question here is,
>are the correct PIDs used?

this has me thinking. one of the major changes with NPTL is that all
threads share the same PID. so how in the world do we ever set the
scheduling policy of a single thread (as opposed to something
identified by a pid_t) to SCHED_FIFO?

--p
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-06-30 16:12 ` Paul Davis
@ 2004-06-30 17:07 ` Ulrich Drepper
2004-06-30 17:50 ` Paul Davis
0 siblings, 1 reply; 24+ messages in thread
From: Ulrich Drepper @ 2004-06-30 17:07 UTC (permalink / raw)
To: Paul Davis; +Cc: Ingo Molnar, linux-kernel

Paul Davis wrote:

> this has me thinking. one of the major changes with NPTL is that all
> threads share the same PID. so how in the world do we ever set the
> scheduling policy of a single thread (as opposed to something
> identified by a pid_t) to SCHED_FIFO?

If you have to ask this question then it's no wonder you get erratic
behavior. It means you haven't looked at the pthread interface at all.

Define a pthread_attr_t with the appropriate setting (with
pthread_attr_setschedparam etc) and create the thread (and use
pthread_attr_setinheritsched correctly). Alternatively use
pthread_setschedparam on already running threads.

And use a recent enough nptl version. Very early versions didn't have
any of the scheduler handling implemented.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-06-30 17:07 ` Ulrich Drepper
@ 2004-06-30 17:50 ` Paul Davis
0 siblings, 0 replies; 24+ messages in thread
From: Paul Davis @ 2004-06-30 17:50 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: Ingo Molnar, linux-kernel

>> this has me thinking. one of the major changes with NPTL is that all
>> threads share the same PID. so how in the world do we ever set the
>> scheduling policy of a single thread (as opposed to something
>> identified by a pid_t) to SCHED_FIFO?
>
>If you have to ask this question then it's no wonder you get erratic
>behavior. It means you haven't looked at the pthread interface at all.

thanks, i appreciate the ad hominem remarks. you think we could ever
get SCHED_FIFO if we were not familiar with these calls? this is
really unnecessary...

my question wasn't about the pthread API. it was about what kernel API
was used to implement it. the simple answer would have been that we
use the TID, not the PID, or to have just pointed me at the source.

>And use a recent enough nptl version. Very early versions didn't have
>any of the scheduler handling implemented.

we already discovered that. the people testing this stuff are using
the most recent "stable" release of glibc, for the most part.

--p
^ permalink raw reply [flat|nested] 24+ messages in thread
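The PID/TID distinction Paul mentions can be observed from user space. The
sketch below is a Linux-specific illustration and makes no claim about how
glibc implements its pthread calls internally: current_tid() and record_ids()
are hypothetical names, and the raw gettid system call is used through
syscall(). On Linux the sched_* calls accept such a thread ID in place of a
PID, so a single thread's policy can be queried at kernel level.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* What one thread sees: the shared process ID, its own kernel thread ID,
 * and the scheduling policy the kernel reports for that thread ID. */
struct thread_ids {
    pid_t pid;
    pid_t tid;
    int policy;
};

/* Hypothetical helper: kernel thread ID of the calling thread. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Hypothetical thread body: fill in a struct thread_ids for this thread. */
static void *record_ids(void *arg)
{
    struct thread_ids *ids = arg;

    assert(ids != NULL);
    ids->pid = getpid();                        /* same for every thread */
    ids->tid = current_tid();                   /* unique per thread */
    ids->policy = sched_getscheduler(ids->tid); /* per-thread query */
    return NULL;
}
```

Running record_ids() in a second thread should show the same pid as the main
thread but a different tid.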
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-06-30 13:41 2.6.X, NPTL, SCHED_FIFO and JACK Paul Davis
2004-06-30 15:04 ` Ingo Molnar
2004-06-30 15:05 ` Ingo Molnar
@ 2004-07-01 18:03 ` Matt Mackall
2004-07-01 18:14 ` William Lee Irwin III
2 siblings, 1 reply; 24+ messages in thread
From: Matt Mackall @ 2004-07-01 18:03 UTC (permalink / raw)
To: Paul Davis; +Cc: linux-kernel

On Wed, Jun 30, 2004 at 09:41:46AM -0400, Paul Davis wrote:
> Because of the recognition by kernel developers that 2.6 does not
> perform as well as 2.4+lowlat (the Andrew Morton patches) when it
> comes to scheduling latency, most audio developers and users have
> remained with 2.4. Recently however, several brave souls have
> attempted to test 2.6. The results have been mixed.

I'm afraid these "brave souls" have shown up to the baby shower after
the child's been accepted to college. Developers getting around to
testing 2.6 after multiple vendors are shipping it should not be
characterized as courageous.

--
Mathematics is the supreme nostalgia of our time.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-07-01 18:03 ` Matt Mackall
@ 2004-07-01 18:14 ` William Lee Irwin III
2004-07-01 22:45 ` Andrew Morton
2004-07-02 3:27 ` Paul Davis
0 siblings, 2 replies; 24+ messages in thread
From: William Lee Irwin III @ 2004-07-01 18:14 UTC (permalink / raw)
To: Matt Mackall; +Cc: Paul Davis, linux-kernel

On Wed, Jun 30, 2004 at 09:41:46AM -0400, Paul Davis wrote:
>> Because of the recognition by kernel developers that 2.6 does not
>> perform as well as 2.4+lowlat (the Andrew Morton patches) when it
>> comes to scheduling latency, most audio developers and users have
>> remained with 2.4. Recently however, several brave souls have
>> attempted to test 2.6. The results have been mixed.

On Thu, Jul 01, 2004 at 01:03:56PM -0500, Matt Mackall wrote:
> I'm afraid these "brave souls" have shown up to the baby shower after
> the child's been accepted to college. Developers getting around to
> testing 2.6 after multiple vendors are shipping it should not be
> characterized as courageous.

I appear to have nuked the thread you're replying to in disgust over
this precise issue.

-- wli
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-07-01 18:14 ` William Lee Irwin III
@ 2004-07-01 22:45 ` Andrew Morton
2004-07-02 0:45 ` William Lee Irwin III
2004-07-02 3:27 ` Paul Davis
1 sibling, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2004-07-01 22:45 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: mpm, paul, linux-kernel

William Lee Irwin III <wli@holomorphy.com> wrote:
>
> On Thu, Jul 01, 2004 at 01:03:56PM -0500, Matt Mackall wrote:
> > I'm afraid these "brave souls" have shown up to the baby shower after
> > the child's been accepted to college. Developers getting around to
> > testing 2.6 after multiple vendors are shipping it should not be
> > characterized as courageous.
>
> I appear to have nuked the thread you're replying to in disgust over
> this precise issue.

In fairness, the CPU scheduler has been spinning like a top for a
couple of years, and it still ain't settled.

That's just the one in Linus's tree, let alone the umpteen rewrites
which are floating about.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-07-01 22:45 ` Andrew Morton
@ 2004-07-02 0:45 ` William Lee Irwin III
2004-07-02 1:38 ` Peter Williams
2004-07-02 3:03 ` Con Kolivas
0 siblings, 2 replies; 24+ messages in thread
From: William Lee Irwin III @ 2004-07-02 0:45 UTC (permalink / raw)
To: Andrew Morton; +Cc: mpm, paul, linux-kernel

On Thu, Jul 01, 2004 at 03:45:54PM -0700, Andrew Morton wrote:
> In fairness, the CPU scheduler has been spinning like a top for a
> couple of years, and it still ain't settled.
> That's just the one in Linus's tree, let alone the umpteen rewrites
> which are floating about.

I've not seen much deep material there. Policy tweaks seem to be
what's gone on in mainline, and frankly most of the purported rewrites
are just that. I guess the ones that nuked the duelling queue silliness
are trying to qualify but even they're leaving the load balancer untouched
and are carrying over large fractions of their predecessors unaltered.
The stuff that's gone around looks minor. It's not like they're teaching
sched.c to play cpu tetris for gang scheduling or Kalman filtering
profiling feedback to stripe tasks using different cpu resources across
SMT siblings or playing graph games to meet RT deadlines, so it doesn't
look like very much at all is going on to me.

It's pretty obvious why everyone and their brother is grinding out
purported scheduler rewrites: the code is self-contained, however,
nothing interesting is coming of all this. Never been for have so many
patches been written against the same file, accomplishing so little.

-- wli
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-07-02 0:45 ` William Lee Irwin III
@ 2004-07-02 1:38 ` Peter Williams
2004-07-02 2:53 ` William Lee Irwin III
2004-07-02 3:03 ` Con Kolivas
1 sibling, 1 reply; 24+ messages in thread
From: Peter Williams @ 2004-07-02 1:38 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Andrew Morton, mpm, paul, linux-kernel

William Lee Irwin III wrote:
> On Thu, Jul 01, 2004 at 03:45:54PM -0700, Andrew Morton wrote:
>
>>In fairness, the CPU scheduler has been spinning like a top for a
>>couple of years, and it still ain't settled.
>>That's just the one in Linus's tree, let alone the umpteen rewrites
>>which are floating about.
>
>
> I've not seen much deep material there. Policy tweaks seem to be
> what's gone on in mainline, and frankly most of the purported rewrites
> are just that. I guess the ones that nuked the duelling queue silliness
> are trying to qualify but even they're leaving the load balancer untouched
> and are carrying over large fractions of their predecessors unaltered.

That's because it's not all bad (or the problems are minor and can wait
until later).

> The stuff that's gone around looks minor. It's not like they're teaching
> sched.c to play cpu tetris for gang scheduling or Kalman filtering
> profiling feedback to stripe tasks using different cpu resources across
> SMT siblings or playing graph games to meet RT deadlines, so it doesn't
> look like very much at all is going on to me.

To my mind, scheduling and load balancing are ALMOST orthogonal
concepts. Scheduling is concerned with doing a useful job within a
single CPU and load balancing is about distributing tasks/load among the
available CPUs. To a large extent these are independent and are being
worked on separately. I am one of those fiddling with the schedulers
but I'm leaving load balancing alone as it seems to me that the NUMA and
hyper threading developers are the main players for that component.

To my mind the only contribution the scheduler component MAY want to
make to load balancing would be to have some say in which tasks are
chosen for migration. I don't think that any of the currently proposed
schedulers have a strong need to change the current mechanism(s) for
selecting which tasks get migrated. If you think otherwise please share
your thoughts?

>
> It's pretty obvious why everyone and their brother is grinding out
> purported scheduler rewrites: the code is self-contained,

The main reason is that the standard scheduler is a bit of a mess. The
fact that the code is self contained just makes it easier to modify
without touching lots of files. It's not the reason why the changes are
being tried.

> however,
> nothing interesting is coming of all this. Never been for have so many
> patches been written against the same file, accomplishing so little.

Peter
--
Peter Williams pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-07-02 1:38 ` Peter Williams
@ 2004-07-02 2:53 ` William Lee Irwin III
0 siblings, 0 replies; 24+ messages in thread
From: William Lee Irwin III @ 2004-07-02 2:53 UTC (permalink / raw)
To: Peter Williams; +Cc: Andrew Morton, mpm, paul, linux-kernel

William Lee Irwin III wrote:
>> I've not seen much deep material there. Policy tweaks seem to be
>> what's gone on in mainline, and frankly most of the purported rewrites
>> are just that. I guess the ones that nuked the duelling queue silliness
>> are trying to qualify but even they're leaving the load balancer untouched
>> and are carrying over large fractions of their predecessors unaltered.

On Fri, Jul 02, 2004 at 11:38:17AM +1000, Peter Williams wrote:
> That's because it's not all bad (or the problems are minor and can wait
> until later).

Whatever that has to do with, it doesn't really make the fiddling
around going on even noticeable. Hell, I do end-luserish crap too
(amazing, I actually appear to need luserspace to get code written)
I've yet to see a visible change in scheduler behavior in that context
across all of 2.4 and 2.5 (and in fact since the earliest Linux kernels
I've ever run) apart from a reduction in cpu time spent in the scheduler
itself associated with (you guessed it) the merge of the incremental
epoch expiry stuff mingo did around early 2.5 (or at least that's the
best description I can come up with for the algorithm, as it doesn't
resemble any of the normal algorithms). I suspect widespread placebo
effects.

William Lee Irwin III wrote:
>> The stuff that's gone around looks minor. It's not like they're teaching
>> sched.c to play cpu tetris for gang scheduling or Kalman filtering
>> profiling feedback to stripe tasks using different cpu resources across
>> SMT siblings or playing graph games to meet RT deadlines, so it doesn't
>> look like very much at all is going on to me.

On Fri, Jul 02, 2004 at 11:38:17AM +1000, Peter Williams wrote:
> To my mind, scheduling and load balancing are ALMOST orthogonal
> concepts. Scheduling is concerned with doing a useful job within a
> single CPU and load balancing is about distributing tasks/load among the
> available CPUs. To a large extent these are independent and are being
> worked on separately. I am one of those fiddling with the schedulers
> but I'm leaving load balancing alone as it seems to me that the NUMA and
> hyper threading developers are the main players for that component.
> To my mind the only contribution the scheduler component MAY want to
> make to load balancing would be to have some say in which tasks are
> chosen for migration. I don't think that any of the currently proposed
> schedulers have a strong need to change the current mechanism(s) for
> selecting which tasks get migrated. If you think otherwise please share
> your thoughts?

That's an expedient program structure. There is no independence. Those
are examples of things that would have qualified as having been remotely
visible changes and not myriads of infinitesimal intra-queue twiddlings.
No, I don't want to touch scheduling policy (or anything else infested
with such massive quantities of holy penguin pee) with a 10-foot pole.

William Lee Irwin III wrote:
>> It's pretty obvious why everyone and their brother is grinding out
>> purported scheduler rewrites: the code is self-contained,

On Fri, Jul 02, 2004 at 11:38:17AM +1000, Peter Williams wrote:
> The main reason is that the standard scheduler is a bit of a mess. The
> fact that the code is self contained just makes it easier to modify
> without touching lots of files. It's not the reason why the changes are
> being tried.

It means the barrier to entry is very low.

William Lee Irwin III wrote:
>> however,
>> nothing interesting is coming of all this. Never been for have so many
>> patches been written against the same file, accomplishing so little.

s/been for/before/

I wonder why I've started making homophone errors only in the past 5
years where beforehand they were very rare. It's not like I started
sounding out words when I read or anything idiotic like that.

-- wli
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-07-02 0:45 ` William Lee Irwin III
2004-07-02 1:38 ` Peter Williams
@ 2004-07-02 3:03 ` Con Kolivas
2004-07-02 3:05 ` William Lee Irwin III
1 sibling, 1 reply; 24+ messages in thread
From: Con Kolivas @ 2004-07-02 3:03 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Andrew Morton, mpm, paul, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

William Lee Irwin III wrote:
| On Thu, Jul 01, 2004 at 03:45:54PM -0700, Andrew Morton wrote:
|
|>In fairness, the CPU scheduler has been spinning like a top for a
|>couple of years, and it still ain't settled.
|>That's just the one in Linus's tree, let alone the umpteen rewrites
|>which are floating about.
|
|
| I've not seen much deep material there. Policy tweaks seem to be
| what's gone on in mainline, and frankly most of the purported rewrites
| are just that. I guess the ones that nuked the duelling queue silliness
| are trying qualify but even they're leaving the load balancer untouched
| and are carrying over large fractions of their predecessors unaltered.
| The stuff that's gone around looks minor. It's not like they're teaching
| sched.c to play cpu tetris for gang scheduling or Kalman filtering
| profiling feedback to stripe tasks using different cpu resources across
| SMT siblings or playing graph games to meet RT deadlines, so it doesn't
| look like very much at all is going on to me.

My impetus for doing a policy rewrite was the recurring complaint that
the 2.6 scheduler is currently too complicated for even basic
scheduling. I see no point in trying to implement other changes until
the framework for normal policies is in place that can be built on. I
don't see even the policy rewrites as being appropriate for 2.6, let
alone anything fancier. If we have something in place that more people
than not agree is satisfactory for normal scheduling, then more can be
added for 2.7+ development.

Con

| It's pretty obvious why everyone and their brother is grinding out
| purported scheduler rewrites: the code is self-contained, however,
| nothing interesting is coming of all this. Never been for have so many
| patches been written against the same file, accomplishing so little.
|
| -- wli
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFA5NCKZUg7+tp6mRURAj/JAJ4qJzKxXWCUOT+LDBoGs0MEMi21owCfZqGo
S8scT9Ro6DbvumUt060ctOU=
=6I3d
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-07-02 3:03 ` Con Kolivas
@ 2004-07-02 3:05 ` William Lee Irwin III
0 siblings, 0 replies; 24+ messages in thread
From: William Lee Irwin III @ 2004-07-02 3:05 UTC (permalink / raw)
To: Con Kolivas; +Cc: Andrew Morton, mpm, paul, linux-kernel

On Fri, Jul 02, 2004 at 01:03:39PM +1000, Con Kolivas wrote:
> My impetus for doing a policy rewrite was the recurring complaint that
> the 2.6 scheduler is currently too complicated for even basic
> scheduling. I see no point in trying to implement other changes until
> the framework for normal policies is in place that can be built on. I
> don't see even the policy rewrites as being appropriate for 2.6, let
> alone anything fancier. If we have something in place that more people
> than not agree is satisfactory for normal scheduling, then more can be
> added for 2.7+ development.

The point I had was really that what's going on is very minor.

-- wli
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-07-01 18:14 ` William Lee Irwin III
2004-07-01 22:45 ` Andrew Morton
@ 2004-07-02 3:27 ` Paul Davis
2004-07-02 7:37 ` William Lee Irwin III
1 sibling, 1 reply; 24+ messages in thread
From: Paul Davis @ 2004-07-02 3:27 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Matt Mackall, linux-kernel

>On Wed, Jun 30, 2004 at 09:41:46AM -0400, Paul Davis wrote:
>>> Because of the recognition by kernel developers that 2.6 does not
>>> perform as well as 2.4+lowlat (the Andrew Morton patches) when it
>>> comes to scheduling latency, most audio developers and users have
>>> remained with 2.4. Recently however, several brave souls have
>>> attempted to test 2.6. The results have been mixed.
>
>On Thu, Jul 01, 2004 at 01:03:56PM -0500, Matt Mackall wrote:
>> I'm afraid these "brave souls" have shown up to the baby shower after
>> the child's been accepted to college. Developers getting around to
>> testing 2.6 after multiple vendors are shipping it should not be
>> characterized as courageous.

I call BS on this response.

We were told by A(ndrew)M(orton) and several other people that 2.6
would not be as good as 2.4 for low latency real time audio. It was
made clear that the preemption patches were considered more
appropriate even though they did not deliver anywhere near as reliable
an improvement as AM's lowlat patches. We found out (and I mean no
discredit to AM whatsoever - he did an amazing job on the 2.4 lowlat
patches) that the author of the premier lowlat patches for 2.4 would
not be maintaining a similar set for 2.6. We also found during the
development of 2.5 that there were a number of areas of real concern,
(the VM subsystem and the scheduler and the disk subsystems) but that
many notable kernel developers were not particularly interested in our
needs - we were considered odd, edge case studies.

So we just punted and said "ah, it's OK, we still have 2.4 and that
works really, really well".
I spent a lot of time debugging, testing, measuring and playing with
2.3 and 2.4. I even tested the HRT patches with great anticipation
(they didn't work very well at all, and I didn't have time to spend
tracking that down then). I'm terribly sorry, but I don't have time to
do full-scale kernel debugging and also develop applications that have
already taken 4+ years to get to "useful". Frankly, the mess of
dealing with the development process for 2.3/2.4, with a VM subsystem
that took a year to stabilize into a situation where we could reliably
stream realistic audio workloads, didn't make me feel too good when I
started reading about similar issues in 2.5 before it was even
half-done. I tested just about every MM patch from andrea and rik that
came out for 2.3/2.4 - I did not have time to do that with 2.5.

And 2.4.19+ does work really well. The problem is that users are now
booting up 2.6 and finding out that (1) the deep changes in the thread
system have not been fully tested with real time thread applications
and (2) the scheduler, VM and disk subsystems appear to be conspiring
to prevent performance equivalent to 2.4+lowlat. Are we surprised? No,
we knew this would be the case. Are we complaining? Not really. Are we
asking for help? Are we offering to try to help as best we can? Yes,
we certainly are.

Courageous? Yes, because they are willing to start testing a kernel
that has been developed with an open admission by the kernel
development group that our needs are not considered particularly
important or relevant (and there is nothing wrong with that, just to
be clear about it). Linus made it clear 2 years ago that we weren't
going to get what we needed any time soon, and personally, I am
entirely happy with telling people to use 2.4+lowlat instead. There
are several distributions of Linux that build precisely this kernel
for users, and those users are very happy with it.

But NPTL has muddied the situation considerably.
People did test NPTL when it came out. It seemed to work perfectly OK.
So we just assumed that it would always work perfectly OK. It turns
out, however, that it no longer does. And therefore I wrote to try to
find out what we could do to figure it out.

>I appear to have nuked the thread you're replying to in disgust over
>this precise issue.

Disgust? Thanks for sharing.

--p

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK 2004-07-02 3:27 ` Paul Davis @ 2004-07-02 7:37 ` William Lee Irwin III 2004-07-02 10:40 ` Takashi Iwai 2004-07-02 14:42 ` Paul Davis 0 siblings, 2 replies; 24+ messages in thread From: William Lee Irwin III @ 2004-07-02 7:37 UTC (permalink / raw) To: Paul Davis; +Cc: Matt Mackall, linux-kernel On Thu, Jul 01, 2004 at 01:03:56PM -0500, Matt Mackall wrote: >>> I'm afraid these "brave souls" have shown up to the baby shower after >>> the child's been accepted to college. Developers getting around to >>> testing 2.6 after multiple vendors are shipping it should not be >>> characterized as courageous. On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote: > I call BS on this response. > We were told by A(ndrew)M(orton) and several other people that 2.6 > would not be as good as 2.4 for low latency real time audio. It was > made clear that the preemption patches were considered more > appropriate even though they did not do anywhere near as reliable an > improvement as AM's lowlat patches. We found out (and I mean no > discredit to AM whatsoever - he did an amazing job on the 2.4 lowlat > patches) that the author of the premiere lowlat patches for 2.4 would > not be maintaining a similar set for 2.6. We also found during the > development of 2.5 that there were a number of areas of real concern, > (the VM subsystem and the scheduler and the disk subsystems) but that > many notable kernel developers were not particularly interested in our > needs - we were considered odd, edge case studies. Not only are lowlat-alike changes in mainline 2.6, the algorithms where lowlat found explicit preemption points were necessary have been changed in a number of cases to be asymptotically faster. So you gave no feedback. What do you expect us to do? There are enough other bugreports to keep us busy without testing the known universe on behalf of you or anyone else sitting around waiting silently for their needs to magically be addressed. 
On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote: > So we just punted and said "ah, its OK, we still have 2.4 and that > works really, really well". I spent a lot of time working debugging, > testing, measuring and playing with on 2.3 and 2.4. I even tested the > HRT patches with great anticipation (they didn't work very well at > all, and I didn't have time to spend tracking that down then). I'm > terribly sorry, but I don't have time to do full-scale kernel > debugging and also develop applications that have already taken 4+ > years to get to "useful". Frankly, the mess of dealing with the > development process for 2.3/2.4, with a VM subsystem that took a year > to stabilize into a situation where we could reliably stream realistic > audio workloads didn't make me feel too good when I started reading > about similar issues in 2.5 before it was even half-done. I tested > just about every MM patch from andrea and rik that came out for > 2.3/2.4 - I did not have time to do that with 2.5. This level of participation is by no means a requirement. Just show up, say, "I've got a problem, latency sucked $HERE while doing $THIS", and it will be quashed in a manner similar to other performance and functional issues when they're properly reported. At some point in the past, you wrote: >>> However, the ONLY way to get even vaguely reasonable >>> performance in this area is to disable the use of NPTL >>> using LD_ASSUME_KERNEL. With NPTL in use, there are a >>> series of apparently interlocking problems with scheduler >>> parameter inheritance, scheduler performance and decision >>> making. Its more or less impossible to run JACK-enabled audio >>> systems on 2.6 with NPTL. A series of ugly kludges are >>> beginning to emerge within the Linux audio community, and >>> I think its time we cut them off before things get out of hand. The thing that went wrong here is that the report is very non-specific. mingo, jakub, and uli had to go diving into your app's source etc. 
hunting for bugs in your app, which is very nice of them to do, but not really the way things are supposed to work. Narrowing the presumed kernel issue down to a small enough userspace testcase or section of code that you can reasonably post it is pretty much a burden you should have taken on. For one, the description of the nasty kludges or code that worked in 2.4 but not 2.6 should have been up-front. e.g. "I'm trying to get an app to SCHED_FIFO, $FOO isn't working in 2.6 but does in 2.4" and bonus points for "and my workaround to get it set up in 2.6 is $BAR" and so on. On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote: > And 2.4.19+ does work really well. The problem is that users are now > booting up 2.6 and finding out that (1) the deep changes in the thread > system have not been fully tested with real time thread applications > and (2) the scheduler, VM and disk subsystems appear to be conspiring > to prevent performance equivalent to 2.4+lowlat. Are we suprised? No, > we knew this would be the case? Are we complaining? Not really. Are we > asking for help? Are we offering to try to help as best we can? Yes, > we certainly are. The RT threading bits sounded largely like a userspace API change that broke the app's initialization sequence, and that appears to be getting fielded by mingo, jakub, and uli. On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote: > Courageous? Yes, because they are willing to start testing a kernel > that has been developed with an open admission by the kernel > development group that our needs are not considered particularly > important or relevant (and there is nothing wrong with that, just to > be clear about it). Linus made it clear 2 years ago that we weren't > going to get what we needed any time soon, and personally, I am > entirely happy with telling people to use 2.4+lowlat instead. There > are several distributions of Linux that build precisely this kernel > for users, and those users are very happy with it. 
The userbase is so broad no one user group's needs are particularly dominant. Surprise! You're coexisting with everyone else. On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote: > But NPTL has muddied the situation considerably. People did test NPTL > when it came out. It seemed to work perfectly OK. So we just assumed > that it would always work perfectly OK. It turns out, however, that it > no longer does. And therefore I wrote to try to find out what we could > do figure it out. This is too vague to do anything with; write up a coherent bug/problem report for glibc and/or kernel maintainers to do something about. "LD_ASSUME_KERNEL mysteriously makes app run smoother" is really something you should have determined a proximal cause for before broad sweeping statements about 2.6 ignoring the needs of whatever category of apps this is in some misguided attempt to motivate someone to discover the root cause and repair it on your behalf. Otherwise, if LD_ASSUME_KERNEL fixes it for you, why would we care? At some point in the past, I wrote: >> I appear to have nuked the thread you're replying to in disgust over >> this precise issue. On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote: > Disgust? Thanks for sharing. Yes, disgust. You presume you are an isolated case, or potentially "special". This is not so. There are people crawling out of the woodwork all the time complaining about vague "$BIZARRE_ANCIENT_KERNEL did better than -CURRENT" issues. Thus far your postings are indistinguishable from those, and whether you like it or not, they're being classified right alongside those due to their lack of specificity. Everyone's got some kind of substance hidden somewhere. Presentation matters. We're pulled in too many different directions to play guessing games and dive into every userspace app whose author screams "regression!" that comes along. 
In summary:
(1) please try to present adequate information directly
    -- describe your situation directly instead of needing people
    -- to debug your apps for you
(2) please avoid vague generalizations like "2.6 is ignoring RT audio"
    -- they're noninformative and inflammatory
(3) please test major kernel versions promptly after release
    -- this doesn't require particularly much effort, major kernel
    -- versions are infrequently released, and we don't actually
    -- need intense debugging/etc. from you, merely self-contained
    -- examples or descriptions of whatever is going wrong in
    -- userspace. Your description (not example) was not self-contained.

-- wli

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK 2004-07-02 7:37 ` William Lee Irwin III @ 2004-07-02 10:40 ` Takashi Iwai 2004-07-06 0:48 ` Peter Williams 2004-07-02 14:42 ` Paul Davis 1 sibling, 1 reply; 24+ messages in thread From: Takashi Iwai @ 2004-07-02 10:40 UTC (permalink / raw) To: William Lee Irwin III; +Cc: Paul Davis, Matt Mackall, linux-kernel At Fri, 2 Jul 2004 00:37:49 -0700, William Lee Irwin III wrote: > > On Thu, Jul 01, 2004 at 01:03:56PM -0500, Matt Mackall wrote: > >>> I'm afraid these "brave souls" have shown up to the baby shower after > >>> the child's been accepted to college. Developers getting around to > >>> testing 2.6 after multiple vendors are shipping it should not be > >>> characterized as courageous. > > On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote: > > I call BS on this response. > > We were told by A(ndrew)M(orton) and several other people that 2.6 > > would not be as good as 2.4 for low latency real time audio. It was > > made clear that the preemption patches were considered more > > appropriate even though they did not do anywhere near as reliable an > > improvement as AM's lowlat patches. We found out (and I mean no > > discredit to AM whatsoever - he did an amazing job on the 2.4 lowlat > > patches) that the author of the premiere lowlat patches for 2.4 would > > not be maintaining a similar set for 2.6. We also found during the > > development of 2.5 that there were a number of areas of real concern, > > (the VM subsystem and the scheduler and the disk subsystems) but that > > many notable kernel developers were not particularly interested in our > > needs - we were considered odd, edge case studies. > > Not only are lowlat-alike changes in mainline 2.6, the algorithms where > lowlat found explicit preemption points were necessary have been changed > in a number of cases to be asymptotically faster. > > So you gave no feedback. What do you expect us to do? 
> There are enough other bugreports to keep us busy without testing the
> known universe on behalf of you or anyone else sitting around waiting
> silently for their needs to magically be addressed.

Well, the point is that no kernel developer is watching and working on
low-latency fixes regularly for 2.6 kernels, as Andrew did for every
2.4 release. And the users can't easily report what goes wrong.
(If the report were something like '2.6.x worked but 2.6.y not', it
would be easy to figure out, but many users experience this problem
between 2.4 and 2.6...)

Maybe this situation can be improved by enabling the xrun_debug proc
switch on ALSA, which shows the stack trace when a buffer
over/underrun happens. Also, running a latencytest program would be
helpful for spotting the problem.

BTW, the 2.6 kernel works pretty well on my system. Perhaps it's
because I run jackd directly as root.

I've also heard some people complaining after switching to 2.6, but I
believe it's either a driver-specific problem or a bug caused by the
NPTL incompatibility reported on this thread.
AFAIK, there are still some problematic parts, for example, a long
lock in shrink_dcache_parent(), and too-long RCU jobs in a tasklet,
but they are relatively minor.

> In summary:
> (1) please try to present adequate information directly
> -- describe your situation directly instead of needing people
> -- to debug your apps for you

The problem is the incompatibility between NPTL and LinuxThreads.
As Paul pointed out, if calling pthread_setschedparam() has no
influence _after_ creating the thread, it sounds like a bug to me.
This might be a problem of glibc, not of the kernel. We don't even
know that.

Anyway, we'll need a small testcase to reproduce this problem...

--
Takashi Iwai <tiwai@suse.de>
ALSA Developer - www.alsa-project.org

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK 2004-07-02 10:40 ` Takashi Iwai @ 2004-07-06 0:48 ` Peter Williams 0 siblings, 0 replies; 24+ messages in thread From: Peter Williams @ 2004-07-06 0:48 UTC (permalink / raw) To: Takashi Iwai; +Cc: Paul Davis, Matt Mackall, linux-kernel Takashi Iwai wrote: > At Fri, 2 Jul 2004 00:37:49 -0700, > William Lee Irwin III wrote: > >>On Thu, Jul 01, 2004 at 01:03:56PM -0500, Matt Mackall wrote: >> >>>>>I'm afraid these "brave souls" have shown up to the baby shower after >>>>>the child's been accepted to college. Developers getting around to >>>>>testing 2.6 after multiple vendors are shipping it should not be >>>>>characterized as courageous. >> >>On Thu, Jul 01, 2004 at 11:27:28PM -0400, Paul Davis wrote: >> >>>I call BS on this response. >>>We were told by A(ndrew)M(orton) and several other people that 2.6 >>>would not be as good as 2.4 for low latency real time audio. It was >>>made clear that the preemption patches were considered more >>>appropriate even though they did not do anywhere near as reliable an >>>improvement as AM's lowlat patches. We found out (and I mean no >>>discredit to AM whatsoever - he did an amazing job on the 2.4 lowlat >>>patches) that the author of the premiere lowlat patches for 2.4 would >>>not be maintaining a similar set for 2.6. We also found during the >>>development of 2.5 that there were a number of areas of real concern, >>>(the VM subsystem and the scheduler and the disk subsystems) but that >>>many notable kernel developers were not particularly interested in our >>>needs - we were considered odd, edge case studies. >> >>Not only are lowlat-alike changes in mainline 2.6, the algorithms where >>lowlat found explicit preemption points were necessary have been changed >>in a number of cases to be asymptotically faster. >> >>So you gave no feedback. What do you expect us to do? 
There are >>enough other bugreports to keep us busy without testing the known >>universe on behalf of you or anyone else sitting around waiting >>silently for their needs to magically be addressed. > > > Well, the point is that no kernel developer is watching and working on > low-latency fixes regulariy for 2.6 kernels, as Andrew did for every > 2.4 release. And, the users can't report easily what gets wrong. > (If the report were something like '2.6.x worked but 2.6.y not', it > would be easy to figure out, but many users experience this problem > between 2.4 and 2.6...) > > Maybe this situation can be improved by enabling the xrun_debug proc > switch on ALSA, which shows the stack trace when a buffer > over/underrun happens. Also, running a latencytest program would be > helpful for spotting out the problem. > > > BTW, 2.6 kernel works pretty well on my system. Perhaps it's because > I run jackd directly as root. > > I've also heard some people complaining after replacement with 2.6, > too, but I believe it's either driver-specific problem or a bug caused > by the NPTL incompatibility reported on this thread. > AFAIK, there are still some problematic parts, for example, a long > lock in shrink_dcache_parent(), and too-long RCU jobs in a tasklet, > but they are relatively minor. > > > >>In summary: >>(1) please try to present adequate information directly >> -- describe your situation directly instead of needing people >> -- to debug your apps for you > > > The problem is the incompatibility between NPTL and LinuxThreads. > As Paul pointed, if calling pthread_setschedparm() has no influence > _after_ creating the thread, it sounds like a bug to me. This might > be a problem of glibc, not of kernel. We don't know even it. > > Anyway, we'll need a small testcase to reproduce this problem... Version 1.4 of the various SPA schedulers (for 2.6.7) are available for download at <https://sourceforge.net/projects/cpuse/>. 
In this modification I have attempted to minimize the scheduling
overhead costs for SCHED_FIFO tasks. I would appreciate any feedback
on how successful I have been.

Thanks
Peter
--
Peter Williams pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.X, NPTL, SCHED_FIFO and JACK
2004-07-02 7:37 ` William Lee Irwin III
2004-07-02 10:40 ` Takashi Iwai
@ 2004-07-02 14:42 ` Paul Davis
1 sibling, 0 replies; 24+ messages in thread
From: Paul Davis @ 2004-07-02 14:42 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: linux-kernel

>Not only are lowlat-alike changes in mainline 2.6, the algorithms where
>lowlat found explicit preemption points were necessary have been changed
>in a number of cases to be asymptotically faster.

*Some* of the algorithms.

>So you gave no feedback. What do you expect us to do? There are

Actually, the linux audio community gave quite a lot of feedback early
in the life of 2.5, most of it directly to andrew, ingo and robert
(love). The situation wasn't good at all. It wasn't all the scheduler
(although that was pretty bad) and explicit preemption (which was
basically missing in spite of the preemption patch) - the VM system
was hosed for massive disk streaming, for example - and the feedback
we got, while sympathetic, was basically of the form "2.{5,6} isn't
going the lowlat route, please wait and see what we come up with". So
we waited.

>> about similar issues in 2.5 before it was even half-done. I tested
>> just about every MM patch from andrea and rik that came out for
>> 2.3/2.4 - I did not have time to do that with 2.5.
>
>This level of participation is by no means a requirement. Just show

Given my sporadic observations of the kernel mailing list over the
last five years, I'd say that it often is a requirement, especially if
you are dealing with workloads and application behaviour that is
fundamentally different to the usual "linux stuff" and cannot be
reduced to simple test cases. And I've been happy to provide it when
there is some indication that the resulting feedback will make a
difference. Andrea, Ingo and Andrew all provided that sense of purpose
for 2.4.

>The thing that went wrong here is that the report is very non-specific.
We don't have anything very specific to report. Sometimes, the most
helpful bug reports start life as someone asking "this doesn't seem to
work very well under conditions X, Y but it's OK with Z". Someone
turns around and says "oh, duh!" and the problem is fixed. Apparently,
in this situation, that may not be the case. No problem. We'll come
back with more specifics.

>really the way things are supposed to work. Narrowing the presumed
>kernel issue down to a small enough userspace testcase or section of
>code that you can reasonably post it is pretty much a burden you should
>have taken on.

And I/we're willing to do that (and have been doing that) once it's
clear that this is the right path.

>For one, the description of the nasty kludges or code that worked in
>2.4 but not 2.6 should have been up-front. e.g. "I'm trying to get an

There were *no* nasty kludges in JACK for 2.4 unless you refer to a
technique recommended by many *nix programming books and wizards over
the last 20 years to deal with the rather limited security model that
Linux was offering in 2.4 along with its POSIX cousins. And I note in
passing that within a week or two of us discovering the security
module system in 2.6, someone in the audio community immediately wrote
a very nice kernel module to remove the need for jackstart.

It would also be nice if you could at least implicitly acknowledge
that one of the major reasons (mvista being the other) that the
latency performance of linux has improved in the last 4 years is
because us RT audio guys have done such a nasty, fucked up, useless,
pathetic job of requesting and collaborating on efforts to improve it.
The preemption patch came from a different direction, and didn't
accomplish the same thing - hardly anyone on the kernel list seemed to
care that the kernel was filled with 50ms interrupt masks until we
started explaining how it made linux unusable for certain things that
worked OK on windows + macos; this then led to many of us helping ingo
and andrew in their incredible attempts to fix things.

That doesn't let us off the hook of decent bug reporting, but if you
could at least quit the adult-lecturing-recalcitrant-adolescent tone,
there would be more useful exchanges going on.

--p

^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2004-07-06 3:20 UTC | newest]

Thread overview: 24+ messages
2004-06-30 13:41 2.6.X, NPTL, SCHED_FIFO and JACK Paul Davis
2004-06-30 15:04 ` Ingo Molnar
2004-06-30 15:18 ` Ingo Molnar
2004-06-30 15:26 ` Jakub Jelinek
2004-06-30 16:32 ` Paul Davis
2004-06-30 16:57 ` Jakub Jelinek
2004-06-30 17:52 ` Paul Davis
2004-06-30 15:05 ` Ingo Molnar
2004-06-30 16:12 ` Paul Davis
2004-06-30 17:07 ` Ulrich Drepper
2004-06-30 17:50 ` Paul Davis
2004-07-01 18:03 ` Matt Mackall
2004-07-01 18:14 ` William Lee Irwin III
2004-07-01 22:45 ` Andrew Morton
2004-07-02 0:45 ` William Lee Irwin III
2004-07-02 1:38 ` Peter Williams
2004-07-02 2:53 ` William Lee Irwin III
2004-07-02 3:03 ` Con Kolivas
2004-07-02 3:05 ` William Lee Irwin III
2004-07-02 3:27 ` Paul Davis
2004-07-02 7:37 ` William Lee Irwin III
2004-07-02 10:40 ` Takashi Iwai
2004-07-06 0:48 ` Peter Williams
2004-07-02 14:42 ` Paul Davis