* [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
@ 2005-02-04 10:03 Ingo Molnar
2005-02-04 15:19 ` Kevin Hilman
` (6 more replies)
0 siblings, 7 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-02-04 10:03 UTC (permalink / raw)
To: linux-kernel
i have released the -V0.7.38-01 Real-Time Preemption patch, which can be
downloaded from the usual place:
http://redhat.com/~mingo/realtime-preempt/
Changes since -37-03:
- merged to 2.6.11-rc3
- deadlock-tracer fix from Eugeny S. Mints
- converted an oprofile spinlock to raw, which should fix the bug
reported by Peter Zijlstra.
to create a -V0.7.38-01 tree from scratch, the patching order is:
http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.10.tar.bz2
http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.11-rc3.bz2
http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.11-rc3-V0.7.38-01
Ingo
^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar
@ 2005-02-04 15:19 ` Kevin Hilman
2005-02-04 17:30 ` Ingo Molnar
2005-02-04 18:19 ` Tom Rini
` (5 subsequent siblings)
6 siblings, 1 reply; 125+ messages in thread
From: Kevin Hilman @ 2005-02-04 15:19 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

What is the proper way to set up a real counting semaphore under the -RT
kernel?

I've noticed that just using a struct semaphore, normal counting
semaphore usage[*] can trigger the "lock recursion deadlock" in
kernel/rt.c since 'struct semaphore' now uses an rt_mutex.

What I've done for now is to use sema_init_nocheck() to disable the
checking in the case of a counting semaphore, but I remember seeing
discussion in an earlier thread about creating a separate counting
semaphore type. Is this still planned?

Kevin
http://hilman.org/kevin/

[*] For example, an open semaphore being down'ed and thus acquired and
the same thread doing a down() again before another thread has a
chance to up() the semaphore.

Ingo Molnar <mingo@elte.hu> writes:

> i have released the -V0.7.38-01 Real-Time Preemption patch, which can be
> downloaded from the usual place:
>
> http://redhat.com/~mingo/realtime-preempt/
>
> Changes since -37-03:
>
> - merged to 2.6.11-rc3
>
> - deadlock-tracer fix from Eugeny S. Mints
>
> - converted an oprofile spinlock to raw, which should fix the bug
> reported by Peter Zijlstra.
>
> to create a -V0.7.38-01 tree from scratch, the patching order is:
>
> http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.10.tar.bz2
> http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.11-rc3.bz2
> http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.11-rc3-V0.7.38-01

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-04 15:19 ` Kevin Hilman
@ 2005-02-04 17:30 ` Ingo Molnar
0 siblings, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-02-04 17:30 UTC (permalink / raw)
To: Kevin Hilman; +Cc: linux-kernel, Thomas Gleixner

* Kevin Hilman <kevin@hilman.org> wrote:

> What I've done for now is to use sema_init_nocheck() to disable the
> checking in the case of a counting semaphore, but I remember seeing
> discussion in an earlier thread about creating a separate counting
> semaphore type. Is this still planned?

the nocheck variant is the counting semaphore in essence. I removed the
counting semaphore implementation because it caused more problems than
it solved - but it can be reintroduced later.

> [*] For example, an open semaphore being down'ed and thus acquired and
> the same thread doing a down() again before another thread has a
> chance to up() the semaphore.

yeah, these are cases where the code is better off using completions
anyway. Thomas Gleixner had a good bunch of patches to convert such
semaphore use to completions - the most necessary ones are in -RT, and i
hope he'll submit the whole bunch upstream after 2.6.11 is out :-)

Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
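The difference between a lock-like semaphore and a completion, as discussed in the exchange above, can be illustrated outside the kernel. What follows is only a minimal userspace sketch, assuming POSIX threads: `struct completion_sketch` and every `*_sketch` function are hypothetical names that imitate the shape of the kernel's completion API. It is not kernel code and not the -RT implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Userspace stand-in for a kernel completion (hypothetical names). */
struct completion_sketch {
	pthread_mutex_t lock;   /* protects 'done' */
	pthread_cond_t  wait;   /* waiters sleep here */
	unsigned int    done;   /* signalled but not yet consumed events */
};

static void init_completion_sketch(struct completion_sketch *c)
{
	int rc = pthread_mutex_init(&c->lock, NULL);
	assert(rc == 0);
	rc = pthread_cond_init(&c->wait, NULL);
	assert(rc == 0);
	(void)rc;
	c->done = 0;
}

/* Signal one event. No owner is recorded, so any context may call this. */
static void complete_sketch(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->wait);
	pthread_mutex_unlock(&c->lock);
}

/* Block until at least one event is pending, then consume it. */
static void wait_for_completion_sketch(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->done == 0)
		pthread_cond_wait(&c->wait, &c->lock);
	c->done--;
	pthread_mutex_unlock(&c->lock);
}

/* Non-blocking variant: returns 1 if an event was consumed, else 0. */
static int try_wait_for_completion_sketch(struct completion_sketch *c)
{
	int got = 0;

	pthread_mutex_lock(&c->lock);
	if (c->done > 0) {
		c->done--;
		got = 1;
	}
	pthread_mutex_unlock(&c->lock);
	return got;
}
```

In this sketch the waiter never "owns" anything, so there is no holder for a deadlock checker to track; the signaller and the waiter are expected to be different contexts, which is the usage pattern the thread describes for the converted semaphores.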
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar
2005-02-04 15:19 ` Kevin Hilman
@ 2005-02-04 18:19 ` Tom Rini
2005-02-07 9:03 ` Ingo Molnar
2005-02-06 4:19 ` Valdis.Kletnieks
` (4 subsequent siblings)
6 siblings, 1 reply; 125+ messages in thread
From: Tom Rini @ 2005-02-04 18:19 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

On Fri, Feb 04, 2005 at 11:03:47AM +0100, Ingo Molnar wrote:
>
> i have released the -V0.7.38-01 Real-Time Preemption patch, which can be
> downloaded from the usual place:
>
> http://redhat.com/~mingo/realtime-preempt/

I thought I saw you say x64 should be OK now a few releases ago, so:
linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: `_atomic_dec_and_lock' undeclared here (not in a function)
linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: initializer element is not constant
linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: (near initialization for `__ksymtab__atomic_dec_and_lock.value')
linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: __ksymtab__atomic_dec_and_lock causes a section type conflict
make[2]: *** [arch/x86_64/kernel/x8664_ksyms.o] Error 1
make[1]: *** [arch/x86_64/kernel] Error 2
make: *** [_all] Error 2

--
Tom Rini
http://gate.crashing.org/~trini/

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-04 18:19 ` Tom Rini
@ 2005-02-07 9:03 ` Ingo Molnar
2005-02-07 14:35 ` Tom Rini
0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-07 9:03 UTC (permalink / raw)
To: Tom Rini; +Cc: linux-kernel

* Tom Rini <trini@kernel.crashing.org> wrote:

> On Fri, Feb 04, 2005 at 11:03:47AM +0100, Ingo Molnar wrote:
> >
> > i have released the -V0.7.38-01 Real-Time Preemption patch, which can be
> > downloaded from the usual place:
> >
> > http://redhat.com/~mingo/realtime-preempt/
>
> I thought I saw you say x64 should be OK now a few releases ago, so:
> linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: `_atomic_dec_and_lock' undeclared here (not in a function)
> linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: initializer element is not constant
> linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: (near initialization for `__ksymtab__atomic_dec_and_lock.value')
> linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: __ksymtab__atomic_dec_and_lock causes a section type conflict
> make[2]: *** [arch/x86_64/kernel/x8664_ksyms.o] Error 1
> make[1]: *** [arch/x86_64/kernel] Error 2
> make: *** [_all] Error 2

please send me your .config - mine builds/boots/works fine.

Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-07 9:03 ` Ingo Molnar
@ 2005-02-07 14:35 ` Tom Rini
2005-02-08 8:27 ` Ingo Molnar
0 siblings, 1 reply; 125+ messages in thread
From: Tom Rini @ 2005-02-07 14:35 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

On Mon, Feb 07, 2005 at 10:03:56AM +0100, Ingo Molnar wrote:
>
> * Tom Rini <trini@kernel.crashing.org> wrote:
>
> > On Fri, Feb 04, 2005 at 11:03:47AM +0100, Ingo Molnar wrote:
> > >
> > > i have released the -V0.7.38-01 Real-Time Preemption patch, which can be
> > > downloaded from the usual place:
> > >
> > > http://redhat.com/~mingo/realtime-preempt/
> >
> > I thought I saw you say x64 should be OK now a few releases ago, so:
> > linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: `_atomic_dec_and_lock' undeclared here (not in a function)
> > linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: initializer element is not constant
> > linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: (near initialization for `__ksymtab__atomic_dec_and_lock.value')
> > linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: __ksymtab__atomic_dec_and_lock causes a section type conflict
> > make[2]: *** [arch/x86_64/kernel/x8664_ksyms.o] Error 1
> > make[1]: *** [arch/x86_64/kernel] Error 2
> > make: *** [_all] Error 2
>
> please send me your .config - mine builds/boots/works fine.

I don't have it handy anymore, but I just cp'd arch/x86_64/defconfig to
.config, ran oldconfig and turned RT off (PREEMPT_NONE=y) (oops, I did
forget to mention that, didn't I? Sorry).

--
Tom Rini
http://gate.crashing.org/~trini/

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-07 14:35 ` Tom Rini
@ 2005-02-08 8:27 ` Ingo Molnar
0 siblings, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-02-08 8:27 UTC (permalink / raw)
To: Tom Rini; +Cc: linux-kernel

* Tom Rini <trini@kernel.crashing.org> wrote:

> > please send me your .config - mine builds/boots/works fine.
>
> I don't have it handy anymore, but I just cp'd arch/x86_64/defconfig
> to .config, ran oldconfig and turned RT off (PREEMPT_NONE=y) [...]

thanks - managed to reproduce it this way. The patch below fixes the x64
build error on !PREEMPT_RT and the resulting kernel boots fine as well,
plus it fixes an x64 SMP build error as well. I have uploaded the -38-04
release with this fix.

Ingo

--- linux/arch/x86_64/kernel/x8664_ksyms.c
+++ linux/arch/x86_64/kernel/x8664_ksyms.c
@@ -194,7 +194,7 @@ EXPORT_SYMBOL(rwsem_down_write_failed_th
 EXPORT_SYMBOL(empty_zero_page);

 #ifdef CONFIG_HAVE_DEC_LOCK
-EXPORT_SYMBOL(_atomic_dec_and_lock);
+EXPORT_SYMBOL(_atomic_dec_and_raw_spin_lock);
 #endif

 EXPORT_SYMBOL(die_chain);
--- linux/arch/x86_64/lib/dec_and_lock.c
+++ linux/arch/x86_64/lib/dec_and_lock.c
@@ -10,7 +10,7 @@
 #include <linux/spinlock.h>
 #include <asm/atomic.h>

-int _atomic_dec_and_lock(atomic_t *atomic, raw_spinlock_t *lock)
+int _atomic_dec_and_raw_spin_lock(atomic_t *atomic, raw_spinlock_t *lock)
 {
 	int counter;
 	int newcount;
--- linux/arch/x86_64/kernel/smp.c
+++ linux/arch/x86_64/kernel/smp.c
@@ -266,6 +266,16 @@ void smp_send_reschedule(int cpu)
 }

 /*
+ * this function sends a 'reschedule' IPI to all other CPUs.
+ * This is used when RT tasks are starving and other CPUs
+ * might be able to run them:
+ */
+void smp_send_reschedule_allbutself(void)
+{
+	send_IPI_allbutself(RESCHEDULE_VECTOR);
+}
+
+/*
  * Structure and data for smp_call_function(). This is designed to minimise
  * static memory requirements. It also looks cleaner.
  */

^ permalink raw reply [flat|nested] 125+ messages in thread
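For readers unfamiliar with the helper renamed in the patch above: a "dec and lock" routine decrements a reference count and takes a lock only when the count drops to zero. The following is a minimal userspace sketch of that pattern, assuming C11 atomics. `spin_sketch_t` and `dec_and_lock_sketch` are hypothetical names chosen for illustration; this is a simplification and not the x86_64 kernel implementation touched by the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Tiny test-and-set spinlock, standing in for a raw spinlock. */
typedef struct {
	atomic_flag locked;
} spin_sketch_t;

static void spin_sketch_init(spin_sketch_t *l)
{
	atomic_flag_clear(&l->locked);
}

static void spin_sketch_lock(spin_sketch_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		; /* busy-wait until the holder clears the flag */
}

static void spin_sketch_unlock(spin_sketch_t *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/*
 * Decrement *count. Returns true WITH the lock held if the count
 * dropped to zero; otherwise returns false with the lock not held.
 */
static bool dec_and_lock_sketch(atomic_int *count, spin_sketch_t *lock)
{
	int old = atomic_load(count);

	/* Fast path: while the count is above 1, decrement without the lock. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return false;
		/* CAS failed: 'old' now holds the current value, re-check it. */
	}

	/* Slow path: the count may reach zero, so decide under the lock. */
	spin_sketch_lock(lock);
	if (atomic_fetch_sub(count, 1) == 1)
		return true;    /* reached zero: the caller must unlock */
	spin_sketch_unlock(lock);
	return false;
}
```

A caller would typically tear down the object and then call `spin_sketch_unlock()` only when the function returns true. The lock type in the signature is the point of the rename in the patch: the helper there operates on a `raw_spinlock_t`, and the new name says so.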
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar
2005-02-04 15:19 ` Kevin Hilman
2005-02-04 18:19 ` Tom Rini
@ 2005-02-06 4:19 ` Valdis.Kletnieks
2005-02-07 9:21 ` Ingo Molnar
2005-02-08 7:55 ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Valdis.Kletnieks
` (3 subsequent siblings)
6 siblings, 1 reply; 125+ messages in thread
From: Valdis.Kletnieks @ 2005-02-06 4:19 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1197 bytes --]

On Fri, 04 Feb 2005 11:03:47 +0100, Ingo Molnar said:
>
> i have released the -V0.7.38-01 Real-Time Preemption patch, which can be

Building with:

# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT_DESKTOP=y
# CONFIG_PREEMPT_RT is not set

CC kernel/sched.o
kernel/sched.c:314:1: warning: "_finish_arch_switch" redefined
kernel/sched.c:306:1: warning: this is the location of the previous definition

caused by this part of the patch:

@@ -288,12 +295,20 @@ static DEFINE_PER_CPU(struct runqueue, r
 #define task_rq(p) cpu_rq(task_cpu(p))
 #define cpu_curr(cpu) (cpu_rq(cpu)->curr)

+#ifdef CONFIG_PREEMPT_RT
+# ifdef prepare_arch_switch
+# error FIXME
+# endif
+#else
+# define _finish_arch_switch finish_arch_switch
+#endif
+
 /*
  * Default context-switch locking:
  */
 #ifndef prepare_arch_switch
 # define prepare_arch_switch(rq, next) do { } while (0)
-# define finish_arch_switch(rq, next) spin_unlock_irq(&(rq)->lock)
+# define _finish_arch_switch(rq, next) spin_unlock(&(rq)->lock)
 # define task_running(rq, p) ((rq)->curr == (p))
 #endif

What was intended for non-RT builds?

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-06 4:19 ` Valdis.Kletnieks
@ 2005-02-07 9:21 ` Ingo Molnar
2005-02-07 15:08 ` Real-Time Preemption and UML? Esben Nielsen
0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-07 9:21 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: linux-kernel

* Valdis.Kletnieks@vt.edu <Valdis.Kletnieks@vt.edu> wrote:

> Building with:
>
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT_DESKTOP=y
> # CONFIG_PREEMPT_RT is not set
>
> CC kernel/sched.o
> kernel/sched.c:314:1: warning: "_finish_arch_switch" redefined
> kernel/sched.c:306:1: warning: this is the location of the previous definition

ok, i fixed this in the -03 patch.

Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Real-Time Preemption and UML?
2005-02-07 9:21 ` Ingo Molnar
@ 2005-02-07 15:08 ` Esben Nielsen
2005-02-07 18:35 ` Jeff Dike
0 siblings, 1 reply; 125+ messages in thread
From: Esben Nielsen @ 2005-02-07 15:08 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

Hi,
I am trying to compile and run UM-Linux with PREEMPT_REALTIME. I managed
to get it to compile but it won't start - it simply stops somewhere in
start_kernel() :-(

Has anyone else looked at it? It doesn't sound like it makes much sense
to have PREEMPT_REALTIME for UML but I thought it was a good developing
platform for playing around before going to the real hardware, where the
latency measurements of course have to take place. The turn around time
should be much shorter than rebooting a full PC every time and the
possibility of getting debug output in the beginning should also be much
better.

Esben

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: Real-Time Preemption and UML?
2005-02-07 15:08 ` Real-Time Preemption and UML? Esben Nielsen
@ 2005-02-07 18:35 ` Jeff Dike
2005-02-07 23:14 ` Esben Nielsen
0 siblings, 1 reply; 125+ messages in thread
From: Jeff Dike @ 2005-02-07 18:35 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Ingo Molnar, linux-kernel

simlo@phys.au.dk said:
> Hi, I am trying to compile and run UM-Linux with PREEMPT_REALTIME. I
> managed to get it to compile but it won't start - it simply stops
> somewhere in start_kernel() :-(

I've never played with preemption on UML. No doubt it needs some work...

Jeff

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: Real-Time Preemption and UML?
2005-02-07 18:35 ` Jeff Dike
@ 2005-02-07 23:14 ` Esben Nielsen
2005-02-08 8:39 ` Ingo Molnar
0 siblings, 1 reply; 125+ messages in thread
From: Esben Nielsen @ 2005-02-07 23:14 UTC (permalink / raw)
To: Jeff Dike; +Cc: Ingo Molnar, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1276 bytes --]

Well, I keep trying a little bit more. In the meantime you can get
some of the stuff I needed to change to at least get it to compile:

One of the problems was use of direct architecture specific semaphores
(which doesn't work under PREEMPT_REALTIME) and in places where a quick
(maybe too quick) look at the code told me that completions ought to be
used. Therefore I changed two semaphores to completions which compiled
fine. I have tried the change on 2.6.11-rc2, and it seemed to work, but I
have not tested it heavily.

The patch is in an attachment - I hope the mail-list will allow that. It is
simply too troublesome otherwise when I am using Pine as mail client.

Esben

On Mon, 7 Feb 2005, Jeff Dike wrote:

> simlo@phys.au.dk said:
> > Hi, I am trying to compile and run UM-Linux with PREEMPT_REALTIME. I
> > managed to get it to compile but it won't start - it simply stops
> > somewhere in start_kernel() :-(
>
> I've never played with preemption on UML. No doubt it needs some work...
>
> Jeff
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

[-- Attachment #2: Type: TEXT/PLAIN, Size: 2383 bytes --]

--- linux-2.6.11-rc2-um/arch/um/drivers/port_kern.c.orig 2005-01-23 15:53:29.000000000 +0100
+++ linux-2.6.11-rc2-um/arch/um/drivers/port_kern.c 2005-02-06 19:54:52.000000000 +0100
@@ -23,7 +23,7 @@
 struct port_list {
 	struct list_head list;
 	int has_connection;
-	struct semaphore sem;
+	struct completion done;
 	int port;
 	int fd;
 	spinlock_t lock;
@@ -66,7 +66,7 @@
 	conn->fd = fd;
 	list_add(&conn->list, &conn->port->connections);

-	up(&conn->port->sem);
+	complete(&conn->port->done);
 	return(IRQ_HANDLED);
 }

@@ -183,13 +183,14 @@
 	*port = ((struct port_list)
 		{ .list = LIST_HEAD_INIT(port->list),
 		  .has_connection = 0,
-		  .sem = __SEMAPHORE_INITIALIZER(port->sem,
-						 0),
 		  .lock = SPIN_LOCK_UNLOCKED,
 		  .port = port_num,
 		  .fd = fd,
 		  .pending = LIST_HEAD_INIT(port->pending),
 		  .connections = LIST_HEAD_INIT(port->connections) });
+
+	init_completion(&port->done),
+
 	list_add(&port->list, &ports);

 found:
@@ -221,7 +222,7 @@
 	int fd;

 	while(1){
-		if(down_interruptible(&port->sem))
+		if(wait_for_completion_interruptible(&port->done))
 			return(-ERESTARTSYS);

 		spin_lock(&port->lock);
--- linux-2.6.11-rc2-um/arch/um/drivers/xterm_kern.c.orig 2005-01-23 15:53:29.000000000 +0100
+++ linux-2.6.11-rc2-um/arch/um/drivers/xterm_kern.c 2005-02-06 19:54:58.000000000 +0100
@@ -16,7 +16,7 @@
 #include "xterm.h"

 struct xterm_wait {
-	struct semaphore sem;
+	struct completion ready;
 	int fd;
 	int pid;
 	int new_fd;
@@ -32,7 +32,7 @@
 		return(IRQ_NONE);

 	xterm->new_fd = fd;
-	up(&xterm->sem);
+	complete(&xterm->ready);
 	return(IRQ_HANDLED);
 }

@@ -49,10 +49,10 @@

 	/* This is a locked semaphore... */
 	*data = ((struct xterm_wait)
-		{ .sem = __SEMAPHORE_INITIALIZER(data->sem, 0),
-		  .fd = socket,
+		{ .fd = socket,
 		  .pid = -1,
 		  .new_fd = -1 });
+	init_completion(&data->ready);

 	err = um_request_irq(XTERM_IRQ, socket, IRQ_READ, xterm_interrupt,
 			     SA_INTERRUPT | SA_SHIRQ | SA_SAMPLE_RANDOM,
@@ -68,7 +68,7 @@
 	 *
 	 * XXX Note, if the xterm doesn't work for some reason (eg. DISPLAY
 	 * isn't set) this will hang... */
-	down(&data->sem);
+	wait_for_completion(&data->ready);

 	free_irq_by_irq_and_dev(XTERM_IRQ, data);
 	free_irq(XTERM_IRQ, data);

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: Real-Time Preemption and UML?
2005-02-07 23:14 ` Esben Nielsen
@ 2005-02-08 8:39 ` Ingo Molnar
2005-02-08 18:55 ` Jeff Dike
0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-08 8:39 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Jeff Dike, linux-kernel

* Esben Nielsen <simlo@phys.au.dk> wrote:

> Well, I keep trying a little bit more. In the mean while you can get
> some of the stuff I needed to change to at least get it to compile:
>
> One of the problems was use of direct architecture specific semaphores
> (which doesn't work under PREEMPT_REALTIME) and in places where a
> quick (maybe too quick) look at the code told me that completions
> ought to be used. Therefore I changed two semaphores to completions
> which compiled fine. I have tried the change on 2.6.11-rc2, and it
> seemed to work, but I have not tested it heavily.

Jeff, any objections against adding this change to UML at some point?
It's at most a cleanup for now (PREEMPT_RT not being an upstream
feature), but it makes life easier if 'more exotic' semaphore details
are not being relied on (even if that reliance is 100% correct
currently).

Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: Real-Time Preemption and UML?
2005-02-08 8:39 ` Ingo Molnar
@ 2005-02-08 18:55 ` Jeff Dike
2005-02-08 21:20 ` Esben Nielsen
0 siblings, 1 reply; 125+ messages in thread
From: Jeff Dike @ 2005-02-08 18:55 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Esben Nielsen, linux-kernel

mingo@elte.hu said:
> Jeff, any objections against adding this change to UML at some point?

No, not at all. I just need to understand what CONFIG_PREEMPT requires of
UML.

From a quick read of Documentation/preempt-locking.txt, this looks like it's
implementing Rule #3 (unlock by the same task that locked), which looks fine.

Jeff

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: Real-Time Preemption and UML?
2005-02-08 18:55 ` Jeff Dike
@ 2005-02-08 21:20 ` Esben Nielsen
2005-02-08 21:44 ` Ingo Molnar
0 siblings, 1 reply; 125+ messages in thread
From: Esben Nielsen @ 2005-02-08 21:20 UTC (permalink / raw)
To: Jeff Dike; +Cc: Ingo Molnar, linux-kernel

On Tue, 8 Feb 2005, Jeff Dike wrote:

> mingo@elte.hu said:
> > Jeff, any objections against adding this change to UML at some point?
>
> No, not at all. I just need to understand what CONFIG_PREEMPT requires of
> UML.

Ingo can probably tell you in much more detail. My problem when I tried to
compile with CONFIG_PREEMPT_RT (not CONFIG_PREEMPT!) was that
__SEMAPHORE_INITIALIZER didn't exist since the architecture specific
semaphore.h is not included in that configuration. The reason again is
that locking (not completions) is changed a lot under CONFIG_PREEMPT_RT to
introduce mutexes instead of raw spinlocks and priority inheritance to
make these lockings behave deterministically.

>
> From a quick read of Documentation/preempt-locking.txt, this looks like it's
> implementing Rule #3 (unlock by the same task that locked), which looks fine.
>

Now I don't really know who I am responding to. But both up()s now changed
to complete()s are in something looking very much like an interrupt
handler. But again, as I said, I didn't analyze the code in detail, I just
made it compile and checked that it worked in bare 2.6.11-rc2 UML - which
I am not too sure how to set up and use to begin with!

> Jeff
>

Esben

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: Real-Time Preemption and UML?
2005-02-08 21:20 ` Esben Nielsen
@ 2005-02-08 21:44 ` Ingo Molnar
2005-02-08 23:02 ` Esben Nielsen
0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-08 21:44 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Jeff Dike, linux-kernel

* Esben Nielsen <simlo@phys.au.dk> wrote:

> Now I don't really know who I am responding to. But both up()s now
> changed to complete()s are in something looking very much like an
> interrupt handler. But again, as I said, I didn't analyze the code in
> detail, I just made it compile and checked that it worked in bare
> 2.6.11-rc2 UML - which I am not too sure how to set up and use to
> begin with!

btw., UML is really easy to begin with: after you've compiled you get a
'linux' binary in the toplevel directory - just execute it via './linux'
and you'll see a Linux kernel booting - that's all you need!

Add a filesystem image via a root= parameter to that command and the UML
kernel will start booting that filesystem image. (if you are adventurous
you can even boot a real partition, but for the first user this is
strongly discouraged.) There are a number of UML-ready filesystem images
downloadable from the net.

Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: Real-Time Preemption and UML?
2005-02-08 21:44 ` Ingo Molnar
@ 2005-02-08 23:02 ` Esben Nielsen
0 siblings, 0 replies; 125+ messages in thread
From: Esben Nielsen @ 2005-02-08 23:02 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Jeff Dike, linux-kernel

On Tue, 8 Feb 2005, Ingo Molnar wrote:

>
> * Esben Nielsen <simlo@phys.au.dk> wrote:
>
> > Now I don't really know who I am responding to. But both up()s now
> > changed to complete()s are in something looking very much like an
> > interrupt handler. But again, as I said, I didn't analyze the code in
> > detail, I just made it compile and checked that it worked in bare
> > 2.6.11-rc2 UML - which I am not too sure how to set up and use to
> > begin with!
>
> btw., UML is really easy to begin with: after you've compiled you get a
> 'linux' binary in the toplevel directory - just execute it via './linux'
> and you'll see a Linux kernel booting - that's all you need!
>
> Add a filesystem image via a root= parameter to that command and the UML
> kernel will start booting that filesystem image. (if you are adventurous
> you can even boot a real partition, but for the first user this is
> strongly discouraged.) There are a number of UML-ready filesystem images
> downloadable from the net.
>

Thanks, I managed to get that far after googling a bit. I have had some
problems with the filesystem though. Fixed now (I forgot to compile ext3
in *blush*.)

But you might still be interested in this trace (2.6.11-rc2 with or
without my changes):

line_ioctl: tty0: ioctl KDSIGACCEPT called
Debug: sleeping function called from invalid context at include/asm/arch/semaphore.h:107
in_atomic():0, irqs_disabled():1
Call Trace:
a08639e0: [<a003071b>] __might_sleep+0x9b/0xb8
a0863a10: [<a001d364>] uml_console_write+0x20/0x54
a0863a30: [<a00348cc>] __call_console_drivers+0x50/0x58
a0863a60: [<a00349c1>] call_console_drivers+0x7d/0x124
a0863a90: [<a0034f97>] release_console_sem+0xa3/0x25c
a0863aa0: [<a0034fb0>] release_console_sem+0xbc/0x25c
a0863ac0: [<a0034d3b>] vprintk+0x193/0x2d0
a0863ae0: [<a0034ba6>] printk+0x12/0x14
a0863b00: [<a001e996>] line_ioctl+0x8e/0x94
a0863b24: [<a001e908>] line_ioctl+0x0/0x94
a0863b30: [<a012e031>] tty_ioctl+0xfd/0x680
a0863b80: [<a00a253b>] do_ioctl+0x3f/0x64
a0863bb0: [<a00a2b7d>] sys_ioctl+0x13d/0x350
a0863bd0: [<a008971b>] sys_open+0x5b/0x74
a0863be0: [<a008970c>] sys_open+0x4c/0x74
a0863c00: [<a0018e8d>] execute_syscall_tt+0xa1/0xe0
a0863c1c: [<a01a9357>] sigemptyset+0x17/0x30
a0863c70: [<a0014eb2>] record_syscall_start+0x4e/0x58
a0863c90: [<a0018f0b>] syscall_handler_tt+0x3f/0x74
a0863cc0: [<a001a170>] sig_handler_common_tt+0x90/0x108
a0863cd0: [<a001a1d1>] sig_handler_common_tt+0xf1/0x108
a0863d00: [<a0028c13>] sig_handler+0x1f/0x38
a0863d20: [<a01a9058>] __restore+0x0/0x8

It could look like a semaphore which should be replaced by a spinlock
(which will become a mutex in preempt-realtime :-)

Esben

> Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar
` (2 preceding siblings ...)
2005-02-06 4:19 ` Valdis.Kletnieks
@ 2005-02-08 7:55 ` Valdis.Kletnieks
2005-02-08 8:45 ` Ingo Molnar
2005-02-08 21:58 ` William Weston
` (2 subsequent siblings)
6 siblings, 1 reply; 125+ messages in thread
From: Valdis.Kletnieks @ 2005-02-08 7:55 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1052 bytes --]

On Fri, 04 Feb 2005 11:03:47 +0100, Ingo Molnar said:
>
> i have released the -V0.7.38-01 Real-Time Preemption patch, which can be
> downloaded from the usual place:

Hey Ingo.. Sorry to keep breaking stuff on you, but.. ;)

Summary: Looks like CONFIG_NET_PKTGEN=y gives -V0.7.38-03 indigestion.

I retrofitted 0.7.38-03 onto -rc3-mm1, and at boot it wedged up hard
scrolling an error message. Looked like a 'scheduling while atomic' error
coming from net/pktgen.o. Sorry for the incomplete traceback, but it
locked before userspace came up, and I don't have hardware handy for a
serial console..

I found a CONFIG_NET_PKTGEN=Y in the config, rebuilt with =n, and the
resulting kernel boots fine (am using it as I type). Vanilla -rc3-mm1
also boots fine with the PKTGEN=y setting (as did 2.6.10-mm1-V0.7.34-01,
the last -mm I built with a -RT patch). I haven't tried a vanilla
-rc3-V0.7.38-03, but I don't see anyplace -mm1 hits pktgen.c

If the above isn't enough to track down the issue, feel free to let me
know what you'd like me to try next.

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-08 7:55 ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Valdis.Kletnieks
@ 2005-02-08 8:45 ` Ingo Molnar
2005-02-08 10:26 ` Valdis.Kletnieks
0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-08 8:45 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: linux-kernel

* Valdis.Kletnieks@vt.edu <Valdis.Kletnieks@vt.edu> wrote:

> I found a CONFIG_NET_PKTGEN=Y in the config, rebuilt with =n, and the
> resulting kernel boots fine (am using it as I type). Vanilla -rc3-mm1
> also boots fine with the PKTGEN=y setting (as did
> 2.6.10-mm1-V0.7.34-01, the last -mm I built with a -RT patch). I
> haven't tried a vanilla -rc3-V0.7.38-03, but I don't see anyplace -mm1
> hits pktgen.c
>
> If the above isn't enough to track down the issue, feel free to let me
> know what you'd like me to try next.

i tried to enable NET_PKTGEN in my vanilla-based -RT tree and it
boots/works fine. Could you try a vanilla-based -RT tree too, with
NET_PKTGEN enabled, and if it breaks send me your .config - if it
doesn't break then could you send me your -mm1 .config?

Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-08 8:45 ` Ingo Molnar
@ 2005-02-08 10:26 ` Valdis.Kletnieks
0 siblings, 0 replies; 125+ messages in thread
From: Valdis.Kletnieks @ 2005-02-08 10:26 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

[-- Attachment #1.1: Type: text/plain, Size: 707 bytes --]

On Tue, 08 Feb 2005 09:45:29 +0100, Ingo Molnar said:

> i tried to enable NET_PKTGEN in my vanilla-based -RT tree and it
> boots/works fine. Could you try a vanilla-based -RT tree too, with
> NET_PKTGEN enabled

Plain -rc3-V0.7.38-03 loops at boot as well, so that rules out any -mm1
issues or a botched merge on my part. .config attached.

Gut instinct is "yet another thing I broke by compiling with
PREEMPT_DESKTOP rather than PREEMPT_RT"...

(userspace is Fedora Core -devel tree as of today, gcc-3.4.3-17, just in
case this is some squirrelly toolchain issue...)

(Feel free to back-burner this issue if somebody has a more severe
problem - I'm not in any actual need of NET_PKTGEN at the moment).
[-- Attachment #1.2: .config --]
[-- Type: text/plain , Size: 34709 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.11-rc3-RT-V0.7.38-03
# Tue Feb 8 04:43:25 2005
#
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_SYSCTL=y
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_HOTPLUG=y
CONFIG_KOBJECT_UEVENT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_EMBEDDED=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
# CONFIG_TINY_SHMEM is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_OBSOLETE_MODPARM=y
# CONFIG_MODVERSIONS is not set
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
CONFIG_MPENTIUM4=y
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_HPET_TIMER=y
# CONFIG_SMP is not set
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT_DESKTOP=y
# CONFIG_PREEMPT_RT is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_SOFTIRQS=y
CONFIG_PREEMPT_HARDIRQS=y
# CONFIG_SPINLOCK_BKL is not set
CONFIG_PREEMPT_BKL=y
CONFIG_ASM_SEMAPHORES=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
CONFIG_X86_MCE_P4THERMAL=y
# CONFIG_TOSHIBA is not set
CONFIG_I8K=m
CONFIG_MICROCODE=m
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m

#
# Firmware Drivers
#
# CONFIG_EDD is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
# CONFIG_EFI is not set
CONFIG_HAVE_DEC_LOCK=y
CONFIG_REGPARM=y

#
# Power management options (ACPI, APM)
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
# CONFIG_SOFTWARE_SUSPEND is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
CONFIG_ACPI_BOOT=y
CONFIG_ACPI_INTERPRETER=y
# CONFIG_ACPI_SLEEP is not set
CONFIG_ACPI_AC=m
CONFIG_ACPI_BATTERY=m
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_VIDEO=y
CONFIG_ACPI_FAN=m
CONFIG_ACPI_PROCESSOR=m
CONFIG_ACPI_THERMAL=m
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
CONFIG_ACPI_DEBUG=y
CONFIG_ACPI_BUS=y
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_PCI=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
# CONFIG_ACPI_CONTAINER is not set

#
# APM (Advanced Power Management) BIOS Support
#
# CONFIG_APM is not set

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
# CONFIG_CPU_FREQ_DEBUG is not set
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_TABLE=y

#
# CPUFreq processor drivers
#
# CONFIG_X86_ACPI_CPUFREQ is not set
# CONFIG_X86_POWERNOW_K6 is not set
# CONFIG_X86_POWERNOW_K7 is not set
# CONFIG_X86_POWERNOW_K8 is not set
# CONFIG_X86_GX_SUSPMOD is not set
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
CONFIG_X86_SPEEDSTEP_ICH=y
# CONFIG_X86_SPEEDSTEP_SMI is not set
CONFIG_X86_P4_CLOCKMOD=y
# CONFIG_X86_CPUFREQ_NFORCE2 is not set
# CONFIG_X86_LONGRUN is not set
# CONFIG_X86_LONGHAUL is not set

#
# shared options
#
CONFIG_X86_SPEEDSTEP_LIB=y
# CONFIG_X86_SPEEDSTEP_RELAXED_CAP_CHECK is not set

#
# Bus options (PCI, PCMCIA, EISA, MCA, ISA)
#
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GOMMCONFIG is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
# CONFIG_PCIEPORTBUS is not set
# CONFIG_PCI_MSI is not set
# CONFIG_PCI_LEGACY_PROC is not set
CONFIG_PCI_NAMES=y
CONFIG_ISA=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_SCx200 is not set

#
# PCCARD (PCMCIA/CardBus) support
#
CONFIG_PCCARD=m
# CONFIG_PCMCIA_DEBUG is not set
CONFIG_PCMCIA=m
CONFIG_CARDBUS=y

#
# PC-card bridges
#
CONFIG_YENTA=m
# CONFIG_PD6729 is not set
# CONFIG_I82092 is not set
# CONFIG_I82365 is not set
# CONFIG_TCIC is not set
CONFIG_PCMCIA_PROBE=y
CONFIG_PCCARD_NONSTATIC=m

#
# PCI Hotplug Support
#
# CONFIG_HOTPLUG_PCI is not set

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
# CONFIG_BINFMT_AOUT is not set
CONFIG_BINFMT_MISC=y

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=m # CONFIG_DEBUG_DRIVER is not set # # Memory Technology Devices (MTD) # # CONFIG_MTD is not set # # Parallel port support # # CONFIG_PARPORT is not set # # Plug and Play support # CONFIG_PNP=y CONFIG_PNP_DEBUG=y # # Protocols # # CONFIG_ISAPNP is not set # CONFIG_PNPBIOS is not set CONFIG_PNPACPI=y # # Block devices # CONFIG_BLK_DEV_FD=m # CONFIG_BLK_DEV_XD is not set # CONFIG_BLK_CPQ_DA is not set # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set # CONFIG_BLK_DEV_COW_COMMON is not set CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_CRYPTOLOOP=y # CONFIG_BLK_DEV_NBD is not set # CONFIG_BLK_DEV_SX8 is not set # CONFIG_BLK_DEV_UB is not set CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_COUNT=16 CONFIG_BLK_DEV_RAM_SIZE=10240 CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" # CONFIG_LBD is not set CONFIG_CDROM_PKTCDVD=m CONFIG_CDROM_PKTCDVD_BUFFERS=8 # CONFIG_CDROM_PKTCDVD_WCACHE is not set # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_ATA_OVER_ETH is not set # # ATA/ATAPI/MFM/RLL support # CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y # # Please see Documentation/ide.txt for help/info on IDE drives # # CONFIG_BLK_DEV_IDE_SATA is not set # CONFIG_BLK_DEV_HD_IDE is not set CONFIG_BLK_DEV_IDEDISK=y CONFIG_IDEDISK_MULTI_MODE=y # CONFIG_BLK_DEV_IDECS is not set CONFIG_BLK_DEV_IDECD=y # CONFIG_BLK_DEV_IDETAPE is not set # CONFIG_BLK_DEV_IDEFLOPPY is not set CONFIG_IDE_TASK_IOCTL=y # # IDE chipset support/bugfixes # # CONFIG_IDE_GENERIC is not set # CONFIG_BLK_DEV_CMD640 is not set # CONFIG_BLK_DEV_IDEPNP is not set CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_SHARE_IRQ=y # CONFIG_BLK_DEV_OFFBOARD is not set # CONFIG_BLK_DEV_GENERIC is not set # CONFIG_BLK_DEV_OPTI621 is not set # CONFIG_BLK_DEV_RZ1000 is not set CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_IDEDMA_FORCED is not set CONFIG_IDEDMA_PCI_AUTO=y # CONFIG_IDEDMA_ONLYDISK is not set # CONFIG_BLK_DEV_AEC62XX 
is not set # CONFIG_BLK_DEV_ALI15X3 is not set # CONFIG_BLK_DEV_AMD74XX is not set # CONFIG_BLK_DEV_ATIIXP is not set # CONFIG_BLK_DEV_CMD64X is not set # CONFIG_BLK_DEV_TRIFLEX is not set # CONFIG_BLK_DEV_CY82C693 is not set # CONFIG_BLK_DEV_CS5520 is not set # CONFIG_BLK_DEV_CS5530 is not set # CONFIG_BLK_DEV_HPT34X is not set # CONFIG_BLK_DEV_HPT366 is not set # CONFIG_BLK_DEV_SC1200 is not set CONFIG_BLK_DEV_PIIX=y # CONFIG_BLK_DEV_NS87415 is not set # CONFIG_BLK_DEV_PDC202XX_OLD is not set # CONFIG_BLK_DEV_PDC202XX_NEW is not set # CONFIG_BLK_DEV_SVWKS is not set # CONFIG_BLK_DEV_SIIMAGE is not set # CONFIG_BLK_DEV_SIS5513 is not set # CONFIG_BLK_DEV_SLC90E66 is not set # CONFIG_BLK_DEV_TRM290 is not set # CONFIG_BLK_DEV_VIA82CXXX is not set # CONFIG_IDE_ARM is not set # CONFIG_IDE_CHIPSETS is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set CONFIG_IDEDMA_AUTO=y # CONFIG_BLK_DEV_HD is not set # # SCSI device support # # CONFIG_SCSI is not set # # Old CD-ROM drivers (not SCSI, not IDE) # # CONFIG_CD_NO_IDESCSI is not set # # Multi-device support (RAID and LVM) # CONFIG_MD=y # CONFIG_BLK_DEV_MD is not set CONFIG_BLK_DEV_DM=y CONFIG_DM_CRYPT=y # CONFIG_DM_SNAPSHOT is not set # CONFIG_DM_MIRROR is not set # CONFIG_DM_ZERO is not set # # Fusion MPT device support # # # IEEE 1394 (FireWire) support # CONFIG_IEEE1394=m # # Subsystem Options # # CONFIG_IEEE1394_VERBOSEDEBUG is not set # CONFIG_IEEE1394_OUI_DB is not set # CONFIG_IEEE1394_EXTRA_CONFIG_ROMS is not set # # Device Drivers # # CONFIG_IEEE1394_PCILYNX is not set CONFIG_IEEE1394_OHCI1394=m # # Protocol Drivers # # CONFIG_IEEE1394_VIDEO1394 is not set # CONFIG_IEEE1394_ETH1394 is not set # CONFIG_IEEE1394_DV1394 is not set # CONFIG_IEEE1394_RAWIO is not set # CONFIG_IEEE1394_CMP is not set # # I2O device support # # CONFIG_I2O is not set # # Networking support # CONFIG_NET=y # # Networking options # CONFIG_PACKET=y # CONFIG_PACKET_MMAP is not set CONFIG_NETLINK_DEV=y CONFIG_UNIX=y 
CONFIG_NET_KEY=y CONFIG_INET=y CONFIG_IP_MULTICAST=y # CONFIG_IP_ADVANCED_ROUTER is not set # CONFIG_IP_PNP is not set # CONFIG_NET_IPIP is not set # CONFIG_NET_IPGRE is not set # CONFIG_IP_MROUTE is not set # CONFIG_ARPD is not set CONFIG_SYN_COOKIES=y CONFIG_INET_AH=y CONFIG_INET_ESP=y CONFIG_INET_IPCOMP=y CONFIG_INET_TUNNEL=y CONFIG_IP_TCPDIAG=y CONFIG_IP_TCPDIAG_IPV6=y # # IP: Virtual Server Configuration # # CONFIG_IP_VS is not set CONFIG_IPV6=y CONFIG_IPV6_PRIVACY=y CONFIG_INET6_AH=y CONFIG_INET6_ESP=y CONFIG_INET6_IPCOMP=y CONFIG_INET6_TUNNEL=y # CONFIG_IPV6_TUNNEL is not set CONFIG_NETFILTER=y # CONFIG_NETFILTER_DEBUG is not set # # IP: Netfilter Configuration # CONFIG_IP_NF_CONNTRACK=m CONFIG_IP_NF_CT_ACCT=y CONFIG_IP_NF_CONNTRACK_MARK=y CONFIG_IP_NF_CT_PROTO_SCTP=m CONFIG_IP_NF_FTP=m CONFIG_IP_NF_IRC=m CONFIG_IP_NF_TFTP=m # CONFIG_IP_NF_AMANDA is not set # CONFIG_IP_NF_QUEUE is not set CONFIG_IP_NF_IPTABLES=m CONFIG_IP_NF_MATCH_LIMIT=m CONFIG_IP_NF_MATCH_IPRANGE=m CONFIG_IP_NF_MATCH_MAC=m CONFIG_IP_NF_MATCH_PKTTYPE=m CONFIG_IP_NF_MATCH_MARK=m CONFIG_IP_NF_MATCH_MULTIPORT=m CONFIG_IP_NF_MATCH_TOS=m CONFIG_IP_NF_MATCH_RECENT=m CONFIG_IP_NF_MATCH_ECN=m CONFIG_IP_NF_MATCH_DSCP=m CONFIG_IP_NF_MATCH_AH_ESP=m CONFIG_IP_NF_MATCH_LENGTH=m CONFIG_IP_NF_MATCH_TTL=m CONFIG_IP_NF_MATCH_TCPMSS=m CONFIG_IP_NF_MATCH_HELPER=m CONFIG_IP_NF_MATCH_STATE=m CONFIG_IP_NF_MATCH_CONNTRACK=m CONFIG_IP_NF_MATCH_OWNER=m CONFIG_IP_NF_MATCH_ADDRTYPE=m CONFIG_IP_NF_MATCH_REALM=m CONFIG_IP_NF_MATCH_SCTP=m CONFIG_IP_NF_MATCH_COMMENT=m CONFIG_IP_NF_MATCH_CONNMARK=m CONFIG_IP_NF_MATCH_HASHLIMIT=m CONFIG_IP_NF_FILTER=m CONFIG_IP_NF_TARGET_REJECT=m CONFIG_IP_NF_TARGET_LOG=m CONFIG_IP_NF_TARGET_ULOG=m CONFIG_IP_NF_TARGET_TCPMSS=m CONFIG_IP_NF_NAT=m CONFIG_IP_NF_NAT_NEEDED=y CONFIG_IP_NF_TARGET_MASQUERADE=m CONFIG_IP_NF_TARGET_REDIRECT=m CONFIG_IP_NF_TARGET_NETMAP=m CONFIG_IP_NF_TARGET_SAME=m # CONFIG_IP_NF_NAT_SNMP_BASIC is not set CONFIG_IP_NF_NAT_IRC=m CONFIG_IP_NF_NAT_FTP=m 
CONFIG_IP_NF_NAT_TFTP=m CONFIG_IP_NF_MANGLE=m CONFIG_IP_NF_TARGET_TOS=m CONFIG_IP_NF_TARGET_ECN=m CONFIG_IP_NF_TARGET_DSCP=m CONFIG_IP_NF_TARGET_MARK=m CONFIG_IP_NF_TARGET_CLASSIFY=m CONFIG_IP_NF_TARGET_CONNMARK=m CONFIG_IP_NF_TARGET_CLUSTERIP=m CONFIG_IP_NF_RAW=m CONFIG_IP_NF_TARGET_NOTRACK=m CONFIG_IP_NF_ARPTABLES=m CONFIG_IP_NF_ARPFILTER=m CONFIG_IP_NF_ARP_MANGLE=m # # IPv6: Netfilter Configuration # # CONFIG_IP6_NF_QUEUE is not set CONFIG_IP6_NF_IPTABLES=m CONFIG_IP6_NF_MATCH_LIMIT=m CONFIG_IP6_NF_MATCH_MAC=m CONFIG_IP6_NF_MATCH_RT=m CONFIG_IP6_NF_MATCH_OPTS=m CONFIG_IP6_NF_MATCH_FRAG=m CONFIG_IP6_NF_MATCH_HL=m CONFIG_IP6_NF_MATCH_MULTIPORT=m CONFIG_IP6_NF_MATCH_OWNER=m CONFIG_IP6_NF_MATCH_MARK=m CONFIG_IP6_NF_MATCH_IPV6HEADER=m CONFIG_IP6_NF_MATCH_AHESP=m CONFIG_IP6_NF_MATCH_LENGTH=m CONFIG_IP6_NF_MATCH_EUI64=m CONFIG_IP6_NF_FILTER=m CONFIG_IP6_NF_TARGET_LOG=m CONFIG_IP6_NF_MANGLE=m CONFIG_IP6_NF_TARGET_MARK=m CONFIG_IP6_NF_RAW=m CONFIG_XFRM=y CONFIG_XFRM_USER=y # # SCTP Configuration (EXPERIMENTAL) # # CONFIG_IP_SCTP is not set # CONFIG_ATM is not set # CONFIG_BRIDGE is not set # CONFIG_VLAN_8021Q is not set # CONFIG_DECNET is not set # CONFIG_LLC2 is not set # CONFIG_IPX is not set # CONFIG_ATALK is not set # CONFIG_X25 is not set # CONFIG_LAPB is not set # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set # # QoS and/or fair queueing # # CONFIG_NET_SCHED is not set CONFIG_NET_CLS_ROUTE=y # # Network testing # CONFIG_NET_PKTGEN=y # CONFIG_NETPOLL is not set # CONFIG_NET_POLL_CONTROLLER is not set # CONFIG_HAMRADIO is not set # CONFIG_IRDA is not set # CONFIG_BT is not set CONFIG_NETDEVICES=y CONFIG_DUMMY=y # CONFIG_BONDING is not set # CONFIG_EQUALIZER is not set # CONFIG_TUN is not set # CONFIG_ETHERTAP is not set # CONFIG_NET_SB1000 is not set # # ARCnet devices # # CONFIG_ARCNET is not set # # Ethernet (10 or 100Mbit) # CONFIG_NET_ETHERNET=y CONFIG_MII=y # CONFIG_HAPPYMEAL is not set # CONFIG_SUNGEM is not set 
CONFIG_NET_VENDOR_3COM=y # CONFIG_EL1 is not set # CONFIG_EL2 is not set # CONFIG_ELPLUS is not set # CONFIG_EL16 is not set # CONFIG_EL3 is not set # CONFIG_3C515 is not set CONFIG_VORTEX=y # CONFIG_TYPHOON is not set # CONFIG_LANCE is not set # CONFIG_NET_VENDOR_SMC is not set # CONFIG_NET_VENDOR_RACAL is not set # # Tulip family network device support # CONFIG_NET_TULIP=y # CONFIG_DE2104X is not set # CONFIG_TULIP is not set # CONFIG_DE4X5 is not set # CONFIG_WINBOND_840 is not set # CONFIG_DM9102 is not set CONFIG_PCMCIA_XIRCOM=y # CONFIG_PCMCIA_XIRTULIP is not set # CONFIG_AT1700 is not set # CONFIG_DEPCA is not set # CONFIG_HP100 is not set # CONFIG_NET_ISA is not set # CONFIG_NET_PCI is not set # CONFIG_NET_POCKET is not set # # Ethernet (1000 Mbit) # # CONFIG_ACENIC is not set # CONFIG_DL2K is not set # CONFIG_E1000 is not set # CONFIG_NS83820 is not set # CONFIG_HAMACHI is not set # CONFIG_YELLOWFIN is not set # CONFIG_R8169 is not set # CONFIG_SK98LIN is not set # CONFIG_TIGON3 is not set # # Ethernet (10000 Mbit) # # CONFIG_IXGB is not set # CONFIG_S2IO is not set # # Token Ring devices # # CONFIG_TR is not set # # Wireless LAN (non-hamradio) # CONFIG_NET_RADIO=y # # Obsolete Wireless cards support (pre-802.11) # # CONFIG_STRIP is not set # CONFIG_ARLAN is not set # CONFIG_WAVELAN is not set # CONFIG_PCMCIA_WAVELAN is not set # CONFIG_PCMCIA_NETWAVE is not set # # Wireless 802.11 Frequency Hopping cards support # # CONFIG_PCMCIA_RAYCS is not set # # Wireless 802.11b ISA/PCI cards support # # CONFIG_AIRO is not set CONFIG_HERMES=y # CONFIG_PLX_HERMES is not set # CONFIG_TMD_HERMES is not set CONFIG_PCI_HERMES=y # CONFIG_ATMEL is not set # # Wireless 802.11b Pcmcia/Cardbus cards support # CONFIG_PCMCIA_HERMES=m # CONFIG_AIRO_CS is not set # CONFIG_PCMCIA_WL3501 is not set # # Prism GT/Duette 802.11(a/b/g) PCI/Cardbus support # # CONFIG_PRISM54 is not set CONFIG_NET_WIRELESS=y # # PCMCIA network device support # CONFIG_NET_PCMCIA=y # CONFIG_PCMCIA_3C589 is 
not set # CONFIG_PCMCIA_3C574 is not set # CONFIG_PCMCIA_FMVJ18X is not set # CONFIG_PCMCIA_PCNET is not set # CONFIG_PCMCIA_NMCLAN is not set # CONFIG_PCMCIA_SMC91C92 is not set CONFIG_PCMCIA_XIRC2PS=m # CONFIG_PCMCIA_AXNET is not set # # Wan interfaces # # CONFIG_WAN is not set # CONFIG_FDDI is not set # CONFIG_HIPPI is not set CONFIG_PPP=m # CONFIG_PPP_MULTILINK is not set CONFIG_PPP_FILTER=y CONFIG_PPP_ASYNC=m # CONFIG_PPP_SYNC_TTY is not set CONFIG_PPP_DEFLATE=m CONFIG_PPP_BSDCOMP=m # CONFIG_PPPOE is not set # CONFIG_SLIP is not set # CONFIG_SHAPER is not set # CONFIG_NETCONSOLE is not set # # ISDN subsystem # # CONFIG_ISDN is not set # # Telephony Support # # CONFIG_PHONE is not set # # Input device support # CONFIG_INPUT=y # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y # CONFIG_INPUT_MOUSEDEV_PSAUX is not set CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # CONFIG_INPUT_JOYDEV is not set # CONFIG_INPUT_TSDEV is not set # CONFIG_INPUT_EVDEV is not set # CONFIG_INPUT_EVBUG is not set # # Input I/O drivers # # CONFIG_GAMEPORT is not set CONFIG_SOUND_GAMEPORT=y CONFIG_SERIO=y CONFIG_SERIO_I8042=y # CONFIG_SERIO_SERPORT is not set # CONFIG_SERIO_CT82C710 is not set # CONFIG_SERIO_PCIPS2 is not set CONFIG_SERIO_LIBPS2=y # CONFIG_SERIO_RAW is not set # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y # CONFIG_KEYBOARD_SUNKBD is not set # CONFIG_KEYBOARD_LKKBD is not set # CONFIG_KEYBOARD_XTKBD is not set # CONFIG_KEYBOARD_NEWTON is not set CONFIG_INPUT_MOUSE=y CONFIG_MOUSE_PS2=y # CONFIG_MOUSE_SERIAL is not set # CONFIG_MOUSE_INPORT is not set # CONFIG_MOUSE_LOGIBM is not set # CONFIG_MOUSE_PC110PAD is not set # CONFIG_MOUSE_VSXXXAA is not set # CONFIG_INPUT_JOYSTICK is not set # CONFIG_INPUT_TOUCHSCREEN is not set CONFIG_INPUT_MISC=y CONFIG_INPUT_PCSPKR=y # CONFIG_INPUT_UINPUT is not set # # Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y # CONFIG_SERIAL_NONSTANDARD is not set # # Serial 
drivers # CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_8250_CS=m # CONFIG_SERIAL_8250_ACPI is not set CONFIG_SERIAL_8250_NR_UARTS=4 CONFIG_SERIAL_8250_EXTENDED=y CONFIG_SERIAL_8250_MANY_PORTS=y CONFIG_SERIAL_8250_SHARE_IRQ=y # CONFIG_SERIAL_8250_DETECT_IRQ is not set # CONFIG_SERIAL_8250_MULTIPORT is not set # CONFIG_SERIAL_8250_RSA is not set # # Non-8250 serial port support # CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y CONFIG_UNIX98_PTYS=y # CONFIG_LEGACY_PTYS is not set # # IPMI # # CONFIG_IPMI_HANDLER is not set # # Watchdog Cards # CONFIG_WATCHDOG=y # CONFIG_WATCHDOG_NOWAYOUT is not set # # Watchdog Device Drivers # # CONFIG_SOFT_WATCHDOG is not set # CONFIG_ACQUIRE_WDT is not set # CONFIG_ADVANTECH_WDT is not set # CONFIG_ALIM1535_WDT is not set # CONFIG_ALIM7101_WDT is not set # CONFIG_SC520_WDT is not set # CONFIG_EUROTECH_WDT is not set # CONFIG_IB700_WDT is not set # CONFIG_WAFER_WDT is not set CONFIG_I8XX_TCO=m # CONFIG_SC1200_WDT is not set # CONFIG_SCx200_WDT is not set # CONFIG_60XX_WDT is not set # CONFIG_CPU5_WDT is not set # CONFIG_W83627HF_WDT is not set # CONFIG_W83877F_WDT is not set # CONFIG_MACHZ_WDT is not set # # ISA-based Watchdog Cards # # CONFIG_PCWATCHDOG is not set # CONFIG_MIXCOMWD is not set # CONFIG_WDT is not set # # PCI-based Watchdog Cards # # CONFIG_PCIPCWATCHDOG is not set # CONFIG_WDTPCI is not set # # USB-based Watchdog Cards # # CONFIG_USBPCWATCHDOG is not set CONFIG_HW_RANDOM=y CONFIG_NVRAM=m CONFIG_RTC=m CONFIG_RTC_HISTOGRAM=m CONFIG_BLOCKER=m # CONFIG_GEN_RTC is not set # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set # CONFIG_SONYPI is not set # # Ftape, the floppy tape device driver # # CONFIG_FTAPE is not set CONFIG_AGP=m # CONFIG_AGP_ALI is not set # CONFIG_AGP_ATI is not set # CONFIG_AGP_AMD is not set # CONFIG_AGP_AMD64 is not set CONFIG_AGP_INTEL=m # CONFIG_AGP_INTEL_MCH is not set # CONFIG_AGP_NVIDIA is not set # CONFIG_AGP_SIS is not set # CONFIG_AGP_SWORKS 
is not set # CONFIG_AGP_VIA is not set # CONFIG_AGP_EFFICEON is not set # CONFIG_DRM is not set # # PCMCIA character devices # # CONFIG_SYNCLINK_CS is not set # CONFIG_MWAVE is not set # CONFIG_RAW_DRIVER is not set CONFIG_HPET=y # CONFIG_HPET_RTC_IRQ is not set # CONFIG_HPET_MMAP is not set CONFIG_HANGCHECK_TIMER=y # # I2C support # CONFIG_I2C=y CONFIG_I2C_CHARDEV=y # # I2C Algorithms # # CONFIG_I2C_ALGOBIT is not set # CONFIG_I2C_ALGOPCF is not set # CONFIG_I2C_ALGOPCA is not set # # I2C Hardware Bus support # # CONFIG_I2C_ALI1535 is not set # CONFIG_I2C_ALI1563 is not set # CONFIG_I2C_ALI15X3 is not set # CONFIG_I2C_AMD756 is not set # CONFIG_I2C_AMD8111 is not set # CONFIG_I2C_ELEKTOR is not set # CONFIG_I2C_I801 is not set # CONFIG_I2C_I810 is not set # CONFIG_I2C_ISA is not set # CONFIG_I2C_NFORCE2 is not set # CONFIG_I2C_PARPORT_LIGHT is not set CONFIG_I2C_PIIX4=y # CONFIG_I2C_PROSAVAGE is not set # CONFIG_I2C_SAVAGE4 is not set # CONFIG_SCx200_ACB is not set # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set # CONFIG_I2C_STUB is not set # CONFIG_I2C_VIA is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set # CONFIG_I2C_PCA_ISA is not set # # Hardware Sensors Chip support # # CONFIG_I2C_SENSOR is not set # CONFIG_SENSORS_ADM1021 is not set # CONFIG_SENSORS_ADM1025 is not set # CONFIG_SENSORS_ADM1026 is not set # CONFIG_SENSORS_ADM1031 is not set # CONFIG_SENSORS_ASB100 is not set # CONFIG_SENSORS_DS1621 is not set # CONFIG_SENSORS_FSCHER is not set # CONFIG_SENSORS_GL518SM is not set # CONFIG_SENSORS_IT87 is not set # CONFIG_SENSORS_LM63 is not set # CONFIG_SENSORS_LM75 is not set # CONFIG_SENSORS_LM77 is not set # CONFIG_SENSORS_LM78 is not set # CONFIG_SENSORS_LM80 is not set # CONFIG_SENSORS_LM83 is not set # CONFIG_SENSORS_LM85 is not set # CONFIG_SENSORS_LM87 is not set # CONFIG_SENSORS_LM90 is not set # CONFIG_SENSORS_MAX1619 is not set # CONFIG_SENSORS_PC87360 is not set # 
CONFIG_SENSORS_SMSC47B397 is not set # CONFIG_SENSORS_SMSC47M1 is not set # CONFIG_SENSORS_VIA686A is not set # CONFIG_SENSORS_W83781D is not set # CONFIG_SENSORS_W83L785TS is not set # CONFIG_SENSORS_W83627HF is not set # # Other I2C Chip support # # CONFIG_SENSORS_EEPROM is not set # CONFIG_SENSORS_PCF8574 is not set # CONFIG_SENSORS_PCF8591 is not set # CONFIG_SENSORS_RTC8564 is not set # CONFIG_I2C_DEBUG_CORE is not set # CONFIG_I2C_DEBUG_ALGO is not set # CONFIG_I2C_DEBUG_BUS is not set # CONFIG_I2C_DEBUG_CHIP is not set # # Dallas's 1-wire bus # # CONFIG_W1 is not set # # Misc devices # # CONFIG_IBM_ASM is not set # # Multimedia devices # # CONFIG_VIDEO_DEV is not set # # Digital Video Broadcasting Devices # # CONFIG_DVB is not set # # Graphics support # CONFIG_FB=y CONFIG_FB_MODE_HELPERS=y # CONFIG_FB_TILEBLITTING is not set # CONFIG_FB_CIRRUS is not set # CONFIG_FB_PM2 is not set # CONFIG_FB_CYBER2000 is not set # CONFIG_FB_ASILIANT is not set # CONFIG_FB_IMSTT is not set # CONFIG_FB_VGA16 is not set CONFIG_FB_VESA=y CONFIG_VIDEO_SELECT=y # CONFIG_FB_HGA is not set # CONFIG_FB_RIVA is not set # CONFIG_FB_I810 is not set # CONFIG_FB_INTEL is not set # CONFIG_FB_MATROX is not set # CONFIG_FB_RADEON_OLD is not set # CONFIG_FB_RADEON is not set # CONFIG_FB_ATY128 is not set # CONFIG_FB_ATY is not set # CONFIG_FB_SAVAGE is not set # CONFIG_FB_SIS is not set # CONFIG_FB_NEOMAGIC is not set # CONFIG_FB_KYRO is not set # CONFIG_FB_3DFX is not set # CONFIG_FB_VOODOO1 is not set # CONFIG_FB_TRIDENT is not set # CONFIG_FB_VIRTUAL is not set # # Console display driver support # CONFIG_VGA_CONSOLE=y # CONFIG_MDA_CONSOLE is not set CONFIG_DUMMY_CONSOLE=y CONFIG_FRAMEBUFFER_CONSOLE=y # CONFIG_FONTS is not set CONFIG_FONT_8x8=y CONFIG_FONT_8x16=y # # Logo configuration # CONFIG_LOGO=y # CONFIG_LOGO_LINUX_MONO is not set # CONFIG_LOGO_LINUX_VGA16 is not set CONFIG_LOGO_LINUX_CLUT224=y # CONFIG_BACKLIGHT_LCD_SUPPORT is not set # # Sound # CONFIG_SOUND=y # # Advanced Linux 
Sound Architecture # CONFIG_SND=y CONFIG_SND_TIMER=y CONFIG_SND_PCM=y CONFIG_SND_SEQUENCER=y # CONFIG_SND_SEQ_DUMMY is not set CONFIG_SND_OSSEMUL=y CONFIG_SND_MIXER_OSS=y CONFIG_SND_PCM_OSS=y CONFIG_SND_SEQUENCER_OSS=y # CONFIG_SND_RTCTIMER is not set CONFIG_SND_VERBOSE_PRINTK=y CONFIG_SND_DEBUG=y CONFIG_SND_DEBUG_MEMORY=y CONFIG_SND_DEBUG_DETECT=y # # Generic devices # # CONFIG_SND_DUMMY is not set # CONFIG_SND_VIRMIDI is not set # CONFIG_SND_MTPAV is not set # CONFIG_SND_SERIAL_U16550 is not set # CONFIG_SND_MPU401 is not set # # ISA devices # # CONFIG_SND_AD1848 is not set # CONFIG_SND_CS4231 is not set # CONFIG_SND_CS4232 is not set # CONFIG_SND_CS4236 is not set # CONFIG_SND_ES1688 is not set # CONFIG_SND_ES18XX is not set # CONFIG_SND_GUSCLASSIC is not set # CONFIG_SND_GUSEXTREME is not set # CONFIG_SND_GUSMAX is not set # CONFIG_SND_INTERWAVE is not set # CONFIG_SND_INTERWAVE_STB is not set # CONFIG_SND_OPTI92X_AD1848 is not set # CONFIG_SND_OPTI92X_CS4231 is not set # CONFIG_SND_OPTI93X is not set # CONFIG_SND_SB8 is not set # CONFIG_SND_SB16 is not set # CONFIG_SND_SBAWE is not set # CONFIG_SND_WAVEFRONT is not set # CONFIG_SND_CMI8330 is not set # CONFIG_SND_OPL3SA2 is not set # CONFIG_SND_SGALAXY is not set # CONFIG_SND_SSCAPE is not set # # PCI devices # CONFIG_SND_AC97_CODEC=y # CONFIG_SND_ALI5451 is not set # CONFIG_SND_ATIIXP is not set # CONFIG_SND_ATIIXP_MODEM is not set # CONFIG_SND_AU8810 is not set # CONFIG_SND_AU8820 is not set # CONFIG_SND_AU8830 is not set # CONFIG_SND_AZT3328 is not set # CONFIG_SND_BT87X is not set # CONFIG_SND_CS46XX is not set # CONFIG_SND_CS4281 is not set # CONFIG_SND_EMU10K1 is not set # CONFIG_SND_EMU10K1X is not set # CONFIG_SND_CA0106 is not set # CONFIG_SND_KORG1212 is not set # CONFIG_SND_MIXART is not set # CONFIG_SND_NM256 is not set # CONFIG_SND_RME32 is not set # CONFIG_SND_RME96 is not set # CONFIG_SND_RME9652 is not set # CONFIG_SND_HDSP is not set # CONFIG_SND_TRIDENT is not set # CONFIG_SND_YMFPCI is not 
set # CONFIG_SND_ALS4000 is not set # CONFIG_SND_CMIPCI is not set # CONFIG_SND_ENS1370 is not set # CONFIG_SND_ENS1371 is not set # CONFIG_SND_ES1938 is not set # CONFIG_SND_ES1968 is not set # CONFIG_SND_MAESTRO3 is not set # CONFIG_SND_FM801 is not set # CONFIG_SND_ICE1712 is not set # CONFIG_SND_ICE1724 is not set CONFIG_SND_INTEL8X0=y # CONFIG_SND_INTEL8X0M is not set # CONFIG_SND_SONICVIBES is not set # CONFIG_SND_VIA82XX is not set # CONFIG_SND_VIA82XX_MODEM is not set # CONFIG_SND_VX222 is not set # # USB devices # # CONFIG_SND_USB_AUDIO is not set # CONFIG_SND_USB_USX2Y is not set # # PCMCIA devices # # CONFIG_SND_VXPOCKET is not set # CONFIG_SND_VXP440 is not set # CONFIG_SND_PDAUDIOCF is not set # # Open Sound System # # CONFIG_SOUND_PRIME is not set # # USB support # CONFIG_USB=y # CONFIG_USB_DEBUG is not set # # Miscellaneous USB options # CONFIG_USB_DEVICEFS=y CONFIG_USB_BANDWIDTH=y # CONFIG_USB_DYNAMIC_MINORS is not set # CONFIG_USB_SUSPEND is not set # CONFIG_USB_OTG is not set CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y # # USB Host Controller Drivers # CONFIG_USB_EHCI_HCD=y # CONFIG_USB_EHCI_SPLIT_ISO is not set # CONFIG_USB_EHCI_ROOT_HUB_TT is not set CONFIG_USB_OHCI_HCD=y CONFIG_USB_UHCI_HCD=y # CONFIG_USB_SL811_HCD is not set # # USB Device Class drivers # # CONFIG_USB_AUDIO is not set # CONFIG_USB_BLUETOOTH_TTY is not set # CONFIG_USB_MIDI is not set # CONFIG_USB_ACM is not set # CONFIG_USB_PRINTER is not set # # NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information # # CONFIG_USB_STORAGE is not set # # USB Input Devices # CONFIG_USB_HID=y CONFIG_USB_HIDINPUT=y # CONFIG_HID_FF is not set # CONFIG_USB_HIDDEV is not set # CONFIG_USB_AIPTEK is not set # CONFIG_USB_WACOM is not set # CONFIG_USB_KBTAB is not set # CONFIG_USB_POWERMATE is not set # CONFIG_USB_MTOUCH is not set # CONFIG_USB_EGALAX is not set # CONFIG_USB_XPAD is not set # CONFIG_USB_ATI_REMOTE is not set # # USB 
Imaging devices # # CONFIG_USB_MDC800 is not set # # USB Multimedia devices # # CONFIG_USB_DABUSB is not set # # Video4Linux support is needed for USB Multimedia device support # # # USB Network Adapters # # CONFIG_USB_CATC is not set # CONFIG_USB_KAWETH is not set # CONFIG_USB_PEGASUS is not set # CONFIG_USB_RTL8150 is not set # CONFIG_USB_USBNET is not set # # USB port drivers # # # USB Serial Converter support # # CONFIG_USB_SERIAL is not set # # USB Miscellaneous drivers # # CONFIG_USB_EMI62 is not set # CONFIG_USB_EMI26 is not set # CONFIG_USB_AUERSWALD is not set # CONFIG_USB_RIO500 is not set # CONFIG_USB_LEGOTOWER is not set # CONFIG_USB_LCD is not set # CONFIG_USB_LED is not set # CONFIG_USB_CYTHERM is not set # CONFIG_USB_PHIDGETKIT is not set # CONFIG_USB_PHIDGETSERVO is not set # CONFIG_USB_IDMOUSE is not set # CONFIG_USB_TEST is not set # # USB ATM/DSL drivers # # # USB Gadget Support # # CONFIG_USB_GADGET is not set # # MMC/SD Card support # # CONFIG_MMC is not set # # InfiniBand support # # CONFIG_INFINIBAND is not set # # File systems # CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y CONFIG_EXT2_FS_SECURITY=y CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y CONFIG_EXT3_FS_SECURITY=y CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y # CONFIG_REISERFS_FS is not set # CONFIG_JFS_FS is not set CONFIG_FS_POSIX_ACL=y # # XFS support # # CONFIG_XFS_FS is not set # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set CONFIG_QUOTA=y # CONFIG_QFMT_V1 is not set CONFIG_QFMT_V2=y CONFIG_QUOTACTL=y CONFIG_DNOTIFY=y # CONFIG_AUTOFS_FS is not set CONFIG_AUTOFS4_FS=y # # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y CONFIG_JOLIET=y CONFIG_ZISOFS=y CONFIG_ZISOFS_FS=y CONFIG_UDF_FS=y CONFIG_UDF_NLS=y # # DOS/FAT/NT Filesystems # CONFIG_FAT_FS=m CONFIG_MSDOS_FS=m CONFIG_VFAT_FS=m CONFIG_FAT_DEFAULT_CODEPAGE=437 CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1" # CONFIG_NTFS_FS is not set # # Pseudo filesystems # 
CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_SYSFS=y # CONFIG_DEVFS_FS is not set CONFIG_DEVPTS_FS_XATTR=y CONFIG_DEVPTS_FS_SECURITY=y CONFIG_TMPFS=y CONFIG_TMPFS_XATTR=y CONFIG_TMPFS_SECURITY=y # CONFIG_HUGETLBFS is not set # CONFIG_HUGETLB_PAGE is not set CONFIG_RAMFS=y # # Miscellaneous filesystems # # CONFIG_ADFS_FS is not set # CONFIG_AFFS_FS is not set # CONFIG_HFS_FS is not set # CONFIG_HFSPLUS_FS is not set # CONFIG_BEFS_FS is not set # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set # CONFIG_CRAMFS is not set # CONFIG_VXFS_FS is not set # CONFIG_HPFS_FS is not set # CONFIG_QNX4FS_FS is not set # CONFIG_SYSV_FS is not set # CONFIG_UFS_FS is not set # # Network File Systems # CONFIG_NFS_FS=m CONFIG_NFS_V3=y CONFIG_NFS_V4=y CONFIG_NFS_DIRECTIO=y CONFIG_NFSD=m CONFIG_NFSD_V3=y CONFIG_NFSD_V4=y CONFIG_NFSD_TCP=y CONFIG_LOCKD=m CONFIG_LOCKD_V4=y CONFIG_EXPORTFS=m CONFIG_SUNRPC=m CONFIG_SUNRPC_GSS=m CONFIG_RPCSEC_GSS_KRB5=m CONFIG_RPCSEC_GSS_SPKM3=m # CONFIG_SMB_FS is not set CONFIG_CIFS=m CONFIG_CIFS_STATS=y CONFIG_CIFS_XATTR=y CONFIG_CIFS_POSIX=y # CONFIG_CIFS_EXPERIMENTAL is not set # CONFIG_NCP_FS is not set # CONFIG_CODA_FS is not set # CONFIG_AFS_FS is not set # # Partition Types # # CONFIG_PARTITION_ADVANCED is not set CONFIG_MSDOS_PARTITION=y # # Native Language Support # CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" CONFIG_NLS_CODEPAGE_437=y # CONFIG_NLS_CODEPAGE_737 is not set # CONFIG_NLS_CODEPAGE_775 is not set # CONFIG_NLS_CODEPAGE_850 is not set # CONFIG_NLS_CODEPAGE_852 is not set # CONFIG_NLS_CODEPAGE_855 is not set # CONFIG_NLS_CODEPAGE_857 is not set # CONFIG_NLS_CODEPAGE_860 is not set # CONFIG_NLS_CODEPAGE_861 is not set # CONFIG_NLS_CODEPAGE_862 is not set # CONFIG_NLS_CODEPAGE_863 is not set # CONFIG_NLS_CODEPAGE_864 is not set # CONFIG_NLS_CODEPAGE_865 is not set # CONFIG_NLS_CODEPAGE_866 is not set # CONFIG_NLS_CODEPAGE_869 is not set # CONFIG_NLS_CODEPAGE_936 is not set # CONFIG_NLS_CODEPAGE_950 is not set # CONFIG_NLS_CODEPAGE_932 is 
not set # CONFIG_NLS_CODEPAGE_949 is not set # CONFIG_NLS_CODEPAGE_874 is not set # CONFIG_NLS_ISO8859_8 is not set # CONFIG_NLS_CODEPAGE_1250 is not set # CONFIG_NLS_CODEPAGE_1251 is not set CONFIG_NLS_ASCII=y CONFIG_NLS_ISO8859_1=y # CONFIG_NLS_ISO8859_2 is not set # CONFIG_NLS_ISO8859_3 is not set # CONFIG_NLS_ISO8859_4 is not set # CONFIG_NLS_ISO8859_5 is not set # CONFIG_NLS_ISO8859_6 is not set # CONFIG_NLS_ISO8859_7 is not set # CONFIG_NLS_ISO8859_9 is not set # CONFIG_NLS_ISO8859_13 is not set # CONFIG_NLS_ISO8859_14 is not set # CONFIG_NLS_ISO8859_15 is not set # CONFIG_NLS_KOI8_R is not set # CONFIG_NLS_KOI8_U is not set # CONFIG_NLS_UTF8 is not set # # Profiling support # CONFIG_PROFILING=y CONFIG_OPROFILE=y # # Kernel hacking # CONFIG_DEBUG_KERNEL=y CONFIG_MAGIC_SYSRQ=y CONFIG_SCHEDSTATS=y CONFIG_DEBUG_SLAB=y CONFIG_DEBUG_PREEMPT=y CONFIG_WAKEUP_TIMING=y CONFIG_PREEMPT_TRACE=y # CONFIG_CRITICAL_PREEMPT_TIMING is not set # CONFIG_CRITICAL_IRQSOFF_TIMING is not set CONFIG_LATENCY_TIMING=y # CONFIG_LATENCY_TRACE is not set # CONFIG_DEBUG_KOBJECT is not set CONFIG_DEBUG_BUGVERBOSE=y # CONFIG_DEBUG_INFO is not set # CONFIG_DEBUG_FS is not set CONFIG_USE_FRAME_POINTER=y CONFIG_FRAME_POINTER=y CONFIG_EARLY_PRINTK=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_KPROBES=y CONFIG_DEBUG_STACK_USAGE=y # CONFIG_DEBUG_PAGEALLOC is not set # CONFIG_4KSTACKS is not set CONFIG_X86_FIND_SMP_CONFIG=y CONFIG_X86_MPPARSE=y # # Security options # CONFIG_KEYS=y CONFIG_KEYS_DEBUG_PROC_KEYS=y CONFIG_SECURITY=y CONFIG_SECURITY_NETWORK=y # CONFIG_SECURITY_CAPABILITIES is not set # CONFIG_SECURITY_ROOTPLUG is not set CONFIG_SECURITY_SECLVL=m CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_SELINUX_BOOTPARAM=y CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=1 # CONFIG_SECURITY_SELINUX_DISABLE is not set CONFIG_SECURITY_SELINUX_DEVELOP=y CONFIG_SECURITY_SELINUX_AVC_STATS=y # CONFIG_SECURITY_SELINUX_MLS is not set # # Cryptographic options # CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_NULL=m 
CONFIG_CRYPTO_MD4=m CONFIG_CRYPTO_MD5=y CONFIG_CRYPTO_SHA1=y CONFIG_CRYPTO_SHA256=m CONFIG_CRYPTO_SHA512=m CONFIG_CRYPTO_WP512=m CONFIG_CRYPTO_DES=y CONFIG_CRYPTO_BLOWFISH=m CONFIG_CRYPTO_TWOFISH=m CONFIG_CRYPTO_SERPENT=m CONFIG_CRYPTO_AES_586=m CONFIG_CRYPTO_CAST5=m CONFIG_CRYPTO_CAST6=m CONFIG_CRYPTO_TEA=m CONFIG_CRYPTO_ARC4=m CONFIG_CRYPTO_KHAZAD=m CONFIG_CRYPTO_ANUBIS=m CONFIG_CRYPTO_DEFLATE=y # CONFIG_CRYPTO_MICHAEL_MIC is not set CONFIG_CRYPTO_CRC32C=m # CONFIG_CRYPTO_TEST is not set # # Hardware crypto devices # # CONFIG_CRYPTO_DEV_PADLOCK is not set # # Library routines # CONFIG_CRC_CCITT=m CONFIG_CRC32=y CONFIG_LIBCRC32C=m CONFIG_ZLIB_INFLATE=y CONFIG_ZLIB_DEFLATE=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_X86_BIOS_REBOOT=y [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 125+ messages in thread
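The report above hinges on two things in the attached .config: which preemption model was selected, and whether NET_PKTGEN is built in. The following is a minimal sketch, not part of the original message, of how those options could be pulled out of a .config for comparison against a working tree. It assumes the usual one-option-per-line .config layout; the helper names `parse_kconfig` and `preempt_model` are hypothetical, and the sample lines are copied from the attachment.

```python
import re


def parse_kconfig(text):
    """Return {option: value} for a kernel .config; unset options map to None."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set"
        m = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if m:
            opts[m.group(1)] = None
            continue
        # Enabled options appear as assignments: "CONFIG_FOO=y", "=m", "=123", ...
        m = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if m:
            opts[m.group(1)] = m.group(2)
    return opts


def preempt_model(opts):
    """Name of the single preemption model set to 'y', or None if ambiguous."""
    # The four model options as they appear in the attached .config.
    models = [
        "CONFIG_PREEMPT_NONE",
        "CONFIG_PREEMPT_VOLUNTARY",
        "CONFIG_PREEMPT_DESKTOP",
        "CONFIG_PREEMPT_RT",
    ]
    chosen = [name for name in models if opts.get(name) == "y"]
    return chosen[0] if len(chosen) == 1 else None


# Excerpt copied from the attached .config, one option per line.
SAMPLE = """\
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT_DESKTOP=y
# CONFIG_PREEMPT_RT is not set
CONFIG_NET_PKTGEN=y
"""

if __name__ == "__main__":
    opts = parse_kconfig(SAMPLE)
    print(preempt_model(opts), opts.get("CONFIG_NET_PKTGEN"))
```

Running the same two lookups on the .config of a tree that boots would show quickly whether the preemption model is the only difference between the failing and the working setup.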
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar
` (3 preceding siblings ...)
2005-02-08 7:55 ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Valdis.Kletnieks
@ 2005-02-08 21:58 ` William Weston
2005-02-09 11:51 ` Ingo Molnar
2005-02-09 12:48 ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Stephen Smalley
2005-02-19 5:08 ` Lee Revell
2005-03-11 9:28 ` [patch] Real-Time Preemption, -RT-2.6.11-final-V0.7.40-00 Ingo Molnar
6 siblings, 2 replies; 125+ messages in thread
From: William Weston @ 2005-02-08 21:58 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 6629 bytes --]

Hi Ingo,

Great work on the -RT kernel! Here's a status report from my Athlon box
w/ kernel -RT-2.6.11-rc3-V0.7.38-03, realtime-lsm-0.8.5, jack-0.99.48,
alsa-1.0.8, and latencytest-0.5.5:

Latencytest (measured with RTC instead of latencytest LKM, which appears
to be somewhat broken under later kernels) is reporting consistent
latencies down below 0.24ms and no xruns.

Jack_test4.1 is giving me good results with the default settings. I
tried increasing the number of clients, but ran into the same issues
others have.

Jackd (-R -P64 -dalsa -dhw:0 -r44100 -p64 -n3 -i2 -o2) w/ one soft-synth
client (using 15% to 30% of the CPU) will run for over 12 hours without
any xruns, even during kernel compiles and nightly updatedb runs.

Running wmcube (an impractical, greedy, little CPU meter), even when
niced, causes lots of xruns. It may be good for worst-case-scenario
desktop load testing.

A couple of BUGs are being logged (see below), but without any ill effect
other than taking up space on my /var.

jack_test4.1 results (with default settings):

************* SUMMARY RESULT ****************
Total seconds ran . . . . . . : 300
Number of clients . . . . . . : 14
Ports per client . . . . . . : 4
Frames per buffer . . . . . . : 64
Number of runs . . . . . . . :( 1)
*********************************************
Timeout Count . . . . . . . . :( 0)
XRUN Count . . . . . . . . . : 0
Delay Count (>spare time) . . : 0
Delay Count (>1000 usecs) . . : 0
Delay Maximum . . . . . . . . : 92 usecs
Cycle Maximum . . . . . . . . : 1100 usecs
Average DSP Load. . . . . . . : 60.2 %
Average CPU System Load . . . : 24.3 %
Average CPU User Load . . . . : 40.2 %
Average CPU Nice Load . . . . : 0.3 %
Average CPU I/O Wait Load . . : 0.6 %
Average CPU IRQ Load . . . . : 0.0 %
Average CPU Soft-IRQ Load . . : 0.0 %
Average Interrupt Rate . . . : 1751.7 /sec
Average Context-Switch Rate . : 18563.4 /sec
*********************************************
Delta Maximum . . . . . . . . : 0.00000
*********************************************

Network interface (via rhine) startup triggers these two BUGs:

BUG: sleeping function called from invalid context ksoftirqd/0(2) at kernel/rt.c:1448
in_atomic():1 [00000001], irqs_disabled():0
 [<c0103e77>] dump_stack+0x17/0x20 (12)
 [<c0119f89>] __might_sleep+0xd9/0xf0 (40)
 [<c0134816>] __spin_lock+0x36/0x50 (24)
 [<c0147914>] kmem_cache_alloc+0x34/0x120 (44)
 [<c01d3143>] sel_netif_lookup+0x63/0x150 (28)
 [<c01d32cd>] sel_netif_sids+0x2d/0xb0 (28)
 [<c01d01bc>] selinux_socket_sock_rcv_skb+0xac/0x230 (144)
 [<c02fd248>] udp_queue_rcv_skb+0xb8/0x280 (28)
 [<c02fd8e2>] udp_rcv+0x192/0x3e0 (100)
 [<c02dc224>] ip_local_deliver+0x64/0x1c0 (32)
 [<c02dc595>] ip_rcv+0x215/0x3f0 (56)
 [<c02c201c>] netif_receive_skb+0x12c/0x160 (40)
 [<c02c20ce>] process_backlog+0x7e/0x110 (32)
 [<c02c21d2>] net_rx_action+0x72/0x130 (24)
 [<c0122428>] ___do_softirq+0x48/0xd0 (40)
 [<c012254b>] _do_softirq+0x1b/0x30 (8)
 [<c0122920>] ksoftirqd+0xa0/0xf0 (28)
 [<c01312fb>] kthread+0x8b/0xc0 (36)
 [<c01012f5>] kernel_thread_helper+0x5/0x10 (537116692)
---------------------------
| preempt count: 00000002 ]
| 2-level deep critical section nesting:
----------------------------------------
.. [<c013dd3f>] .... __do_IRQ+0xef/0x180
.....[<c0105306>] .. ( <= do_IRQ+0x56/0xa0)
.. [<c0135240>] .... print_traces+0x10/0x40
.....[<c0103e77>] .. ( <= dump_stack+0x17/0x20)

BUG: sleeping function called from invalid context ksoftirqd/0(2) at kernel/rt.c:1448
in_atomic():1 [00000001], irqs_disabled():0
 [<c0103e77>] dump_stack+0x17/0x20 (12)
 [<c0119f89>] __might_sleep+0xd9/0xf0 (40)
 [<c0134816>] __spin_lock+0x36/0x50 (24)
 [<c0147914>] kmem_cache_alloc+0x34/0x120 (44)
 [<c01d3143>] sel_netif_lookup+0x63/0x150 (28)
 [<c01d32cd>] sel_netif_sids+0x2d/0xb0 (28)
 [<c01d01bc>] selinux_socket_sock_rcv_skb+0xac/0x230 (144)
 [<c02f6be6>] tcp_v4_rcv+0x4c6/0x8b0 (84)
 [<c02dc224>] ip_local_deliver+0x64/0x1c0 (32)
 [<c02dc595>] ip_rcv+0x215/0x3f0 (56)
 [<c02c201c>] netif_receive_skb+0x12c/0x160 (40)
 [<c02c20ce>] process_backlog+0x7e/0x110 (32)
 [<c02c21d2>] net_rx_action+0x72/0x130 (24)
 [<c0122428>] ___do_softirq+0x48/0xd0 (40)
 [<c012254b>] _do_softirq+0x1b/0x30 (8)
 [<c0122920>] ksoftirqd+0xa0/0xf0 (28)
 [<c01312fb>] kthread+0x8b/0xc0 (36)
 [<c01012f5>] kernel_thread_helper+0x5/0x10 (537116692)
---------------------------
| preempt count: 00000002 ]
| 2-level deep critical section nesting:
----------------------------------------
.. [<c013dcc4>] .... __do_IRQ+0x74/0x180
.....[<c0105306>] .. ( <= do_IRQ+0x56/0xa0)
.. [<c0118922>] .... scheduler_tick+0x62/0x300
.....[<c0107b2d>] .. ( <= timer_interrupt+0x4d/0x160)

MIDI playback through any MPU-401 interface triggers the following BUG,
reported once for each outgoing MIDI event (non MPU-401 hw interfaces
and sw interfaces not affected):

BUG: sleeping function called from invalid context ksoftirqd/0(2) at kernel/rt.c:1448
in_atomic():0 [00000000], irqs_disabled():1
 [<c0103e77>] dump_stack+0x17/0x20 (12)
 [<c0119f89>] __might_sleep+0xd9/0xf0 (40)
 [<c0134816>] __spin_lock+0x36/0x50 (24)
 [<c013486b>] _spin_lock_irqsave+0xb/0x10 (8)
 [<e089674a>] snd_rawmidi_transmit_peek+0x3a/0xe0 [snd_rawmidi] (40)
 [<e088c700>] snd_mpu401_uart_output_write+0x20/0x90 [snd_mpu401_uart] (24)
 [<e088c7fc>] snd_mpu401_uart_output_trigger+0x8c/0xa0 [snd_mpu401_uart] (20)
 [<e0896a5c>] snd_rawmidi_kernel_write1+0x17c/0x190 [snd_rawmidi] (48)
 [<e0896a82>] snd_rawmidi_kernel_write+0x12/0x20 [snd_rawmidi] (12)
 [<e0c06117>] dump_midi+0x27/0x50 [snd_seq_midi] (16)
 [<e0c06192>] event_process_midi+0x52/0xb0 [snd_seq_midi] (40)
 [<e0a11acc>] snd_seq_deliver_single_event+0x12c/0x140 [snd_seq] (40)
 [<e0a11ca6>] snd_seq_deliver_event+0x36/0x50 [snd_seq] (24)
 [<e0a11cfb>] snd_seq_dispatch_event+0x3b/0x130 [snd_seq] (68)
 [<e0a14d4c>] snd_seq_check_queue+0xec/0x110 [snd_seq] (28)
 [<e0841067>] snd_timer_interrupt+0x2a7/0x2f0 [snd_timer] (56)
 [<c0126428>] run_timer_softirq+0x1c8/0x3c0 (52)
 [<c0122428>] ___do_softirq+0x48/0xd0 (40)
 [<c012254b>] _do_softirq+0x1b/0x30 (8)
 [<c0122920>] ksoftirqd+0xa0/0xf0 (28)
 [<c01312fb>] kthread+0x8b/0xc0 (36)
 [<c01012f5>] kernel_thread_helper+0x5/0x10 (537116692)
---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<c0135240>] .... print_traces+0x10/0x40
.....[<c0103e77>] .. ( <= dump_stack+0x17/0x20)

Please let me know if there's anything else I can do to help debug this.

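[Editorial sketch, not part of the original message.] Since the MIDI BUG is logged once per outgoing event, it may help to collapse a saved kernel log into one line per distinct report before posting it. The script below is a minimal sketch under stated assumptions: it expects the log to have one record per line in the format of the traces above, and the function name `summarize_bugs` and the `LOCK_FRAMES` set are hypothetical, chosen for illustration. It keys each report on the task, the reporting location, the `in_atomic()`/`irqs_disabled()` values, and the first stack frame that is not part of the lock/debug path.

```python
import re
from collections import Counter

# Header of a "sleeping function" report, e.g.
# "BUG: sleeping function called from invalid context ksoftirqd/0(2) at kernel/rt.c:1448"
BUG_RE = re.compile(
    r"BUG: sleeping function called from invalid context "
    r"(?P<task>\S+)\((?P<pid>\d+)\) at (?P<where>\S+:\d+)"
)
# State line, e.g. "in_atomic():1 [00000001], irqs_disabled():0"
STATE_RE = re.compile(
    r"in_atomic\(\):(?P<atomic>\d+) \[[0-9a-f]+\], irqs_disabled\(\):(?P<irqs>\d+)"
)
# Stack frame, e.g. "[<c0147914>] kmem_cache_alloc+0x34/0x120 (44)"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?P<sym>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Frames belonging to the check itself rather than to the offending caller
# (an assumption based on the three traces in this message).
LOCK_FRAMES = {"dump_stack", "__might_sleep", "__spin_lock", "_spin_lock_irqsave"}


def summarize_bugs(log_text):
    """Count reports keyed by (task, where, in_atomic, irqs_disabled, caller)."""
    counts = Counter()
    current = None  # report currently being parsed, or None
    for line in log_text.splitlines():
        m = BUG_RE.search(line)
        if m:
            current = {"task": m["task"], "where": m["where"],
                       "atomic": None, "irqs": None}
            continue
        if current is None:
            continue
        m = STATE_RE.search(line)
        if m:
            current["atomic"] = int(m["atomic"])
            current["irqs"] = int(m["irqs"])
            continue
        m = FRAME_RE.search(line)
        if m and m["sym"] not in LOCK_FRAMES:
            # First frame above the lock path: treat it as the caller.
            key = (current["task"], current["where"],
                   current["atomic"], current["irqs"], m["sym"])
            counts[key] += 1
            current = None
    return counts
```

Run against the three traces in this message, this should group the two network reports under `kmem_cache_alloc` (atomic context, interrupts enabled) and the MIDI report under `snd_rawmidi_transmit_peek` (interrupts disabled), which are two different kinds of invalid context.
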
Best Regards, --William Weston <weston at sysex.net> [-- Attachment #2: Type: TEXT/PLAIN, Size: 46320 bytes --] ver_linux output: Linux astarte.lysdexia.org 2.6.11-rc3-RT-V0.7.38-03 #1 Mon Feb 7 21:05:43 PST 2005 i686 athlon i386 GNU/Linux Gnu C 3.4.2 Gnu make 3.80 binutils 2.15.92.0.2 util-linux 2.12a mount 2.12a module-init-tools 3.1-pre5 e2fsprogs 1.35 jfsutils 1.1.7 reiserfsprogs 3.6.18 reiser4progs line quota-tools 3.12. PPP 2.4.2 isdn4k-utils 3.3 nfs-utils 1.0.6 Linux C Library 2.3.4 Dynamic linker (ldd) 2.3.4 Procps 3.2.3 Net-tools 1.60 Kbd 1.12 Sh-utils 5.2.1 udev 039 Modules Loaded nls_utf8 snd_seq_oss snd_seq_midi it87 eeprom i2c_sensor i2c_isa i2c_viapro i2c_dev i2c_core realtime binfmt_misc fan button ohci1394 ieee1394 snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_event snd_seq_midi_emul snd_seq snd_emu10k1 snd_util_mem snd_hwdep snd_mpu401 snd_via82xx snd_mpu401_uart snd_cs46xx snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc gameport /proc/interrupts: CPU0 0: 51224998 IO-APIC-edge timer 0/24998 1: 82207 IO-APIC-edge i8042 2/82207 3: 102061 IO-APIC-edge MPU401 UART 0/2061 8: 1 IO-APIC-edge rtc 0/1 9: 0 IO-APIC-level acpi 0/0 12: 112406 IO-APIC-edge i8042 0/12405 14: 215412 IO-APIC-edge ide0 0/15262 15: 102 IO-APIC-edge ide1 1/100 16: 3683478 IO-APIC-level ohci1394, radeon@pci:0000:01:00.0 0/83478 17: 53626984 IO-APIC-level CS46XX 0/26984 19: 0 IO-APIC-level EMU10K1 0/0 22: 0 IO-APIC-level VIA8233 0/0 23: 74975 IO-APIC-level eth0 0/74975 NMI: 0 LOC: 51226373 ERR: 0 MIS: 0 /proc/ioports: 0000-001f : dma1 0020-0021 : pic1 0040-0043 : timer0 0050-0053 : timer1 0060-006f : keyboard 0070-0077 : rtc 0080-008f : dma page reg 00a0-00a1 : pic2 00c0-00df : dma2 00f0-00ff : fpu 0170-0177 : ide1 01f0-01f7 : ide0 0290-0297 : pnp 00:0c 0290-0297 : it87 0300-0301 : MPU401 UART 0370-0375 : pnp 00:0c 0376-0376 : ide1 03c0-03df : vga+ 03f6-03f6 : ide0 0cf8-0cff : PCI conf1 9800-98ff : 
0000:00:12.0 9800-98ff : via-rhine a000-a00f : 0000:00:11.1 a000-a007 : ide0 a008-a00f : ide1 b400-b407 : 0000:00:0b.1 b800-b83f : 0000:00:0b.0 b800-b83f : EMU10K1 d000-dfff : PCI Bus #01 d800-d8ff : 0000:01:00.0 e000-e0ff : 0000:00:11.5 e000-e0ff : VIA8233 e400-e47f : motherboard e400-e403 : PM1a_EVT_BLK e404-e405 : PM1a_CNT_BLK e408-e40b : PM_TMR e420-e423 : GPE0_BLK e800-e81f : motherboard e800-e81f : pnp 00:01 e800-e807 : viapro-smbus lspci -vvv output: 00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge Subsystem: ASUSTeK Computer Inc. A7V8X motherboard Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- Latency: 0 Region 0: Memory at f8000000 (32-bit, prefetchable) [size=64M] Capabilities: [a0] AGP version 3.5 Status: RQ=32 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3- Rate=x1,x2,x4 Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x1 Capabilities: [c0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:01.0 PCI bridge: VIA Technologies, Inc. 
VT8235 PCI Bridge (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- Latency: 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 0000d000-0000dfff Memory behind bridge: ef000000-efefffff Prefetchable memory behind bridge: eff00000-f7ffffff Secondary status: 66Mhz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B- Capabilities: [80] Power Management version 2 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:0b.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04) Subsystem: Creative Labs SB Audigy 2 ZS (SB0350) Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 (500ns min, 5000ns max) Interrupt: pin A routed to IRQ 19 Region 0: I/O ports at b800 [size=64] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:0b.1 Input device controller: Creative Labs SB Audigy MIDI/Game port (rev 04) Subsystem: Creative Labs SB Audigy MIDI/Game Port Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 Region 0: I/O ports at b400 [disabled] [size=8] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:0b.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04) (prog-if 10 [OHCI]) Subsystem: 
Creative Labs SB Audigy FireWire Port Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 (500ns min, 1000ns max), Cache Line Size 08 Interrupt: pin B routed to IRQ 16 Region 0: Memory at ee800000 (32-bit, non-prefetchable) [size=2K] Region 1: Memory at ee000000 (32-bit, non-prefetchable) [size=16K] Capabilities: [44] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME+ 00:0e.0 Multimedia audio controller: Cirrus Logic CS 4614/22/24 [CrystalClear SoundFusion Audio Accelerator] (rev 01) Subsystem: Hercules: Unknown device a010 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 (1000ns min, 6000ns max) Interrupt: pin A routed to IRQ 17 Region 0: Memory at ed800000 (32-bit, non-prefetchable) [size=4K] Region 1: Memory at ed000000 (32-bit, non-prefetchable) [size=1M] Capabilities: [40] Power Management version 2 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge Subsystem: ASUSTeK Computer Inc. A7V8X-X motherboard Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Capabilities: [c0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP]) Subsystem: ASUSTeK Computer Inc. 
A7V8X-X motherboard rev. 1.01 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 Interrupt: pin A routed to IRQ 255 Region 4: I/O ports at a000 [size=16] Capabilities: [c0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50) Subsystem: ASUSTeK Computer Inc. A7V8X-X Motherboard Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Interrupt: pin C routed to IRQ 22 Region 0: I/O ports at e000 [size=256] Capabilities: [c0] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74) Subsystem: ASUSTeK Computer Inc. 
A7V8X-X Motherboard Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping+ SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 (750ns min, 2000ns max), Cache Line Size 08 Interrupt: pin A routed to IRQ 23 Region 0: I/O ports at 9800 [size=256] Region 1: Memory at ec000000 (32-bit, non-prefetchable) [size=256] Capabilities: [40] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE] (prog-if 00 [VGA]) Subsystem: PC Partner Limited: Unknown device 7c28 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 64 (2000ns min), Cache Line Size 08 Interrupt: pin A routed to IRQ 16 Region 0: Memory at f0000000 (32-bit, prefetchable) [size=128M] Region 1: I/O ports at d800 [size=256] Region 2: Memory at ef000000 (32-bit, non-prefetchable) [size=64K] Expansion ROM at effe0000 [disabled] [size=128K] Capabilities: [58] AGP version 2.0 Status: RQ=48 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW- AGP3- Rate=x1,x2,x4 Command: RQ=32 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x1 Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- config: # # Automatically generated make config: don't edit # Linux kernel version: 2.6.11-rc3-RT-V0.7.38-03 # Mon Feb 7 20:49:10 2005 # CONFIG_X86=y CONFIG_MMU=y CONFIG_UID16=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y # # Code maturity level options # CONFIG_EXPERIMENTAL=y CONFIG_CLEAN_COMPILE=y CONFIG_BROKEN_ON_SMP=y CONFIG_LOCK_KERNEL=y # # General setup # CONFIG_LOCALVERSION="" CONFIG_SWAP=y CONFIG_SYSVIPC=y 
CONFIG_POSIX_MQUEUE=y CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set CONFIG_SYSCTL=y CONFIG_AUDIT=y CONFIG_AUDITSYSCALL=y CONFIG_LOG_BUF_SHIFT=18 CONFIG_HOTPLUG=y CONFIG_KOBJECT_UEVENT=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_FUTEX=y CONFIG_EPOLL=y # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SHMEM=y CONFIG_CC_ALIGN_FUNCTIONS=0 CONFIG_CC_ALIGN_LABELS=0 CONFIG_CC_ALIGN_LOOPS=0 CONFIG_CC_ALIGN_JUMPS=0 # CONFIG_TINY_SHMEM is not set # # Loadable module support # CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y CONFIG_OBSOLETE_MODPARM=y CONFIG_MODVERSIONS=y CONFIG_MODULE_SRCVERSION_ALL=y CONFIG_KMOD=y # # Processor type and features # CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_ES7000 is not set # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set # CONFIG_MPENTIUMM is not set # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set CONFIG_MK7=y # CONFIG_MK8 is not set # CONFIG_MCRUSOE is not set # CONFIG_MEFFICEON is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set # CONFIG_X86_GENERIC is not set CONFIG_X86_CMPXCHG=y CONFIG_X86_XADD=y CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_INTEL_USERCOPY=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_HPET_TIMER=y CONFIG_HPET_EMULATE_RTC=y # CONFIG_SMP is not set # 
CONFIG_PREEMPT_NONE is not set # CONFIG_PREEMPT_VOLUNTARY is not set # CONFIG_PREEMPT_DESKTOP is not set CONFIG_PREEMPT_RT=y CONFIG_PREEMPT=y CONFIG_PREEMPT_SOFTIRQS=y CONFIG_PREEMPT_HARDIRQS=y CONFIG_PREEMPT_BKL=y CONFIG_X86_UP_APIC=y CONFIG_X86_UP_IOAPIC=y CONFIG_X86_LOCAL_APIC=y CONFIG_X86_IO_APIC=y CONFIG_X86_TSC=y CONFIG_X86_MCE=y CONFIG_X86_MCE_NONFATAL=y # CONFIG_X86_MCE_P4THERMAL is not set # CONFIG_TOSHIBA is not set # CONFIG_I8K is not set CONFIG_MICROCODE=m CONFIG_X86_MSR=m CONFIG_X86_CPUID=m # # Firmware Drivers # # CONFIG_EDD is not set CONFIG_NOHIGHMEM=y # CONFIG_HIGHMEM4G is not set # CONFIG_HIGHMEM64G is not set # CONFIG_MATH_EMULATION is not set CONFIG_MTRR=y # CONFIG_EFI is not set CONFIG_HAVE_DEC_LOCK=y CONFIG_REGPARM=y # # Power management options (ACPI, APM) # CONFIG_PM=y # CONFIG_PM_DEBUG is not set # CONFIG_SOFTWARE_SUSPEND is not set # # ACPI (Advanced Configuration and Power Interface) Support # CONFIG_ACPI=y CONFIG_ACPI_BOOT=y CONFIG_ACPI_INTERPRETER=y CONFIG_ACPI_SLEEP=y CONFIG_ACPI_SLEEP_PROC_FS=y # CONFIG_ACPI_AC is not set # CONFIG_ACPI_BATTERY is not set CONFIG_ACPI_BUTTON=m # CONFIG_ACPI_VIDEO is not set CONFIG_ACPI_FAN=m CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_THERMAL=y # CONFIG_ACPI_ASUS is not set # CONFIG_ACPI_IBM is not set # CONFIG_ACPI_TOSHIBA is not set CONFIG_ACPI_BLACKLIST_YEAR=0 CONFIG_ACPI_DEBUG=y CONFIG_ACPI_BUS=y CONFIG_ACPI_EC=y CONFIG_ACPI_POWER=y CONFIG_ACPI_PCI=y CONFIG_ACPI_SYSTEM=y CONFIG_X86_PM_TIMER=y # CONFIG_ACPI_CONTAINER is not set # # APM (Advanced Power Management) BIOS Support # CONFIG_APM=y # CONFIG_APM_IGNORE_USER_SUSPEND is not set # CONFIG_APM_DO_ENABLE is not set # CONFIG_APM_CPU_IDLE is not set # CONFIG_APM_DISPLAY_BLANK is not set CONFIG_APM_RTC_IS_GMT=y # CONFIG_APM_ALLOW_INTS is not set CONFIG_APM_REAL_MODE_POWER_OFF=y # # CPU Frequency scaling # # CONFIG_CPU_FREQ is not set # # Bus options (PCI, PCMCIA, EISA, MCA, ISA) # CONFIG_PCI=y # CONFIG_PCI_GOBIOS is not set # CONFIG_PCI_GOMMCONFIG is not set 
# CONFIG_PCI_GODIRECT is not set CONFIG_PCI_GOANY=y CONFIG_PCI_BIOS=y CONFIG_PCI_DIRECT=y CONFIG_PCI_MMCONFIG=y # CONFIG_PCIEPORTBUS is not set # CONFIG_PCI_MSI is not set CONFIG_PCI_LEGACY_PROC=y CONFIG_PCI_NAMES=y CONFIG_ISA=y # CONFIG_EISA is not set # CONFIG_MCA is not set # CONFIG_SCx200 is not set # # PCCARD (PCMCIA/CardBus) support # # CONFIG_PCCARD is not set # # PC-card bridges # CONFIG_PCMCIA_PROBE=y # # PCI Hotplug Support # # CONFIG_HOTPLUG_PCI is not set # # Executable file formats # CONFIG_BINFMT_ELF=y CONFIG_BINFMT_AOUT=m CONFIG_BINFMT_MISC=m # # Device Drivers # # # Generic Driver Options # CONFIG_STANDALONE=y CONFIG_PREVENT_FIRMWARE_BUILD=y CONFIG_FW_LOADER=m # CONFIG_DEBUG_DRIVER is not set # # Memory Technology Devices (MTD) # # CONFIG_MTD is not set # # Parallel port support # # CONFIG_PARPORT is not set # # Plug and Play support # CONFIG_PNP=y # CONFIG_PNP_DEBUG is not set # # Protocols # CONFIG_ISAPNP=y CONFIG_PNPBIOS=y # CONFIG_PNPBIOS_PROC_FS is not set CONFIG_PNPACPI=y # # Block devices # CONFIG_BLK_DEV_FD=y # CONFIG_BLK_DEV_XD is not set # CONFIG_BLK_CPQ_DA is not set # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set # CONFIG_BLK_DEV_COW_COMMON is not set CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_CRYPTOLOOP=m CONFIG_BLK_DEV_NBD=m # CONFIG_BLK_DEV_SX8 is not set # CONFIG_BLK_DEV_UB is not set CONFIG_BLK_DEV_RAM=m CONFIG_BLK_DEV_RAM_COUNT=16 CONFIG_BLK_DEV_RAM_SIZE=16384 CONFIG_INITRAMFS_SOURCE="" # CONFIG_LBD is not set CONFIG_CDROM_PKTCDVD=y CONFIG_CDROM_PKTCDVD_BUFFERS=128 # CONFIG_CDROM_PKTCDVD_WCACHE is not set # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_ATA_OVER_ETH is not set # # ATA/ATAPI/MFM/RLL support # CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y # # Please see Documentation/ide.txt for help/info on IDE drives # # CONFIG_BLK_DEV_IDE_SATA is not set # CONFIG_BLK_DEV_HD_IDE is not set CONFIG_BLK_DEV_IDEDISK=y 
CONFIG_IDEDISK_MULTI_MODE=y CONFIG_BLK_DEV_IDECD=y # CONFIG_BLK_DEV_IDETAPE is not set # CONFIG_BLK_DEV_IDEFLOPPY is not set CONFIG_BLK_DEV_IDESCSI=m CONFIG_IDE_TASK_IOCTL=y # # IDE chipset support/bugfixes # CONFIG_IDE_GENERIC=y # CONFIG_BLK_DEV_CMD640 is not set CONFIG_BLK_DEV_IDEPNP=y CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_SHARE_IRQ=y # CONFIG_BLK_DEV_OFFBOARD is not set CONFIG_BLK_DEV_GENERIC=y # CONFIG_BLK_DEV_OPTI621 is not set # CONFIG_BLK_DEV_RZ1000 is not set CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_IDEDMA_FORCED is not set CONFIG_IDEDMA_PCI_AUTO=y # CONFIG_IDEDMA_ONLYDISK is not set # CONFIG_BLK_DEV_AEC62XX is not set # CONFIG_BLK_DEV_ALI15X3 is not set # CONFIG_BLK_DEV_AMD74XX is not set # CONFIG_BLK_DEV_ATIIXP is not set # CONFIG_BLK_DEV_CMD64X is not set # CONFIG_BLK_DEV_TRIFLEX is not set # CONFIG_BLK_DEV_CY82C693 is not set # CONFIG_BLK_DEV_CS5520 is not set # CONFIG_BLK_DEV_CS5530 is not set # CONFIG_BLK_DEV_HPT34X is not set # CONFIG_BLK_DEV_HPT366 is not set # CONFIG_BLK_DEV_SC1200 is not set # CONFIG_BLK_DEV_PIIX is not set # CONFIG_BLK_DEV_NS87415 is not set # CONFIG_BLK_DEV_PDC202XX_OLD is not set # CONFIG_BLK_DEV_PDC202XX_NEW is not set # CONFIG_BLK_DEV_SVWKS is not set # CONFIG_BLK_DEV_SIIMAGE is not set # CONFIG_BLK_DEV_SIS5513 is not set # CONFIG_BLK_DEV_SLC90E66 is not set # CONFIG_BLK_DEV_TRM290 is not set CONFIG_BLK_DEV_VIA82CXXX=y # CONFIG_IDE_ARM is not set # CONFIG_IDE_CHIPSETS is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set CONFIG_IDEDMA_AUTO=y # CONFIG_BLK_DEV_HD is not set # # SCSI device support # CONFIG_SCSI=m CONFIG_SCSI_PROC_FS=y # # SCSI support type (disk, tape, CD-ROM) # CONFIG_BLK_DEV_SD=m # CONFIG_CHR_DEV_ST is not set # CONFIG_CHR_DEV_OSST is not set CONFIG_BLK_DEV_SR=m CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_CHR_DEV_SG=m # # Some SCSI devices (e.g. 
CD jukebox) support multiple LUNs # # CONFIG_SCSI_MULTI_LUN is not set CONFIG_SCSI_CONSTANTS=y CONFIG_SCSI_LOGGING=y # # SCSI Transport Attributes # # CONFIG_SCSI_SPI_ATTRS is not set # CONFIG_SCSI_FC_ATTRS is not set # CONFIG_SCSI_ISCSI_ATTRS is not set # # SCSI low-level drivers # # CONFIG_BLK_DEV_3W_XXXX_RAID is not set # CONFIG_SCSI_3W_9XXX is not set # CONFIG_SCSI_7000FASST is not set # CONFIG_SCSI_ACARD is not set # CONFIG_SCSI_AHA152X is not set # CONFIG_SCSI_AHA1542 is not set # CONFIG_SCSI_AACRAID is not set # CONFIG_SCSI_AIC7XXX is not set # CONFIG_SCSI_AIC7XXX_OLD is not set # CONFIG_SCSI_AIC79XX is not set # CONFIG_SCSI_DPT_I2O is not set # CONFIG_SCSI_IN2000 is not set # CONFIG_MEGARAID_NEWGEN is not set # CONFIG_MEGARAID_LEGACY is not set # CONFIG_SCSI_SATA is not set # CONFIG_SCSI_BUSLOGIC is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_DTC3280 is not set # CONFIG_SCSI_EATA is not set # CONFIG_SCSI_EATA_PIO is not set # CONFIG_SCSI_FUTURE_DOMAIN is not set # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_GENERIC_NCR5380 is not set # CONFIG_SCSI_GENERIC_NCR5380_MMIO is not set # CONFIG_SCSI_IPS is not set # CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set # CONFIG_SCSI_NCR53C406A is not set # CONFIG_SCSI_SYM53C8XX_2 is not set # CONFIG_SCSI_IPR is not set # CONFIG_SCSI_PAS16 is not set # CONFIG_SCSI_PSI240I is not set # CONFIG_SCSI_QLOGIC_FAS is not set # CONFIG_SCSI_QLOGIC_ISP is not set # CONFIG_SCSI_QLOGIC_FC is not set # CONFIG_SCSI_QLOGIC_1280 is not set CONFIG_SCSI_QLA2XXX=m # CONFIG_SCSI_QLA21XX is not set # CONFIG_SCSI_QLA22XX is not set # CONFIG_SCSI_QLA2300 is not set # CONFIG_SCSI_QLA2322 is not set # CONFIG_SCSI_QLA6312 is not set # CONFIG_SCSI_SYM53C416 is not set # CONFIG_SCSI_DC395x is not set # CONFIG_SCSI_DC390T is not set # CONFIG_SCSI_T128 is not set # CONFIG_SCSI_U14_34F is not set # CONFIG_SCSI_ULTRASTOR is not set # CONFIG_SCSI_NSP32 is not set # CONFIG_SCSI_DEBUG is not set # # Old CD-ROM drivers (not SCSI, 
not IDE) # # CONFIG_CD_NO_IDESCSI is not set # # Multi-device support (RAID and LVM) # # CONFIG_MD is not set # # Fusion MPT device support # # CONFIG_FUSION is not set # # IEEE 1394 (FireWire) support # CONFIG_IEEE1394=m # # Subsystem Options # # CONFIG_IEEE1394_VERBOSEDEBUG is not set CONFIG_IEEE1394_OUI_DB=y CONFIG_IEEE1394_EXTRA_CONFIG_ROMS=y CONFIG_IEEE1394_CONFIG_ROM_IP1394=y # # Device Drivers # # CONFIG_IEEE1394_PCILYNX is not set CONFIG_IEEE1394_OHCI1394=m # # Protocol Drivers # CONFIG_IEEE1394_VIDEO1394=m CONFIG_IEEE1394_SBP2=m # CONFIG_IEEE1394_SBP2_PHYS_DMA is not set CONFIG_IEEE1394_ETH1394=m CONFIG_IEEE1394_DV1394=m CONFIG_IEEE1394_RAWIO=m CONFIG_IEEE1394_CMP=m CONFIG_IEEE1394_AMDTP=m # # I2O device support # # CONFIG_I2O is not set # # Networking support # CONFIG_NET=y # # Networking options # CONFIG_PACKET=y CONFIG_PACKET_MMAP=y CONFIG_NETLINK_DEV=y CONFIG_UNIX=y CONFIG_NET_KEY=y CONFIG_INET=y CONFIG_IP_MULTICAST=y # CONFIG_IP_ADVANCED_ROUTER is not set # CONFIG_IP_PNP is not set # CONFIG_NET_IPIP is not set # CONFIG_NET_IPGRE is not set # CONFIG_IP_MROUTE is not set # CONFIG_ARPD is not set # CONFIG_SYN_COOKIES is not set # CONFIG_INET_AH is not set # CONFIG_INET_ESP is not set # CONFIG_INET_IPCOMP is not set # CONFIG_INET_TUNNEL is not set CONFIG_IP_TCPDIAG=y # CONFIG_IP_TCPDIAG_IPV6 is not set # CONFIG_IPV6 is not set # CONFIG_NETFILTER is not set CONFIG_XFRM=y # CONFIG_XFRM_USER is not set # # SCTP Configuration (EXPERIMENTAL) # # CONFIG_IP_SCTP is not set # CONFIG_ATM is not set # CONFIG_BRIDGE is not set # CONFIG_VLAN_8021Q is not set # CONFIG_DECNET is not set CONFIG_LLC=y CONFIG_LLC2=y # CONFIG_IPX is not set # CONFIG_ATALK is not set # CONFIG_X25 is not set # CONFIG_LAPB is not set # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set # # QoS and/or fair queueing # # CONFIG_NET_SCHED is not set # CONFIG_NET_CLS_ROUTE is not set # # Network testing # # CONFIG_NET_PKTGEN is not set # CONFIG_NETPOLL is not 
set # CONFIG_NET_POLL_CONTROLLER is not set # CONFIG_HAMRADIO is not set # CONFIG_IRDA is not set # CONFIG_BT is not set CONFIG_NETDEVICES=y # CONFIG_DUMMY is not set # CONFIG_BONDING is not set # CONFIG_EQUALIZER is not set CONFIG_TUN=y CONFIG_ETHERTAP=y # CONFIG_NET_SB1000 is not set # # ARCnet devices # # CONFIG_ARCNET is not set # # Ethernet (10 or 100Mbit) # CONFIG_NET_ETHERNET=y CONFIG_MII=y # CONFIG_HAPPYMEAL is not set # CONFIG_SUNGEM is not set # CONFIG_NET_VENDOR_3COM is not set # CONFIG_LANCE is not set # CONFIG_NET_VENDOR_SMC is not set # CONFIG_NET_VENDOR_RACAL is not set # # Tulip family network device support # # CONFIG_NET_TULIP is not set # CONFIG_AT1700 is not set # CONFIG_DEPCA is not set # CONFIG_HP100 is not set # CONFIG_NET_ISA is not set CONFIG_NET_PCI=y # CONFIG_PCNET32 is not set # CONFIG_AMD8111_ETH is not set # CONFIG_ADAPTEC_STARFIRE is not set # CONFIG_AC3200 is not set # CONFIG_APRICOT is not set # CONFIG_B44 is not set # CONFIG_FORCEDETH is not set # CONFIG_CS89x0 is not set # CONFIG_DGRS is not set # CONFIG_EEPRO100 is not set # CONFIG_E100 is not set # CONFIG_FEALNX is not set # CONFIG_NATSEMI is not set # CONFIG_NE2K_PCI is not set # CONFIG_8139CP is not set # CONFIG_8139TOO is not set # CONFIG_SIS900 is not set # CONFIG_EPIC100 is not set # CONFIG_SUNDANCE is not set # CONFIG_TLAN is not set CONFIG_VIA_RHINE=y CONFIG_VIA_RHINE_MMIO=y # CONFIG_NET_POCKET is not set # # Ethernet (1000 Mbit) # # CONFIG_ACENIC is not set # CONFIG_DL2K is not set # CONFIG_E1000 is not set # CONFIG_NS83820 is not set # CONFIG_HAMACHI is not set # CONFIG_YELLOWFIN is not set # CONFIG_R8169 is not set # CONFIG_SK98LIN is not set # CONFIG_VIA_VELOCITY is not set # CONFIG_TIGON3 is not set # # Ethernet (10000 Mbit) # # CONFIG_IXGB is not set # CONFIG_S2IO is not set # # Token Ring devices # # CONFIG_TR is not set # # Wireless LAN (non-hamradio) # # CONFIG_NET_RADIO is not set # # Wan interfaces # # CONFIG_WAN is not set # CONFIG_FDDI is not set # 
CONFIG_HIPPI is not set # CONFIG_PPP is not set # CONFIG_SLIP is not set # CONFIG_NET_FC is not set # CONFIG_SHAPER is not set # CONFIG_NETCONSOLE is not set # # ISDN subsystem # # CONFIG_ISDN is not set # # Telephony Support # # CONFIG_PHONE is not set # # Input device support # CONFIG_INPUT=y # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y # CONFIG_INPUT_MOUSEDEV_PSAUX is not set CONFIG_INPUT_MOUSEDEV_SCREEN_X=1152 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=864 CONFIG_INPUT_JOYDEV=m # CONFIG_INPUT_TSDEV is not set CONFIG_INPUT_EVDEV=y # CONFIG_INPUT_EVBUG is not set # # Input I/O drivers # CONFIG_GAMEPORT=m CONFIG_SOUND_GAMEPORT=m CONFIG_GAMEPORT_NS558=m # CONFIG_GAMEPORT_L4 is not set # CONFIG_GAMEPORT_EMU10K1 is not set # CONFIG_GAMEPORT_VORTEX is not set # CONFIG_GAMEPORT_FM801 is not set CONFIG_GAMEPORT_CS461X=m CONFIG_SERIO=y CONFIG_SERIO_I8042=y CONFIG_SERIO_SERPORT=m # CONFIG_SERIO_CT82C710 is not set # CONFIG_SERIO_PCIPS2 is not set CONFIG_SERIO_LIBPS2=y # CONFIG_SERIO_RAW is not set # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y # CONFIG_KEYBOARD_SUNKBD is not set # CONFIG_KEYBOARD_LKKBD is not set # CONFIG_KEYBOARD_XTKBD is not set # CONFIG_KEYBOARD_NEWTON is not set CONFIG_INPUT_MOUSE=y CONFIG_MOUSE_PS2=y CONFIG_MOUSE_SERIAL=m # CONFIG_MOUSE_INPORT is not set # CONFIG_MOUSE_LOGIBM is not set # CONFIG_MOUSE_PC110PAD is not set # CONFIG_MOUSE_VSXXXAA is not set CONFIG_INPUT_JOYSTICK=y CONFIG_JOYSTICK_ANALOG=m # CONFIG_JOYSTICK_A3D is not set # CONFIG_JOYSTICK_ADI is not set # CONFIG_JOYSTICK_COBRA is not set # CONFIG_JOYSTICK_GF2K is not set # CONFIG_JOYSTICK_GRIP is not set # CONFIG_JOYSTICK_GRIP_MP is not set # CONFIG_JOYSTICK_GUILLEMOT is not set # CONFIG_JOYSTICK_INTERACT is not set # CONFIG_JOYSTICK_SIDEWINDER is not set # CONFIG_JOYSTICK_TMDC is not set # CONFIG_JOYSTICK_IFORCE is not set # CONFIG_JOYSTICK_WARRIOR is not set # CONFIG_JOYSTICK_MAGELLAN is not set # CONFIG_JOYSTICK_SPACEORB is not set # CONFIG_JOYSTICK_SPACEBALL is 
not set # CONFIG_JOYSTICK_STINGER is not set # CONFIG_JOYSTICK_TWIDDLER is not set # CONFIG_JOYSTICK_JOYDUMP is not set # CONFIG_INPUT_TOUCHSCREEN is not set CONFIG_INPUT_MISC=y CONFIG_INPUT_PCSPKR=m CONFIG_INPUT_UINPUT=m # # Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y # CONFIG_SERIAL_NONSTANDARD is not set # # Serial drivers # CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y # CONFIG_SERIAL_8250_ACPI is not set CONFIG_SERIAL_8250_NR_UARTS=2 CONFIG_SERIAL_8250_EXTENDED=y # CONFIG_SERIAL_8250_MANY_PORTS is not set CONFIG_SERIAL_8250_SHARE_IRQ=y CONFIG_SERIAL_8250_DETECT_IRQ=y # CONFIG_SERIAL_8250_MULTIPORT is not set # CONFIG_SERIAL_8250_RSA is not set # # Non-8250 serial port support # CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y CONFIG_UNIX98_PTYS=y # CONFIG_LEGACY_PTYS is not set # # IPMI # # CONFIG_IPMI_HANDLER is not set # # Watchdog Cards # # CONFIG_WATCHDOG is not set CONFIG_HW_RANDOM=y CONFIG_NVRAM=m CONFIG_RTC=y CONFIG_RTC_HISTOGRAM=y CONFIG_BLOCKER=y # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set # CONFIG_SONYPI is not set # # Ftape, the floppy tape device driver # # CONFIG_FTAPE is not set CONFIG_AGP=y # CONFIG_AGP_ALI is not set # CONFIG_AGP_ATI is not set # CONFIG_AGP_AMD is not set # CONFIG_AGP_AMD64 is not set # CONFIG_AGP_INTEL is not set # CONFIG_AGP_INTEL_MCH is not set # CONFIG_AGP_NVIDIA is not set # CONFIG_AGP_SIS is not set # CONFIG_AGP_SWORKS is not set CONFIG_AGP_VIA=y # CONFIG_AGP_EFFICEON is not set CONFIG_DRM=y # CONFIG_DRM_TDFX is not set # CONFIG_DRM_R128 is not set CONFIG_DRM_RADEON=y # CONFIG_DRM_MGA is not set # CONFIG_DRM_SIS is not set # CONFIG_MWAVE is not set # CONFIG_RAW_DRIVER is not set CONFIG_HPET=y # CONFIG_HPET_RTC_IRQ is not set CONFIG_HPET_MMAP=y # CONFIG_HANGCHECK_TIMER is not set # # I2C support # CONFIG_I2C=m CONFIG_I2C_CHARDEV=m # # I2C Algorithms # # CONFIG_I2C_ALGOBIT is not set # CONFIG_I2C_ALGOPCF is not set # CONFIG_I2C_ALGOPCA is not set # # I2C 
Hardware Bus support # # CONFIG_I2C_ALI1535 is not set # CONFIG_I2C_ALI1563 is not set # CONFIG_I2C_ALI15X3 is not set # CONFIG_I2C_AMD756 is not set # CONFIG_I2C_AMD8111 is not set # CONFIG_I2C_ELEKTOR is not set # CONFIG_I2C_I801 is not set # CONFIG_I2C_I810 is not set CONFIG_I2C_ISA=m # CONFIG_I2C_NFORCE2 is not set # CONFIG_I2C_PARPORT_LIGHT is not set # CONFIG_I2C_PIIX4 is not set # CONFIG_I2C_PROSAVAGE is not set # CONFIG_I2C_SAVAGE4 is not set # CONFIG_SCx200_ACB is not set # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set # CONFIG_I2C_STUB is not set # CONFIG_I2C_VIA is not set CONFIG_I2C_VIAPRO=m # CONFIG_I2C_VOODOO3 is not set # CONFIG_I2C_PCA_ISA is not set # # Hardware Sensors Chip support # CONFIG_I2C_SENSOR=m # CONFIG_SENSORS_ADM1021 is not set # CONFIG_SENSORS_ADM1025 is not set # CONFIG_SENSORS_ADM1026 is not set # CONFIG_SENSORS_ADM1031 is not set # CONFIG_SENSORS_ASB100 is not set # CONFIG_SENSORS_DS1621 is not set # CONFIG_SENSORS_FSCHER is not set # CONFIG_SENSORS_GL518SM is not set CONFIG_SENSORS_IT87=m # CONFIG_SENSORS_LM63 is not set # CONFIG_SENSORS_LM75 is not set # CONFIG_SENSORS_LM77 is not set # CONFIG_SENSORS_LM78 is not set # CONFIG_SENSORS_LM80 is not set # CONFIG_SENSORS_LM83 is not set # CONFIG_SENSORS_LM85 is not set # CONFIG_SENSORS_LM87 is not set # CONFIG_SENSORS_LM90 is not set # CONFIG_SENSORS_MAX1619 is not set # CONFIG_SENSORS_PC87360 is not set # CONFIG_SENSORS_SMSC47B397 is not set # CONFIG_SENSORS_SMSC47M1 is not set # CONFIG_SENSORS_VIA686A is not set # CONFIG_SENSORS_W83781D is not set # CONFIG_SENSORS_W83L785TS is not set # CONFIG_SENSORS_W83627HF is not set # # Other I2C Chip support # CONFIG_SENSORS_EEPROM=m # CONFIG_SENSORS_PCF8574 is not set # CONFIG_SENSORS_PCF8591 is not set # CONFIG_SENSORS_RTC8564 is not set # CONFIG_I2C_DEBUG_CORE is not set # CONFIG_I2C_DEBUG_ALGO is not set # CONFIG_I2C_DEBUG_BUS is not set # CONFIG_I2C_DEBUG_CHIP is not set # # Dallas's 1-wire bus 
# # CONFIG_W1 is not set # # Misc devices # # CONFIG_IBM_ASM is not set # # Multimedia devices # # CONFIG_VIDEO_DEV is not set # # Digital Video Broadcasting Devices # # CONFIG_DVB is not set # # Graphics support # # CONFIG_FB is not set CONFIG_VIDEO_SELECT=y # # Console display driver support # CONFIG_VGA_CONSOLE=y # CONFIG_MDA_CONSOLE is not set CONFIG_DUMMY_CONSOLE=y # # Sound # CONFIG_SOUND=m # # Advanced Linux Sound Architecture # CONFIG_SND=m CONFIG_SND_TIMER=m CONFIG_SND_PCM=m CONFIG_SND_HWDEP=m CONFIG_SND_RAWMIDI=m CONFIG_SND_SEQUENCER=m CONFIG_SND_SEQ_DUMMY=m CONFIG_SND_OSSEMUL=y CONFIG_SND_MIXER_OSS=m CONFIG_SND_PCM_OSS=m CONFIG_SND_SEQUENCER_OSS=y CONFIG_SND_RTCTIMER=m # CONFIG_SND_VERBOSE_PRINTK is not set # CONFIG_SND_DEBUG is not set # # Generic devices # CONFIG_SND_MPU401_UART=m CONFIG_SND_DUMMY=m CONFIG_SND_VIRMIDI=m # CONFIG_SND_MTPAV is not set CONFIG_SND_SERIAL_U16550=m CONFIG_SND_MPU401=m # # ISA devices # # CONFIG_SND_AD1816A is not set # CONFIG_SND_AD1848 is not set # CONFIG_SND_CS4231 is not set # CONFIG_SND_CS4232 is not set # CONFIG_SND_CS4236 is not set # CONFIG_SND_ES968 is not set # CONFIG_SND_ES1688 is not set # CONFIG_SND_ES18XX is not set # CONFIG_SND_GUSCLASSIC is not set # CONFIG_SND_GUSEXTREME is not set # CONFIG_SND_GUSMAX is not set # CONFIG_SND_INTERWAVE is not set # CONFIG_SND_INTERWAVE_STB is not set # CONFIG_SND_OPTI92X_AD1848 is not set # CONFIG_SND_OPTI92X_CS4231 is not set # CONFIG_SND_OPTI93X is not set # CONFIG_SND_SB8 is not set # CONFIG_SND_SB16 is not set # CONFIG_SND_SBAWE is not set # CONFIG_SND_WAVEFRONT is not set # CONFIG_SND_ALS100 is not set # CONFIG_SND_AZT2320 is not set # CONFIG_SND_CMI8330 is not set # CONFIG_SND_DT019X is not set # CONFIG_SND_OPL3SA2 is not set # CONFIG_SND_SGALAXY is not set # CONFIG_SND_SSCAPE is not set # # PCI devices # CONFIG_SND_AC97_CODEC=m # CONFIG_SND_ALI5451 is not set # CONFIG_SND_ATIIXP is not set # CONFIG_SND_ATIIXP_MODEM is not set # CONFIG_SND_AU8810 is not set # 
CONFIG_SND_AU8820 is not set # CONFIG_SND_AU8830 is not set # CONFIG_SND_AZT3328 is not set # CONFIG_SND_BT87X is not set CONFIG_SND_CS46XX=m CONFIG_SND_CS46XX_NEW_DSP=y # CONFIG_SND_CS4281 is not set CONFIG_SND_EMU10K1=m # CONFIG_SND_EMU10K1X is not set CONFIG_SND_CA0106=m # CONFIG_SND_KORG1212 is not set # CONFIG_SND_MIXART is not set # CONFIG_SND_NM256 is not set # CONFIG_SND_RME32 is not set # CONFIG_SND_RME96 is not set # CONFIG_SND_RME9652 is not set # CONFIG_SND_HDSP is not set # CONFIG_SND_TRIDENT is not set # CONFIG_SND_YMFPCI is not set # CONFIG_SND_ALS4000 is not set # CONFIG_SND_CMIPCI is not set # CONFIG_SND_ENS1370 is not set # CONFIG_SND_ENS1371 is not set # CONFIG_SND_ES1938 is not set # CONFIG_SND_ES1968 is not set # CONFIG_SND_MAESTRO3 is not set # CONFIG_SND_FM801 is not set # CONFIG_SND_ICE1712 is not set # CONFIG_SND_ICE1724 is not set # CONFIG_SND_INTEL8X0 is not set # CONFIG_SND_INTEL8X0M is not set # CONFIG_SND_SONICVIBES is not set CONFIG_SND_VIA82XX=m # CONFIG_SND_VIA82XX_MODEM is not set # CONFIG_SND_VX222 is not set # # USB devices # CONFIG_SND_USB_AUDIO=m # CONFIG_SND_USB_USX2Y is not set # # Open Sound System # # CONFIG_SOUND_PRIME is not set # # USB support # CONFIG_USB=y # CONFIG_USB_DEBUG is not set # # Miscellaneous USB options # CONFIG_USB_DEVICEFS=y # CONFIG_USB_BANDWIDTH is not set # CONFIG_USB_DYNAMIC_MINORS is not set # CONFIG_USB_SUSPEND is not set # CONFIG_USB_OTG is not set CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y # # USB Host Controller Drivers # CONFIG_USB_EHCI_HCD=m CONFIG_USB_EHCI_SPLIT_ISO=y CONFIG_USB_EHCI_ROOT_HUB_TT=y CONFIG_USB_OHCI_HCD=m CONFIG_USB_UHCI_HCD=m # CONFIG_USB_SL811_HCD is not set # # USB Device Class drivers # CONFIG_USB_AUDIO=m # CONFIG_USB_BLUETOOTH_TTY is not set CONFIG_USB_MIDI=m # CONFIG_USB_ACM is not set # CONFIG_USB_PRINTER is not set # # NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information # CONFIG_USB_STORAGE=m # 
CONFIG_USB_STORAGE_DEBUG is not set CONFIG_USB_STORAGE_RW_DETECT=y CONFIG_USB_STORAGE_DATAFAB=y CONFIG_USB_STORAGE_FREECOM=y CONFIG_USB_STORAGE_ISD200=y CONFIG_USB_STORAGE_DPCM=y CONFIG_USB_STORAGE_HP8200e=y CONFIG_USB_STORAGE_SDDR09=y CONFIG_USB_STORAGE_SDDR55=y CONFIG_USB_STORAGE_JUMPSHOT=y # # USB Input Devices # CONFIG_USB_HID=m CONFIG_USB_HIDINPUT=y # CONFIG_HID_FF is not set CONFIG_USB_HIDDEV=y # # USB HID Boot Protocol drivers # # CONFIG_USB_KBD is not set # CONFIG_USB_MOUSE is not set # CONFIG_USB_AIPTEK is not set # CONFIG_USB_WACOM is not set # CONFIG_USB_KBTAB is not set # CONFIG_USB_POWERMATE is not set # CONFIG_USB_MTOUCH is not set # CONFIG_USB_EGALAX is not set CONFIG_USB_XPAD=m # CONFIG_USB_ATI_REMOTE is not set # # USB Imaging devices # # CONFIG_USB_MDC800 is not set # CONFIG_USB_MICROTEK is not set # # USB Multimedia devices # # CONFIG_USB_DABUSB is not set # # Video4Linux support is needed for USB Multimedia device support # # # USB Network Adapters # # CONFIG_USB_CATC is not set # CONFIG_USB_KAWETH is not set # CONFIG_USB_PEGASUS is not set # CONFIG_USB_RTL8150 is not set # CONFIG_USB_USBNET is not set # # USB port drivers # # # USB Serial Converter support # CONFIG_USB_SERIAL=m CONFIG_USB_SERIAL_GENERIC=y CONFIG_USB_SERIAL_BELKIN=m CONFIG_USB_SERIAL_WHITEHEAT=m CONFIG_USB_SERIAL_DIGI_ACCELEPORT=m # CONFIG_USB_SERIAL_CYPRESS_M8 is not set # CONFIG_USB_SERIAL_EMPEG is not set # CONFIG_USB_SERIAL_FTDI_SIO is not set # CONFIG_USB_SERIAL_VISOR is not set CONFIG_USB_SERIAL_IPAQ=m # CONFIG_USB_SERIAL_IR is not set # CONFIG_USB_SERIAL_EDGEPORT is not set # CONFIG_USB_SERIAL_EDGEPORT_TI is not set # CONFIG_USB_SERIAL_GARMIN is not set # CONFIG_USB_SERIAL_IPW is not set CONFIG_USB_SERIAL_KEYSPAN_PDA=m CONFIG_USB_SERIAL_KEYSPAN=m CONFIG_USB_SERIAL_KEYSPAN_MPR=y CONFIG_USB_SERIAL_KEYSPAN_USA28=y CONFIG_USB_SERIAL_KEYSPAN_USA28X=y CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y CONFIG_USB_SERIAL_KEYSPAN_USA19=y 
CONFIG_USB_SERIAL_KEYSPAN_USA18X=y CONFIG_USB_SERIAL_KEYSPAN_USA19W=y CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y CONFIG_USB_SERIAL_KEYSPAN_USA49W=y CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y CONFIG_USB_SERIAL_KLSI=m # CONFIG_USB_SERIAL_KOBIL_SCT is not set CONFIG_USB_SERIAL_MCT_U232=m CONFIG_USB_SERIAL_PL2303=m CONFIG_USB_SERIAL_SAFE=m CONFIG_USB_SERIAL_SAFE_PADDED=y # CONFIG_USB_SERIAL_TI is not set # CONFIG_USB_SERIAL_CYBERJACK is not set # CONFIG_USB_SERIAL_XIRCOM is not set # CONFIG_USB_SERIAL_OMNINET is not set CONFIG_USB_EZUSB=y # # USB Miscellaneous drivers # CONFIG_USB_EMI62=m CONFIG_USB_EMI26=m # CONFIG_USB_AUERSWALD is not set CONFIG_USB_RIO500=m # CONFIG_USB_LEGOTOWER is not set CONFIG_USB_LCD=m CONFIG_USB_LED=m # CONFIG_USB_CYTHERM is not set # CONFIG_USB_PHIDGETKIT is not set # CONFIG_USB_PHIDGETSERVO is not set # CONFIG_USB_IDMOUSE is not set # CONFIG_USB_TEST is not set # # USB ATM/DSL drivers # # # USB Gadget Support # CONFIG_USB_GADGET=m # CONFIG_USB_GADGET_DEBUG_FILES is not set CONFIG_USB_GADGET_NET2280=y CONFIG_USB_NET2280=m # CONFIG_USB_GADGET_PXA2XX is not set # CONFIG_USB_GADGET_GOKU is not set # CONFIG_USB_GADGET_SA1100 is not set # CONFIG_USB_GADGET_LH7A40X is not set # CONFIG_USB_GADGET_DUMMY_HCD is not set # CONFIG_USB_GADGET_OMAP is not set CONFIG_USB_GADGET_DUALSPEED=y # CONFIG_USB_ZERO is not set # CONFIG_USB_ETH is not set # CONFIG_USB_GADGETFS is not set # CONFIG_USB_FILE_STORAGE is not set CONFIG_USB_G_SERIAL=m # # MMC/SD Card support # # CONFIG_MMC is not set # # InfiniBand support # # CONFIG_INFINIBAND is not set # # File systems # CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y CONFIG_EXT2_FS_SECURITY=y CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y CONFIG_EXT3_FS_SECURITY=y CONFIG_JBD=y CONFIG_JBD_DEBUG=y CONFIG_FS_MBCACHE=y # CONFIG_REISERFS_FS is not set # CONFIG_JFS_FS is not set CONFIG_FS_POSIX_ACL=y # # XFS support # # CONFIG_XFS_FS is not set # CONFIG_MINIX_FS 
is not set CONFIG_ROMFS_FS=m # CONFIG_QUOTA is not set CONFIG_DNOTIFY=y CONFIG_AUTOFS_FS=m CONFIG_AUTOFS4_FS=m # # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y CONFIG_JOLIET=y CONFIG_ZISOFS=y CONFIG_ZISOFS_FS=y CONFIG_UDF_FS=m CONFIG_UDF_NLS=y # # DOS/FAT/NT Filesystems # CONFIG_FAT_FS=m CONFIG_MSDOS_FS=m CONFIG_VFAT_FS=m CONFIG_FAT_DEFAULT_CODEPAGE=437 CONFIG_FAT_DEFAULT_IOCHARSET="ascii" CONFIG_NTFS_FS=m # CONFIG_NTFS_DEBUG is not set # CONFIG_NTFS_RW is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_SYSFS=y # CONFIG_DEVFS_FS is not set CONFIG_DEVPTS_FS_XATTR=y CONFIG_DEVPTS_FS_SECURITY=y CONFIG_TMPFS=y CONFIG_TMPFS_XATTR=y CONFIG_TMPFS_SECURITY=y CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y CONFIG_RAMFS=y # # Miscellaneous filesystems # # CONFIG_ADFS_FS is not set CONFIG_AFFS_FS=m CONFIG_HFS_FS=m CONFIG_HFSPLUS_FS=m # CONFIG_BEFS_FS is not set # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set CONFIG_CRAMFS=m # CONFIG_VXFS_FS is not set # CONFIG_HPFS_FS is not set # CONFIG_QNX4FS_FS is not set # CONFIG_SYSV_FS is not set # CONFIG_UFS_FS is not set # # Network File Systems # CONFIG_NFS_FS=m CONFIG_NFS_V3=y # CONFIG_NFS_V4 is not set # CONFIG_NFS_DIRECTIO is not set CONFIG_NFSD=m CONFIG_NFSD_V3=y # CONFIG_NFSD_V4 is not set # CONFIG_NFSD_TCP is not set CONFIG_LOCKD=m CONFIG_LOCKD_V4=y CONFIG_EXPORTFS=m CONFIG_SUNRPC=m # CONFIG_RPCSEC_GSS_KRB5 is not set # CONFIG_RPCSEC_GSS_SPKM3 is not set CONFIG_SMB_FS=m # CONFIG_SMB_NLS_DEFAULT is not set CONFIG_CIFS=m # CONFIG_CIFS_STATS is not set CONFIG_CIFS_XATTR=y CONFIG_CIFS_POSIX=y # CONFIG_CIFS_EXPERIMENTAL is not set # CONFIG_NCP_FS is not set # CONFIG_CODA_FS is not set # CONFIG_AFS_FS is not set # # Partition Types # CONFIG_PARTITION_ADVANCED=y # CONFIG_ACORN_PARTITION is not set # CONFIG_OSF_PARTITION is not set # CONFIG_AMIGA_PARTITION is not set CONFIG_ATARI_PARTITION=y CONFIG_MAC_PARTITION=y CONFIG_MSDOS_PARTITION=y # CONFIG_BSD_DISKLABEL is not set # CONFIG_MINIX_SUBPARTITION is not 
set # CONFIG_SOLARIS_X86_PARTITION is not set # CONFIG_UNIXWARE_DISKLABEL is not set CONFIG_LDM_PARTITION=y # CONFIG_LDM_DEBUG is not set # CONFIG_SGI_PARTITION is not set # CONFIG_ULTRIX_PARTITION is not set # CONFIG_SUN_PARTITION is not set # CONFIG_EFI_PARTITION is not set # # Native Language Support # CONFIG_NLS=y CONFIG_NLS_DEFAULT="utf8" CONFIG_NLS_CODEPAGE_437=m # CONFIG_NLS_CODEPAGE_737 is not set # CONFIG_NLS_CODEPAGE_775 is not set # CONFIG_NLS_CODEPAGE_850 is not set # CONFIG_NLS_CODEPAGE_852 is not set # CONFIG_NLS_CODEPAGE_855 is not set # CONFIG_NLS_CODEPAGE_857 is not set # CONFIG_NLS_CODEPAGE_860 is not set # CONFIG_NLS_CODEPAGE_861 is not set # CONFIG_NLS_CODEPAGE_862 is not set # CONFIG_NLS_CODEPAGE_863 is not set # CONFIG_NLS_CODEPAGE_864 is not set # CONFIG_NLS_CODEPAGE_865 is not set # CONFIG_NLS_CODEPAGE_866 is not set # CONFIG_NLS_CODEPAGE_869 is not set # CONFIG_NLS_CODEPAGE_936 is not set # CONFIG_NLS_CODEPAGE_950 is not set # CONFIG_NLS_CODEPAGE_932 is not set # CONFIG_NLS_CODEPAGE_949 is not set # CONFIG_NLS_CODEPAGE_874 is not set # CONFIG_NLS_ISO8859_8 is not set # CONFIG_NLS_CODEPAGE_1250 is not set # CONFIG_NLS_CODEPAGE_1251 is not set CONFIG_NLS_ASCII=m CONFIG_NLS_ISO8859_1=m # CONFIG_NLS_ISO8859_2 is not set # CONFIG_NLS_ISO8859_3 is not set # CONFIG_NLS_ISO8859_4 is not set # CONFIG_NLS_ISO8859_5 is not set # CONFIG_NLS_ISO8859_6 is not set # CONFIG_NLS_ISO8859_7 is not set # CONFIG_NLS_ISO8859_9 is not set # CONFIG_NLS_ISO8859_13 is not set # CONFIG_NLS_ISO8859_14 is not set CONFIG_NLS_ISO8859_15=m # CONFIG_NLS_KOI8_R is not set # CONFIG_NLS_KOI8_U is not set CONFIG_NLS_UTF8=m # # Profiling support # CONFIG_PROFILING=y # CONFIG_OPROFILE is not set # # Kernel hacking # CONFIG_DEBUG_KERNEL=y CONFIG_MAGIC_SYSRQ=y # CONFIG_SCHEDSTATS is not set # CONFIG_DEBUG_SLAB is not set CONFIG_DEBUG_PREEMPT=y CONFIG_WAKEUP_TIMING=y CONFIG_PREEMPT_TRACE=y # CONFIG_CRITICAL_PREEMPT_TIMING is not set # CONFIG_CRITICAL_IRQSOFF_TIMING is not set 
CONFIG_LATENCY_TIMING=y # CONFIG_LATENCY_TRACE is not set CONFIG_RT_DEADLOCK_DETECT=y # CONFIG_DEBUG_KOBJECT is not set CONFIG_DEBUG_BUGVERBOSE=y CONFIG_DEBUG_INFO=y # CONFIG_DEBUG_FS is not set CONFIG_USE_FRAME_POINTER=y CONFIG_FRAME_POINTER=y CONFIG_EARLY_PRINTK=y CONFIG_DEBUG_STACKOVERFLOW=y # CONFIG_KPROBES is not set CONFIG_DEBUG_STACK_USAGE=y # CONFIG_DEBUG_PAGEALLOC is not set CONFIG_4KSTACKS=y CONFIG_X86_FIND_SMP_CONFIG=y CONFIG_X86_MPPARSE=y # # Security options # # CONFIG_KEYS is not set CONFIG_SECURITY=y CONFIG_SECURITY_NETWORK=y CONFIG_SECURITY_CAPABILITIES=m # CONFIG_SECURITY_ROOTPLUG is not set CONFIG_SECURITY_SECLVL=m CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_SELINUX_BOOTPARAM=y CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=1 # CONFIG_SECURITY_SELINUX_DISABLE is not set CONFIG_SECURITY_SELINUX_DEVELOP=y CONFIG_SECURITY_SELINUX_AVC_STATS=y # CONFIG_SECURITY_SELINUX_MLS is not set # # Cryptographic options # CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_NULL=m CONFIG_CRYPTO_MD4=m CONFIG_CRYPTO_MD5=m CONFIG_CRYPTO_SHA1=m CONFIG_CRYPTO_SHA256=m CONFIG_CRYPTO_SHA512=m CONFIG_CRYPTO_WP512=m CONFIG_CRYPTO_DES=m CONFIG_CRYPTO_BLOWFISH=m CONFIG_CRYPTO_TWOFISH=m CONFIG_CRYPTO_SERPENT=m CONFIG_CRYPTO_AES_586=m CONFIG_CRYPTO_CAST5=m CONFIG_CRYPTO_CAST6=m CONFIG_CRYPTO_TEA=m CONFIG_CRYPTO_ARC4=m CONFIG_CRYPTO_KHAZAD=m CONFIG_CRYPTO_ANUBIS=m CONFIG_CRYPTO_DEFLATE=m CONFIG_CRYPTO_MICHAEL_MIC=m CONFIG_CRYPTO_CRC32C=m CONFIG_CRYPTO_TEST=m # # Hardware crypto devices # # CONFIG_CRYPTO_DEV_PADLOCK is not set # # Library routines # CONFIG_CRC_CCITT=m CONFIG_CRC32=y CONFIG_LIBCRC32C=m CONFIG_ZLIB_INFLATE=y CONFIG_ZLIB_DEFLATE=m CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_X86_BIOS_REBOOT=y CONFIG_PC=y ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-08 21:58 ` William Weston
@ 2005-02-09 11:51 ` Ingo Molnar
2005-02-10 2:13 ` William Weston
2005-02-09 12:48 ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Stephen Smalley
1 sibling, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-09 11:51 UTC (permalink / raw)
To: William Weston; +Cc: linux-kernel

* William Weston <weston@sysex.net> wrote:

> Jackd (-R -P64 -dalsa -dhw:0 -r44100 -p64 -n3 -i2 -o2) w/ one
> soft-synth client (using 15% to 30% of the CPU) will run for over 12
> hours without any xruns, even during kernel compiles and nightly
> updatedb runs.
>
> Running wmcube (an impractical, greedy, little CPU meter), even when
> niced, causes lots of xruns. It may be good for worst-case-scenario
> desktop load testing.

this phenomenon is very weird.

Firstly, make sure that all relevant threads (including the soundcard
IRQ thread, jackd threads, jack client thread, etc.) have higher RT
priority than any other, latency-irrelevant threads in the system.

If everything looks OK on the priority-administration side, could you
enable wakeup-latency tracing via:

CONFIG_WAKEUP_TIMING=y
CONFIG_PREEMPT_TRACE=y
# CONFIG_CRITICAL_PREEMPT_TIMING is not set
# CONFIG_CRITICAL_IRQSOFF_TIMING is not set
CONFIG_LATENCY_TIMING=y
CONFIG_LATENCY_TRACE=y

It should look like this in the Kernel Hacking menu of menuconfig:

[*] Wakeup latency timing
[ ] Non-preemptible critical section latency timing
[ ] Interrupts-off critical section latency timing
[*] Latency tracing

what is the longest wakeup latency the tracer shows? You can start the
measurement anew via:

echo 0 > /proc/sys/kernel/preempt_max_latency

every new maximum-latency event will be logged by the kernel, and the
trace of the latest worst-case latency path can be found under
/proc/latency_trace.

(If the trace is very long then most of the time it's OK to just send
the first 25 and last 10 lines.
Putting the trace up to a website is a
good solution too.)

it should not matter how 'greedy' wmcube is. Does it do a lot of graphics
activity (perhaps 3D too?) - that could in theory cause hardware
latencies - the latency traces will tell.

> MIDI playback through any MPU-401 interface triggers the following
> BUG, reported once for each outgoing MIDI event (non MPU-401 hw
> interfaces and sw interfaces not affected):

the patch below should fix this. (also included in -38-06 and later
kernels.)

	Ingo

--- linux/sound/drivers/mpu401/mpu401_uart.c.orig
+++ linux/sound/drivers/mpu401/mpu401_uart.c
@@ -316,12 +316,12 @@ static void snd_mpu401_uart_input_trigge
 		/* read data in advance */
 		/* prevent double enter via rawmidi->event callback */
 		if (atomic_dec_and_test(&mpu->rx_loop)) {
-			local_irq_save(flags);
+			local_irq_save_nort(flags);
 			if (spin_trylock(&mpu->input_lock)) {
 				snd_mpu401_uart_input_read(mpu);
 				spin_unlock(&mpu->input_lock);
 			}
-			local_irq_restore(flags);
+			local_irq_restore_nort(flags);
 		}
 		atomic_inc(&mpu->rx_loop);
 	} else {
@@ -407,12 +407,12 @@ static void snd_mpu401_uart_output_trigg
 		/* output pending data */
 		/* prevent double enter via rawmidi->event callback */
 		if (atomic_dec_and_test(&mpu->tx_loop)) {
-			local_irq_save(flags);
+			local_irq_save_nort(flags);
 			if (spin_trylock(&mpu->output_lock)) {
 				snd_mpu401_uart_output_write(mpu);
 				spin_unlock(&mpu->output_lock);
 			}
-			local_irq_restore(flags);
+			local_irq_restore_nort(flags);
 		}
 		atomic_inc(&mpu->tx_loop);
 	} else {
^ permalink raw reply [flat|nested] 125+ messages in thread
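For readers who have not met the `_nort` helpers used in the patch above, here is a rough user-space sketch of the idea, under the assumption that on a PREEMPT_RT build the `_nort` variants leave hardware interrupts enabled, while on a non-RT build they behave like the plain `local_irq_save()`/`local_irq_restore()`. The macro bodies, the `irqs_enabled` variable, and the `CONFIG_PREEMPT_RT_DEMO` switch are stand-ins invented for illustration; they are not the definitions from the -RT patch.

```c
#include <stdbool.h>

/* Hypothetical switch: remove this line to model a non-RT build. */
#define CONFIG_PREEMPT_RT_DEMO 1

/* Stand-in for the CPU interrupt-enable flag (illustration only). */
static bool irqs_enabled = true;

/* Mock of the ordinary primitives: save the state, then disable or restore. */
#define local_irq_save(flags) \
	do { (flags) = irqs_enabled; irqs_enabled = false; } while (0)
#define local_irq_restore(flags) \
	do { irqs_enabled = (flags); } while (0)

#ifdef CONFIG_PREEMPT_RT_DEMO
/* Assumed RT behaviour: record the state but leave interrupts enabled. */
# define local_irq_save_nort(flags) \
	do { (flags) = irqs_enabled; } while (0)
# define local_irq_restore_nort(flags) \
	do { (void)(flags); } while (0)
#else
/* Non-RT behaviour: identical to the plain primitives. */
# define local_irq_save_nort(flags)    local_irq_save(flags)
# define local_irq_restore_nort(flags) local_irq_restore(flags)
#endif

/* Reports whether interrupts were still enabled inside the guarded section. */
bool irqs_enabled_inside_section(void)
{
	unsigned long flags;
	bool inside;

	local_irq_save_nort(flags);
	inside = irqs_enabled;	/* the driver would try its lock here */
	local_irq_restore_nort(flags);
	return inside;
}
```

With the demo switch defined, the section runs with interrupts enabled; without it, the section runs with them disabled, which is the behaviour the unpatched driver had on every build.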
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-09 11:51 ` Ingo Molnar
@ 2005-02-10 2:13 ` William Weston
2005-02-10 7:52 ` Ingo Molnar
0 siblings, 1 reply; 125+ messages in thread
From: William Weston @ 2005-02-10 2:13 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

On Wed, 9 Feb 2005, Ingo Molnar wrote:

> > Running wmcube (an impractical, greedy, little CPU meter), even when
> > niced, causes lots of xruns. It may be good for worst-case-scenario
> > desktop load testing.
>
> this phenomenon is very weird.
>
> Firstly, make sure that all relevant threads (including the soundcard
> IRQ thread, jackd threads, jack client thread, etc.) have higher RT
> priority than any other, latency-irrelevant threads in the system.

Thanks for the tip. I have schedtool installed, and all audio/MIDI IRQ
threads, jack threads, and jack clients are now run with higher
priorities than everything else.

Before I adjusted priorities, I was getting a bunch of these when
running latencytest (which have since disappeared):

rtc: lost some interrupts at 8192Hz.
bug in rtc_read(): called in state S_IDLE!

IRQ 8 (RTC) is still giving me some issues, even after adjusting
priorities:

`IRQ 8'[232] is being piggy. need_resched=0, cpu=0
Read missed before next interrupt

Should the RTC IRQ be given a new priority? If so, should it be lower,
higher, or equal to the audio/MIDI/jack priorities?

> If everything looks OK on the priority-administration side, could you
> enable wakeup-latency tracing via:
>
> CONFIG_WAKEUP_TIMING=y
> CONFIG_PREEMPT_TRACE=y
> # CONFIG_CRITICAL_PREEMPT_TIMING is not set
> # CONFIG_CRITICAL_IRQSOFF_TIMING is not set
> CONFIG_LATENCY_TIMING=y
> CONFIG_LATENCY_TRACE=y

<snip>

> what is the longest wakeup latency the tracer shows? You can start the
> measurement anew via:
>
> echo 0 > /proc/sys/kernel/preempt_max_latency

Max latency is in the realm of 13-18 after runs of jack_test4.1.
> every new maximum-latency event will be logged by the kernel, and the
> trace of the latest worst-case latency path can be found under
> /proc/latency_trace.
>
> (If the trace is very long then most of the time it's OK to just send
> the first 25 and last 10 lines. Putting the trace up to a website is a
> good solution too.)

See http://www.sysex.net/testing/ for all of the test results and
system info on a 2.6.11-rc3-RT-V0.7.38-06 kernel.

This is from my most recent run of jack_test4.1 with wmcube and kernel
compilation running (check /testing/dmesg for more):

( sshd-5940 |#0): new 4 µs maximum-latency wakeup.
( IRQ 16-1803 |#0): new 5 µs maximum-latency wakeup.
( make-28375|#0): new 6 µs maximum-latency wakeup.
( ksoftirqd/0-2 |#0): new 6 µs maximum-latency wakeup.
( ksoftirqd/0-2 |#0): new 7 µs maximum-latency wakeup.
( ksoftirqd/0-2 |#0): new 8 µs maximum-latency wakeup.
( ksoftirqd/0-2 |#0): new 8 µs maximum-latency wakeup.
( ksoftirqd/0-2 |#0): new 9 µs maximum-latency wakeup.
( ksoftirqd/0-2 |#0): new 10 µs maximum-latency wakeup.
( ksoftirqd/0-2 |#0): new 10 µs maximum-latency wakeup.
( jackd-29348|#0): new 12 µs maximum-latency wakeup.
( jackd-29348|#0): new 14 µs maximum-latency wakeup.
( jackd-29348|#0): new 15 µs maximum-latency wakeup.

> it should not matter how 'greedy' wmcube is. Does it do a lot of graphics
> activity (perhaps 3D too?) - that could in theory cause hardware
> latencies - the latency traces will tell.

Wmcube displays a 3D spinning cube, which spins faster (actually
performs larger rotations between updates) when CPU usage goes up. When
running niced, wmcube uses about 1% to 4% of the CPU, adds about 1000
context switches per second, and increases X load by 1% to 3% of the
total CPU.

Now that the priorities are tuned, I get no xruns while running wmcube,
compiling a kernel, and running latencytest or jack_test4.1.
> > MIDI playback through any MPU-401 interface triggers the following
> > BUG, reported once for each outgoing MIDI event (non MPU-401 hw
> > interfaces and sw interfaces not affected):
>
> the patch below should fix this. (also included in -38-06 and later
> kernels.)
>
> 	Ingo
>
> --- linux/sound/drivers/mpu401/mpu401_uart.c.orig
> +++ linux/sound/drivers/mpu401/mpu401_uart.c
> @@ -316,12 +316,12 @@ static void snd_mpu401_uart_input_trigge
>  		/* read data in advance */
>  		/* prevent double enter via rawmidi->event callback */
>  		if (atomic_dec_and_test(&mpu->rx_loop)) {
> -			local_irq_save(flags);
> +			local_irq_save_nort(flags);
>  			if (spin_trylock(&mpu->input_lock)) {
>  				snd_mpu401_uart_input_read(mpu);
>  				spin_unlock(&mpu->input_lock);
>  			}
> -			local_irq_restore(flags);
> +			local_irq_restore_nort(flags);
>  		}
>  		atomic_inc(&mpu->rx_loop);
>  	} else {
> @@ -407,12 +407,12 @@ static void snd_mpu401_uart_output_trigg
>  		/* output pending data */
>  		/* prevent double enter via rawmidi->event callback */
>  		if (atomic_dec_and_test(&mpu->tx_loop)) {
> -			local_irq_save(flags);
> +			local_irq_save_nort(flags);
>  			if (spin_trylock(&mpu->output_lock)) {
>  				snd_mpu401_uart_output_write(mpu);
>  				spin_unlock(&mpu->output_lock);
>  			}
> -			local_irq_restore(flags);
> +			local_irq_restore_nort(flags);
>  		}
>  		atomic_inc(&mpu->tx_loop);
>  	} else {

This patch does fix the MIDI playback BUG I was seeing.

Best Regards,
--William Weston <weston at sysex.net>
^ permalink raw reply [flat|nested] 125+ messages in thread
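The "maximum-latency wakeup" lines in the dmesg excerpt above lend themselves to a quick mechanical summary. The following is a minimal sketch, not a tool from this thread: `max_wakeup_latency()` is a hypothetical helper that scans a log excerpt for lines of the shape `...): new <N> <unit> maximum-latency wakeup.` and returns the largest `<N>`. It assumes the log shape shown above and deliberately skips the unit token rather than interpreting it.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return the largest value reported by "): new <N> ... maximum-latency
 * wakeup" lines in a log excerpt, or -1 if no such line is found.
 * Hypothetical helper; the line format is assumed from the excerpt.
 */
long max_wakeup_latency(const char *log)
{
	static const char marker[] = "): new ";
	static const char tag[] = "maximum-latency wakeup";
	const char *p = log;
	long best = -1;

	while ((p = strstr(p, marker)) != NULL) {
		const char *eol, *hit;
		char *end;
		long val;

		p += sizeof(marker) - 1;	/* skip past "): new " */
		val = strtol(p, &end, 10);	/* parse the number, if any */
		eol = strchr(end, '\n');	/* end of the current line */
		hit = strstr(end, tag);		/* next occurrence of the tag */

		/* Count the value only if the tag sits on the same line. */
		if (end != p && hit != NULL &&
		    (eol == NULL || hit < eol) && val > best)
			best = val;
		p = end;
	}
	return best;
}
```

Fed the thirteen lines from the message above, the helper would report the final jackd entry as the worst case, which matches the 13-18 range William quotes for his runs.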
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-10 2:13 ` William Weston
@ 2005-02-10 7:52 ` Ingo Molnar
2005-02-10 20:21 ` George Anzinger
2005-03-03 19:36 ` [patch] Real-Time Preemption, deactivate() scheduling issue Eugeny S. Mints
0 siblings, 2 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-02-10 7:52 UTC (permalink / raw)
To: William Weston; +Cc: linux-kernel

* William Weston <weston@lysdexia.org> wrote:

> > what is the longest wakeup latency the tracer shows? You can start the
> > measurement anew via:
> >
> > echo 0 > /proc/sys/kernel/preempt_max_latency
>
> Max latency is in the realm of 13-18 after runs of jack_test4.1.

that's 13-18 microsecond worst-case delay from point of wakeup to the
point the woken up task has been context-switched to - pretty good for a
generic OS ;-)

> See http://www.sysex.net/testing/ for all of the test results and
> system info on a 2.6.11-rc3-RT-V0.7.38-06 kernel.

your latency traces look perfectly fine, and the jack_test results look
good too.

> Now that the priorities are tuned, I get no xruns while running
> wmcube, compiling a kernel, and running latencytest or jack_test4.1.

ah, very good!

Now that the setup is properly tuned for audio latencies, you might want
to try to push up the number of jack_test clients again, to see how far
you can go. Right now there's a ~50% DSP load with 14 clients, so maybe
you can push it up to 20 clients.

(for this test you'll likely want to turn off all options in the 'Kernel
Hacking' menu - they increase overhead. Otherwise you probably want to
run with the current options, so that you can send me BUG and latency
traces ;) )

> This patch does fix the MIDI playback BUG I was seeing.

ok.

	Ingo
^ permalink raw reply [flat|nested] 125+ messages in thread
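When repeating a measurement like the one discussed above from a test harness rather than a shell, the `echo 0 > /proc/sys/kernel/preempt_max_latency` step can be done programmatically. The sketch below is only an illustration: `reset_max_latency()` and `read_max_latency()` are hypothetical helper names, the path is passed in by the caller, and the assumption that the file holds a single decimal number (read here as microseconds, following the reply above) is not confirmed by the thread.

```c
#include <stdio.h>

/*
 * Write "0" to the given file, like `echo 0 > path`.
 * Returns 0 on success, -1 on failure.
 */
int reset_max_latency(const char *path)
{
	FILE *f = fopen(path, "w");

	if (f == NULL)
		return -1;
	if (fputs("0\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Read a single decimal value back from the file.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
long read_max_latency(const char *path)
{
	FILE *f = fopen(path, "r");
	long val;

	if (f == NULL)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

On a kernel with the tracer enabled, the path to pass would be the `/proc/sys/kernel/preempt_max_latency` file named in the thread; writing to it normally requires root.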
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-10 7:52 ` Ingo Molnar
@ 2005-02-10 20:21 ` George Anzinger
2005-02-10 20:40 ` Ingo Molnar
2005-02-11 0:09 ` Sven Dietrich
2005-03-03 19:36 ` [patch] Real-Time Preemption, deactivate() scheduling issue Eugeny S. Mints
1 sibling, 2 replies; 125+ messages in thread
From: George Anzinger @ 2005-02-10 20:21 UTC (permalink / raw)
To: Ingo Molnar; +Cc: William Weston, linux-kernel

If I want to write a patch that will work with or without the RT patch
applied is the following enough?

#ifndef RAW_SPIN_LOCK_UNLOCKED
typedef spinlock_t raw_spinlock_t;
#define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
#endif
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-10 20:21 ` George Anzinger
@ 2005-02-10 20:40 ` Ingo Molnar
2005-02-10 21:05 ` George Anzinger
2005-02-11 0:09 ` Sven Dietrich
1 sibling, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-10 20:40 UTC (permalink / raw)
To: George Anzinger; +Cc: William Weston, linux-kernel

* George Anzinger <george@mvista.com> wrote:

> If I want to write a patch that will work with or without the RT patch
> applied is the following enough?
>
> #ifndef RAW_SPIN_LOCK_UNLOCKED
> typedef spinlock_t raw_spinlock_t;
> #define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
> #endif

yeah. (but you should rather use DEFINE_SPINLOCK/DEFINE_RAW_SPINLOCK)

	Ingo
^ permalink raw reply [flat|nested] 125+ messages in thread
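The compatibility idea in this exchange can be sketched as a small shim that also covers the `DEFINE_SPINLOCK`/`DEFINE_RAW_SPINLOCK` form suggested in the reply. This is an illustration under stated assumptions, not code from the -RT patch: the `spinlock_t` struct, the `SPIN_LOCK_UNLOCKED` initializer, and the `DEFINE_SPINLOCK` macro below are user-space stand-ins for what a non-RT kernel header would provide, and `demo_lock` and `demo_lock_value()` are hypothetical names. The typedef names the existing type first, so that `raw_spinlock_t` becomes an alias for `spinlock_t`.

```c
/* ---- stand-ins for a non-RT kernel's definitions (illustration only) ---- */
typedef struct {
	volatile unsigned int lock;	/* 1 means unlocked in this mock */
} spinlock_t;
#define SPIN_LOCK_UNLOCKED	{ 1 }
#define DEFINE_SPINLOCK(x)	spinlock_t x = SPIN_LOCK_UNLOCKED

/* ---- compat shim: only takes effect when the raw names are missing ---- */
#ifndef DEFINE_RAW_SPINLOCK
typedef spinlock_t raw_spinlock_t;	/* alias the existing type */
#define RAW_SPIN_LOCK_UNLOCKED	SPIN_LOCK_UNLOCKED
#define DEFINE_RAW_SPINLOCK(x)	DEFINE_SPINLOCK(x)
#endif

/* A lock declared through the raw name; works with or without the shim. */
static DEFINE_RAW_SPINLOCK(demo_lock);

/* Hypothetical accessor so the initial state can be inspected. */
unsigned int demo_lock_value(void)
{
	return demo_lock.lock;
}
```

Keying the `#ifndef` on `DEFINE_RAW_SPINLOCK` rather than on the initializer macro is a design choice made for this sketch; either macro would serve as the "RT patch present" test, provided the patch defines it as a preprocessor macro.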
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-10 20:40 ` Ingo Molnar
@ 2005-02-10 21:05 ` George Anzinger
2005-02-11 8:34 ` Ingo Molnar
0 siblings, 1 reply; 125+ messages in thread
From: George Anzinger @ 2005-02-10 21:05 UTC (permalink / raw)
To: Ingo Molnar; +Cc: William Weston, linux-kernel

I am seeing:

kernel/built-in.o(.text+0x4974): In function `copy_mm':
/usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/kernel/fork.c:493: undefined reference to `__spin_is_locked'
kernel/built-in.o(.text+0x9f5a): In function `next_thread':
/usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/kernel/exit.c:877: undefined reference to `__raw_rwlock_is_locked'
net/built-in.o(.text+0x1258): In function `__sock_create':
/usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/net/socket.c:175: undefined reference to `__spin_is_locked'
net/built-in.o(.text+0x16b54): In function `dev_deactivate':
/usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/net/sched/sch_generic.c:594: undefined reference to `__spin_is_locked'
make[1]: *** [vmlinux] Error 1
make: *** [bzImage] Error 2

Possibly from:
#define __raw_spin_is_locked(x) (*(volatile signed char *)(&(x)->lock) <= 0)
#define __raw_spin_unlock_wait(x) \
	do { barrier(); } while(__spin_is_locked(x))
in asm/spinlock.h

should that be __raw_spin_is_locked(x) instead?
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-10 21:05 ` George Anzinger
@ 2005-02-11 8:34 ` Ingo Molnar
2005-02-11 9:38 ` Sven Dietrich
0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-11 8:34 UTC (permalink / raw)
To: George Anzinger; +Cc: William Weston, linux-kernel

* George Anzinger <george@mvista.com> wrote:

> Possibly from:
> #define __raw_spin_is_locked(x) (*(volatile signed char *)(&(x)->lock) <= 0)
> #define __raw_spin_unlock_wait(x) \
> 	do { barrier(); } while(__spin_is_locked(x))
> in asm/spinlock.h
>
> should that be __raw_spin_is_locked(x) instead?

yeah. Is this in the ARM patch? I haven't applied the ARM patch yet,
waiting to see Thomas Gleixner's generic-hardirq based one. (which is
more compelling from an architectural and long-term maintenance POV -
but also more work to address all of RMK's concerns.)

	Ingo
^ permalink raw reply [flat|nested] 125+ messages in thread
* RE: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-11 8:34 ` Ingo Molnar
@ 2005-02-11 9:38 ` Sven Dietrich
2005-02-11 9:42 ` Ingo Molnar
0 siblings, 1 reply; 125+ messages in thread
From: Sven Dietrich @ 2005-02-11 9:38 UTC (permalink / raw)
To: 'Ingo Molnar', 'George Anzinger'
Cc: 'William Weston', linux-kernel

No, this is not in arm. Here is the patch.

Index: linux-2.6.10/include/asm-i386/spinlock.h
===================================================================
--- linux-2.6.10.orig/include/asm-i386/spinlock.h	2005-02-11 09:25:39.224240321 +0000
+++ linux-2.6.10/include/asm-i386/spinlock.h	2005-02-11 09:25:58.006812173 +0000
@@ -30,7 +30,7 @@

 #define __raw_spin_is_locked(x)	(*(volatile signed char *)(&(x)->lock) <= 0)
 #define __raw_spin_unlock_wait(x) \
-	do { barrier(); } while(__spin_is_locked(x))
+	do { barrier(); } while(__raw_spin_is_locked(x))

 #define spin_lock_string \
 	"\n1:\t" \

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Ingo Molnar
> Sent: Friday, February 11, 2005 12:34 AM
> To: George Anzinger
> Cc: William Weston; linux-kernel@vger.kernel.org
> Subject: Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
>
>
>
> * George Anzinger <george@mvista.com> wrote:
>
> > Possibly from:
> > #define __raw_spin_is_locked(x) (*(volatile signed char *)(&(x)->lock) <= 0)
> > #define __raw_spin_unlock_wait(x) \
> > 	do { barrier(); } while(__spin_is_locked(x))
> > in asm/spinlock.h
> >
> > should that be __raw_spin_is_locked(x) instead?
>
> yeah. Is this in the ARM patch? I haven't applied the ARM
> patch yet, waiting to see Thomas Gleixner's generic-hardirq
> based one. (which is more compelling from an architectural
> and long-term maintenance POV - but also more work to
> address all of RMK's concerns.)
>
> 	Ingo
> -
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in the body of a message to
> majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-11 9:38 ` Sven Dietrich
@ 2005-02-11 9:42 ` Ingo Molnar
0 siblings, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-02-11 9:42 UTC (permalink / raw)
To: Sven Dietrich
Cc: 'George Anzinger', 'William Weston', linux-kernel

* Sven Dietrich <sdietrich@mvista.com> wrote:

> No, this is not in arm. Here is the patch.
>
> Index: linux-2.6.10/include/asm-i386/spinlock.h

what version do you have? The current released patch is
2.6.11-rc3-V0.7.38-10.

	Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* RE: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-10 20:21 ` George Anzinger
2005-02-10 20:40 ` Ingo Molnar
@ 2005-02-11 0:09 ` Sven Dietrich
2005-02-11 6:01 ` George Anzinger
2005-02-11 8:28 ` Ingo Molnar
1 sibling, 2 replies; 125+ messages in thread
From: Sven Dietrich @ 2005-02-11 0:09 UTC (permalink / raw)
To: george, 'Ingo Molnar'; +Cc: 'William Weston', linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1218 bytes --]

Hi George,

you may want to use this for reference.

This patch adds a config option to allow you to select whether timer IRQ runs in thread or not.

I'm not totally happy with the #ifdefs, but it may make switching back and forth easier.

Sven

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of
> George Anzinger
> Sent: Thursday, February 10, 2005 12:21 PM
> To: Ingo Molnar
> Cc: William Weston; linux-kernel@vger.kernel.org
> Subject: Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
>
>
> If I want to write a patch that will work with or without the
> RT patch applied
> is the following enough?
>
> #ifndef RAW_SPIN_LOCK_UNLOCKED
> typedef raw_spinlock_t spinlock_t
> #define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
> #endif
>
>
> --
> George Anzinger george@mvista.com
> High-res-timers: http://sourceforge.net/projects/high-res-timers/
>

[-- Attachment #2: common_timer_irqthread.patch --]
[-- Type: application/octet-stream, Size: 1956 bytes --]

Index: linux-2.6.10-Omap1710/include/linux/time.h
===================================================================
--- linux-2.6.10-Omap1710.orig/include/linux/time.h	2005-02-03 09:06:40.378530238 +0000
+++ linux-2.6.10-Omap1710/include/linux/time.h	2005-02-03 09:20:37.703894461 +0000
@@ -80,7 +80,20 @@

 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
-extern raw_seqlock_t xtime_lock;
+
+#ifndef ARCH_HAVE_XTIME_LOCK
+
+ #ifdef PREEMPT_TIMER_IRQ
+  #define XTIME_LOCK_T seqlock_t
+  #define DECLARE_XTIME_LOCK DECLARE_SEQLOCK(xtime_lock)
+ #else
+  #define XTIME_LOCK_T raw_seqlock_t
+  #define DECLARE_XTIME_LOCK DECLARE_RAW_SEQLOCK(xtime_lock)
+ #endif
+
+extern XTIME_LOCK_T xtime_lock;
+
+#endif

 static inline unsigned long get_seconds(void)
 {
Index: linux-2.6.10-Omap1710/kernel/timer.c
===================================================================
--- linux-2.6.10-Omap1710.orig/kernel/timer.c	2005-02-03 09:06:40.379529900 +0000
+++ linux-2.6.10-Omap1710/kernel/timer.c	2005-02-03 09:52:42.418866172 +0000
@@ -943,7 +943,7 @@
  * playing with xtime and avenrun.
  */
 #ifndef ARCH_HAVE_XTIME_LOCK
-DECLARE_RAW_SEQLOCK(xtime_lock);
+DECLARE_XTIME_LOCK;
 EXPORT_SYMBOL(xtime_lock);
 #endif

Index: linux-2.6.10-Omap1710/lib/Kconfig.RT
===================================================================
--- linux-2.6.10-Omap1710.orig/lib/Kconfig.RT	2005-02-03 09:06:40.379529900 +0000
+++ linux-2.6.10-Omap1710/lib/Kconfig.RT	2005-02-03 09:06:49.185545306 +0000
@@ -119,6 +119,14 @@

 	  Say N if you are unsure.

+config PREEMPT_TIMER_IRQ
+	bool "Run timer IRQ in a thread"
+	default y
+	depends on PREEMPT_HARDIRQS && ARM
+	help
+	  This declares the xtime_lock as a mutex and allows
+	  running the timer interrupt in a thread.
+
 config SPINLOCK_BKL
 	bool "Old-Style Big Kernel Lock"
 	depends on (PREEMPT || SMP) && !PREEMPT_RT

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-11 0:09 ` Sven Dietrich
@ 2005-02-11 6:01 ` George Anzinger
2005-02-11 8:28 ` Ingo Molnar
1 sibling, 0 replies; 125+ messages in thread
From: George Anzinger @ 2005-02-11 6:01 UTC (permalink / raw)
To: Sven Dietrich
Cc: 'Ingo Molnar', 'William Weston', linux-kernel

Sven Dietrich wrote:
> Hi George,
>
> you may want to use this for reference.
>
> This patch adds a config option to allow you to select whether timer IRQ runs in thread or not.
>
> I'm not totally happy with the #ifdefs, but it may make switching back and forth easier.

Thanks, but... You are addressing a different problem than I. I want to
code the VST patch to work in a system with or without the RT patch (it
is easy to work with the RT option on or off). The problem is setting up
the spin locks it needs. My solution assumes that RAW_SPIN_LOCK_UNLOCKED
will not be defined unless the RT patch is applied.

As to your patch, in most archs the timer interrupt does accounting
which requires input on just who was interrupted on the interrupt. This
is lost when threading the timer IRQ. I think it was problems of this
sort that caused Ingo to back away...

George

PS By the way, your mailer (Microsoft Outlook????) set up your
attachment in such a way that my mailer would not inline it. You might
want to look into this.

>
> Sven
>
>
>
>>-----Original Message-----
>>From: linux-kernel-owner@vger.kernel.org
>>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of
>>George Anzinger
>>Sent: Thursday, February 10, 2005 12:21 PM
>>To: Ingo Molnar
>>Cc: William Weston; linux-kernel@vger.kernel.org
>>Subject: Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
>>
>>
>>If I want to write a patch that will work with or without the
>>RT patch applied
>>is the following enough?
>>
>>#ifndef RAW_SPIN_LOCK_UNLOCKED
>>typedef raw_spinlock_t spinlock_t
>>#define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
>>#endif
>>
>>
>>--
>>George Anzinger george@mvista.com
>>High-res-timers: http://sourceforge.net/projects/high-res-timers/
>>

--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-11 0:09 ` Sven Dietrich
2005-02-11 6:01 ` George Anzinger
@ 2005-02-11 8:28 ` Ingo Molnar
2005-02-11 9:53 ` Sven Dietrich
1 sibling, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-11 8:28 UTC (permalink / raw)
To: Sven Dietrich; +Cc: george, 'William Weston', linux-kernel

* Sven Dietrich <sdietrich@mvista.com> wrote:

> This patch adds a config option to allow you to select whether timer
> IRQ runs in thread or not.

this patch only changes xtime_lock back and forth - it does in no way
impact the 'threadedness' of the timer IRQ. (it does not move the timer
IRQ into an interrupt thread.)

nor do we really want to make it configurable - it's non-threaded right
now and we'll see what effect this has on the worst-case latencies.

	Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* RE: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-11 8:28 ` Ingo Molnar
@ 2005-02-11 9:53 ` Sven Dietrich
2005-02-11 10:04 ` Ingo Molnar
0 siblings, 1 reply; 125+ messages in thread
From: Sven Dietrich @ 2005-02-11 9:53 UTC (permalink / raw)
To: 'Ingo Molnar'; +Cc: george, 'William Weston', linux-kernel

Ingo wrote:
>
> * Sven Dietrich <sdietrich@mvista.com> wrote:
>
> > This patch adds a config option to allow you to select
> whether timer
> > IRQ runs in thread or not.
>
> this patch only changes xtime_lock back and forth - it does
> in no way impact the 'threadedness' of the timer IRQ. (it
> does not move the timer IRQ into an interrupt thread.)
>
> nor do we really want to make it configurable - it's
> non-threaded right now and we'll see what effect this has on
> the worst-case latencies.
>
> 	Ingo
>

It's clear that there are all sorts of issues with process accounting
and other race conditions associated with running the timer in a thread.

The timer IRQ does have a noticeable impact especially on the slower
CPUs. In this domain, precise process time accounting may not be all
that important, as long as the scheduler does not get confused, and
that lone NODELAY IRQ doesn't get delayed (as much).

It would be nice if some of the process accounting could be pipelined
or deferred, but I don't have those answers right now.

Sven

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-11 9:53 ` Sven Dietrich
@ 2005-02-11 10:04 ` Ingo Molnar
2005-02-11 21:49 ` Steven Rostedt
0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-11 10:04 UTC (permalink / raw)
To: Sven Dietrich; +Cc: george, 'William Weston', linux-kernel

* Sven Dietrich <sdietrich@mvista.com> wrote:

> > this patch only changes xtime_lock back and forth - it does
> > in no way impact the 'threadedness' of the timer IRQ. (it
> > does not move the timer IRQ into an interrupt thread.)
> >
> > nor do we really want to make it configurable - it's
> > non-threaded right now and we'll see what effect this has on
> > the worst-case latencies.
>
> It's clear that there are all sorts of issues with process accounting
> and other race conditions associated with running the timer in a
> thread.
>
> The timer IRQ does have a noticeable impact especially on the slower
> CPUs. In this domain, precise process time accounting may not be all
> that important, as long as the scheduler does not get confused, and
> that lone NODELAY IRQ doesn't get delayed (as much).

well, i saved the delta when i removed threaded timer IRQs, find the
patch below, apply it with -R to -RT-V0.7.37-00 to get threaded irqs
back on x86. Right now i don't plan to reintroduce threaded timer IRQs
because it causes architecture merging problems (e.g. on x64 and MIPS)
and also caused artifacts. So the complexity vs. latency benefit is not
all that clear, especially at this stage. Also note that there were
unsolved problems wrt. time handling in the threaded setup.

(we can try it again later on. But if we do so it will have to be an
all-or-nothing item - #ifdef hell and behavioral divergence is to be
avoided.)

	Ingo

--- linux.old/Makefile
+++ linux.new/Makefile
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 11
-EXTRAVERSION =-rc2-RT-V0.7.36-06
+EXTRAVERSION =-rc2-RT-V0.7.37-00
 NAME=Woozy Numbat

 # *DOCUMENTATION*
--- linux.old/arch/i386/kernel/irq.c
+++ linux.new/arch/i386/kernel/irq.c
@@ -70,8 +70,6 @@ fastcall notrace unsigned int do_IRQ(str
 		}
 	}
 #endif
-	if (unlikely(!irq))
-		direct_timer_interrupt(regs);

 #ifdef CONFIG_4KSTACKS

--- linux.old/arch/i386/kernel/time.c
+++ linux.new/arch/i386/kernel/time.c
@@ -82,7 +82,7 @@ unsigned long cpu_khz; /* Detected as we

 extern unsigned long wall_jiffies;

-DEFINE_SPINLOCK(rtc_lock);
+DEFINE_RAW_SPINLOCK(rtc_lock);

 #include <asm/i8253.h>

@@ -217,19 +217,6 @@ unsigned long notrace profile_pc(struct
 EXPORT_SYMBOL(profile_pc);
 #endif

-#ifdef CONFIG_PREEMPT_HARDIRQS
-
-/*
- * If the timer is redirected then this is the minimal
- * interrupt-context processing we have to do:
- */
-void direct_timer_interrupt(struct pt_regs *regs)
-{
-	do_timer_interrupt_hook(regs);
-}
-
-#endif
-
 /*
  * timer_interrupt() needs to keep up the real-time clock,
  * as well as call the "do_timer()" routine every clocktick
@@ -254,9 +241,7 @@ static inline void do_timer_interrupt(in
 	}
 #endif

-#ifndef CONFIG_PREEMPT_HARDIRQS
 	do_timer_interrupt_hook(regs);
-#endif

 	/*
 	 * If we have an externally synchronized Linux clock, then update
@@ -313,7 +298,6 @@ irqreturn_t timer_interrupt(int irq, voi
 	write_seqlock(&xtime_lock);

 	cur_timer->mark_offset();
-	do_timer(regs);

 	do_timer_interrupt(irq, NULL, regs);

--- linux.old/arch/i386/mach-default/setup.c
+++ linux.new/arch/i386/mach-default/setup.c
@@ -71,7 +71,7 @@ void __init trap_init_hook(void)
 {
 }

-static struct irqaction irq0 = { timer_interrupt, SA_INTERRUPT, CPU_MASK_NONE, "timer", NULL, NULL};
+static struct irqaction irq0 = { timer_interrupt, SA_INTERRUPT | SA_NODELAY, CPU_MASK_NONE, "timer", NULL, NULL};

 /**
  * time_init_hook - do any specific initialisations for the system timer.
--- linux.old/drivers/char/rtc.c
+++ linux.new/drivers/char/rtc.c
@@ -380,6 +380,8 @@ static inline void rtc_close_event(void)

 irqreturn_t rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs)
 {
+	int mod;
+
 	/*
 	 * Can be an alarm interrupt, update complete interrupt,
 	 * or a periodic interrupt. We store the status in the
@@ -401,10 +403,13 @@ irqreturn_t rtc_interrupt(int irq, void
 		rtc_irq_data |= (CMOS_READ(RTC_INTR_FLAGS) & 0xF0);
 	}

+	mod = 0;
 	if (rtc_status & RTC_TIMER_ON)
-		mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);
+		mod = 1;

 	spin_unlock (&rtc_lock);
+	if (mod)
+		mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);

 	/* Now do the rest of the actions */
 	spin_lock(&rtc_task_lock);
@@ -569,8 +574,8 @@ static int rtc_do_ioctl(unsigned int cmd
 		if (rtc_status & RTC_TIMER_ON) {
 			spin_lock_irq (&rtc_lock);
 			rtc_status &= ~RTC_TIMER_ON;
-			del_timer(&rtc_irq_timer);
 			spin_unlock_irq (&rtc_lock);
+			del_timer(&rtc_irq_timer);
 		}
 		return 0;
 	}
@@ -588,9 +593,9 @@ static int rtc_do_ioctl(unsigned int cmd
 		if (!(rtc_status & RTC_TIMER_ON)) {
 			spin_lock_irq (&rtc_lock);
 			rtc_irq_timer.expires = jiffies + HZ/rtc_freq + 2*HZ/100;
-			add_timer(&rtc_irq_timer);
 			rtc_status |= RTC_TIMER_ON;
 			spin_unlock_irq (&rtc_lock);
+			add_timer(&rtc_irq_timer);
 		}
 		set_rtc_irq_bit(RTC_PIE);
 		return 0;
@@ -882,6 +887,7 @@ static int rtc_release(struct inode *ino
 {
 #ifdef RTC_IRQ
 	unsigned char tmp;
+	int del;

 	if (rtc_has_irq == 0)
 		goto no_irq;
@@ -900,11 +906,14 @@ static int rtc_release(struct inode *ino
 		CMOS_WRITE(tmp, RTC_CONTROL);
 		CMOS_READ(RTC_INTR_FLAGS);
 	}
+	del = 0;
 	if (rtc_status & RTC_TIMER_ON) {
 		rtc_status &= ~RTC_TIMER_ON;
-		del_timer(&rtc_irq_timer);
+		del = 1;
 	}
 	spin_unlock_irq(&rtc_lock);
+	if (del)
+		del_timer(&rtc_irq_timer);

 	if (file->f_flags & FASYNC) {
 		rtc_fasync (-1, file, 0);
@@ -981,6 +990,7 @@ int rtc_unregister(rtc_task_t *task)
 	return -EIO;
 #else
 	unsigned char tmp;
+	int del;

 	spin_lock_irq(&rtc_lock);
 	spin_lock(&rtc_task_lock);
@@ -1000,12 +1010,15 @@ int rtc_unregister(rtc_task_t *task)
 		CMOS_WRITE(tmp, RTC_CONTROL);
 		CMOS_READ(RTC_INTR_FLAGS);
 	}
+	del = 0;
 	if (rtc_status & RTC_TIMER_ON) {
 		rtc_status &= ~RTC_TIMER_ON;
-		del_timer(&rtc_irq_timer);
+		del = 1;
 	}
 	rtc_status &= ~RTC_IS_OPEN;
 	spin_unlock(&rtc_task_lock);
+	if (del)
+		del_timer(&rtc_irq_timer);
 	spin_unlock_irq(&rtc_lock);
 	return 0;
 #endif
@@ -1254,6 +1267,7 @@ module_exit(rtc_exit);
 static void rtc_dropped_irq(unsigned long data)
 {
 	unsigned long freq;
+	int mod;

 	spin_lock_irq (&rtc_lock);

@@ -1263,8 +1277,9 @@ static void rtc_dropped_irq(unsigned lon
 	}

 	/* Just in case someone disabled the timer from behind our back... */
+	mod = 0;
 	if (rtc_status & RTC_TIMER_ON)
-		mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);
+		mod = 1;

 	rtc_irq_data += ((rtc_freq/HZ)<<8);
 	rtc_irq_data &= ~0xff;
@@ -1273,6 +1288,8 @@ static void rtc_dropped_irq(unsigned lon
 	freq = rtc_freq;

 	spin_unlock_irq(&rtc_lock);
+	if (mod)
+		mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);

 	printk(KERN_WARNING "rtc: lost some interrupts at %ldHz.\n", freq);

--- linux.old/include/asm-i386/mach-default/do_timer.h
+++ linux.new/include/asm-i386/mach-default/do_timer.h
@@ -16,6 +16,7 @@

 static inline void do_timer_interrupt_hook(struct pt_regs *regs)
 {
+	do_timer(regs);
 #ifndef CONFIG_SMP
 	update_process_times(user_mode(regs));
 #endif
--- linux.old/include/linux/mc146818rtc.h
+++ linux.new/include/linux/mc146818rtc.h
@@ -17,7 +17,7 @@

 #ifdef __KERNEL__
 #include <linux/spinlock.h>		/* spinlock_t */
-extern spinlock_t rtc_lock;		/* serialize CMOS RAM access */
+extern raw_spinlock_t rtc_lock;		/* serialize CMOS RAM access */
 #endif

 /**********************************************************************
--- linux.old/include/linux/sched.h
+++ linux.new/include/linux/sched.h
@@ -39,10 +39,8 @@ extern int softirq_preemption;
 #endif
 #ifdef CONFIG_PREEMPT_HARDIRQS
 extern int hardirq_preemption;
-extern void direct_timer_interrupt(struct pt_regs *regs);
 #else
 # define hardirq_preemption 0
-# define direct_timer_interrupt(regs) do { } while (0)
 #endif

 #ifdef CONFIG_PREEMPT_BKL
--- linux.old/include/linux/time.h
+++ linux.new/include/linux/time.h
@@ -80,7 +80,7 @@ mktime (unsigned int year, unsigned int

 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
-extern seqlock_t xtime_lock;
+extern raw_seqlock_t xtime_lock;

 static inline unsigned long get_seconds(void)
 {
--- linux.old/kernel/timer.c
+++ linux.new/kernel/timer.c
@@ -852,14 +852,7 @@ void update_process_times(int user_tick)
  */
 static unsigned long count_active_tasks(void)
 {
-#ifdef CONFIG_PREEMPT_RT
-	/*
-	 * -1 for the timer IRQ thread:
-	 */
-	return (nr_running() - 1 + nr_uninterruptible()) * FIXED_1;
-#else
 	return (nr_running() + nr_uninterruptible()) * FIXED_1;
-#endif
 }

 /*
@@ -899,7 +892,7 @@ unsigned long wall_jiffies = INITIAL_JIF
  * playing with xtime and avenrun.
  */
 #ifndef ARCH_HAVE_XTIME_LOCK
-DECLARE_SEQLOCK(xtime_lock);
+DECLARE_RAW_SEQLOCK(xtime_lock);
 EXPORT_SYMBOL(xtime_lock);
 #endif


^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-11 10:04 ` Ingo Molnar
@ 2005-02-11 21:49 ` Steven Rostedt
2005-02-13 12:59 ` Ingo Molnar
0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-02-11 21:49 UTC (permalink / raw)
To: Ingo Molnar; +Cc: LKML

Ingo,

Here's a trivial patch to help others from freaking out when they see
on a show_trace that most of their processes are TASK_UNINTERRUPTIBLE.

Index: kernel/sched.c
===================================================================
--- kernel/sched.c	(revision 75)
+++ kernel/sched.c	(working copy)
@@ -4489,7 +4489,7 @@
 	task_t *relative;
 	unsigned state;
 	unsigned long free = 0;
-	static const char *stat_nam[] = { "R", "S", "D", "T", "t", "Z", "X" };
+	static const char *stat_nam[] = { "R", "M", "S", "D", "T", "t", "Z", "X" };

 	printk("%-13.13s [%p]", p->comm, p);
 	state = p->state ? __ffs(p->state) + 1 : 0;

I figure that "M" would be a good fit for TASK_RUNNING_MUTEX.

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-11 21:49 ` Steven Rostedt
@ 2005-02-13 12:59 ` Ingo Molnar
2005-02-13 15:11 ` Steven Rostedt
0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-13 12:59 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML

* Steven Rostedt <rostedt@goodmis.org> wrote:

> Ingo,
>
> Here's a trivial patch to help others from freaking out when they see
> on a show_trace that most of their processes are TASK_UNINTERRUPTIBLE.

thanks, applied it to -39-00.

> -	static const char *stat_nam[] = { "R", "S", "D", "T", "t", "Z", "X" };
> +	static const char *stat_nam[] = { "R", "M", "S", "D", "T", "t", "Z", "X" };

> I figure that "M" would be a good fit for TASK_RUNNING_MUTEX.

yeah - it's "M" already in fs/proc/array.c, but i missed the sched.c
case.

	Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-13 12:59 ` Ingo Molnar
@ 2005-02-13 15:11 ` Steven Rostedt
0 siblings, 0 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-02-13 15:11 UTC (permalink / raw)
To: Ingo Molnar; +Cc: LKML

On Sun, 2005-02-13 at 13:59 +0100, Ingo Molnar wrote:

> yeah - it's "M" already in fs/proc/array.c, but i missed the sched.c
> case.
>

You also missed the kernel/rt.c case :-)

-- Steve

Index: kernel/rt.c
===================================================================
--- kernel/rt.c	(revision 75)
+++ kernel/rt.c	(working copy)
@@ -207,6 +207,7 @@
 {
 	switch (p->state) {
 	case TASK_RUNNING:		printk("R"); break;
+	case TASK_RUNNING_MUTEX:	printk("M"); break;
 	case TASK_INTERRUPTIBLE:	printk("s"); break;
 	case TASK_UNINTERRUPTIBLE:	printk("D"); break;
 	case TASK_STOPPED:		printk("T"); break;

This is still from the 38-06.

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, deactivate() scheduling issue
2005-02-10 7:52 ` Ingo Molnar
2005-02-10 20:21 ` George Anzinger
@ 2005-03-03 19:36 ` Eugeny S. Mints
2005-03-03 22:32 ` Esben Nielsen
2005-03-29 8:45 ` Ingo Molnar
1 sibling, 2 replies; 125+ messages in thread
From: Eugeny S. Mints @ 2005-03-03 19:36 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2025 bytes --]

please consider the following scenario for a full RT kernel.

Task A is running, then an irq occurs which in turn wakes up an
irq-related thread (B) of a higher priority than A.

my current understanding is that the actual context switch between A and
B will occur at preempt_schedule_irq() on the "return from irq" path.

in this case the following "if" statement in __schedule() always returns
false since preempt_schedule_irq() always sets up PREEMPT_ACTIVE
before the __schedule() call.

	if ((prev->state & ~TASK_RUNNING_MUTEX) &&
			!(preempt_count() & PREEMPT_ACTIVE)) {

as a result deactivate() is never called for the preempted task A in
this scenario. But if task A is preempted while not in TASK_RUNNING
state such behaviour seems incorrect since we get a task in not
TASK_RUNNING state linked into a run queue.

An example:

drivers/net/irda/sir_dev.c: 76 (2.6.10 kernel)

	spin_lock_irqsave(&dev->tx_lock, flags); /* serialize with other tx operations */
	while (dev->tx_buff.len > 0) { /* wait until tx idle */
		spin_unlock_irqrestore(&dev->tx_lock, flags);
76:		set_current_state(TASK_UNINTERRUPTIBLE);
		schedule_timeout(msecs_to_jiffies(10));
		spin_lock_irqsave(&dev->tx_lock, flags);
	}

At line 76 irqs are enabled, preemption is enabled.
Let's assume task A executes this code and gets preempted right after
line 76. Task state is TASK_UNINTERRUPTIBLE but it will not be
deactivated. Of course this is a bug in set_current_state()
utilization in this particular driver but the scheduler should be
robust to such bugs I believe. There are a lot of such bugs in the
kernel I believe.

Not sure what the actual reason for the !(preempt_count() & PREEMPT_ACTIVE))
condition is but if it's just a sort of optimization (not removing a
task from the run queue if it was preempted in TASK_RUNNING state) then
probably it should be removed in order to preserve correctness. Patch
attached.

	Eugeny

[-- Attachment #2: sched.c.deactivate.patch --]
[-- Type: text/plain, Size: 503 bytes --]

--- sched.c.orig	2005-03-03 22:35:16.000000000 +0300
+++ sched.c	2005-03-03 22:34:58.000000000 +0300
@@ -2891,8 +2891,7 @@
 	spin_lock_irq(&rq->lock);

 	switch_count = &prev->nvcsw; // TODO: temporary - to see it in vmstat
-	if ((prev->state & ~TASK_RUNNING_MUTEX) &&
-			!(preempt_count() & PREEMPT_ACTIVE)) {
+	if ((prev->state & ~TASK_RUNNING_MUTEX)) {
 		switch_count = &prev->nvcsw;
 		if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
 				unlikely(signal_pending(prev))))

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, deactivate() scheduling issue
2005-03-03 19:36 ` [patch] Real-Time Preemption, deactivate() scheduling issue Eugeny S. Mints
@ 2005-03-03 22:32 ` Esben Nielsen
2005-03-04 11:56 ` Eugeny S. Mints
2005-03-29 8:45 ` Ingo Molnar
1 sibling, 1 reply; 125+ messages in thread
From: Esben Nielsen @ 2005-03-03 22:32 UTC (permalink / raw)
To: Eugeny S. Mints; +Cc: Ingo Molnar, linux-kernel

As I read the code the driver task (A) should _not_ be removed from the
runqueue. It has to be woken up to call schedule_timeout() so that it
gets back on the runqueue after 10 ms. If it is taken out of the
runqueue at line 76 it will stay off the runqueue forever in the
TASK_UNINTERRUPTIBLE state!

As I read the use of PREEMPT_ACTIVE, it is there to test whether this
rescheduling is voluntary or forced (a preemption). If it is forced the
task shall of course not go off the runqueue but stay there to run again
when it gets the highest priority. That is why PREEMPT_ACTIVE is set in
preempt_schedule() and preempt_schedule_irq(). On the other hand if the
task itself has called schedule() or schedule_timeout() it has to go out
of the runqueue and wait for some event to wake it up.

Yes there will be tasks in a state other than TASK_RUNNING on the
runqueue. The "bug" as I see it is in the scheduler interface: There is
no way to set the task state and call schedule() or schedule_timeout()
atomically. Therefore you can be preempted while the state is not
TASK_RUNNING.

Esben

On Thu, 3 Mar 2005, Eugeny S. Mints wrote:

> please consider the following scenario for a full RT kernel.
>
> Task A is running, then an irq occurs which in turn wakes up an
> irq-related thread (B) of a higher priority than A.
>
> my current understanding is that the actual context switch between A and
> B will occur at preempt_schedule_irq() on the "return from irq" path.
>
> in this case the following "if" statement in __schedule() always returns
> false since preempt_schedule_irq() always sets up PREEMPT_ACTIVE
> before the __schedule() call.
>
> 	if ((prev->state & ~TASK_RUNNING_MUTEX) &&
> 			!(preempt_count() & PREEMPT_ACTIVE)) {
>
> as a result deactivate() is never called for the preempted task A in
> this scenario. But if task A is preempted while not in TASK_RUNNING
> state such behaviour seems incorrect since we get a task in not
> TASK_RUNNING state linked into a run queue.
>
> An example:
>
> drivers/net/irda/sir_dev.c: 76 (2.6.10 kernel)
>
> 	spin_lock_irqsave(&dev->tx_lock, flags); /* serialize with other tx operations */
> 	while (dev->tx_buff.len > 0) { /* wait until tx idle */
> 		spin_unlock_irqrestore(&dev->tx_lock, flags);
> 76:		set_current_state(TASK_UNINTERRUPTIBLE);
> 		schedule_timeout(msecs_to_jiffies(10));
> 		spin_lock_irqsave(&dev->tx_lock, flags);
> 	}
>
> At line 76 irqs are enabled, preemption is enabled.
> Let's assume task A executes this code and gets preempted right after
> line 76. Task state is TASK_UNINTERRUPTIBLE but it will not be
> deactivated. Of course this is a bug in set_current_state()
> utilization in this particular driver but the scheduler should be
> robust to such bugs I believe. There are a lot of such bugs in the
> kernel I believe.
>
> Not sure what the actual reason for the !(preempt_count() & PREEMPT_ACTIVE))
> condition is but if it's just a sort of optimization (not removing a
> task from the run queue if it was preempted in TASK_RUNNING state) then
> probably it should be removed in order to preserve correctness. Patch
> attached.
>
> 	Eugeny
>
>

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, deactivate() scheduling issue
2005-03-03 22:32 ` Esben Nielsen
@ 2005-03-04 11:56 ` Eugeny S. Mints
2005-03-04 15:45 ` George Anzinger
0 siblings, 1 reply; 125+ messages in thread
From: Eugeny S. Mints @ 2005-03-04 11:56 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Ingo Molnar, linux-kernel

Esben Nielsen wrote:
> As I read the code the driver task (A) should _not_ be removed from the
> runqueue. It has to be woken up to call schedule_timeout() so that it
> gets back on the runqueue after 10 ms. If it is taken out of the
> runqueue at line 76 it will stay off the runqueue forever in the
> TASK_UNINTERRUPTIBLE state!

Exactly. This is definitely a bug in the driver code - a developer just
didn't care about proper utilization of set_current_state(). The driver
works only because, as you have described, it is his good fortune that
the scheduler doesn't remove a task in not TASK_RUNNING state from a run
queue. And my main question was - does everybody think it's ok to have a
task in not TASK_RUNNING state in a run queue. My current feeling is
that this should not be allowed.

> As I read the use of PREEMPT_ACTIVE, it is there to test whether this
> rescheduling is voluntary or forced (a preemption). If it is forced the
> task shall of course not go off the runqueue but stay there to run again
> when it gets the highest priority. That is why PREEMPT_ACTIVE is set in
> preempt_schedule() and preempt_schedule_irq(). On the other hand if the
> task itself has called schedule() or schedule_timeout() it has to go out
> of the runqueue and wait for some event to wake it up.

You're right - it works perfectly - but not for my test case - I believe
a task in not TASK_RUNNING state should be removed from a run queue by
the first (any - voluntary or forced) execution of schedule() which
detects that the task state is not TASK_RUNNING.

>
> Yes there will be tasks in a state other than TASK_RUNNING on the
> runqueue. The "bug" as I see it is in the scheduler interface: There is
> no way to set the task state and call schedule() or schedule_timeout()
> atomically. Therefore you can be preempted while the state is not
> TASK_RUNNING.

Exactly. IMO this interface is weird and needs rework. I don't
understand what the reason is to set the task state before the
schedule_timeout() call but not inside, right before the schedule(). The
actual task state may be passed as a parameter. As to tasks in not
TASK_RUNNING state in a run queue - I always believed the definition of
a run queue is - queue of tasks ready to run, i.e. in TASK_RUNNING
state.

	Eugeny

>
> Esben
>
>
> On Thu, 3 Mar 2005, Eugeny S. Mints wrote:
>
>
>>please consider the following scenario for a full RT kernel.
>>
>>Task A is running, then an irq occurs which in turn wakes up an
>>irq-related thread (B) of a higher priority than A.
>>
>>my current understanding is that the actual context switch between A and
>>B will occur at preempt_schedule_irq() on the "return from irq" path.
>>
>>in this case the following "if" statement in __schedule() always returns
>>false since preempt_schedule_irq() always sets up PREEMPT_ACTIVE
>>before the __schedule() call.
>>
>>	if ((prev->state & ~TASK_RUNNING_MUTEX) &&
>>			!(preempt_count() & PREEMPT_ACTIVE)) {
>>
>>as a result deactivate() is never called for the preempted task A in
>>this scenario. But if task A is preempted while not in TASK_RUNNING
>>state such behaviour seems incorrect since we get a task in not
>>TASK_RUNNING state linked into a run queue.
>>
>>An example:
>>
>>drivers/net/irda/sir_dev.c: 76 (2.6.10 kernel)
>>
>>	spin_lock_irqsave(&dev->tx_lock, flags); /* serialize with other tx operations */
>>	while (dev->tx_buff.len > 0) { /* wait until tx idle */
>>		spin_unlock_irqrestore(&dev->tx_lock, flags);
>>76:		set_current_state(TASK_UNINTERRUPTIBLE);
>>		schedule_timeout(msecs_to_jiffies(10));
>>		spin_lock_irqsave(&dev->tx_lock, flags);
>>	}
>>
>>At line 76 irqs are enabled, preemption is enabled.
>>Let's assume task A executes this code and gets preempted right after
>>line 76. Task state is TASK_UNINTERRUPTIBLE but it will not be
>>deactivated. Of course this is a bug in set_current_state()
>>utilization in this particular driver but the scheduler should be
>>robust to such bugs I believe. There are a lot of such bugs in the
>>kernel I believe.
>>
>>Not sure what the actual reason for the !(preempt_count() & PREEMPT_ACTIVE))
>>condition is but if it's just a sort of optimization (not removing a
>>task from the run queue if it was preempted in TASK_RUNNING state) then
>>probably it should be removed in order to preserve correctness. Patch
>>attached.
>>
>>	Eugeny
>>
>>
>
>
>
>

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, deactivate() scheduling issue
2005-03-04 11:56 ` Eugeny S. Mints
@ 2005-03-04 15:45 ` George Anzinger
0 siblings, 0 replies; 125+ messages in thread
From: George Anzinger @ 2005-03-04 15:45 UTC (permalink / raw)
To: Eugeny S. Mints; +Cc: Esben Nielsen, Ingo Molnar, linux-kernel

Eugeny S. Mints wrote:
> Esben Nielsen wrote:
>
>> As I read the code the driver task (A) should _not_ be removed from the
>> runqueue. It has to be woken up to call schedule_timeout() so that it gets
>> back on the runqueue after 10 ms. If it is taken out of the runqueue at
>> line 76 it will stay off the runqueue forever in the TASK_UNINTERRUPTIBLE
>> state!
>
> Exactly. This is definitely a bug in the driver code - the developer just
> didn't care about proper use of set_current_state(). The driver works
> only because, as you have described, it is lucky that the scheduler
> doesn't remove a task that is not in TASK_RUNNING state from a run queue.
> And my main question was - does everybody think it's ok to have a task
> that is not in TASK_RUNNING state on a run queue? My current feeling is
> that this should not be allowed.

This is the normal and specified way to handle this sort of thing. There is a
race issue that coding in this way avoids. The coding sequence is:

a) set the task state to some state other than TASK_RUNNING.
b) do whatever triggers the wake up. This may be several things, for
   example, an interrupt from some device OR a timeout.
c) call schedule to wait.

The race is getting to the schedule call before the wake up happens. If, for
some reason, the wake up condition happens prior to the schedule call, it
will set the task state back to TASK_RUNNING so that when the schedule() call
is made the scheduler will just return, which is the right thing (tm) to do as
the condition being waited on has happened.

We also note that disabling interrupts or preemption will NOT avoid the race
unless you disable interrupts on ALL cpus, which is a VERY expensive cross
cpu call.
>
>> As I read the use of PREEMPT_ACTIVE, it is there to test whether this
>> rescheduling is voluntary or forced (a preemption). If it is forced the
>> task shall of course not go off the runqueue but stay there to run again
>> when it gets the highest priority. That is why PREEMPT_ACTIVE is set in
>> preempt_schedule() and preempt_schedule_irq(). On the other hand if the
>> task itself has called schedule() or schedule_timeout() it has to go out
>> of the runqueue and wait for some event to wake it up.
>
> You are right - it works perfectly - but not for my test case - I believe
> a task that is not in TASK_RUNNING state should be removed from a run
> queue by the first (any - voluntary or forced) execution of schedule()
> which detects that the task state is not TASK_RUNNING.

This would cause the task to lose control prior to its setting up the needed
wakeup events.
>
>>
>> Yes there will be tasks in a state other than TASK_RUNNING on the runqueue.
>> The "bug" as I see it is in the scheduler interface: There is no way to
>> set the task state and call schedule() or schedule_timeout() atomically.
>> Therefore you can be preempted while the state is not TASK_RUNNING.
>
> Exactly. IMO this interface is weird and needs rework. I don't understand
> the reason for setting the task state before the schedule_timeout() call
> rather than inside it, right before the schedule(). The desired task state
> could be passed as a parameter.

You are assuming that the task ONLY wants to do a timeout. Most of the time
the timeout indicates an error condition. The timeout bounds the wait for what
is really desired, i.e. a device interrupt, some other task signaling, or some
such. Surely this is covered in the various driver writing guides...
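
The a)/b)/c) ordering described above, and the race it closes, can be modelled
in a few lines of user-space C. This is only a toy sketch under stated
assumptions, not kernel code: struct toy_task, the TOY_* states and the toy_*()
helpers are hypothetical names made up for illustration, and the run queue is
reduced to a single flag. toy_schedule() stands in for a voluntary schedule()
call.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy task states: the names echo the kernel's, the values are arbitrary. */
enum toy_state { TOY_RUNNING = 0, TOY_UNINTERRUPTIBLE = 2 };

/* Hypothetical stand-in for a task: a state plus "is on the run queue". */
struct toy_task {
    enum toy_state state;
    bool on_runqueue;
};

/* Step (a): mark the task as about to sleep. It stays on the run queue. */
static void toy_set_state(struct toy_task *t, enum toy_state s)
{
    t->state = s;
}

/* Wakeup side (interrupt, timer, another task): make the task runnable. */
static void toy_wake_up(struct toy_task *t)
{
    t->state = TOY_RUNNING;
    t->on_runqueue = true;
}

/*
 * Step (c): a voluntary schedule(). Returns true if the task really went
 * to sleep (was taken off the run queue), false if it returned at once
 * because a wakeup had already reset the state to TOY_RUNNING.
 */
static bool toy_schedule(struct toy_task *t)
{
    assert(t->on_runqueue);     /* only a queued task can be running here */
    if (t->state == TOY_RUNNING)
        return false;           /* wakeup won the race: nothing to wait for */
    t->on_runqueue = false;     /* dequeue until toy_wake_up() is called */
    return true;
}
```

If toy_wake_up() runs between toy_set_state() and toy_schedule(), then
toy_schedule() returns false and the task never leaves the run queue, which
mirrors the case where the wakeup wins the race. Without an intervening wakeup
it returns true and the task stays off the queue until toy_wake_up() is called.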
-- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, deactivate() scheduling issue
2005-03-03 19:36 ` [patch] Real-Time Preemption, deactivate() scheduling issue Eugeny S. Mints
2005-03-03 22:32 ` Esben Nielsen
@ 2005-03-29 8:45 ` Ingo Molnar
1 sibling, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-29 8:45 UTC (permalink / raw)
To: Eugeny S. Mints; +Cc: linux-kernel

* Eugeny S. Mints <emints@ru.mvista.com> wrote:

> please consider the following scenario for a full RT kernel.
>
> Task A is running when an irq occurs, which in turn wakes up an irq
> related thread (B) of a higher priority than A.
>
> my current understanding is that the actual context switch between A and
> B will occur at preempt_schedule_irq() on the "return from irq" path.
>
> in this case the following "if" statement in __schedule() always evaluates
> to false since preempt_schedule_irq() always sets PREEMPT_ACTIVE
> before the __schedule() call.
>
>     if ((prev->state & ~TASK_RUNNING_MUTEX) &&
>             !(preempt_count() & PREEMPT_ACTIVE)) {
>
> as a result deactivate() is never called for the preempted task A in this
> scenario. But if task A is preempted while not in TASK_RUNNING state,
> such behaviour seems incorrect since we get a task in a non-TASK_RUNNING
> state linked into a run queue.

this behavior is intentional: 'forced preemption' (of any sort, even in
the upstream kernel's CONFIG_PREEMPT model) should not impact the task's
state. So it does not modify p->state. [ The TASK_RUNNING_MUTEX state
furthermore enables wakeups to occur in an invariant way: even though
technically the tasks are on the runqueue, a 'normal' wakeup is still
noticed and later on acted upon.]

this is very important for forced preemption to not impact the coding
model of kernel code that is normally tested with !PREEMPT. (the
TASK_RUNNING_MUTEX scheduler feature furthermore enables us to preempt
without impacting wakeup logic.)
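
As a rough illustration of the check quoted above, the following user-space C
sketch mirrors the shape of the condition in __schedule(). The TOY_* bit values
are arbitrary placeholders (the real constants come from the kernel and the
-RT patch), and both helper functions are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values, chosen only so that the masks do not collide. */
#define TOY_TASK_RUNNING          0x00u
#define TOY_TASK_UNINTERRUPTIBLE  0x02u
#define TOY_TASK_RUNNING_MUTEX    0x80u
#define TOY_PREEMPT_ACTIVE        0x10000000u

/*
 * Same shape as the quoted test: take the task off the run queue only if
 * it is in a sleeping state (ignoring the RUNNING_MUTEX bit) AND this is
 * not a forced preemption.
 */
static bool toy_should_deactivate(unsigned int state,
                                  unsigned int preempt_count)
{
    return (state & ~TOY_TASK_RUNNING_MUTEX) &&
           !(preempt_count & TOY_PREEMPT_ACTIVE);
}

/*
 * Forced path: models a preempt_schedule_irq()-style caller that sets
 * PREEMPT_ACTIVE before entering the scheduler.
 */
static bool toy_forced_preempt_deactivates(unsigned int state,
                                           unsigned int preempt_count)
{
    assert(!(preempt_count & TOY_PREEMPT_ACTIVE)); /* not already forced */
    preempt_count |= TOY_PREEMPT_ACTIVE;
    return toy_should_deactivate(state, preempt_count);
}
```

With these placeholder values, a voluntary call with state
TOY_TASK_UNINTERRUPTIBLE and a clear preempt count is deactivated, while the
same state reached through the forced path is left on the run queue, which
matches the behavior described as intentional above.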
> An example:
>
> drivers/net/irda/sir_dev.c: 76 (2.6.10 kernel)
>
>     spin_lock_irqsave(&dev->tx_lock, flags);  /* serialize with other tx operations */
>     while (dev->tx_buff.len > 0) {            /* wait until tx idle */
>         spin_unlock_irqrestore(&dev->tx_lock, flags);
> 76:     set_current_state(TASK_UNINTERRUPTIBLE);
>         schedule_timeout(msecs_to_jiffies(10));
>         spin_lock_irqsave(&dev->tx_lock, flags);
>     }
>
> At line 76 irqs are enabled, preemption is enabled.
> Let us assume task A executes this code and gets preempted right after
> line 76. The task state is TASK_UNINTERRUPTIBLE but it will not be
> deactivated. Of course this is a bug in the use of set_current_state()
> in this particular driver, but the scheduler should be robust against
> such bugs, I believe. There are a lot of such bugs in the kernel, I
> believe.

it is not a problem to have tasks with TASK_UNINTERRUPTIBLE on the
runqueue - this happens every day with CONFIG_PREEMPT kernels, and it's
fully intentional. Can you see any bugs caused by this behavior?

    Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-08 21:58 ` William Weston
2005-02-09 11:51 ` Ingo Molnar
@ 2005-02-09 12:48 ` Stephen Smalley
2005-02-10 2:20 ` William Weston
1 sibling, 1 reply; 125+ messages in thread
From: Stephen Smalley @ 2005-02-09 12:48 UTC (permalink / raw)
To: William Weston; +Cc: Ingo Molnar, lkml, James Morris

On Tue, 2005-02-08 at 16:58, William Weston wrote:
> Hi Ingo,
>
> Great work on the -RT kernel! Here's a status report from my Athlon box
> w/ kernel -RT-2.6.11-rc3-V0.7.38-03, realtime-lsm-0.8.5, jack-0.99.48,
> alsa-1.0.8, and latencytest-0.5.5:
<snip>
> A couple BUGs are being logged (see below), but without any ill effect
> other than taking up space on my /var.
<snip>
> Network interface (via rhine) startup triggers these two BUGs:
>
> BUG: sleeping function called from invalid context ksoftirqd/0(2) at kernel/rt.c:1448
> in_atomic():1 [00000001], irqs_disabled():0
> [<c0103e77>] dump_stack+0x17/0x20 (12)
> [<c0119f89>] __might_sleep+0xd9/0xf0 (40)
> [<c0134816>] __spin_lock+0x36/0x50 (24)
> [<c0147914>] kmem_cache_alloc+0x34/0x120 (44)
> [<c01d3143>] sel_netif_lookup+0x63/0x150 (28)
> [<c01d32cd>] sel_netif_sids+0x2d/0xb0 (28)
> [<c01d01bc>] selinux_socket_sock_rcv_skb+0xac/0x230 (144)

I'm not sure I understand, as sel_netif_lookup passes GFP_ATOMIC to
kmalloc.

> [<c02fd248>] udp_queue_rcv_skb+0xb8/0x280 (28)
> [<c02fd8e2>] udp_rcv+0x192/0x3e0 (100)
> [<c02dc224>] ip_local_deliver+0x64/0x1c0 (32)
> [<c02dc595>] ip_rcv+0x215/0x3f0 (56)
> [<c02c201c>] netif_receive_skb+0x12c/0x160 (40)
> [<c02c20ce>] process_backlog+0x7e/0x110 (32)
> [<c02c21d2>] net_rx_action+0x72/0x130 (24)
> [<c0122428>] ___do_softirq+0x48/0xd0 (40)
> [<c012254b>] _do_softirq+0x1b/0x30 (8)
> [<c0122920>] ksoftirqd+0xa0/0xf0 (28)
> [<c01312fb>] kthread+0x8b/0xc0 (36)
> [<c01012f5>] kernel_thread_helper+0x5/0x10 (537116692)
> ---------------------------
> | preempt count: 00000002 ]
> | 2-level deep critical section nesting:
> ----------------------------------------
> .. [<c013dd3f>] .... __do_IRQ+0xef/0x180
> .....[<c0105306>] .. ( <= do_IRQ+0x56/0xa0)
> .. [<c0135240>] .... print_traces+0x10/0x40
> .....[<c0103e77>] .. ( <= dump_stack+0x17/0x20)

--
Stephen Smalley <sds@epoch.ncsc.mil>
National Security Agency

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 2005-02-09 12:48 ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Stephen Smalley @ 2005-02-10 2:20 ` William Weston 0 siblings, 0 replies; 125+ messages in thread From: William Weston @ 2005-02-10 2:20 UTC (permalink / raw) To: Stephen Smalley; +Cc: Ingo Molnar, lkml, James Morris Two more of these sel_netif_lookup related BUGs were found with -RT-2.6.11-rc3-V0.7.38-06: BUG: sleeping function called from invalid context ksoftirqd/0(2) at kernel/rt.c:1448 in_atomic():1 [00000001], irqs_disabled():0 [<c0104183>] dump_stack+0x23/0x30 (20) [<c011be08>] __might_sleep+0xd8/0xf0 (36) [<c0139008>] __spin_lock+0x38/0x60 (24) [<c013904d>] _spin_lock+0x1d/0x20 (16) [<c015089f>] kmem_cache_alloc+0x3f/0x140 (44) [<c01ea1e9>] sel_netif_lookup+0x69/0x160 (40) [<c01ea3a8>] sel_netif_sids+0x38/0xd0 (40) [<c01e6c13>] selinux_socket_sock_rcv_skb+0xc3/0x2a0 (152) [<c032da2a>] udp_queue_rcv_skb+0xca/0x2d0 (40) [<c032e168>] udp_rcv+0x1c8/0x430 (96) [<c030ab3c>] ip_local_deliver+0x6c/0x210 (36) [<c030af19>] ip_rcv+0x239/0x430 (56) [<c02ed257>] netif_receive_skb+0x147/0x180 (48) [<c02ed30f>] process_backlog+0x7f/0x110 (28) [<c02ed41c>] net_rx_action+0x7c/0x130 (32) [<c0124e37>] ___do_softirq+0x57/0xf0 (40) [<c0124f75>] _do_softirq+0x25/0x30 (8) [<c0125395>] ksoftirqd+0xa5/0x100 (28) [<c0135676>] kthread+0xa6/0xe0 (48) [<c0101329>] kernel_thread_helper+0x5/0xc (537116692) --------------------------- | preempt count: 00000002 ] | 2-level deep critical section nesting: ---------------------------------------- .. [<c0145d5b>] .... __do_IRQ+0xfb/0x1a0 .....[<c01058df>] .. ( <= do_IRQ+0x6f/0xb0) .. [<c013c5eb>] .... print_traces+0x1b/0x60 .....[<c0104183>] .. 
( <= dump_stack+0x23/0x30) BUG: sleeping function called from invalid context ksoftirqd/0(2) at kernel/rt.c:1448 in_atomic():1 [00000001], irqs_disabled():0 [<c0104183>] dump_stack+0x23/0x30 (20) [<c011be08>] __might_sleep+0xd8/0xf0 (36) [<c0139008>] __spin_lock+0x38/0x60 (24) [<c013904d>] _spin_lock+0x1d/0x20 (16) [<c015089f>] kmem_cache_alloc+0x3f/0x140 (44) [<c01ea1e9>] sel_netif_lookup+0x69/0x160 (40) [<c01ea3a8>] sel_netif_sids+0x38/0xd0 (40) [<c01e6c13>] selinux_socket_sock_rcv_skb+0xc3/0x2a0 (152) [<c0326c72>] tcp_v4_rcv+0x502/0x950 (76) [<c030ab3c>] ip_local_deliver+0x6c/0x210 (36) [<c030af19>] ip_rcv+0x239/0x430 (56) [<c02ed257>] netif_receive_skb+0x147/0x180 (48) [<c02ed30f>] process_backlog+0x7f/0x110 (28) [<c02ed41c>] net_rx_action+0x7c/0x130 (32) [<c0124e37>] ___do_softirq+0x57/0xf0 (40) [<c0124f75>] _do_softirq+0x25/0x30 (8) [<c0125395>] ksoftirqd+0xa5/0x100 (28) [<c0135676>] kthread+0xa6/0xe0 (48) [<c0101329>] kernel_thread_helper+0x5/0xc (537116692) --------------------------- | preempt count: 00000002 ] | 2-level deep critical section nesting: ---------------------------------------- .. [<c0145d5b>] .... __do_IRQ+0xfb/0x1a0 .....[<c01058df>] .. ( <= do_IRQ+0x6f/0xb0) .. [<c013c5eb>] .... print_traces+0x1b/0x60 .....[<c0104183>] .. ( <= dump_stack+0x23/0x30) Additional info about the system/kernel/config can be found at http://www.sysex.net/testing/ Best Regards, --William Weston <weston at sysex.net> On Wed, 9 Feb 2005, Stephen Smalley wrote: > On Tue, 2005-02-08 at 16:58, William Weston wrote: > > Hi Ingo, > > > > Great work on the -RT kernel! Here's a status report from my Athlon box > > w/ kernel -RT-2.6.11-rc3-V0.7.38-03, realtime-lsm-0.8.5, jack-0.99.48, > > alsa-1.0.8, and latencytest-0.5.5: > <snip> > > A couple BUGs are being logged (see below), but without any ill effect > > other than taking up space on my /var. 
> <snip> > > Network interface (via rhine) startup triggers these two BUGs: > > > > BUG: sleeping function called from invalid context ksoftirqd/0(2) at > > kernel/rt.c:1448 > > in_atomic():1 [00000001], irqs_disabled():0 > > [<c0103e77>] dump_stack+0x17/0x20 (12) > > [<c0119f89>] __might_sleep+0xd9/0xf0 (40) > > [<c0134816>] __spin_lock+0x36/0x50 (24) > > [<c0147914>] kmem_cache_alloc+0x34/0x120 (44) > > [<c01d3143>] sel_netif_lookup+0x63/0x150 (28) > > [<c01d32cd>] sel_netif_sids+0x2d/0xb0 (28) > > [<c01d01bc>] selinux_socket_sock_rcv_skb+0xac/0x230 (144) > > I'm not sure I understand, as sel_netif_lookup passes GFP_ATOMIC to > kmalloc. > > > [<c02fd248>] udp_queue_rcv_skb+0xb8/0x280 (28) > > [<c02fd8e2>] udp_rcv+0x192/0x3e0 (100) > > [<c02dc224>] ip_local_deliver+0x64/0x1c0 (32) > > [<c02dc595>] ip_rcv+0x215/0x3f0 (56) > > [<c02c201c>] netif_receive_skb+0x12c/0x160 (40) > > [<c02c20ce>] process_backlog+0x7e/0x110 (32) > > [<c02c21d2>] net_rx_action+0x72/0x130 (24) > > [<c0122428>] ___do_softirq+0x48/0xd0 (40) > > [<c012254b>] _do_softirq+0x1b/0x30 (8) > > [<c0122920>] ksoftirqd+0xa0/0xf0 (28) > > [<c01312fb>] kthread+0x8b/0xc0 (36) > > [<c01012f5>] kernel_thread_helper+0x5/0x10 (537116692) > > --------------------------- > > | preempt count: 00000002 ] > > | 2-level deep critical section nesting: > > ---------------------------------------- > > .. [<c013dd3f>] .... __do_IRQ+0xef/0x180 > > .....[<c0105306>] .. ( <= do_IRQ+0x56/0xa0) > > .. [<c0135240>] .... print_traces+0x10/0x40 > > .....[<c0103e77>] .. ( <= dump_stack+0x17/0x20) > > -- > Stephen Smalley <sds@epoch.ncsc.mil> > National Security Agency ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar ` (4 preceding siblings ...) 2005-02-08 21:58 ` William Weston @ 2005-02-19 5:08 ` Lee Revell 2005-02-19 6:47 ` Lee Revell ` (2 more replies) 2005-03-11 9:28 ` [patch] Real-Time Preemption, -RT-2.6.11-final-V0.7.40-00 Ingo Molnar 6 siblings, 3 replies; 125+ messages in thread From: Lee Revell @ 2005-02-19 5:08 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel On Fri, 2005-02-04 at 11:03 +0100, Ingo Molnar wrote: > http://redhat.com/~mingo/realtime-preempt/ > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02. preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02 -------------------------------------------------------------------- latency: 713 µs, #3455/3455, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1) ----------------- | task: ksoftirqd/0-2 (uid:0 nice:-10 policy:0 rt_prio:0) ----------------- _------=> CPU# / _-----=> irqs-off | / _----=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth |||| / ||||| delay cmd pid ||||| time | caller \ / ||||| \ | / kjournal-2478 0dn.4 0µs!: <756f6a6b> (<6c616e72>) kjournal-2478 0dn.4 0µs : __trace_start_sched_wakeup (try_to_wake_up) kjournal-2478 0dn.3 0µs : preempt_schedule (try_to_wake_up) kjournal-2478 0dn.3 0µs : try_to_wake_up <<...>-2> (69 73): kjournal-2478 0dn.2 0µs : preempt_schedule (try_to_wake_up) kjournal-2478 0dn.2 0µs : wake_up_process (do_softirq) kjournal-2478 0dn.1 1µs < (1) The repeating pattern is 8 of these: kjournal-2478 0.n.1 1µs : inverted_lock (journal_commit_transaction) kjournal-2478 0.n.1 1µs : __journal_unfile_buffer (journal_commit_transaction) kjournal-2478 0.n.1 1µs : journal_remove_journal_head (journal_commit_transaction) kjournal-2478 0.n.1 1µs : __journal_remove_journal_head (journal_remove_journal_head) kjournal-2478 0.n.1 1µs : 
__brelse (__journal_remove_journal_head) kjournal-2478 0.n.1 1µs : journal_free_journal_head (journal_remove_journal_head) kjournal-2478 0.n.1 2µs : kmem_cache_free (journal_free_journal_head) and one of these: kjournal-2478 0dn.1 9µs : cache_flusharray (kmem_cache_free) kjournal-2478 0dn.2 9µs : free_block (cache_flusharray) kjournal-2478 0dn.1 11µs : preempt_schedule (cache_flusharray) kjournal-2478 0dn.1 11µs : memmove (cache_flusharray) kjournal-2478 0dn.1 11µs : memcpy (memmove) etc. Finally: kjournal-2478 0dn.1 704µs : cache_flusharray (kmem_cache_free) kjournal-2478 0dn.2 704µs+: free_block (cache_flusharray) kjournal-2478 0dn.1 707µs : preempt_schedule (cache_flusharray) kjournal-2478 0dn.1 707µs : memmove (cache_flusharray) kjournal-2478 0dn.1 707µs : memcpy (memmove) kjournal-2478 0.n.1 708µs : inverted_lock (journal_commit_transaction) kjournal-2478 0.n.1 708µs : __journal_unfile_buffer (journal_commit_transaction) kjournal-2478 0.n.1 709µs : journal_remove_journal_head (journal_commit_transaction) kjournal-2478 0.n.1 709µs : __journal_remove_journal_head (journal_remove_journal_head) kjournal-2478 0.n.1 709µs : __brelse (__journal_remove_journal_head) kjournal-2478 0.n.1 709µs : journal_free_journal_head (journal_remove_journal_head) kjournal-2478 0.n.1 709µs : kmem_cache_free (journal_free_journal_head) kjournal-2478 0.n.. 710µs : preempt_schedule (journal_commit_transaction) kjournal-2478 0dn.. 710µs : __schedule (preempt_schedule) kjournal-2478 0dn.. 
710µs : profile_hit (__schedule) kjournal-2478 0dn.1 710µs : sched_clock (__schedule) kjournal-2478 0dn.2 711µs : dequeue_task (__schedule) kjournal-2478 0dn.2 711µs : recalc_task_prio (__schedule) kjournal-2478 0dn.2 711µs : effective_prio (recalc_task_prio) kjournal-2478 0dn.2 711µs : enqueue_task (__schedule) <...>-2 0d..2 712µs : __switch_to (__schedule) <...>-2 0d..2 712µs : __schedule <kjournal-2478> (73 69): <...>-2 0d..2 712µs : finish_task_switch (__schedule) <...>-2 0d..1 712µs : trace_stop_sched_switched (finish_task_switch) <...>-2 0d..1 712µs : trace_stop_sched_switched <<...>-2> (69 0): <...>-2 0d..1 713µs : trace_stop_sched_switched (finish_task_switch) Lee ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 2005-02-19 5:08 ` Lee Revell @ 2005-02-19 6:47 ` Lee Revell 2005-02-19 9:00 ` Ingo Molnar 2005-03-10 9:37 ` Steven Rostedt 2 siblings, 0 replies; 125+ messages in thread From: Lee Revell @ 2005-02-19 6:47 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel On Sat, 2005-02-19 at 00:08 -0500, Lee Revell wrote: > On Fri, 2005-02-04 at 11:03 +0100, Ingo Molnar wrote: > > http://redhat.com/~mingo/realtime-preempt/ > > > > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02. If I mount all filesystems with 'data=writeback', it works perfectly. I can run 'dbench 64', JACK with Hydrogen at 32 frames and have been unable to produce a single xrun. The maximum wakeup latency I have seen is 139us. With 'data=ordered', just launching a web browser can produce an xrun. Lee ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 2005-02-19 5:08 ` Lee Revell 2005-02-19 6:47 ` Lee Revell @ 2005-02-19 9:00 ` Ingo Molnar 2005-02-19 9:03 ` Ingo Molnar 2005-03-10 9:37 ` Steven Rostedt 2 siblings, 1 reply; 125+ messages in thread From: Ingo Molnar @ 2005-02-19 9:00 UTC (permalink / raw) To: Lee Revell; +Cc: linux-kernel, Andrew Morton * Lee Revell <rlrevell@joe-job.com> wrote: > On Fri, 2005-02-04 at 11:03 +0100, Ingo Molnar wrote: > > http://redhat.com/~mingo/realtime-preempt/ > > > > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02. could you send me the full trace? Ingo ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-19 9:00 ` Ingo Molnar
@ 2005-02-19 9:03 ` Ingo Molnar
2005-02-19 20:45 ` Lee Revell
2005-02-23 2:22 ` Lee Revell
0 siblings, 2 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-02-19 9:03 UTC (permalink / raw)
To: Lee Revell; +Cc: linux-kernel, Andrew Morton

* Ingo Molnar <mingo@elte.hu> wrote:

> > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.
>
> could you send me the full trace?

just in case the system in question is still running - could you also do
a 'verbose' trace via:

    echo 1 > /proc/sys/kernel/trace_verbose

and then copy /proc/latency_trace again? (so that we can see the
precise function call offsets - journal_commit_transaction() is a long
function.)

    Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 2005-02-19 9:03 ` Ingo Molnar @ 2005-02-19 20:45 ` Lee Revell 2005-02-20 0:19 ` Lee Revell 2005-03-17 16:33 ` Lee Revell 2005-02-23 2:22 ` Lee Revell 1 sibling, 2 replies; 125+ messages in thread From: Lee Revell @ 2005-02-19 20:45 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 729 bytes --] On Sat, 2005-02-19 at 10:03 +0100, Ingo Molnar wrote: > * Ingo Molnar <mingo@elte.hu> wrote: > > > > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long > > > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02. > > > > could you send me the full trace? > > just in case the system in question is still running - could you also do > a 'verbose' trace via: > > echo 1 > /proc/sys/kernel/trace_verbose OK, here is a 2912us verbose latency trace with "data=ordered", gzipped. dbench 32 or 64 is the easiest way to trigger these. I have not tried "data=journal". As previously stated "data=writeback" works perfectly - I ran JACK overnight while stressing the fs and did not get one xrun. Lee [-- Attachment #2: 2912us.gz --] [-- Type: application/x-gzip, Size: 56838 bytes --] ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 2005-02-19 20:45 ` Lee Revell @ 2005-02-20 0:19 ` Lee Revell 2005-03-17 16:33 ` Lee Revell 1 sibling, 0 replies; 125+ messages in thread From: Lee Revell @ 2005-02-20 0:19 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton On Sat, 2005-02-19 at 15:45 -0500, Lee Revell wrote: > I have not tried "data=journal". As previously stated "data=writeback" > works perfectly - I ran JACK overnight while stressing the fs and did > not get one xrun. "data=journal" has the same good performance as "data=writeback". Only the ordered data mode is affected. Lee ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 2005-02-19 20:45 ` Lee Revell 2005-02-20 0:19 ` Lee Revell @ 2005-03-17 16:33 ` Lee Revell 1 sibling, 0 replies; 125+ messages in thread From: Lee Revell @ 2005-03-17 16:33 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton On Sat, 2005-02-19 at 15:45 -0500, Lee Revell wrote: > On Sat, 2005-02-19 at 10:03 +0100, Ingo Molnar wrote: > > * Ingo Molnar <mingo@elte.hu> wrote: > > > > > > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long > > > > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02. > > > > > > could you send me the full trace? > > > > just in case the system in question is still running - could you also do > > a 'verbose' trace via: > > > > echo 1 > /proc/sys/kernel/trace_verbose > > OK, here is a 2912us verbose latency trace with "data=ordered", gzipped. > dbench 32 or 64 is the easiest way to trigger these. > > I have not tried "data=journal". As previously stated "data=writeback" > works perfectly - I ran JACK overnight while stressing the fs and did > not get one xrun. Any update on this? The problem is still apparent in 2.6.11. It seems to be a regression from 2.6.10. And now I've heard 2.6.12-rc1 mentioned with no motion on this. Here's the trace again in case you missed it: http://www.alsa-project.org/~rlrevell/2912us The "latency regressions" thread was all sub-millisecond stuff which can be ignored IMHO. Still interesting because they are regressions after all, but not a real world problem. However this one can be several milliseconds. It's a real problem. I'd hate to have to ship 2.6.12 with a disclaimer that ext3 with "data=ordered" is not suitable for the desktop (as it clearly violates the stated desktop responsiveness goal of 1ms). Lee ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 2005-02-19 9:03 ` Ingo Molnar 2005-02-19 20:45 ` Lee Revell @ 2005-02-23 2:22 ` Lee Revell 1 sibling, 0 replies; 125+ messages in thread From: Lee Revell @ 2005-02-23 2:22 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton On Sat, 2005-02-19 at 10:03 +0100, Ingo Molnar wrote: > * Ingo Molnar <mingo@elte.hu> wrote: > > > > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long > > > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02. > > > > could you send me the full trace? > On my other machine this 333us trace is the longest latency reported in the first few minutes with PREEMPT_DESKTOP. It seems to be a regression from earlier versions. If I read the trace right copy_pte_range is the problem. Lee preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02 -------------------------------------------------------------------- latency: 333 µs, #63/63, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1) ----------------- | task: XFree86-2593 (uid:0 nice:0 policy:0 rt_prio:0) ----------------- _------=> CPU# / _-----=> irqs-off | / _----=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth |||| / ||||| delay cmd pid ||||| time | caller \ / ||||| \ | / (T1/#0) dpkg 4362 0 5 00000006 00000000 [0000380181315825] 0.000ms (+3550398.796ms): <676b7064> (<00746500>) (T1/#2) dpkg 4362 0 5 00000006 00000002 [0000380181316227] 0.000ms (+0.000ms): __trace_start_sched_wakeup+0x96/0xc0 <c012cbe6> (try_to_wake_up+0x81/0x150 <c010f911>) (T1/#3) dpkg 4362 0 5 00000004 00000003 [0000380181316766] 0.001ms (+0.001ms): wake_up_state+0x1e/0x30 <c010fa5e> (signal_wake_up+0x2d/0x30 <c011f7bd>) (T1/#4) dpkg 4362 0 5 00000000 00000004 [0000380181317637] 0.003ms (+0.000ms): __wake_up+0xe/0x70 <c011059e> (mousedev_event+0xd8/0x140 <c0223ac8>) (T1/#5) dpkg 4362 0 5 00000001 00000005 [0000380181318080] 0.003ms (+0.001ms): __wake_up_common+0xb/0x70 <c011052b> 
(__wake_up+0x3b/0x70 <c01105cb>) (T1/#6) dpkg 4362 0 5 00000000 00000006 [0000380181318983] 0.005ms (+0.002ms): usb_submit_urb+0xe/0x2c0 <dcabaefe> (hid_irq_in+0x4e/0xe0 <dca7335e>) (T1/#7) dpkg 4362 0 5 00000000 00000007 [0000380181320688] 0.008ms (+0.001ms): hcd_submit_urb+0xe/0x200 <dcaba57e> (usb_submit_urb+0x1c6/0x2c0 <dcabb0b6>) (T1/#8) dpkg 4362 0 5 00000001 00000008 [0000380181321463] 0.009ms (+0.000ms): usb_get_dev+0x9/0x30 <dcab5939> (hcd_submit_urb+0x1a9/0x200 <dcaba719>) (T1/#9) dpkg 4362 0 5 00000001 00000009 [0000380181321943] 0.010ms (+0.000ms): get_device+0x8/0x30 <c02012d8> (usb_get_dev+0x19/0x30 <dcab5949>) (T1/#10) dpkg 4362 0 5 00000001 0000000a [0000380181322283] 0.010ms (+0.000ms): kobject_get+0x9/0x30 <c01d7869> (get_device+0x1a/0x30 <c02012ea>) (T1/#11) dpkg 4362 0 5 00000001 0000000b [0000380181322691] 0.011ms (+0.001ms): kref_get+0x9/0x60 <c01d8339> (kobject_get+0x19/0x30 <c01d7879>) (T1/#12) dpkg 4362 0 5 00000000 0000000c [0000380181323295] 0.012ms (+0.000ms): usb_get_urb+0x9/0x20 <dcabaed9> (hcd_submit_urb+0xc6/0x200 <dcaba636>) (T1/#13) dpkg 4362 0 5 00000000 0000000d [0000380181323566] 0.012ms (+0.001ms): kref_get+0x9/0x60 <c01d8339> (usb_get_urb+0x16/0x20 <dcabaee6>) (T1/#14) dpkg 4362 0 5 00000000 0000000e [0000380181324216] 0.013ms (+0.000ms): uhci_urb_enqueue+0xe/0x290 <dca6bf4e> (hcd_submit_urb+0x123/0x200 <dcaba693>) (T1/#15) dpkg 4362 0 5 00000001 0000000f [0000380181324743] 0.014ms (+0.000ms): uhci_find_urb_ep+0xe/0xb0 <dca6be9e> (uhci_urb_enqueue+0x7a/0x290 <dca6bfba>) (T1/#16) dpkg 4362 0 5 00000001 00000010 [0000380181325251] 0.015ms (+0.000ms): uhci_alloc_urb_priv+0xb/0x80 <dca6aebb> (uhci_urb_enqueue+0x87/0x290 <dca6bfc7>) (T1/#17) dpkg 4362 0 5 00000001 00000011 [0000380181325582] 0.016ms (+0.001ms): kmem_cache_alloc+0xb/0x70 <c013dc6b> (uhci_alloc_urb_priv+0x1c/0x80 <dca6aecc>) (T1/#18) dpkg 4362 0 5 00000001 00000012 [0000380181326332] 0.017ms (+0.000ms): usb_check_bandwidth+0xc/0x140 <dcaba2fc> 
(uhci_urb_enqueue+0x200/0x290 <dca6c140>) (T1/#19) dpkg 4362 0 5 00000001 00000013 [0000380181326926] 0.018ms (+0.001ms): usb_calc_bus_time+0x9/0x270 <dcaba089> (usb_check_bandwidth+0x6b/0x140 <dcaba35b>) (T1/#20) dpkg 4362 0 5 00000001 00000014 [0000380181327893] 0.020ms (+0.001ms): uhci_submit_common+0xe/0x380 <dca6b77e> (uhci_urb_enqueue+0x239/0x290 <dca6c179>) (T1/#21) dpkg 4362 0 5 00000001 00000015 [0000380181328984] 0.021ms (+0.001ms): uhci_alloc_td+0xb/0x80 <dca6a5bb> (uhci_submit_common+0xf0/0x380 <dca6b860>) (T1/#22) dpkg 4362 0 5 00000001 00000016 [0000380181329685] 0.023ms (+0.002ms): dma_pool_alloc+0xe/0x1a0 <c02051fe> (uhci_alloc_td+0x20/0x80 <dca6a5d0>) (T1/#23) dpkg 4362 0 5 00000001 00000017 [0000380181331207] 0.025ms (+0.000ms): usb_get_dev+0x9/0x30 <dcab5939> (uhci_alloc_td+0x69/0x80 <dca6a619>) (T1/#24) dpkg 4362 0 5 00000001 00000018 [0000380181331544] 0.026ms (+0.000ms): get_device+0x8/0x30 <c02012d8> (usb_get_dev+0x19/0x30 <dcab5949>) (T1/#25) dpkg 4362 0 5 00000001 00000019 [0000380181331882] 0.026ms (+0.000ms): kobject_get+0x9/0x30 <c01d7869> (get_device+0x1a/0x30 <c02012ea>) (T1/#26) dpkg 4362 0 5 00000001 0000001a [0000380181332215] 0.027ms (+0.000ms): kref_get+0x9/0x60 <c01d8339> (kobject_get+0x19/0x30 <c01d7879>) (T1/#27) dpkg 4362 0 5 00000001 0000001b [0000380181332606] 0.027ms (+0.001ms): uhci_add_td_to_urb+0x9/0x30 <dca6af39> (uhci_submit_common+0x10b/0x380 <dca6b87b>) (T1/#28) dpkg 4362 0 5 00000001 0000001c [0000380181333448] 0.029ms (+0.000ms): uhci_alloc_qh+0xb/0x70 <dca6a89b> (uhci_submit_common+0x1d7/0x380 <dca6b947>) (T1/#29) dpkg 4362 0 5 00000001 0000001d [0000380181333880] 0.030ms (+0.001ms): dma_pool_alloc+0xe/0x1a0 <c02051fe> (uhci_alloc_qh+0x20/0x70 <dca6a8b0>) (T1/#30) dpkg 4362 0 5 00000001 0000001e [0000380181334888] 0.031ms (+0.000ms): usb_get_dev+0x9/0x30 <dcab5939> (uhci_alloc_qh+0x60/0x70 <dca6a8f0>) (T1/#31) dpkg 4362 0 5 00000001 0000001f [0000380181335311] 0.032ms (+0.000ms): get_device+0x8/0x30 <c02012d8> 
(usb_get_dev+0x19/0x30 <dcab5949>) (T1/#32) dpkg 4362 0 5 00000001 00000020 [0000380181335644] 0.033ms (+0.000ms): kobject_get+0x9/0x30 <c01d7869> (get_device+0x1a/0x30 <c02012ea>) (T1/#33) dpkg 4362 0 5 00000001 00000021 [0000380181335972] 0.033ms (+0.000ms): kref_get+0x9/0x60 <c01d8339> (kobject_get+0x19/0x30 <c01d7879>) (T1/#34) dpkg 4362 0 5 00000001 00000022 [0000380181336517] 0.034ms (+0.000ms): uhci_insert_tds_in_qh+0xb/0x60 <dca6a76b> (uhci_submit_common+0x1f7/0x380 <dca6b967>) (T1/#35) dpkg 4362 0 5 00000001 00000023 [0000380181337025] 0.035ms (+0.001ms): uhci_insert_qh+0xb/0x90 <dca6a9ab> (uhci_submit_common+0x235/0x380 <dca6b9a5>) (T1/#36) dpkg 4362 0 5 00000001 00000024 [0000380181337741] 0.036ms (+0.001ms): usb_claim_bandwidth+0x8/0x40 <dcaba438> (uhci_urb_enqueue+0x178/0x290 <dca6c0b8>) (T1/#37) dpkg 4362 0 5 00000000 00000025 [0000380181338690] 0.038ms (+0.000ms): usb_free_urb+0x8/0x20 <dcabaeb8> (uhci_finish_urb+0x40/0x60 <dca6c9b0>) (T1/#38) dpkg 4362 0 5 00000000 00000026 [0000380181339041] 0.038ms (+0.001ms): kref_put+0xa/0xb0 <c01d839a> (usb_free_urb+0x1a/0x20 <dcabaeca>) (T1/#39) dpkg 4362 0 5 00000000 00000027 [0000380181339653] 0.039ms (+0.000ms): __wake_up+0xe/0x70 <c011059e> (uhci_irq+0x1cd/0x200 <dca6cc5d>) (T1/#40) dpkg 4362 0 5 00000001 00000028 [0000380181340175] 0.040ms (+0.001ms): __wake_up_common+0xb/0x70 <c011052b> (__wake_up+0x3b/0x70 <c01105cb>) (T1/#41) dpkg 4362 0 5 00000001 00000029 [0000380181341026] 0.042ms (+0.000ms): note_interrupt+0xb/0x90 <c01341db> (__do_IRQ+0x148/0x160 <c0133938>) (T1/#42) dpkg 4362 0 5 00000001 0000002a [0000380181341399] 0.042ms (+0.000ms): end_8259A_irq+0x8/0x40 <c0107c38> (__do_IRQ+0x110/0x160 <c0133900>) (T1/#43) dpkg 4362 0 5 00000001 0000002b [0000380181341746] 0.043ms (+0.002ms): enable_8259A_irq+0xb/0x80 <c0107d1b> (__do_IRQ+0x110/0x160 <c0133900>) (T1/#44) dpkg 4362 0 7 00000002 0000002c [0000380181343089] 0.045ms (+0.001ms): irq_exit+0x8/0x50 <c0119fb8> (do_IRQ+0x60/0x80 <c01041f0>) (T6/#45) 
dpkg-4362 0dn.2 46µs!< (1) (T1/#46) dpkg 4362 0 2 00000001 0000002e [0000380181504494] 0.314ms (+0.000ms): preempt_schedule+0xa/0x70 <c027d0ca> (copy_pte_range+0xb7/0x1c0 <c0142ad7>) (T1/#47) dpkg 4362 0 2 00000001 0000002f [0000380181504953] 0.315ms (+0.000ms): __cond_resched_raw_spinlock+0x8/0x50 <c0111398> (copy_pte_range+0xa7/0x1c0 <c0142ac7>) (T1/#48) dpkg 4362 0 2 00000000 00000030 [0000380181505442] 0.316ms (+0.001ms): __cond_resched+0x9/0x70 <c0111329> (__cond_resched_raw_spinlock+0x3d/0x50 <c01113cd>) (T1/#49) dpkg 4362 0 3 00000000 00000031 [0000380181506068] 0.317ms (+0.000ms): __schedule+0xe/0x630 <c027c98e> (__cond_resched+0x45/0x70 <c0111365>) (T1/#50) dpkg 4362 0 3 00000000 00000032 [0000380181506442] 0.317ms (+0.001ms): profile_hit+0x9/0x50 <c0115749> (__schedule+0x3a/0x630 <c027c9ba>) (T1/#51) dpkg 4362 0 3 00000001 00000033 [0000380181507130] 0.318ms (+0.001ms): sched_clock+0xe/0xe0 <c010c3ae> (__schedule+0x62/0x630 <c027c9e2>) (T1/#52) dpkg 4362 0 3 00000002 00000034 [0000380181508079] 0.320ms (+0.000ms): dequeue_task+0xa/0x50 <c010f4ea> (__schedule+0x1ab/0x630 <c027cb2b>) (T1/#53) dpkg 4362 0 3 00000002 00000035 [0000380181508503] 0.321ms (+0.000ms): recalc_task_prio+0xc/0x1a0 <c010f64c> (__schedule+0x1c5/0x630 <c027cb45>) (T1/#54) dpkg 4362 0 3 00000002 00000036 [0000380181509011] 0.321ms (+0.000ms): effective_prio+0x8/0x50 <c010f5f8> (recalc_task_prio+0xa6/0x1a0 <c010f6e6>) (T1/#55) dpkg 4362 0 3 00000002 00000037 [0000380181509402] 0.322ms (+0.001ms): enqueue_task+0xa/0x80 <c010f53a> (__schedule+0x1cc/0x630 <c027cb4c>) (T4/#56) [ => dpkg ] 0.324ms (+0.001ms) (T1/#57) <...> 2593 0 1 00000002 00000039 [0000380181511577] 0.326ms (+0.002ms): __switch_to+0xb/0x1a0 <c0100f5b> (__schedule+0x2bd/0x630 <c027cc3d>) (T3/#58) <...>-2593 0d..2 328µs : __schedule+0x2ea/0x630 <c027cc6a> <dpkg-4362> (75 73): (T1/#59) <...> 2593 0 1 00000002 0000003b [0000380181513468] 0.329ms (+0.000ms): finish_task_switch+0xc/0x90 <c010fdec> (__schedule+0x2f6/0x630 
<c027cc76>) (T1/#60) <...> 2593 0 1 00000001 0000003c [0000380181513919] 0.330ms (+0.000ms): trace_stop_sched_switched+0xa/0x150 <c012cc1a> (finish_task_switch+0x43/0x90 <c010fe23>) (T3/#61) <...>-2593 0d..1 330µs : trace_stop_sched_switched+0x42/0x150 <c012cc52> <<...>-2593> (73 0): (T1/#62) <...> 2593 0 1 00000001 0000003e [0000380181515016] 0.331ms (+0.000ms): trace_stop_sched_switched+0xfe/0x150 <c012cd0e> (finish_task_switch+0x43/0x90 <c010fe23>) vim:ft=help ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-02-19 5:08 ` Lee Revell
2005-02-19 6:47 ` Lee Revell
2005-02-19 9:00 ` Ingo Molnar
@ 2005-03-10 9:37 ` Steven Rostedt
2005-03-10 9:54 ` Steven Rostedt
2 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-10 9:37 UTC (permalink / raw)
To: Lee Revell; +Cc: Ingo Molnar, linux-kernel

Hi Ingo,

I noticed a problem with the bit_spin_locks that would probably explain the
kjournald latency problems. I'm working on a custom kernel based on yours
and I needed to temporarily remove the scheduler_tick from
update_process_times to implement some special scheduling needs. This
caused kjournald to go into an infinite loop.

Here's your bit_spin_lock:

static inline void bit_spin_lock(int bitnum, unsigned long *addr)
{
	/*
	 * Assuming the lock is uncontended, this never enters
	 * the body of the outer loop. If it is contended, then
	 * within the inner loop a non-atomic test is used to
	 * busywait with less bus contention for a good time to
	 * attempt to acquire the lock bit.
	 */
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
	while (test_and_set_bit(bitnum, addr))
		while (test_bit(bitnum, addr))
			cpu_relax();
#endif
	__acquire(bitlock);
}

You removed the preempt disable and added the CONFIG_PREEMPT. What happens
if a lower priority process gets the bit lock and gets preempted by a
higher priority process that then tries to get this lock? It spins until
its quota runs out.

This is what is happening to kjournald. A lower priority process gets the
bit lock and kjournald preempts it, causing kjournald to spin until its
quota is up and the other process can run and release the lock.

Now, luckily, in your kernel kjournald is not realtime FIFO. If it were,
you would then have a deadlock. Try it: I just set kjournald (using your
kernel) to FIFO prio 42 (prio 58 inside the kernel), and with a non-rt
task, I did a build of the kernel.
After a minute or two, all processes under the priority of kjournald were
starved out of the CPU, and kjournald was spinning. Make sure your
kjournald has a lower priority than your interrupt threads.

The culprit is jbd_lock_bh_state and jbd_lock_bh_journal_head, which call
bit_spin_lock.

Example of long latency (or deadlock):

journal_refile_buffer
--> spin_lock(&journal->j_list_lock);
--> journal_remove_journal_head(bh);
--> jbd_lock_bh_journal_head(bh);
--> bit_spin_lock(BH_JournalHead, &bh->b_state);

The short term fix is probably to put back the preempt_disables, the long
term is to get rid of these stupid bit_spin_lock busy loops.

-- Steve

On Sat, 19 Feb 2005, Lee Revell wrote:

> On Fri, 2005-02-04 at 11:03 +0100, Ingo Molnar wrote:
> > http://redhat.com/~mingo/realtime-preempt/
> >
>
> Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.
>
> preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02
> --------------------------------------------------------------------
> latency: 713 µs, #3455/3455, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
> -----------------
> | task: ksoftirqd/0-2 (uid:0 nice:-10 policy:0 rt_prio:0)
> -----------------
>
> _------=> CPU#
> / _-----=> irqs-off
> | / _----=> need-resched
> || / _---=> hardirq/softirq
> ||| / _--=> preempt-depth
> |||| /
> ||||| delay
> cmd pid ||||| time | caller
> \ / ||||| \ | /
> kjournal-2478 0dn.4 0µs!: <756f6a6b> (<6c616e72>)
> kjournal-2478 0dn.4 0µs : __trace_start_sched_wakeup (try_to_wake_up)
> kjournal-2478 0dn.3 0µs : preempt_schedule (try_to_wake_up)
> kjournal-2478 0dn.3 0µs : try_to_wake_up <<...>-2> (69 73):
> kjournal-2478 0dn.2 0µs : preempt_schedule (try_to_wake_up)
> kjournal-2478 0dn.2 0µs : wake_up_process (do_softirq)
> kjournal-2478 0dn.1 1µs < (1)
>
> The repeating pattern is 8 of these:
>
> kjournal-2478 0.n.1 1µs : inverted_lock (journal_commit_transaction)
> kjournal-2478 0.n.1 1µs : 
__journal_unfile_buffer (journal_commit_transaction) > kjournal-2478 0.n.1 1µs : journal_remove_journal_head (journal_commit_transaction) > kjournal-2478 0.n.1 1µs : __journal_remove_journal_head (journal_remove_journal_head) > kjournal-2478 0.n.1 1µs : __brelse (__journal_remove_journal_head) > kjournal-2478 0.n.1 1µs : journal_free_journal_head (journal_remove_journal_head) > kjournal-2478 0.n.1 2µs : kmem_cache_free (journal_free_journal_head) > > and one of these: > > kjournal-2478 0dn.1 9µs : cache_flusharray (kmem_cache_free) > kjournal-2478 0dn.2 9µs : free_block (cache_flusharray) > kjournal-2478 0dn.1 11µs : preempt_schedule (cache_flusharray) > kjournal-2478 0dn.1 11µs : memmove (cache_flusharray) > kjournal-2478 0dn.1 11µs : memcpy (memmove) > > etc. Finally: > > kjournal-2478 0dn.1 704µs : cache_flusharray (kmem_cache_free) > kjournal-2478 0dn.2 704µs+: free_block (cache_flusharray) > kjournal-2478 0dn.1 707µs : preempt_schedule (cache_flusharray) > kjournal-2478 0dn.1 707µs : memmove (cache_flusharray) > kjournal-2478 0dn.1 707µs : memcpy (memmove) > kjournal-2478 0.n.1 708µs : inverted_lock (journal_commit_transaction) > kjournal-2478 0.n.1 708µs : __journal_unfile_buffer (journal_commit_transaction) > kjournal-2478 0.n.1 709µs : journal_remove_journal_head (journal_commit_transaction) > kjournal-2478 0.n.1 709µs : __journal_remove_journal_head (journal_remove_journal_head) > kjournal-2478 0.n.1 709µs : __brelse (__journal_remove_journal_head) > kjournal-2478 0.n.1 709µs : journal_free_journal_head (journal_remove_journal_head) > kjournal-2478 0.n.1 709µs : kmem_cache_free (journal_free_journal_head) > kjournal-2478 0.n.. 710µs : preempt_schedule (journal_commit_transaction) > kjournal-2478 0dn.. 710µs : __schedule (preempt_schedule) > kjournal-2478 0dn.. 
710µs : profile_hit (__schedule) > kjournal-2478 0dn.1 710µs : sched_clock (__schedule) > kjournal-2478 0dn.2 711µs : dequeue_task (__schedule) > kjournal-2478 0dn.2 711µs : recalc_task_prio (__schedule) > kjournal-2478 0dn.2 711µs : effective_prio (recalc_task_prio) > kjournal-2478 0dn.2 711µs : enqueue_task (__schedule) > <...>-2 0d..2 712µs : __switch_to (__schedule) > <...>-2 0d..2 712µs : __schedule <kjournal-2478> (73 69): > <...>-2 0d..2 712µs : finish_task_switch (__schedule) > <...>-2 0d..1 712µs : trace_stop_sched_switched (finish_task_switch) > <...>-2 0d..1 712µs : trace_stop_sched_switched <<...>-2> (69 0): > <...>-2 0d..1 713µs : trace_stop_sched_switched (finish_task_switch) > > Lee > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 125+ messages in thread
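[Editorial sketch, not part of the original thread.] The spin loop Steven
quotes can be modelled in userspace with C11 atomics, which makes the
failure mode easier to see: the waiter does nothing but poll the bit, so if
it outranks the holder on the same CPU, the holder never runs to clear it.
The helper names mirror the kernel ones, but the bodies below are
hypothetical stand-ins built on <stdatomic.h>, not the kernel
implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins for the kernel bit operations. */
static int test_and_set_bit(int bitnum, atomic_ulong *addr)
{
	unsigned long mask = 1UL << bitnum;

	/* Atomically set the bit and report whether it was already set. */
	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

static int test_bit(int bitnum, atomic_ulong *addr)
{
	/* Plain read, used only to poll while the lock is contended. */
	return (atomic_load_explicit(addr, memory_order_relaxed) >> bitnum) & 1UL;
}

static void bit_spin_lock(int bitnum, atomic_ulong *addr)
{
	while (test_and_set_bit(bitnum, addr))
		while (test_bit(bitnum, addr))
			; /* busy-wait: nothing here lets the lock holder run */
}

static int bit_spin_trylock(int bitnum, atomic_ulong *addr)
{
	/* 1 if the lock was taken, 0 if someone else already holds it. */
	return !test_and_set_bit(bitnum, addr);
}

static void bit_spin_unlock(int bitnum, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << bitnum));
}
```

With the holder preempted by a SCHED_FIFO waiter on a single CPU, the inner
loop in bit_spin_lock() never terminates, which matches the starvation
described above.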
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-10 9:37 ` Steven Rostedt
@ 2005-03-10 9:54 ` Steven Rostedt
2005-03-11 9:57 ` Ingo Molnar
0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-10 9:54 UTC (permalink / raw)
To: Lee Revell; +Cc: Ingo Molnar, linux-kernel

On Thu, 10 Mar 2005, Steven Rostedt wrote:

> The short term fix is probably to put back the preempt_disables, the long
> term is to get rid of these stupid bit_spin_lock busy loops.
>

Doing a quick search on the kernel, it looks like only kjournald uses the
bit_spin_locks. I'll start converting them to spinlocks. The use seems to
be more of a hack, since it is using bits in the state field for locking,
and these bits aren't used for anything else.

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-10 9:54 ` Steven Rostedt
@ 2005-03-11 9:57 ` Ingo Molnar
2005-03-11 10:15 ` Steven Rostedt
0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-11 9:57 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Lee Revell, linux-kernel

* Steven Rostedt <rostedt@goodmis.org> wrote:

> > The short term fix is probably to put back the preempt_disables, the long
> > term is to get rid of these stupid bit_spin_lock busy loops.
>
> Doing a quick search on the kernel, it looks like only kjournald uses
> the bit_spin_locks. I'll start converting them to spinlocks. The use
> seems to be more of a hack, since it is using bits in the state field
> for locking, and these bits aren't used for anything else.

yeah. bit-spinlocks are really a hack.

	Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-11 9:57 ` Ingo Molnar
@ 2005-03-11 10:15 ` Steven Rostedt
2005-03-11 10:17 ` Ingo Molnar
0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-11 10:15 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Lee Revell, linux-kernel

On Fri, 11 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > > The short term fix is probably to put back the preempt_disables, the long
> > > term is to get rid of these stupid bit_spin_lock busy loops.
> >
> > Doing a quick search on the kernel, it looks like only kjournald uses
> > the bit_spin_locks. I'll start converting them to spinlocks. The use
> > seems to be more of a hack, since it is using bits in the state field
> > for locking, and these bits aren't used for anything else.
>
> yeah. bit-spinlocks are really a hack.
>
> Ingo
>

And this really sucks too! I've been looking into a fix for this and have
yet to get something stable. As you probably already know, you can't just
put back the preempt_disable since your spinlocks now schedule. So I've
been looking into finding a way to get rid of these.

I've tried making two global spinlocks, one for the state bit and one for
the journal head bit use. But this deadlocks with j_state_lock. The
journal head lock seems to be ok to be global, but the state lock needs to
have one for every buffer head. I'm now hacking away to do this without
touching the actual buffer head. But I'm not sure what some of the side
effects this is having. I'll keep you posted when I get something working.
I'm now having a crash course in how kjournal and friends work.

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-11 10:15 ` Steven Rostedt
@ 2005-03-11 10:17 ` Ingo Molnar
2005-03-11 10:24 ` Steven Rostedt
0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-11 10:17 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Lee Revell, linux-kernel

* Steven Rostedt <rostedt@goodmis.org> wrote:

> > > Doing a quick search on the kernel, it looks like only kjournald uses
> > > the bit_spin_locks. I'll start converting them to spinlocks. The use
> > > seems to be more of a hack, since it is using bits in the state field
> > > for locking, and these bits aren't used for anything else.
> >
> > yeah. bit-spinlocks are really a hack.
>
> And this really sucks too! I've been looking into a fix for this and
> have yet to get something stable. As you probably already know, you
> can't just put back the preempt_disable since your spinlocks now
> schedule. So I've been looking into finding a way to get rid of these.
>
> I've tried making two global spinlocks, one for the state bit and one
> for the journal head bit use. But this deadlocks with j_state_lock.
> The journal head lock seems to be ok to be global, but the state lock
> needs to have one for every buffer head. I'm now hacking away to do
> this without touching the actual buffer head. But I'm not sure what
> some of the side effects this is having. I'll keep you posted when I
> get something working. I'm now having a crash course in how kjournal
> and friends work.

did you try the canonical way of putting a spinlock into every
buffer_head?

	Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-11 10:17 ` Ingo Molnar
@ 2005-03-11 10:24 ` Steven Rostedt
2005-03-11 10:43 ` Andrew Morton
0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-11 10:24 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Lee Revell, linux-kernel

On Fri, 11 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> > I've tried making two global spinlocks, one for the state bit and one
> > for the journal head bit use. But this deadlocks with j_state_lock.
> > The journal head lock seems to be ok to be global, but the state lock
> > needs to have one for every buffer head. I'm now hacking away to do
> > this without touching the actual buffer head. But I'm not sure what
> > some of the side effects this is having. I'll keep you posted when I
> > get something working. I'm now having a crash course in how kjournal
> > and friends work.
>
> did you try the canonical way of putting a spinlock into every
> buffer_head?
>

No, I'll try that now. I just didn't want to modify the buffer head struct
just for journaling. But if it is the quickest and easiest fix, then I'll
submit it and we can change it later.

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-11 10:24 ` Steven Rostedt
@ 2005-03-11 10:43 ` Andrew Morton
2005-03-11 10:53 ` Steven Rostedt
2005-03-11 14:40 ` Steven Rostedt
0 siblings, 2 replies; 125+ messages in thread
From: Andrew Morton @ 2005-03-11 10:43 UTC (permalink / raw)
To: rostedt; +Cc: mingo, rlrevell, linux-kernel

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > did you try the canonical way of putting a spinlock into every
> > buffer_head?
> >
>
> No, I'll try that now. I just didn't want to modify the buffer head struct
> just for journaling. But if it is the quickest and easiest fix, then I'll
> submit it and we can change it later.

You'll need two spinlocks. jbd_lock_bh_state() and
jbd_lock_bh_journal_head().

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-11 10:43 ` Andrew Morton
@ 2005-03-11 10:53 ` Steven Rostedt
2005-03-11 14:40 ` Steven Rostedt
1 sibling, 0 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-11 10:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel

On Fri, 11 Mar 2005, Andrew Morton wrote:

> Steven Rostedt <rostedt@goodmis.org> wrote:
> > No, I'll try that now. I just didn't want to modify the buffer head struct
> > just for journaling. But if it is the quickest and easiest fix, then I'll
> > submit it and we can change it later.
>
> You'll need two spinlocks. jbd_lock_bh_state() and jbd_lock_bh_journal_head().
>

Yep, already did that. Now I need to reboot the new kernel and give it a
try.

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
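[Editorial sketch, not part of the original thread.] The "spinlock in every
buffer_head" idea discussed above amounts to giving each buffer its own
pair of lock fields, one per former bit lock, so that contention on one
buffer never involves another and the two lock roles stay independent. A
minimal userspace illustration follows; struct demo_buffer_head, its field
names and the demo_* helpers are hypothetical, and atomic_flag is only a
stand-in for the kernel's spinlock_t.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, cut-down stand-in for struct buffer_head. */
struct demo_buffer_head {
	unsigned long b_state;   /* state bits, no longer used for locking */
	atomic_flag state_lock;  /* takes over from the BH_State bit lock */
	atomic_flag jhead_lock;  /* takes over from the BH_JournalHead bit lock */
};

static void demo_bh_init(struct demo_buffer_head *bh)
{
	bh->b_state = 0;
	atomic_flag_clear(&bh->state_lock);
	atomic_flag_clear(&bh->jhead_lock);
}

/* The trylock helpers return 1 if the lock was taken, 0 if it was held. */
static int demo_trylock_bh_state(struct demo_buffer_head *bh)
{
	return !atomic_flag_test_and_set(&bh->state_lock);
}

static void demo_unlock_bh_state(struct demo_buffer_head *bh)
{
	atomic_flag_clear(&bh->state_lock);
}

static int demo_trylock_bh_journal_head(struct demo_buffer_head *bh)
{
	return !atomic_flag_test_and_set(&bh->jhead_lock);
}

static void demo_unlock_bh_journal_head(struct demo_buffer_head *bh)
{
	atomic_flag_clear(&bh->jhead_lock);
}
```

The cost is the one Steven worries about: every buffer head grows by two
locks, whether or not a journaling filesystem ever touches it.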
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 2005-03-11 10:43 ` Andrew Morton 2005-03-11 10:53 ` Steven Rostedt @ 2005-03-11 14:40 ` Steven Rostedt 2005-03-11 15:08 ` Steven Rostedt 2005-03-11 15:38 ` Ingo Molnar 1 sibling, 2 replies; 125+ messages in thread From: Steven Rostedt @ 2005-03-11 14:40 UTC (permalink / raw) To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel Here's the patch. It's probably more of an overkill wrt buffer heads, but it seems to be the easiest solution. I also put back some of the changes you made for the bit_spin_locks, so that they act the same as the vanilla kernel if PREEMPT_RT is not defined. Now I only tested this with PREEMPT_RT configured so I hope others can test it with it off. If I get time I'll do that as well. I patched this against linux-2.6.11-rc4-V0.7.39-02, so I hope it goes easily into .40. Lee, Could you see what the latencies are with kjournal with this patch applied. Thanks, -- Steve diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c --- linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c 2005-02-12 22:06:54.000000000 -0500 +++ linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c 2005-03-11 07:48:04.000000000 -0500 @@ -3002,6 +3002,10 @@ preempt_disable(); __get_cpu_var(bh_accounting).nr++; recalc_bh_state(); +#ifdef CONFIG_PREEMPT_RT + spin_lock_init(&ret->b_jstate_lock); + spin_lock_init(&ret->b_jhead_lock); +#endif preempt_enable(); } return ret; diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h --- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h 2005-02-12 22:05:10.000000000 -0500 +++ linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h 2005-03-11 07:59:44.000000000 -0500 @@ -62,6 +62,14 @@ bh_end_io_t *b_end_io; /* I/O completion */ void *b_private; /* reserved for b_end_io */ struct list_head b_assoc_buffers; /* associated with another mapping */ + +#ifdef CONFIG_PREEMPT_RT + /* + 
* Fixme: This should be in the journal code. + */ + spinlock_t b_jstate_lock; /* lock for journal state. */ + spinlock_t b_jhead_lock; /* lock for journal head. */ +#endif }; /* diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h --- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h 2005-02-12 22:07:18.000000000 -0500 +++ linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h 2005-03-11 07:57:47.000000000 -0500 @@ -314,6 +314,12 @@ TAS_BUFFER_FNS(RevokeValid, revokevalid) BUFFER_FNS(Freed, freed) +#ifdef CONFIG_PREEMPT_RT +#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock) +#else +#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,bh->b_state); +#endif + static inline struct buffer_head *jh2bh(struct journal_head *jh) { return jh->b_bh; @@ -326,33 +332,34 @@ static inline void jbd_lock_bh_state(struct buffer_head *bh) { - bit_spin_lock(BH_State, &bh->b_state); + PICK_SPIN_LOCK(lock,BH_State,jstate); } static inline int jbd_trylock_bh_state(struct buffer_head *bh) { - return bit_spin_trylock(BH_State, &bh->b_state); + return PICK_SPIN_LOCK(trylock,BH_State,jstate); } static inline int jbd_is_locked_bh_state(struct buffer_head *bh) { - return bit_spin_is_locked(BH_State, &bh->b_state); + return PICK_SPIN_LOCK(is_locked,BH_State,jstate); } static inline void jbd_unlock_bh_state(struct buffer_head *bh) { - bit_spin_unlock(BH_State, &bh->b_state); + PICK_SPIN_LOCK(unlock,BH_State,jstate); } static inline void jbd_lock_bh_journal_head(struct buffer_head *bh) { - bit_spin_lock(BH_JournalHead, &bh->b_state); + PICK_SPIN_LOCK(lock,BH_JournalHead,jhead); } static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh) { - bit_spin_unlock(BH_JournalHead, &bh->b_state); + PICK_SPIN_LOCK(unlock,BH_JournalHead,jhead); } +#undef PICK_SPIN_LOCK struct jbd_revoke_table_s; diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h 
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h 2005-03-10 08:47:25.000000000 -0500 +++ linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h 2005-03-11 09:06:26.254317378 -0500 @@ -774,6 +774,10 @@ })) +#ifndef CONFIG_PREEMPT_RT + +/* These are just plain evil! */ + /* * bit-based spin_lock() * @@ -789,10 +793,15 @@ * busywait with less bus contention for a good time to * attempt to acquire the lock bit. */ -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - while (test_and_set_bit(bitnum, addr)) - while (test_bit(bitnum, addr)) + preempt_disable(); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) + while (test_and_set_bit(bitnum, addr)) { + while (test_bit(bitnum, addr)) { + preempt_enable(); cpu_relax(); + preempt_disable(); + } + } #endif __acquire(bitlock); } @@ -802,9 +811,12 @@ */ static inline int bit_spin_trylock(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - if (test_and_set_bit(bitnum, addr)) + preempt_disable(); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) + if (test_and_set_bit(bitnum, addr)) { + preempt_enable(); return 0; + } #endif __acquire(bitlock); return 1; @@ -815,11 +827,12 @@ */ static inline void bit_spin_unlock(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) BUG_ON(!test_bit(bitnum, addr)); smp_mb__before_clear_bit(); clear_bit(bitnum, addr); #endif + preempt_enable(); __release(bitlock); } @@ -828,12 +841,15 @@ */ static inline int bit_spin_is_locked(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) return test_bit(bitnum, addr); +#elif defined CONFIG_PREEMPT + return preempt_count(); #else return 1; #endif } +#endif /* 
CONFIG_PREEMPT_RT */ #define DEFINE_SPINLOCK(name) \ spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name) ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 2005-03-11 14:40 ` Steven Rostedt @ 2005-03-11 15:08 ` Steven Rostedt 2005-03-11 15:30 ` K.R. Foley 2005-03-11 15:38 ` Ingo Molnar 1 sibling, 1 reply; 125+ messages in thread From: Steven Rostedt @ 2005-03-11 15:08 UTC (permalink / raw) To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel > > +#ifdef CONFIG_PREEMPT_RT > +#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock) > +#else > +#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,bh->b_state); > +#endif > + Oops, extra semicolon on the non RT side. I'll try again. -- Steve diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c --- linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c 2005-02-12 22:06:54.000000000 -0500 +++ linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c 2005-03-11 07:48:04.000000000 -0500 @@ -3002,6 +3002,10 @@ preempt_disable(); __get_cpu_var(bh_accounting).nr++; recalc_bh_state(); +#ifdef CONFIG_PREEMPT_RT + spin_lock_init(&ret->b_jstate_lock); + spin_lock_init(&ret->b_jhead_lock); +#endif preempt_enable(); } return ret; diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h --- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h 2005-02-12 22:05:10.000000000 -0500 +++ linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h 2005-03-11 07:59:44.000000000 -0500 @@ -62,6 +62,14 @@ bh_end_io_t *b_end_io; /* I/O completion */ void *b_private; /* reserved for b_end_io */ struct list_head b_assoc_buffers; /* associated with another mapping */ + +#ifdef CONFIG_PREEMPT_RT + /* + * Fixme: This should be in the journal code. + */ + spinlock_t b_jstate_lock; /* lock for journal state. */ + spinlock_t b_jhead_lock; /* lock for journal head. 
*/ +#endif }; /* diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h --- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h 2005-02-12 22:07:18.000000000 -0500 +++ linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h 2005-03-11 07:57:47.000000000 -0500 @@ -314,6 +314,12 @@ TAS_BUFFER_FNS(RevokeValid, revokevalid) BUFFER_FNS(Freed, freed) +#ifdef CONFIG_PREEMPT_RT +#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock) +#else +#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,bh->b_state) +#endif + static inline struct buffer_head *jh2bh(struct journal_head *jh) { return jh->b_bh; @@ -326,33 +332,34 @@ static inline void jbd_lock_bh_state(struct buffer_head *bh) { - bit_spin_lock(BH_State, &bh->b_state); + PICK_SPIN_LOCK(lock,BH_State,jstate); } static inline int jbd_trylock_bh_state(struct buffer_head *bh) { - return bit_spin_trylock(BH_State, &bh->b_state); + return PICK_SPIN_LOCK(trylock,BH_State,jstate); } static inline int jbd_is_locked_bh_state(struct buffer_head *bh) { - return bit_spin_is_locked(BH_State, &bh->b_state); + return PICK_SPIN_LOCK(is_locked,BH_State,jstate); } static inline void jbd_unlock_bh_state(struct buffer_head *bh) { - bit_spin_unlock(BH_State, &bh->b_state); + PICK_SPIN_LOCK(unlock,BH_State,jstate); } static inline void jbd_lock_bh_journal_head(struct buffer_head *bh) { - bit_spin_lock(BH_JournalHead, &bh->b_state); + PICK_SPIN_LOCK(lock,BH_JournalHead,jhead); } static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh) { - bit_spin_unlock(BH_JournalHead, &bh->b_state); + PICK_SPIN_LOCK(unlock,BH_JournalHead,jhead); } +#undef PICK_SPIN_LOCK struct jbd_revoke_table_s; diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h --- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h 2005-03-10 08:47:25.000000000 -0500 +++ linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h 
2005-03-11 09:06:26.254317378 -0500 @@ -774,6 +774,10 @@ })) +#ifndef CONFIG_PREEMPT_RT + +/* These are just plain evil! */ + /* * bit-based spin_lock() * @@ -789,10 +793,15 @@ * busywait with less bus contention for a good time to * attempt to acquire the lock bit. */ -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - while (test_and_set_bit(bitnum, addr)) - while (test_bit(bitnum, addr)) + preempt_disable(); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) + while (test_and_set_bit(bitnum, addr)) { + while (test_bit(bitnum, addr)) { + preempt_enable(); cpu_relax(); + preempt_disable(); + } + } #endif __acquire(bitlock); } @@ -802,9 +811,12 @@ */ static inline int bit_spin_trylock(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - if (test_and_set_bit(bitnum, addr)) + preempt_disable(); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) + if (test_and_set_bit(bitnum, addr)) { + preempt_enable(); return 0; + } #endif __acquire(bitlock); return 1; @@ -815,11 +827,12 @@ */ static inline void bit_spin_unlock(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) BUG_ON(!test_bit(bitnum, addr)); smp_mb__before_clear_bit(); clear_bit(bitnum, addr); #endif + preempt_enable(); __release(bitlock); } @@ -828,12 +841,15 @@ */ static inline int bit_spin_is_locked(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) return test_bit(bitnum, addr); +#elif defined CONFIG_PREEMPT + return preempt_count(); #else return 1; #endif } +#endif /* CONFIG_PREEMPT_RT */ #define DEFINE_SPINLOCK(name) \ spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name) ^ permalink raw reply [flat|nested] 125+ 
messages in thread
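[Editorial sketch, not part of the original thread.] The PICK_SPIN_LOCK
macro in the patch above relies on preprocessor token pasting (##) to build
both the function name and the field name from its arguments. The
standalone program fragment below shows the two expansions side by side.
Everything in it is a hypothetical mock: the lock functions only record
that they were called, and two separately named macros are used so that
both variants can be exercised in one build. Note that this sketch passes
&bh->b_state, since bit_spin_lock() as quoted earlier in the thread takes a
pointer; the posted macro passes bh->b_state, which appears to need the
address-of operator on the non-RT side.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BH_STATE 5  /* hypothetical bit number, for illustration only */

/* Hypothetical, cut-down stand-in for struct buffer_head. */
struct demo_bh {
	unsigned long b_state;
	int b_jstate_lock;  /* mock lock: 1 = held, 0 = free */
};

static const char *last_op;  /* name of the most recent mock call */

/* Mock lock primitives that only record which one was chosen. */
static void spin_lock(int *lock)   { *lock = 1; last_op = "spin_lock"; }
static void spin_unlock(int *lock) { *lock = 0; last_op = "spin_unlock"; }

static void bit_spin_lock(int bit, unsigned long *addr)
{
	*addr |= 1UL << bit;
	last_op = "bit_spin_lock";
}

static void bit_spin_unlock(int bit, unsigned long *addr)
{
	*addr &= ~(1UL << bit);
	last_op = "bit_spin_unlock";
}

/* otype is pasted into the function name, name into the field name. */
#define PICK_RT(otype, bit, name)  spin_##otype(&bh->b_##name##_lock)
#define PICK_BIT(otype, bit, name) bit_spin_##otype(bit, &bh->b_state)

static void lock_state_rt(struct demo_bh *bh)    { PICK_RT(lock, DEMO_BH_STATE, jstate); }
static void unlock_state_rt(struct demo_bh *bh)  { PICK_RT(unlock, DEMO_BH_STATE, jstate); }
static void lock_state_bit(struct demo_bh *bh)   { PICK_BIT(lock, DEMO_BH_STATE, jstate); }
static void unlock_state_bit(struct demo_bh *bh) { PICK_BIT(unlock, DEMO_BH_STATE, jstate); }
```

Leaving the trailing semicolon out of the macro body, as the resend does,
keeps the macro usable as an expression (for example after "return") and
inside an unbraced if/else.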
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-11 15:08 ` Steven Rostedt
@ 2005-03-11 15:30 ` K.R. Foley
0 siblings, 0 replies; 125+ messages in thread
From: K.R. Foley @ 2005-03-11 15:30 UTC (permalink / raw)
To: rostedt; +Cc: Andrew Morton, mingo, rlrevell, linux-kernel

Steven Rostedt wrote:
>>+#ifdef CONFIG_PREEMPT_RT
>>+#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
>>+#else
>>+#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,bh->b_state);
>>+#endif
>>+
>
>
> Oops, extra semicolon on the non RT side.
>
>
> I'll try again.
>
> -- Steve

Haven't tried it yet, but does apply cleanly to 2.6.11-final-V0.7.40-00.

kr

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-11 14:40 ` Steven Rostedt
2005-03-11 15:08 ` Steven Rostedt
@ 2005-03-11 15:38 ` Ingo Molnar
2005-03-11 16:01 ` Steven Rostedt
2005-03-11 20:39 ` Steven Rostedt
1 sibling, 2 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-11 15:38 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Andrew Morton, rlrevell, linux-kernel

* Steven Rostedt <rostedt@goodmis.org> wrote:

> Here's the patch. It's probably more of an overkill wrt buffer heads,
> but it seems to be the easiest solution.

isnt there some ext3-private journal structure (journal-bh) linked off
the bh? If the lock is in that structure then the overhead would only
affect ext3.

	Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-11 15:38 ` Ingo Molnar
@ 2005-03-11 16:01 ` Steven Rostedt
2005-03-11 20:39 ` Steven Rostedt
1 sibling, 0 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-11 16:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrew Morton, rlrevell, linux-kernel

On Fri, 11 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Here's the patch. It's probably more of an overkill wrt buffer heads,
> > but it seems to be the easiest solution.
>
> isnt there some ext3-private journal structure (journal-bh) linked off
> the bh? If the lock is in that structure then the overhead would only
> affect ext3.
>

Yes, there is, and I was trying to use it before you mentioned trying this
(which works for now). The locks are called before and after the private
pointer of the bh is set and removed. The journal_head lock, I was going
to make global, and the state lock would go on this structure. I would
have to do some hack in journal.c to flag the state lock when it was
removing the journal head, so that it didn't do the remove there but did
it after the state lock was released. But this still had a few crashes.

The journal_head lock is used to guard adding and removing the private
data from the bh, so you can see why this structure can't be used for that
purpose. But the state lock seemed to be ok for this. I need to know more
about the journaling system. I'll look into doing this too, but this fix
should do for now.

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-11 15:38 ` Ingo Molnar
2005-03-11 16:01 ` Steven Rostedt
@ 2005-03-11 20:39 ` Steven Rostedt
2005-03-11 20:46 ` Lee Revell
1 sibling, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-11 20:39 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrew Morton, rlrevell, linux-kernel

On Fri, 11 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Here's the patch. It's probably more of an overkill wrt buffer heads,
> > but it seems to be the easiest solution.
>
> isnt there some ext3-private journal structure (journal-bh) linked off
> the bh? If the lock is in that structure then the overhead would only
> affect ext3.
>

OK, here it is (Yuck!). I was able to use the journal head (private data
of the buffer head) for the state lock. I just decided to have the
journal head lock be one global lock for all buffer heads, since it is
used to add and remove the journal private data from the buffer head, and
thus can't be stored in the journal private data.

The state lock is now in the journal private data but we must be careful
not to free this data before we unlock it. So here's what I've done.

static inline void jbd_lock_bh_state(struct buffer_head *bh)
{
        BUG_ON(!bh->b_private);
        atomic_inc(&bh2jh(bh)->b_state_wait_count);
        spin_lock(&bh2jh(bh)->b_state_lock);
}

I have a counter of those that want/have the lock, and this informs the
journal_remove_journal_head that it should not free the jh.

static void __journal_remove_journal_head(struct buffer_head *bh)
{
        struct journal_head *jh = bh2jh(bh);

        J_ASSERT_JH(jh, jh->b_jcount >= 0);

        get_bh(bh);
        if (jh->b_jcount == 0) {
                if (jh->b_transaction == NULL &&
                                jh->b_next_transaction == NULL &&
                                jh->b_cp_transaction == NULL) {
#ifdef CONFIG_PREEMPT_RT
                        if (atomic_read(&jh->b_state_wait_count)) {
                                BUG_ON(buffer_journalhead(bh));
                                set_buffer_journalhead(bh);
                        } else
#endif
                        {

Here the state_wait_count is checked, and if it is > 0, the bit that was
originally used for locking the journal head is set, to inform the
unlocking of the state lock that the journal head needs to be removed.

static inline void jbd_unlock_bh_state(struct buffer_head *bh)
{
        int rmjh = 0;

        BUG_ON(!atomic_read(&bh2jh(bh)->b_state_wait_count));
        atomic_dec(&bh2jh(bh)->b_state_wait_count);

        if (buffer_journalhead(bh)) {
                clear_buffer_journalhead(bh);
                rmjh = 1;
        }

        spin_unlock(&bh2jh(bh)->b_state_lock);

        if (rmjh)
                journal_remove_journal_head(bh);
}

Now in the unlocking of the state lock, the journal head bit is tested and
if it is set, then the remove journal head function is called.

Maybe this isn't the cleanest solution, but it keeps the overhead on the
buffer heads down, so it's preferred over my last patch.

Once again, this has only been tested with full preemption enabled, but I
tried to keep it from changing the way non PREEMPT_RT works.

I'm leaving now for the weekend, so I won't be able to respond to anyone
till Monday. I'll also run this patch over the weekend while compiling
the kernel in an endless loop

while [ 1 ]; do
        make clean; make
done

With kjournal running FIFO, to see if it survives.

Cheers, -- Steve diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/jbd/journal.c linux-2.6.11-rc4-V0.7.39-02/fs/jbd/journal.c --- linux-2.6.11-rc4-V0.7.39-02.orig/fs/jbd/journal.c 2005-02-12 22:05:29.000000000 -0500 +++ linux-2.6.11-rc4-V0.7.39-02/fs/jbd/journal.c 2005-03-11 14:54:21.000000000 -0500 @@ -80,6 +80,10 @@ EXPORT_SYMBOL(journal_try_to_free_buffers); EXPORT_SYMBOL(journal_force_commit); +#ifdef CONFIG_PREEMPT_RT +spinlock_t jbd_journal_head_lock = SPIN_LOCK_UNLOCKED; +#endif + static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *); /* @@ -1727,6 +1731,9 @@ jh = new_jh; new_jh = NULL; /* We consumed it */ set_buffer_jbd(bh); +#ifdef CONFIG_PREEMPT_RT + spin_lock_init(&jh->b_state_lock); +#endif bh->b_private = jh; jh->b_bh = bh; get_bh(bh); @@ -1767,26 +1774,34 @@ if (jh->b_transaction == NULL && jh->b_next_transaction == NULL && jh->b_cp_transaction == NULL) { - J_ASSERT_BH(bh, buffer_jbd(bh)); - J_ASSERT_BH(bh, jh2bh(jh) == bh); - BUFFER_TRACE(bh, "remove journal_head"); - if (jh->b_frozen_data) { - printk(KERN_WARNING "%s: freeing " - "b_frozen_data\n", - __FUNCTION__); - kfree(jh->b_frozen_data); - } - if (jh->b_committed_data) { - printk(KERN_WARNING "%s: freeing " - "b_committed_data\n", - __FUNCTION__); - kfree(jh->b_committed_data); +#ifdef CONFIG_PREEMPT_RT + if (atomic_read(&jh->b_state_wait_count)) { + BUG_ON(buffer_journalhead(bh)); + set_buffer_journalhead(bh); + } else +#endif + { + J_ASSERT_BH(bh, buffer_jbd(bh)); + J_ASSERT_BH(bh, jh2bh(jh) == bh); + BUFFER_TRACE(bh, "remove journal_head"); + if (jh->b_frozen_data) { + printk(KERN_WARNING "%s: freeing " + "b_frozen_data\n", + __FUNCTION__); + kfree(jh->b_frozen_data); + } + if (jh->b_committed_data) { + printk(KERN_WARNING "%s: freeing " + "b_committed_data\n", + __FUNCTION__); + kfree(jh->b_committed_data); + } + bh->b_private = NULL; + jh->b_bh = NULL; /* debug, really */ + clear_buffer_jbd(bh); + __brelse(bh); + journal_free_journal_head(jh); } - bh->b_private = NULL; 
- jh->b_bh = NULL; /* debug, really */ - clear_buffer_jbd(bh); - __brelse(bh); - journal_free_journal_head(jh); } else { BUFFER_TRACE(bh, "journal_head was locked"); } diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/jbd/transaction.c linux-2.6.11-rc4-V0.7.39-02/fs/jbd/transaction.c --- linux-2.6.11-rc4-V0.7.39-02.orig/fs/jbd/transaction.c 2005-02-12 22:05:50.000000000 -0500 +++ linux-2.6.11-rc4-V0.7.39-02/fs/jbd/transaction.c 2005-03-11 13:25:49.000000000 -0500 @@ -1207,11 +1207,17 @@ BUFFER_TRACE(bh, "entry"); + /* + * Is it OK to check to see if this isn't a jbd buffer outside of + * locks? Now that jbd_lock_bh_state only works with jbd buffers + * I sure hope so. + */ + if (!buffer_jbd(bh)) + goto not_jbd; + jbd_lock_bh_state(bh); spin_lock(&journal->j_list_lock); - if (!buffer_jbd(bh)) - goto not_jbd; jh = bh2jh(bh); /* Critical error: attempting to delete a bitmap buffer, maybe? @@ -1219,7 +1225,7 @@ if (!J_EXPECT_JH(jh, !jh->b_committed_data, "inconsistent data on disk")) { err = -EIO; - goto not_jbd; + goto bad_jbd; } if (jh->b_transaction == handle->h_transaction) { @@ -1274,9 +1280,11 @@ } } -not_jbd: + +bad_jbd: spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); +not_jbd: __brelse(bh); return err; } diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h --- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h 2005-02-12 22:07:18.000000000 -0500 +++ linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h 2005-03-11 14:55:31.000000000 -0500 @@ -313,6 +313,7 @@ BUFFER_FNS(RevokeValid, revokevalid) TAS_BUFFER_FNS(RevokeValid, revokevalid) BUFFER_FNS(Freed, freed) +BUFFER_FNS(JournalHead,journalhead) static inline struct buffer_head *jh2bh(struct journal_head *jh) { @@ -324,6 +325,66 @@ return bh->b_private; } +void journal_remove_journal_head(struct buffer_head *bh); + +#ifdef CONFIG_PREEMPT_RT + +extern spinlock_t jbd_journal_head_lock; + +static inline void jbd_lock_bh_state(struct buffer_head *bh) +{ + 
BUG_ON(!bh->b_private); + atomic_inc(&bh2jh(bh)->b_state_wait_count); + spin_lock(&bh2jh(bh)->b_state_lock); +} + +static inline int jbd_trylock_bh_state(struct buffer_head *bh) +{ + int ret; + + BUG_ON(!bh->b_private); + + if ((ret = spin_trylock(&bh2jh(bh)->b_state_lock))) + atomic_inc(&bh2jh(bh)->b_state_wait_count); + + return ret; +} + +static inline int jbd_is_locked_bh_state(struct buffer_head *bh) +{ + return bh2jh(bh) ? spin_is_locked(&bh2jh(bh)->b_state_lock) : 0; +} + +static inline void jbd_unlock_bh_state(struct buffer_head *bh) +{ + int rmjh = 0; + + BUG_ON(!atomic_read(&bh2jh(bh)->b_state_wait_count)); + atomic_dec(&bh2jh(bh)->b_state_wait_count); + + if (buffer_journalhead(bh)) { + clear_buffer_journalhead(bh); + rmjh = 1; + } + + spin_unlock(&bh2jh(bh)->b_state_lock); + + if (rmjh) + journal_remove_journal_head(bh); +} + +static inline void jbd_lock_bh_journal_head(struct buffer_head *bh) +{ + spin_lock(&jbd_journal_head_lock); +} + +static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh) +{ + spin_unlock(&jbd_journal_head_lock); +} + +#else /* !CONFIG_PREEMPT_RT */ + static inline void jbd_lock_bh_state(struct buffer_head *bh) { bit_spin_lock(BH_State, &bh->b_state); @@ -354,6 +415,8 @@ bit_spin_unlock(BH_JournalHead, &bh->b_state); } +#endif /* CONFIG_PREEMPT_RT */ + struct jbd_revoke_table_s; /** @@ -918,7 +981,6 @@ */ struct journal_head *journal_add_journal_head(struct buffer_head *bh); struct journal_head *journal_grab_journal_head(struct buffer_head *bh); -void journal_remove_journal_head(struct buffer_head *bh); void journal_put_journal_head(struct journal_head *jh); /* diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/journal-head.h linux-2.6.11-rc4-V0.7.39-02/include/linux/journal-head.h --- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/journal-head.h 2005-02-12 22:07:39.000000000 -0500 +++ linux-2.6.11-rc4-V0.7.39-02/include/linux/journal-head.h 2005-03-11 15:14:07.774541864 -0500 @@ -80,6 +80,16 @@ * [j_list_lock] 
*/ struct journal_head *b_cpnext, *b_cpprev; + + /* + * Lock the state of the buffer head. + */ + spinlock_t b_state_lock; + + /* + * Count the processes that want/have the state lock. + */ + atomic_t b_state_wait_count; }; #endif /* JOURNAL_HEAD_H_INCLUDED */ diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h --- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h 2005-03-10 08:47:25.000000000 -0500 +++ linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h 2005-03-11 09:06:26.000000000 -0500 @@ -774,6 +774,10 @@ })) +#ifndef CONFIG_PREEMPT_RT + +/* These are just plain evil! */ + /* * bit-based spin_lock() * @@ -789,10 +793,15 @@ * busywait with less bus contention for a good time to * attempt to acquire the lock bit. */ -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - while (test_and_set_bit(bitnum, addr)) - while (test_bit(bitnum, addr)) + preempt_disable(); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) + while (test_and_set_bit(bitnum, addr)) { + while (test_bit(bitnum, addr)) { + preempt_enable(); cpu_relax(); + preempt_disable(); + } + } #endif __acquire(bitlock); } @@ -802,9 +811,12 @@ */ static inline int bit_spin_trylock(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - if (test_and_set_bit(bitnum, addr)) + preempt_disable(); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) + if (test_and_set_bit(bitnum, addr)) { + preempt_enable(); return 0; + } #endif __acquire(bitlock); return 1; @@ -815,11 +827,12 @@ */ static inline void bit_spin_unlock(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) BUG_ON(!test_bit(bitnum, addr)); smp_mb__before_clear_bit(); clear_bit(bitnum, addr); #endif + preempt_enable(); __release(bitlock); } 
@@ -828,12 +841,15 @@ */ static inline int bit_spin_is_locked(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) return test_bit(bitnum, addr); +#elif defined CONFIG_PREEMPT + return preempt_count(); #else return 1; #endif } +#endif /* CONFIG_PREEMPT_RT */ #define DEFINE_SPINLOCK(name) \ spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name) ^ permalink raw reply [flat|nested] 125+ messages in thread
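The deferred-free scheme described in the message above (count everyone who wants or holds the state lock, and let the unlock path finish a removal that arrived while the lock was held) can be sketched in user space roughly as follows. This is only an illustration of the control flow under stated assumptions: the types `struct jhead` and `struct buf` and all `buf_*` helpers are hypothetical stand-ins, a C11 atomic flag replaces the kernel spinlock, and the demo is single-threaded, so it says nothing about whether the real scheme is free of races.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel structures. */
struct jhead {
    atomic_bool state_locked;    /* plays the role of b_state_lock */
    atomic_int  wait_count;      /* plays the role of b_state_wait_count */
};

struct buf {
    struct jhead *priv;          /* plays the role of bh->b_private */
    atomic_bool remove_pending;  /* plays the role of the BH_JournalHead bit */
};

static struct jhead *jhead_new(void)
{
    struct jhead *jh = malloc(sizeof(*jh));

    if (jh == NULL)
        abort();
    atomic_init(&jh->state_locked, false);
    atomic_init(&jh->wait_count, 0);
    return jh;
}

/* Free the private data, or defer if anyone wants or holds the state lock. */
static void buf_remove_jhead(struct buf *b)
{
    struct jhead *jh = b->priv;

    if (jh == NULL)
        return;
    if (atomic_load(&jh->wait_count) > 0) {
        atomic_store(&b->remove_pending, true);  /* unlock path finishes the job */
        return;
    }
    b->priv = NULL;
    free(jh);
}

static void buf_lock_state(struct buf *b)
{
    struct jhead *jh = b->priv;

    atomic_fetch_add(&jh->wait_count, 1);        /* announce interest first */
    while (atomic_exchange(&jh->state_locked, true))
        ;                                        /* spin until the lock is free */
}

static void buf_unlock_state(struct buf *b)
{
    struct jhead *jh = b->priv;
    bool deferred;

    atomic_fetch_sub(&jh->wait_count, 1);
    deferred = atomic_exchange(&b->remove_pending, false);
    atomic_store(&jh->state_locked, false);
    if (deferred)
        buf_remove_jhead(b);                     /* lock released; retry the removal */
}

/* Returns 1 if removal was deferred while locked and completed on unlock. */
static int demo_deferred_free(void)
{
    struct buf b;
    int kept_while_locked, freed_after_unlock;

    b.priv = jhead_new();
    atomic_init(&b.remove_pending, false);

    buf_lock_state(&b);
    buf_remove_jhead(&b);                        /* arrives while the lock is held */
    kept_while_locked = (b.priv != NULL);
    buf_unlock_state(&b);
    freed_after_unlock = (b.priv == NULL);

    return kept_while_locked && freed_after_unlock;
}
```

Calling `demo_deferred_free()` walks through the one interesting ordering: a removal request that arrives while the lock is held leaves the private data attached, and the later unlock is what actually frees it.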
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-11 20:39 ` Steven Rostedt
@ 2005-03-11 20:46 ` Lee Revell
2005-03-11 22:06 ` Lee Revell
0 siblings, 1 reply; 125+ messages in thread
From: Lee Revell @ 2005-03-11 20:46 UTC (permalink / raw)
To: rostedt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Fri, 2005-03-11 at 15:39 -0500, Steven Rostedt wrote:
> I'm leaving now for the weekend, so I won't be able to respond to anyone
> till Monday. I'll also run this patch over the weekend while compiling
> the kernel in an endless loop

I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
goes.

Lee

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-11 20:46 ` Lee Revell
@ 2005-03-11 22:06 ` Lee Revell
2005-03-14 7:37 ` Steven Rostedt
0 siblings, 1 reply; 125+ messages in thread
From: Lee Revell @ 2005-03-11 22:06 UTC (permalink / raw)
To: rostedt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Fri, 2005-03-11 at 15:46 -0500, Lee Revell wrote:
> On Fri, 2005-03-11 at 15:39 -0500, Steven Rostedt wrote:
> > I'm leaving now for the weekend, so I won't be able to respond to anyone
> > till Monday. I'll also run this patch over the weekend while compiling
> > the kernel in an endless loop
>
> I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
> goes.

Does not seem to work at all with the above settings. It seemed OK
until I started X. Then every time I launched an xterm it would
disappear as soon as I typed anything. I could not switch consoles to
see the Oops.

Lee

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-11 22:06 ` Lee Revell
@ 2005-03-14 7:37 ` Steven Rostedt
2005-03-14 9:33 ` Steven Rostedt
0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-14 7:37 UTC (permalink / raw)
To: Lee Revell; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Fri, 11 Mar 2005, Lee Revell wrote:

> On Fri, 2005-03-11 at 15:46 -0500, Lee Revell wrote:
> > On Fri, 2005-03-11 at 15:39 -0500, Steven Rostedt wrote:
> > > I'm leaving now for the weekend, so I won't be able to respond to anyone
> > > till Monday. I'll also run this patch over the weekend while compiling
> > > the kernel in an endless loop
> >
> > I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
> > goes.
>
> Does not seem to work at all with the above settings. It seemed OK
> until I started X. Then every time I launched an xterm it would
> disappear as soon as I typed anything. I could not switch consoles to
> see the Oops.
>

Hi Lee,

I just compiled PREEMPT_DESKTOP and mounted root (only disk filesystem on
my test machine) as data=ordered. I had no problem getting to X, starting
an xterm and running a make. Actually it was a gnome-term since I didn't
have xterm. But then I su to root, apt-get xterm, ran xterm, and did a
make there with no problems.

Did you patch this against 39-02 or -40-X?

I haven't had time to upgrade to 40 yet. Maybe I'll work on that today.

Maybe your crash has to do with something else. My test machine has a
serial hookup that I can look at even if the term goes down. I'll see if
40 gives me problems.

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-14 7:37 ` Steven Rostedt
@ 2005-03-14 9:33 ` Steven Rostedt
2005-03-14 10:10 ` Steven Rostedt
0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-14 9:33 UTC (permalink / raw)
To: Lee Revell; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Mon, 14 Mar 2005, Steven Rostedt wrote:

>
> > > I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
> > > goes.
> >
> > Does not seem to work at all with the above settings. It seemed OK
> > until I started X. Then every time I launched an xterm it would
> > disappear as soon as I typed anything. I could not switch consoles to
> > see the Oops.
> >
>
> Hi Lee,
>
> I just compiled PREEMPT_DESKTOP and mounted root (only disk filesystem on
> my test machine) as data=ordered. I had no problem getting to X, starting
> an xterm and running a make. Actually it was a gnome-term since I didn't
> have xterm. But then I su to root, apt-get xterm, ran xterm, and did a
> make there with no problems.
>
> Did you patch this against 39-02 or -40-X?
>
> I haven't had time to upgrade to 40 yet. Maybe, I'll work on that today.
>

I just downloaded -40 and applied my patch, compiled it with
PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
I'm getting the following...

BUG: Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c0213438
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: ipv6 af_packet tsdev mousedev evdev floppy psmouse
pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd soundcore snd_page_alloc shpchp pci_hotplug ehci_hcd
intel_agp agpgart uhci_hcd usbcore e100 mii ide_cd cdrom unix
CPU: 0
EIP: 0060:[<c0213438>] Not tainted VLI
EFLAGS: 00010286 (2.6.11-RT-V0.7.40-00)
EIP is at vt_ioctl+0x18/0x1ab0
eax: 00000000 ebx: 00005603 ecx: 00005603 edx: cb6c8780
esi: c0213420 edi: cc956000 ebp: cb613f18 esp: cb613e48
ds: 007b es: 007b ss: 0068 preempt: 00000000
Process XFree86 (pid: 4713, threadinfo=cb612000 task=cb5e0a40)
Stack: cb5e0b90 cb612000 cb5e0a40 c034494c cb5e0a40 00000246 cb613e7c c0117217
       c0344954 00000006 00000001 00000000 00000000 cb613ebc ce0cce24 c13e1800
       cf1279b8 00000000 00000000 cb613ed4 c01707f1 cf1279b8 00000007 00000000
Call Trace:
 [<c0103cdf>] show_stack+0x7f/0xa0 (28)
 [<c0103e95>] show_registers+0x165/0x1d0 (56)
 [<c0104088>] die+0xc8/0x150 (64)
 [<c0115376>] do_page_fault+0x356/0x6c4 (216)
 [<c0103973>] error_code+0x2b/0x30 (268)
 [<c020e91b>] tty_ioctl+0x34b/0x490 (52)
 [<c016837f>] do_ioctl+0x4f/0x70 (32)
 [<c0168582>] vfs_ioctl+0x62/0x1d0 (40)
 [<c0168751>] sys_ioctl+0x61/0x90 (40)
 [<c0102ec3>] syscall_call+0x7/0xb (-8124)
Code: ff ff 8d 05 88 4d 34 c0 e8 f6 60 0a 00 e9 3a ff ff ff 90 55 89 e5
57 56 53 81 ec c4 00 00 00 8b 7d 08 8b 5d 10 8b 87 7c 09 00 00 <8b> 30
89 34 24 8b 04 b5 e0 b7 3c c0 89 45 8c e8 a4 6a 00 00 85

I'll see if this happens without the patch, and if so, then I'll look into
this further.

Thanks,

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-14 9:33 ` Steven Rostedt
@ 2005-03-14 10:10 ` Steven Rostedt
2005-03-14 15:50 ` Steven Rostedt
0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-14 10:10 UTC (permalink / raw)
To: Lee Revell; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Mon, 14 Mar 2005, Steven Rostedt wrote:

>
> I just downloaded -40 and applied my patch, compiled it with
> PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
> I'm getting the following...
>
> BUG: Unable to handle kernel NULL pointer dereference at virtual address
> 00000000
> printing eip:
> c0213438
> *pde = 00000000

[snip]

>
>
> I'll see if this happens without the patch, and if so, then I'll look into
> this further.
>

Well, I took out my patch and this bug didn't happen, so I guess it's my
fault! OK, I'll dig into it further.

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-14 10:10 ` Steven Rostedt
@ 2005-03-14 15:50 ` Steven Rostedt
2005-03-14 19:02 ` Steven Rostedt
2005-03-15 11:44 ` Steven Rostedt
0 siblings, 2 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-14 15:50 UTC (permalink / raw)
To: Lee Revell; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Mon, 14 Mar 2005, Steven Rostedt wrote:

>
> On Mon, 14 Mar 2005, Steven Rostedt wrote:
> >
> > I just downloaded -40 and applied my patch, compiled it with
> > PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
> > I'm getting the following...
> >
> > BUG: Unable to handle kernel NULL pointer dereference at virtual address
> > 00000000
> > printing eip:
> > c0213438
> > *pde = 00000000
>
> [snip]
>
> >
> >
> > I'll see if this happens without the patch, and if so, then I'll look into
> > this further.
> >
>
> Well, I took out my patch and this bug didn't happen, so I guess it's my
> fault! OK, I'll dig into it further.
>

Here's a new patch. All I did was move BUFFER_FNS(JournalHead,journalhead)
to inside the #ifdef CONFIG_PREEMPT_RT and my oops went away !?!

This really bothers me since it just declares some functions and is not
used with CONFIG_PREEMPT_RT off. I have no idea what's going on.

Lee, can you see if this still crashes for you?

Thanks, -- Steve diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c --- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c 2005-03-02 02:37:49.000000000 -0500 +++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c 2005-03-14 09:46:41.000000000 -0500 @@ -80,6 +80,10 @@ EXPORT_SYMBOL(journal_try_to_free_buffers); EXPORT_SYMBOL(journal_force_commit); +#ifdef CONFIG_PREEMPT_RT +spinlock_t jbd_journal_head_lock = SPIN_LOCK_UNLOCKED; +#endif + static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *); /* @@ -1727,6 +1731,9 @@ jh = new_jh; new_jh = NULL; /* We consumed it */ set_buffer_jbd(bh); +#ifdef CONFIG_PREEMPT_RT + spin_lock_init(&jh->b_state_lock); +#endif bh->b_private = jh; jh->b_bh = bh; get_bh(bh); @@ -1767,26 +1774,34 @@ if (jh->b_transaction == NULL && jh->b_next_transaction == NULL && jh->b_cp_transaction == NULL) { - J_ASSERT_BH(bh, buffer_jbd(bh)); - J_ASSERT_BH(bh, jh2bh(jh) == bh); - BUFFER_TRACE(bh, "remove journal_head"); - if (jh->b_frozen_data) { - printk(KERN_WARNING "%s: freeing " - "b_frozen_data\n", - __FUNCTION__); - kfree(jh->b_frozen_data); - } - if (jh->b_committed_data) { - printk(KERN_WARNING "%s: freeing " - "b_committed_data\n", - __FUNCTION__); - kfree(jh->b_committed_data); +#ifdef CONFIG_PREEMPT_RT + if (atomic_read(&jh->b_state_wait_count)) { + BUG_ON(buffer_journalhead(bh)); + set_buffer_journalhead(bh); + } else +#endif + { + J_ASSERT_BH(bh, buffer_jbd(bh)); + J_ASSERT_BH(bh, jh2bh(jh) == bh); + BUFFER_TRACE(bh, "remove journal_head"); + if (jh->b_frozen_data) { + printk(KERN_WARNING "%s: freeing " + "b_frozen_data\n", + __FUNCTION__); + kfree(jh->b_frozen_data); + } + if (jh->b_committed_data) { + printk(KERN_WARNING "%s: freeing " + "b_committed_data\n", + __FUNCTION__); + kfree(jh->b_committed_data); + } + bh->b_private = NULL; + jh->b_bh = NULL; /* debug, really */ + clear_buffer_jbd(bh); + __brelse(bh); + journal_free_journal_head(jh); } - bh->b_private 
= NULL; - jh->b_bh = NULL; /* debug, really */ - clear_buffer_jbd(bh); - __brelse(bh); - journal_free_journal_head(jh); } else { BUFFER_TRACE(bh, "journal_head was locked"); } diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/transaction.c linux-2.6.11-final-V0.7.40-00/fs/jbd/transaction.c --- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/transaction.c 2005-03-02 02:37:53.000000000 -0500 +++ linux-2.6.11-final-V0.7.40-00/fs/jbd/transaction.c 2005-03-14 09:46:41.000000000 -0500 @@ -1207,11 +1207,17 @@ BUFFER_TRACE(bh, "entry"); + /* + * Is it OK to check to see if this isn't a jbd buffer outside of + * locks? Now that jbd_lock_bh_state only works with jbd buffers + * I sure hope so. + */ + if (!buffer_jbd(bh)) + goto not_jbd; + jbd_lock_bh_state(bh); spin_lock(&journal->j_list_lock); - if (!buffer_jbd(bh)) - goto not_jbd; jh = bh2jh(bh); /* Critical error: attempting to delete a bitmap buffer, maybe? @@ -1219,7 +1225,7 @@ if (!J_EXPECT_JH(jh, !jh->b_committed_data, "inconsistent data on disk")) { err = -EIO; - goto not_jbd; + goto bad_jbd; } if (jh->b_transaction == handle->h_transaction) { @@ -1274,9 +1280,11 @@ } } -not_jbd: + +bad_jbd: spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); +not_jbd: __brelse(bh); return err; } diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h --- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h 2005-03-02 02:38:19.000000000 -0500 +++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h 2005-03-14 09:46:57.000000000 -0500 @@ -324,6 +324,68 @@ return bh->b_private; } +void journal_remove_journal_head(struct buffer_head *bh); + +#ifdef CONFIG_PREEMPT_RT + +BUFFER_FNS(JournalHead,journalhead) + +extern spinlock_t jbd_journal_head_lock; + +static inline void jbd_lock_bh_state(struct buffer_head *bh) +{ + BUG_ON(!bh->b_private); + atomic_inc(&bh2jh(bh)->b_state_wait_count); + spin_lock(&bh2jh(bh)->b_state_lock); +} + +static inline int jbd_trylock_bh_state(struct 
buffer_head *bh) +{ + int ret; + + BUG_ON(!bh->b_private); + + if ((ret = spin_trylock(&bh2jh(bh)->b_state_lock))) + atomic_inc(&bh2jh(bh)->b_state_wait_count); + + return ret; +} + +static inline int jbd_is_locked_bh_state(struct buffer_head *bh) +{ + return bh2jh(bh) ? spin_is_locked(&bh2jh(bh)->b_state_lock) : 0; +} + +static inline void jbd_unlock_bh_state(struct buffer_head *bh) +{ + int rmjh = 0; + + BUG_ON(!atomic_read(&bh2jh(bh)->b_state_wait_count)); + atomic_dec(&bh2jh(bh)->b_state_wait_count); + + if (buffer_journalhead(bh)) { + clear_buffer_journalhead(bh); + rmjh = 1; + } + + spin_unlock(&bh2jh(bh)->b_state_lock); + + if (rmjh) + journal_remove_journal_head(bh); +} + +static inline void jbd_lock_bh_journal_head(struct buffer_head *bh) +{ + spin_lock(&jbd_journal_head_lock); +} + +static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh) +{ + spin_unlock(&jbd_journal_head_lock); +} + +#else /* !CONFIG_PREEMPT_RT */ + static inline void jbd_lock_bh_state(struct buffer_head *bh) { bit_spin_lock(BH_State, &bh->b_state); @@ -354,6 +416,8 @@ bit_spin_unlock(BH_JournalHead, &bh->b_state); } +#endif /* CONFIG_PREEMPT_RT */ + struct jbd_revoke_table_s; /** @@ -918,7 +982,6 @@ */ struct journal_head *journal_add_journal_head(struct buffer_head *bh); struct journal_head *journal_grab_journal_head(struct buffer_head *bh); -void journal_remove_journal_head(struct buffer_head *bh); void journal_put_journal_head(struct journal_head *jh); /* diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/journal-head.h linux-2.6.11-final-V0.7.40-00/include/linux/journal-head.h --- linux-2.6.11-final-V0.7.40-00.orig/include/linux/journal-head.h 2005-03-02 02:38:25.000000000 -0500 +++ linux-2.6.11-final-V0.7.40-00/include/linux/journal-head.h 2005-03-14 09:46:41.000000000 -0500 @@ -80,6 +80,16 @@ * [j_list_lock] */ struct journal_head *b_cpnext, *b_cpprev; + + /* + * Lock the state of the buffer head. 
+ */ + spinlock_t b_state_lock; + + /* + * Count the processes that want/have the state lock. + */ + atomic_t b_state_wait_count; }; #endif /* JOURNAL_HEAD_H_INCLUDED */ diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h --- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h 2005-03-14 06:00:54.000000000 -0500 +++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h 2005-03-14 09:46:41.053696484 -0500 @@ -774,6 +774,10 @@ })) +#ifndef CONFIG_PREEMPT_RT + +/* These are just plain evil! */ + /* * bit-based spin_lock() * @@ -789,10 +793,15 @@ * busywait with less bus contention for a good time to * attempt to acquire the lock bit. */ -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - while (test_and_set_bit(bitnum, addr)) - while (test_bit(bitnum, addr)) + preempt_disable(); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) + while (test_and_set_bit(bitnum, addr)) { + while (test_bit(bitnum, addr)) { + preempt_enable(); cpu_relax(); + preempt_disable(); + } + } #endif __acquire(bitlock); } @@ -802,9 +811,12 @@ */ static inline int bit_spin_trylock(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - if (test_and_set_bit(bitnum, addr)) + preempt_disable(); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) + if (test_and_set_bit(bitnum, addr)) { + preempt_enable(); return 0; + } #endif __acquire(bitlock); return 1; @@ -815,11 +827,12 @@ */ static inline void bit_spin_unlock(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) BUG_ON(!test_bit(bitnum, addr)); smp_mb__before_clear_bit(); clear_bit(bitnum, addr); #endif + preempt_enable(); __release(bitlock); } @@ -828,12 +841,15 @@ */ static inline int bit_spin_is_locked(int bitnum, unsigned 
long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) return test_bit(bitnum, addr); +#elif defined CONFIG_PREEMPT + return preempt_count(); #else return 1; #endif } +#endif /* CONFIG_PREEMPT_RT */ #define DEFINE_SPINLOCK(name) \ spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name) ^ permalink raw reply [flat|nested] 125+ messages in thread
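The spinlock.h hunk in the patch above compiles the bit spinlocks out under CONFIG_PREEMPT_RT and, for the other configurations, brackets them with preempt_disable()/preempt_enable(). For readers unfamiliar with the primitive, here is a minimal user-space sketch of a bit spinlock: one bit of a shared word is the lock, and the waiter spins with plain reads between atomic attempts, as the comment in the hunk describes. The `sketch_*` names are hypothetical, C11 atomics stand in for the kernel's test_and_set_bit()/test_bit()/clear_bit(), and there is no preemption control at all, so this is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * User-space sketch of a bit spinlock. The lock is a single bit inside a
 * word whose other bits may carry unrelated flags.
 */
static inline void sketch_bit_spin_lock(int bitnum, atomic_ulong *addr)
{
    unsigned long mask = 1UL << bitnum;

    /* Outer loop: atomic read-modify-write that tries to take the bit. */
    while (atomic_fetch_or(addr, mask) & mask) {
        /* Inner loop: reads only, so waiting causes less bus traffic. */
        while (atomic_load(addr) & mask)
            ;
    }
}

/* Returns 1 if the bit was free and is now held by the caller, else 0. */
static inline int sketch_bit_spin_trylock(int bitnum, atomic_ulong *addr)
{
    unsigned long mask = 1UL << bitnum;

    return !(atomic_fetch_or(addr, mask) & mask);
}

static inline void sketch_bit_spin_unlock(int bitnum, atomic_ulong *addr)
{
    atomic_fetch_and(addr, ~(1UL << bitnum));
}

static inline int sketch_bit_spin_is_locked(int bitnum, atomic_ulong *addr)
{
    return (int)((atomic_load(addr) >> bitnum) & 1UL);
}

/* Returns 1 if lock, trylock and unlock behave as expected. */
static int demo_bit_spinlock(void)
{
    enum { LOCK_BIT = 2 };       /* bit number chosen arbitrarily */
    atomic_ulong state;
    int ok = 1;

    atomic_init(&state, 0x1UL);  /* bit 0: an unrelated flag in the same word */

    sketch_bit_spin_lock(LOCK_BIT, &state);
    ok &= sketch_bit_spin_is_locked(LOCK_BIT, &state);
    ok &= !sketch_bit_spin_trylock(LOCK_BIT, &state);   /* already held */
    sketch_bit_spin_unlock(LOCK_BIT, &state);
    ok &= !sketch_bit_spin_is_locked(LOCK_BIT, &state);
    ok &= (atomic_load(&state) == 0x1UL);               /* other bits untouched */

    return ok;
}
```

The demo also checks that the unrelated flag in bit 0 survives locking and unlocking, which is the property that lets such a lock share a word with state bits like `b_state`.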
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-14 15:50 ` Steven Rostedt
@ 2005-03-14 19:02 ` Steven Rostedt
2005-03-15 11:44 ` Steven Rostedt
1 sibling, 0 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-14 19:02 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Lee Revell, Andrew Morton, linux-kernel

Hi Ingo,

I've found something that is very interesting and I can't explain it.

On Mon, 14 Mar 2005, Steven Rostedt wrote:

> On Mon, 14 Mar 2005, Steven Rostedt wrote:
>
> > On Mon, 14 Mar 2005, Steven Rostedt wrote:
> >
> > > I just downloaded -40 and applied my patch, compiled it with
> > > PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
> > > I'm getting the following...
> > >
> > > BUG: Unable to handle kernel NULL pointer dereference at virtual address
> > > 00000000
> > > printing eip:
> > > c0213438
> > > *pde = 00000000
> >
> > [snip]
>

All I did now was to add this patch to your -40-00 kernel:

diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-14 13:22:04.000000000 -0500
@@ -324,6 +324,8 @@
 	return bh->b_private;
 }
 
+BUFFER_FNS(JournalHead,journalhead)
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
 	bit_spin_lock(BH_State, &bh->b_state);

And I get the following output:

BUG: Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c0213118
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: ipv6 af_packet tsdev mousedev evdev floppy psmouse pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc shpchp pci_hotplug ehci_hcd intel_agp agpgart uhci_hcd usbcore e100 mii ide_cd cdrom unix
CPU: 0
EIP: 0060:[<c0213118>] Not tainted VLI
EFLAGS: 00010286 (2.6.11-RT-V0.7.40-00)
EIP is at vt_ioctl+0x18/0x1ab0
eax: 00000000 ebx: 00005603 ecx: 00005603 edx: cee14d80
esi: c0213100 edi: cb4bd000 ebp: cc03bf18 esp: cc03be48
ds: 007b es: 007b ss: 0068 preempt: 00000000
Process XFree86 (pid: 4709, threadinfo=cc03a000 task=cf0d5020)
Stack: cf0d5170 cc03a000 cf0d5020 c03448ec cf0d5020 00000246 cc03be7c c0117267
       c03448f4 00000006 00000001 00000000 00000000 cc03bebc cf1b81ec ce820600
       ce94a9b8 00000000 00000000 cc03bed4 c01704f1 ce94a9b8 00000007 00000000
Call Trace:
 [<c0103cdf>] show_stack+0x7f/0xa0 (28)
 [<c0103e95>] show_registers+0x165/0x1d0 (56)
 [<c0104088>] die+0xc8/0x150 (64)
 [<c01153c6>] do_page_fault+0x356/0x6c4 (216)
 [<c0103973>] error_code+0x2b/0x30 (268)
 [<c020e5fb>] tty_ioctl+0x34b/0x490 (52)
 [<c016807f>] do_ioctl+0x4f/0x70 (32)
 [<c0168282>] vfs_ioctl+0x62/0x1d0 (40)
 [<c0168451>] sys_ioctl+0x61/0x90 (40)
 [<c0102ec3>] syscall_call+0x7/0xb (-8124)
Code: ff ff 8d 05 28 4d 34 c0 e8 f6 60 0a 00 e9 3a ff ff ff 90 55 89 e5 57 56 53 81 ec c4 00 00 00 8b 7d 08 8b 5d 10 8b 87 7c 09 00 00 <8b> 30 89 34 24 8b 04 b5 e0 b7 3c c0 89 45 8c e8 a4 6a 00 00 85

I don't know why. BUFFER_FNS is just defined as:

#define BUFFER_FNS(bit, name)						\
static inline void set_buffer_##name(struct buffer_head *bh)		\
{									\
	set_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline void clear_buffer_##name(struct buffer_head *bh)		\
{									\
	clear_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline int buffer_##name(const struct buffer_head *bh)		\
{									\
	return test_bit(BH_##bit, &(bh)->b_state);			\
}

So all it does is make three functions that are never used:

	set_buffer_journalhead(...)
	clear_buffer_journalhead(...)
	buffer_journalhead(...)

Unless some macro uses them, I don't know why adding that line causes the
bug output that I showed. If I remove that line, I don't get that output.
And this is consistent. I've recompiled the kernel several times, and
every time I compile it with this added patch I get that output. And
every time without it, it runs fine.

Oh, please note that this only happens with PREEMPT_DESKTOP, and not with
PREEMPT_RT. I really think this is a symptom of something else and not
the cause of the bug. What do you think?

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
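[Editorial sketch] To make the BUFFER_FNS discussion above concrete, here is a minimal user-space rendering of what BUFFER_FNS(JournalHead, journalhead) generates. It is only an illustration under stated assumptions: struct buffer_head is cut down to the one field that matters, the bit number 3 is made up (the real BH_JournalHead value comes from the jbd headers), and set_bit(), clear_bit() and test_bit() are plain non-atomic stand-ins for the kernel primitives.

```c
/* Cut-down stand-in for the kernel's struct buffer_head (assumption). */
struct buffer_head {
	unsigned long b_state;
};

/* Hypothetical bit number; the real value is defined by the jbd headers. */
enum { BH_JournalHead = 3 };

/* Non-atomic user-space stand-ins for the kernel bit operations. */
static inline void set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static inline void clear_bit(int nr, unsigned long *addr)
{
	*addr &= ~(1UL << nr);
}

static inline int test_bit(int nr, const unsigned long *addr)
{
	return (int)((*addr >> nr) & 1UL);
}

/* Same shape as the macro quoted in the mail: token pasting builds names. */
#define BUFFER_FNS(bit, name)						\
static inline void set_buffer_##name(struct buffer_head *bh)		\
{									\
	set_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline void clear_buffer_##name(struct buffer_head *bh)		\
{									\
	clear_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline int buffer_##name(const struct buffer_head *bh)		\
{									\
	return test_bit(BH_##bit, &(bh)->b_state);			\
}

/* Expands to set_buffer_journalhead(), clear_buffer_journalhead()
 * and buffer_journalhead(). */
BUFFER_FNS(JournalHead, journalhead)
```

The three generated helpers only read or write one bit of b_state, which is consistent with the suspicion in the mail that the added line exposes a problem elsewhere rather than causing it.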
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 2005-03-14 15:50 ` Steven Rostedt 2005-03-14 19:02 ` Steven Rostedt @ 2005-03-15 11:44 ` Steven Rostedt 2005-03-15 12:00 ` Ingo Molnar 1 sibling, 1 reply; 125+ messages in thread From: Steven Rostedt @ 2005-03-15 11:44 UTC (permalink / raw) To: Lee Revell; +Cc: Ingo Molnar, Andrew Morton, linux-kernel I've realized that my previous patch had too many problems with the way the journaling system works. So I went back to my first approach but added the journal_head lock as one global lock to keep the buffer head size smaller. I only added the state lock to the buffer head. I've tested this for some time now, and it works well (for the test at least). I'll recompile it with PREEMPT_DESKTOP to see if that works too. -- Steve diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/buffer.c linux-2.6.11-final-V0.7.40-00/fs/buffer.c --- linux-2.6.11-final-V0.7.40-00.orig/fs/buffer.c 2005-03-02 02:38:10.000000000 -0500 +++ linux-2.6.11-final-V0.7.40-00/fs/buffer.c 2005-03-15 03:41:15.000000000 -0500 @@ -3003,6 +3003,9 @@ preempt_disable(); __get_cpu_var(bh_accounting).nr++; recalc_bh_state(); +#ifdef CONFIG_PREEMPT_RT + spin_lock_init(&ret->b_jstate_lock); +#endif preempt_enable(); } return ret; diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c --- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c 2005-03-02 02:37:49.000000000 -0500 +++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c 2005-03-15 03:49:10.000000000 -0500 @@ -82,6 +82,8 @@ static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *); +spinlock_t journal_head_lock = SPIN_LOCK_UNLOCKED; + /* * Helper function used to manage commit timeouts */ diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/buffer_head.h linux-2.6.11-final-V0.7.40-00/include/linux/buffer_head.h --- linux-2.6.11-final-V0.7.40-00.orig/include/linux/buffer_head.h 2005-03-02 02:37:45.000000000 -0500 +++ 
linux-2.6.11-final-V0.7.40-00/include/linux/buffer_head.h 2005-03-15 03:42:22.000000000 -0500 @@ -62,6 +62,13 @@ bh_end_io_t *b_end_io; /* I/O completion */ void *b_private; /* reserved for b_end_io */ struct list_head b_assoc_buffers; /* associated with another mapping */ + +#ifdef CONFIG_PREEMPT_RT + /* + * Fixme: This should be in the journal code. + */ + spinlock_t b_jstate_lock; /* lock for journal state. */ +#endif }; /* diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h --- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h 2005-03-02 02:38:19.000000000 -0500 +++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h 2005-03-15 03:45:33.000000000 -0500 @@ -314,6 +314,13 @@ TAS_BUFFER_FNS(RevokeValid, revokevalid) BUFFER_FNS(Freed, freed) +#ifdef CONFIG_PREEMPT_RT +extern spinlock_t journal_head_lock; +#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock) +#else +#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,bh->b_state); +#endif + static inline struct buffer_head *jh2bh(struct journal_head *jh) { return jh->b_bh; @@ -326,24 +333,36 @@ static inline void jbd_lock_bh_state(struct buffer_head *bh) { - bit_spin_lock(BH_State, &bh->b_state); + PICK_SPIN_LOCK(lock,BH_State,jstate); } static inline int jbd_trylock_bh_state(struct buffer_head *bh) { - return bit_spin_trylock(BH_State, &bh->b_state); + return PICK_SPIN_LOCK(trylock,BH_State,jstate); } static inline int jbd_is_locked_bh_state(struct buffer_head *bh) { - return bit_spin_is_locked(BH_State, &bh->b_state); + return PICK_SPIN_LOCK(is_locked,BH_State,jstate); } static inline void jbd_unlock_bh_state(struct buffer_head *bh) { - bit_spin_unlock(BH_State, &bh->b_state); + PICK_SPIN_LOCK(unlock,BH_State,jstate); +} +#undef PICK_SPIN_LOCK + +#ifdef CONFIG_PREEMPT_RT +static inline void jbd_lock_bh_journal_head(struct buffer_head *bh) +{ + spin_lock(&journal_head_lock); } +static inline void 
jbd_unlock_bh_journal_head(struct buffer_head *bh) +{ + spin_unlock(&journal_head_lock); +} +#else /* !CONFIG_PREEMPT_RT */ static inline void jbd_lock_bh_journal_head(struct buffer_head *bh) { bit_spin_lock(BH_JournalHead, &bh->b_state); @@ -353,6 +372,7 @@ { bit_spin_unlock(BH_JournalHead, &bh->b_state); } +#endif /* CONFIG_PREEMPT_RT */ struct jbd_revoke_table_s; diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h --- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h 2005-03-14 06:00:54.000000000 -0500 +++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h 2005-03-15 03:40:31.000000000 -0500 @@ -774,6 +774,10 @@ })) +#ifndef CONFIG_PREEMPT_RT + +/* These are just plain evil! */ + /* * bit-based spin_lock() * @@ -789,10 +793,15 @@ * busywait with less bus contention for a good time to * attempt to acquire the lock bit. */ -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - while (test_and_set_bit(bitnum, addr)) - while (test_bit(bitnum, addr)) + preempt_disable(); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) + while (test_and_set_bit(bitnum, addr)) { + while (test_bit(bitnum, addr)) { + preempt_enable(); cpu_relax(); + preempt_disable(); + } + } #endif __acquire(bitlock); } @@ -802,9 +811,12 @@ */ static inline int bit_spin_trylock(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - if (test_and_set_bit(bitnum, addr)) + preempt_disable(); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) + if (test_and_set_bit(bitnum, addr)) { + preempt_enable(); return 0; + } #endif __acquire(bitlock); return 1; @@ -815,11 +827,12 @@ */ static inline void bit_spin_unlock(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) 
BUG_ON(!test_bit(bitnum, addr)); smp_mb__before_clear_bit(); clear_bit(bitnum, addr); #endif + preempt_enable(); __release(bitlock); } @@ -828,12 +841,15 @@ */ static inline int bit_spin_is_locked(int bitnum, unsigned long *addr) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) return test_bit(bitnum, addr); +#elif defined CONFIG_PREEMPT + return preempt_count(); #else return 1; #endif } +#endif /* CONFIG_PREEMPT_RT */ #define DEFINE_SPINLOCK(name) \ spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name) ^ permalink raw reply [flat|nested] 125+ messages in thread
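[Editorial sketch] For readers who have not met bit spinlocks, the following user-space approximation shows the test-and-set loop that the patch above modifies. It is a sketch, not the kernel code: C11 atomics replace test_and_set_bit(), test_bit() and clear_bit(), assert() stands in for BUG_ON(), the cpu_relax() call is reduced to an empty loop body, and the preempt_disable()/preempt_enable() handling that the patch adds has no user-space counterpart and is left out.

```c
#include <assert.h>
#include <stdatomic.h>

/* Spin until the given bit of *addr has been atomically set by us. */
static inline void bit_spin_lock(int bitnum, atomic_ulong *addr)
{
	unsigned long mask = 1UL << bitnum;

	/* Outer loop: atomic test-and-set. Inner loop: read-only polling,
	 * which keeps bus traffic low while the lock is contended. */
	while (atomic_fetch_or(addr, mask) & mask)
		while (atomic_load(addr) & mask)
			; /* the kernel calls cpu_relax() here */
}

/* Return 1 if the lock bit was acquired, 0 if it was already held. */
static inline int bit_spin_trylock(int bitnum, atomic_ulong *addr)
{
	unsigned long mask = 1UL << bitnum;

	return !(atomic_fetch_or(addr, mask) & mask);
}

/* Release the lock bit; it must currently be set. */
static inline void bit_spin_unlock(int bitnum, atomic_ulong *addr)
{
	unsigned long mask = 1UL << bitnum;

	assert(atomic_load(addr) & mask); /* stand-in for BUG_ON() */
	atomic_fetch_and(addr, ~mask);
}

/* Return nonzero if the lock bit is currently set. */
static inline int bit_spin_is_locked(int bitnum, atomic_ulong *addr)
{
	return (atomic_load(addr) & (1UL << bitnum)) != 0;
}
```

The point of the design is that the lock costs no extra storage: it borrows one bit of a word the structure already has. The price is that a waiter can only busy-wait, which is what makes these locks awkward once the code they protect may sleep.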
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-15 11:44 ` Steven Rostedt
@ 2005-03-15 12:00 ` Ingo Molnar
2005-03-15 13:07 ` Steven Rostedt
0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-15 12:00 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Lee Revell, Andrew Morton, linux-kernel

* Steven Rostedt <rostedt@goodmis.org> wrote:

> I've realized that my previous patch had too many problems with the
> way the journaling system works. So I went back to my first approach
> but added the journal_head lock as one global lock to keep the buffer
> head size smaller. I only added the state lock to the buffer head.
> I've tested this for some time now, and it works well (for the test at
> least). I'll recompile it with PREEMPT_DESKTOP to see if that works
> too.

good progress - but the global lock may be a scalability worry on
upstream though. Would it be possible to just mirror much of the current
lock logic, but with spinlocks instead of bitlocks? And there should be
no #ifdefs on PREEMPT_RT.

Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-15 12:00 ` Ingo Molnar
@ 2005-03-15 13:07 ` Steven Rostedt
2005-03-15 13:35 ` Ingo Molnar
0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-15 13:07 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Lee Revell, Andrew Morton, linux-kernel

On Tue, 15 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > I've realized that my previous patch had too many problems with the
> > way the journaling system works. So I went back to my first approach
> > but added the journal_head lock as one global lock to keep the buffer
> > head size smaller. I only added the state lock to the buffer head.
> > I've tested this for some time now, and it works well (for the test at
> > least). I'll recompile it with PREEMPT_DESKTOP to see if that works
> > too.
>
> good progress - but the global lock may be a scalability worry on
> upstream though. Would it be possible to just mirror much of the current
> lock logic, but with spinlocks instead of bitlocks? And there should be
> no #ifdefs on PREEMPT_RT.
>

The first patch I had just converted the bit spinlocks to spinlocks,
but I thought that adding two spinlocks was too much for every buffer
head, even if it wasn't in the ext3 file system. The journal head
spinlock is just used to add and remove the journal heads from the
buffer heads, so I'm not sure how much contention is on them. I only
have a dual SMP system, so I can't test the system on a large number of
CPUs. What do you think, should we sacrifice memory for speed?

What should we use instead of #ifdef PREEMPT_RT? Or should we just
keep it the same for both? Since this fix is only to fix spinlocks
that schedule, I figured that it would be better not to waste the
memory of those not using PREEMPT_RT. Should I use the opposite,
PREEMPT_DESKTOP?

Thanks,

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-15 13:07 ` Steven Rostedt
@ 2005-03-15 13:35 ` Ingo Molnar
2005-03-15 13:55 ` Steven Rostedt
2005-03-15 18:05 ` Steven Rostedt
0 siblings, 2 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-15 13:35 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Lee Revell, Andrew Morton, linux-kernel

* Steven Rostedt <rostedt@goodmis.org> wrote:

> > good progress - but the global lock may be a scalability worry on
> > upstream though. Would it be possible to just mirror much of the current
> > lock logic, but with spinlocks instead of bitlocks? And there should be
> > no #ifdefs on PREEMPT_RT.
>
> The first patch I had just converted the bit spinlocks to spinlocks
> but I thought that adding two spinlocks was too much for every buffer
> head, even if it wasn't in the ext3 file system. The journal head
> spinlock is just used to add and remove the journal heads from the
> buffer heads, so I'm not sure how much contention is on them. I only
> have a dual smp system, so I can't test the system on large number of
> CPUs. What do you think, should we sacrifice memory for speed?

there are two bad effects of global spinlocks: 1) contention 2)
cacheline bouncing. It's #2 that would affect this spinlock. While i'm
not sure this would show up in usual benchmarks, we should rather err on
the side of more scalability. Two spinlocks are just two more machine
words on most architectures, so i don't think it matters all that much,
while it removes a major wart - as long as the two extra locks are for
ext3 buffer-heads only.

> What should we use instead of #ifdef PREEMPT_RT? Or should we just
> keep it the same for both. Since this fix is only to fix spinlocks
> that schedule, I figured that it would be better not to waste the
> memory of those not using PREEMPT_RT. Should I use the opposite
> PREEMPT_DESKTOP?

i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
would simplify things, besides making PREEMPT_RT simpler as well. The
memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
on x86)

Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
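[Editorial sketch] The "8 more bytes" figure above can be checked with a small layout comparison. This is a sketch under assumptions: the lock type below is a hypothetical 4-byte stand-in (the real spinlock_t can be larger with debugging options enabled), and the two structs keep only the fields relevant to the comparison rather than the full buffer_head.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte lock word standing in for a non-debug spinlock_t. */
typedef struct {
	uint32_t slock;
} sketch_spinlock_t;

/* Bit-lock variant: both locks live inside bits of b_state. */
struct bh_bitlocks {
	unsigned long b_state;
};

/* Spinlock variant: two dedicated lock words next to b_state. */
struct bh_spinlocks {
	unsigned long b_state;
	sketch_spinlock_t b_state_lock;
	sketch_spinlock_t b_journal_head_lock;
};

/* Extra bytes per buffer head paid for replacing the two bit locks. */
static inline size_t extra_bytes_per_bh(void)
{
	return sizeof(struct bh_spinlocks) - sizeof(struct bh_bitlocks);
}
```

With these assumptions the difference comes out as 8 bytes on common 32-bit and 64-bit ABIs. The memory cost is per structure, while the cacheline bouncing mentioned above is a cost of sharing one global lock across CPUs, so the two options trade different resources.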
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-15 13:35 ` Ingo Molnar
@ 2005-03-15 13:55 ` Steven Rostedt
2005-03-15 19:12 ` Andrew Morton
2005-03-15 18:05 ` Steven Rostedt
1 sibling, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-15 13:55 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Lee Revell, Andrew Morton, linux-kernel

On Tue, 15 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > What should we use instead of #ifdef PREEMPT_RT? Or should we just
> > keep it the same for both. Since this fix is only to fix spinlocks
> > that schedule, I figured that it would be better not to waste the
> > memory of those not using PREEMPT_RT. Should I use the opposite
> > PREEMPT_DESKTOP?
>
> i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
> would simplify things, besides making PREEMPT_RT simpler as well. The
> memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
> on x86)
>

The problem here is that it's not ext3 bh's only. They're still the normal
buffer head. The problem arises because the ext3 "journal head" is
allocated within these bit spin locks. I tried to monkey with putting the
locks in the journal heads and have checks to see when to free them, but
it wasn't that simple. I started having problems with some of the freeing
transactions; I might have assumed too much.

I'll give it one more try to get it into the journal heads, but after
that (if I fail), I'll let someone who understands the ext3 system better
handle this.

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-15 13:55 ` Steven Rostedt
@ 2005-03-15 19:12 ` Andrew Morton
0 siblings, 0 replies; 125+ messages in thread
From: Andrew Morton @ 2005-03-15 19:12 UTC (permalink / raw)
To: rostedt; +Cc: mingo, rlrevell, linux-kernel

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> The problem here is that it's not ext3 bh's only. They're still the normal
> buffer head. The problem arises because the ext3 "journal head" is
> allocated within these bit spin locks.

Yes, the locks do want to live inside the buffer_head.

Stephen has pointed out that we might want to remove
jbd_lock_bh_journal_head() altogether some time, just use
jbd_lock_bh_state() for that.

In 2.4 these locks are global (or per-superblock). Making them a global
spinlock would be acceptable for 2-ways and probably larger.

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-15 13:35 ` Ingo Molnar
2005-03-15 13:55 ` Steven Rostedt
@ 2005-03-15 18:05 ` Steven Rostedt
2005-03-15 19:09 ` Lee Revell
` (2 more replies)
1 sibling, 3 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-15 18:05 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Lee Revell, Andrew Morton, linux-kernel

On Tue, 15 Mar 2005, Ingo Molnar wrote:

>
> i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
> would simplify things, besides making PREEMPT_RT simpler as well. The
> memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
> on x86)
>

Hi Ingo,

Damn! The answer was right there in front of my eyes! Here's the cleanest
solution. I forgot about wait_on_bit_lock. I've converted all the locks
to use this instead. We probably need to get priority inheritance working
on this too someday, but for now it's better than wasting memory or
getting into deadlocks.

-- Steve

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c	2005-03-02 02:37:49.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c	2005-03-15 11:58:14.000000000 -0500
@@ -82,6 +82,17 @@
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+/*
+ * Used in the locking of the bh_state and bh_journalhead bit locks.
+ */
+int jbd_lock_bh_sleep(void *notused)
+{
+	schedule();
+	return 0;
+}
+#endif
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-15 11:58:40.000000000 -0500
@@ -324,34 +324,63 @@
 	return bh->b_private;
 }
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+int jbd_lock_bh_sleep(void *notused);
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	wait_on_bit_lock(&bh->b_state,BH_State,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	if (test_and_set_bit(BH_State, &bh->b_state))
+		return 0;
+#endif
+	__acquire(bitlock);
+	return 1;
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	return test_bit(BH_State, &bh->b_state);
+#else
+	return 1;
+#endif
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_State, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_State);
+#endif
+	__release(bitlock);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	wait_on_bit_lock(&bh->b_state,BH_JournalHead,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_JournalHead, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_JournalHead);
+#endif
+	__release(bitlock);
 }
 
 struct jbd_revoke_table_s;
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h	2005-03-14 06:00:54.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h	2005-03-15 12:19:11.032217736 -0500
@@ -774,67 +774,6 @@
 }))
 
 
-/*
- *  bit-based spin_lock()
- *
- * Don't use this unless you really need to: spin_lock() and spin_unlock()
- * are significantly faster.
- */
-static inline void bit_spin_lock(int bitnum, unsigned long *addr)
-{
-	/*
-	 * Assuming the lock is uncontended, this never enters
-	 * the body of the outer loop. If it is contended, then
-	 * within the inner loop a non-atomic test is used to
-	 * busywait with less bus contention for a good time to
-	 * attempt to acquire the lock bit.
-	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while (test_and_set_bit(bitnum, addr))
-		while (test_bit(bitnum, addr))
-			cpu_relax();
-#endif
-	__acquire(bitlock);
-}
-
-/*
- * Return true if it was acquired
- */
-static inline int bit_spin_trylock(int bitnum, unsigned long *addr)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	if (test_and_set_bit(bitnum, addr))
-		return 0;
-#endif
-	__acquire(bitlock);
-	return 1;
-}
-
-/*
- *  bit-based spin_unlock()
- */
-static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	BUG_ON(!test_bit(bitnum, addr));
-	smp_mb__before_clear_bit();
-	clear_bit(bitnum, addr);
-#endif
-	__release(bitlock);
-}
-
-/*
- * Return true if the lock is held.
- */
-static inline int bit_spin_is_locked(int bitnum, unsigned long *addr)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	return test_bit(bitnum, addr);
-#else
-	return 1;
-#endif
-}
-
 #define DEFINE_SPINLOCK(name) \
 	spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name)
 

^ permalink raw reply [flat|nested] 125+ messages in thread
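[Editorial sketch] The patch above swaps busy-waiting for a wait with an "action" callback. The following user-space analogue shows that calling pattern only; it is not how the kernel implements wait_on_bit_lock(). In the kernel the waiter is queued on a wait queue and woken by wake_up_bit(), whereas this sketch simply retries after the callback returns, and the callback yields the CPU with sched_yield() where jbd_lock_bh_sleep() calls schedule(). All sketch_* names are hypothetical.

```c
#include <sched.h>
#include <stdatomic.h>

/* Callback type: returns 0 to keep waiting, nonzero to give up. */
typedef int (*sketch_wait_action_t)(void *);

/* Analogue of jbd_lock_bh_sleep(): give up the CPU, then keep waiting. */
static inline int sketch_lock_sleep(void *unused)
{
	(void)unused;
	sched_yield();
	return 0;
}

/*
 * Try to set the lock bit; while someone else holds it, run the action
 * instead of spinning. Returns 0 once the bit is ours, or the nonzero
 * value returned by the action if it asked to abort.
 */
static inline int sketch_wait_on_bit_lock(atomic_ulong *word, int bit,
					  sketch_wait_action_t action)
{
	unsigned long mask = 1UL << bit;

	while (atomic_fetch_or(word, mask) & mask) {
		int ret = action(word);

		if (ret)
			return ret;
	}
	return 0;
}

/* Clear the lock bit. The kernel version follows this with a memory
 * barrier and wake_up_bit() to wake any queued waiter. */
static inline void sketch_unlock_bit(atomic_ulong *word, int bit)
{
	atomic_fetch_and(word, ~(1UL << bit));
}
```

The lock still lives in a single bit of an existing word, so no memory is added, but a contended acquire may now give up the CPU. That is why callers must be in a context where sleeping is allowed.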
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-15 18:05 ` Steven Rostedt
@ 2005-03-15 19:09 ` Lee Revell
2005-03-16 7:50 ` Steven Rostedt
2005-03-16 7:31 ` Steven Rostedt
2005-03-16 8:50 ` Ingo Molnar
2 siblings, 1 reply; 125+ messages in thread
From: Lee Revell @ 2005-03-15 19:09 UTC (permalink / raw)
To: rostedt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Tue, 2005-03-15 at 13:05 -0500, Steven Rostedt wrote:
> Damn! The answer was right there in front of my eyes! Here's the cleanest
> solution. I forgot about wait_on_bit_lock. I've converted all the locks
> to use this instead. We probably need to get priority inheritance working
> on this too someday, but for now it's better than wasting memory or
> getting into deadlocks.
>

I am still not clear on why this did not hit with earlier kernels +
PREEMPT_DESKTOP. Were the bitlocks introduced recently? Or was another
lock-break patch dropped?

Lee

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-15 19:09 ` Lee Revell
@ 2005-03-16 7:50 ` Steven Rostedt
2005-03-16 18:21 ` Lee Revell
0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16 7:50 UTC (permalink / raw)
To: Lee Revell; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Tue, 15 Mar 2005, Lee Revell wrote:

> On Tue, 2005-03-15 at 13:05 -0500, Steven Rostedt wrote:
> > Damn! The answer was right there in front of my eyes! Here's the cleanest
> > solution. I forgot about wait_on_bit_lock. I've converted all the locks
> > to use this instead. We probably need to get priority inheritance working
> > on this too someday, but for now it's better than wasting memory or
> > getting into deadlocks.
> >
>
> I am still not clear on why this did not hit with earlier kernels +
> PREEMPT_DESKTOP. Were the bitlocks introduced recently? Or was another
> lock-break patch dropped?
>

When did you start seeing this? This code has been there as far back as
2.6.7 (the earliest 2.6 kernel I still have lying around) and as far
back as Ingo's realtime-preempt-2.6.9-mm1-U10. Maybe the tracing didn't
start picking this up till later, or you were just lucky that no
contention was happening on that lock.

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-16 7:50 ` Steven Rostedt
@ 2005-03-16 18:21 ` Lee Revell
0 siblings, 0 replies; 125+ messages in thread
From: Lee Revell @ 2005-03-16 18:21 UTC (permalink / raw)
To: rostedt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Wed, 2005-03-16 at 02:50 -0500, Steven Rostedt wrote:
>
> On Tue, 15 Mar 2005, Lee Revell wrote:
>
> > On Tue, 2005-03-15 at 13:05 -0500, Steven Rostedt wrote:
> > > Damn! The answer was right there in front of my eyes! Here's the cleanest
> > > solution. I forgot about wait_on_bit_lock. I've converted all the locks
> > > to use this instead. We probably need to get priority inheritance working
> > > on this too someday, but for now it's better than wasting memory or
> > > getting into deadlocks.
> > >
> >
> > I am still not clear on why this did not hit with earlier kernels +
> > PREEMPT_DESKTOP. Were the bitlocks introduced recently? Or was another
> > lock-break patch dropped?
> >
>
> When did you start seeing this? This code has been there as far back as
> 2.6.7 (the earliest 2.6 kernel I still have lying around) and as far
> back as Ingo's realtime-preempt-2.6.9-mm1-U10. Maybe the tracing didn't
> start picking this up till later, or you were just lucky that no
> contention was happening on that lock.

Sometime after the RT preempt patches were rebased to mainline.

I don't see how there could be contention, as I am on a UP system.

Lee

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-15 18:05 ` Steven Rostedt
2005-03-15 19:09 ` Lee Revell
@ 2005-03-16 7:31 ` Steven Rostedt
2005-03-16 8:50 ` Ingo Molnar
2 siblings, 0 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16 7:31 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Lee Revell, Andrew Morton, linux-kernel

On Tue, 15 Mar 2005, Steven Rostedt wrote:

>
> On Tue, 15 Mar 2005, Ingo Molnar wrote:
>
> >
> > i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
> > would simplify things, besides making PREEMPT_RT simpler as well. The
> > memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
> > on x86)
> >
>
> Hi Ingo,
>
> Damn! The answer was right there in front of my eyes! Here's the cleanest
> solution. I forgot about wait_on_bit_lock. I've converted all the locks
> to use this instead. We probably need to get priority inheritance working
> on this too someday, but for now it's better than wasting memory or
> getting into deadlocks.
>

One bit of caution on these. If we don't have PREEMPT_RT, then don't the
spinlocks on SMP act the same as normal spinlocks, so that we should not
schedule while holding a spinlock? I believe that some of these locks are
taken while holding spin_locks. So this isn't the right solution for
anything other than PREEMPT_RT.

I also forgot to add might_sleep in the locking calls. Here's the patch
with the might_sleep added.

What should we do for non-PREEMPT_RT? Maybe put the bit_spinlocks back in
for that case?

-- Steve diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c --- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c 2005-03-02 02:37:49.000000000 -0500 +++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c 2005-03-15 11:58:14.000000000 -0500 @@ -82,6 +82,17 @@ static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) +/* + * Used in the locking of the bh_state and bh_journalhead bit locks. + */ +int jbd_lock_bh_sleep(void *notused) +{ + schedule(); + return 0; +} +#endif + /* * Helper function used to manage commit timeouts */ diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h --- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h 2005-03-02 02:38:19.000000000 -0500 +++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h 2005-03-16 02:25:31.881251828 -0500 @@ -324,34 +324,65 @@ return bh->b_private; } +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) +int jbd_lock_bh_sleep(void *notused); +#endif + static inline void jbd_lock_bh_state(struct buffer_head *bh) { - bit_spin_lock(BH_State, &bh->b_state); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) + might_sleep(); + wait_on_bit_lock(&bh->b_state,BH_State,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE); +#endif + __acquire(bitlock); } static inline int jbd_trylock_bh_state(struct buffer_head *bh) { - return bit_spin_trylock(BH_State, &bh->b_state); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) + if (test_and_set_bit(BH_State, &bh->b_state)) + return 0; +#endif + __acquire(bitlock); + return 1; } static inline int jbd_is_locked_bh_state(struct buffer_head *bh) { - return bit_spin_is_locked(BH_State, &bh->b_state); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || 
defined(CONFIG_PREEMPT) + return test_bit(BH_State, &bh->b_state); +#else + return 1; +#endif } static inline void jbd_unlock_bh_state(struct buffer_head *bh) { - bit_spin_unlock(BH_State, &bh->b_state); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) + clear_bit(BH_State, &bh->b_state); + smp_mb__after_clear_bit(); + wake_up_bit(&bh->b_state, BH_State); +#endif + __release(bitlock); } static inline void jbd_lock_bh_journal_head(struct buffer_head *bh) { - bit_spin_lock(BH_JournalHead, &bh->b_state); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) + might_sleep(); + wait_on_bit_lock(&bh->b_state,BH_JournalHead,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE); +#endif + __acquire(bitlock); } static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh) { - bit_spin_unlock(BH_JournalHead, &bh->b_state); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) + clear_bit(BH_JournalHead, &bh->b_state); + smp_mb__after_clear_bit(); + wake_up_bit(&bh->b_state, BH_JournalHead); +#endif + __release(bitlock); } struct jbd_revoke_table_s; diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h --- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h 2005-03-14 06:00:54.000000000 -0500 +++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h 2005-03-15 12:19:11.000000000 -0500 @@ -774,67 +774,6 @@ })) -/* - * bit-based spin_lock() - * - * Don't use this unless you really need to: spin_lock() and spin_unlock() - * are significantly faster. - */ -static inline void bit_spin_lock(int bitnum, unsigned long *addr) -{ - /* - * Assuming the lock is uncontended, this never enters - * the body of the outer loop. If it is contended, then - * within the inner loop a non-atomic test is used to - * busywait with less bus contention for a good time to - * attempt to acquire the lock bit. 
- */ -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - while (test_and_set_bit(bitnum, addr)) - while (test_bit(bitnum, addr)) - cpu_relax(); -#endif - __acquire(bitlock); -} - -/* - * Return true if it was acquired - */ -static inline int bit_spin_trylock(int bitnum, unsigned long *addr) -{ -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - if (test_and_set_bit(bitnum, addr)) - return 0; -#endif - __acquire(bitlock); - return 1; -} - -/* - * bit-based spin_unlock() - */ -static inline void bit_spin_unlock(int bitnum, unsigned long *addr) -{ -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - BUG_ON(!test_bit(bitnum, addr)); - smp_mb__before_clear_bit(); - clear_bit(bitnum, addr); -#endif - __release(bitlock); -} - -/* - * Return true if the lock is held. - */ -static inline int bit_spin_is_locked(int bitnum, unsigned long *addr) -{ -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) - return test_bit(bitnum, addr); -#else - return 1; -#endif -} - #define DEFINE_SPINLOCK(name) \ spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name) ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-15 18:05 ` Steven Rostedt
2005-03-15 19:09 ` Lee Revell
2005-03-16 7:31 ` Steven Rostedt
@ 2005-03-16 8:50 ` Ingo Molnar
2005-03-16 9:15 ` Andrew Morton
2 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 8:50 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Lee Revell, Andrew Morton, linux-kernel

* Steven Rostedt <rostedt@goodmis.org> wrote:

> Damn! The answer was right there in front of my eyes! Here's the
> cleanest solution. I forgot about wait_on_bit_lock. I've converted
> all the locks to use this instead. [...]

ah, indeed, this looks really nifty. Andrew?

Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
2005-03-16 8:50 ` Ingo Molnar
@ 2005-03-16 9:15 ` Andrew Morton
2005-03-16 9:51 ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Ingo Molnar
0 siblings, 1 reply; 125+ messages in thread
From: Andrew Morton @ 2005-03-16 9:15 UTC (permalink / raw)
To: Ingo Molnar; +Cc: rostedt, rlrevell, linux-kernel

Ingo Molnar <mingo@elte.hu> wrote:
>
>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Damn! The answer was right there in front of my eyes! Here's the
> > cleanest solution. I forgot about wait_on_bit_lock. I've converted
> > all the locks to use this instead. [...]
>
> ah, indeed, this looks really nifty. Andrew?
>

There's a little lock ranking diagram in jbd.h which tells us that these
locks nest inside j_list_lock and j_state_lock. So I guess you'll need to
turn those into semaphores.
^ permalink raw reply [flat|nested] 125+ messages in thread
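The constraint behind this remark can be sketched in a few lines: once the bh bit locks may sleep, they can no longer be taken while a spinning lock is held, so any outer lock in the ranking has to become a sleeping lock as well. The toy model below only illustrates that rule. It is not kernel code, and lock_kind, lock_ctx and the model_* functions are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a lock either spins or may put its caller to sleep. */
enum lock_kind { LOCK_SPINNING, LOCK_SLEEPING };

/* Per-task bookkeeping: how many spinning locks are currently held. */
struct lock_ctx {
	int spinlocks_held;
};

/* A sleeping lock may block, and blocking under a spinlock is not allowed. */
bool model_may_acquire(const struct lock_ctx *ctx, enum lock_kind kind)
{
	if (kind == LOCK_SLEEPING && ctx->spinlocks_held > 0)
		return false;
	return true;
}

/* Take a lock if the nesting rule permits it; report whether it was taken. */
bool model_acquire(struct lock_ctx *ctx, enum lock_kind kind)
{
	if (!model_may_acquire(ctx, kind))
		return false;
	if (kind == LOCK_SPINNING)
		ctx->spinlocks_held++;
	return true;
}

/* Release a previously acquired lock of the given kind. */
void model_release(struct lock_ctx *ctx, enum lock_kind kind)
{
	if (kind == LOCK_SPINNING) {
		assert(ctx->spinlocks_held > 0);
		ctx->spinlocks_held--;
	}
}
```

In this model, taking a LOCK_SPINNING lock (j_list_lock as it stands) and then a LOCK_SLEEPING one (a bh bit lock based on wait_on_bit_lock) is rejected, while two nested LOCK_SLEEPING locks are accepted, which is the situation the j_state_sem/j_list_sem conversion aims for.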
* [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 9:15 ` Andrew Morton
@ 2005-03-16 9:51 ` Ingo Molnar
2005-03-16 9:53 ` [patch 1/3] j_state_lock -> j_state_sem Ingo Molnar
2005-03-16 10:04 ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Andrew Morton
0 siblings, 2 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 9:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel

* Andrew Morton <akpm@osdl.org> wrote:

> > > Damn! The answer was right there in front of my eyes! Here's the
> > > cleanest solution. I forgot about wait_on_bit_lock. I've converted
> > > all the locks to use this instead. [...]
> >
> > ah, indeed, this looks really nifty. Andrew?
> >
>
> There's a little lock ranking diagram in jbd.h which tells us that
> these locks nest inside j_list_lock and j_state_lock. So I guess
> you'll need to turn those into semaphores.

indeed. I did this (see the three followup patches, against BK-curr),
and it builds/boots/works just fine on an ext3 box. Do we want to try
this in -mm?

one worry would be that while spinlocks are NOP on UP, semaphores are
not. OTOH, this could relax some of the preemptability constraints
within ext3 and could make it more hackable. These patches enabled the
removal of some of the lock-break code for example and could likely
solve some of the remaining ext3 latencies.

Ingo
^ permalink raw reply [flat|nested] 125+ messages in thread
* [patch 1/3] j_state_lock -> j_state_sem
2005-03-16 9:51 ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Ingo Molnar
@ 2005-03-16 9:53 ` Ingo Molnar
2005-03-16 9:53 ` [patch 2/3] j_list_lock -> j_list_sem Ingo Molnar
2005-03-16 10:04 ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Andrew Morton
1 sibling, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 9:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel

this patch turns the j_state_lock spinlock into a mutex.

Builds/boots/works fine on x86.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/fs/jbd/checkpoint.c.orig
+++ linux/fs/jbd/checkpoint.c
@@ -78,25 +78,24 @@ static int __try_to_free_cp_buf(struct j
 void __log_wait_for_space(journal_t *journal)
 {
 	int nblocks;
-	assert_spin_locked(&journal->j_state_lock);

 	nblocks = jbd_space_needed(journal);
 	while (__log_space_left(journal) < nblocks) {
 		if (journal->j_flags & JFS_ABORT)
 			return;
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		down(&journal->j_checkpoint_sem);

 		/*
 		 * Test again, another process may have checkpointed while we
 		 * were waiting for the checkpoint lock
 		 */
-		spin_lock(&journal->j_state_lock);
+		down(&journal->j_state_sem);
 		nblocks = jbd_space_needed(journal);
 		if (__log_space_left(journal) < nblocks) {
-			spin_unlock(&journal->j_state_lock);
+			up(&journal->j_state_sem);
 			log_do_checkpoint(journal);
-			spin_lock(&journal->j_state_lock);
+			down(&journal->j_state_sem);
 		}
 		up(&journal->j_checkpoint_sem);
 	}
@@ -404,7 +403,7 @@ int cleanup_journal_tail(journal_t *jour
 	 * next transaction ID we will write, and where it will
 	 * start.
 	 */
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	spin_lock(&journal->j_list_lock);
 	transaction = journal->j_checkpoint_transactions;
 	if (transaction) {
@@ -426,7 +425,7 @@ int cleanup_journal_tail(journal_t *jour
 	/* If the oldest pinned transaction is at the tail of the log
 	   already then there's not much we can do right now.
*/ if (journal->j_tail_sequence == first_tid) { - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); return 1; } @@ -445,7 +444,7 @@ int cleanup_journal_tail(journal_t *jour journal->j_free += freed; journal->j_tail_sequence = first_tid; journal->j_tail = blocknr; - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); if (!(journal->j_flags & JFS_ABORT)) journal_update_superblock(journal, 1); return 0; --- linux/fs/jbd/transaction.c.orig +++ linux/fs/jbd/transaction.c @@ -40,7 +40,7 @@ * new transaction and we can't block without protecting against other * processes trying to touch the journal while it is in transition. * - * Called under j_state_lock + * Called under j_state_sem */ static transaction_t * @@ -109,21 +109,21 @@ alloc_transaction: repeat: /* - * We need to hold j_state_lock until t_updates has been incremented, + * We need to hold j_state_sem until t_updates has been incremented, * for proper journal barrier handling */ - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); repeat_locked: if (is_journal_aborted(journal) || (journal->j_errno != 0 && !(journal->j_flags & JFS_ACK_ERR))) { - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); ret = -EROFS; goto out; } /* Wait on the journal's transaction barrier if necessary */ if (journal->j_barrier_count) { - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); wait_event(journal->j_wait_transaction_locked, journal->j_barrier_count == 0); goto repeat; @@ -131,7 +131,7 @@ repeat_locked: if (!journal->j_running_transaction) { if (!new_transaction) { - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); goto alloc_transaction; } get_transaction(journal, new_transaction); @@ -149,7 +149,7 @@ repeat_locked: prepare_to_wait(&journal->j_wait_transaction_locked, &wait, TASK_UNINTERRUPTIBLE); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); schedule(); finish_wait(&journal->j_wait_transaction_locked, &wait); 
goto repeat; @@ -176,7 +176,7 @@ repeat_locked: prepare_to_wait(&journal->j_wait_transaction_locked, &wait, TASK_UNINTERRUPTIBLE); __log_start_commit(journal, transaction->t_tid); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); schedule(); finish_wait(&journal->j_wait_transaction_locked, &wait); goto repeat; @@ -225,7 +225,7 @@ repeat_locked: handle, nblocks, transaction->t_outstanding_credits, __log_space_left(journal)); spin_unlock(&transaction->t_handle_lock); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); out: if (new_transaction) kfree(new_transaction); @@ -321,7 +321,7 @@ int journal_extend(handle_t *handle, int result = 1; - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); /* Don't extend a locked-down transaction! */ if (handle->h_transaction->t_state != T_RUNNING) { @@ -353,7 +353,7 @@ int journal_extend(handle_t *handle, int unlock: spin_unlock(&transaction->t_handle_lock); error_out: - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); out: return result; } @@ -392,7 +392,7 @@ int journal_restart(handle_t *handle, in J_ASSERT(transaction->t_updates > 0); J_ASSERT(journal_current_handle() == handle); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); spin_lock(&transaction->t_handle_lock); transaction->t_outstanding_credits -= handle->h_buffer_credits; transaction->t_updates--; @@ -403,7 +403,7 @@ int journal_restart(handle_t *handle, in jbd_debug(2, "restarting handle %p\n", handle); __log_start_commit(journal, transaction->t_tid); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); handle->h_buffer_credits = nblocks; ret = start_this_handle(journal, handle); @@ -425,7 +425,7 @@ void journal_lock_updates(journal_t *jou { DEFINE_WAIT(wait); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); ++journal->j_barrier_count; /* Wait until there are no running updates */ @@ -443,12 +443,12 @@ void journal_lock_updates(journal_t *jou 
prepare_to_wait(&journal->j_wait_updates, &wait, TASK_UNINTERRUPTIBLE); spin_unlock(&transaction->t_handle_lock); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); schedule(); finish_wait(&journal->j_wait_updates, &wait); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); } - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); /* * We have now established a barrier against other normal updates, but @@ -472,9 +472,9 @@ void journal_unlock_updates (journal_t * J_ASSERT(journal->j_barrier_count != 0); up(&journal->j_barrier); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); --journal->j_barrier_count; - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); wake_up(&journal->j_wait_transaction_locked); } @@ -1336,7 +1336,7 @@ int journal_stop(handle_t *handle) } current->journal_info = NULL; - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); spin_lock(&transaction->t_handle_lock); transaction->t_outstanding_credits -= handle->h_buffer_credits; transaction->t_updates--; @@ -1366,7 +1366,7 @@ int journal_stop(handle_t *handle) "handle %p\n", handle); /* This is non-blocking */ __log_start_commit(journal, transaction->t_tid); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); /* * Special case: JFS_SYNC synchronous updates require us @@ -1376,7 +1376,7 @@ int journal_stop(handle_t *handle) err = log_wait_commit(journal, tid); } else { spin_unlock(&transaction->t_handle_lock); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); } jbd_free_handle(handle); @@ -1739,7 +1739,7 @@ static int journal_unmap_buffer(journal_ if (!buffer_jbd(bh)) goto zap_buffer_unlocked; - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); jbd_lock_bh_state(bh); spin_lock(&journal->j_list_lock); @@ -1776,7 +1776,7 @@ static int journal_unmap_buffer(journal_ journal->j_running_transaction); spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); - 
spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); journal_put_journal_head(jh); return ret; } else { @@ -1790,7 +1790,7 @@ static int journal_unmap_buffer(journal_ journal->j_committing_transaction); spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); journal_put_journal_head(jh); return ret; } else { @@ -1814,7 +1814,7 @@ static int journal_unmap_buffer(journal_ } spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); journal_put_journal_head(jh); return 0; } else { @@ -1833,7 +1833,7 @@ zap_buffer: zap_buffer_no_jh: spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); zap_buffer_unlocked: clear_buffer_dirty(bh); J_ASSERT_BH(bh, !buffer_jbddirty(bh)); --- linux/fs/jbd/commit.c.orig +++ linux/fs/jbd/commit.c @@ -144,9 +144,9 @@ static int journal_write_commit_record(j "JBD: barrier-based sync failed on %s - " "disabling barriers\n", bdevname(journal->j_dev, b)); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); journal->j_flags &= ~JFS_BARRIER; - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); /* And try again, without the barrier */ clear_buffer_ordered(bh); @@ -211,7 +211,7 @@ void journal_commit_transaction(journal_ jbd_debug(1, "JBD: starting commit of transaction %d\n", commit_transaction->t_tid); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); commit_transaction->t_state = T_LOCKED; spin_lock(&commit_transaction->t_handle_lock); @@ -222,9 +222,9 @@ void journal_commit_transaction(journal_ TASK_UNINTERRUPTIBLE); if (commit_transaction->t_updates) { spin_unlock(&commit_transaction->t_handle_lock); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); schedule(); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); 
spin_lock(&commit_transaction->t_handle_lock); } finish_wait(&journal->j_wait_updates, &wait); @@ -291,7 +291,7 @@ void journal_commit_transaction(journal_ journal->j_running_transaction = NULL; commit_transaction->t_log_start = journal->j_head; wake_up(&journal->j_wait_transaction_locked); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); jbd_debug (3, "JBD: commit phase 2\n"); @@ -806,16 +806,16 @@ restart_loop: /* * This is a bit sleazy. We borrow j_list_lock to protect * journal->j_committing_transaction in __journal_remove_checkpoint. - * Really, __jornal_remove_checkpoint should be using j_state_lock but + * Really, __jornal_remove_checkpoint should be using j_state_sem but * it's a bit hassle to hold that across __journal_remove_checkpoint */ - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); spin_lock(&journal->j_list_lock); commit_transaction->t_state = T_FINISHED; J_ASSERT(commit_transaction == journal->j_committing_transaction); journal->j_commit_sequence = commit_transaction->t_tid; journal->j_committing_transaction = NULL; - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); if (commit_transaction->t_checkpoint_list == NULL) { __journal_drop_transaction(journal, commit_transaction); --- linux/fs/jbd/journal.c.orig +++ linux/fs/jbd/journal.c @@ -148,7 +148,7 @@ int kjournald(void *arg) /* * And now, wait forever for commit wakeup events. */ - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); loop: if (journal->j_flags & JFS_UNMOUNT) @@ -159,10 +159,10 @@ loop: if (journal->j_commit_sequence != journal->j_commit_request) { jbd_debug(1, "OK, requests differ\n"); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); del_timer_sync(journal->j_commit_timer); journal_commit_transaction(journal); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); goto loop; } @@ -174,9 +174,9 @@ loop: * be already stopped. 
*/ jbd_debug(1, "Now suspending kjournald\n"); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); refrigerator(PF_FREEZE); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); } else { /* * We assume on resume that commits are already there, @@ -194,9 +194,9 @@ loop: transaction->t_expires)) should_sleep = 0; if (should_sleep) { - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); schedule(); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); } finish_wait(&journal->j_wait_commit, &wait); } @@ -214,7 +214,7 @@ loop: goto loop; end_loop: - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); del_timer_sync(journal->j_commit_timer); journal->j_task = NULL; wake_up(&journal->j_wait_done_commit); @@ -230,16 +230,16 @@ static void journal_start_thread(journal static void journal_kill_thread(journal_t *journal) { - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); journal->j_flags |= JFS_UNMOUNT; while (journal->j_task) { wake_up(&journal->j_wait_commit); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); wait_event(journal->j_wait_done_commit, journal->j_task == 0); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); } - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); } /* @@ -408,15 +408,13 @@ repeat: * * Called with the journal already locked. * - * Called under j_state_lock + * Called under j_state_sem */ int __log_space_left(journal_t *journal) { int left = journal->j_free; - assert_spin_locked(&journal->j_state_lock); - /* * Be pessimistic here about the number of those free blocks which * might be required for log descriptor control blocks. @@ -433,7 +431,7 @@ int __log_space_left(journal_t *journal) } /* - * Called under j_state_lock. Returns true if a transaction was started. + * Called under j_state_sem. Returns true if a transaction was started. 
*/ int __log_start_commit(journal_t *journal, tid_t target) { @@ -460,9 +458,9 @@ int log_start_commit(journal_t *journal, { int ret; - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); ret = __log_start_commit(journal, tid); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); return ret; } @@ -481,7 +479,7 @@ int journal_force_commit_nested(journal_ transaction_t *transaction = NULL; tid_t tid; - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); if (journal->j_running_transaction && !current->journal_info) { transaction = journal->j_running_transaction; __log_start_commit(journal, transaction->t_tid); @@ -489,12 +487,12 @@ int journal_force_commit_nested(journal_ transaction = journal->j_committing_transaction; if (!transaction) { - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); return 0; /* Nothing to retry */ } tid = transaction->t_tid; - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); log_wait_commit(journal, tid); return 1; } @@ -507,7 +505,7 @@ int journal_start_commit(journal_t *jour { int ret = 0; - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); if (journal->j_running_transaction) { tid_t tid = journal->j_running_transaction->t_tid; @@ -522,7 +520,7 @@ int journal_start_commit(journal_t *jour *ptid = journal->j_committing_transaction->t_tid; ret = 1; } - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); return ret; } @@ -535,25 +533,25 @@ int log_wait_commit(journal_t *journal, int err = 0; #ifdef CONFIG_JBD_DEBUG - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); if (!tid_geq(journal->j_commit_request, tid)) { printk(KERN_EMERG "%s: error: j_commit_request=%d, tid=%d\n", __FUNCTION__, journal->j_commit_request, tid); } - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); #endif - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); while (tid_gt(tid, journal->j_commit_sequence)) { jbd_debug(1, 
"JBD: want %d, j_commit_sequence=%d\n", tid, journal->j_commit_sequence); wake_up(&journal->j_wait_commit); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); wait_event(journal->j_wait_done_commit, !tid_gt(tid, journal->j_commit_sequence)); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); } - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); if (unlikely(is_journal_aborted(journal))) { printk(KERN_EMERG "journal commit I/O error\n"); @@ -570,7 +568,7 @@ int journal_next_log_block(journal_t *jo { unsigned long blocknr; - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); J_ASSERT(journal->j_free > 1); blocknr = journal->j_head; @@ -578,7 +576,7 @@ int journal_next_log_block(journal_t *jo journal->j_free--; if (journal->j_head == journal->j_last) journal->j_head = journal->j_first; - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); return journal_bmap(journal, blocknr, retp); } @@ -675,7 +673,7 @@ static journal_t * journal_init_common ( init_MUTEX(&journal->j_checkpoint_sem); spin_lock_init(&journal->j_revoke_lock); spin_lock_init(&journal->j_list_lock); - spin_lock_init(&journal->j_state_lock); + init_MUTEX(&journal->j_state_sem); journal->j_commit_interval = (HZ * JBD_DEFAULT_MAX_COMMIT_AGE); @@ -955,14 +953,14 @@ void journal_update_superblock(journal_t goto out; } - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); jbd_debug(1,"JBD: updating superblock (start %ld, seq %d, errno %d)\n", journal->j_tail, journal->j_tail_sequence, journal->j_errno); sb->s_sequence = cpu_to_be32(journal->j_tail_sequence); sb->s_start = cpu_to_be32(journal->j_tail); sb->s_errno = cpu_to_be32(journal->j_errno); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); BUFFER_TRACE(bh, "marking dirty"); mark_buffer_dirty(bh); @@ -976,12 +974,12 @@ out: * any future commit will have to be careful to update the * superblock again to re-record the true start of the log. 
*/ - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); if (sb->s_start) journal->j_flags &= ~JFS_FLUSHED; else journal->j_flags |= JFS_FLUSHED; - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); } /* @@ -1343,7 +1341,7 @@ int journal_flush(journal_t *journal) transaction_t *transaction = NULL; unsigned long old_tail; - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); /* Force everything buffered to the log... */ if (journal->j_running_transaction) { @@ -1356,10 +1354,10 @@ int journal_flush(journal_t *journal) if (transaction) { tid_t tid = transaction->t_tid; - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); log_wait_commit(journal, tid); } else { - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); } /* ...and flush everything in the log out to disk. */ @@ -1377,12 +1375,12 @@ int journal_flush(journal_t *journal) * the magic code for a fully-recovered superblock. Any future * commits of data to the journal will restore the current * s_start value. 
*/ - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); old_tail = journal->j_tail; journal->j_tail = 0; - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); journal_update_superblock(journal, 1); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); journal->j_tail = old_tail; J_ASSERT(!journal->j_running_transaction); @@ -1390,7 +1388,7 @@ int journal_flush(journal_t *journal) J_ASSERT(!journal->j_checkpoint_transactions); J_ASSERT(journal->j_head == journal->j_tail); J_ASSERT(journal->j_tail_sequence == journal->j_transaction_sequence); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); return err; } @@ -1475,12 +1473,12 @@ void __journal_abort_hard(journal_t *jou printk(KERN_ERR "Aborting journal on device %s.\n", journal_dev_name(journal, b)); - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); journal->j_flags |= JFS_ABORT; transaction = journal->j_running_transaction; if (transaction) __log_start_commit(journal, transaction->t_tid); - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); } /* Soft abort: record the abort error status in the journal superblock, @@ -1565,12 +1563,12 @@ int journal_errno(journal_t *journal) { int err; - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); if (journal->j_flags & JFS_ABORT) err = -EROFS; else err = journal->j_errno; - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); return err; } @@ -1585,12 +1583,12 @@ int journal_clear_err(journal_t *journal { int err = 0; - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); if (journal->j_flags & JFS_ABORT) err = -EROFS; else journal->j_errno = 0; - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); return err; } @@ -1603,10 +1601,10 @@ int journal_clear_err(journal_t *journal */ void journal_ack_err(journal_t *journal) { - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); if (journal->j_errno) journal->j_flags |= 
JFS_ACK_ERR; - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); } int journal_blocks_per_page(struct inode *inode) --- linux/fs/ext3/super.c.orig +++ linux/fs/ext3/super.c @@ -1653,12 +1653,12 @@ static void ext3_init_journal_params(str * interval here, but for now we'll just fall back to the jbd * default. */ - spin_lock(&journal->j_state_lock); + down(&journal->j_state_sem); if (test_opt(sb, BARRIER)) journal->j_flags |= JFS_BARRIER; else journal->j_flags &= ~JFS_BARRIER; - spin_unlock(&journal->j_state_lock); + up(&journal->j_state_sem); } static journal_t *ext3_get_journal(struct super_block *sb, int journal_inum) --- linux/include/linux/jbd.h.orig +++ linux/include/linux/jbd.h @@ -416,16 +416,16 @@ struct handle_s * j_list_lock * ->jbd_lock_bh_journal_head() (This is "innermost") * - * j_state_lock + * j_state_sem * ->jbd_lock_bh_state() * * jbd_lock_bh_state() * ->j_list_lock * - * j_state_lock + * j_state_sem * ->t_handle_lock * - * j_state_lock + * j_state_sem * ->j_list_lock (journal_unmap_buffer) * */ @@ -442,7 +442,7 @@ struct transaction_s * Transaction's current state * [no locking - only kjournald alters this] * FIXME: needs barriers - * KLUDGE: [use j_state_lock] + * KLUDGE: [use j_state_sem] */ enum { T_RUNNING, @@ -562,7 +562,7 @@ struct transaction_s * @j_sb_buffer: First part of superblock buffer * @j_superblock: Second part of superblock buffer * @j_format_version: Version of the superblock format - * @j_state_lock: Protect the various scalars in the journal + * @j_state_sem: Protect the various scalars in the journal * @j_barrier_count: Number of processes waiting to create a barrier lock * @j_barrier: The barrier lock itself * @j_running_transaction: The current running transaction.. 
@@ -615,12 +615,12 @@ struct transaction_s struct journal_s { - /* General journaling state flags [j_state_lock] */ + /* General journaling state flags [j_state_sem] */ unsigned long j_flags; /* * Is there an outstanding uncleared error on the journal (from a prior - * abort)? [j_state_lock] + * abort)? [j_state_sem] */ int j_errno; @@ -634,10 +634,10 @@ struct journal_s /* * Protect the various scalars in the journal */ - spinlock_t j_state_lock; + struct semaphore j_state_sem; /* - * Number of processes waiting to create a barrier lock [j_state_lock] + * Number of processes waiting to create a barrier lock [j_state_sem] */ int j_barrier_count; @@ -646,13 +646,13 @@ struct journal_s /* * Transactions: The current running transaction... - * [j_state_lock] [caller holding open handle] + * [j_state_sem] [caller holding open handle] */ transaction_t *j_running_transaction; /* * the transaction we are pushing to disk - * [j_state_lock] [caller holding open handle] + * [j_state_sem] [caller holding open handle] */ transaction_t *j_committing_transaction; @@ -688,25 +688,25 @@ struct journal_s /* * Journal head: identifies the first unused block in the journal. - * [j_state_lock] + * [j_state_sem] */ unsigned long j_head; /* * Journal tail: identifies the oldest still-used block in the journal. - * [j_state_lock] + * [j_state_sem] */ unsigned long j_tail; /* * Journal free: how many free blocks are there in the journal? - * [j_state_lock] + * [j_state_sem] */ unsigned long j_free; /* * Journal start and end: the block numbers of the first usable block - * and one beyond the last usable block in the journal. [j_state_lock] + * and one beyond the last usable block in the journal. 
[j_state_sem] */ unsigned long j_first; unsigned long j_last; @@ -739,24 +739,24 @@ struct journal_s struct inode *j_inode; /* - * Sequence number of the oldest transaction in the log [j_state_lock] + * Sequence number of the oldest transaction in the log [j_state_sem] */ tid_t j_tail_sequence; /* - * Sequence number of the next transaction to grant [j_state_lock] + * Sequence number of the next transaction to grant [j_state_sem] */ tid_t j_transaction_sequence; /* * Sequence number of the most recently committed transaction - * [j_state_lock]. + * [j_state_sem]. */ tid_t j_commit_sequence; /* * Sequence number of the most recent transaction wanting commit - * [j_state_lock] + * [j_state_sem] */ tid_t j_commit_request; @@ -858,7 +858,7 @@ extern void __wait_on_journal (journal_ * * We need to lock the journal during transaction state changes so that nobody * ever tries to take a handle on the running transaction while we are in the - * middle of moving it to the commit phase. j_state_lock does this. + * middle of moving it to the commit phase. j_state_sem does this. * * Note that the locking is completely interrupt unsafe. We never touch * journal structures from interrupts. @@ -1039,7 +1039,7 @@ extern int journal_blocks_per_page(struc /* * Return the minimum number of blocks which must be free in the journal - * before a new transaction may be started. Must be called under j_state_lock. + * before a new transaction may be started. Must be called under j_state_sem. */ static inline int jbd_space_needed(journal_t *journal) { ^ permalink raw reply [flat|nested] 125+ messages in thread
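Most hunks in this patch are mechanical, but the __log_wait_for_space() hunk shows the pattern that has to stay correct across the conversion: drop the state lock, block on another lock, retake the state lock, and re-test the condition because it may have changed in the meantime. A minimal userspace sketch of that shape follows, with pthread mutexes standing in for the kernel semaphores. The sketch_journal structure, the sketch_* functions and the "each checkpoint frees 8 blocks" rule are assumptions made purely for illustration, and the JFS_ABORT early return is left out.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the journal fields involved. */
struct sketch_journal {
	pthread_mutex_t state_sem;      /* plays the role of j_state_sem */
	pthread_mutex_t checkpoint_sem; /* plays the role of j_checkpoint_sem */
	int free_blocks;                /* protected by state_sem */
	int checkpoints_run;            /* protected by state_sem */
};

/*
 * Pretend a checkpoint frees some log space.
 * Called with checkpoint_sem held and state_sem NOT held.
 */
void sketch_do_checkpoint(struct sketch_journal *j)
{
	pthread_mutex_lock(&j->state_sem);
	j->free_blocks += 8; /* assumed amount, for illustration only */
	j->checkpoints_run++;
	pthread_mutex_unlock(&j->state_sem);
}

/* Called with state_sem held; returns with state_sem held. */
void sketch_wait_for_space(struct sketch_journal *j, int needed)
{
	while (j->free_blocks < needed) {
		/* Drop the state lock before blocking on the checkpoint lock. */
		pthread_mutex_unlock(&j->state_sem);
		pthread_mutex_lock(&j->checkpoint_sem);

		/*
		 * Test again: someone else may have checkpointed while we
		 * were waiting for the checkpoint lock.
		 */
		pthread_mutex_lock(&j->state_sem);
		if (j->free_blocks < needed) {
			pthread_mutex_unlock(&j->state_sem);
			sketch_do_checkpoint(j);
			pthread_mutex_lock(&j->state_sem);
		}
		pthread_mutex_unlock(&j->checkpoint_sem);
	}
}
```

The sketch keeps the same ordering as the hunk: the state lock is never held while waiting for the checkpoint lock, and the space check is always redone after the state lock is retaken.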
* [patch 2/3] j_list_lock -> j_list_sem
2005-03-16 9:53 ` [patch 1/3] j_state_lock -> j_state_sem Ingo Molnar
@ 2005-03-16 9:53 ` Ingo Molnar
2005-03-16 9:57 ` [patch 3/3] remove bitlocks Ingo Molnar
0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 9:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel

this patch turns the j_list_lock spinlock into a mutex.

Builds/boots/works fine on x86.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/fs/jbd/checkpoint.c.orig
+++ linux/fs/jbd/checkpoint.c
@@ -26,7 +26,7 @@
 /*
  * Unlink a buffer from a transaction.
  *
- * Called with j_list_lock held.
+ * Called with j_list_sem held.
  */

 static inline void __buffer_unlink(struct journal_head *jh)
@@ -47,7 +47,7 @@ static inline void __buffer_unlink(struc
 /*
  * Try to release a checkpointed buffer from its transaction.
  * Returns 1 if we released it.
- * Requires j_list_lock
+ * Requires j_list_sem
  * Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
  */
 static int __try_to_free_cp_buf(struct journal_head *jh)
@@ -102,14 +102,14 @@ void __log_wait_for_space(journal_t *jou
 }

 /*
- * We were unable to perform jbd_trylock_bh_state() inside j_list_lock.
+ * We were unable to perform jbd_trylock_bh_state() inside j_list_sem.
  * The caller must restart a list walk. Wait for someone else to run
  * jbd_unlock_bh_state().
  */
 static void jbd_sync_bh(journal_t *journal, struct buffer_head *bh)
 {
 	get_bh(bh);
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 	jbd_lock_bh_state(bh);
 	jbd_unlock_bh_state(bh);
 	put_bh(bh);
@@ -125,7 +125,7 @@ static void jbd_sync_bh(journal_t *journ
  * checkpoint. (journal_remove_checkpoint() deletes the transaction when
  * the last checkpoint buffer is cleansed)
  *
- * Called with j_list_lock held.
+ * Called with j_list_sem held.
*/ static int __cleanup_transaction(journal_t *journal, transaction_t *transaction) { @@ -133,7 +133,6 @@ static int __cleanup_transaction(journal struct buffer_head *bh; int ret = 0; - assert_spin_locked(&journal->j_list_lock); jh = transaction->t_checkpoint_list; if (!jh) return 0; @@ -145,7 +144,7 @@ static int __cleanup_transaction(journal bh = jh2bh(jh); if (buffer_locked(bh)) { atomic_inc(&bh->b_count); - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); wait_on_buffer(bh); /* the journal_head may have gone by now */ BUFFER_TRACE(bh, "brelse"); @@ -165,7 +164,7 @@ static int __cleanup_transaction(journal transaction_t *t = jh->b_transaction; tid_t tid = t->t_tid; - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); jbd_unlock_bh_state(bh); log_start_commit(journal, tid); log_wait_commit(journal, tid); @@ -192,7 +191,7 @@ static int __cleanup_transaction(journal return ret; out_return_1: - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); return 1; } @@ -203,9 +202,9 @@ __flush_batch(journal_t *journal, struct { int i; - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); ll_rw_block(WRITE, *batch_count, bhs); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); for (i = 0; i < *batch_count; i++) { struct buffer_head *bh = bhs[i]; clear_buffer_jwrite(bh); @@ -221,7 +220,7 @@ __flush_batch(journal_t *journal, struct * Return 1 if something happened which requires us to abort the current * scan of the checkpoint list. * - * Called with j_list_lock held. + * Called with j_list_sem held. * Called under jbd_lock_bh_state(jh2bh(jh)), and drops it */ static int __flush_buffer(journal_t *journal, struct journal_head *jh, @@ -306,7 +305,7 @@ int log_do_checkpoint(journal_t *journal * AKPM: check this code. I had a feeling a while back that it * degenerates into a busy loop at unmount time. 
*/ - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); while (journal->j_checkpoint_transactions) { transaction_t *transaction; struct journal_head *jh, *last_jh, *next_jh; @@ -327,15 +326,11 @@ int log_do_checkpoint(journal_t *journal bh = jh2bh(jh); if (!jbd_trylock_bh_state(bh)) { jbd_sync_bh(journal, bh); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); retry = 1; break; } retry = __flush_buffer(journal, jh, bhs, &batch_count, &drop_count); - if (cond_resched_lock(&journal->j_list_lock)) { - retry = 1; - break; - } } while (jh != last_jh && !retry); if (batch_count) @@ -365,7 +360,7 @@ int log_do_checkpoint(journal_t *journal if (journal->j_checkpoint_transactions != transaction) break; } - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); result = cleanup_journal_tail(journal); if (result < 0) return result; @@ -404,7 +399,7 @@ int cleanup_journal_tail(journal_t *jour * start. */ down(&journal->j_state_sem); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); transaction = journal->j_checkpoint_transactions; if (transaction) { first_tid = transaction->t_tid; @@ -419,7 +414,7 @@ int cleanup_journal_tail(journal_t *jour first_tid = journal->j_transaction_sequence; blocknr = journal->j_head; } - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); J_ASSERT(blocknr != 0); /* If the oldest pinned transaction is at the tail of the log @@ -459,7 +454,7 @@ int cleanup_journal_tail(journal_t *jour * Find all the written-back checkpoint buffers in the journal and release them. * * Called with the journal locked. - * Called with j_list_lock held. + * Called with j_list_sem held. * Returns number of bufers reaped (for debug) */ @@ -519,7 +514,7 @@ out: * checkpoint list. * * This function is called with the journal locked. - * This function is called with j_list_lock held. + * This function is called with j_list_sem held. 
*/ void __journal_remove_checkpoint(struct journal_head *jh) @@ -573,7 +568,7 @@ out: * the log. * * Called with the journal locked. - * Called with j_list_lock held. + * Called with j_list_sem held. */ void __journal_insert_checkpoint(struct journal_head *jh, transaction_t *transaction) @@ -602,12 +597,11 @@ void __journal_insert_checkpoint(struct * point. * * Called with the journal locked. - * Called with j_list_lock held. + * Called with j_list_sem held. */ void __journal_drop_transaction(journal_t *journal, transaction_t *transaction) { - assert_spin_locked(&journal->j_list_lock); if (transaction->t_cpnext) { transaction->t_cpnext->t_cpprev = transaction->t_cpprev; transaction->t_cpprev->t_cpnext = transaction->t_cpnext; --- linux/fs/jbd/transaction.c.orig +++ linux/fs/jbd/transaction.c @@ -485,7 +485,7 @@ void journal_unlock_updates (journal_t * * continuing as gracefully as possible. # * * The caller should already hold the journal lock and - * j_list_lock spinlock: most callers will need those anyway + * j_list_sem mutex: most callers will need those anyway * in order to probe the buffer's journaling state safely. */ static void jbd_unexpected_dirty_buffer(struct journal_head *jh) @@ -694,9 +694,9 @@ repeat: J_ASSERT_JH(jh, !jh->b_next_transaction); jh->b_transaction = transaction; JBUFFER_TRACE(jh, "file as BJ_Reserved"); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); __journal_file_buffer(jh, transaction, BJ_Reserved); - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); } done: @@ -796,7 +796,7 @@ int journal_get_create_access(handle_t * * reused here. 
*/ jbd_lock_bh_state(bh); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); J_ASSERT_JH(jh, (jh->b_transaction == transaction || jh->b_transaction == NULL || (jh->b_transaction == journal->j_committing_transaction && @@ -813,7 +813,7 @@ int journal_get_create_access(handle_t * JBUFFER_TRACE(jh, "set next transaction"); jh->b_next_transaction = transaction; } - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); jbd_unlock_bh_state(bh); /* @@ -962,7 +962,7 @@ int journal_dirty_data(handle_t *handle, * about it in this layer. */ jbd_lock_bh_state(bh); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); if (jh->b_transaction) { JBUFFER_TRACE(jh, "has transaction"); if (jh->b_transaction != handle->h_transaction) { @@ -1018,12 +1018,12 @@ int journal_dirty_data(handle_t *handle, */ if (buffer_dirty(bh)) { get_bh(bh); - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); jbd_unlock_bh_state(bh); need_brelse = 1; sync_dirty_buffer(bh); jbd_lock_bh_state(bh); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); /* The buffer may become locked again at any time if it is redirtied */ } @@ -1055,7 +1055,7 @@ int journal_dirty_data(handle_t *handle, __journal_file_buffer(jh, handle->h_transaction, BJ_SyncData); } no_journal: - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); jbd_unlock_bh_state(bh); if (need_brelse) { BUFFER_TRACE(bh, "brelse"); @@ -1145,9 +1145,9 @@ int journal_dirty_metadata(handle_t *han J_ASSERT_JH(jh, jh->b_frozen_data == 0); JBUFFER_TRACE(jh, "file as BJ_Metadata"); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); __journal_file_buffer(jh, handle->h_transaction, BJ_Metadata); - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); out_unlock_bh: jbd_unlock_bh_state(bh); out: @@ -1194,7 +1194,7 @@ int journal_forget (handle_t *handle, st BUFFER_TRACE(bh, "entry"); jbd_lock_bh_state(bh); - spin_lock(&journal->j_list_lock); + 
down(&journal->j_list_sem); if (!buffer_jbd(bh)) goto not_jbd; @@ -1246,7 +1246,7 @@ int journal_forget (handle_t *handle, st journal_remove_journal_head(bh); __brelse(bh); if (!buffer_jbd(bh)) { - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); jbd_unlock_bh_state(bh); __bforget(bh); goto drop; @@ -1269,7 +1269,7 @@ int journal_forget (handle_t *handle, st } not_jbd: - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); jbd_unlock_bh_state(bh); __brelse(bh); drop: @@ -1416,7 +1416,7 @@ int journal_force_commit(journal_t *jour * Append a buffer to a transaction list, given the transaction's list head * pointer. * - * j_list_lock is held. + * j_list_sem is held. * * jbd_lock_bh_state(jh2bh(jh)) is held. */ @@ -1440,7 +1440,7 @@ __blist_add_buffer(struct journal_head * * Remove a buffer from a transaction list, given the transaction's list * head pointer. * - * Called with j_list_lock held, and the journal may not be locked. + * Called with j_list_sem held, and the journal may not be locked. * * jbd_lock_bh_state(jh2bh(jh)) is held. */ @@ -1466,7 +1466,7 @@ __blist_del_buffer(struct journal_head * * is holding onto a copy of one of thee pointers, it could go bad. * Generally the caller needs to re-read the pointer from the transaction_t. * - * Called under j_list_lock. The journal may not be locked. + * Called under j_list_sem. The journal may not be locked. 
*/ void __journal_unfile_buffer(struct journal_head *jh) { @@ -1476,8 +1476,6 @@ void __journal_unfile_buffer(struct jour J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh)); transaction = jh->b_transaction; - if (transaction) - assert_spin_locked(&transaction->t_journal->j_list_lock); J_ASSERT_JH(jh, jh->b_jlist < BJ_Types); if (jh->b_jlist != BJ_None) @@ -1525,9 +1523,9 @@ out: void journal_unfile_buffer(journal_t *journal, struct journal_head *jh) { jbd_lock_bh_state(jh2bh(jh)); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); __journal_unfile_buffer(jh); - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); jbd_unlock_bh_state(jh2bh(jh)); } @@ -1549,7 +1547,7 @@ __journal_try_to_free_buffer(journal_t * if (jh->b_next_transaction != 0) goto out; - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); if (jh->b_transaction != 0 && jh->b_cp_transaction == 0) { if (jh->b_jlist == BJ_SyncData || jh->b_jlist == BJ_Locked) { /* A written-back ordered data buffer */ @@ -1567,7 +1565,7 @@ __journal_try_to_free_buffer(journal_t * __brelse(bh); } } - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); out: return; } @@ -1650,7 +1648,7 @@ busy: * release it. * Returns non-zero if JBD no longer has an interest in the buffer. * - * Called under j_list_lock. + * Called under j_list_sem. * * Called under jbd_lock_bh_state(bh). */ @@ -1731,7 +1729,7 @@ static int journal_unmap_buffer(journal_ BUFFER_TRACE(bh, "entry"); /* - * It is safe to proceed here without the j_list_lock because the + * It is safe to proceed here without the j_list_sem because the * buffers cannot be stolen by try_to_free_buffers as long as we are * holding the page lock. 
--sct */ @@ -1741,7 +1739,7 @@ static int journal_unmap_buffer(journal_ down(&journal->j_state_sem); jbd_lock_bh_state(bh); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); jh = journal_grab_journal_head(bh); if (!jh) @@ -1774,7 +1772,7 @@ static int journal_unmap_buffer(journal_ JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget"); ret = __dispose_buffer(jh, journal->j_running_transaction); - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); jbd_unlock_bh_state(bh); up(&journal->j_state_sem); journal_put_journal_head(jh); @@ -1788,7 +1786,7 @@ static int journal_unmap_buffer(journal_ JBUFFER_TRACE(jh, "give to committing trans"); ret = __dispose_buffer(jh, journal->j_committing_transaction); - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); jbd_unlock_bh_state(bh); up(&journal->j_state_sem); journal_put_journal_head(jh); @@ -1812,7 +1810,7 @@ static int journal_unmap_buffer(journal_ journal->j_running_transaction); jh->b_next_transaction = NULL; } - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); jbd_unlock_bh_state(bh); up(&journal->j_state_sem); journal_put_journal_head(jh); @@ -1831,7 +1829,7 @@ static int journal_unmap_buffer(journal_ zap_buffer: journal_put_journal_head(jh); zap_buffer_no_jh: - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); jbd_unlock_bh_state(bh); up(&journal->j_state_sem); zap_buffer_unlocked: @@ -1907,8 +1905,6 @@ void __journal_file_buffer(struct journa struct buffer_head *bh = jh2bh(jh); J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh)); - assert_spin_locked(&transaction->t_journal->j_list_lock); - J_ASSERT_JH(jh, jh->b_jlist < BJ_Types); J_ASSERT_JH(jh, jh->b_transaction == transaction || jh->b_transaction == 0); @@ -1974,9 +1970,9 @@ void journal_file_buffer(struct journal_ transaction_t *transaction, int jlist) { jbd_lock_bh_state(jh2bh(jh)); - spin_lock(&transaction->t_journal->j_list_lock); + down(&transaction->t_journal->j_list_sem); __journal_file_buffer(jh, 
transaction, jlist); - spin_unlock(&transaction->t_journal->j_list_lock); + up(&transaction->t_journal->j_list_sem); jbd_unlock_bh_state(jh2bh(jh)); } @@ -1986,7 +1982,7 @@ void journal_file_buffer(struct journal_ * already started to be used by a subsequent transaction, refile the * buffer on that transaction's metadata list. * - * Called under journal->j_list_lock + * Called under journal->j_list_sem * * Called under jbd_lock_bh_state(jh2bh(jh)) */ @@ -1996,8 +1992,6 @@ void __journal_refile_buffer(struct jour struct buffer_head *bh = jh2bh(jh); J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh)); - if (jh->b_transaction) - assert_spin_locked(&jh->b_transaction->t_journal->j_list_lock); /* If the buffer is now unused, just drop it. */ if (jh->b_next_transaction == NULL) { @@ -2040,12 +2034,12 @@ void journal_refile_buffer(journal_t *jo struct buffer_head *bh = jh2bh(jh); jbd_lock_bh_state(bh); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); __journal_refile_buffer(jh); jbd_unlock_bh_state(bh); journal_remove_journal_head(bh); - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); __brelse(bh); } --- linux/fs/jbd/commit.c.orig +++ linux/fs/jbd/commit.c @@ -79,14 +79,14 @@ nope: } /* - * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is + * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_sem is * held. For ranking reasons we must trylock. If we lose, schedule away and - * return 0. j_list_lock is dropped in this case. + * return 0. j_list_sem is dropped in this case. 
*/ static int inverted_lock(journal_t *journal, struct buffer_head *bh) { if (!jbd_trylock_bh_state(bh)) { - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); schedule(); return 0; } @@ -189,9 +189,9 @@ void journal_commit_transaction(journal_ */ #ifdef COMMIT_STATS - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); summarise_journal_usage(journal); - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); #endif /* Do we need to erase the effects of a prior journal_flush? */ @@ -275,9 +275,9 @@ void journal_commit_transaction(journal_ * checkpoint lists. We do this *before* commit because it potentially * frees some memory */ - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); __journal_clean_checkpoint_list(journal); - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); jbd_debug (3, "JBD: commit phase 1\n"); @@ -299,7 +299,7 @@ void journal_commit_transaction(journal_ * First, drop modified flag: all accesses to the buffers * will be tracked for a new trasaction only -bzzz */ - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); if (commit_transaction->t_buffers) { new_jh = jh = commit_transaction->t_buffers->b_tnext; do { @@ -309,7 +309,7 @@ void journal_commit_transaction(journal_ new_jh = new_jh->b_tnext; } while (new_jh != jh); } - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); /* * Now start flushing things to disk, in the order they appear @@ -329,7 +329,7 @@ void journal_commit_transaction(journal_ */ write_out_data: cond_resched(); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); while (commit_transaction->t_sync_datalist) { struct buffer_head *bh; @@ -345,10 +345,6 @@ write_out_data: __journal_file_buffer(jh, commit_transaction, BJ_Locked); jbd_unlock_bh_state(bh); - if (lock_need_resched(&journal->j_list_lock)) { - spin_unlock(&journal->j_list_lock); - goto write_out_data; - } } else { if (buffer_dirty(bh)) { BUFFER_TRACE(bh, "start journal 
writeout"); @@ -357,7 +353,7 @@ write_out_data: if (bufs == journal->j_wbufsize) { jbd_debug(2, "submit %d writes\n", bufs); - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); ll_rw_block(WRITE, bufs, wbuf); journal_brelse_array(wbuf, bufs); bufs = 0; @@ -371,19 +367,15 @@ write_out_data: jbd_unlock_bh_state(bh); journal_remove_journal_head(bh); put_bh(bh); - if (lock_need_resched(&journal->j_list_lock)) { - spin_unlock(&journal->j_list_lock); - goto write_out_data; - } } } } if (bufs) { - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); ll_rw_block(WRITE, bufs, wbuf); journal_brelse_array(wbuf, bufs); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); } /* @@ -396,15 +388,15 @@ write_out_data: bh = jh2bh(jh); get_bh(bh); if (buffer_locked(bh)) { - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); wait_on_buffer(bh); if (unlikely(!buffer_uptodate(bh))) err = -EIO; - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); } if (!inverted_lock(journal, bh)) { put_bh(bh); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); continue; } if (buffer_jbd(bh) && jh->b_jlist == BJ_Locked) { @@ -416,9 +408,8 @@ write_out_data: jbd_unlock_bh_state(bh); } put_bh(bh); - cond_resched_lock(&journal->j_list_lock); } - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); if (err) __journal_abort_hard(journal); @@ -614,7 +605,7 @@ start_journal_io: jbd_debug(3, "JBD: commit phase 4\n"); /* - * akpm: these are BJ_IO, and j_list_lock is not needed. + * akpm: these are BJ_IO, and j_list_sem is not needed. * See __journal_try_to_free_buffer. 
*/ wait_for_iobuf: @@ -752,7 +743,7 @@ restart_loop: jh->b_frozen_data = NULL; } - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); cp_transaction = jh->b_cp_transaction; if (cp_transaction) { JBUFFER_TRACE(jh, "remove from old cp transaction"); @@ -792,7 +783,7 @@ restart_loop: journal_remove_journal_head(bh); /* needs a brelse */ release_buffer_page(bh); } - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); if (cond_resched()) goto restart_loop; } @@ -804,13 +795,13 @@ restart_loop: J_ASSERT(commit_transaction->t_state == T_COMMIT); /* - * This is a bit sleazy. We borrow j_list_lock to protect + * This is a bit sleazy. We borrow j_list_sem to protect * journal->j_committing_transaction in __journal_remove_checkpoint. * Really, __jornal_remove_checkpoint should be using j_state_sem but * it's a bit hassle to hold that across __journal_remove_checkpoint */ down(&journal->j_state_sem); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); commit_transaction->t_state = T_FINISHED; J_ASSERT(commit_transaction == journal->j_committing_transaction); journal->j_commit_sequence = commit_transaction->t_tid; @@ -835,7 +826,7 @@ restart_loop: commit_transaction; } } - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); jbd_debug(1, "JBD: commit %d complete, head %d\n", journal->j_commit_sequence, journal->j_tail_sequence); --- linux/fs/jbd/journal.c.orig +++ linux/fs/jbd/journal.c @@ -672,7 +672,7 @@ static journal_t * journal_init_common ( init_MUTEX(&journal->j_barrier); init_MUTEX(&journal->j_checkpoint_sem); spin_lock_init(&journal->j_revoke_lock); - spin_lock_init(&journal->j_list_lock); + init_MUTEX(&journal->j_list_sem); init_MUTEX(&journal->j_state_sem); journal->j_commit_interval = (HZ * JBD_DEFAULT_MAX_COMMIT_AGE); @@ -1139,17 +1139,17 @@ void journal_destroy(journal_t *journal) /* Force any old transactions to disk */ /* Totally anal locking here... 
*/ - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); while (journal->j_checkpoint_transactions != NULL) { - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); log_do_checkpoint(journal); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); } J_ASSERT(journal->j_running_transaction == NULL); J_ASSERT(journal->j_committing_transaction == NULL); J_ASSERT(journal->j_checkpoint_transactions == NULL); - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); /* We can now mark the journal as empty. */ journal->j_tail = 0; @@ -1361,13 +1361,13 @@ int journal_flush(journal_t *journal) } /* ...and flush everything in the log out to disk. */ - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); while (!err && journal->j_checkpoint_transactions != NULL) { - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); err = log_do_checkpoint(journal); - spin_lock(&journal->j_list_lock); + down(&journal->j_list_sem); } - spin_unlock(&journal->j_list_lock); + up(&journal->j_list_sem); cleanup_journal_tail(journal); /* Finally, mark the journal as really needing no recovery. 
--- linux/include/linux/jbd.h.orig +++ linux/include/linux/jbd.h @@ -413,20 +413,20 @@ struct handle_s /* * Lock ranking: * - * j_list_lock + * j_list_sem * ->jbd_lock_bh_journal_head() (This is "innermost") * * j_state_sem * ->jbd_lock_bh_state() * * jbd_lock_bh_state() - * ->j_list_lock + * ->j_list_sem * * j_state_sem * ->t_handle_lock * * j_state_sem - * ->j_list_lock (journal_unmap_buffer) + * ->j_list_sem (journal_unmap_buffer) * */ @@ -458,62 +458,62 @@ struct transaction_s */ unsigned long t_log_start; - /* Number of buffers on the t_buffers list [j_list_lock] */ + /* Number of buffers on the t_buffers list [j_list_sem] */ int t_nr_buffers; /* * Doubly-linked circular list of all buffers reserved but not yet - * modified by this transaction [j_list_lock] + * modified by this transaction [j_list_sem] */ struct journal_head *t_reserved_list; /* * Doubly-linked circular list of all buffers under writeout during - * commit [j_list_lock] + * commit [j_list_sem] */ struct journal_head *t_locked_list; /* * Doubly-linked circular list of all metadata buffers owned by this - * transaction [j_list_lock] + * transaction [j_list_sem] */ struct journal_head *t_buffers; /* * Doubly-linked circular list of all data buffers still to be - * flushed before this transaction can be committed [j_list_lock] + * flushed before this transaction can be committed [j_list_sem] */ struct journal_head *t_sync_datalist; /* * Doubly-linked circular list of all forget buffers (superseded * buffers which we can un-checkpoint once this transaction commits) - * [j_list_lock] + * [j_list_sem] */ struct journal_head *t_forget; /* * Doubly-linked circular list of all buffers still to be flushed before - * this transaction can be checkpointed. [j_list_lock] + * this transaction can be checkpointed. 
[j_list_sem] */ struct journal_head *t_checkpoint_list; /* * Doubly-linked circular list of temporary buffers currently undergoing - * IO in the log [j_list_lock] + * IO in the log [j_list_sem] */ struct journal_head *t_iobuf_list; /* * Doubly-linked circular list of metadata buffers being shadowed by log * IO. The IO buffers on the iobuf list and the shadow buffers on this - * list match each other one for one at all times. [j_list_lock] + * list match each other one for one at all times. [j_list_sem] */ struct journal_head *t_shadow_list; /* * Doubly-linked circular list of control buffers being written to the - * log. [j_list_lock] + * log. [j_list_sem] */ struct journal_head *t_log_list; @@ -536,7 +536,7 @@ struct transaction_s /* * Forward and backward links for the circular list of all transactions - * awaiting checkpoint. [j_list_lock] + * awaiting checkpoint. [j_list_sem] */ transaction_t *t_cpnext, *t_cpprev; @@ -590,7 +590,7 @@ struct transaction_s * @j_fs_dev: Device which holds the client fs. For internal journal this will * be equal to j_dev * @j_maxlen: Total maximum capacity of the journal region on disk. - * @j_list_lock: Protects the buffer lists and internal buffer state. + * @j_list_sem: Protects the buffer lists and internal buffer state. * @j_inode: Optional inode where we store the journal. If present, all journal * block numbers are mapped into this inode via bmap(). * @j_tail_sequence: Sequence number of the oldest transaction in the log @@ -658,7 +658,7 @@ struct journal_s /* * ... and a linked circular list of all transactions waiting for - * checkpointing. [j_list_lock] + * checkpointing. [j_list_sem] */ transaction_t *j_checkpoint_transactions; @@ -731,7 +731,7 @@ struct journal_s /* * Protects the buffer lists and internal buffer state. */ - spinlock_t j_list_lock; + struct semaphore j_list_sem; /* Optional inode where we store the journal. 
If present, all */ /* journal block numbers are mapped into this inode via */ --- linux/include/linux/journal-head.h.orig +++ linux/include/linux/journal-head.h @@ -56,7 +56,7 @@ struct journal_head { * metadata: either the running transaction or the committing * transaction (if there is one). Only applies to buffers on a * transaction's data or metadata journaling list. - * [j_list_lock] [jbd_lock_bh_state()] + * [j_list_sem] [jbd_lock_bh_state()] */ transaction_t *b_transaction; @@ -77,14 +77,14 @@ struct journal_head { /* * Pointer to the compound transaction against which this buffer * is checkpointed. Only dirty buffers can be checkpointed. - * [j_list_lock] + * [j_list_sem] */ transaction_t *b_cp_transaction; /* * Doubly-linked list of buffers still remaining to be flushed * before an old transaction can be checkpointed. - * [j_list_lock] + * [j_list_sem] */ struct journal_head *b_cpnext, *b_cpprev; }; ^ permalink raw reply [flat|nested] 125+ messages in thread
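
A minimal userspace sketch of the ranking rule this patch keeps intact: the jbd.h comment places jbd_lock_bh_state() ahead of j_list_sem, so inverted_lock() in commit.c may only trylock the bh state while the list lock is held, and it drops the list lock when the trylock fails. The toy locks below are built on C11 atomic_flag and are assumptions for illustration only; struct demo_lock and the demo_*() helpers are hypothetical names, and sched_yield() stands in for schedule().

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Toy lock: hypothetical stand-in for both j_list_sem and the BH_State bit. */
struct demo_lock {
	atomic_flag held;
};

/* Returns 1 if the lock was free and is now ours, 0 if someone holds it. */
static int demo_trylock(struct demo_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Blocking acquire: give up the CPU between attempts. */
static void demo_lock(struct demo_lock *l)
{
	while (!demo_trylock(l))
		sched_yield();
}

static void demo_unlock(struct demo_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Called with list_lock held.  The ranking says state_lock comes first,
 * so only a trylock is allowed here.  Returns 1 with both locks held, or
 * 0 with list_lock dropped so the caller can retake it and restart.
 */
static int demo_inverted_lock(struct demo_lock *list_lock,
			      struct demo_lock *state_lock)
{
	if (!demo_trylock(state_lock)) {
		demo_unlock(list_lock);
		sched_yield();
		return 0;
	}
	return 1;
}
```

A caller that gets 0 back has to reacquire the list lock and restart its list walk, which is what the "put_bh(bh); down(&journal->j_list_sem); continue;" sequence in the patch does.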
* [patch 3/3] remove bitlocks
2005-03-16 9:53 ` [patch 2/3] j_list_lock -> j_list_sem Ingo Molnar
@ 2005-03-16 9:57 ` Ingo Molnar
0 siblings, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 9:57 UTC (permalink / raw)
To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel

this patch is a port of Steven Rostedt's bitlock-removal patch to
BK-curr. It changes the ext3 code to use wait_on_bit_lock() on
&jbd_lock_bh_sleep, instead of the bitlock primitives.

Builds/boots/works fine on x86.

From: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/fs/jbd/journal.c.orig
+++ linux/fs/jbd/journal.c
@@ -82,6 +82,17 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+/*
+ * Used in the locking of the bh_state and bh_journalhead bit locks.
+ */
+int jbd_lock_bh_sleep(void *notused)
+{
+	schedule();
+	return 0;
+}
+#endif
+
 /*
  * Helper function used to manage commit timeouts
  */
--- linux/include/linux/jbd.h.orig
+++ linux/include/linux/jbd.h
@@ -65,7 +65,6 @@ extern int journal_enable_debug;
 		}							\
 	} while (0)
 #else
-#define jbd_debug(f, a...)	/**/
 #endif
 
 extern void * __jbd_kmalloc (const char *where, size_t size, int flags, int retry);
@@ -324,34 +323,63 @@ static inline struct journal_head *bh2jh
 	return bh->b_private;
 }
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+int jbd_lock_bh_sleep(void *notused);
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	wait_on_bit_lock(&bh->b_state,BH_State,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	if (test_and_set_bit(BH_State, &bh->b_state))
+		return 0;
+#endif
+	__acquire(bitlock);
+	return 1;
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	return test_bit(BH_State, &bh->b_state);
+#else
+	return 1;
+#endif
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_State, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_State);
+#endif
+	__release(bitlock);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	wait_on_bit_lock(&bh->b_state,BH_JournalHead,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_JournalHead, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_JournalHead);
+#endif
+	__release(bitlock);
 }
 
 struct jbd_revoke_table_s;
--- linux/include/linux/spinlock.h.orig
+++ linux/include/linux/spinlock.h
@@ -522,78 +522,6 @@ extern int _atomic_dec_and_lock(atomic_t
 
 #define atomic_dec_and_lock(atomic,lock) __cond_lock(_atomic_dec_and_lock(atomic,lock))
 
-/*
- * bit-based spin_lock()
- *
- * Don't use this unless you really need to: spin_lock() and spin_unlock()
- * are significantly faster.
- */
-static inline void bit_spin_lock(int bitnum, unsigned long *addr)
-{
-	/*
-	 * Assuming the lock is uncontended, this never enters
-	 * the body of the outer loop. If it is contended, then
-	 * within the inner loop a non-atomic test is used to
-	 * busywait with less bus contention for a good time to
-	 * attempt to acquire the lock bit.
-	 */
-	preempt_disable();
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-	while (test_and_set_bit(bitnum, addr)) {
-		while (test_bit(bitnum, addr)) {
-			preempt_enable();
-			cpu_relax();
-			preempt_disable();
-		}
-	}
-#endif
-	__acquire(bitlock);
-}
-
-/*
- * Return true if it was acquired
- */
-static inline int bit_spin_trylock(int bitnum, unsigned long *addr)
-{
-	preempt_disable();
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-	if (test_and_set_bit(bitnum, addr)) {
-		preempt_enable();
-		return 0;
-	}
-#endif
-	__acquire(bitlock);
-	return 1;
-}
-
-/*
- * bit-based spin_unlock()
- */
-static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-	BUG_ON(!test_bit(bitnum, addr));
-	smp_mb__before_clear_bit();
-	clear_bit(bitnum, addr);
-#endif
-	preempt_enable();
-	__release(bitlock);
-}
-
-/*
- * Return true if the lock is held.
- */
-static inline int bit_spin_is_locked(int bitnum, unsigned long *addr)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-	return test_bit(bitnum, addr);
-#elif defined CONFIG_PREEMPT
-	return preempt_count();
-#else
-	return 1;
-#endif
-}
-
 #define DEFINE_SPINLOCK(x) spinlock_t x = SPIN_LOCK_UNLOCKED
 #define DEFINE_RWLOCK(x) rwlock_t x = RW_LOCK_UNLOCKED
 

^ permalink raw reply [flat|nested] 125+ messages in thread
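
The change in semantics can be sketched in userspace C: the removed bit_spin_lock() busy-waits with preemption disabled, while the replacement gives up the CPU until the bit is free. The code below is an illustration under stated assumptions, not kernel code. It uses C11 atomics on a plain word, sched_yield() stands in for schedule(), there is no wait queue (so nothing corresponds to wake_up_bit()), and DEMO_BH_STATE and the demo_*_bit() helpers are hypothetical names.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical bit number inside a state word, standing in for BH_State. */
enum { DEMO_BH_STATE = 3 };

/* Try once to set the lock bit; returns 1 if we now own the lock. */
static int demo_trylock_bit(atomic_ulong *word, int bit)
{
	unsigned long mask = 1UL << bit;
	unsigned long old = atomic_fetch_or_explicit(word, mask,
						     memory_order_acquire);

	return (old & mask) == 0;
}

/* Acquire the bit, yielding the CPU between attempts instead of spinning. */
static void demo_lock_bit(atomic_ulong *word, int bit)
{
	while (!demo_trylock_bit(word, bit))
		sched_yield();
}

/* Returns 1 if the lock bit is currently set. */
static int demo_is_locked_bit(atomic_ulong *word, int bit)
{
	return (int)((atomic_load_explicit(word, memory_order_relaxed) >> bit) & 1UL);
}

/* Release: clear only the lock bit and leave the other bits untouched. */
static void demo_unlock_bit(atomic_ulong *word, int bit)
{
	atomic_fetch_and_explicit(word, ~(1UL << bit), memory_order_release);
}
```

Because the lock shares a word with unrelated flag bits (as BH_State shares bh->b_state), every operation has to be an atomic read-modify-write on the single bit rather than a store to the whole word.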
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 9:51 ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Ingo Molnar
2005-03-16 9:53 ` [patch 1/3] j_state_lock -> j_state_sem Ingo Molnar
@ 2005-03-16 10:04 ` Andrew Morton
2005-03-16 10:12 ` Ingo Molnar
2005-03-16 10:19 ` Ingo Molnar
1 sibling, 2 replies; 125+ messages in thread
From: Andrew Morton @ 2005-03-16 10:04 UTC (permalink / raw)
To: Ingo Molnar; +Cc: rostedt, rlrevell, linux-kernel

Ingo Molnar <mingo@elte.hu> wrote:
>
> > There's a little lock ranking diagram in jbd.h which tells us that
> > these locks nest inside j_list_lock and j_state_lock. So I guess
> > you'll need to turn those into semaphores.
>
> indeed. I did this (see the three followup patches, against BK-curr),
> and it builds/boots/works just fine on an ext3 box. Do we want to try
> this in -mm?

ooh, I'd rather not. I spent an intense three days removing all the
sleeping locks from ext3 (and three months debugging the result).
Ended up gaining 1000% on 16-way.

Putting them back in will really hurt the SMP performance.

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 10:04 ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Andrew Morton
@ 2005-03-16 10:12 ` Ingo Molnar
2005-03-16 10:23 ` Steven Rostedt
2005-03-16 10:26 ` Andrew Morton
2005-03-16 10:19 ` Ingo Molnar
1 sibling, 2 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 10:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel

* Andrew Morton <akpm@osdl.org> wrote:

> Ingo Molnar <mingo@elte.hu> wrote:
> >
> > > There's a little lock ranking diagram in jbd.h which tells us that
> > > these locks nest inside j_list_lock and j_state_lock. So I guess
> > > you'll need to turn those into semaphores.
> >
> > indeed. I did this (see the three followup patches, against BK-curr),
> > and it builds/boots/works just fine on an ext3 box. Do we want to try
> > this in -mm?
>
> ooh, I'd rather not. I spent an intense three days removing all the
> sleeping locks from ext3 (and three months debugging the result).
> Ended up gaining 1000% on 16-way.
>
> Putting them back in will really hurt the SMP performance.

ah. Yeah. Sniff.

if we gain 1000% on a 16-way then there's something really wrong about
semaphores (or scheduling) though. A semaphore is almost a spinlock, in
the uncontended case - and even under contention we really (should) just
spend the cycles that we'd spend spinning. There will be some
intermediate contention level where semaphores hurt, but 1000% sounds
truly excessive.

	Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
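
The "almost a spinlock in the uncontended case" argument can be sketched in userspace C: a down() whose fast path is a single atomic compare-and-swap and whose slow path is entered only when the count is already zero. This is an illustration under stated assumptions, not the kernel's semaphore implementation. struct demo_sem and the demo_*() helpers are hypothetical names, sched_yield() stands in for sleeping on a wait queue, and slow_waits exists only so the sketch can show which path was taken.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical counting semaphore with a slow-path counter for inspection. */
struct demo_sem {
	atomic_int count;	/* number of free slots */
	atomic_int slow_waits;	/* times a caller had to give up the CPU */
};

static void demo_sem_init(struct demo_sem *s, int value)
{
	atomic_init(&s->count, value);
	atomic_init(&s->slow_waits, 0);
}

/* Acquire one slot. */
static void demo_down(struct demo_sem *s)
{
	int c = atomic_load(&s->count);

	for (;;) {
		if (c > 0) {
			/* Fast path: one atomic op, no scheduler involved. */
			if (atomic_compare_exchange_weak(&s->count, &c, c - 1))
				return;
			continue;	/* c was reloaded by the failed CAS */
		}
		/* Slow path: contended, so give up the CPU and retry. */
		atomic_fetch_add(&s->slow_waits, 1);
		sched_yield();
		c = atomic_load(&s->count);
	}
}

/* Release one slot. */
static void demo_up(struct demo_sem *s)
{
	atomic_fetch_add(&s->count, 1);
}
```

The disagreement in the thread is about the slow path: a spinlock waiter stays on its CPU, while a semaphore waiter goes through the scheduler, and this sketch does not model what that costs.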
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 10:12 ` Ingo Molnar
@ 2005-03-16 10:23 ` Steven Rostedt
2005-03-16 10:26 ` Ingo Molnar
2005-03-16 10:26 ` Andrew Morton
1 sibling, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16 10:23 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrew Morton, rlrevell, linux-kernel

On Wed, 16 Mar 2005, Ingo Molnar wrote:

>
> * Andrew Morton <akpm@osdl.org> wrote:
> >
> > ooh, I'd rather not. I spent an intense three days removing all the
> > sleeping locks from ext3 (and three months debugging the result).
> > Ended up gaining 1000% on 16-way.
> >
> > Putting them back in will really hurt the SMP performance.
>
> ah. Yeah. Sniff.
>
> if we gain 1000% on a 16-way then there's something really wrong about
> semaphores (or scheduling) though. A semaphore is almost a spinlock, in
> the uncontended case - and even under contention we really (should) just
> spend the cycles that we'd spend spinning. There will be some
> intermediate contention level where semaphores hurt, but 1000% sounds
> truly excessive.
>

Could it possibly be that in the process of removing all the sleeping
locks from ext3, that Andrew also removed a flaw in ext3 itself that
is responsible for the 1000% improvement?

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 10:23 ` Steven Rostedt
@ 2005-03-16 10:26 ` Ingo Molnar
0 siblings, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 10:26 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Andrew Morton, rlrevell, linux-kernel

* Steven Rostedt <rostedt@goodmis.org> wrote:

> > > ooh, I'd rather not. I spent an intense three days removing all the
> > > sleeping locks from ext3 (and three months debugging the result).
> > > Ended up gaining 1000% on 16-way.
> > >
> > > Putting them back in will really hurt the SMP performance.
> >
> > ah. Yeah. Sniff.
> >
> > if we gain 1000% on a 16-way then there's something really wrong about
> > semaphores (or scheduling) though. A semaphore is almost a spinlock, in
> > the uncontended case - and even under contention we really (should) just
> > spend the cycles that we'd spend spinning. There will be some
> > intermediate contention level where semaphores hurt, but 1000% sounds
> > truly excessive.
> >
>
> Could it possibly be that in the process of removing all the sleeping
> locks from ext3, that Andrew also removed a flaw in ext3 itself that
> is responsible for the 1000% improvement?

i think the chances for that are really remote. I think it must have
been a workload ending up scheduling itself to death, while spinlocks
force atomicity of execution and affinity.

we should be able to see the same scenario with PREEMPT_RT on a 16-way
:-)

Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 10:12 ` Ingo Molnar
2005-03-16 10:23 ` Steven Rostedt
@ 2005-03-16 10:26 ` Andrew Morton
2005-03-16 10:29 ` Ingo Molnar
2005-03-16 10:34 ` Arjan van de Ven
1 sibling, 2 replies; 125+ messages in thread
From: Andrew Morton @ 2005-03-16 10:26 UTC (permalink / raw)
To: Ingo Molnar; +Cc: rostedt, rlrevell, linux-kernel

Ingo Molnar <mingo@elte.hu> wrote:
>
> > ooh, I'd rather not. I spent an intense three days removing all the
> > sleeping locks from ext3 (and three months debugging the result).
> > Ended up gaining 1000% on 16-way.
> >
> > Putting them back in will really hurt the SMP performance.
>
> ah. Yeah. Sniff.
>
> if we gain 1000% on a 16-way then there's something really wrong about
> semaphores (or scheduling) though. A semaphore is almost a spinlock, in
> the uncontended case - and even under contention we really (should) just
> spend the cycles that we'd spend spinning. There will be some
> intermediate contention level where semaphores hurt, but 1000% sounds
> truly excessive.

I forget how much of the 1000% came from that, but it was quite a lot.

Removing the BKL was the first step. That took the context switch rate
under high load from ~10,000/sec up to ~300,000/sec. Because the first
thing a CPU hit on entry to the fs was then a semaphore. Performance
rather took a dive.

Of course the locks also became much finer-grained, so the contention
opportunities lessened. But j_list_lock and j_state_lock have fs-wide
scope, so I'd expect the context switch rate to go up quite a lot again.

The hold times are short, and a context switch hurts rather more than a
quick spin.

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 10:26 ` Andrew Morton
@ 2005-03-16 10:29 ` Ingo Molnar
2005-03-16 10:41 ` Andrew Morton
2005-03-16 10:34 ` Arjan van de Ven
1 sibling, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 10:29 UTC (permalink / raw)
To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel

* Andrew Morton <akpm@osdl.org> wrote:

> I forget how much of the 1000% came from that, but it was quite a lot.
>
> Removing the BKL was the first step. That took the context switch
> rate under high load from ~10,000/sec up to ~300,000/sec. Because the
> first thing a CPU hit on entry to the fs was then a semaphore.
> Performance rather took a dive.
>
> Of course the locks also became much finer-grained, so the contention
> opportunities lessened. But j_list_lock and j_state_lock have fs-wide
> scope, so I'd expect the context switch rate to go up quite a lot
> again.
>
> The hold times are short, and a context switch hurts rather more than a
> quick spin.

which particular workload was this - dbench? (I can try PREEMPT_RT on an
8-way, such effects will show up tenfold.)

Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 10:29 ` Ingo Molnar
@ 2005-03-16 10:41 ` Andrew Morton
0 siblings, 0 replies; 125+ messages in thread
From: Andrew Morton @ 2005-03-16 10:41 UTC (permalink / raw)
To: Ingo Molnar; +Cc: rostedt, rlrevell, linux-kernel

Ingo Molnar <mingo@elte.hu> wrote:
>
>
> * Andrew Morton <akpm@osdl.org> wrote:
>
> > I forget how much of the 1000% came from that, but it was quite a lot.
> >
> > Removing the BKL was the first step. That took the context switch
> > rate under high load from ~10,000/sec up to ~300,000/sec. Because the
> > first thing a CPU hit on entry to the fs was then a semaphore.
> > Performance rather took a dive.
> >
> > Of course the locks also became much finer-grained, so the contention
> > opportunities lessened. But j_list_lock and j_state_lock have fs-wide
> > scope, so I'd expect the context switch rate to go up quite a lot
> > again.
> >
> > The hold times are short, and a context switch hurts rather more than a
> > quick spin.
>
> which particular workload was this - dbench? (I can try PREEMPT_RT on an
> 8-way, such effects will show up tenfold.)
>

Oh gee, that was back in the days when Martin was being useful. SDET, I
think.

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 10:26 ` Andrew Morton
2005-03-16 10:29 ` Ingo Molnar
@ 2005-03-16 10:34 ` Arjan van de Ven
1 sibling, 0 replies; 125+ messages in thread
From: Arjan van de Ven @ 2005-03-16 10:34 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ingo Molnar, rostedt, rlrevell, linux-kernel

On Wed, 2005-03-16 at 02:26 -0800, Andrew Morton wrote:
>
> The hold times are short, and a context switch hurts rather more than a
> quick spin.

so we need a spinaphore ;)

^ permalink raw reply [flat|nested] 125+ messages in thread
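The "spinaphore" quip describes a lock that spins briefly in the hope that a short hold time ends soon, and only sleeps if that fails. What follows is a rough user-space sketch of that idea, assuming POSIX threads. It is not kernel code and not anything proposed in the thread; the names spin_then_block_lock, spin_then_block_unlock and SPIN_TRIES are made up for illustration, and the spin count is an arbitrary guess.

```c
#include <pthread.h>

/* Hypothetical tuning knob: trylock attempts before giving up on spinning. */
#define SPIN_TRIES 100

/*
 * Spin-then-block acquire: poll the mutex a bounded number of times,
 * then fall back to a blocking lock (which may cost a context switch).
 * Returns 1 if the lock was taken while spinning, 0 if we had to block.
 */
static int spin_then_block_lock(pthread_mutex_t *m)
{
	for (int i = 0; i < SPIN_TRIES; i++) {
		if (pthread_mutex_trylock(m) == 0)
			return 1;
	}
	pthread_mutex_lock(m);
	return 0;
}

static void spin_then_block_unlock(pthread_mutex_t *m)
{
	pthread_mutex_unlock(m);
}
```

Whether the spin phase pays off depends on the hold times being short relative to the cost of a context switch, which is the trade-off discussed above. On a single CPU the spin phase cannot help, since the holder is not running while the waiter spins.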
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 10:04 ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Andrew Morton
2005-03-16 10:12 ` Ingo Molnar
@ 2005-03-16 10:19 ` Ingo Molnar
2005-03-16 10:40 ` Andrew Morton
1 sibling, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 10:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel

* Andrew Morton <akpm@osdl.org> wrote:

> > > There's a little lock ranking diagram in jbd.h which tells us that
> > > these locks nest inside j_list_lock and j_state_lock. So I guess
> > > you'll need to turn those into semaphores.
> >
> > indeed. I did this (see the three followup patches, against BK-curr),
> > and it builds/boots/works just fine on an ext3 box. Do we want to try
> > this in -mm?
>
> ooh, I'd rather not. I spent an intense three days removing all the
> sleeping locks from ext3 (and three months debugging the result).
> Ended up gaining 1000% on 16-way.
>
> Putting them back in will really hurt the SMP performance.

seems like turning the bitlocks into spinlocks is the best option then.
We'd need one lock in buffer_head (j_state_lock, renamed to something
more sensible like b_private_lock), and one lock in journal_head
(j_list_lock) i guess.

How much would the +4/+8 bytes size increase in buffer_head [on SMP] be
frowned upon?

Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 10:19 ` Ingo Molnar
@ 2005-03-16 10:40 ` Andrew Morton
2005-03-16 10:51 ` Ingo Molnar
2005-03-16 11:05 ` Steven Rostedt
0 siblings, 2 replies; 125+ messages in thread
From: Andrew Morton @ 2005-03-16 10:40 UTC (permalink / raw)
To: Ingo Molnar; +Cc: rostedt, rlrevell, linux-kernel

Ingo Molnar <mingo@elte.hu> wrote:
>
>
> * Andrew Morton <akpm@osdl.org> wrote:
>
> > > > There's a little lock ranking diagram in jbd.h which tells us that
> > > > these locks nest inside j_list_lock and j_state_lock. So I guess
> > > > you'll need to turn those into semaphores.
> > >
> > > indeed. I did this (see the three followup patches, against BK-curr),
> > > and it builds/boots/works just fine on an ext3 box. Do we want to try
> > > this in -mm?
> >
> > ooh, I'd rather not. I spent an intense three days removing all the
> > sleeping locks from ext3 (and three months debugging the result).
> > Ended up gaining 1000% on 16-way.
> >
> > Putting them back in will really hurt the SMP performance.
>
> seems like turning the bitlocks into spinlocks is the best option then.
> We'd need one lock in buffer_head (j_state_lock, renamed to something
> more sensible like b_private_lock), and one lock in journal_head
> (j_list_lock) i guess.

Those two are in the journal, actually. You refer to jbd_lock_bh_state()
and jbd_lock_bh_journal_head(). I think they both need to be in the
buffer_head. jbd_lock_bh_journal_head() can probably go away (just use
caller's jbd_lock_bh_state()).

Or make them global, or put them in the journal.

> How much would the +4/+8 bytes size increase in
> buffer_head [on SMP] be frowned upon?

It wouldn't be the end of the world. I'm not clear on what bits of the
rt-super-low-latency stuff is intended for mainline though?

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 10:40 ` Andrew Morton
@ 2005-03-16 10:51 ` Ingo Molnar
2005-03-16 11:05 ` Steven Rostedt
1 sibling, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 10:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel

* Andrew Morton <akpm@osdl.org> wrote:

> > How much would the +4/+8 bytes size increase in
> > buffer_head [on SMP] be frowned upon?
>
> It wouldn't be the end of the world. I'm not clear on what bits of
> the rt-super-low-latency stuff is intended for mainline though?

in the long run, most of it. There are no conceptual barriers so far,
the -RT tree consists of lots of small details and the PREEMPT_RT
framework itself. We are trying to solve (and merge) the small details
first (in upstream), so that PREEMPT_RT itself becomes uncontroversial.

(and it's not really the low latency that matters mainly - more valuable
is the fact that under PREEMPT_RT high latencies are statistically much
more unlikely [you need to do some really intentional and easy to see
things to introduce high latencies], while in the current upstream
kernel, high latencies are often side-effects of pretty normal kernel
coding activities, so low latencies are always a catch-up game that can
never be truly won for sure. So yes, while a 10 usec worst-case latency
under arbitrary Linux workloads [on the right hardware] is indeed sexy,
more important is that things are much more deterministic and hence much
more trustable from a hard-RT POV.)

Ingo

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 10:40 ` Andrew Morton
2005-03-16 10:51 ` Ingo Molnar
@ 2005-03-16 11:05 ` Steven Rostedt
2005-03-16 11:19 ` Andrew Morton
1 sibling, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16 11:05 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ingo Molnar, rlrevell, linux-kernel

On Wed, 16 Mar 2005, Andrew Morton wrote:

>
> Those two are in the journal, actually. You refer to jbd_lock_bh_state()
> and jbd_lock_bh_journal_head(). I think they both need to be in the
> buffer_head. jbd_lock_bh_journal_head() can probably go away (just use
> caller's jbd_lock_bh_state()).
>
> Or make them global, or put them in the journal.

The jbd_lock_bh_journal_head can be one global lock without a problem.
But when I made jbd_lock_bh_state a global lock, I believe it deadlocked
on me. So this one has to go into the buffer head. What do you mean with
"put them in the journal", do you mean the journal_s structure? Is there
a safe way to get to that structure from the buffer head? The state lock
is used quite a bit and it gets tricky trying to figure out how to use
other structures wrt buffer_heads at all the locations that use
jbd_lock_bh_state.

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 11:05 ` Steven Rostedt
@ 2005-03-16 11:19 ` Andrew Morton
2005-03-16 14:04 ` Steven Rostedt
0 siblings, 1 reply; 125+ messages in thread
From: Andrew Morton @ 2005-03-16 11:19 UTC (permalink / raw)
To: rostedt; +Cc: mingo, rlrevell, linux-kernel

Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
> On Wed, 16 Mar 2005, Andrew Morton wrote:
>
> >
> > Those two are in the journal, actually. You refer to jbd_lock_bh_state()
> > and jbd_lock_bh_journal_head(). I think they both need to be in the
> > buffer_head. jbd_lock_bh_journal_head() can probably go away (just use
> > caller's jbd_lock_bh_state()).
> >
> > Or make them global, or put them in the journal.
>
> The jbd_lock_bh_journal_head can be one global lock without a problem.

As I say, we can probably eliminate it.

> But
> when I made jbd_lock_bh_state a global lock, I believe it deadlocked on
> me.

That's a worry.

> So this one has to go into the buffer head. What do you mean with
> "put them in the journal", do you mean the journal_s structure?

Yes.

> Is there a
> safe way to get to that structure from the buffer head?

No convenient way, iirc. But there's usually a fairly straightforward
way to get at the journal from within JBD code.

> The state lock is
> used quite a bit and it gets tricky trying to figure out how to use other
> structures wrt buffer_heads at all the locations that use
> jbd_lock_bh_state.

That one should go into the buffer_head, I guess.

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 11:19 ` Andrew Morton
@ 2005-03-16 14:04 ` Steven Rostedt
2005-03-16 16:47 ` Steven Rostedt
0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16 14:04 UTC (permalink / raw)
To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel

On Wed, 16 Mar 2005, Andrew Morton wrote:

>
> > But
> > when I made jbd_lock_bh_state a global lock, I believe it deadlocked on
> > me.
>
> That's a worry.
>

OK, I'm wrong here. I just tried it again and it didn't deadlock (that
must have been another lock I was dealing with). But it does test if the
buffer head is locked or not, and asserts if it is.

I'm running the following patch with no problems so far. I still use the
lock bits to determine if the bh state is locked. Do you and Ingo think
that this would have too much contention?

Ingo, I still get the following bug because of the added BUFFER_FNS and
DESKTOP_PREEMPT. I haven't tried this with RT yet. I'll see if this
shows a deadlock there.

BUG: Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c0214888
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: ipv6 af_packet tsdev mousedev evdev floppy psmouse
pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd soundcore snd_page_alloc shpchp pci_hotplug ehci_hcd
intel_agp agpgart uhci_hcd usbcore e100 mii ide_cd cdrom unix
CPU: 0
EIP: 0060:[<c0214888>] Not tainted VLI
EFLAGS: 00010286 (2.6.11-RT-V0.7.40-00)
EIP is at vt_ioctl+0x18/0x1ab0
eax: 00000000 ebx: 00005603 ecx: 00005603 edx: cec18c80
esi: c0214870 edi: cb49e000 ebp: cb479f18 esp: cb479e48
ds: 007b es: 007b ss: 0068 preempt: 00000000
Process XFree86 (pid: 4744, threadinfo=cb478000 task=cb403530)
Stack: cb403680 cb478000 cb403530 c034594c cb403530 00000246 cb479e7c c0117217
       c0345954 00000006 00000001 00000000 00000000 cb479ebc cefa1c04 c13e1000
       ced6b9b8 00000000 00000000 cb479ed4 c01707f1 ced6b9b8 00000007 00000000
Call Trace:
 [<c0103cdf>] show_stack+0x7f/0xa0 (28)
 [<c0103e95>] show_registers+0x165/0x1d0 (56)
 [<c0104088>] die+0xc8/0x150 (64)
 [<c0115376>] do_page_fault+0x356/0x6c4 (216)
 [<c0103973>] error_code+0x2b/0x30 (268)
 [<c020fd6b>] tty_ioctl+0x34b/0x490 (52)
 [<c016837f>] do_ioctl+0x4f/0x70 (32)
 [<c0168582>] vfs_ioctl+0x62/0x1d0 (40)
 [<c0168751>] sys_ioctl+0x61/0x90 (40)
 [<c0102ec3>] syscall_call+0x7/0xb (-8124)
Code: ff ff 8d 05 88 5d 34 c0 e8 f6 60 0a 00 e9 3a ff ff ff 90 55 89 e5 57
56 53 81 ec c4 00 00 00 8b 7d 08 8b 5d 10 8b 87 7c 09 00 00 <8b> 30 89 34
24 8b 04 b5 e0 b7 3c c0 89 45 8c e8 a4 6a 00 00 85

Here's the patch (on Ingo's -40 kernel).

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c	2005-03-02 02:37:49.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c	2005-03-16 07:47:50.000000000 -0500
@@ -82,6 +82,9 @@
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
+spinlock_t jbd_state_lock = SPIN_LOCK_UNLOCKED;
+spinlock_t jbd_journal_lock = SPIN_LOCK_UNLOCKED;
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-16 08:51:27.292105187 -0500
@@ -313,6 +313,8 @@
 BUFFER_FNS(RevokeValid, revokevalid)
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
+BUFFER_FNS(State,state)
+BUFFER_FNS(JournalHead,journal)
 
 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
@@ -324,34 +326,50 @@
 	return bh->b_private;
 }
 
+extern spinlock_t jbd_state_lock;
+extern spinlock_t jbd_journal_lock;
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+	spin_lock(&jbd_state_lock);
+	BUG_ON(buffer_state(bh));
+	set_buffer_state(bh);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+	if (spin_trylock(&jbd_state_lock)) {
+		BUG_ON(buffer_state(bh));
+		set_buffer_state(bh);
+		return 1;
+	}
+	return 0;
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+	return buffer_state(bh); //spin_is_locked(&jbd_state_lock);
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+	BUG_ON(!buffer_state(bh));
+	clear_buffer_state(bh);
+	spin_unlock(&jbd_state_lock);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+	spin_lock(&jbd_journal_lock);
+	BUG_ON(buffer_journal(bh));
+	set_buffer_journal(bh);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+	BUG_ON(!buffer_journal(bh));
+	clear_buffer_journal(bh);
+	spin_unlock(&jbd_journal_lock);
 }
 
 struct jbd_revoke_table_s;

^ permalink raw reply [flat|nested] 125+ messages in thread
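The patch above replaces each per-buffer bit lock with one global spinlock and keeps the old bit only as a "this buffer is locked" marker. A small user-space model of that design may make the contention question easier to see. It assumes POSIX threads, and struct buf, global_state_lock and the buf_* functions are hypothetical stand-ins, not JBD names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct buffer_head; only the flag matters here. */
struct buf {
	bool state_locked;	/* plays the role of the BH_State bit */
};

/* One lock shared by every buffer, like jbd_state_lock in the patch. */
static pthread_mutex_t global_state_lock = PTHREAD_MUTEX_INITIALIZER;

static void buf_lock_state(struct buf *b)
{
	pthread_mutex_lock(&global_state_lock);
	assert(!b->state_locked);	/* mirrors BUG_ON(buffer_state(bh)) */
	b->state_locked = true;
}

/* Returns 1 if the lock was taken, 0 if the global lock is busy. */
static int buf_trylock_state(struct buf *b)
{
	if (pthread_mutex_trylock(&global_state_lock) != 0)
		return 0;
	assert(!b->state_locked);
	b->state_locked = true;
	return 1;
}

static void buf_unlock_state(struct buf *b)
{
	assert(b->state_locked);	/* mirrors BUG_ON(!buffer_state(bh)) */
	b->state_locked = false;
	pthread_mutex_unlock(&global_state_lock);
}
```

In this model, locking buffer A makes buf_trylock_state() fail for an unrelated buffer B until A is released. That is the cost of a single global lock compared with a lock embedded in each buffer, and it is why the question of trying the patch on an 8-way or 16-way machine comes up.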
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 14:04 ` Steven Rostedt
@ 2005-03-16 16:47 ` Steven Rostedt
2005-03-16 17:47 ` Steven Rostedt
0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16 16:47 UTC (permalink / raw)
To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel

On Wed, 16 Mar 2005, Steven Rostedt wrote:

>
> Ingo, I still get the following bug because of the added BUFFER_FNS and
> DESKTOP_PREEMPT. I haven't tried this with RT yet. I'll see if this shows
> a deadlock there.
>
>

Hi Ingo,

I just ran this with PREEMPT_RT and it works fine. Now is this the best
solution, or adding a lock to the buffer head? This works but I don't
have anything more than a 2X CPU to test this on. If either you or
Andrew can try this on the 8x or 16x that would be great.

Also, I only get the BUG with PREEMPT_DESKTOP. I really don't understand
why this happens. I sent you a test patch earlier with just adding
BUFFER_FNS(JournalHead,journalhead) in jbd.h, and under PREEMPT_DESKTOP
that causes this bug as well. No other changes, just adding the
BUFFER_FNS call causes this. I can't find any other reference to
buffer_journal (besides reiser_fs).

What do you think, and are you getting the same bug?

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 16:47 ` Steven Rostedt
@ 2005-03-16 17:47 ` Steven Rostedt
2005-03-16 19:20 ` Lee Revell
` (2 more replies)
0 siblings, 3 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16 17:47 UTC (permalink / raw)
To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel

On Wed, 16 Mar 2005, Steven Rostedt wrote:

>
> Hi Ingo,
>
> I just ran this with PREEMPT_RT and it works fine.

Not quite, and I will assume that some of the other patches I sent have
this same problem. The jbd_trylock_bh_state really scares me. It seems
that in fs/jbd/commit.c in journal_commit_transaction we have the
following code:

write_out_data:
	cond_resched();
	spin_lock(&journal->j_list_lock);

	while (commit_transaction->t_sync_datalist) {
		struct buffer_head *bh;

		jh = commit_transaction->t_sync_datalist;
		commit_transaction->t_sync_datalist = jh->b_tnext;
		bh = jh2bh(jh);
		if (buffer_locked(bh)) {
			BUFFER_TRACE(bh, "locked");
			if (!inverted_lock(journal, bh))
				goto write_out_data;

where inverted_lock simply is:

/*
 * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
 * held. For ranking reasons we must trylock. If we lose, schedule away and
 * return 0. j_list_lock is dropped in this case.
 */
static int inverted_lock(journal_t *journal, struct buffer_head *bh)
{
	if (!jbd_trylock_bh_state(bh)) {
		spin_unlock(&journal->j_list_lock);
		schedule();
		return 0;
	}
	return 1;
}

So, with kjournald running as a FIFO, it may hit this (as it did with my
last test) and not get the lock. All it does is release another lock
(ranking reasons) and calls schedule and tries again. With kjournald the
highest-priority running process on the system (UP) it deadlocks since
whoever has the lock will never get a chance to run. There's a couple of
places that jbd_trylock_bh_state is used in checkpoint.c, but this is
the one place that it definitely deadlocks the system. I believe that
the code in checkpoint.c also has this problem.

I guess one way to solve this is to add a wait queue here (before
schedule()), and have the one holding the lock to wake up all on the
waitqueue when they release it.

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
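The wait-queue suggestion can be modeled in user space to show the intended shape: a waiter that loses the trylock sleeps until the holder announces a release, rather than calling a bare schedule() and retrying. This is only a sketch under stated assumptions. It uses a POSIX condition variable as a stand-in for a kernel wait queue, and struct waitable_lock and the wl_* functions are hypothetical names, not part of JBD or of any posted patch.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a lock plus its wait queue. */
struct waitable_lock {
	pthread_mutex_t guard;		/* protects 'held' */
	pthread_cond_t released;	/* plays the role of the wait queue */
	bool held;
};

#define WAITABLE_LOCK_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Non-blocking attempt; returns 1 on success, 0 if someone holds the lock. */
static int wl_trylock(struct waitable_lock *l)
{
	int got;

	pthread_mutex_lock(&l->guard);
	got = !l->held;
	if (got)
		l->held = true;
	pthread_mutex_unlock(&l->guard);
	return got;
}

/* Block until the lock is observed free; the caller then retries wl_trylock(). */
static void wl_wait_released(struct waitable_lock *l)
{
	pthread_mutex_lock(&l->guard);
	while (l->held)
		pthread_cond_wait(&l->released, &l->guard);
	pthread_mutex_unlock(&l->guard);
}

/* Release the lock and wake every waiter, as the message above suggests. */
static void wl_unlock(struct waitable_lock *l)
{
	pthread_mutex_lock(&l->guard);
	l->held = false;
	pthread_cond_broadcast(&l->released);
	pthread_mutex_unlock(&l->guard);
}
```

The point of the design is that wl_wait_released() takes the waiter off the run queue, so a lower-priority holder gets CPU time even when the waiter is SCHED_FIFO. The waiter still has to retake the higher-ranked lock and retry afterwards, exactly as the goto write_out_data loop does.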
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 17:47 ` Steven Rostedt
@ 2005-03-16 19:20 ` Lee Revell
2005-03-17 7:15 ` Steven Rostedt
2005-03-16 21:15 ` Andrew Morton
2005-03-17 9:58 ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Steven Rostedt
2 siblings, 1 reply; 125+ messages in thread
From: Lee Revell @ 2005-03-16 19:20 UTC (permalink / raw)
To: rostedt; +Cc: Andrew Morton, mingo, linux-kernel

On Wed, 2005-03-16 at 12:47 -0500, Steven Rostedt wrote:
>
> On Wed, 16 Mar 2005, Steven Rostedt wrote:
>
> >
> > Hi Ingo,
> >
> > I just ran this with PREEMPT_RT and it works fine.
>
> Not quite, and I will assume that some of the other patches I sent have
> this same problem. The jbd_trylock_bh_state really scares me. It seems
> that in fs/jbd/commit.c in journal_commit_transaction we have the
> following code:

I am a bit confused, big surprise. Does this thread still have anything
to do with this trace from my "Latency regressions" bug report?

http://www.alsa-project.org/~rlrevell/2912us

The problem only is apparent with PREEMPT_DESKTOP and "data=ordered".

PREEMPT_RT has always worked perfectly.

Lee

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-16 19:20 ` Lee Revell
@ 2005-03-17 7:15 ` Steven Rostedt
2005-03-17 15:41 ` Lee Revell
0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-17 7:15 UTC (permalink / raw)
To: Lee Revell; +Cc: Andrew Morton, mingo, linux-kernel

On Wed, 16 Mar 2005, Lee Revell wrote:

> I am a bit confused, big surprise. Does this thread still have anything
> to do with this trace from my "Latency regressions" bug report?

Don't worry, I've been in a state of confusion for a long time now ;-)

>
> http://www.alsa-project.org/~rlrevell/2912us
>
> The problem only is apparent with PREEMPT_DESKTOP and "data=ordered".
>
> PREEMPT_RT has always worked perfectly.
>

I'm surprised that PREEMPT_RT does work. I'm no longer sure that this
does affect your latency anymore. It probably does indirectly somehow. I
still think it has to do with the bitspinlocks. But I'm not sure. Just
let me know if you want to be taken off this thread and I'll remove you
from my CC list. Until then, I'll keep you on.

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-17 7:15 ` Steven Rostedt
@ 2005-03-17 15:41 ` Lee Revell
2005-03-17 16:23 ` Steven Rostedt
0 siblings, 1 reply; 125+ messages in thread
From: Lee Revell @ 2005-03-17 15:41 UTC (permalink / raw)
To: rostedt; +Cc: Andrew Morton, mingo, linux-kernel

On Thu, 2005-03-17 at 02:15 -0500, Steven Rostedt wrote:
>
> On Wed, 16 Mar 2005, Lee Revell wrote:
>
> > I am a bit confused, big surprise. Does this thread still have anything
> > to do with this trace from my "Latency regressions" bug report?
>
> Don't worry, I've been in a state of confusion for a long time now ;-)
>
> >
> > http://www.alsa-project.org/~rlrevell/2912us
> >
> > The problem only is apparent with PREEMPT_DESKTOP and "data=ordered".
> >
> > PREEMPT_RT has always worked perfectly.
> >
>
> I'm surprised that PREEMPT_RT does work. I'm no longer sure that this does
> affect your latency anymore. It probably does indirectly somehow. I
> still think it has to do with the bitspinlocks. But I'm not sure. Just
> let me know if you want to be taken off this thread and I'll remove you
> from my CC list. Until then, I'll keep you on.

Sorry, it's hard to follow this thread. Just to make sure we're all on
the same page, what exactly is the symptom of this ext3 issue you are
working on? Is it a performance regression, or a latency issue, or a
lockup - ?

Whatever your problem is, I am not seeing it.

Lee

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
2005-03-17 15:41 ` Lee Revell
@ 2005-03-17 16:23 ` Steven Rostedt
2005-03-17 16:36 ` Lee Revell
0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-17 16:23 UTC (permalink / raw)
To: Lee Revell; +Cc: Andrew Morton, mingo, linux-kernel

On Thu, 17 Mar 2005, Lee Revell wrote:

>
> Sorry, it's hard to follow this thread. Just to make sure we're all on
> the same page, what exactly is the symptom of this ext3 issue you are
> working on? Is it a performance regression, or a latency issue, or a
> lockup - ?
>
> Whatever your problem is, I am not seeing it.
>

The root is a lockup. I think you can get this lockup whether or not it
is PREEMPT_RT or PREEMPT_DESKTOP. All you need is CONFIG_PREEMPT turned
on. Then this is what you want to do on a UP machine.

Set kjournald to FIFO (any realtime priority). And then from a non-RT
task, just do a "make clean; make" on the kernel. It may take a few
minutes but your system will lock up.

That's because kjournald will wait on the bit_spin_lock, but will never
be preempted by the one holding the lock, because it is FIFO and the one
holding the lock (the kernel compile) is not RT. Even if it was, and the
same priority as kjournald, it would still lock, since kjournald is FIFO
and will only yield to higher priority threads.

Now this lockup has uncovered other problems with ext3. Mainly that it
uses bit spinlocks, which in and of itself is bad. You don't want a busy
wait unless you really need it. A normal spinlock is such a thing in
vanilla SMP systems, since a schedule would take longer than the one
holding the lock. Ingo's RT kernel removes most of these, and makes them
into mutexes. This may slow down the overall performance but it shortens
latencies for RT tasks, which is what RT tries to do.

Now the latest problem is also bad, since you should never just call
schedule as a "yield" to let someone else release a lock.
Since the ranking order of the locks prevents just grabbing the lock and then risking a deadlock, ext3 tries to get the lock, and if it fails, it releases the other lock it has, calls schedule, then tries again. This is usually bad, since it would most likely be rescheduled, so basically it is worst than a spinlock, since it actually goes through the schedule logic again and spins! With Ingo's RT patch, this also becomes a deadlock the same way as bit_spin_locks can. Hope this helps, -- Steve ^ permalink raw reply [flat|nested] 125+ messages in thread
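[Editorial illustration, not part of the original message.] The "trylock, drop the other lock, schedule, retry" pattern described above can be sketched in userspace C. This is only a rough analogue under stated assumptions: `list_lock` and `state_lock` are hypothetical pthread mutexes standing in for j_list_lock and the per-buffer bit lock, and `sched_yield()` stands in for the kernel's `schedule()`. None of these names come from the jbd code.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-ins for j_list_lock and the buffer state bit lock. */
static pthread_mutex_t list_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Called with list_lock held.  The documented order is state_lock first,
 * then list_lock, so taking state_lock here inverts the order and may only
 * be attempted with a trylock.
 * Returns 1 with both locks held, or 0 with neither lock held.
 */
static int inverted_lock_sketch(void)
{
    if (pthread_mutex_trylock(&state_lock) != 0) {
        pthread_mutex_unlock(&list_lock);
        sched_yield();  /* "schedule away" and hope the holder runs */
        return 0;
    }
    return 1;
}

/* Retry until both locks are held; returns the number of failed attempts. */
static int lock_both(void)
{
    int retries = 0;

    for (;;) {
        int rc = pthread_mutex_lock(&list_lock);
        assert(rc == 0);
        if (inverted_lock_sketch())
            return retries;
        retries++;
    }
}

static void unlock_both(void)
{
    pthread_mutex_unlock(&state_lock);
    pthread_mutex_unlock(&list_lock);
}
```

If the yielding thread is the highest-priority runnable SCHED_FIFO task on a single CPU, the yield returns straight to it and the lock holder never runs, so the retry loop in `lock_both()` never terminates. That is the lockup described in the message above.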
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks 2005-03-17 16:23 ` Steven Rostedt @ 2005-03-17 16:36 ` Lee Revell 2005-03-18 6:58 ` Steven Rostedt 0 siblings, 1 reply; 125+ messages in thread From: Lee Revell @ 2005-03-17 16:36 UTC (permalink / raw) To: rostedt; +Cc: Andrew Morton, mingo, linux-kernel On Thu, 2005-03-17 at 11:23 -0500, Steven Rostedt wrote: > > On Thu, 17 Mar 2005, Lee Revell wrote: > > > > > Sorry, it's hard to follow this thread. Just to make sure we're all on > > the same page, what exactly is the symptom of this ext3 issue you are > > working on? Is it a performance regression, or a latency issue, or a > > lockup - ? > > > > Whatever your problem is, I am not seeing it. > > > > The root is a lockup. I think you can get this lockup whether or not it > is PREEMPT_RT or PREEMPT_DESKTOP. All you need is CONFIG_PREEMPT turned > on. Then this is what you want to do on a UP Machine. OK, no need to cc: me on this one any more. It's really low priority IMO compared to the big latencies I am seeing with ext3 and "data=ordered". Unless you think there is any relation. Lee ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks 2005-03-17 16:36 ` Lee Revell @ 2005-03-18 6:58 ` Steven Rostedt 2005-03-18 18:19 ` Lee Revell 0 siblings, 1 reply; 125+ messages in thread From: Steven Rostedt @ 2005-03-18 6:58 UTC (permalink / raw) To: Lee Revell; +Cc: Andrew Morton, mingo, linux-kernel On Thu, 17 Mar 2005, Lee Revell wrote: > > OK, no need to cc: me on this one any more. It's really low priority > IMO compared to the big latencies I am seeing with ext3 and > "data=ordered". Unless you think there is any relation. > IMO a deadlock is higher priority than a big latency :-) I still believe that the locking in ext3 has something to do with your latencies, but I'll take you off when I send something to Andrew or Ingo next time. Hopefully, they'll do the same. When this problem is solved on Ingo's side, maybe this will solve your latency problem, so I recommend that you keep trying the latest RT kernels. BTW what test are you running that causes these latencies? -- Steve ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks 2005-03-18 6:58 ` Steven Rostedt @ 2005-03-18 18:19 ` Lee Revell 0 siblings, 0 replies; 125+ messages in thread From: Lee Revell @ 2005-03-18 18:19 UTC (permalink / raw) To: rostedt; +Cc: Andrew Morton, mingo, linux-kernel On Fri, 2005-03-18 at 01:58 -0500, Steven Rostedt wrote: > > On Thu, 17 Mar 2005, Lee Revell wrote: > > > > OK, no need to cc: me on this one any more. It's really low priority > > IMO compared to the big latencies I am seeing with ext3 and > > "data=ordered". Unless you think there is any relation. > > > > IMO a deadlock is higher priority than a big latency :-) > Of course, if I was hitting the deadlock in normal use. > I still believe that something to do with the locking in ext3 has to do > with your latencies, but I'll take you off when I send something to Andrew > or Ingo next time. Hopefully, they'll do the same. If you suspect they are related then yes I would like to be copied. > > When this problem is solved on Ingo's side, maybe this will solve your > latency problem, so I recommend that you keep trying the latest RT > kernels. BTW what test are you running that causes these latencies? dbench 16 Lee ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks 2005-03-16 17:47 ` Steven Rostedt 2005-03-16 19:20 ` Lee Revell @ 2005-03-16 21:15 ` Andrew Morton 2005-03-17 9:21 ` Steven Rostedt 2005-03-17 9:58 ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Steven Rostedt 2 siblings, 1 reply; 125+ messages in thread From: Andrew Morton @ 2005-03-16 21:15 UTC (permalink / raw) To: rostedt; +Cc: mingo, rlrevell, linux-kernel

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> /*
>  * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
>  * held.  For ranking reasons we must trylock.  If we lose, schedule away and
>  * return 0.  j_list_lock is dropped in this case.
>  */
> static int inverted_lock(journal_t *journal, struct buffer_head *bh)
> {
> 	if (!jbd_trylock_bh_state(bh)) {
> 		spin_unlock(&journal->j_list_lock);
> 		schedule();
> 		return 0;
> 	}
> 	return 1;
> }
>

That's very lame code, that. The old "I don't know what the heck to do now
so I'll schedule" trick. Sorry.

> I guess one way to solve this is to add a wait queue here (before
> schedule()), and have the one holding the lock to wake up all on the
> waitqueue when they release it.

yup. A patch against mainline would be appropriate, please.

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks 2005-03-16 21:15 ` Andrew Morton @ 2005-03-17 9:21 ` Steven Rostedt 2005-03-18 9:23 ` [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks) Steven Rostedt 0 siblings, 1 reply; 125+ messages in thread From: Steven Rostedt @ 2005-03-17 9:21 UTC (permalink / raw) To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel

On Wed, 16 Mar 2005, Andrew Morton wrote:
> > I guess one way to solve this is to add a wait queue here (before
> > schedule()), and have the one holding the lock to wake up all on the
> > waitqueue when they release it.
>
> yup.  A patch against mainline would be appropriate, please.
>

Hi Andrew,

Here's the patch against 2.6.11. I tested it by adding (after making the
patch) global spinlocks for jbd_lock_bh_state and
jbd_lock_bh_journal_head. That way I have the same scenario as with Ingo's
kernel, and I turned on NEED_JOURNAL_STATE_WAIT. I'm still running that
kernel, so it looks like it works. Making those two locks global causes
this deadlock on kjournald much quicker, and I don't need to run on an SMP
machine (since my SMP machines are currently being used for other tasks).

Some comments on my patch. I only implement the wait queue when
bit_spin_trylock is an actual lock (thus creating the problem). I didn't
want to add this code if it wasn't needed (i.e.
!(CONFIG_SMP || CONFIG_DEBUG_SPINLOCK)). So in bit_spin_trylock, I define
NEED_JOURNAL_STATE_WAIT if bit_spin_trylock is really a lock. When
NEED_JOURNAL_STATE_WAIT is set, the wait queue is set up in the journal
code.

Now the question is, should we make those two locks global? It would help
Ingo's cause (and mine as well). But I don't know the impact on a large
SMP configuration. Andrew, since you have a 16xSMP machine, could you (if
you have time) try out the effect of that? If you do have time, then I'll
send you a patch that goes on top of this one to change the two locks into
global spin locks.

Ingo, where do you want to go from here? I guess we need to wait on what
Andrew decides.

-- Steve

diff -ur linux-2.6.11.orig/fs/jbd/commit.c linux-2.6.11/fs/jbd/commit.c
--- linux-2.6.11.orig/fs/jbd/commit.c	2005-03-02 02:38:25.000000000 -0500
+++ linux-2.6.11/fs/jbd/commit.c	2005-03-17 03:40:06.000000000 -0500
@@ -80,15 +80,33 @@
 
 /*
  * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
- * held.  For ranking reasons we must trylock.  If we lose, schedule away and
- * return 0.  j_list_lock is dropped in this case.
+ * held.  For ranking reasons we must trylock.  If we lose put ourselves on a
+ * state wait queue and we'll be woken up when it is unlocked. Then we return
+ * 0 to try this again.  j_list_lock is dropped in this case.
  */
 static int inverted_lock(journal_t *journal, struct buffer_head *bh)
 {
 	if (!jbd_trylock_bh_state(bh)) {
+		/*
+		 * jbd_trylock_bh_state always returns true unless CONFIG_SMP or
+		 * CONFIG_DEBUG_SPINLOCK, so the wait queue is not needed there.
+		 * The bit_spin_locks in jbd_lock_bh_state need to be removed anyway.
+		 */
+#ifdef NEED_JOURNAL_STATE_WAIT
+		DECLARE_WAITQUEUE(wait, current);
 		spin_unlock(&journal->j_list_lock);
-		schedule();
+		add_wait_queue_exclusive(&journal_state_wait,&wait);
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		/* Check to see if the lock has been unlocked in this short time */
+		if (jbd_is_locked_bh_state(bh))
+			schedule();
+		set_current_state(TASK_RUNNING);
+		remove_wait_queue(&journal_state_wait,&wait);
 		return 0;
+#else
+		/* This should never be hit */
+		BUG();
+#endif
 	}
 	return 1;
 }
diff -ur linux-2.6.11.orig/fs/jbd/journal.c linux-2.6.11/fs/jbd/journal.c
--- linux-2.6.11.orig/fs/jbd/journal.c	2005-03-02 02:37:49.000000000 -0500
+++ linux-2.6.11/fs/jbd/journal.c	2005-03-17 03:47:40.000000000 -0500
@@ -80,6 +80,11 @@
 EXPORT_SYMBOL(journal_try_to_free_buffers);
 EXPORT_SYMBOL(journal_force_commit);
 
+#ifdef NEED_JOURNAL_STATE_WAIT
+EXPORT_SYMBOL(journal_state_wait);
+DECLARE_WAIT_QUEUE_HEAD(journal_state_wait);
+#endif
+
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
 /*
diff -ur linux-2.6.11.orig/include/linux/jbd.h linux-2.6.11/include/linux/jbd.h
--- linux-2.6.11.orig/include/linux/jbd.h	2005-03-02 02:38:19.000000000 -0500
+++ linux-2.6.11/include/linux/jbd.h	2005-03-17 03:48:18.000000000 -0500
@@ -324,6 +324,20 @@
 	return bh->b_private;
 }
 
+#ifdef NEED_JOURNAL_STATE_WAIT
+/*
+ * The journal_state_wait is a wait queue that tasks will wait on
+ * if they fail to get the jbd_lock_bh_state while holding the j_list_lock.
+ * Instead of spinning on schedule, the task now adds itself to this wait queue
+ * and will be woken up when the jbd_lock_bh_state is released.
+ *
+ * Since the bit_spin_locks are only locks under CONFIG_SMP and
+ * CONFIG_DEBUG_SPINLOCK, this wait queue is only needed in those
+ * cases.
+ */
+extern wait_queue_head_t journal_state_wait;
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
 	bit_spin_lock(BH_State, &bh->b_state);
@@ -342,6 +356,13 @@
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
 	bit_spin_unlock(BH_State, &bh->b_state);
+#ifdef NEED_JOURNAL_STATE_WAIT
+	/*
+	 * There may be a task sleeping, and waiting to be woken up
+	 * when this is unlocked.
+	 */
+	wake_up(&journal_state_wait);
+#endif
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
diff -ur linux-2.6.11.orig/include/linux/spinlock.h linux-2.6.11/include/linux/spinlock.h
--- linux-2.6.11.orig/include/linux/spinlock.h	2005-03-02 02:38:09.000000000 -0500
+++ linux-2.6.11/include/linux/spinlock.h	2005-03-17 03:39:13.024466071 -0500
@@ -527,6 +527,9 @@
  *
  * Don't use this unless you really need to: spin_lock() and spin_unlock()
  * are significantly faster.
+ *
+ * FIXME: These are evil and need to be removed. They are currently only
+ * used by the journal code of ext3.
  */
 static inline void bit_spin_lock(int bitnum, unsigned long *addr)
 {
@@ -557,6 +560,13 @@
 {
 	preempt_disable();
 #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	/*
+	 * This is only used by the journal code of ext3 and if this
+	 * is set then we need to tell the journal code that it needs
+	 * a wait queue to keep kjournald from spinning on a lock.
+	 */
+#define NEED_JOURNAL_STATE_WAIT
+
 	if (test_and_set_bit(bitnum, addr)) {
 		preempt_enable();
 		return 0;

^ permalink raw reply [flat|nested] 125+ messages in thread
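[Editorial illustration, not part of the original message.] The idea in the patch, sleeping until the unlocker issues a wakeup and re-checking the lock before going to sleep, can be approximated in userspace with a condition variable. This is a minimal sketch under assumptions, not the kernel mechanism: `state_wait` is a hypothetical condition variable playing the role of journal_state_wait, `state_locked` is a plain flag replacing the BH_State bit, and `wait_lock` is an extra mutex that exists only because pthread condition variables require one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these names exist in jbd. */
static pthread_mutex_t wait_lock  = PTHREAD_MUTEX_INITIALIZER; /* guards state_locked */
static pthread_cond_t  state_wait = PTHREAD_COND_INITIALIZER;  /* plays journal_state_wait */
static bool state_locked;                                      /* plays the BH_State bit */

/* Try to take the state lock; returns true on success, never blocks long. */
static bool state_trylock(void)
{
    bool got;

    pthread_mutex_lock(&wait_lock);
    got = !state_locked;
    if (got)
        state_locked = true;
    pthread_mutex_unlock(&wait_lock);
    return got;
}

/* Release the state lock and wake every sleeper, like the added wake_up(). */
static void state_unlock(void)
{
    pthread_mutex_lock(&wait_lock);
    state_locked = false;
    pthread_mutex_unlock(&wait_lock);
    pthread_cond_broadcast(&state_wait);
}

/*
 * Sleep until the state lock is observed free.  The lock is NOT acquired;
 * the caller goes back and retries its normal locking sequence, just as
 * inverted_lock() returns 0 after waking up.  The predicate is re-checked
 * under wait_lock, so an unlock between the failed trylock and the sleep
 * cannot be missed.
 */
static void state_wait_unlocked(void)
{
    pthread_mutex_lock(&wait_lock);
    while (state_locked)
        pthread_cond_wait(&state_wait, &wait_lock);
    pthread_mutex_unlock(&wait_lock);
}
```

The `while (state_locked)` loop corresponds to the patch's "check to see if the lock has been unlocked in this short time" test made after `set_current_state()`. The cost that comes with this design is visible in `state_unlock()`: every unlock pays for a wakeup call, whether or not anyone is waiting.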
* [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks) 2005-03-17 9:21 ` Steven Rostedt @ 2005-03-18 9:23 ` Steven Rostedt 2005-03-18 9:32 ` Andrew Morton 0 siblings, 1 reply; 125+ messages in thread From: Steven Rostedt @ 2005-03-18 9:23 UTC (permalink / raw) To: Andrew Morton; +Cc: mingo, linux-kernel

Andrew,

Since I haven't gotten a response from you, I figured that you may have
missed this, since the subject didn't change. So I changed the subject to
get your attention, and I've resent this. Here's the patch to get rid of
the lame schedule that was in fs/jbd/commit.c. Let me know if this patch
is appropriate.

Thanks,

-- Steve

On Thu, 17 Mar 2005, Steven Rostedt wrote:
>
> On Wed, 16 Mar 2005, Andrew Morton wrote:
> > > I guess one way to solve this is to add a wait queue here (before
> > > schedule()), and have the one holding the lock to wake up all on the
> > > waitqueue when they release it.
> >
> > yup.  A patch against mainline would be appropriate, please.
> >
>
> Hi Andrew,
>
> Here's the patch against 2.6.11.  I tested it by adding (after making the
> patch) global spinlocks for jbd_lock_bh_state and jbd_lock_bh_journal_head.
> That way I have the same scenario as with Ingo's kernel, and I turned on
> NEED_JOURNAL_STATE_WAIT.  I'm still running that kernel so it looks like
> it works.  Making those two locks global causes this deadlock on kjournald
> much quicker, and I don't need to run on an SMP machine (since my SMP
> machines are currently being used for other tasks).
>
> Some comments on my patch.  I only implement the wait queue when
> bit_spin_trylock is an actual lock (thus creating the problem).  I didn't
> want to add this code if it wasn't needed (i.e. !(CONFIG_SMP ||
> CONFIG_DEBUG_SPINLOCK)).  So in bit_spin_trylock, I define
> NEED_JOURNAL_STATE_WAIT if bit_spin_trylock is really a lock.  When
> NEED_JOURNAL_STATE_WAIT is set, then the wait queue is set up in the
> journal code.
>
> Now the question is, should we make those two locks global?  It would help
> Ingo's cause (and mine as well).  But I don't know the impact on a large
> SMP configuration.  Andrew, since you have a 16xSMP machine, could you (if
> you have time) try out the effect of that.  If you do have time, then I'll
> send you a patch that goes on top of this one to change the two locks into
> global spin locks.
>
> Ingo, where do you want to go from here? I guess we need to wait on what
> Andrew decides.
>
> -- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks) 2005-03-18 9:23 ` [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks) Steven Rostedt @ 2005-03-18 9:32 ` Andrew Morton 2005-03-18 10:38 ` Steven Rostedt 0 siblings, 1 reply; 125+ messages in thread From: Andrew Morton @ 2005-03-18 9:32 UTC (permalink / raw) To: rostedt; +Cc: mingo, linux-kernel Steven Rostedt <rostedt@goodmis.org> wrote: > > > Andrew, > > Since I haven't gotten a response from you, It sometimes takes me half a day to get onto looking at patches. And if I take them I usually don't reply (sorry). But I don't drop stuff, so if you don't hear, please assume the patch stuck. If others raise objections to the patch I'll usually duck it as well, but it's pretty obvious when that happens. I really should knock up a script to send out an email when I add a patch to -mm. > I'd figure that you may have > missed this, since the subject didn't change. So I changed the subject to > get your attention, and I've resent this. Here's the patch to get rid of > the lame schedule that was in fs/jbd/commit.c. Let me know if this > patch is appropriate. I'm rather aghast at all the ifdeffery and complexity in this one. But I haven't looked at it closely yet. ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks) 2005-03-18 9:32 ` Andrew Morton @ 2005-03-18 10:38 ` Steven Rostedt 2005-03-18 11:07 ` Andrew Morton 0 siblings, 1 reply; 125+ messages in thread From: Steven Rostedt @ 2005-03-18 10:38 UTC (permalink / raw) To: Andrew Morton; +Cc: mingo, linux-kernel On Fri, 18 Mar 2005, Andrew Morton wrote: > Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > > Andrew, > > > > Since I haven't gotten a response from you, > > It sometimes takes me half a day to get onto looking at patches. And if I > take them I usually don't reply (sorry). But I don't drop stuff, so if you > don't hear, please assume the patch stuck. If others raise objections > to the patch I'll usually duck it as well, but it's pretty obvious when that > happens. Sorry, I didn't mean to be pushy. I understand that you have a lot on your plate, and I'm sure you don't drop stuff. I just wasn't sure that you noticed that that was a patch and not just a reply on this thread, since I didn't flag it as such in the subject. I just didn't want it to slip under the radar. > > I really should knock up a script to send out an email when I add a patch > to -mm. > I thought you might have had something like that already, which was another reason I thought you might have skipped this. > > I'd figure that you may have > > missed this, since the subject didn't change. So I changed the subject to > > get your attention, and I've resent this. Here's the patch to get rid of > > the lame schedule that was in fs/jbd/commit.c. Let me know if this > > patch is appropriate. > > I'm rather aghast at all the ifdeffery and complexity in this one. But I > haven't looked at it closely yet. > I wanted to keep the wait logic out when it wasn't a problem. Basically, the problem only occurs when bit_spin_trylock is defined as an actual trylock. So I put in a define there to enable the wait queues. 
I didn't want to waste cycles checking the wait queue in jbd_unlock_bh_state when there would never be anything on it. Heck, I figured why even have the wait queue wasting memory if it wasn't needed. So that added the ifdeffery complexity. Thanks, -- Steve ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks) 2005-03-18 10:38 ` Steven Rostedt @ 2005-03-18 11:07 ` Andrew Morton 2005-03-18 12:10 ` Steven Rostedt 0 siblings, 1 reply; 125+ messages in thread From: Andrew Morton @ 2005-03-18 11:07 UTC (permalink / raw) To: rostedt; +Cc: mingo, linux-kernel

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > I really should knock up a script to send out an email when I add a patch
> > to -mm.
> >
>
> I thought you might have had something like that already, which was
> another reason I thought you might have skipped this.
>

I do now..

> > > I'd figure that you may have
> > > missed this, since the subject didn't change.  So I changed the subject to
> > > get your attention, and I've resent this.  Here's the patch to get rid of
> > > the lame schedule that was in fs/jbd/commit.c.  Let me know if this
> > > patch is appropriate.
> >
> > I'm rather aghast at all the ifdeffery and complexity in this one.  But I
> > haven't looked at it closely yet.
> >
>
> I wanted to keep the wait logic out when it wasn't a problem.  Basically,
> the problem only occurs when bit_spin_trylock is defined as an actual
> trylock. So I put in a define there to enable the wait queues.  I didn't
> want to waste cycles checking the wait queue in jbd_unlock_bh_state when
> there would never be anything on it. Heck, I figured why even have the
> wait queue wasting memory if it wasn't needed. So that added the
> ifdeffery complexity.

No, that code's just a problem. For ranking reasons it's essentially doing
this:

repeat:
	cond_resched();
	spin_lock(j_list_lock);
	....
	if (!bit_spin_trylock(bh)) {
		spin_unlock(j_list_lock);
		schedule();
		goto repeat;
	}

Now imagine that some other CPU holds the bit_spin_lock and is spinning,
trying to get the spin_lock(). The above code assumes that the schedule()
and cond_resched() will take "long enough" for the other CPU to get the
spinlock, do its business then release the locks.

So all the schedule() is really doing is "blow a few cycles so the other
CPU can get in and grab the spinlock". That'll work OK on normal SMP but I
suspect that on NUMA setups with really big latencies we could end up
starving the other CPU: this CPU would keep on grabbing the lock. It
depends on how the interconnect cache and all that goop works.

So what to do?

One approach would be to spin on the bit_spin_trylock after having dropped
j_list_lock. That'll tell us when the other CPU has moved on.

Another approach would be to sleep on a waitqueue somewhere. But that
means that jbd_unlock_bh_state() needs to do wakeups all the time - costly.

Another approach would be to simply whack an msleep(1) in there. That
might be OK - it should be very rare.

Probably the first approach would be the one to use. That's for mainline.
I don't know what the super-duper-RT fix would be. Why did we start
discussing this anyway?

Oh, SCHED_FIFO. kjournald doesn't run SCHED_FIFO, but someone may decide
to make it do so. But even then I don't see a problem for the mainline
kernel, because this CPU's SCHED_FIFO doesn't stop the other CPU from
running.

^ permalink raw reply [flat|nested] 125+ messages in thread
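[Editorial illustration, not part of the original message.] The first approach listed above, spinning on the bit lock only after j_list_lock has been dropped, can be sketched in userspace C11. This is one reading of the suggestion, not code from the thread: `list_lock` is a hypothetical pthread mutex standing in for j_list_lock, `state_bit` is an `atomic_flag` standing in for the BH_State bit lock, and the choice to release the bit immediately after winning it (so the normal lock order can be retried from the top) is an assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for j_list_lock and the BH_State bit lock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_flag state_bit = ATOMIC_FLAG_INIT;

static bool bit_trylock(void)
{
    /* test_and_set returns the previous value: false means we took it. */
    return !atomic_flag_test_and_set_explicit(&state_bit, memory_order_acquire);
}

static void bit_unlock(void)
{
    atomic_flag_clear_explicit(&state_bit, memory_order_release);
}

/*
 * Take list_lock, then the bit lock.  On contention, drop list_lock and
 * spin on the bit with no other lock held, so the current holder is free
 * to take list_lock and finish.  Returns the number of retries needed;
 * on return both locks are held.
 */
static int lock_list_then_bit(void)
{
    int retries = 0;

    for (;;) {
        pthread_mutex_lock(&list_lock);
        if (bit_trylock())
            return retries;
        pthread_mutex_unlock(&list_lock);

        while (!bit_trylock())
            ;               /* busy wait; the kernel would cpu_relax() here */
        bit_unlock();       /* we only wanted to learn the holder moved on */
        retries++;
    }
}
```

Unlike the `schedule()` version, the waiter here makes no further attempt on `list_lock` until it has seen the bit lock become free, which is the "tells us when the other CPU has moved on" property. It remains a busy wait, so it gives no help in the single-CPU SCHED_FIFO case discussed elsewhere in the thread.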
* Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks) 2005-03-18 11:07 ` Andrew Morton @ 2005-03-18 12:10 ` Steven Rostedt 0 siblings, 0 replies; 125+ messages in thread From: Steven Rostedt @ 2005-03-18 12:10 UTC (permalink / raw) To: Andrew Morton; +Cc: mingo, linux-kernel

On Fri, 18 Mar 2005, Andrew Morton wrote:
> Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > I wanted to keep the wait logic out when it wasn't a problem.  Basically,
> > the problem only occurs when bit_spin_trylock is defined as an actual
> > trylock. So I put in a define there to enable the wait queues.  I didn't
> > want to waste cycles checking the wait queue in jbd_unlock_bh_state when
> > there would never be anything on it. Heck, I figured why even have the
> > wait queue wasting memory if it wasn't needed. So that added the
> > ifdeffery complexity.
>
> No, that code's just a problem.  For ranking reasons it's essentially doing
> this:
>
> repeat:
> 	cond_resched();
> 	spin_lock(j_list_lock);
> 	....
> 	if (!bit_spin_trylock(bh)) {
> 		spin_unlock(j_list_lock);
> 		schedule();
> 		goto repeat;
> 	}
>

Yep, that I understand.

> Now imagine that some other CPU holds the bit_spin_lock and is spinning,
> trying to get the spin_lock().  The above code assumes that the schedule()
> and cond_resched() will take "long enough" for the other CPU to get the
> spinlock, do its business then release the locks.
>
> So all the schedule() is really doing is "blow a few cycles so the other
> CPU can get in and grab the spinlock".  That'll work OK on normal SMP but I
> suspect that on NUMA setups with really big latencies we could end up
> starving the other CPU: this CPU would keep on grabbing the lock.  It
> depends on how the interconnect cache and all that goop works.
>
> So what to do?
>
> One approach would be to spin on the bit_spin_trylock after having dropped
> j_list_lock.  That'll tell us when the other CPU has moved on.
>

This is probably the best for mainline, since, as you mentioned, the above
code is just bad.

> Another approach would be to sleep on a waitqueue somewhere.  But that
> means that jbd_unlock_bh_state() needs to do wakeups all the time - costly.
>

That's the approach that my patch took.

> Another approach would be to simply whack an msleep(1) in there.  That
> might be OK - it should be very rare.
>

This approach is not much better than the current implementation.

> Probably the first approach would be the one to use.  That's for mainline.
> I don't know what the super-duper-RT fix would be.  Why did we start
> discussing this anyway?
>
> Oh, SCHED_FIFO.  kjournald doesn't run SCHED_FIFO, but someone may decide
> to make it do so.  But even then I don't see a problem for the mainline
> kernel, because this CPU's SCHED_FIFO doesn't stop the other CPU from
> running.
>

So this comes down to just a problem with Ingo's PREEMPT_RT. This means
that the latency of kjournald, even without SCHED_FIFO, will be large. If
it preempts a process that holds one of these bit spinlocks (Ingo's RT
kernel takes out the preempt_disable in them), then the kjournald thread
will spin until its time slice runs out, causing problems for other
processes. This affects even a process with a higher priority than
kjournald, if it blocks on one of the other locks that kjournald can hold
while attempting to get the bit locks.

I know Ingo wants to get his patch eventually into the mainline without
too much drag. But this problem needs to be solved in the mainline to
accomplish this. What do you recommend?

-- Steve

^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks 2005-03-16 17:47 ` Steven Rostedt 2005-03-16 19:20 ` Lee Revell 2005-03-16 21:15 ` Andrew Morton @ 2005-03-17 9:58 ` Steven Rostedt 2 siblings, 0 replies; 125+ messages in thread From: Steven Rostedt @ 2005-03-17 9:58 UTC (permalink / raw) To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel On Wed, 16 Mar 2005, Steven Rostedt wrote: > [...] There's a couple of places that > jbd_trylock_bh_state is used in checkpoint.c, but this is the one place > that it definitely deadlocks the system. I believe that the > code in checkpoint.c also has this problem. > I've examined the code in checkpoint.c, and I now believe that it doesn't have this problem. When it fails a lock, it just falls out of the while loops. -- Steve ^ permalink raw reply [flat|nested] 125+ messages in thread
* [patch] Real-Time Preemption, -RT-2.6.11-final-V0.7.40-00 2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar ` (5 preceding siblings ...) 2005-02-19 5:08 ` Lee Revell @ 2005-03-11 9:28 ` Ingo Molnar 2005-03-11 12:10 ` Andrew Walrond 6 siblings, 1 reply; 125+ messages in thread From: Ingo Molnar @ 2005-03-11 9:28 UTC (permalink / raw) To: linux-kernel i have released the -V0.7.40-00 Real-Time Preemption patch, which can be downloaded from the usual place: http://redhat.com/~mingo/realtime-preempt/ this is a merge to 2.6.11-final. to create a -V0.7.40-00 tree from scratch, the patching order is: http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.11.tar.bz2 http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.11-final-V0.7.40-00 Ingo ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-final-V0.7.40-00 2005-03-11 9:28 ` [patch] Real-Time Preemption, -RT-2.6.11-final-V0.7.40-00 Ingo Molnar @ 2005-03-11 12:10 ` Andrew Walrond 2005-03-14 20:19 ` Tom Rini 0 siblings, 1 reply; 125+ messages in thread From: Andrew Walrond @ 2005-03-11 12:10 UTC (permalink / raw) To: linux-kernel On Friday 11 March 2005 09:28, Ingo Molnar wrote: > i have released the -V0.7.40-00 Real-Time Preemption patch, which can be > downloaded from the usual place: > I've lost the thread a little; Is this still x86 only? Andrew Walrond ^ permalink raw reply [flat|nested] 125+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.11-final-V0.7.40-00 2005-03-11 12:10 ` Andrew Walrond @ 2005-03-14 20:19 ` Tom Rini 0 siblings, 0 replies; 125+ messages in thread From: Tom Rini @ 2005-03-14 20:19 UTC (permalink / raw) To: Andrew Walrond; +Cc: linux-kernel On Fri, Mar 11, 2005 at 12:10:52PM +0000, Andrew Walrond wrote: > On Friday 11 March 2005 09:28, Ingo Molnar wrote: > > i have released the -V0.7.40-00 Real-Time Preemption patch, which can be > > downloaded from the usual place: > > > > I've lost the thread a little; Is this still x86 only? The patch itself contains i386, x86_64 and MIPS support. There have been patches posted for ARM (I _think_ one version which had a stab at generic hardirq support for ARM and another without, and I kinda-sorta think Ingo was waiting for the generic hardirq stuff to settle, which is another issue) as well as PPC32. -- Tom Rini http://gate.crashing.org/~trini/ ^ permalink raw reply [flat|nested] 125+ messages in thread