* Re: 2.6.13-rc6-rt9
2005-08-18 6:01 2.6.13-rc6-rt9 Ingo Molnar
@ 2005-08-18 15:24 ` Thomas Gleixner
2005-08-18 16:08 ` 2.6.13-rc6-rt9 Thomas Gleixner
2005-08-18 21:17 ` 2.6.13-rc6-rt9 Thomas Gleixner
2005-08-18 22:54 ` [2.6.13-rc6-rt9 patch] fix DECNET_ROUTER=y compile Adrian Bunk
` (7 subsequent siblings)
8 siblings, 2 replies; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-18 15:24 UTC (permalink / raw)
To: Ingo Molnar; +Cc: john cooper, linux-kernel
On Thu, 2005-08-18 at 08:01 +0200, Ingo Molnar wrote:
> i have released the 2.6.13-rc6-rt9 tree, which can be downloaded from
> the usual place:
Hi Ingo,
finally found the deadlock. It was caused by IRQ flood, which was
introduced by the end_irq() changes.
They change the semantics in two ways - both wrong.
1. The condition is different
2. ->end() can contain different code than ->enable()
That's the code in end_8259A_irq():
if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)) &&
irq_desc[irq].action)
enable_8259A_irq(irq);
The code in end_irq():
if (!(desc->status & IRQ_DISABLED))
desc->handler->enable(irq);
What was the reason for those changes ?
tglx
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude
linux-2.6.13-rc6-rt8/kernel/irq/handle.c
linux-2.6.13-rc6-rt-debug/kernel/irq/handle.c
--- linux-2.6.13-rc6-rt8/kernel/irq/handle.c 2005-08-17
17:53:13.000000000 +0200
+++ linux-2.6.13-rc6-rt-debug/kernel/irq/handle.c 2005-08-18
16:32:54.000000000 +0200
@@ -171,7 +171,7 @@ fastcall notrace unsigned int __do_IRQ(u
*/
desc->handler->ack(irq);
action_ret = handle_IRQ_event(irq, regs, desc->action);
- end_irq(desc, irq);
+ desc->handler->end(irq);
return 1;
}
@@ -241,7 +241,7 @@ out:
* The end-handler has to deal with interrupts which got
* disabled while the handler was running:
*/
- end_irq(desc, irq);
+ desc->handler->end(irq);
out_no_end:
spin_unlock(&desc->lock);
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6-rt8/kernel/irq/internals.h linux-2.6.13-rc6-rt-debug/kernel/irq/internals.h
--- linux-2.6.13-rc6-rt8/kernel/irq/internals.h 2005-08-17 17:53:13.000000000 +0200
+++ linux-2.6.13-rc6-rt-debug/kernel/irq/internals.h 2005-08-18 16:39:56.000000000 +0200
@@ -6,21 +6,6 @@ extern int noirqdebug;
void recalculate_desc_flags(struct irq_desc *desc);
-/*
- * On PREEMPT_HARDIRQS, the ->ack handler masks interrupts, so that
- * they can be redirected to an IRQ thread, if needed. So here we
- * have to unmask the interrupt line, if needed:
- */
-static inline void end_irq(irq_desc_t *desc, unsigned int irq)
-{
-#ifdef CONFIG_PREEMPT_HARDIRQS
- if (!(desc->status & IRQ_DISABLED))
- desc->handler->enable(irq);
-#else
- desc->handler->end(irq);
-#endif
-}
-
#ifdef CONFIG_PROC_FS
extern void register_irq_proc(unsigned int irq);
extern void register_handler_proc(unsigned int irq, struct irqaction *action);
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6-rt8/kernel/irq/manage.c linux-2.6.13-rc6-rt-debug/kernel/irq/manage.c
--- linux-2.6.13-rc6-rt8/kernel/irq/manage.c 2005-08-17 17:53:13.000000000 +0200
+++ linux-2.6.13-rc6-rt-debug/kernel/irq/manage.c 2005-08-18 16:31:46.000000000 +0200
@@ -442,7 +442,7 @@ static void do_hardirq(struct irq_desc *
* The end-handler has to deal with interrupts which got
* disabled while the handler was running:
*/
- end_irq(desc, irq);
+ desc->handler->end(irq);
}
spin_unlock_irq(&desc->lock);
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: 2.6.13-rc6-rt9
2005-08-18 15:24 ` 2.6.13-rc6-rt9 Thomas Gleixner
@ 2005-08-18 16:08 ` Thomas Gleixner
2005-08-18 21:17 ` 2.6.13-rc6-rt9 Thomas Gleixner
1 sibling, 0 replies; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-18 16:08 UTC (permalink / raw)
To: Ingo Molnar; +Cc: john cooper, linux-kernel
On Thu, 2005-08-18 at 17:24 +0200, Thomas Gleixner wrote:
Oops, mailer madness.
tglx
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6-rt8/kernel/irq/handle.c linux-2.6.13-rc6-rt-debug/kernel/irq/handle.c
--- linux-2.6.13-rc6-rt8/kernel/irq/handle.c 2005-08-17 17:53:13.000000000 +0200
+++ linux-2.6.13-rc6-rt-debug/kernel/irq/handle.c 2005-08-18 16:32:54.000000000 +0200
@@ -171,7 +171,7 @@ fastcall notrace unsigned int __do_IRQ(u
*/
desc->handler->ack(irq);
action_ret = handle_IRQ_event(irq, regs, desc->action);
- end_irq(desc, irq);
+ desc->handler->end(irq);
return 1;
}
@@ -241,7 +241,7 @@ out:
* The end-handler has to deal with interrupts which got
* disabled while the handler was running:
*/
- end_irq(desc, irq);
+ desc->handler->end(irq);
out_no_end:
spin_unlock(&desc->lock);
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6-rt8/kernel/irq/internals.h linux-2.6.13-rc6-rt-debug/kernel/irq/internals.h
--- linux-2.6.13-rc6-rt8/kernel/irq/internals.h 2005-08-17 17:53:13.000000000 +0200
+++ linux-2.6.13-rc6-rt-debug/kernel/irq/internals.h 2005-08-18 16:39:56.000000000 +0200
@@ -6,21 +6,6 @@ extern int noirqdebug;
void recalculate_desc_flags(struct irq_desc *desc);
-/*
- * On PREEMPT_HARDIRQS, the ->ack handler masks interrupts, so that
- * they can be redirected to an IRQ thread, if needed. So here we
- * have to unmask the interrupt line, if needed:
- */
-static inline void end_irq(irq_desc_t *desc, unsigned int irq)
-{
-#ifdef CONFIG_PREEMPT_HARDIRQS
- if (!(desc->status & IRQ_DISABLED))
- desc->handler->enable(irq);
-#else
- desc->handler->end(irq);
-#endif
-}
-
#ifdef CONFIG_PROC_FS
extern void register_irq_proc(unsigned int irq);
extern void register_handler_proc(unsigned int irq, struct irqaction *action);
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6-rt8/kernel/irq/manage.c linux-2.6.13-rc6-rt-debug/kernel/irq/manage.c
--- linux-2.6.13-rc6-rt8/kernel/irq/manage.c 2005-08-17 17:53:13.000000000 +0200
+++ linux-2.6.13-rc6-rt-debug/kernel/irq/manage.c 2005-08-18 16:31:46.000000000 +0200
@@ -442,7 +442,7 @@ static void do_hardirq(struct irq_desc *
* The end-handler has to deal with interrupts which got
* disabled while the handler was running:
*/
- end_irq(desc, irq);
+ desc->handler->end(irq);
}
spin_unlock_irq(&desc->lock);
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: 2.6.13-rc6-rt9
2005-08-18 15:24 ` 2.6.13-rc6-rt9 Thomas Gleixner
2005-08-18 16:08 ` 2.6.13-rc6-rt9 Thomas Gleixner
@ 2005-08-18 21:17 ` Thomas Gleixner
1 sibling, 0 replies; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-18 21:17 UTC (permalink / raw)
To: Ingo Molnar; +Cc: john cooper, linux-kernel
On Thu, 2005-08-18 at 17:24 +0200, Thomas Gleixner wrote:
> finally found the deadlock. It was caused by IRQ flood, which was
> introduced by the end_irq() changes.
Found another one to back out. It creaped in with the same patch.
It's slow and just conceals bad configured PICs and crappy demux
handlers.
tglx
--- linux-2.6.13-rc6-rt9/arch/ppc/syslib/open_pic.c 2005-08-18 17:37:39.000000000 +0200
+++ linux-2.6.13-rc6-rt9.work/arch/ppc/syslib/open_pic.c 2005-08-18 23:02:12.000000000 +0200
@@ -816,10 +816,6 @@ static void openpic_set_sense(u_int irq,
}
#endif /* notused */
-#ifdef CONFIG_PREEMPT_RT
-#define __SLOW_VERSION__
-#endif
-
/* No spinlocks, should not be necessary with the OpenPIC
* (1 register = 1 interrupt and we have the desc lock).
*/
^ permalink raw reply [flat|nested] 63+ messages in thread
* [2.6.13-rc6-rt9 patch] fix DECNET_ROUTER=y compile
2005-08-18 6:01 2.6.13-rc6-rt9 Ingo Molnar
2005-08-18 15:24 ` 2.6.13-rc6-rt9 Thomas Gleixner
@ 2005-08-18 22:54 ` Adrian Bunk
2005-08-22 7:59 ` Ingo Molnar
2005-08-18 22:54 ` 2.6.13-rc6-rt9: compile errors Adrian Bunk
` (6 subsequent siblings)
8 siblings, 1 reply; 63+ messages in thread
From: Adrian Bunk @ 2005-08-18 22:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Thomas Gleixner, Steven Rostedt, Paul E. McKenney
It doesn't compile with CONFIG_DECNET_ROUTER=y:
<-- snip -->
...
CC net/decnet/dn_dev.o
net/decnet/dn_dev.c: In function 'dn_forwarding_proc':
net/decnet/dn_dev.c:335: error: 'struct dn_dev_parms' has no member named 'down'
net/decnet/dn_dev.c:336: error: syntax error before 'do'
net/decnet/dn_dev.c:338: error: 'struct dn_dev_parms' has no member named 'up'
net/decnet/dn_dev.c:339: error: syntax error before 'do'
net/decnet/dn_dev.c:339: warning: control reaches end of non-void function
net/decnet/dn_dev.c: In function 'dn_forwarding_sysctl':
net/decnet/dn_dev.c:374: error: 'struct dn_dev_parms' has no member named 'down'
net/decnet/dn_dev.c:375: error: syntax error before 'do'
net/decnet/dn_dev.c:377: error: 'struct dn_dev_parms' has no member named 'up'
net/decnet/dn_dev.c:378: error: syntax error before 'do'
net/decnet/dn_dev.c:378: warning: control reaches end of non-void function
make[2]: *** [net/decnet/dn_dev.o] Error 1
<-- snip -->
Signed-off-by: Adrian Bunk <bunk@stusta.de>
--- linux-2.6.13-rc6-rt9/net/decnet/dn_dev.c.old 2005-08-19 00:00:41.000000000 +0200
+++ linux-2.6.13-rc6-rt9/net/decnet/dn_dev.c 2005-08-19 00:02:14.000000000 +0200
@@ -332,11 +332,11 @@
*/
tmp = dn_db->parms.forwarding;
dn_db->parms.forwarding = old;
- if (dn_db->parms.down)
- dn_db->parms.down(dev);
+ if (dn_db->parms.dn_down)
+ dn_db->parms.dn_down(dev);
dn_db->parms.forwarding = tmp;
- if (dn_db->parms.up)
- dn_db->parms.up(dev);
+ if (dn_db->parms.dn_up)
+ dn_db->parms.dn_up(dev);
}
return err;
@@ -371,11 +371,11 @@
if (value > 2)
return -EINVAL;
- if (dn_db->parms.down)
- dn_db->parms.down(dev);
+ if (dn_db->parms.dn_down)
+ dn_db->parms.dn_down(dev);
dn_db->parms.forwarding = value;
- if (dn_db->parms.up)
- dn_db->parms.up(dev);
+ if (dn_db->parms.dn_up)
+ dn_db->parms.dn_up(dev);
}
return 0;
^ permalink raw reply [flat|nested] 63+ messages in thread* 2.6.13-rc6-rt9: compile errors
2005-08-18 6:01 2.6.13-rc6-rt9 Ingo Molnar
2005-08-18 15:24 ` 2.6.13-rc6-rt9 Thomas Gleixner
2005-08-18 22:54 ` [2.6.13-rc6-rt9 patch] fix DECNET_ROUTER=y compile Adrian Bunk
@ 2005-08-18 22:54 ` Adrian Bunk
2005-08-22 8:44 ` Ingo Molnar
2005-08-19 0:05 ` 2.6.13-rc6-rt9 Chuck Harding
` (5 subsequent siblings)
8 siblings, 1 reply; 63+ messages in thread
From: Adrian Bunk @ 2005-08-18 22:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Thomas Gleixner, Steven Rostedt, Paul E. McKenney
I'm getting the following compile errors:
<-- snip -->
...
LD .tmp_vmlinux1
drivers/built-in.o: In function `pi_init':
: multiple definition of `pi_init'
kernel/built-in.o:(.bss+0x80f0): first defined here
ld: Warning: size of symbol `pi_init' changed from 4 in kernel/built-in.o to 675 in drivers/built-in.o
ld: Warning: type of symbol `pi_init' changed from 1 to 2 in drivers/built-in.o
fs/built-in.o: In function `jffs2_do_crccheck_inode':
: undefined reference to `there_is_no_init_MUTEX_LOCKED_for_RT_semaphores'
fs/built-in.o: In function `jffs2_i_init_once':
super.c:(.text+0x1d1137): undefined reference to `there_is_no_init_MUTEX_LOCKED_for_RT_semaphores'
drivers/built-in.o: In function `si_restart_short_timer':
ipmi_si_intf.c:(.text+0x21f395): undefined reference to `__bad_spinlock_type'
drivers/built-in.o: In function `smi_timeout':
ipmi_si_intf.c:(.text+0x21f5ca): undefined reference to `__bad_spinlock_type'
ipmi_si_intf.c:(.text+0x21f5fa): undefined reference to `__bad_spinlock_type'
drivers/built-in.o: In function `carm_init_one':
sx8.c:(.text+0x28ebcd): undefined reference to `there_is_no_init_MUTEX_LOCKED_for_RT_semaphores'
drivers/built-in.o: In function `plip_close':
plip.c:(.text+0x2bae71): undefined reference to `there_is_no_init_MUTEX_LOCKED_for_RT_semaphores'
drivers/built-in.o: In function `sixpack_open':
6pack.c:(.text+0x47640c): undefined reference to `there_is_no_init_MUTEX_LOCKED_for_RT_semaphores'
drivers/built-in.o: In function `mc32_probe1':
3c527.c:(.init.text+0x333d9): undefined reference to `there_is_no_init_MUTEX_LOCKED_for_RT_semaphores'
make: *** [.tmp_vmlinux1] Error 1
<-- snip -->
Note: pi_init is a global function in drivers/block/paride/paride.c .
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: 2.6.13-rc6-rt9: compile errors
2005-08-18 22:54 ` 2.6.13-rc6-rt9: compile errors Adrian Bunk
@ 2005-08-22 8:44 ` Ingo Molnar
0 siblings, 0 replies; 63+ messages in thread
From: Ingo Molnar @ 2005-08-22 8:44 UTC (permalink / raw)
To: Adrian Bunk
Cc: linux-kernel, Thomas Gleixner, Steven Rostedt, Paul E. McKenney
* Adrian Bunk <bunk@stusta.de> wrote:
> I'm getting the following compile errors:
>
> <-- snip -->
>
> ...
> LD .tmp_vmlinux1
> drivers/built-in.o: In function `pi_init':
> : multiple definition of `pi_init'
> kernel/built-in.o:(.bss+0x80f0): first defined here
> Note: pi_init is a global function in drivers/block/paride/paride.c .
thanks - fixed it in my tree.
Ingo
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-08-18 6:01 2.6.13-rc6-rt9 Ingo Molnar
` (2 preceding siblings ...)
2005-08-18 22:54 ` 2.6.13-rc6-rt9: compile errors Adrian Bunk
@ 2005-08-19 0:05 ` Chuck Harding
2005-08-19 6:39 ` 2.6.13-rc6-rt9 Steven Rostedt
` (4 subsequent siblings)
8 siblings, 0 replies; 63+ messages in thread
From: Chuck Harding @ 2005-08-19 0:05 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linux Kernel Discussion List, Thomas Gleixner, Steven Rostedt,
Paul E. McKenney
On Thu, 18 Aug 2005, Ingo Molnar wrote:
>
> i have released the 2.6.13-rc6-rt9 tree, which can be downloaded from
> the usual place:
>
> http://redhat.com/~mingo/realtime-preempt/
>
> it's a fixes-only release. Changes since 2.6.13-rc6-rt3:
>
> - USB irq flags use cleanups (Alan Stern)
>
> - RCU tasklist-lock fixes (Paul McKenney, Thomas Gleixner)
>
> - HR-timers waitqueue splitup, better HRT latencies (Thomas Gleixner)
>
> - latency tracer fixes, irq flags tracing cleanups (Steven Rostedt, me)
>
> - NFSd BKL unlock fix (Steven Rostedt)
>
> - stackfootprint-max-printer fix (Steven Rostedt)
>
> - stop_machine fix (Steven Rostedt)
>
> - lpptest fix (me)
>
> - turned off IOAPIC_POSTFLUSH when CONFIG_X86_IOAPIC_FAST. Now with
> Karsten's VIA fixes my testbox does not show PCI-POST weirnesses
> anymore. In case of IRQ problems please turn off IOAPIC_FAST. (me)
I'm still getting the same oops when rebooting. the same process (reboot)
similar call trace (some addresses are slightly different but the functions
are the same:
disable_IO_APIC+0x5a/0x90 (8)
machine_restart+0x5/0x9 (28)
sys_reboot+0x147/0x156 (4)
netdev_run_todo+0xa4/0x209 (4)
etc.
Another interesting data point is that I did a SysRq+B right after the
machine came up and got a different oops.
--
Charles D. (Chuck) Harding <charding@llnl.gov> Voice: 925-423-8879
Senior Computer Associate ICCD Fax: 925-423-6961
Lawrence Livermore National Laboratory Computation Directorate
Livermore, CA USA http://www.llnl.gov GPG Public Key ID: B9EB6601
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: 2.6.13-rc6-rt9
2005-08-18 6:01 2.6.13-rc6-rt9 Ingo Molnar
` (3 preceding siblings ...)
2005-08-19 0:05 ` 2.6.13-rc6-rt9 Chuck Harding
@ 2005-08-19 6:39 ` Steven Rostedt
2005-08-19 13:00 ` 2.6.13-rc6-rt9 Steven Rostedt
2005-08-23 12:36 ` 2.6.13-rc6-rt9 Ingo Molnar
2005-08-19 16:56 ` 2.6.13-rc6-rt9 Peter Zijlstra
` (3 subsequent siblings)
8 siblings, 2 replies; 63+ messages in thread
From: Steven Rostedt @ 2005-08-19 6:39 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner, Paul E. McKenney
Deadlock finally found!!!
I've been debugging this all week. And at 2:30 in the morning I finally
found where it is. It really sucks when you need to debug on something
that doesn't have a serial, and netconsole doesn't work that reliably.
This also explains why this only happened on my laptop. It is caused by
the dependent_sleeper, which is used quite a bit when you have SCHED_HT
turned on. Although I wouldn't be surprised if this deadlock exists
elsewhere.
The dependent_sleeper (used by SCHED_HT) grabs all the run queue locks
for the physical CPU. Then it calls trace_stop_sched_switched. So this
is the calling chain.
dependent_sleeper
+==> grabs CPU0 rq and CPU1 rq lock (saying CPU 0 and 1 are on the
same physical CPU)
-> trace_stop_sched_switched
-> check_wakeup_timing
-> down_trylock(max_mutex)
-> rt_down_trylock
-> __down_trylock
+==> grabs trace_lock
Now lets look at something at the same time that is unlocking.
rt_up
-> __up_mutex_nosavestate_inline
-> ___up_mutex_nosavestate
-> ____up_mutex
+==> grabs trace_lock
-> __up_mutex_waiter_nosavestate
-> wake_up_process
-> try_to_wake_up
+==> grabs rq lock
Here we can see that there's a reverse order here and we have a
deadlock. I actually witness this using my logger to show the traces.
All I needed was the last few lines, so the console was fine here.
Ingo, can't you get rt.c to be more confusing. I mean it is too simple.
We need to add a few more underscores here and there :-) Seriously,
that rt.c is mind boggling. It was nice before, now it is just screaming
for a cleanup (come now, do we really need the four underscores?). Same
with latency.c.
Well, there's the deadlock, I'm too tired to figure out all the paths,
and what the heck is going on. So I'll give you the honour of writing
the patch. ;-)
Thanks,
-- Steve
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: 2.6.13-rc6-rt9
2005-08-19 6:39 ` 2.6.13-rc6-rt9 Steven Rostedt
@ 2005-08-19 13:00 ` Steven Rostedt
2005-08-19 15:36 ` 2.6.13-rc6-rt9 Steven Rostedt
2005-08-22 7:58 ` 2.6.13-rc6-rt9 Ingo Molnar
2005-08-23 12:36 ` 2.6.13-rc6-rt9 Ingo Molnar
1 sibling, 2 replies; 63+ messages in thread
From: Steven Rostedt @ 2005-08-19 13:00 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul E. McKenney, Thomas Gleixner, linux-kernel
On Fri, 2005-08-19 at 02:39 -0400, Steven Rostedt wrote:
> Ingo, can't you get rt.c to be more confusing. I mean it is too simple.
> We need to add a few more underscores here and there :-) Seriously,
> that rt.c is mind boggling. It was nice before, now it is just screaming
> for a cleanup (come now, do we really need the four underscores?). Same
> with latency.c.
Ingo,
Here's one example of cleaning up rt.c. I like an extra parameter
instead of having two functions that are exactly the same except for one
line. I'll probably submit more.
I haven't thought of a good way yet to solve the race condition with
dependent sleeper. (Except by turning off CONFIG_WAKEUP_TIMING :-)
-- Steve
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux_realtime_ernie/kernel/rt.c
===================================================================
--- linux_realtime_ernie/kernel/rt.c (revision 296)
+++ linux_realtime_ernie/kernel/rt.c (working copy)
@@ -1331,8 +1331,7 @@
FREE_WAITER(&waiter);
}
-static void __up_mutex_waiter_savestate(struct rt_mutex *lock __EIP_DECL__);
-static void __up_mutex_waiter_nosavestate(struct rt_mutex *lock __EIP_DECL__);
+static void __up_mutex_waiter(struct rt_mutex *lock, int save_state __EIP_DECL__);
/*
* release the lock:
@@ -1361,12 +1360,9 @@
if (plist_empty(&lock->wait_list))
check_pi_list_empty(lock, lock_owner(lock));
#endif
- if (unlikely(!plist_empty(&lock->wait_list))) {
- if (save_state)
- __up_mutex_waiter_savestate(lock __EIP__);
- else
- __up_mutex_waiter_nosavestate(lock __EIP__);
- } else
+ if (unlikely(!plist_empty(&lock->wait_list)))
+ __up_mutex_waiter(lock, save_state __EIP__);
+ else
lock->owner = NULL;
__raw_spin_unlock(&pi_lock);
__raw_spin_unlock(&lock->wait_lock);
@@ -1759,7 +1755,7 @@
return __down_trylock(&rwsem->lock __CALLER0__);
}
-static void __up_mutex_waiter_nosavestate(struct rt_mutex *lock __EIP_DECL__)
+static void __up_mutex_waiter(struct rt_mutex *lock, int save_state __EIP_DECL__)
{
struct thread_info *old_owner_ti, *new_owner_ti;
struct task_struct *old_owner, *new_owner;
@@ -1790,43 +1786,12 @@
new_owner->pending_owner = lock;
}
#endif
- wake_up_process(new_owner);
+ if (save_state)
+ wake_up_process_mutex(new_owner);
+ else
+ wake_up_process(new_owner);
}
-static void __up_mutex_waiter_savestate(struct rt_mutex *lock __EIP_DECL__)
-{
- struct thread_info *old_owner_ti, *new_owner_ti;
- struct task_struct *old_owner, *new_owner;
- struct rt_mutex_waiter *w;
- int prio;
-
- old_owner_ti = lock_owner(lock);
- old_owner = old_owner_ti->task;
- new_owner_ti = pick_new_owner(lock, old_owner_ti, 1 __EIP__);
- new_owner = new_owner_ti->task;
-
- /*
- * If the owner got priority-boosted then restore it
- * to the previous priority (or to the next highest prio
- * waiter's priority):
- */
- prio = old_owner->normal_prio;
- if (unlikely(!plist_empty(&old_owner->pi_waiters))) {
- w = plist_first_entry(&old_owner->pi_waiters, struct rt_mutex_waiter, pi_list);
- if (w->ti->task->prio < prio)
- prio = w->ti->task->prio;
- }
- if (unlikely(prio != old_owner->prio))
- pi_setprio(lock, old_owner, prio);
-#ifdef CAPTURE_LOCK
- if (lock != &kernel_sem.lock) {
- new_owner->rt_flags |= RT_PENDOWNER;
- new_owner->pending_owner = lock;
- }
-#endif
- wake_up_process_mutex(new_owner);
-}
-
/*
* Do owner check too:
*/
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: 2.6.13-rc6-rt9
2005-08-19 13:00 ` 2.6.13-rc6-rt9 Steven Rostedt
@ 2005-08-19 15:36 ` Steven Rostedt
2005-08-22 7:57 ` 2.6.13-rc6-rt9 Ingo Molnar
2005-08-22 7:58 ` 2.6.13-rc6-rt9 Ingo Molnar
1 sibling, 1 reply; 63+ messages in thread
From: Steven Rostedt @ 2005-08-19 15:36 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner, Paul E. McKenney
On Fri, 2005-08-19 at 09:00 -0400, Steven Rostedt wrote:
> On Fri, 2005-08-19 at 02:39 -0400, Steven Rostedt wrote:
> I haven't thought of a good way yet to solve the race condition with
> dependent sleeper. (Except by turning off CONFIG_WAKEUP_TIMING :-)
>
OK, I found one simple solution. The problem stems from max_mutex being
grabbed. Since this uses the RT locks, and since tracing shouldn't
really care about PI and all that, I switched this to a
compat_semaphore, but only if CONFIG_WAKEUP_TIMING is set. This seems to
get rid of this race condition that I have.
I found more bugs, but for now this message is about this specific race.
-- Steve
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux_realtime_ernie/kernel/latency.c
===================================================================
--- linux_realtime_ernie/kernel/latency.c (revision 297)
+++ linux_realtime_ernie/kernel/latency.c (working copy)
@@ -102,7 +102,19 @@
/*
* Track maximum latencies and save the trace:
*/
+#ifdef CONFIG_WAKEUP_TIMING
+/*
+ * The WAKEUP_TIMING has a race condition, since
+ * trace_stop_sched_switched might be called with run queue locks held
+ * which eventually calls down_trylock.
+ * But the RT version of down_trylock grabs a bunch of locks that
+ * are used by rt_up. rt_up can call wake_up_process which
+ * eventually grabs a run queue lock.
+ */
+static __cacheline_aligned_in_smp COMPAT_DECLARE_MUTEX(max_mutex);
+#else
static __cacheline_aligned_in_smp DECLARE_MUTEX(max_mutex);
+#endif
/*
* Sequence count - we record it when starting a measurement and
* skip the latency if the sequence has changed - some other section
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-08-19 15:36 ` 2.6.13-rc6-rt9 Steven Rostedt
@ 2005-08-22 7:57 ` Ingo Molnar
0 siblings, 0 replies; 63+ messages in thread
From: Ingo Molnar @ 2005-08-22 7:57 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-kernel, Thomas Gleixner, Paul E. McKenney
* Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 2005-08-19 at 09:00 -0400, Steven Rostedt wrote:
> > On Fri, 2005-08-19 at 02:39 -0400, Steven Rostedt wrote:
>
> > I haven't thought of a good way yet to solve the race condition with
> > dependent sleeper. (Except by turning off CONFIG_WAKEUP_TIMING :-)
> >
>
> OK, I found one simple solution. The problem stems from max_mutex
> being grabbed. Since this uses the RT locks, and since tracing
> shouldn't really care about PI and all that, I switched this to a
> compat_semaphore, but only if CONFIG_WAKEUP_TIMING is set. This seems
> to get rid of this race condition that I have.
ok, i have applied your patch and have done a small tweak: i made it a
compat semaphore unconditionally. There's no point in #ifdefing it on
WAKEUP_TIMING.
> I found more bugs, but for now this message is about this specific
> race.
ok.
Ingo
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-08-19 13:00 ` 2.6.13-rc6-rt9 Steven Rostedt
2005-08-19 15:36 ` 2.6.13-rc6-rt9 Steven Rostedt
@ 2005-08-22 7:58 ` Ingo Molnar
1 sibling, 0 replies; 63+ messages in thread
From: Ingo Molnar @ 2005-08-22 7:58 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Paul E. McKenney, Thomas Gleixner, linux-kernel
* Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 2005-08-19 at 02:39 -0400, Steven Rostedt wrote:
>
> > Ingo, can't you get rt.c to be more confusing. I mean it is too simple.
> > We need to add a few more underscores here and there :-) Seriously,
> > that rt.c is mind boggling. It was nice before, now it is just screaming
> > for a cleanup (come now, do we really need the four underscores?). Same
> > with latency.c.
>
> Ingo,
>
> Here's one example of cleaning up rt.c. I like an extra parameter
> instead of having two functions that are exactly the same except for
> one line. I'll probably submit more.
it was done like that deliberately, so that the fastpath doesnt include
conditional code. (when all debugging options are disabled)
Ingo
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-08-19 6:39 ` 2.6.13-rc6-rt9 Steven Rostedt
2005-08-19 13:00 ` 2.6.13-rc6-rt9 Steven Rostedt
@ 2005-08-23 12:36 ` Ingo Molnar
2005-08-23 12:50 ` 2.6.13-rc6-rt9 Steven Rostedt
1 sibling, 1 reply; 63+ messages in thread
From: Ingo Molnar @ 2005-08-23 12:36 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-kernel, Thomas Gleixner, Paul E. McKenney
* Steven Rostedt <rostedt@goodmis.org> wrote:
> Ingo, can't you get rt.c to be more confusing. I mean it is too
> simple. We need to add a few more underscores here and there :-)
> Seriously, that rt.c is mind boggling. It was nice before, now it is
> just screaming for a cleanup (come now, do we really need the four
> underscores?). Same with latency.c.
i agree that it's ugly, but some of that ugliness is to achieve the
7-instructions fail-through codepath for the common acquire (and
release) codepath:
c03a5320 <__down_mutex>:
c03a5320: 89 c1 mov %eax,%ecx
c03a5322: 8b 15 08 76 3a c0 mov 0xc03a7608,%edx
c03a5328: 31 c0 xor %eax,%eax
c03a532a: 0f b1 51 14 cmpxchg %edx,0x14(%ecx)
c03a532e: 85 c0 test %eax,%eax
c03a5330: 75 01 jne c03a5333 <__down_mutex+0x13>
c03a5332: c3 ret
that's how much it takes to acquire an RT lock, and i worked hard to get
there. As long as the fastpath is kept this tight, feel free to do
cleanups. But i really want to avoid having to write mutex_down/up in
assembly for 24 architectures ...
Ingo
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-08-23 12:36 ` 2.6.13-rc6-rt9 Ingo Molnar
@ 2005-08-23 12:50 ` Steven Rostedt
2005-08-23 12:56 ` 2.6.13-rc6-rt9 Ingo Molnar
0 siblings, 1 reply; 63+ messages in thread
From: Steven Rostedt @ 2005-08-23 12:50 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner, Paul E. McKenney
On Tue, 2005-08-23 at 14:36 +0200, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Ingo, can't you get rt.c to be more confusing. I mean it is too
> > simple. We need to add a few more underscores here and there :-)
> > Seriously, that rt.c is mind boggling. It was nice before, now it is
> > just screaming for a cleanup (come now, do we really need the four
> > underscores?). Same with latency.c.
>
> i agree that it's ugly, but some of that ugliness is to achieve the
> 7-instructions fail-through codepath for the common acquire (and
> release) codepath:
>
> c03a5320 <__down_mutex>:
> c03a5320: 89 c1 mov %eax,%ecx
> c03a5322: 8b 15 08 76 3a c0 mov 0xc03a7608,%edx
> c03a5328: 31 c0 xor %eax,%eax
> c03a532a: 0f b1 51 14 cmpxchg %edx,0x14(%ecx)
> c03a532e: 85 c0 test %eax,%eax
> c03a5330: 75 01 jne c03a5333 <__down_mutex+0x13>
> c03a5332: c3 ret
>
Impressive!
> that's how much it takes to acquire an RT lock, and i worked hard to get
> there. As long as the fastpath is kept this tight, feel free to do
> cleanups. But i really want to avoid having to write mutex_down/up in
> assembly for 24 architectures ...
Warning! I'm hacking hard to get rid of the global pi_lock, and I'm not
worrying now about efficiency. I figure that if I can get it to work,
then we can speed it up afterwards. Since it's complex enough keeping
all the locks straight, I just want it to work without deadlocking.
Once I get it to work, I'll let you figure out how get it back down to
7-instructions :-)
-- Steve
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-08-23 12:50 ` 2.6.13-rc6-rt9 Steven Rostedt
@ 2005-08-23 12:56 ` Ingo Molnar
0 siblings, 0 replies; 63+ messages in thread
From: Ingo Molnar @ 2005-08-23 12:56 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-kernel, Thomas Gleixner, Paul E. McKenney
* Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2005-08-23 at 14:36 +0200, Ingo Molnar wrote:
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > > Ingo, can't you get rt.c to be more confusing. I mean it is too
> > > simple. We need to add a few more underscores here and there :-)
> > > Seriously, that rt.c is mind boggling. It was nice before, now it is
> > > just screaming for a cleanup (come now, do we really need the four
> > > underscores?). Same with latency.c.
> >
> > i agree that it's ugly, but some of that ugliness is to achieve the
> > 7-instructions fail-through codepath for the common acquire (and
> > release) codepath:
> >
> > c03a5320 <__down_mutex>:
> > c03a5320: 89 c1 mov %eax,%ecx
> > c03a5322: 8b 15 08 76 3a c0 mov 0xc03a7608,%edx
> > c03a5328: 31 c0 xor %eax,%eax
> > c03a532a: 0f b1 51 14 cmpxchg %edx,0x14(%ecx)
> > c03a532e: 85 c0 test %eax,%eax
> > c03a5330: 75 01 jne c03a5333 <__down_mutex+0x13>
> > c03a5332: c3 ret
> >
>
> Impressive!
>
> > that's how much it takes to acquire an RT lock, and i worked hard to get
> > there. As long as the fastpath is kept this tight, feel free to do
> > cleanups. But i really want to avoid having to write mutex_down/up in
> > assembly for 24 architectures ...
>
> Warning! I'm hacking hard to get rid of the global pi_lock, and I'm not
> worrying now about efficiency. I figure that if I can get it to work,
> then we can speed it up afterwards. Since it's complex enough keeping
> all the locks straight, I just want it to work without deadlocking.
>
> Once I get it to work, I'll let you figure out how get it back down to
> 7-instructions :-)
yeah. It can always be done after the fact - the basics wont change.
(Note that the above disassembly is for UP, on SMP the fastpath is
longer and around 10-15 instructions.)
Ingo
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-08-18 6:01 2.6.13-rc6-rt9 Ingo Molnar
` (4 preceding siblings ...)
2005-08-19 6:39 ` 2.6.13-rc6-rt9 Steven Rostedt
@ 2005-08-19 16:56 ` Peter Zijlstra
2005-08-19 18:30 ` 2.6.13-rc6-rt9 Peter Zijlstra
2005-08-19 21:50 ` 2.6.13-rc6-rt9 Darren Hart
` (2 subsequent siblings)
8 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2005-08-19 16:56 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Thomas Gleixner, Steven Rostedt, Paul E. McKenney
Hi Ingo, Paul, others,
I'm trying to run a user-mode-linux guest under the RT kernel however
the uml process never gets out of the calibrate delay loop. It seems as
if the signal never gets through.
A non -rt host kernel does work (with a similar .config).
Could this be related to pauls task list changes?
Kind regards,
--
Peter Zijlstra <a.p.zijlstra@chello.nl>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: 2.6.13-rc6-rt9
2005-08-19 16:56 ` 2.6.13-rc6-rt9 Peter Zijlstra
@ 2005-08-19 18:30 ` Peter Zijlstra
2005-08-19 18:43 ` 2.6.13-rc6-rt9 Paul E. McKenney
0 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2005-08-19 18:30 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Thomas Gleixner, Steven Rostedt, Paul E. McKenney
On Fri, 2005-08-19 at 18:56 +0200, Peter Zijlstra wrote:
> Hi Ingo, Paul, others,
>
> I'm trying to run a user-mode-linux guest under the RT kernel however
> the uml process never gets out of the calibrate delay loop. It seems as
> if the signal never gets through.
>
one clarification: the guest kernel is a non -rt kernel, a modified
2.6.13-rc6 in my case.
> A non -rt host kernel does work (with a similar .config).
>
> Could this be related to pauls task list changes?
>
> Kind regards,
>
--
Peter Zijlstra <a.p.zijlstra@chello.nl>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-08-19 18:30 ` 2.6.13-rc6-rt9 Peter Zijlstra
@ 2005-08-19 18:43 ` Paul E. McKenney
2005-08-20 19:27 ` 2.6.13-rc6-rt9 Peter Zijlstra
0 siblings, 1 reply; 63+ messages in thread
From: Paul E. McKenney @ 2005-08-19 18:43 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt
On Fri, Aug 19, 2005 at 08:30:05PM +0200, Peter Zijlstra wrote:
> On Fri, 2005-08-19 at 18:56 +0200, Peter Zijlstra wrote:
> > Hi Ingo, Paul, others,
> >
> > I'm trying to run a user-mode-linux guest under the RT kernel however
> > the uml process never gets out of the calibrate delay loop. It seems as
> > if the signal never gets through.
> >
> one clarification: the guest kernel is a non -rt kernel, a modified
> 2.6.13-rc6 in my case.
>
> > A non -rt host kernel does work (with a similar .config).
> >
> > Could this be related to pauls task list changes?
Possibly. What signal? This is a signal to a single process, right?
Thanx, Paul
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-08-19 18:43 ` 2.6.13-rc6-rt9 Paul E. McKenney
@ 2005-08-20 19:27 ` Peter Zijlstra
2005-08-20 21:24 ` 2.6.13-rc6-rt9 Jeff Dike
0 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2005-08-20 19:27 UTC (permalink / raw)
To: Jeff Dike, paulmck
Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt
On Fri, 2005-08-19 at 11:43 -0700, Paul E. McKenney wrote:
> On Fri, Aug 19, 2005 at 08:30:05PM +0200, Peter Zijlstra wrote:
> > On Fri, 2005-08-19 at 18:56 +0200, Peter Zijlstra wrote:
> > > Hi Ingo, Paul, others,
> > >
> > > I'm trying to run a user-mode-linux guest under the RT kernel however
> > > the uml process never gets out of the calibrate delay loop. It seems as
> > > if the signal never gets through.
> > >
> > one clarification: the guest kernel is a non -rt kernel, a modified
> > 2.6.13-rc6 in my case.
> >
> > > A non -rt host kernel does work (with a similar .config).
> > >
> > > Could this be related to pauls task list changes?
>
> Possibly. What signal? This is a signal to a single process, right?
>
Jeff, could you help us out here?
What exactly does uml need to get out of the calibrate delay loop?
Kind regards,
Peter Zijlstra
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-08-20 19:27 ` 2.6.13-rc6-rt9 Peter Zijlstra
@ 2005-08-20 21:24 ` Jeff Dike
2005-09-29 7:54 ` 2.6.13-rc6-rt9 Peter Zijlstra
0 siblings, 1 reply; 63+ messages in thread
From: Jeff Dike @ 2005-08-20 21:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: paulmck, Ingo Molnar, linux-kernel, Thomas Gleixner,
Steven Rostedt
On Sat, Aug 20, 2005 at 09:27:25PM +0200, Peter Zijlstra wrote:
> Jeff, could you help us out here?
> What exactly does uml need to get out of the calibrate delay loop?
Interrupts, it's not too demanding :-)
If it's not seeing VTALRM, then it will never leave the calibration loop.
Try stracing it and see what it's getting.
Jeff
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-08-20 21:24 ` 2.6.13-rc6-rt9 Jeff Dike
@ 2005-09-29 7:54 ` Peter Zijlstra
2005-09-30 1:00 ` 2.6.13-rc6-rt9 Paul E. McKenney
0 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2005-09-29 7:54 UTC (permalink / raw)
To: paulmck, Ingo Molnar, linux-kernel, Thomas Gleixner,
Steven Rostedt
Cc: Jeff Dike
On Sat, 2005-08-20 at 17:24 -0400, Jeff Dike wrote:
> On Sat, Aug 20, 2005 at 09:27:25PM +0200, Peter Zijlstra wrote:
> > Jeff, could you help us out here?
> > What exactly does uml need to get out of the calibrate delay loop?
>
> Interrupts, it's not too demanding :-)
>
> If it's not seeing VTALRM, then it will never leave the calibration loop.
>
> Try stracing it and see what it's getting.
Sorry for the late reply.
Yes, that does seem to be the problem.
Even with a current -rt (2.6.14-rc2-rt5) UML does not run. The issue is
indeed (as jeff pointed out) that VTALRM is never send. The small test
programm below illustrates this.
On a non-rt kernel it completed in 1 second.
On a -rt kernel it waits at infinitum.
Kind regards,
Peter Zijlstra
---------------------------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <signal.h>
volatile int quit = 0;
void sig_vtalrm(int signr, siginfo_t * si, void * arg)
{
if (signr == SIGVTALRM) quit = 1;
}
int main()
{
struct itimerval ival = {{0,0}, {1, 0}};
struct sigaction sa;
sa.sa_sigaction = sig_vtalrm;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sigaction(SIGVTALRM, &sa, NULL);
setitimer(ITIMER_VIRTUAL, &ival, NULL);
printf("wait\n");
while (!quit) ;
printf("done\n");
}
--
Peter Zijlstra <a.p.zijlstra@chello.nl>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: 2.6.13-rc6-rt9
2005-09-29 7:54 ` 2.6.13-rc6-rt9 Peter Zijlstra
@ 2005-09-30 1:00 ` Paul E. McKenney
2005-09-30 1:07 ` 2.6.13-rc6-rt9 Thomas Gleixner
0 siblings, 1 reply; 63+ messages in thread
From: Paul E. McKenney @ 2005-09-30 1:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt,
Jeff Dike
On Thu, Sep 29, 2005 at 09:54:23AM +0200, Peter Zijlstra wrote:
> On Sat, 2005-08-20 at 17:24 -0400, Jeff Dike wrote:
> > On Sat, Aug 20, 2005 at 09:27:25PM +0200, Peter Zijlstra wrote:
> > > Jeff, could you help us out here?
> > > What exactly does uml need to get out of the calibrate delay loop?
> >
> > Interrupts, it's not too demanding :-)
> >
> > If it's not seeing VTALRM, then it will never leave the calibration loop.
> >
> > Try stracing it and see what it's getting.
>
> Sorry for the late reply.
>
> Yes, that does seem to be the problem.
>
> Even with a current -rt (2.6.14-rc2-rt5) UML does not run. The issue is
> indeed (as jeff pointed out) that VTALRM is never send. The small test
> programm below illustrates this.
>
> On a non-rt kernel it completed in 1 second.
> On a -rt kernel it waits at infinitum.
Will play with it and see what I broke...
Thanx, Paul
> Kind regards,
>
> Peter Zijlstra
>
> ---------------------------
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/time.h>
> #include <signal.h>
>
> volatile int quit = 0;
>
> void sig_vtalrm(int signr, siginfo_t * si, void * arg)
> {
> if (signr == SIGVTALRM) quit = 1;
> }
>
> int main()
> {
> struct itimerval ival = {{0,0}, {1, 0}};
>
> struct sigaction sa;
> sa.sa_sigaction = sig_vtalrm;
> sigemptyset(&sa.sa_mask);
> sa.sa_flags = 0;
> sigaction(SIGVTALRM, &sa, NULL);
>
> setitimer(ITIMER_VIRTUAL, &ival, NULL);
>
> printf("wait\n");
> while (!quit) ;
> printf("done\n");
> }
>
>
> --
> Peter Zijlstra <a.p.zijlstra@chello.nl>
>
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: 2.6.13-rc6-rt9
2005-09-30 1:00 ` 2.6.13-rc6-rt9 Paul E. McKenney
@ 2005-09-30 1:07 ` Thomas Gleixner
2005-09-30 1:46 ` 2.6.13-rc6-rt9 Paul E. McKenney
0 siblings, 1 reply; 63+ messages in thread
From: Thomas Gleixner @ 2005-09-30 1:07 UTC (permalink / raw)
To: paulmck
Cc: George Anzinger, Peter Zijlstra, Ingo Molnar, linux-kernel,
Steven Rostedt, Jeff Dike
On Thu, 2005-09-29 at 18:00 -0700, Paul E. McKenney wrote:
> > Even with a current -rt (2.6.14-rc2-rt5) UML does not run. The issue is
> > indeed (as jeff pointed out) that VTALRM is never send. The small test
> > programm below illustrates this.
> >
> > On a non-rt kernel it completed in 1 second.
> > On a -rt kernel it waits at infinitum.
>
> Will play with it and see what I broke...
Paul,
you are not the culprit :)
The run_posix_cpu_timers(p) call is #ifdef'd out with PREEMPT_RT.
Thats a hard to fix issue.
It can not be run from hardirq context, as it takes a lot of locks
(especially our favorites: tasklist_lock and sighand->siglock). :(
Maybe another playground for rcu, but it might also be solved by some
other mechanism for accounting and delayed execution in the PREEMPT_RT
case.
tglx
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-09-30 1:07 ` 2.6.13-rc6-rt9 Thomas Gleixner
@ 2005-09-30 1:46 ` Paul E. McKenney
2005-09-30 6:17 ` 2.6.13-rc6-rt9 Thomas Gleixner
0 siblings, 1 reply; 63+ messages in thread
From: Paul E. McKenney @ 2005-09-30 1:46 UTC (permalink / raw)
To: Thomas Gleixner
Cc: George Anzinger, Peter Zijlstra, Ingo Molnar, linux-kernel,
Steven Rostedt, Jeff Dike
On Fri, Sep 30, 2005 at 03:07:29AM +0200, Thomas Gleixner wrote:
> On Thu, 2005-09-29 at 18:00 -0700, Paul E. McKenney wrote:
> > > Even with a current -rt (2.6.14-rc2-rt5) UML does not run. The issue is
> > > indeed (as jeff pointed out) that VTALRM is never send. The small test
> > > programm below illustrates this.
> > >
> > > On a non-rt kernel it completed in 1 second.
> > > On a -rt kernel it waits at infinitum.
> >
> > Will play with it and see what I broke...
>
> Paul,
>
> you are not the culprit :)
Woo-hoo!!! Exonerated!!! This time, anyway... ;-)
> The run_posix_cpu_timers(p) call is #ifdef'd out with PREEMPT_RT.
>
> Thats a hard to fix issue.
>
> It can not be run from hardirq context, as it takes a lot of locks
> (especially our favorites: tasklist_lock and sighand->siglock). :(
>
> Maybe another playground for rcu, but it might also be solved by some
> other mechanism for accounting and delayed execution in the PREEMPT_RT
> case.
Certainly check_thread_timers() and check_process_timers() are playing
with a number of task_struct fields, so it is not immediately clear
to me how to safely replace tasklist_lock with RCU, at least not with
a simple and small patch.
What did you have in mind for delayed execution?
Thanx, Paul
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-09-30 1:46 ` 2.6.13-rc6-rt9 Paul E. McKenney
@ 2005-09-30 6:17 ` Thomas Gleixner
0 siblings, 0 replies; 63+ messages in thread
From: Thomas Gleixner @ 2005-09-30 6:17 UTC (permalink / raw)
To: paulmck
Cc: George Anzinger, Peter Zijlstra, Ingo Molnar, linux-kernel,
Steven Rostedt, Jeff Dike
On Thu, 2005-09-29 at 18:46 -0700, Paul E. McKenney wrote:
> > you are not the culprit :)
>
> Woo-hoo!!! Exonerated!!! This time, anyway... ;-)
My pleasure :)
> > It can not be run from hardirq context, as it takes a lot of locks
> > (especially our favorites: tasklist_lock and sighand->siglock). :(
> >
> > Maybe another playground for rcu, but it might also be solved by some
> > other mechanism for accounting and delayed execution in the PREEMPT_RT
> > case.
>
> Certainly check_thread_timers() and check_process_timers() are playing
> with a number of task_struct fields, so it is not immediately clear
> to me how to safely replace tasklist_lock with RCU, at least not with
> a simple and small patch.
>
> What did you have in mind for delayed execution?
Do only the time check in hard irq context and defer the lock protected
operations to a softirq context. Have to look deeper at the details
though.
tglx
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-08-18 6:01 2.6.13-rc6-rt9 Ingo Molnar
` (5 preceding siblings ...)
2005-08-19 16:56 ` 2.6.13-rc6-rt9 Peter Zijlstra
@ 2005-08-19 21:50 ` Darren Hart
2005-08-25 6:24 ` 2.6.13-rc6-rt9 Ingo Molnar
2005-08-19 22:13 ` 2.6.13-rc6-rt9 Darren Hart
2005-08-19 23:48 ` [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment Thomas Gleixner
8 siblings, 1 reply; 63+ messages in thread
From: Darren Hart @ 2005-08-19 21:50 UTC (permalink / raw)
To: Ingo Molnar, lkml,
Ingo Molnar wrote:
> i have released the 2.6.13-rc6-rt9 tree, which can be downloaded from
> the usual place:
>
> http://redhat.com/~mingo/realtime-preempt/
>
I'm looking into getting HRT and RT booting on a SUMMIT NUMA machine
(cyclone timer), but after s/error/warning/ in arch/i386/timers/timer.c
for the HRT cyclone ifdef, I still get the following link error:
LD .tmp_vmlinux1
net/built-in.o(.init.text+0xdae): In function `sock_ioctl':
net/socket.c:868: undefined reference to `__you_cannot_kmalloc_that_much'
make: *** [.tmp_vmlinux1] Error 1
I was expecting to be able to build the kernel, and have it crash on
boot (due to unsynched TSCs I'm guessing) and then start debugging/devel
from there. But the the rt9 patch won't build (HRT standalone for
2.6.10 does build). Anyone else seeing this?
> it's a fixes-only release. Changes since 2.6.13-rc6-rt3:
>
> - USB irq flags use cleanups (Alan Stern)
>
> - RCU tasklist-lock fixes (Paul McKenney, Thomas Gleixner)
>
> - HR-timers waitqueue splitup, better HRT latencies (Thomas Gleixner)
>
> - latency tracer fixes, irq flags tracing cleanups (Steven Rostedt, me)
>
> - NFSd BKL unlock fix (Steven Rostedt)
>
> - stackfootprint-max-printer fix (Steven Rostedt)
>
> - stop_machine fix (Steven Rostedt)
>
> - lpptest fix (me)
>
> - turned off IOAPIC_POSTFLUSH when CONFIG_X86_IOAPIC_FAST. Now with
> Karsten's VIA fixes my testbox does not show PCI-POST weirnesses
> anymore. In case of IRQ problems please turn off IOAPIC_FAST. (me)
>
> to build a 2.6.13-rc6-rt9 tree, the following patches should be applied:
>
> http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.12.tar.bz2
> http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.13-rc6.bz2
> http://redhat.com/~mingo/realtime-preempt/patch-2.6.13-rc6-rt9
>
> Ingo
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
--
Darren Hart
IBM Linux Technology Center
Linux Kernel Team
Phone: 503 578 3185
T/L: 775 3185
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: 2.6.13-rc6-rt9
2005-08-18 6:01 2.6.13-rc6-rt9 Ingo Molnar
` (6 preceding siblings ...)
2005-08-19 21:50 ` 2.6.13-rc6-rt9 Darren Hart
@ 2005-08-19 22:13 ` Darren Hart
2005-08-19 23:00 ` 2.6.13-rc6-rt9 Thomas Gleixner
2005-08-19 23:48 ` [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment Thomas Gleixner
8 siblings, 1 reply; 63+ messages in thread
From: Darren Hart @ 2005-08-19 22:13 UTC (permalink / raw)
To: Ingo Molnar, lkml,
Ingo Molnar wrote:
> i have released the 2.6.13-rc6-rt9 tree, which can be downloaded from
> the usual place:
>
> http://redhat.com/~mingo/realtime-preempt/
>
I was trying to use another HRT clock source and couldn't get menuconfig
to let me select acpi-pm-timer, turns out it has been disabled in
arch/i386/Kconfig, but the description is still in the help...
# config HIGH_RES_TIMER_ACPI_PM
# bool "ACPI-pm-timer"
Is the pm timer incompatible with the RT portion of this patch?
Thanks,
--Darren
> it's a fixes-only release. Changes since 2.6.13-rc6-rt3:
>
> - USB irq flags use cleanups (Alan Stern)
>
> - RCU tasklist-lock fixes (Paul McKenney, Thomas Gleixner)
>
> - HR-timers waitqueue splitup, better HRT latencies (Thomas Gleixner)
>
> - latency tracer fixes, irq flags tracing cleanups (Steven Rostedt, me)
>
> - NFSd BKL unlock fix (Steven Rostedt)
>
> - stackfootprint-max-printer fix (Steven Rostedt)
>
> - stop_machine fix (Steven Rostedt)
>
> - lpptest fix (me)
>
> - turned off IOAPIC_POSTFLUSH when CONFIG_X86_IOAPIC_FAST. Now with
> Karsten's VIA fixes my testbox does not show PCI-POST weirnesses
> anymore. In case of IRQ problems please turn off IOAPIC_FAST. (me)
>
> to build a 2.6.13-rc6-rt9 tree, the following patches should be applied:
>
> http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.12.tar.bz2
> http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.13-rc6.bz2
> http://redhat.com/~mingo/realtime-preempt/patch-2.6.13-rc6-rt9
>
> Ingo
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
--
Darren Hart
IBM Linux Technology Center
Linux Kernel Team
Phone: 503 578 3185
T/L: 775 3185
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: 2.6.13-rc6-rt9
2005-08-19 22:13 ` 2.6.13-rc6-rt9 Darren Hart
@ 2005-08-19 23:00 ` Thomas Gleixner
2005-08-20 15:13 ` 2.6.13-rc6-rt9 Darren Hart
0 siblings, 1 reply; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-19 23:00 UTC (permalink / raw)
To: Darren Hart; +Cc: Ingo Molnar, lkml,
On Fri, 2005-08-19 at 15:13 -0700, Darren Hart wrote:
> I was trying to use another HRT clock source and couldn't get menuconfig
> to let me select acpi-pm-timer, turns out it has been disabled in
> arch/i386/Kconfig, but the description is still in the help...
>
>
> # config HIGH_RES_TIMER_ACPI_PM
> # bool "ACPI-pm-timer"
>
> Is the pm timer incompatible with the RT portion of this patch?
The only timesource I came around to fixup is TSC in combination with
PIT or preferred Local APIC. Add "lapic" to your kernel command line for
UP boxen. Therefor it is disabled for now.
> I'm looking into getting HRT and RT booting on a SUMMIT NUMA machine
> (cyclone timer), but after s/error/warning/ in arch/i386/timers/timer.c
> for the HRT cyclone ifdef, I still get the following link error:
It should be simple to fix this. Just not right now. I have no such box
and restricted time resources. Can you test a patch when I find a slot?
But of course you are heartely invited to fix it yourself :)
tglx
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: 2.6.13-rc6-rt9
2005-08-19 23:00 ` 2.6.13-rc6-rt9 Thomas Gleixner
@ 2005-08-20 15:13 ` Darren Hart
0 siblings, 0 replies; 63+ messages in thread
From: Darren Hart @ 2005-08-20 15:13 UTC (permalink / raw)
To: tglx; +Cc: Ingo Molnar, lkml,
Thomas Gleixner wrote:
> On Fri, 2005-08-19 at 15:13 -0700, Darren Hart wrote:
>
>>I was trying to use another HRT clock source and couldn't get menuconfig
>>to let me select acpi-pm-timer, turns out it has been disabled in
>>arch/i386/Kconfig, but the description is still in the help...
>>
>>
>># config HIGH_RES_TIMER_ACPI_PM
>># bool "ACPI-pm-timer"
>>
>>Is the pm timer incompatible with the RT portion of this patch?
>
>
> The only timesource I came around to fixup is TSC in combination with
> PIT or preferred Local APIC. Add "lapic" to your kernel command line for
> UP boxen. Therefor it is disabled for now.
>
>
>>I'm looking into getting HRT and RT booting on a SUMMIT NUMA machine
>>(cyclone timer), but after s/error/warning/ in arch/i386/timers/timer.c
>>for the HRT cyclone ifdef, I still get the following link error:
>
>
> It should be simple to fix this. Just not right now. I have no such box
> and restricted time resources. Can you test a patch when I find a slot?
Absolutely.
> But of course you are heartely invited to fix it yourself :)
As always :-)
>
> tglx
Thanks for the response, I wanted to hear your take on this rather than making
any assumptions as to the state of the patch.
>
>
>
--
Darren Hart
IBM Linux Technology Center
Linux Kernel Team
Phone: 503 578 3185
T/L: 775 3185
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
2005-08-18 6:01 2.6.13-rc6-rt9 Ingo Molnar
` (7 preceding siblings ...)
2005-08-19 22:13 ` 2.6.13-rc6-rt9 Darren Hart
@ 2005-08-19 23:48 ` Thomas Gleixner
2005-08-20 0:19 ` George Anzinger
` (3 more replies)
8 siblings, 4 replies; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-19 23:48 UTC (permalink / raw)
To: Ingo Molnar
Cc: Roland McGrath, Oleg Nesterov, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney
[-- Attachment #1: Type: text/plain, Size: 13247 bytes --]
Hi all,
On Thu, 2005-08-18 at 08:01 +0200, Ingo Molnar wrote:
> i have released the 2.6.13-rc6-rt9 tree, which can be downloaded from
> the usual place:
I reworked the code for dynamically setting the priority of the hrtimer
softirq to be aware of PI.
The current function "mutex_chprio()" in -rt9 is equivivalent to a
static priority assigment of the hrtimer softirq. The new PI aware
function resolves this nicely and the latency improvements are
impressive (factor 2-3).
I stumbled over "//#define MUTEX_IRQS_OFF" in the first attempt. My
assumption that all code protected by pi_lock (which is a raw lock) is
irq save turned out to be wrong. I missed that commented define :(
I guess it was introduced during the "IRQ latency contest" to squeeze
out the last nsecs :)
Switching it back on is not really influencing system performance in a
measurable way, but it allows to use the pi aware boosting function in
irq context.
Quite contrary it makes the system more snappy and the overall test
latencies go down.
Ingo ????
I assume you run with tracing enabled all the time, which conceals this
regression.
This is not only a demand for HRT, I'm aware of a _real world_
application which requires such functionality depending on the status of
the device. It runs on a low performance machine so avoiding task
switches is really appreciated.
--------------------------------------------
Find attached my current config and a simple test program for cyclic
scheduling. Read the top source comment what you need and where to get
it.
Please use the -n option for now until we have sorted out following
problems:
1. tasklist_lock contention
AFAICT tasklist_lock is the longest held lock in the kernel aside BKL,
but this problem seems to be solved.
send_sigqueue is called from posix_timer_fn() and acquires
tasklist_lock, which makes no sense to me.
send_sigqueue()s (l)onl(e)y user is the posix_timer function
(posix_timer_fn(), calling posix_timer_event()).
Each posix timer blocks the task from vanishing away by
get_task_struct(), which is protected by the held tasklist_lock.
The task can neither go away nor the signal handler can change until
put_task_struct() is called inside release_posix_timer(), which removes
any chance to do an invalid access to either task or sighand because the
relevant timer is deleted before the call to put_task_struct(). Also
this call is protected by tasklist_lock().
If I understand the code correctly then the tasklist_lock acquirement in
send_sigqueue() is superfluous. If not then the "(un)register action" of
put/get_task_struct() is just hot air action. AFAICS from kernel history
this interface was introduced to provide lock free access to certain
parts of the signaling code where the caller holds a reference to the
task structure instead of running through the PITA of find_task_by_pid()
- of course with tasklist_lock held - for every signal to deliver.
I ran already extensive tests on a kernel with the lock acquirement
removed without any problems - as expected by my shortsight :)
Can the experts please shed some light on me ?
Oleg, Paul, Roland ??? (Alphabetic order :)
If I'm not completely wrong, then this mechanism should be generalized
to allow other users than posixtimers a simple un/registering of this
protection. I know a couple of things where this would be useful. I'd be
happy to write up the relevant patch for the general interface.
If I missed something important, feel free to call me an (over-worked)
idiot :)
2. Drift of cyclic timers (armed by set_timer()):
Due to rounding errors and the drift adjustment code, the fixed
increment which is precalculated when the timer is set up and added on
rearm, I see creeping deviation from the timeline.
I have a patch lined up to base the rearm on human (nsac) units, so this
effect will go away. But this is waste of time until (1.) is not solved.
George ???
Cheers,
tglx
-----------------------------------
Some numbers:
Test scenario:
PIII 666MHZ 128MB
Kernel command line: root=/dev/hda1 ro console=ttyS0,115200 lapic nmi_watchdog=2
Using lapic gives definitely better results than using the solely PIT
based version. Make sure to add it to your command line, if you have a
x86 box. PPC is fine without.
Load:
ssh1# while true; do hackbench 20; done
ssh2# while true; do find -type f | xargs grep kernel; done
(Over a large tree so caching is not relevant. The hd led is almost permanent on)
ping -f to the test box
All I(nterval)/Min/Act/Max values in usecs.
test1:/home/tglx/cyclictest# ./cyclictest -p 80 -n -i 1100
293.12 276.22 198.22 88/202 12270 <---- output from /proc/loadavg
T: 0 ( 5407) P:80 I: 1100 C: 2047066 Min: 7 Act: 15 Max: 59
test1:/home/tglx/cyclictest# ./cyclictest -p 80 -n -t 2 -d 10
296.17 275.21 193.82 52/202 22270
T: 0 ( 5407) P:80 I: 1000 C: 1047066 Min: 7 Act: 15 Max: 60
T: 1 ( 5408) P:79 I: 1010 C: 1036659 Min: 7 Act: 11 Max: 79
test1:/home/tglx/cyclictest# ./cyclictest -p 80 -n -t 5
130.31 124.17 148.68 3/525 19070
T: 0 ( 5473) P:80 I: 1000 C: 2631537 Min: 7 Act: 18 Max: 73
T: 1 ( 5474) P:79 I: 1500 C: 1754358 Min: 7 Act: 15 Max: 70
T: 2 ( 5475) P:78 I: 2000 C: 1315769 Min: 6 Act: 13 Max: 70
T: 3 ( 5476) P:77 I: 2500 C: 1052615 Min: 7 Act: 12 Max: 68
T: 4 ( 5477) P:76 I: 3000 C: 877179 Min: 6 Act: 10 Max: 66
test1:/home/tglx/cyclictest# ./cyclictest -p 80 -n -t 5 -i 800 -d 240
177.44 130.70 129.67 15/176 7217
T: 0 (21198) P:80 I: 800 C: 1578221 Min: 7 Act: 13 Max: 82
T: 1 (21199) P:79 I: 1040 C: 1214016 Min: 7 Act: 12 Max: 84
T: 2 (21200) P:78 I: 1280 C: 986388 Min: 7 Act: 13 Max: 60
T: 3 (21201) P:77 I: 1520 C: 830643 Min: 7 Act: 14 Max: 71
T: 4 (21202) P:76 I: 1760 C: 717373 Min: 7 Act: 13 Max: 66
test1:/home/tglx/cyclictest# ./cyclictest -p 80 -n -t 5 -i 666 -d 33
79.75 106.97 106.85 43/869 30355
T: 0 (25687) P:80 I: 666 C: 9329552 Min: 6 Act: 9 Max: 79
T: 1 (25688) P:79 I: 699 C: 8889101 Min: 6 Act: 12 Max: 80
T: 2 (25689) P:78 I: 732 C: 8488363 Min: 6 Act: 10 Max: 87
T: 3 (25690) P:77 I: 765 C: 8122198 Min: 6 Act: 12 Max: 83
T: 4 (25692) P:76 I: 798 C: 7786318 Min: 6 Act: 10 Max: 94
-----------------------------------
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6-rt9/include/linux/sched.h linux-2.6.13-rc6-rt9.work/include/linux/sched.h
--- linux-2.6.13-rc6-rt9/include/linux/sched.h 2005-08-18 17:37:42.000000000 +0200
+++ linux-2.6.13-rc6-rt9.work/include/linux/sched.h 2005-08-19 18:45:31.000000000 +0200
@@ -1134,7 +1134,7 @@ extern int idle_cpu(int cpu);
extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
extern task_t *idle_task(int cpu);
extern void mutex_setprio(task_t *p, int prio);
-extern void mutex_chprio(task_t *p, int prio);
+extern void pi_changeprio(task_t *p, int prio);
extern int normal_prio(task_t *p);
void yield(void);
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6-rt9/kernel/rt.c linux-2.6.13-rc6-rt9.work/kernel/rt.c
--- linux-2.6.13-rc6-rt9/kernel/rt.c 2005-08-18 17:37:42.000000000 +0200
+++ linux-2.6.13-rc6-rt9.work/kernel/rt.c 2005-08-20 00:49:15.000000000 +0200
@@ -201,15 +201,9 @@ do { \
# define trace_local_irq_disable(ti) raw_local_irq_disable()
# define trace_local_irq_enable(ti) raw_local_irq_enable()
# define trace_local_irq_restore(flags, ti) raw_local_irq_restore(flags)
-#else /* !CONFIG_RT_DEADLOCK_DETECT */
-/*
- * Two variants: irqs off and preempt-counter based.
- * preempt-counter based variant seems to be a bit faster.
- */
-//#define MUTEX_IRQS_OFF
+#else /* !CONFIG_RT_DEADLOCK_DETECT */
-#ifdef MUTEX_IRQS_OFF
# define trace_lock_irq(lock, ti) do { raw_local_irq_disable(); (void)(ti); } while (0)
# define trace_lock_irqsave(lock, flags, ti) do { raw_local_irq_save(flags); (void)(ti); } while (0)
# define trace_unlock_irq(lock, ti) raw_local_irq_enable()
@@ -217,15 +211,6 @@ do { \
# define trace_local_irq_disable(ti) raw_local_irq_disable()
# define trace_local_irq_enable(ti) raw_local_irq_enable()
# define trace_local_irq_restore(flags, ti) raw_local_irq_restore(flags)
-#else
-# define trace_lock_irq(lock, ti) preempt_disable_ti(ti)
-# define trace_lock_irqsave(lock, flags, ti) do { (void)flags; preempt_disable_ti(ti); } while (0)
-# define trace_unlock_irq(lock, ti) preempt_enable_ti(ti)
-# define trace_unlock_irqrestore(lock, flags, ti) do { (void)flags; preempt_enable_ti(ti); } while (0)
-# define trace_local_irq_disable(ti) preempt_disable_ti(ti)
-# define trace_local_irq_enable(ti) preempt_enable_ti(ti)
-# define trace_local_irq_restore(flags, ti) do { (void)flags; preempt_enable_ti(ti); } while (0)
-#endif
# define trace_unlock(lock, ti) do { } while (0)
@@ -235,6 +220,7 @@ do { \
# define TRACE_WARN_ON_LOCKED(c) do { } while (0)
# define TRACE_OFF() do { } while (0)
# define TRACE_BUG_ON_LOCKED(c) do { } while (0)
+
#endif /* CONFIG_RT_DEADLOCK_DETECT */
/*
@@ -829,6 +815,50 @@ static void pi_setprio(struct rt_mutex *
}
}
+/*
+ * Change priority of a task pi aware
+ *
+ * There are several aspects to consider:
+ * - task is priority boosted
+ * - task is blocked on a mutex
+ *
+ */
+void pi_changeprio(struct task_struct *p, int prio)
+{
+ unsigned long flags;
+ int oldprio;
+
+ spin_lock_irqsave(&pi_lock, flags);
+
+ oldprio = p->normal_prio;
+ if (oldprio == prio)
+ goto out;
+
+ /* Set normal prio in any case */
+ p->normal_prio = prio;
+
+ /* Check, if we can safely lower the priority */
+ if (prio > p->prio && !plist_empty(&p->pi_waiters)) {
+ struct rt_mutex_waiter *w;
+ w = plist_first_entry(&p->pi_waiters,
+ struct rt_mutex_waiter, pi_list);
+ if (w->ti->task->prio < prio)
+ prio = w->ti->task->prio;
+ }
+
+ if (prio == p->prio)
+ goto out;
+
+ /* Is task blocked on a mutex ? */
+ if (p->blocked_on)
+ pi_setprio(p->blocked_on->lock, p, prio);
+ else
+ mutex_setprio(p, prio);
+ out:
+ spin_unlock_irqrestore(&pi_lock, flags);
+
+}
+
static void
task_blocks_on_lock(struct rt_mutex_waiter *waiter, struct thread_info *ti,
struct rt_mutex *lock __EIP_DECL__)
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6-rt9/kernel/sched.c linux-2.6.13-rc6-rt9.work/kernel/sched.c
--- linux-2.6.13-rc6-rt9/kernel/sched.c 2005-08-18 17:37:42.000000000 +0200
+++ linux-2.6.13-rc6-rt9.work/kernel/sched.c 2005-08-19 18:44:31.000000000 +0200
@@ -3954,58 +3954,6 @@ void mutex_setprio(task_t *p, int prio)
task_rq_unlock(rq, &flags);
}
-/*
- * Used by the PREEMPT_RT code to implement
- * priority inheritance logic for timers:
- *
- * (differs in that the priority change is permanent)
- */
-void mutex_chprio(task_t *p, int prio)
-{
- unsigned long flags;
- prio_array_t *array;
- runqueue_t *rq;
- int oldprio, prev_resched;
-
- BUG_ON(prio < 0 || prio > MAX_PRIO);
- WARN_ON(p->policy == SCHED_NORMAL);
-
- rq = task_rq_lock(p, &flags);
-
- oldprio = p->prio;
- array = p->array;
- if (array)
- dequeue_task(p, array);
- p->normal_prio = prio;
- if (prio < p->prio)
- p->prio = prio;
-
- trace_special_pid(p->pid, oldprio, prio);
- prev_resched = _need_resched();
- if (array) {
- /*
- * If changing to an RT priority then queue it
- * in the active array!
- */
- if (rt_task(p))
- array = rq->active;
- enqueue_task(p, array);
- /*
- * Reschedule if we are currently running on this runqueue and
- * our priority decreased, or if we are not currently running on
- * this runqueue and our priority is higher than the current's
- */
- if (task_running(rq, p)) {
- if (p->prio > oldprio)
- resched_task(rq->curr);
- } else if (TASK_PREEMPTS_CURR(p, rq))
- resched_task(rq->curr);
- }
- trace_special(prev_resched, _need_resched(), 0);
-
- task_rq_unlock(rq, &flags);
-}
-
#ifdef __ARCH_WANT_SYS_NICE
/*
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6-rt9/kernel/softirq.c linux-2.6.13-rc6-rt9.work/kernel/softirq.c
--- linux-2.6.13-rc6-rt9/kernel/softirq.c 2005-08-18 17:37:42.000000000 +0200
+++ linux-2.6.13-rc6-rt9.work/kernel/softirq.c 2005-08-19 18:44:31.000000000 +0200
@@ -314,7 +314,7 @@ void raise_softirq_prio(unsigned long nr
if (prio > (MAX_RT_PRIO - 1))
prio = MAX_RT_PRIO - 1;
- mutex_chprio(tsk, prio);
+ pi_changeprio(tsk, prio);
wake_up_process(tsk);
}
#endif
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6-rt9/kernel/timer.c linux-2.6.13-rc6-rt9.work/kernel/timer.c
--- linux-2.6.13-rc6-rt9/kernel/timer.c 2005-08-18 17:37:43.000000000 +0200
+++ linux-2.6.13-rc6-rt9.work/kernel/timer.c 2005-08-19 18:44:31.000000000 +0200
@@ -1731,7 +1731,7 @@ static inline void recalc_priority(tvec_
if (prio == hrbase->hr_prio)
return;
hrbase->hr_prio = prio;
- mutex_chprio(current, prio);
+ pi_changeprio(current, prio);
spin_unlock_irq(&hrbase->t_base.lock);
cond_resched();
spin_lock_irq(&hrbase->t_base.lock);
[-- Attachment #2: cyclictest-0.1.tar.bz2 --]
[-- Type: application/x-bzip-compressed-tar, Size: 3654 bytes --]
[-- Attachment #3: test.config.bz2 --]
[-- Type: application/x-bzip, Size: 7082 bytes --]
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
2005-08-19 23:48 ` [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment Thomas Gleixner
@ 2005-08-20 0:19 ` George Anzinger
2005-08-20 0:36 ` Thomas Gleixner
2005-08-20 14:10 ` Oleg Nesterov
` (2 subsequent siblings)
3 siblings, 1 reply; 63+ messages in thread
From: George Anzinger @ 2005-08-20 0:19 UTC (permalink / raw)
To: tglx
Cc: Ingo Molnar, Roland McGrath, Oleg Nesterov, linux-kernel,
Steven Rostedt, Paul E. McKenney
Thomas Gleixner wrote:
~
>
> 2. Drift of cyclic timers (armed by set_timer()):
>
> Due to rounding errors and the drift adjustment code, the fixed
> increment which is precalculated when the timer is set up and added on
> rearm, I see creeping deviation from the timeline.
>
> I have a patch lined up to base the rearm on human (nsac) units, so this
> effect will go away. But this is waste of time until (1.) is not solved.
>
> George ???
Could I (we) see what you have in mind?
>
--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
2005-08-20 0:19 ` George Anzinger
@ 2005-08-20 0:36 ` Thomas Gleixner
2005-08-20 1:36 ` George Anzinger
0 siblings, 1 reply; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-20 0:36 UTC (permalink / raw)
To: george
Cc: Ingo Molnar, Roland McGrath, Oleg Nesterov, linux-kernel,
Steven Rostedt, Paul E. McKenney
George,
On Fri, 2005-08-19 at 17:19 -0700, George Anzinger wrote:
> > 2. Drift of cyclic timers (armed by set_timer()):
> >
> > Due to rounding errors and the drift adjustment code, the fixed
> > increment which is precalculated when the timer is set up and added on
> > rearm, I see creeping deviation from the timeline.
> >
> > I have a patch lined up to base the rearm on human (nsac) units, so this
> > effect will go away. But this is waste of time until (1.) is not solved.
> >
> > George ???
>
> Could I (we) see what you have in mind?
Nothing which applies clean at the moment and I have no access to the
box where the patch floats around.
It's simply explained.
Current code:
set_timer()
calc interval->jiffies / interval->arch_cycles;
based on it.interval
rearm()
timer->expires += interval->jiffies;
timer->arch_cycle_expires += interval->arch_cycles;
normalize(timer);
Patched code:
set_timer()
timer.interval = it.interval;
timer.next_expire = it.value;
both stored as timespec
rearm()
next_expire += interval;
calc timer->expires/arch_cycle_expires;
So on each rearm we eliminate the rounding errors and take the drift
adjustment into account.
It adds some calculation overhead to each rearm, but ....
tglx
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
2005-08-20 0:36 ` Thomas Gleixner
@ 2005-08-20 1:36 ` George Anzinger
2005-09-26 21:03 ` Roland McGrath
0 siblings, 1 reply; 63+ messages in thread
From: George Anzinger @ 2005-08-20 1:36 UTC (permalink / raw)
To: tglx
Cc: Ingo Molnar, Roland McGrath, Oleg Nesterov, linux-kernel,
Steven Rostedt, Paul E. McKenney,
'high-res-timers-discourse@lists.sourceforge.net'
Thomas Gleixner wrote:
> George,
>
> On Fri, 2005-08-19 at 17:19 -0700, George Anzinger wrote:
>
>>>2. Drift of cyclic timers (armed by set_timer()):
>>>
>>>Due to rounding errors and the drift adjustment code, the fixed
>>>increment which is precalculated when the timer is set up and added on
>>>rearm, I see creeping deviation from the timeline.
>>>
>>>I have a patch lined up to base the rearm on human (nsac) units, so this
>>>effect will go away. But this is waste of time until (1.) is not solved.
>>>
>>>George ???
>>
>>Could I (we) see what you have in mind?
>
>
> Nothing which applies clean at the moment and I have no access to the
> box where the patch floats around.
>
> It's simply explained.
>
> Current code:
>
> set_timer()
> calc interval->jiffies / interval->arch_cycles;
> based on it.interval
>
> rearm()
> timer->expires += interval->jiffies;
> timer->arch_cycle_expires += interval->arch_cycles;
> normalize(timer);
>
> Patched code:
>
> set_timer()
> timer.interval = it.interval;
> timer.next_expire = it.value;
> both stored as timespec
>
> rearm()
> next_expire += interval;
> calc timer->expires/arch_cycle_expires;
>
> So on each rearm we eliminate the rounding errors and take the drift
> adjustment into account.
>
> It adds some calculation overhead to each rearm, but ....
>
I think the standard was written to eliminate the need for this. The
notion is that we have a resolution which we use in the calculations so
while there may be drift WRT his request, there should be no drift WRT
the requested value rounded up to the next resolution.
Still, if we can't keep that resolution in arch_cycles...
On another issue along this line, I have been thinking of changing the
x86 TSC arch cycle size to 1ns. (NOT the resolution, the units for the
arch cycle.) The reason to do this is to correctly track changes in cpu
frequency as it is today, we would need to track down and update all
pending HR timers when ever the frequency changed. By using a common
unit all we need to do is change the conversion constants (well I guess
they would not be constants any more :). I REALLY don't want to do this
as it does add conversion overhead, but I can not think of another clean
way to track TSC frequency changes.
--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
2005-08-20 1:36 ` George Anzinger
@ 2005-09-26 21:03 ` Roland McGrath
0 siblings, 0 replies; 63+ messages in thread
From: Roland McGrath @ 2005-09-26 21:03 UTC (permalink / raw)
To: george
Cc: tglx, Ingo Molnar, Oleg Nesterov, linux-kernel, Steven Rostedt,
Paul E. McKenney,
'high-res-timers-discourse@lists.sourceforge.net'
> On another issue along this line, I have been thinking of changing the
> x86 TSC arch cycle size to 1ns. (NOT the resolution, the units for the
> arch cycle.) The reason to do this is to correctly track changes in cpu
> frequency as it is today, we would need to track down and update all
> pending HR timers when ever the frequency changed. By using a common
> unit all we need to do is change the conversion constants (well I guess
> they would not be constants any more :). I REALLY don't want to do this
> as it does add conversion overhead, but I can not think of another clean
> way to track TSC frequency changes.
sched_clock does this conversion at sampling time, too.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
2005-08-19 23:48 ` [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment Thomas Gleixner
2005-08-20 0:19 ` George Anzinger
@ 2005-08-20 14:10 ` Oleg Nesterov
2005-08-20 16:04 ` Thomas Gleixner
2005-08-20 16:58 ` [PATCH] fix send_sigqueue() vs thread exit race Oleg Nesterov
2005-08-22 7:38 ` [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment Ingo Molnar
3 siblings, 1 reply; 63+ messages in thread
From: Oleg Nesterov @ 2005-08-20 14:10 UTC (permalink / raw)
To: tglx
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney
Thomas Gleixner wrote:
>
> send_sigqueue is called from posix_timer_fn() and acquires
> tasklist_lock, which makes no sense to me.
>
> send_sigqueue()s (l)onl(e)y user is the posix_timer function
> (posix_timer_fn(), calling posix_timer_event()).
>
> Each posix timer blocks the task from vanishing away by
> get_task_struct(), which is protected by the held tasklist_lock.
>
> The task can neither go away nor the signal handler can change until
> put_task_struct() is called inside release_posix_timer(), which removes
> any chance to do an invalid access to either task or sighand because the
> relevant timer is deleted before the call to put_task_struct(). Also
> this call is protected by tasklist_lock().
Yes, the task_struct can't go away, but if process exited this
task_struct is just chunk of garbage. I think the intent was to
protect against this case.
However, I agree with you, locking the tasklist_lock can't help,
and the code is wrong.
posix_timer_event() first checks that the thread (SIGEV_THREAD_ID
case) does not have PF_EXITING flag, then it calls send_sigqueue()
which locks task list. But if the thread exits in between the kernel
will oops.
posix_timer_event() runs under k_itimer.it_lock, but this does not
help if that thread was not the only one in thread group, in this
case we don't call exit_itimers().
The comment is wrong too. ->sighand can't change, we are clearing
posix timer on exec, and tasklist can't prevent ->sighand from
going away..
Ingo, Roland, George, am I wrong?
Oleg.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
2005-08-20 14:10 ` Oleg Nesterov
@ 2005-08-20 16:04 ` Thomas Gleixner
2005-08-20 17:50 ` Oleg Nesterov
2005-08-22 21:37 ` George Anzinger
0 siblings, 2 replies; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-20 16:04 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney
On Sat, 2005-08-20 at 18:10 +0400, Oleg Nesterov wrote:
> posix_timer_event() first checks that the thread (SIGEV_THREAD_ID
> case) does not have PF_EXITING flag, then it calls send_sigqueue()
> which locks task list. But if the thread exits in between the kernel
> will oops.
> posix_timer_event() runs under k_itimer.it_lock, but this does not
> help if that thread was not the only one in thread group, in this
> case we don't call exit_itimers().
I added exit_notify_itimers() for that case. It is synchronized vs.
posix_timer_event() via the timer lock and therefor protects against the
race described above.
It solves the problem for the only user of send_sigqueue for now, but I
think we should have a more general interface to such functionality to
allow simple usage.
tglx
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/include/linux/sched.h linux-2.6.13-rc6.work/include/linux/sched.h
--- linux-2.6.13-rc6/include/linux/sched.h 2005-08-13 12:25:56.000000000 +0200
+++ linux-2.6.13-rc6.work/include/linux/sched.h 2005-08-20 17:43:36.000000000 +0200
@@ -1051,6 +1051,7 @@ extern void __exit_signal(struct task_st
extern void exit_sighand(struct task_struct *);
extern void __exit_sighand(struct task_struct *);
extern void exit_itimers(struct signal_struct *);
+extern void exit_notify_itimers(struct signal_struct *);
extern NORET_TYPE void do_group_exit(int);
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/posix-timers.c linux-2.6.13-rc6.work/kernel/posix-timers.c
--- linux-2.6.13-rc6/kernel/posix-timers.c 2005-08-13 12:25:58.000000000 +0200
+++ linux-2.6.13-rc6.work/kernel/posix-timers.c 2005-08-20 17:43:36.000000000 +0200
@@ -1169,6 +1169,33 @@ void exit_itimers(struct signal_struct *
}
/*
+ * This is called by __exit_signal, when there are still references to
+ * the shared signal_struct
+ */
+void exit_notify_itimers(struct signal_struct *sig)
+{
+ struct k_itimer *timer;
+ struct list_head *tmp;
+ unsigned long flags;
+
+ list_for_each(tmp, &sig->posix_timers) {
+
+ timer = list_entry(tmp, struct k_itimer, list);
+
+ /* Protect against posix_timer_fn */
+ spin_lock_irqsave(&timer->it_lock, flags);
+ if (timer->it_process == current) {
+
+ if (timer->it_sigev_notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID))
+ timer->it_sigev_notify = SIGEV_SIGNAL;
+
+ timer->it_process = timer->it_process->group_leader;
+ }
+ spin_lock_irqrestore(&timer->it_lock, flags);
+ }
+}
+
+/*
* And now for the "clock" calls
*
* These functions are called both from timer functions (with the timer
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/signal.c linux-2.6.13-rc6.work/kernel/signal.c
--- linux-2.6.13-rc6/kernel/signal.c 2005-08-13 12:25:58.000000000 +0200
+++ linux-2.6.13-rc6.work/kernel/signal.c 2005-08-20 17:57:46.000000000 +0200
@@ -390,6 +390,7 @@ void __exit_signal(struct task_struct *t
sig->nvcsw += tsk->nvcsw;
sig->nivcsw += tsk->nivcsw;
sig->sched_time += tsk->sched_time;
+ exit_notify_itimers(sig);
spin_unlock(&sighand->siglock);
sig = NULL; /* Marker for below. */
}
@@ -1381,13 +1382,12 @@ send_sigqueue(int sig, struct sigqueue *
int ret = 0;
/*
- * We need the tasklist lock even for the specific
- * thread case (when we don't need to follow the group
- * lists) in order to avoid races with "p->sighand"
- * going away or changing from under us.
+ * No need to hold tasklist lock here.
+ * posix_timer_event() is synchronized via
+ * exit_itimers() and exit_notify_itimers() to
+ * be protected against p->sighand == NULL
*/
BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
- read_lock(&tasklist_lock);
spin_lock_irqsave(&p->sighand->siglock, flags);
if (unlikely(!list_empty(&q->list))) {
@@ -1414,7 +1414,6 @@ send_sigqueue(int sig, struct sigqueue *
out:
spin_unlock_irqrestore(&p->sighand->siglock, flags);
- read_unlock(&tasklist_lock);
return(ret);
}
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
2005-08-20 16:04 ` Thomas Gleixner
@ 2005-08-20 17:50 ` Oleg Nesterov
2005-08-22 21:37 ` George Anzinger
1 sibling, 0 replies; 63+ messages in thread
From: Oleg Nesterov @ 2005-08-20 17:50 UTC (permalink / raw)
To: tglx
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney
Thomas Gleixner wrote:
>
> On Sat, 2005-08-20 at 18:10 +0400, Oleg Nesterov wrote:
>
> > posix_timer_event() first checks that the thread (SIGEV_THREAD_ID
> > case) does not have PF_EXITING flag, then it calls send_sigqueue()
> > which locks task list. But if the thread exits in between the kernel
> > will oops.
>
> > posix_timer_event() runs under k_itimer.it_lock, but this does not
> > help if that thread was not the only one in thread group, in this
> > case we don't call exit_itimers().
>
> I added exit_notify_itimers() for that case. It is synchronized vs.
> posix_timer_event() via the timer lock and therefor protects against the
> race described above.
>
> It solves the problem for the only user of send_sigqueue for now, but I
> think we should have a more general interface to such functionality to
> allow simple usage.
>
> tglx
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/include/linux/sched.h linux-2.6.13-rc6.work/include/linux/sched.h
> --- linux-2.6.13-rc6/include/linux/sched.h 2005-08-13 12:25:56.000000000 +0200
> +++ linux-2.6.13-rc6.work/include/linux/sched.h 2005-08-20 17:43:36.000000000 +0200
> @@ -1051,6 +1051,7 @@ extern void __exit_signal(struct task_st
> extern void exit_sighand(struct task_struct *);
> extern void __exit_sighand(struct task_struct *);
> extern void exit_itimers(struct signal_struct *);
> +extern void exit_notify_itimers(struct signal_struct *);
>
> extern NORET_TYPE void do_group_exit(int);
>
> diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/posix-timers.c linux-2.6.13-rc6.work/kernel/posix-timers.c
> --- linux-2.6.13-rc6/kernel/posix-timers.c 2005-08-13 12:25:58.000000000 +0200
> +++ linux-2.6.13-rc6.work/kernel/posix-timers.c 2005-08-20 17:43:36.000000000 +0200
> @@ -1169,6 +1169,33 @@ void exit_itimers(struct signal_struct *
> }
>
> /*
> + * This is called by __exit_signal, when there are still references to
> + * the shared signal_struct
> + */
> +void exit_notify_itimers(struct signal_struct *sig)
> +{
> + struct k_itimer *timer;
> + struct list_head *tmp;
> + unsigned long flags;
> +
> + list_for_each(tmp, &sig->posix_timers) {
> +
> + timer = list_entry(tmp, struct k_itimer, list);
> +
> + /* Protect against posix_timer_fn */
> + spin_lock_irqsave(&timer->it_lock, flags);
I think this is deadlockable. We already (release_task) hold
tasklist_lock here. What if this timer has no SIGEV_THREAD_ID ?
posix_timer_fn locks timer->it_lock first, then locks tasklist_lock
in send_group_sigqueue().
Also, sys_timer_delete() locks ->it_lock, then ->sighand.lock.
I think the better fix would be re-check ->sighand in send_sigqueue,
like Paul's patches do.
Oleg.
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
2005-08-20 16:04 ` Thomas Gleixner
2005-08-20 17:50 ` Oleg Nesterov
@ 2005-08-22 21:37 ` George Anzinger
1 sibling, 0 replies; 63+ messages in thread
From: George Anzinger @ 2005-08-22 21:37 UTC (permalink / raw)
To: tglx
Cc: Oleg Nesterov, Ingo Molnar, Roland McGrath, linux-kernel,
Steven Rostedt, Paul E. McKenney
Thomas Gleixner wrote:
> On Sat, 2005-08-20 at 18:10 +0400, Oleg Nesterov wrote:
>
>
>>posix_timer_event() first checks that the thread (SIGEV_THREAD_ID
>>case) does not have PF_EXITING flag, then it calls send_sigqueue()
>>which locks task list. But if the thread exits in between the kernel
>>will oops.
>
>
>>posix_timer_event() runs under k_itimer.it_lock, but this does not
>>help if that thread was not the only one in thread group, in this
>>case we don't call exit_itimers().
>
>
> I added exit_notify_itimers() for that case. It is synchronized vs.
> posix_timer_event() via the timer lock and therefor protects against the
> race described above.
It seems to me that exit (or exec) calling the timer release code
covered this. I haven't gone through the exact sequence being
discussed, however. In any case, I don't think we should send the
signal to the group leader instead, but rather should release the timer
and cancel any pending signal. AFAIKT there is no reason the group
leader can not exit prior to other threads, but I may have missed this...
George
>
> It solves the problem for the only user of send_sigqueue for now, but I
> think we should have a more general interface to such functionality to
> allow simple usage.
>
> tglx
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
>
> diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/include/linux/sched.h linux-2.6.13-rc6.work/include/linux/sched.h
> --- linux-2.6.13-rc6/include/linux/sched.h 2005-08-13 12:25:56.000000000 +0200
> +++ linux-2.6.13-rc6.work/include/linux/sched.h 2005-08-20 17:43:36.000000000 +0200
> @@ -1051,6 +1051,7 @@ extern void __exit_signal(struct task_st
> extern void exit_sighand(struct task_struct *);
> extern void __exit_sighand(struct task_struct *);
> extern void exit_itimers(struct signal_struct *);
> +extern void exit_notify_itimers(struct signal_struct *);
>
> extern NORET_TYPE void do_group_exit(int);
>
> diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/posix-timers.c linux-2.6.13-rc6.work/kernel/posix-timers.c
> --- linux-2.6.13-rc6/kernel/posix-timers.c 2005-08-13 12:25:58.000000000 +0200
> +++ linux-2.6.13-rc6.work/kernel/posix-timers.c 2005-08-20 17:43:36.000000000 +0200
> @@ -1169,6 +1169,33 @@ void exit_itimers(struct signal_struct *
> }
>
> /*
> + * This is called by __exit_signal, when there are still references to
> + * the shared signal_struct
> + */
> +void exit_notify_itimers(struct signal_struct *sig)
> +{
> + struct k_itimer *timer;
> + struct list_head *tmp;
> + unsigned long flags;
> +
> + list_for_each(tmp, &sig->posix_timers) {
> +
> + timer = list_entry(tmp, struct k_itimer, list);
> +
> + /* Protect against posix_timer_fn */
> + spin_lock_irqsave(&timer->it_lock, flags);
> + if (timer->it_process == current) {
> +
> + if (timer->it_sigev_notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID))
> + timer->it_sigev_notify = SIGEV_SIGNAL;
> +
> + timer->it_process = timer->it_process->group_leader;
> + }
> + spin_lock_irqrestore(&timer->it_lock, flags);
> + }
> +}
> +
> +/*
> * And now for the "clock" calls
> *
> * These functions are called both from timer functions (with the timer
> diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/signal.c linux-2.6.13-rc6.work/kernel/signal.c
> --- linux-2.6.13-rc6/kernel/signal.c 2005-08-13 12:25:58.000000000 +0200
> +++ linux-2.6.13-rc6.work/kernel/signal.c 2005-08-20 17:57:46.000000000 +0200
> @@ -390,6 +390,7 @@ void __exit_signal(struct task_struct *t
> sig->nvcsw += tsk->nvcsw;
> sig->nivcsw += tsk->nivcsw;
> sig->sched_time += tsk->sched_time;
> + exit_notify_itimers(sig);
> spin_unlock(&sighand->siglock);
> sig = NULL; /* Marker for below. */
> }
> @@ -1381,13 +1382,12 @@ send_sigqueue(int sig, struct sigqueue *
> int ret = 0;
>
> /*
> - * We need the tasklist lock even for the specific
> - * thread case (when we don't need to follow the group
> - * lists) in order to avoid races with "p->sighand"
> - * going away or changing from under us.
> + * No need to hold tasklist lock here.
> + * posix_timer_event() is synchronized via
> + * exit_itimers() and exit_notify_itimers() to
> + * be protected against p->sighand == NULL
> */
> BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
> - read_lock(&tasklist_lock);
> spin_lock_irqsave(&p->sighand->siglock, flags);
>
> if (unlikely(!list_empty(&q->list))) {
> @@ -1414,7 +1414,6 @@ send_sigqueue(int sig, struct sigqueue *
>
> out:
> spin_unlock_irqrestore(&p->sighand->siglock, flags);
> - read_unlock(&tasklist_lock);
> return(ret);
> }
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH] fix send_sigqueue() vs thread exit race
2005-08-19 23:48 ` [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment Thomas Gleixner
2005-08-20 0:19 ` George Anzinger
2005-08-20 14:10 ` Oleg Nesterov
@ 2005-08-20 16:58 ` Oleg Nesterov
2005-08-21 9:44 ` Thomas Gleixner
2005-08-22 7:38 ` [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment Ingo Molnar
3 siblings, 1 reply; 63+ messages in thread
From: Oleg Nesterov @ 2005-08-20 16:58 UTC (permalink / raw)
To: tglx
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
[PATCH] fix send_sigqueue() vs thread exit race
posix_timer_event() first checks that the thread (SIGEV_THREAD_ID
case) does not have PF_EXITING flag, then it calls send_sigqueue()
which locks task list. But if the thread exits in between the kernel
will oops (->sighand == NULL after __exit_sighand).
This patch moves the PF_EXITING check into the send_sigqueue(), it
must be done atomically under tasklist_lock. When send_sigqueue()
detects exiting thread it returns -1. In that case posix_timer_event
will send the signal to thread group.
Also, this patch fixes task_struct use-after-free in posix_timer_event.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
--- 2.6.13-rc6/kernel/signal.c~ 2005-08-18 23:10:28.000000000 +0400
+++ 2.6.13-rc6/kernel/signal.c 2005-08-20 23:05:21.000000000 +0400
@@ -1366,16 +1366,16 @@ send_sigqueue(int sig, struct sigqueue *
unsigned long flags;
int ret = 0;
- /*
- * We need the tasklist lock even for the specific
- * thread case (when we don't need to follow the group
- * lists) in order to avoid races with "p->sighand"
- * going away or changing from under us.
- */
BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
- read_lock(&tasklist_lock);
+ read_lock(&tasklist_lock);
+
+ if (unlikely(p->flags & PF_EXITING)) {
+ ret = -1;
+ goto out_err;
+ }
+
spin_lock_irqsave(&p->sighand->siglock, flags);
-
+
if (unlikely(!list_empty(&q->list))) {
/*
* If an SI_TIMER entry is already queue just increment
@@ -1385,7 +1385,7 @@ send_sigqueue(int sig, struct sigqueue *
BUG();
q->info.si_overrun++;
goto out;
- }
+ }
/* Short-circuit ignored signals. */
if (sig_ignored(p, sig)) {
ret = 1;
@@ -1400,8 +1400,10 @@ send_sigqueue(int sig, struct sigqueue *
out:
spin_unlock_irqrestore(&p->sighand->siglock, flags);
+out_err:
read_unlock(&tasklist_lock);
- return(ret);
+
+ return ret;
}
int
--- 2.6.13-rc6/kernel/posix-timers.c~ 2005-08-18 21:37:08.000000000 +0400
+++ 2.6.13-rc6/kernel/posix-timers.c 2005-08-20 23:21:23.000000000 +0400
@@ -427,21 +427,23 @@ int posix_timer_event(struct k_itimer *t
timr->sigq->info.si_code = SI_TIMER;
timr->sigq->info.si_tid = timr->it_id;
timr->sigq->info.si_value = timr->it_sigev_value;
+
if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
- if (unlikely(timr->it_process->flags & PF_EXITING)) {
- timr->it_sigev_notify = SIGEV_SIGNAL;
- put_task_struct(timr->it_process);
- timr->it_process = timr->it_process->group_leader;
- goto group;
- }
- return send_sigqueue(timr->it_sigev_signo, timr->sigq,
- timr->it_process);
- }
- else {
- group:
- return send_group_sigqueue(timr->it_sigev_signo, timr->sigq,
- timr->it_process);
+ struct task_struct *leader;
+ int ret = send_sigqueue(timr->it_sigev_signo, timr->sigq,
+ timr->it_process);
+
+ if (likely(ret >= 0))
+ return ret;
+
+ timr->it_sigev_notify = SIGEV_SIGNAL;
+ leader = timr->it_process->group_leader;
+ put_task_struct(timr->it_process);
+ timr->it_process = leader;
}
+
+ return send_group_sigqueue(timr->it_sigev_signo, timr->sigq,
+ timr->it_process);
}
EXPORT_SYMBOL_GPL(posix_timer_event);
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-20 16:58 ` [PATCH] fix send_sigqueue() vs thread exit race Oleg Nesterov
@ 2005-08-21 9:44 ` Thomas Gleixner
2005-08-21 10:41 ` Oleg Nesterov
2005-08-21 10:59 ` Oleg Nesterov
0 siblings, 2 replies; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-21 9:44 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
On Sat, 2005-08-20 at 20:58 +0400, Oleg Nesterov wrote:
> [PATCH] fix send_sigqueue() vs thread exit race
>
> posix_timer_event() first checks that the thread (SIGEV_THREAD_ID
> case) does not have PF_EXITING flag, then it calls send_sigqueue()
> which locks task list. But if the thread exits in between the kernel
> will oops (->sighand == NULL after __exit_sighand).
>
> This patch moves the PF_EXITING check into the send_sigqueue(), it
> must be done atomically under tasklist_lock. When send_sigqueue()
> detects exiting thread it returns -1. In that case posix_timer_event
> will send the signal to thread group.
>
> Also, this patch fixes task_struct use-after-free in posix_timer_event.
>
> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
>
> --- 2.6.13-rc6/kernel/signal.c~ 2005-08-18 23:10:28.000000000 +0400
> +++ 2.6.13-rc6/kernel/signal.c 2005-08-20 23:05:21.000000000 +0400
> @@ -1366,16 +1366,16 @@ send_sigqueue(int sig, struct sigqueue *
> unsigned long flags;
> int ret = 0;
>
> - /*
> - * We need the tasklist lock even for the specific
> - * thread case (when we don't need to follow the group
> - * lists) in order to avoid races with "p->sighand"
> - * going away or changing from under us.
> - */
> BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
> - read_lock(&tasklist_lock);
> + read_lock(&tasklist_lock);
> +
> + if (unlikely(p->flags & PF_EXITING)) {
> + ret = -1;
> + goto out_err;
> + }
> +
It's still racy. tasklist_lock does not protect anything here.
arm timer
exit
timer event
timr->it_process references a freed structure
tglx
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-21 9:44 ` Thomas Gleixner
@ 2005-08-21 10:41 ` Oleg Nesterov
2005-08-21 12:38 ` Thomas Gleixner
2005-08-21 10:59 ` Oleg Nesterov
1 sibling, 1 reply; 63+ messages in thread
From: Oleg Nesterov @ 2005-08-21 10:41 UTC (permalink / raw)
To: tglx
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
Thomas Gleixner wrote:
>
> On Sat, 2005-08-20 at 20:58 +0400, Oleg Nesterov wrote:
> > [PATCH] fix send_sigqueue() vs thread exit race
> >
> > .....
> > - read_lock(&tasklist_lock);
> > + read_lock(&tasklist_lock);
> > +
> > + if (unlikely(p->flags & PF_EXITING)) {
> > + ret = -1;
> > + goto out_err;
> > + }
> > +
>
> It's still racy. tasklist_lock does not protect anything here.
I hope no, but please clarify if I am wrong.
tasklist_lock protects against release_task()->__exit_sigxxx()
here.
> arm timer
> exit
If this is the last thread in thread group, exit_itimers()
will stop and clear ->posix_timers.
If not:
> timer event
> timr->it_process references a freed structure
>
No, create_timer() does get_task_struct(timr->it_process), this
task may be EXIT_DEAD now, but the task_struct itself is valid,
and it's ->flags has PF_EXITING flag.
Oleg.
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-21 10:41 ` Oleg Nesterov
@ 2005-08-21 12:38 ` Thomas Gleixner
0 siblings, 0 replies; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-21 12:38 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
On Sun, 2005-08-21 at 14:41 +0400, Oleg Nesterov wrote:
> If not:
>
> > timer event
> > timr->it_process references a freed structure
> >
>
> No, create_timer() does get_task_struct(timr->it_process), this
> task may be EXIT_DEAD now, but the task_struct itself is valid,
> and it's ->flags has PF_EXITING flag.
Right, did not think about that.
tglx
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-21 9:44 ` Thomas Gleixner
2005-08-21 10:41 ` Oleg Nesterov
@ 2005-08-21 10:59 ` Oleg Nesterov
2005-08-21 21:24 ` Thomas Gleixner
1 sibling, 1 reply; 63+ messages in thread
From: Oleg Nesterov @ 2005-08-21 10:59 UTC (permalink / raw)
To: tglx
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
Thomas Gleixner wrote:
>
> arm timer
> exit
> timer event
I think there is another bug here.
CPU_0 CPU_1
release_task: posix_timer_fn:
write_lock(tasklist); lock(timer->it_lock);
exit_timers: send_sigxxx();
lock(timer->it_lock) read_lock(tasklist);
Deadlock.
Dear cc-list, what do you think?
Oleg.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-21 10:59 ` Oleg Nesterov
@ 2005-08-21 21:24 ` Thomas Gleixner
2005-08-21 21:50 ` Thomas Gleixner
0 siblings, 1 reply; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-21 21:24 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
Oleg,
On Sun, 2005-08-21 at 14:59 +0400, Oleg Nesterov wrote:
> CPU_0 CPU_1
>
> release_task: posix_timer_fn:
> write_lock(tasklist); lock(timer->it_lock);
>
> exit_timers: send_sigxxx();
> lock(timer->it_lock) read_lock(tasklist);
>
> Deadlock.
>
> Dear cc-list, what do you think?
The patch below on top of your patch should solve this. We don't need
tasklist_lock to check p->flags. As you pointed out p cannot be invalid
in send_sigqueue as it's protected by get_task_struct() in
create_timer()
For send_group_sigqueue it's protected by exit_itimers() waiting for
k_itimer.it_lock.
It still does not solve the ugly dependency on tasklist_lock but at
least the race and the deadlock are fixed.
tglx
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
--- linux-2.6.13-rc6.signal/kernel/signal.oleg.c 2005-08-21 23:07:03.000000000 +0200
+++ linux-2.6.13-rc6.signal/kernel/signal.c 2005-08-21 23:09:07.000000000 +0200
@@ -1381,11 +1381,14 @@ send_sigqueue(int sig, struct sigqueue *
int ret = 0;
BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
- read_lock(&tasklist_lock);
- if (unlikely(p->flags & PF_EXITING)) {
- ret = -1;
- goto out_err;
+retry:
+ if (unlikely(p->flags & PF_EXITING))
+ return -1;
+
+ if ((unlikely(!read_trylock(&tasklist_lock))) {
+ cpu_relax();
+ goto retry;
}
spin_lock_irqsave(&p->sighand->siglock, flags);
@@ -1414,7 +1417,6 @@ send_sigqueue(int sig, struct sigqueue *
out:
spin_unlock_irqrestore(&p->sighand->siglock, flags);
-out_err:
read_unlock(&tasklist_lock);
return ret;
@@ -1427,7 +1429,14 @@ send_group_sigqueue(int sig, struct sigq
int ret = 0;
BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
- read_lock(&tasklist_lock);
+retry:
+ if (unlikely(p->flags & PF_EXITING))
+ return -1;
+
+ if (unlikely(!read_trylock(&tasklist_lock))) {
+ cpu_relax();
+ goto retry;
+ }
spin_lock_irqsave(&p->sighand->siglock, flags);
handle_stop_signal(sig, p);
--- linux-2.6.13-rc6.signal/kernel/posix-timers.oleg.c 2005-08-21 23:09:58.000000000 +0200
+++ linux-2.6.13-rc6.signal/kernel/posix-timers.c 2005-08-21 23:19:42.000000000 +0200
@@ -501,7 +501,8 @@ static void posix_timer_fn(unsigned long
remove_from_abslist(timr);
}
- if (posix_timer_event(timr, si_private))
+ /* Do not rearm the timer, when we are exiting */
+ if (posix_timer_event(timr, si_private) > 0)
/*
* signal was not sent because of sig_ignor
* we will not get a call back to restart it AND
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-21 21:24 ` Thomas Gleixner
@ 2005-08-21 21:50 ` Thomas Gleixner
2005-08-22 6:39 ` Oleg Nesterov
0 siblings, 1 reply; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-21 21:50 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
On Sun, 2005-08-21 at 23:24 +0200, Thomas Gleixner wrote:
> Oleg,
Sorry for making noise. I introduced the race again.
tglx
--- linux-2.6.13-rc6.signal/kernel/signal.oleg.c 2005-08-21 23:07:03.000000000 +0200
+++ linux-2.6.13-rc6.signal/kernel/signal.c 2005-08-21 23:48:52.000000000 +0200
@@ -1381,8 +1381,15 @@ send_sigqueue(int sig, struct sigqueue *
int ret = 0;
BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
- read_lock(&tasklist_lock);
+retry:
+ if (unlikely(p->flags & PF_EXITING))
+ return -1;
+
+ if (unlikely(!read_trylock(&tasklist_lock))) {
+ cpu_relax();
+ goto retry;
+ }
if (unlikely(p->flags & PF_EXITING)) {
ret = -1;
goto out_err;
@@ -1427,7 +1434,18 @@ send_group_sigqueue(int sig, struct sigq
int ret = 0;
BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
- read_lock(&tasklist_lock);
+retry:
+ if (unlikely(p->flags & PF_EXITING))
+ return -1;
+
+ if (unlikely(!read_trylock(&tasklist_lock))) {
+ cpu_relax();
+ goto retry;
+ }
+ if (unlikely(p->flags & PF_EXITING)) {
+ ret = -1;
+ goto out_err;
+ }
spin_lock_irqsave(&p->sighand->siglock, flags);
handle_stop_signal(sig, p);
@@ -1461,8 +1479,9 @@ send_group_sigqueue(int sig, struct sigq
__group_complete_signal(sig, p);
out:
spin_unlock_irqrestore(&p->sighand->siglock, flags);
+out_err:
read_unlock(&tasklist_lock);
- return(ret);
+ return ret;
}
/*
--- linux-2.6.13-rc6.signal/kernel/posix-timers.oleg.c 2005-08-21 23:09:58.000000000 +0200
+++ linux-2.6.13-rc6.signal/kernel/posix-timers.c 2005-08-21 23:19:42.000000000 +0200
@@ -501,7 +501,8 @@ static void posix_timer_fn(unsigned long
remove_from_abslist(timr);
}
- if (posix_timer_event(timr, si_private))
+ /* Do not rearm the timer, when we are exiting */
+ if (posix_timer_event(timr, si_private) > 0)
/*
* signal was not sent because of sig_ignor
* we will not get a call back to restart it AND
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-21 21:50 ` Thomas Gleixner
@ 2005-08-22 6:39 ` Oleg Nesterov
2005-08-22 8:08 ` Thomas Gleixner
0 siblings, 1 reply; 63+ messages in thread
From: Oleg Nesterov @ 2005-08-22 6:39 UTC (permalink / raw)
To: tglx
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
Thomas Gleixner wrote:
>
> @@ -1427,7 +1434,18 @@ send_group_sigqueue(int sig, struct sigq
> int ret = 0;
>
> BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
> - read_lock(&tasklist_lock);
> +retry:
> + if (unlikely(p->flags & PF_EXITING))
> + return -1;
> +
I don't think this is correct. p == ->group_leader, it may
have been exited and in EXIT_ZOMBIE state. But the thread
group (process) is live, we should not stop posix timers.
Oleg.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-22 6:39 ` Oleg Nesterov
@ 2005-08-22 8:08 ` Thomas Gleixner
2005-08-22 8:52 ` Oleg Nesterov
0 siblings, 1 reply; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-22 8:08 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
On Mon, 2005-08-22 at 10:39 +0400, Oleg Nesterov wrote:
> Thomas Gleixner wrote:
> >
> > @@ -1427,7 +1434,18 @@ send_group_sigqueue(int sig, struct sigq
> > int ret = 0;
> >
> > BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
> > - read_lock(&tasklist_lock);
> > +retry:
> > + if (unlikely(p->flags & PF_EXITING))
> > + return -1;
> > +
>
> I don't think this is correct. p == ->group_leader, it may
> have been exited and in EXIT_ZOMBIE state. But the thread
> group (process) is live, we should not stop posix timers.
Hmm, true. release_task() is not called in this case, so p->sighand is
still there.
But we can not check for p->sighand == NULL, as sighand is released
after exit_itimers() so we are still deadlock prone. So I think
__exit_sighand() should be called before exit_itimers(). Then we can do
retry:
if (unlikely(!p->sighand))
return -1;
instead of checking for PF_EXITING.
tglx
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-22 8:08 ` Thomas Gleixner
@ 2005-08-22 8:52 ` Oleg Nesterov
2005-08-22 10:06 ` Thomas Gleixner
0 siblings, 1 reply; 63+ messages in thread
From: Oleg Nesterov @ 2005-08-22 8:52 UTC (permalink / raw)
To: tglx
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
Thomas Gleixner wrote:
>
> But we can not check for p->sighand == NULL, as sighand is released
> after exit_itimers() so we are still deadlock prone. So I think
> __exit_sighand() should be called before exit_itimers(). Then we can do
>
> retry:
> if (unlikely(!p->sighand))
> return -1;
>
> instead of checking for PF_EXITING.
I think yes, and this is closely related to your and Paul
"Use RCU to protect tasklist for unicast signals" patches.
We don't need RCU here, but we do need this hypothetical
lock_task_sighand() (and __exit_sighand from __exit_signal
change).
We still need tasklist_lock in send_group_queue for
__group_complete_signal though. I hope Paul will solve
this, it is needed for group_send_info() anyway.
But for the near future I don't see simple and non-intrusive
fix for this deadlock. I am waiting for George to confirm
that the bug exists and we are not crazy :)
Oleg.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-22 8:52 ` Oleg Nesterov
@ 2005-08-22 10:06 ` Thomas Gleixner
2005-08-22 16:45 ` Oleg Nesterov
0 siblings, 1 reply; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-22 10:06 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
Oleg,
On Mon, 2005-08-22 at 12:52 +0400, Oleg Nesterov wrote:
> Thomas Gleixner wrote:
> I think yes, and this is closely related to your and Paul
> "Use RCU to protect tasklist for unicast signals" patches.
>
> We don't need RCU here, but we do need this hypothetical
> lock_task_sighand() (and __exit_sighand from __exit_signal
> change).
>
> We still need tasklist_lock in send_group_queue for
> __group_complete_signal though. I hope Paul will solve
> this, it is needed for group_send_info() anyway.
>
> But for the near future I don't see simple and non-intrusive
> fix for this deadlock. I am waiting for George to confirm
> that the bug exists and we are not crazy :)
It exists. It triggers on preempt-RT and I can trigger it on vanilla SMP
by waiting for the timer expiry in release_task() before the
__exit_signal() call. That's reasonable, as it can happen that way by
chance too. It requires that the timer expires on a different CPU, but I
don't see a reason why this should not happen.
The patch below solves it for me.
tglx
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/exit.c linux-2.6.13-rc6.signal/kernel/exit.c
--- linux-2.6.13-rc6/kernel/exit.c 2005-08-13 12:25:58.000000000 +0200
+++ linux-2.6.13-rc6.signal/kernel/exit.c 2005-08-22 11:19:02.000000000 +0200
@@ -71,7 +71,6 @@ repeat:
__ptrace_unlink(p);
BUG_ON(!list_empty(&p->ptrace_list) || !list_empty(&p->ptrace_children));
__exit_signal(p);
- __exit_sighand(p);
/*
* Note that the fastpath in sys_times depends on __exit_signal having
* updated the counters before a task is removed from the tasklist of
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/fork.c linux-2.6.13-rc6.signal/kernel/fork.c
--- linux-2.6.13-rc6/kernel/fork.c 2005-08-13 12:25:58.000000000 +0200
+++ linux-2.6.13-rc6.signal/kernel/fork.c 2005-08-22 11:35:05.000000000 +0200
@@ -1132,7 +1132,8 @@ bad_fork_cleanup_mm:
bad_fork_cleanup_signal:
exit_signal(p);
bad_fork_cleanup_sighand:
- exit_sighand(p);
+ if (p->sighand)
+ exit_sighand(p);
bad_fork_cleanup_fs:
exit_fs(p); /* blocking */
bad_fork_cleanup_files:
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/posix-timers.c linux-2.6.13-rc6.signal/kernel/posix-timers.c
--- linux-2.6.13-rc6/kernel/posix-timers.c 2005-08-13 12:25:58.000000000 +0200
+++ linux-2.6.13-rc6.signal/kernel/posix-timers.c 2005-08-21 23:19:42.000000000 +0200
@@ -427,21 +427,23 @@ int posix_timer_event(struct k_itimer *t
timr->sigq->info.si_code = SI_TIMER;
timr->sigq->info.si_tid = timr->it_id;
timr->sigq->info.si_value = timr->it_sigev_value;
+
if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
- if (unlikely(timr->it_process->flags & PF_EXITING)) {
- timr->it_sigev_notify = SIGEV_SIGNAL;
- put_task_struct(timr->it_process);
- timr->it_process = timr->it_process->group_leader;
- goto group;
- }
- return send_sigqueue(timr->it_sigev_signo, timr->sigq,
- timr->it_process);
- }
- else {
- group:
- return send_group_sigqueue(timr->it_sigev_signo, timr->sigq,
- timr->it_process);
+ struct task_struct *leader;
+ int ret = send_sigqueue(timr->it_sigev_signo, timr->sigq,
+ timr->it_process);
+
+ if (likely(ret >= 0))
+ return ret;
+
+ timr->it_sigev_notify = SIGEV_SIGNAL;
+ leader = timr->it_process->group_leader;
+ put_task_struct(timr->it_process);
+ timr->it_process = leader;
}
+
+ return send_group_sigqueue(timr->it_sigev_signo, timr->sigq,
+ timr->it_process);
}
EXPORT_SYMBOL_GPL(posix_timer_event);
@@ -499,7 +501,8 @@ static void posix_timer_fn(unsigned long
remove_from_abslist(timr);
}
- if (posix_timer_event(timr, si_private))
+ /* Do not rearm the timer, when we are exiting */
+ if (posix_timer_event(timr, si_private) > 0)
/*
* signal was not sent because of sig_ignor
* we will not get a call back to restart it AND
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/signal.c linux-2.6.13-rc6.signal/kernel/signal.c
--- linux-2.6.13-rc6/kernel/signal.c 2005-08-13 12:25:58.000000000 +0200
+++ linux-2.6.13-rc6.signal/kernel/signal.c 2005-08-22 11:51:08.000000000 +0200
@@ -395,6 +395,7 @@ void __exit_signal(struct task_struct *t
}
clear_tsk_thread_flag(tsk,TIF_SIGPENDING);
flush_sigqueue(&tsk->pending);
+ __exit_sighand(tsk);
if (sig) {
/*
* We are cleaning up the signal_struct here. We delayed
@@ -1380,16 +1381,20 @@ send_sigqueue(int sig, struct sigqueue *
unsigned long flags;
int ret = 0;
- /*
- * We need the tasklist lock even for the specific
- * thread case (when we don't need to follow the group
- * lists) in order to avoid races with "p->sighand"
- * going away or changing from under us.
- */
BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
- read_lock(&tasklist_lock);
+
+ while(!read_trylock(&tasklist_lock)) {
+ if (p->flags & PF_EXITING)
+ return -1;
+ cpu_relax();
+ }
+ if (unlikely(p->flags & PF_EXITING)) {
+ ret = -1;
+ goto out_err;
+ }
+
spin_lock_irqsave(&p->sighand->siglock, flags);
-
+
if (unlikely(!list_empty(&q->list))) {
/*
* If an SI_TIMER entry is already queue just increment
@@ -1399,7 +1404,7 @@ send_sigqueue(int sig, struct sigqueue *
BUG();
q->info.si_overrun++;
goto out;
- }
+ }
/* Short-circuit ignored signals. */
if (sig_ignored(p, sig)) {
ret = 1;
@@ -1414,8 +1419,10 @@ send_sigqueue(int sig, struct sigqueue *
out:
spin_unlock_irqrestore(&p->sighand->siglock, flags);
+out_err:
read_unlock(&tasklist_lock);
- return(ret);
+
+ return ret;
}
int
@@ -1425,7 +1432,16 @@ send_group_sigqueue(int sig, struct sigq
int ret = 0;
BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
- read_lock(&tasklist_lock);
+
+ while(!read_trylock(&tasklist_lock)) {
+ if (!p->sighand)
+ return -1;
+ cpu_relax();
+ }
+ if (unlikely(!p->sighand)) {
+ ret = -1;
+ goto out_err;
+ }
spin_lock_irqsave(&p->sighand->siglock, flags);
handle_stop_signal(sig, p);
@@ -1459,8 +1475,9 @@ send_group_sigqueue(int sig, struct sigq
__group_complete_signal(sig, p);
out:
spin_unlock_irqrestore(&p->sighand->siglock, flags);
+out_err:
read_unlock(&tasklist_lock);
- return(ret);
+ return ret;
}
/*
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-22 10:06 ` Thomas Gleixner
@ 2005-08-22 16:45 ` Oleg Nesterov
2005-08-23 10:13 ` Thomas Gleixner
2005-08-23 10:42 ` [PATCH] fix send_sigqueue() vs thread exit race Thomas Gleixner
0 siblings, 2 replies; 63+ messages in thread
From: Oleg Nesterov @ 2005-08-22 16:45 UTC (permalink / raw)
To: tglx
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
Thomas Gleixner wrote:
>
> It exists. It triggers on preempt-RT and I can trigger it on vanilla SMP
> by waiting for the timer expiry in release_task() before the
> __exit_signal() call. That's reasonable, as it can happen that way by
> chance too. It requires that the timer expires on a different CPU, but I
> don't see a reason why this should not happen.
Ok, exit_itimers()->itimer_delete() called when the last thread exits
or does exec.
kernel/posix-timers.c:common_timer_del() calls del_timer_sync(), after
that nobody can access this timer, so we don't need to lock timer->it_lock
at all in this case. No lock - no deadlock.
But I know nothing about kernel/posix-cpu-timers.c, I doubt it will work
for posix_cpu_timer_del(). I don't have time to study posix-cpu-timers now.
However, I see that __exit_signal() calls posix_cpu_timers_exit_xxx(), so
may be it can work?
380 int posix_cpu_timer_del(struct k_itimer *timer)
381 {
382 struct task_struct *p = timer->it.cpu.task;
383
384 if (timer->it.cpu.firing)
385 return TIMER_RETRY;
386
387 if (unlikely(p == NULL))
388 return 0;
389
390 if (!list_empty(&timer->it.cpu.entry)) {
391 read_lock(&tasklist_lock);
Surely, it should be impossible to happen when process exists, otherwise
it would deadlock immediately, we did write_lock(tasklist).
Thomas, do you know something about posix-cpu-timers.c?
Oleg.
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-22 16:45 ` Oleg Nesterov
@ 2005-08-23 10:13 ` Thomas Gleixner
2005-08-23 16:17 ` Oleg Nesterov
2005-08-23 10:42 ` [PATCH] fix send_sigqueue() vs thread exit race Thomas Gleixner
1 sibling, 1 reply; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-23 10:13 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
On Mon, 2005-08-22 at 20:45 +0400, Oleg Nesterov wrote:
> Thomas Gleixner wrote:
> Ok, exit_itimers()->itimer_delete() called when the last thread exits
> or does exec.
>
> kernel/posix-timers.c:common_timer_del() calls del_timer_sync(), after
> that nobody can access this timer, so we don't need to lock timer->it_lock
> at all in this case. No lock - no deadlock.
It still deadlocks:
CPU 0 CPU 1
write_lock(&tasklist_lock);
__exit_signal()
timer expires
base->running_timer = timer
send_group_sigqueue()
read_lock(&tasklist_lock();
exit_itimers()
del_timer_sync(timer)
waits for ever because waits for ever on tasklist_lock
base->running_timer == timer
I still think the last patch I sent is still necessary.
> But I know nothing about kernel/posix-cpu-timers.c, I doubt it will work
> for posix_cpu_timer_del(). I don't have time to study posix-cpu-timers now.
> However, I see that __exit_signal() calls posix_cpu_timers_exit_xxx(), so
> may be it can work?
>
> 380 int posix_cpu_timer_del(struct k_itimer *timer)
> 381 {
> 382 struct task_struct *p = timer->it.cpu.task;
> 383
> 384 if (timer->it.cpu.firing)
> 385 return TIMER_RETRY;
> 386
> 387 if (unlikely(p == NULL))
> 388 return 0;
> 389
> 390 if (!list_empty(&timer->it.cpu.entry)) {
> 391 read_lock(&tasklist_lock);
>
> Surely, it should be impossible to happen when process exists, otherwise
> it would deadlock immediately, we did write_lock(tasklist).
>
> Thomas, do you know something about posix-cpu-timers.c?
Not much. I look into this
tglx
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-23 10:13 ` Thomas Gleixner
@ 2005-08-23 16:17 ` Oleg Nesterov
2005-08-23 18:29 ` Thomas Gleixner
2005-09-24 13:42 ` [PATCH] fix exit_itimers() vs posix_timer_event() AB-BA deadlock Oleg Nesterov
0 siblings, 2 replies; 63+ messages in thread
From: Oleg Nesterov @ 2005-08-23 16:17 UTC (permalink / raw)
To: tglx
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
Thomas Gleixner wrote:
>
> On Mon, 2005-08-22 at 20:45 +0400, Oleg Nesterov wrote:
> >
> > kernel/posix-timers.c:common_timer_del() calls del_timer_sync(), after
> > that nobody can access this timer, so we don't need to lock timer->it_lock
> > at all in this case. No lock - no deadlock.
>
> It still deadlocks:
>
> CPU 0 CPU 1
> write_lock(&tasklist_lock);
> __exit_signal()
> timer expires
> base->running_timer = timer
> send_group_sigqueue()
> read_lock(&tasklist_lock();
> exit_itimers()
> del_timer_sync(timer)
> waits for ever because waits for ever on tasklist_lock
> base->running_timer == timer
Silly me.
> I still think the last patch I sent is still necessary.
Thomas, you know that I like this change in __exit_{signal,sighand},
but i think this change is dangerous, should go in a separate patch,
and needs a lot of testing. But the decision is up to Ingo and Roland.
I am looking at your previous patch:
> - read_lock(&tasklist_lock);
> +retry:
> + if (unlikely(p->flags & PF_EXITING))
> + return -1;
> +
> + if (unlikely(!read_trylock(&tasklist_lock))) {
> + cpu_relax();
> + goto retry;
> + }
> + if (unlikely(p->flags & PF_EXITING)) {
> + ret = -1;
> + goto out_err;
What do you think about this:
int try_to_lock_this_beep_tasklist_lock(struct task_struct *group_leader)
{
while (unlikely(!read_trylock(&tasklist_lock))) {
if (group_leader->flags & PF_EXITING) {
smp_rmb();
if (thread_group_empty(group_leader))
return 0;
}
cpu_relax();
}
return 1;
}
No need to re-check after we got tasklist, the signal will be flushed.
I think it's better to move the locking into the posix_timer_event, btw.
In that case we can drop my patch.
What is your opinion, can it work?
P.S.
Thomas, thanks for explanation about posix-cpu-timers.
Oleg.
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-23 16:17 ` Oleg Nesterov
@ 2005-08-23 18:29 ` Thomas Gleixner
2005-09-24 13:42 ` [PATCH] fix exit_itimers() vs posix_timer_event() AB-BA deadlock Oleg Nesterov
1 sibling, 0 replies; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-23 18:29 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
On Tue, 2005-08-23 at 20:17 +0400, Oleg Nesterov wrote:
> Thomas Gleixner wrote:
> > I still think the last patch I sent is still necessary.
>
> Thomas, you know that I like this change in __exit_{signal,sighand},
> but i think this change is dangerous, should go in a separate patch,
> and needs a lot of testing. But the decision is up to Ingo and Roland.
Maybe it makes more sense in context of Pauls RCU stuff.
> What do you think about this:
>
> int try_to_lock_this_beep_tasklist_lock(struct task_struct *group_leader)
> {
> while (unlikely(!read_trylock(&tasklist_lock))) {
> if (group_leader->flags & PF_EXITING) {
> smp_rmb();
> if (thread_group_empty(group_leader))
> return 0;
> }
> cpu_relax();
> }
>
> return 1;
> }
>
> No need to re-check after we got tasklist, the signal will be flushed.
Makes sense, though I'm not sure if its safe to call
thread_group_empty() without tasklist lock held. I might be in case of
PF_EXITING, but it needs a comment at least.
> I think it's better to move the locking into the posix_timer_event, btw.
Hmm, not sure. I've seen similar stuff in the AIO patches. I guess this
should be solved safe at one place rather than all around the kernel.
tglx
^ permalink raw reply [flat|nested] 63+ messages in thread* [PATCH] fix exit_itimers() vs posix_timer_event() AB-BA deadlock
2005-08-23 16:17 ` Oleg Nesterov
2005-08-23 18:29 ` Thomas Gleixner
@ 2005-09-24 13:42 ` Oleg Nesterov
2005-09-25 5:44 ` Andrew Morton
1 sibling, 1 reply; 63+ messages in thread
From: Oleg Nesterov @ 2005-09-24 13:42 UTC (permalink / raw)
To: tglx, Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
CPU_0 CPU_1
release_task: posix_timer_fn:
write_lock(tasklist); spin_lock(timer->it_lock);
exit_timers: send_sigqueue:
spin_lock(timer->it_lock) read_lock(tasklist);
Actually, it is a bit worse. If posix timer starts between tasklist
locking and del_timer_sync() call this deadlock will happen because
of TIMER_RETRY logic.
With this patch posix_timer_event() detects the exiting thread group
and aborts the sending of signal (taking tasklist_lock). To simplify
the code tasklist locking moved from send_{,group_}sigqueue to the
posix_timer_event().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
--- 2.6.14-rc2/kernel/posix-timers.c~1_DLOCK 2005-09-17 18:57:30.000000000 +0400
+++ 2.6.14-rc2/kernel/posix-timers.c 2005-09-24 18:03:05.000000000 +0400
@@ -411,6 +411,10 @@ exit:
int posix_timer_event(struct k_itimer *timr,int si_private)
{
+ struct task_struct *leader, *zombie;
+ int (*send_actor)(int, struct sigqueue *, struct task_struct *);
+ int ret;
+
memset(&timr->sigq->info, 0, sizeof(siginfo_t));
timr->sigq->info.si_sys_private = si_private;
/*
@@ -428,22 +432,40 @@ int posix_timer_event(struct k_itimer *t
timr->sigq->info.si_tid = timr->it_id;
timr->sigq->info.si_value = timr->it_sigev_value;
- if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
- struct task_struct *leader;
- int ret = send_sigqueue(timr->it_sigev_signo, timr->sigq,
- timr->it_process);
-
- if (likely(ret >= 0))
- return ret;
-
- timr->it_sigev_notify = SIGEV_SIGNAL;
- leader = timr->it_process->group_leader;
- put_task_struct(timr->it_process);
- timr->it_process = leader;
- }
-
- return send_group_sigqueue(timr->it_sigev_signo, timr->sigq,
- timr->it_process);
+ /*
+ * We are locking ->it_lock + tasklist_lock backwards
+ * from release_task()->exit_itimers(), beware deadlock.
+ */
+ leader = timr->it_process->group_leader;
+ while (unlikely(!read_trylock(&tasklist_lock))) {
+ if (leader->flags & PF_EXITING) {
+ smp_rmb();
+ if (thread_group_empty(leader))
+ return 0;
+ }
+ cpu_relax();
+ }
+
+ zombie = NULL;
+ send_actor = send_group_sigqueue;
+ if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
+ if (unlikely(timr->it_process->flags & PF_EXITING)) {
+ zombie = timr->it_process;
+ timr->it_process = leader;
+ timr->it_sigev_notify = SIGEV_SIGNAL;
+ } else
+ send_actor = send_sigqueue;
+ }
+
+ ret = send_actor(timr->it_sigev_signo, timr->sigq,
+ timr->it_process);
+
+ read_unlock(&tasklist_lock);
+
+ if (unlikely(zombie != NULL))
+ put_task_struct(zombie);
+
+ return ret;
}
EXPORT_SYMBOL_GPL(posix_timer_event);
--- 2.6.14-rc2/kernel/signal.c~1_DLOCK 2005-09-17 18:57:30.000000000 +0400
+++ 2.6.14-rc2/kernel/signal.c 2005-09-24 18:01:11.000000000 +0400
@@ -1367,22 +1367,14 @@ send_sigqueue(int sig, struct sigqueue *
int ret = 0;
BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
- read_lock(&tasklist_lock);
-
- if (unlikely(p->flags & PF_EXITING)) {
- ret = -1;
- goto out_err;
- }
-
spin_lock_irqsave(&p->sighand->siglock, flags);
if (unlikely(!list_empty(&q->list))) {
+ BUG_ON(q->info.si_code != SI_TIMER);
/*
* If an SI_TIMER entry is already queue just increment
* the overrun count.
*/
- if (q->info.si_code != SI_TIMER)
- BUG();
q->info.si_overrun++;
goto out;
}
@@ -1400,9 +1392,6 @@ send_sigqueue(int sig, struct sigqueue *
out:
spin_unlock_irqrestore(&p->sighand->siglock, flags);
-out_err:
- read_unlock(&tasklist_lock);
-
return ret;
}
@@ -1413,7 +1402,6 @@ send_group_sigqueue(int sig, struct sigq
int ret = 0;
BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
- read_lock(&tasklist_lock);
spin_lock_irqsave(&p->sighand->siglock, flags);
handle_stop_signal(sig, p);
@@ -1424,13 +1412,12 @@ send_group_sigqueue(int sig, struct sigq
}
if (unlikely(!list_empty(&q->list))) {
+ BUG_ON(q->info.si_code != SI_TIMER);
/*
* If an SI_TIMER entry is already queue just increment
* the overrun count. Other uses should not try to
* send the signal multiple times.
*/
- if (q->info.si_code != SI_TIMER)
- BUG();
q->info.si_overrun++;
goto out;
}
@@ -1447,8 +1434,7 @@ send_group_sigqueue(int sig, struct sigq
__group_complete_signal(sig, p);
out:
spin_unlock_irqrestore(&p->sighand->siglock, flags);
- read_unlock(&tasklist_lock);
- return(ret);
+ return ret;
}
/*
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH] fix exit_itimers() vs posix_timer_event() AB-BA deadlock
2005-09-24 13:42 ` [PATCH] fix exit_itimers() vs posix_timer_event() AB-BA deadlock Oleg Nesterov
@ 2005-09-25 5:44 ` Andrew Morton
2005-09-25 14:07 ` [PATCH] fix exit_itimers() vs posix_timer_event() AB-BAdeadlock Oleg Nesterov
0 siblings, 1 reply; 63+ messages in thread
From: Andrew Morton @ 2005-09-25 5:44 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: tglx, mingo, roland, george, linux-kernel, rostedt, paulmck
Oleg Nesterov <oleg@tv-sign.ru> wrote:
>
> + /*
> + * We are locking ->it_lock + tasklist_lock backwards
> + * from release_task()->exit_itimers(), beware deadlock.
> + */
> + leader = timr->it_process->group_leader;
> + while (unlikely(!read_trylock(&tasklist_lock))) {
> + if (leader->flags & PF_EXITING) {
> + smp_rmb();
> + if (thread_group_empty(leader))
> + return 0;
> + }
> + cpu_relax();
> + }
Oh dear. Is there no way to fix this up by taking the locks in the correct
order? (Whatever that is).
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH] fix exit_itimers() vs posix_timer_event() AB-BAdeadlock
2005-09-25 5:44 ` Andrew Morton
@ 2005-09-25 14:07 ` Oleg Nesterov
2005-10-23 16:50 ` Oleg Nesterov
0 siblings, 1 reply; 63+ messages in thread
From: Oleg Nesterov @ 2005-09-25 14:07 UTC (permalink / raw)
To: Andrew Morton; +Cc: tglx, mingo, roland, george, linux-kernel, rostedt, paulmck
Andrew Morton wrote:
>
> Oleg Nesterov <oleg@tv-sign.ru> wrote:
> >
> > + /*
> > + * We are locking ->it_lock + tasklist_lock backwards
> > + * from release_task()->exit_itimers(), beware deadlock.
> > + */
> > + leader = timr->it_process->group_leader;
> > + while (unlikely(!read_trylock(&tasklist_lock))) {
> > + if (leader->flags & PF_EXITING) {
> > + smp_rmb();
> > + if (thread_group_empty(leader))
> > + return 0;
> > + }
> > + cpu_relax();
> > + }
>
> Oh dear. Is there no way to fix this up by taking the locks in the correct
> order? (Whatever that is).
Yes, this is ugly and that is why I hesitated so long before posting this patch.
But I don't see the simple and correct fix. The lock ordering is not the only
problem. In fact we don't need to take the ->it_lock in the itimer_delete() at
all. The problem is that itimer_delete() can't delete k_itimer.it.real.timer
while holding tasklist_lock, because this lock will prevent (without this patch)
the completition of posix_timer_fn(). It is unsafe to unlock tasklist in __exit_signal()
before calling exit_itimers(), and it is too late to call exit_itimers() after
unlock_irq(&tasklist_lock) in release_task(), at this moment we don't have
->signal/->sighand already.
I beleive the only sane fix would be to eliminate tasklist_lock from signal
sending path (see the patches from Paul and Thomas), but this is hard and
(I think) needs more working/testing.
Oleg.
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH] fix exit_itimers() vs posix_timer_event() AB-BAdeadlock
2005-09-25 14:07 ` [PATCH] fix exit_itimers() vs posix_timer_event() AB-BAdeadlock Oleg Nesterov
@ 2005-10-23 16:50 ` Oleg Nesterov
0 siblings, 0 replies; 63+ messages in thread
From: Oleg Nesterov @ 2005-10-23 16:50 UTC (permalink / raw)
To: Andrew Morton, tglx, mingo, roland, george, linux-kernel, rostedt,
paulmck
Oleg Nesterov wrote:
>
> Andrew Morton wrote:
> >
> > Oleg Nesterov <oleg@tv-sign.ru> wrote:
> > >
> > > + /*
> > > + * We are locking ->it_lock + tasklist_lock backwards
> > > + * from release_task()->exit_itimers(), beware deadlock.
> > > + */
> > > + leader = timr->it_process->group_leader;
> > > + while (unlikely(!read_trylock(&tasklist_lock))) {
> > > + if (leader->flags & PF_EXITING) {
> > > + smp_rmb();
> > > + if (thread_group_empty(leader))
> > > + return 0;
> > > + }
> > > + cpu_relax();
> > > + }
> >
> > Oh dear. Is there no way to fix this up by taking the locks in the correct
> > order? (Whatever that is).
Andrew, please drop this patch. It is obsoleted by Roland's
"[PATCH] Call exit_itimers from do_exit, not __exit_signal".
Oleg.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] fix send_sigqueue() vs thread exit race
2005-08-22 16:45 ` Oleg Nesterov
2005-08-23 10:13 ` Thomas Gleixner
@ 2005-08-23 10:42 ` Thomas Gleixner
1 sibling, 0 replies; 63+ messages in thread
From: Thomas Gleixner @ 2005-08-23 10:42 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Roland McGrath, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney, Andrew Morton
On Mon, 2005-08-22 at 20:45 +0400, Oleg Nesterov wrote:
> But I know nothing about kernel/posix-cpu-timers.c, I doubt it will work
> for posix_cpu_timer_del(). I don't have time to study posix-cpu-timers now.
> However, I see that __exit_signal() calls posix_cpu_timers_exit_xxx(), so
> may be it can work?
timer->it.cpu.task is set to NULL by posix_cpu_timers_exit(), so the
code in posix_cpu_timer_del returns before accessing tasklist_lock.
The exit functions do not take any locks, but it is not necessary
there.
posix_run_cpu_timers(p) is called with p=current() and we have
interrupts disabled, so the timer interrupt can not run on this CPU. The
current exiting process can not run at the same time on a different CPU,
so no race and lockup possible here.
tglx
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
2005-08-19 23:48 ` [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment Thomas Gleixner
` (2 preceding siblings ...)
2005-08-20 16:58 ` [PATCH] fix send_sigqueue() vs thread exit race Oleg Nesterov
@ 2005-08-22 7:38 ` Ingo Molnar
2005-08-22 7:41 ` Ingo Molnar
3 siblings, 1 reply; 63+ messages in thread
From: Ingo Molnar @ 2005-08-22 7:38 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Roland McGrath, Oleg Nesterov, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney
* Thomas Gleixner <tglx@linutronix.de> wrote:
> Hi all,
>
> On Thu, 2005-08-18 at 08:01 +0200, Ingo Molnar wrote:
> > i have released the 2.6.13-rc6-rt9 tree, which can be downloaded from
> > the usual place:
>
> I reworked the code for dynamically setting the priority of the
> hrtimer softirq to be aware of PI.
looks mostly good to me. I'm uneasy about the fact that we touch
p->normal_prio without taking the runqueue lock. Best would be to clean
these things up by introducing a wake_up_process_prio() and
wake_up_process_chprio() method that does these things embedded in
try_to_wake_up().
> I stumbled over "//#define MUTEX_IRQS_OFF" in the first attempt. My
> assumption that all code protected by pi_lock (which is a raw lock) is
> irq save turned out to be wrong. I missed that commented define :( I
> guess it was introduced during the "IRQ latency contest" to squeeze
> out the last nsecs :)
>
> Switching it back on is not really influencing system performance in a
> measurable way, but it allows to use the pi aware boosting function in
> irq context.
>
> Quite contrary it makes the system more snappy and the overall test
> latencies go down.
we can undo that flag - it's indeed only a couple of cycles worth of
optimization, which wont count for most workloads. I've applied your
patch, but we need to do those cleanups and the fact needs to be
documented that !MUTEX_IRQS_OFF is not safe anymore. (most likely this
means that the MUTEX_IRQS_OFF flag and all related changes needs to be
gotten rid of)
Ingo
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
2005-08-22 7:38 ` [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment Ingo Molnar
@ 2005-08-22 7:41 ` Ingo Molnar
0 siblings, 0 replies; 63+ messages in thread
From: Ingo Molnar @ 2005-08-22 7:41 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Roland McGrath, Oleg Nesterov, George Anzinger, linux-kernel,
Steven Rostedt, Paul E. McKenney
* Ingo Molnar <mingo@elte.hu> wrote:
> > Quite contrary it makes the system more snappy and the overall test
> > latencies go down.
>
> we can undo that flag - it's indeed only a couple of cycles worth of
> optimization, which wont count for most workloads. I've applied your
> patch, but we need to do those cleanups and the fact needs to be
> documented that !MUTEX_IRQS_OFF is not safe anymore. (most likely this
> means that the MUTEX_IRQS_OFF flag and all related changes needs to be
> gotten rid of)
your patch gets rid of the flag, but not of all side-effects: e.g.
trace_local_irq_enable(ti) only takes a 'ti' parameter for
!MUTEX_IRQS_OFF - which does not exist anymore.
Ingo
^ permalink raw reply [flat|nested] 63+ messages in thread