public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* 2.6.19-rc6-rt4, changed yum repository
@ 2006-11-18 16:30 Ingo Molnar
  2006-11-19 14:39 ` Karsten Wiese
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Ingo Molnar @ 2006-11-18 16:30 UTC (permalink / raw)
  To: linux-kernel

i've released the 2.6.18-rc6-rt4 tree, which can be downloaded from the 
usual place:

  http://redhat.com/~mingo/realtime-preempt/

NOTE: the YUM repository has changed the -rt kernel's package name, it's 
now kernel-rt, so it does not override the kernel package. If you have 
rt.repo already then just do a "yum install kernel-rt".

If there's no rt repository yet, the -rt YUM repository for Fedora Core 
6 can be activated via:

   cd /etc/yum.repos.d
   wget http://people.redhat.com/~mingo/realtime-preempt/rt.repo
   yum install kernel-rt

a number of fixes were done since -rt3, and merges of Linus' latest -git 
tree.

to build a 2.6.19-rc6-rt4 tree, the following patches should be applied:

  http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.18.tar.bz2
  http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.19-rc6.bz2
  http://redhat.com/~mingo/realtime-preempt/patch-2.6.19-rc6-rt4

as usual, bugreports, fixes and suggestions are welcome,

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.19-rc6-rt4, changed yum repository
  2006-11-19 14:39 ` Karsten Wiese
@ 2006-11-19 13:43   ` Ingo Molnar
  2006-11-19 20:56     ` Karsten Wiese
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2006-11-19 13:43 UTC (permalink / raw)
  To: Karsten Wiese; +Cc: linux-kernel


* Karsten Wiese <fzu@wemgehoertderstaat.de> wrote:

> work_resched:
> 	DISABLE_INTERRUPTS
> 	call __schedule
> 					# make sure we don't miss an interrupt
> 					# setting need_resched or sigpending
> 					# between sampling and the iret
> 	movl TI_flags(%ebp), %ecx
> 	andl $_TIF_WORK_MASK, %ecx	# is there any work to be done other
> 					# than syscall tracing?
> 	jz restore_all
> 	testl $(_TIF_NEED_RESCHED|_TIF_NEED_RESCHED_DELAYED), %ecx
> 	jnz work_resched
> 
> The hwclock page_fault happens at the
>  	movl TI_flags(%ebp), %ecx
> line.

hm, weird - maybe something corrupts %ebp here? Could you try to add 
this to before the faulting instruction:

	GET_THREAD_INFO(%ebp)

this will make sure %ebp has the right contents.

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.19-rc6-rt4, changed yum repository
  2006-11-18 16:30 2.6.19-rc6-rt4, changed yum repository Ingo Molnar
@ 2006-11-19 14:39 ` Karsten Wiese
  2006-11-19 13:43   ` Ingo Molnar
  2006-11-19 14:42 ` Robert Schwebel
  2006-11-20 14:14 ` Sergio Monteiro Basto
  2 siblings, 1 reply; 8+ messages in thread
From: Karsten Wiese @ 2006-11-19 14:39 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Hi,

Boot bugs happening here on fc6 running locally compiled 2.6.19-rc-rt
UP i386 kernels on a K8.
They happen on some boots, some are ok.
If bug happens, boot seams to stop after the
"for interactive setup , press 'I'" sort of message
and continues after I enter control+c.
Next message after that is about the time having been set.
If bug happened, the time is incorrectly set mostly to 3600s
in the future.

This is the last part of SysRq+T right after such a bug happened:

 =======================
init          S [f7e6d000] F7EB1F60     0   296      1   297           286 (NOTLB)
       f7eb1f28 00000086 c014928e f7eb1f60 f7fb07b8 c191d6c4 f7eeb1a0 c191d600
       00000007 f7e6d12c f7e6d000 d4b9b4a7 00000007 0000262b c014a6ab f6867568
       00018a21 00000000 f7fc1000 00000001 f7eb1f88 f7eb1f40 c02ca94b f7feee00
Call Trace:
 [<c014928e>] do_wp_page+0x34a/0x395
 [<c014a6ab>] __handle_mm_fault+0x7b9/0x7db
 [<c02ca94b>] schedule+0xcd/0xe9
 [<c0118903>] eligible_child+0x9f/0xb0
 [<c0119d53>] do_wait+0x9f1/0xacd
 [<c0114409>] default_wake_function+0x0/0x16
 [<c0119e60>] sys_wait4+0x31/0x34
 [<c0119e8a>] sys_waitpid+0x27/0x2b
 [<c0102e49>] sysenter_past_esp+0x56/0x79
 =======================
rc.sysinit    S [f7fc1000] FF889000     0   297    296   322               (NOTLB)
       f7e76f28 00000086 c01492cf ff889000 08113fcc f6873860 f7fee600 c17f5760
       00000006 f7fc112c f7fc1000 dd1e3416 00000008 0000df7d c014a642 f687744c
       011543fb 00000000 f7e7baa0 00000001 f7e76f88 f7e76f40 c02ca94b f7fee600
Call Trace:
 [<c01492cf>] do_wp_page+0x38b/0x395
 [<c014a642>] __handle_mm_fault+0x750/0x7db
 [<c02ca94b>] schedule+0xcd/0xe9
 [<c0118903>] eligible_child+0x9f/0xb0
 [<c0119d53>] do_wait+0x9f1/0xacd
 [<c012fd91>] rt_down+0xe/0x25
 [<c0114409>] default_wake_function+0x0/0x16
 [<c0119e60>] sys_wait4+0x31/0x34
 [<c0119e8a>] sys_waitpid+0x27/0x2b
 [<c0102e49>] sysenter_past_esp+0x56/0x79
 =======================
hwclock       R [f7e7baa0] F6873224 [on rq #0]     0   322    297                     (NOTLB)
       c1902fb4 00003086 f7fee434 f6873224 f7fee434 c02cd498 00000000 00000000
       00000001 f7e7bbcc f7e7baa0 795f1ab5 0000000d 003cda22 c02e8fc2 00000004
       4e8951e9 00000002 00240384 00000000 00000000 c1902000 c0102f22 00240384
Call Trace:
 [<c02cd498>] do_page_fault+0x2b9/0x552
 [<c0102f22>] work_resched+0x6/0x19
 =======================

work_resched sits in entry.S:

work_resched:
	DISABLE_INTERRUPTS
	call __schedule
					# make sure we don't miss an interrupt
					# setting need_resched or sigpending
					# between sampling and the iret
	movl TI_flags(%ebp), %ecx
	andl $_TIF_WORK_MASK, %ecx	# is there any work to be done other
					# than syscall tracing?
	jz restore_all
	testl $(_TIF_NEED_RESCHED|_TIF_NEED_RESCHED_DELAYED), %ecx
	jnz work_resched

The hwclock page_fault happens at the
 	movl TI_flags(%ebp), %ecx
line.

Will try the yum repo kernel next.

      Karsten


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.19-rc6-rt4, changed yum repository
  2006-11-18 16:30 2.6.19-rc6-rt4, changed yum repository Ingo Molnar
  2006-11-19 14:39 ` Karsten Wiese
@ 2006-11-19 14:42 ` Robert Schwebel
  2006-11-20 14:14 ` Sergio Monteiro Basto
  2 siblings, 0 replies; 8+ messages in thread
From: Robert Schwebel @ 2006-11-19 14:42 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Ingo,

On Sat, Nov 18, 2006 at 05:30:32PM +0100, Ingo Molnar wrote:
> i've released the 2.6.18-rc6-rt4 tree, which can be downloaded from
> the usual place:

I have problems runinng cyclictest on later -rt kernels, up to -rt4.

This is a Dell Latitude D610, with Pentium M & ICH6 chipset.

Symptoms: 

- "cyclictest -n -p 90 -t 4" is just "running upwards"

  With 2.6.18-rt5 it looked like this, much better:

  isonoe:/home/rsc/tmp/rtpreempt/cyclictest-0.11# ./cyclictest -n -t 4 -p 90 -l 1000
  1.51 0.99 0.41 1/117 9885          
  
  T: 0 ( 9802) P:90 I:    1000 C:    1000 Min:       7 Act:      11 Avg:      16 Max:     561
  T: 1 ( 9803) P:89 I:    1500 C:     671 Min:       7 Act:      11 Avg:      15 Max:     498
  T: 2 ( 9804) P:88 I:    2000 C:     504 Min:       7 Act:      11 Avg:      16 Max:     533
  T: 3 ( 9805) P:87 I:    2500 C:     403 Min:       6 Act:      11 Avg:      13 Max:     394

- Using a relative timer (-r), I get huge latencies:

  isonoe:/home/rsc/tmp/rtpreempt/cyclictest-0.11# ./cyclictest -n -p 90 -t 4 -r -l 1000
  0.47 0.76 0.61 3/128 16112           
  
  T: 0 (15901) P:90 I:    1000 C:    1000 Min:    2963 Act:    2990 Avg:    4334 Max:    7039
  T: 1 (15902) P:89 I:    1500 C:     667 Min:    4003 Act:    6495 Avg:    6492 Max:    6541
  T: 2 (15903) P:88 I:    2000 C:     667 Min:    3498 Act:    5995 Avg:    5993 Max:    6041
  T: 3 (15904) P:87 I:    2500 C:     667 Min:    5433 Act:    5488 Avg:    5498 Max:    6898

This is with "lapic lapictimer" on the kernel commandline. I've manually
switched the clocksource from pit to tsc:

isonoe:/sys/devices/system/clocksource/clocksource0# cat available_clocksource 
jiffies tsc pit 
isonoe:/sys/devices/system/clocksource/clocksource0# cat current_clocksource 
tsc 

Some related dmesg output:

Kernel command line: root=/dev/sda7 ro lapic lapictimer
Found and enabled local APIC!
mapped APIC to ffffd000 (fee00000)
...
WARNING: experimental RCU implementation.
Clock event device pit configured with caps set: 07
...
Clock event device pit new caps set: 03
Clock event device lapic configured with caps set: 04
...
Time: pit clocksource has been installed.
...
*****************************************************************************
*                                                                           *
*  REMINDER, the following debugging options are turned on in your .config: *
*                                                                           *
*        CONFIG_CRITICAL_PREEMPT_TIMING                                     *
*        CONFIG_CRITICAL_IRQSOFF_TIMING                                     *
*        CONFIG_FUNCTION_TRACE                                              *
*                                                                           *
*  they may increase runtime overhead and latencies.                        *
*                                                                           *
*****************************************************************************

Last time this machine worked was with 2.6.18-rt5, 2.6.18-rt7 didn't work any
more. If it is helpful, I can do some further boot tests to find out which
patch broke it. 

Robert
-- 
 Dipl.-Ing. Robert Schwebel | http://www.pengutronix.de
 Pengutronix - Linux Solutions for Science and Industry
   Handelsregister:  Amtsgericht Hildesheim, HRA 2686
     Hannoversche Str. 2, 31134 Hildesheim, Germany
   Phone: +49-5121-206917-0 |  Fax: +49-5121-206917-9


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.19-rc6-rt4, changed yum repository
  2006-11-19 13:43   ` Ingo Molnar
@ 2006-11-19 20:56     ` Karsten Wiese
  2006-11-19 21:14       ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Karsten Wiese @ 2006-11-19 20:56 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Am Sonntag, 19. November 2006 14:43 schrieb Ingo Molnar:
> 
> * Karsten Wiese <fzu@wemgehoertderstaat.de> wrote:
> 
> > work_resched:
> > 	DISABLE_INTERRUPTS
> > 	call __schedule
> > 					# make sure we don't miss an interrupt
> > 					# setting need_resched or sigpending
> > 					# between sampling and the iret
> > 	movl TI_flags(%ebp), %ecx
> > 	andl $_TIF_WORK_MASK, %ecx	# is there any work to be done other
> > 					# than syscall tracing?
> > 	jz restore_all
> > 	testl $(_TIF_NEED_RESCHED|_TIF_NEED_RESCHED_DELAYED), %ecx
> > 	jnz work_resched
> > 
> > The hwclock page_fault happens at the
> >  	movl TI_flags(%ebp), %ecx
> > line.
> 
> hm, weird - maybe something corrupts %ebp here? Could you try to add 
> this to before the faulting instruction:
> 
> 	GET_THREAD_INFO(%ebp)
> 
> this will make sure %ebp has the right contents.

Doesn't make a difference:
clock is still set an hour early during boot occasionally.
An hour offset I also get when I comment out the hwclock call in
rc.sysinit.

The Sysrq+T output with GET_THREAD_INFO(%ebp) has:
  =======================
 hwclock       R [f7f76550] C1B07224 [on rq #0]     0   329    304                     (NOTLB)
        f7f6efb4 00003086 c1907434 c1b07224 c1907434 c02d320f 00000000 00000000
        00000001 f7f7667c f7f76550 ad91991e 00000008 001349cd c02ef1fe 00000004
        d1292e17 00000000 000cc113 00000000 00000000 f7f6e000 c0102f22 000cc113
 Call Trace:
  [<c02d320f>] do_page_fault+0x2b9/0x552
  [<c0102f22>] work_resched+0x6/0x20
  =======================

The  [<c0102f22>] work_resched+0x6/0x20 corresponds to
	mov    $0xfffff000,%ebp
in:
(gdb) disassemble work_resched
Dump of assembler code for function work_resched:
0x000001c0 <work_resched+0>:    cli
0x000001c1 <work_resched+1>:    call   0x1c2 <work_resched+2>
0x000001c6 <work_resched+6>:    mov    $0xfffff000,%ebp
0x000001cb <work_resched+11>:   and    %esp,%ebp
0x000001cd <work_resched+13>:   mov    0x8(%ebp),%ecx
0x000001d0 <work_resched+16>:   and    $0xfe3e,%ecx
0x000001d6 <work_resched+22>:   je     0x16a <restore_all>
0x000001d8 <work_resched+24>:   test   $0x80008,%ecx
0x000001de <work_resched+30>:   jne    0x1c0 <work_resched>
End of assembler dump.

But "mov    $0xfffff000,%ebp" can't cause a pagefault.
So either the Sysrq+T output is wrong or the actual page_fault happens
inside the "call __schedule" with __schedule missing from the
Call Trace.

Your yum-repo kernel seams to stay clear of above problem.
Obvious differences to my .config are SMP <> UP
and M686+X86_GENERIC <> MK8.

      Karsten



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.19-rc6-rt4, changed yum repository
  2006-11-19 20:56     ` Karsten Wiese
@ 2006-11-19 21:14       ` Ingo Molnar
  2006-11-21  0:04         ` Karsten Wiese
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2006-11-19 21:14 UTC (permalink / raw)
  To: Karsten Wiese; +Cc: linux-kernel


* Karsten Wiese <fzu@wemgehoertderstaat.de> wrote:

>  Call Trace:
>   [<c02d320f>] do_page_fault+0x2b9/0x552
>   [<c0102f22>] work_resched+0x6/0x20

> The  [<c0102f22>] work_resched+0x6/0x20 corresponds to
> 	mov    $0xfffff000,%ebp

> 0x000001c1 <work_resched+1>:    call   0x1c2 <work_resched+2>
> 0x000001c6 <work_resched+6>:    mov    $0xfffff000,%ebp

no, it's the call's return address that is work_resched+6.

to get a more usable snapshot of what this task is doing you'd need 
something like SysRq-P output. (that works on PREEMPT_RT only if you 
enable /proc/sys/kernel/debug_direct_keyboard - but careful, it might 
break if you generate too many interrupts - i usually only to do the 
SysRq-P and hope that it doesnt break then.)

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.19-rc6-rt4, changed yum repository
  2006-11-18 16:30 2.6.19-rc6-rt4, changed yum repository Ingo Molnar
  2006-11-19 14:39 ` Karsten Wiese
  2006-11-19 14:42 ` Robert Schwebel
@ 2006-11-20 14:14 ` Sergio Monteiro Basto
  2 siblings, 0 replies; 8+ messages in thread
From: Sergio Monteiro Basto @ 2006-11-20 14:14 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Sat, 2006-11-18 at 17:30 +0100, Ingo Molnar wrote:
> i've released the 2.6.18-rc6-rt4 tree, which can be downloaded from the 
> usual place:
> 
>   http://redhat.com/~mingo/realtime-preempt/
> 
> NOTE: the YUM repository has changed the -rt kernel's package name, it's 
> now kernel-rt, so it does not override the kernel package. If you have 
> rt.repo already then just do a "yum install kernel-rt".
> 
> If there's no rt repository yet, the -rt YUM repository for Fedora Core 
> 6 can be activated via:

Can you provide kernel-rt-devel for compile nvidia DRI kernel module ? 

I had tried 2.6.19-rc6-rt4, and no regressions from 2.6.18+dyntick.

but I still can freeze computer when make heavy copy to SATA HD. 
Thanks,

Sérgio M. B. 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.19-rc6-rt4, changed yum repository
  2006-11-19 21:14       ` Ingo Molnar
@ 2006-11-21  0:04         ` Karsten Wiese
  0 siblings, 0 replies; 8+ messages in thread
From: Karsten Wiese @ 2006-11-21  0:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Am Sonntag, 19. November 2006 22:14 schrieb Ingo Molnar:
> 
> * Karsten Wiese <fzu@wemgehoertderstaat.de> wrote:
> 
> >  Call Trace:
> >   [<c02d320f>] do_page_fault+0x2b9/0x552
> >   [<c0102f22>] work_resched+0x6/0x20
> 
> > The  [<c0102f22>] work_resched+0x6/0x20 corresponds to
> > 	mov    $0xfffff000,%ebp
> 
> > 0x000001c1 <work_resched+1>:    call   0x1c2 <work_resched+2>
> > 0x000001c6 <work_resched+6>:    mov    $0xfffff000,%ebp
> 
> no, it's the call's return address that is work_resched+6.
> 
> to get a more usable snapshot of what this task is doing you'd need 
> something like SysRq-P output. (that works on PREEMPT_RT only if you 
> enable /proc/sys/kernel/debug_direct_keyboard - but careful, it might 
> break if you generate too many interrupts - i usually only to do the 
> SysRq-P and hope that it doesnt break then.)

Thanks. It turned out, it was me having missed to .config enable
	"Enhanced Real Time Clock Support"
. Sorry for having bothered you about this.
hwclock then couldn't open /dev/rtc, fell back to iopl(3) hw access.
And that failed sometimes ;-)

I'd still like to know, why are there the do_page_fault() SysRq+T
traces under hwclock context, while hwclock userspace is in a loop
doing iopl(3)ed io-access? 

hwclock's loop active then is:
	for (i = 0; !cmos_clock_busy(); i++)
		if (i >= 10000000)
			return 1;
with:
	int cmos_clock_busy()
	{
		return   /* poll bit 7 (UIP) of Control Register A */
	    (hclock_read(10) & 0x80);
	}
with hand interpreted:
	unsigned long hclock_read(unsigned long reg)
	{
		unsigned long v;
		__asm__ volatile ("cli");
		outb (reg, clock_ctl_addr);
		 v = inb (clock_data_addr);
		__asm__ volatile ("sti");
		return v;
	}

I guess:
SysRq+T doesn't care about userspace eip and just displays hwclock's
last used kernel stack?

      Karsten


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-11-21  0:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-11-18 16:30 2.6.19-rc6-rt4, changed yum repository Ingo Molnar
2006-11-19 14:39 ` Karsten Wiese
2006-11-19 13:43   ` Ingo Molnar
2006-11-19 20:56     ` Karsten Wiese
2006-11-19 21:14       ` Ingo Molnar
2006-11-21  0:04         ` Karsten Wiese
2006-11-19 14:42 ` Robert Schwebel
2006-11-20 14:14 ` Sergio Monteiro Basto

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox