* 2.6.23-rc9-rt1
@ 2007-10-02 16:11 Steven Rostedt
  2007-10-02 16:47 ` 2.6.23-rc9-rt1 Steven Rostedt
  2007-10-03  1:02 ` 2.6.23-rc9-rt1 Clark Williams
  0 siblings, 2 replies; 6+ messages in thread
From: Steven Rostedt @ 2007-10-02 16:11 UTC (permalink / raw)
  To: LKML, RT
  Cc: Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Daniel Walker,
	Peter Zijlstra

We are pleased to announce the 2.6.23-rc9-rt1 tree, which can be
downloaded from the new location:

 http://www.kernel.org/pub/linux/kernel/projects/rt/

Changes since 2.6.23-rc9-rt1

  - update to 2.6.23-rc9

  - Various cleanups (Daniel Walker)
     - convert PICK_OP to PICK_FUNCTION
     - use now() in latency_tracer
     - have preempt_max_latency in all modes
     - latency_hist resetting
     - Stop critical timing in idle.

  - Deadlock fix in locked list primitives (Peter Zijlstra)

  - RT task wakeup fix (Ingo Molnar)

  - New Preempt RCU implementation (Paul E. McKenney and Steven Rostedt)

  That last change (new Preempt RCU) is highly experimental!!!
  We are currently testing it now, although it has been through some
  minor tests already, we haven't declared it stable yet.

  This new implementation might shave your cat, eat your dog and
  make your children miss the bus and be late for school.
  You have been warned! As is said many times on this list
  "If it breaks, you get to keep the pieces". So don't come crying
  to us if something terrible happens, but please let us know
  so that we can try to fix what broke.

  That said, please test it as much as possible. We are happy with
  the new implementation, but it's still young, and we want to
  shake out the problems so it can be pushed up into mainline.


To build a 2.6.23-rc9-rt1 tree, unpack the tarball and apply the two
patches, in the order listed:

  http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.22.tar.bz2
  http://www.kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.23-rc9.bz2
  http://www.kernel.org/pub/linux/kernel/projects/rt/patch-2.6.23-rc9-rt1.bz2

The broken out patches are also available.

-- Steve

* Re: 2.6.23-rc9-rt1
  2007-10-02 16:11 2.6.23-rc9-rt1 Steven Rostedt
@ 2007-10-02 16:47 ` Steven Rostedt
  2007-10-03  1:02 ` 2.6.23-rc9-rt1 Clark Williams
  1 sibling, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2007-10-02 16:47 UTC (permalink / raw)
  To: LKML, RT
  Cc: Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Daniel Walker,
	Peter Zijlstra


--

On Tue, 2 Oct 2007, Steven Rostedt wrote:

> We are pleased to announce the 2.6.23-rc9-rt1 tree, which can be
> downloaded from the new location:
>
>  http://www.kernel.org/pub/linux/kernel/projects/rt/
>
> Changes since 2.6.23-rc9-rt1

Note, that should have been "Changes since 2.6.23-rc8-rt1".

-- Steve

* Re: 2.6.23-rc9-rt1
  2007-10-02 16:11 2.6.23-rc9-rt1 Steven Rostedt
  2007-10-02 16:47 ` 2.6.23-rc9-rt1 Steven Rostedt
@ 2007-10-03  1:02 ` Clark Williams
  2007-10-03  1:58   ` 2.6.23-rc9-rt1 Steven Rostedt
  1 sibling, 1 reply; 6+ messages in thread
From: Clark Williams @ 2007-10-03  1:02 UTC (permalink / raw)
  To: Steven Rostedt, Paul E. McKenney; +Cc: RT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Steven Rostedt wrote:
>   That last change (new Preempt RCU) is highly experimental!!!
>   We are currently testing it now, although it has been through some
>   minor tests already, we haven't declared it stable yet.
> 
>   This new implementation might shave your cat, eat your dog and
>   make your children miss the bus and be late for school.
>   You have been warned! As is said many times on this list
>   "If it breaks, you get to keep the pieces". So don't come crying
>   to us if something terrible happens, but please let us know
>   so that we can try to fix what broke.
> 
>   That said, please test it as much as possible. We are happy with
>   the new implementation, but it's still young, and we want to
>   shake out the problems so it can be pushed up into mainline.
> 

Luckily, I'm currently catless, dogless and childless, so no harm done :)

I'm running this kernel on a Thinkpad T60 (Core2 Duo, x86_64). When I suspended to
RAM and then resumed, my syslog window started scrolling the following:

Oct  2 19:45:46 localhost kernel:  [<ffffffff8020ac90>] mwait_idle+0x0/0x5f
Oct  2 19:45:46 localhost kernel:  [<ffffffff8020ac05>] cpu_idle+0xc7/0xee
Oct  2 19:45:46 localhost kernel:  [<ffffffff8021c6f3>] start_secondary+0x2e4/0x2f5
Oct  2 19:45:46 localhost kernel:
Oct  2 19:45:46 localhost kernel: WARNING: at include/linux/rcupreempt.h:91 rcu_enter_nohz()
Oct  2 19:45:46 localhost kernel:
Oct  2 19:45:46 localhost kernel: Call Trace:
Oct  2 19:45:46 localhost kernel:  [<ffffffff80254a0b>] tick_nohz_stop_sched_tick+0x1c6/0x2aa
Oct  2 19:45:46 localhost kernel:  [<ffffffff8020ac90>] mwait_idle+0x0/0x5f
Oct  2 19:45:46 localhost gpm[2095]: *** err [gpm.c(529)]:
Oct  2 19:45:46 localhost gpm[2095]: select(): Interrupted system call
Oct  2 19:45:46 localhost kernel:  [<ffffffff8020ab80>] cpu_idle+0x42/0xee
Oct  2 19:45:46 localhost kernel:  [<ffffffff8021c6f3>] start_secondary+0x2e4/0x2f5
Oct  2 19:45:46 localhost kernel:
Oct  2 19:45:46 localhost kernel: WARNING: at include/linux/rcupreempt.h:99 rcu_exit_nohz()
Oct  2 19:45:46 localhost kernel:
Oct  2 19:45:46 localhost kernel: Call Trace:
Oct  2 19:45:46 localhost kernel:  [<ffffffff80254b71>] tick_nohz_restart_sched_tick+0x82/0x17d
Oct  2 19:45:46 localhost kernel:  [<ffffffff8020ac90>] mwait_idle+0x0/0x5f
Oct  2 19:45:46 localhost kernel:  [<ffffffff8020ac05>] cpu_idle+0xc7/0xee
Oct  2 19:45:46 localhost kernel:  [<ffffffff8021c6f3>] start_secondary+0x2e4/0x2f5
Oct  2 19:45:46 localhost kernel:
Oct  2 19:45:46 localhost kernel: WARNING: at include/linux/rcupreempt.h:91 rcu_enter_nohz()
Oct  2 19:45:46 localhost kernel:
Oct  2 19:45:46 localhost kernel: Call Trace:
Oct  2 19:45:46 localhost kernel:  [<ffffffff80254a0b>] tick_nohz_stop_sched_tick+0x1c6/0x2aa
Oct  2 19:45:46 localhost kernel:  [<ffffffff8020ac90>] mwait_idle+0x0/0x5f
Oct  2 19:45:46 localhost kernel:  [<ffffffff8020ab80>] cpu_idle+0x42/0xee
Oct  2 19:45:46 localhost kernel:  [<ffffffff8021c6f3>] start_secondary+0x2e4/0x2f5
Oct  2 19:45:46 localhost kernel:
Oct  2 19:45:46 localhost kernel: WARNING: at include/linux/rcupreempt.h:99 rcu_exit_nohz()


Ad infinitum. Not sure what you're looking for to be cleared in the enter and exit
functions, but it doesn't look like it's happening after a resume. Didn't seem to
affect the behavior of the kernel, since the network came up and I was able to
function normally (or as normally as I can function).

Clark
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFHAuoxHyuj/+TTEp0RAtwQAJ41+q49fJXmKB9+WKdphDFc/iUcyACeOgGF
5JV0gQb+fFuCf2MDjMuTyVA=
=pSNr
-----END PGP SIGNATURE-----

* Re: 2.6.23-rc9-rt1
  2007-10-03  1:02 ` 2.6.23-rc9-rt1 Clark Williams
@ 2007-10-03  1:58   ` Steven Rostedt
  2007-10-03 13:46     ` 2.6.23-rc9-rt1 Clark Williams
  0 siblings, 1 reply; 6+ messages in thread
From: Steven Rostedt @ 2007-10-03  1:58 UTC (permalink / raw)
  To: Clark Williams; +Cc: Paul E. McKenney, RT


--
On Tue, 2 Oct 2007, Clark Williams wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Steven Rostedt wrote:
> >   That last change (new Preempt RCU) is highly experimental!!!
> >   We are currently testing it now, although it has been through some
> >   minor tests already, we haven't declared it stable yet.
> >
> >   This new implementation might shave your cat, eat your dog and
> >   make your children miss the bus and be late for school.
> >   You have been warned! As is said many times on this list
> >   "If it breaks, you get to keep the pieces". So don't come crying
> >   to us if something terrible happens, but please let us know
> >   so that we can try to fix what broke.
> >
> >   That said, please test it as much as possible. We are happy with
> >   the new implementation, but it's still young, and we want to
> >   shake out the problems so it can be pushed up into mainline.
> >
>
> Luckily, I'm currently catless, dogless and childless, so no harm done :)

Great! So you are the perfect tester for us!

>
> I'm running this kernel on a Thinkpad T60 (Core2 Duo, x86_64). When I suspended to
> RAM and then resumed, my syslog window started scrolling the following:
>
> Oct  2 19:45:46 localhost kernel:  [<ffffffff8020ac90>] mwait_idle+0x0/0x5f
> Oct  2 19:45:46 localhost kernel:  [<ffffffff8020ac05>] cpu_idle+0xc7/0xee
> Oct  2 19:45:46 localhost kernel:  [<ffffffff8021c6f3>] start_secondary+0x2e4/0x2f5
> Oct  2 19:45:46 localhost kernel:
> Oct  2 19:45:46 localhost kernel: WARNING: at include/linux/rcupreempt.h:91 rcu_enter_nohz()
> Oct  2 19:45:46 localhost kernel:
> Oct  2 19:45:46 localhost kernel: Call Trace:
> Oct  2 19:45:46 localhost kernel:  [<ffffffff80254a0b>] tick_nohz_stop_sched_tick+0x1c6/0x2aa
> Oct  2 19:45:46 localhost kernel:  [<ffffffff8020ac90>] mwait_idle+0x0/0x5f
> Oct  2 19:45:46 localhost gpm[2095]: *** err [gpm.c(529)]:
> Oct  2 19:45:46 localhost gpm[2095]: select(): Interrupted system call
> Oct  2 19:45:46 localhost kernel:  [<ffffffff8020ab80>] cpu_idle+0x42/0xee
> Oct  2 19:45:46 localhost kernel:  [<ffffffff8021c6f3>] start_secondary+0x2e4/0x2f5
> Oct  2 19:45:46 localhost kernel:
> Oct  2 19:45:46 localhost kernel: WARNING: at include/linux/rcupreempt.h:99 rcu_exit_nohz()

grmbl grmbl!!!

We are missing a match somewhere. Most likely in the suspend or resume
code.  It's expected that if a CPU is idle with no ticks then the
dynticks_progress_counter is even, otherwise it is odd. This check tells
us that, in your case, this isn't the case. Which _is_ bad, and I wouldn't
run it too long that way. It means that you can be getting false RCU grace
period ends, which is not a good thing.

I could put a hack in that fixes the issue when detected, and still prints
out a warning. I'll do that for now, until we find the problem area. I
think the first warning probably had the one that corrupted us, and then
we got flooded with warnings because we never fixed the situation.

Patch coming soon.

-- Steve
>
> Ad infinitum. Not sure what you're looking for to be cleared in the enter and exit
> functions, but it doesn't look like it's happening after a resume. Didn't seem to
> affect the behavior of the kernel, since the network came up and I was able to
> function normally (or as normally as I can function).

I'd reboot if I were you ;-)

-- Steve

* Re: 2.6.23-rc9-rt1
  2007-10-03  1:58   ` 2.6.23-rc9-rt1 Steven Rostedt
@ 2007-10-03 13:46     ` Clark Williams
  2007-10-03 17:28       ` 2.6.23-rc9-rt1 Steven Rostedt
  0 siblings, 1 reply; 6+ messages in thread
From: Clark Williams @ 2007-10-03 13:46 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Paul E. McKenney, RT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Steven Rostedt wrote:
> 
> grmbl grmbl!!!
> 
> We are missing a match somewhere. Most likely in the suspend or resume
> code.  It's expected that if a CPU is idle with no ticks then the
> dynticks_progress_counter is even, otherwise it is odd. This check tells
> us that, in your case, this isn't the case. Which _is_ bad, and I wouldn't
> run it too long that way. It means that you can be getting false RCU grace
> period ends, which is not a good thing.
> 
> I could put a hack in that fixes the issue when detected, and still prints
> out a warning. I'll do that for now, until we find the problem area. I
> think the first warning probably had the one that corrupted us, and then
> we got flooded with warnings because we never fixed the situation.
> 
> Patch coming soon.
> 
> -- Steve
>> Ad infinitum. Not sure what you're looking for to be cleared in the enter and exit
>> functions, but it doesn't look like it's happening after a resume. Didn't seem to
>> affect the behavior of the kernel, since the network came up and I was able to
>> function normally (or as normally as I can function).
> 
> I'd reboot if I were you ;-)

Oh, I did :)

I've since suspended and resumed a couple more times and have not seen your RCU
warnings, so it's not completely reproducible.

Got any debugging code you want me to add, in case it pops up again?

Clark
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFHA51OHyuj/+TTEp0RAn6hAJwO3JHHk+2EVjpH7XcAYu+g5CC3DQCgj1lR
4Umv2vf2iDjxHNprCXfxhvI=
=SL/V
-----END PGP SIGNATURE-----

* Re: 2.6.23-rc9-rt1
  2007-10-03 13:46     ` 2.6.23-rc9-rt1 Clark Williams
@ 2007-10-03 17:28       ` Steven Rostedt
  0 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2007-10-03 17:28 UTC (permalink / raw)
  To: Clark Williams; +Cc: Paul E. McKenney, RT


--

On Wed, 3 Oct 2007, Clark Williams wrote:

> >
> > I could put a hack in that fixes the issue when detected, and still prints
> > out a warning. I'll do that for now, until we find the problem area. I
> > think the first warning probably had the one that corrupted us, and then
> > we got flooded with warnings because we never fixed the situation.
> >
> > Patch coming soon.
> >
> > -- Steve
> >> Ad infinitum. Not sure what you're looking for to be cleared in the enter and exit
> >> functions, but it doesn't look like it's happening after a resume. Didn't seem to
> >> affect the behavior of the kernel, since the network came up and I was able to
> >> function normally (or as normally as I can function).
> >
> > I'd reboot if I were you ;-)
>
> Oh, I did :)
>
> I've since suspended and resumed a couple more times and have not seen your RCU
> warnings, so it's not completely reproducible.
>
> Got any debugging code you want me to add, in case it pops up again?
>

OK here's the patch to give a warning and temporarily fix the issue.

-- Steve

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux-2.6.23-rc9-rt1/include/linux/rcupreempt.h
===================================================================
--- linux-2.6.23-rc9-rt1.orig/include/linux/rcupreempt.h
+++ linux-2.6.23-rc9-rt1/include/linux/rcupreempt.h
@@ -108,7 +108,13 @@ DECLARE_PER_CPU(long, dynticks_progress_
 static inline void rcu_enter_nohz(void)
 {
 	__get_cpu_var(dynticks_progress_counter)++;
-	WARN_ON(__get_cpu_var(dynticks_progress_counter) & 0x1);
+	if (unlikely(__get_cpu_var(dynticks_progress_counter) & 0x1)) {
+		printk("BUG: bad accounting of dynamic ticks\n");
+		printk("   will try to fix, but it is best to reboot\n");
+		WARN_ON(1);
+		/* try to fix it */
+		__get_cpu_var(dynticks_progress_counter)++;
+	}
 	mb();
 }

@@ -116,7 +122,13 @@ static inline void rcu_exit_nohz(void)
 {
 	mb();
 	__get_cpu_var(dynticks_progress_counter)++;
-	WARN_ON(!(__get_cpu_var(dynticks_progress_counter) & 0x1));
+	if (unlikely(!(__get_cpu_var(dynticks_progress_counter) & 0x1))) {
+		printk("BUG: bad accounting of dynamic ticks\n");
+		printk("   will try to fix, but it is best to reboot\n");
+		WARN_ON(1);
+		/* try to fix it */
+		__get_cpu_var(dynticks_progress_counter)++;
+	}
 }

 #else /* CONFIG_NO_HZ */
