[ANNOUNCE] 3.8.10-rt6

linux-rt-users.vger.kernel.org archive mirror

* [ANNOUNCE] 3.8.10-rt6
@ 2013-04-29 20:12 Sebastian Andrzej Siewior
  2013-04-29 21:19 ` Clark Williams
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-04-29 20:12 UTC
  To: linux-rt-users; +Cc: LKML, Thomas Gleixner, rostedt

Dear RT Folks,

I'm pleased to announce the 3.8.10-rt6 release.

changes since v3.8.10-rt5:
- the i915 driver compiles again after I broke it in the last release. The
  patch was sent by Carsten Emde.

Known issues:

    - SLxB is broken on PowerPC.
    - suspend / resume seems to program the timer wrong and wait
      ages until it continues.

The delta patch against v3.8.10-rt5 is appended below and can be found here:

  https://www.kernel.org/pub/linux/kernel/projects/rt/3.8/incr/patch-3.8.10-rt5-rt6.patch.xz

The RT patch against 3.8.10 can be found here:

  https://www.kernel.org/pub/linux/kernel/projects/rt/3.8/patch-3.8.10-rt6.patch.xz

The split quilt queue is available at:

  https://www.kernel.org/pub/linux/kernel/projects/rt/3.8/patches-3.8.10-rt6.tar.xz

Sebastian

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 81125de..eabd3dd 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -814,6 +814,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	struct intel_ring_buffer *ring;
 	u32 ctx_id = i915_execbuffer2_get_context_id(*args);
 	u32 exec_start, exec_len;
+	u32 seqno;
 	u32 mask;
 	u32 flags;
 	int ret, mode, i;
@@ -1068,7 +1069,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 			goto err;
 	}
 
-	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
+	seqno = intel_ring_get_seqno(ring);
+	trace_i915_gem_ring_dispatch(ring, seqno, flags);
 	i915_trace_irq_get(ring, seqno);
 
 	i915_gem_execbuffer_move_to_active(&objects, ring);
diff --git a/localversion-rt b/localversion-rt
index 0efe7ba..8fc605d 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt5
+-rt6

* Re: [ANNOUNCE] 3.8.10-rt6
  2013-04-29 20:12 [ANNOUNCE] 3.8.10-rt6 Sebastian Andrzej Siewior
@ 2013-04-29 21:19 ` Clark Williams
  2013-04-30  8:47   ` John Kacur
                     ` (2 more replies)
       [not found] ` <23187402.mkEEi1N7Lp@bs8>
  2013-05-03  4:40 ` Jain Priyanka-B32167
  2 siblings, 3 replies; 19+ messages in thread
From: Clark Williams @ 2013-04-29 21:19 UTC
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt

On Mon, 29 Apr 2013 22:12:02 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>     - suspend / resume seems to program the timer wrong and wait
>       ages until it continues.
> 

It has to be something we're doing when we apply RT to v3.8.x, since
v3.8.x suspends/resumes with no issues and I was able to suspend and
resume fine with the 3.6-rt series. 

I'm looking at a git diff between 3.6.11-rt30 and 3.8.9-rt4,
specifically in kernel/time* and arch/x86/kernel but so far I'm not
seeing much that's RT specific.

Clark

* Re: [ANNOUNCE] 3.8.10-rt6
  2013-04-29 21:19 ` Clark Williams
@ 2013-04-30  8:47   ` John Kacur
  2013-04-30 10:35   ` Sebastian Andrzej Siewior
  2013-04-30 17:09   ` Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6) Sebastian Andrzej Siewior
  2 siblings, 0 replies; 19+ messages in thread
From: John Kacur @ 2013-04-30  8:47 UTC
  To: Clark Williams
  Cc: Sebastian Andrzej Siewior, linux-rt-users, LKML, Thomas Gleixner,
	rostedt



On Mon, 29 Apr 2013, Clark Williams wrote:

> On Mon, 29 Apr 2013 22:12:02 +0200
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> >     - suspend / resume seems to program the timer wrong and wait
> >       ages until it continues.
> > 
> 
> It has to be something we're doing when we apply RT to v3.8.x, since
> v3.8.x suspends/resumes with no issues and I was able to suspend and
> resume fine with the 3.6-rt series.

Our v3.8.x series is currently no different than "vanilla" rt: a
quilt import on top of v3.8.10, with no RH patches. I know you said
that just as a polite way to say "maybe we messed up, but...", however
I'm confident we didn't.

Also, that must be a typo; you meant to say that v3.8.x does have issues
with suspend / resume, right?
 
> 
> I'm looking at a git diff between 3.6.11-rt30 and 3.8.9-rt4,
> specifically in kernel/time* and arch/x86/kernel but so far I'm not
> seeing much that's RT specific.
> 
> Clark
> 

* Re: [ANNOUNCE] 3.8.10-rt6
  2013-04-29 21:19 ` Clark Williams
  2013-04-30  8:47   ` John Kacur
@ 2013-04-30 10:35   ` Sebastian Andrzej Siewior
  2013-04-30 17:09   ` Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6) Sebastian Andrzej Siewior
  2 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-04-30 10:35 UTC
  To: Clark Williams; +Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt

* Clark Williams | 2013-04-29 16:19:25 [-0500]:

>On Mon, 29 Apr 2013 22:12:02 +0200
>Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>>     - suspend / resume seems to program the timer wrong and wait
>>       ages until it continues.
>> 
>
>It has to be something we're doing when we apply RT to v3.8.x, since
>v3.8.x suspends/resumes with no issues and I was able to suspend and
>resume fine with the 3.6-rt series. 

Are your problems gone with:

diff --git a/kernel/printk.c b/kernel/printk.c
index 6d52c34..8783ea5 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1583,6 +1583,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 	 */
 	if (unlikely(forced_early_printk(fmt, args)))
 		return 1;
+	if (in_nmi())
+		return 1;
 
 	boot_delay_msec(level);
 	printk_delay();

>Clark

Sebastian

* Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)
  2013-04-29 21:19 ` Clark Williams
  2013-04-30  8:47   ` John Kacur
  2013-04-30 10:35   ` Sebastian Andrzej Siewior
@ 2013-04-30 17:09   ` Sebastian Andrzej Siewior
  2013-04-30 18:08     ` Steven Rostedt
                       ` (4 more replies)
  2 siblings, 5 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-04-30 17:09 UTC
  To: Clark Williams; +Cc: linux-rt-users, Thomas Gleixner, LKML, rostedt

* Clark Williams | 2013-04-29 16:19:25 [-0500]:

>On Mon, 29 Apr 2013 22:12:02 +0200
>Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>>     - suspend / resume seems to program the timer wrong and wait
>>       ages until it continues.
>
>It has to be something we're doing when we apply RT to v3.8.x, since
>v3.8.x suspends/resumes with no issues and I was able to suspend and
>resume fine with the 3.6-rt series. 

I think I figured out what is going on, or at least I think I did.

This log snippet is from the resume path (from suspend to mem):

[   15.052115] Enabling non-boot CPUs ...
[   15.052115] smpboot: Booting Node 0 Processor 1 APIC 0x1
[   14.841378] Initializing CPU#1
[   42.840017] [sched_delayed] sched: RT throttling activated
[   42.842144] CPU1 is up
[   42.842536] smpboot: Booting Node 0 Processor 2 APIC 0x2

Two things happen here:
- the time goes backwards from 15.X to 14.X. This is okay because the
  14.X is the timestamp from the secondary CPU, which is not yet
  synchronized with the boot CPU.
- the printk with "CPU1 is up" is coming from the boot CPU, and
  according to the timestamp about 28 secs passed by. But this did not
  really happen, as the whole procedure took less time.

The next thing that happens is that RCU assumes nobody is making any
progress (for almost 28 secs) and triggers NMIs & printks to get some
attention. I have a trace where
- CPU0: arch_trigger_all_cpu_backtrace_handler() => printk()
        has "lock" and is spinning for logbuf_lock

- CPU1: print_cpu_stall() => printk() (spinning for the lock) => NMI =>
  arch_trigger_all_cpu_backtrace_handler()
        it may have logbuf_lock and is spinning for "lock"

I can't tell if CPU1 got the logbuf_lock at this time, but it seemed that
it made no progress until I ended it.
This NMI-related deadlock is a problem which should also affect
mainline, right?

Now, the time jump on the other hand is the real issue here, and it is
RT-only. It looks like we get a big number of timer updates via
tick_do_update_jiffies64() because, according to ktime_get(), that much
time really has passed.

The solution seems as simple as:

>From c27eb2e0ab0b5acd96a4b62288976f1b72789b3e Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Tue, 30 Apr 2013 18:53:55 +0200
Subject: [PATCH] time/timekeeping: shadow tk->cycle_last together with
 clock->cycle_last

Commit ("timekeeping: Store cycle_last value in timekeeper struct as
well") introduced a tk-based cycle_last value which needs to be reset
on the resume path as well, or else ktime_get() will think that time
increased a lot.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/time/timekeeping.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 99f943b..688817f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -777,6 +777,7 @@ static void timekeeping_resume(void)
 	}
 	/* re-base the last cycle value */
 	tk->clock->cycle_last = tk->clock->read(tk->clock);
+	tk->cycle_last = tk->clock->cycle_last;
 	tk->ntp_error = 0;
 	timekeeping_suspended = 0;
 	timekeeping_update(tk, false, true);
-- 
1.7.10.4

So Clark, does this patch fix your problem?

>Clark

Sebastian

* Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)
  2013-04-30 17:09   ` Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6) Sebastian Andrzej Siewior
@ 2013-04-30 18:08     ` Steven Rostedt
  2013-05-03  9:59       ` Sebastian Andrzej Siewior
  2013-04-30 19:18     ` Clark Williams
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2013-04-30 18:08 UTC
  To: Sebastian Andrzej Siewior
  Cc: Clark Williams, linux-rt-users, Thomas Gleixner, LKML

On Tue, 2013-04-30 at 19:09 +0200, Sebastian Andrzej Siewior wrote:

> The next thing that happens is that RCU assumes nobody is making any
> progress (for almost 28 secs) and triggers NMIs & printks to get some
> attention. I have a trace where
> - CPU0: arch_trigger_all_cpu_backtrace_handler() => printk()
>         has "lock" and is spinning for logbuf_lock
> 
> - CPU1: print_cpu_stall() => printk() (spinning for the lock) => NMI =>
>   arch_trigger_all_cpu_backtrace_handler()
>         it may have logbuf_lock and is spinning for "lock"
> 
> I can't tell if CPU1 got the logbuf_lock at this time, but it seemed that
> it made no progress until I ended it.
> This NMI-related deadlock is a problem which should also affect
> mainline, right?

Well, yeah, as sending out an NMI stack dump is sorta the last resort,
and it is dangerous to do printks from NMI context.

> 
> Now, the time jump on the other hand is the real issue here, and it is
> RT-only. It looks like we get a big number of timer updates via
> tick_do_update_jiffies64() because, according to ktime_get(), that much
> time really has passed.

As the NMI dump only happens because of the time jump, which as you
said, is -rt only, I wouldn't say that the NMI deadlock is a mainline
bug.

-- Steve

* Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)
  2013-04-30 18:08     ` Steven Rostedt
@ 2013-05-03  9:59       ` Sebastian Andrzej Siewior
  2013-05-03 15:31         ` Steven Rostedt
  0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-05-03  9:59 UTC
  To: Steven Rostedt; +Cc: Clark Williams, linux-rt-users, Thomas Gleixner, LKML

On 04/30/2013 08:08 PM, Steven Rostedt wrote:
>> This NMI-related deadlock is a problem which should also affect
>> mainline, right?
> 
> Well, yeah, as sending out an NMI stack dump is sorta the last resort,
> and it is dangerous to do printks from NMI context.

So we did bad and we upgrade to bad and dangerous.

>>
>> Now, the time jump on the other hand is the real issue here, and it is
>> RT-only. It looks like we get a big number of timer updates via
>> tick_do_update_jiffies64() because, according to ktime_get(), that much
>> time really has passed.
> 
> As the NMI dump only happens because of the time jump, which as you
> said, is -rt only, I wouldn't say that the NMI deadlock is a mainline
> bug.

The reason for the NMI was a bug in the -RT tree, but if something else
triggers that NMI, we have a good chance to deadlock.

What about a try_lock() in the in_nmi() case that gives up after 50 usecs
of trying without getting the lock?

> -- Steve

Sebastian

* Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)
  2013-05-03  9:59       ` Sebastian Andrzej Siewior
@ 2013-05-03 15:31         ` Steven Rostedt
  0 siblings, 0 replies; 19+ messages in thread
From: Steven Rostedt @ 2013-05-03 15:31 UTC
  To: Sebastian Andrzej Siewior
  Cc: Clark Williams, linux-rt-users, Thomas Gleixner, LKML

On Fri, 2013-05-03 at 11:59 +0200, Sebastian Andrzej Siewior wrote:
>  
> > As the NMI dump only happens because of the time jump, which as you
> > said, is -rt only, I wouldn't say that the NMI deadlock is a mainline
> > bug.
> 
> The reason for the NMI was a bug in the -RT tree, but if something else
> triggers that NMI, we have a good chance to deadlock.

But only if the NMI does a printk(). The only reason NMIs do printks is
when a bug is detected. But usually oops_in_progress is set, and
zap_locks() is supposed to help prevent these problems. But that
doesn't always work.

> 
> What about a try_lock() in the in_nmi() case that gives up after 50 usecs
> of trying without getting the lock?

I wouldn't try too hard to fix printks for NMIs. There are many things
that can go wrong with NMIs doing a printk while another printk is
active.

-- Steve

* Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)
  2013-04-30 17:09   ` Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6) Sebastian Andrzej Siewior
  2013-04-30 18:08     ` Steven Rostedt
@ 2013-04-30 19:18     ` Clark Williams
  2013-04-30 21:54       ` Clark Williams
  2013-04-30 22:31     ` Borislav Petkov
                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Clark Williams @ 2013-04-30 19:18 UTC
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users, Thomas Gleixner, LKML, rostedt

On Tue, 30 Apr 2013 19:09:48 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> * Clark Williams | 2013-04-29 16:19:25 [-0500]:
> 
> >On Mon, 29 Apr 2013 22:12:02 +0200
> >Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> >>     - suspend / resume seems to program the timer wrong and wait
> >>       ages until it continues.
> >
> >It has to be something we're doing when we apply RT to v3.8.x, since
> >v3.8.x suspends/resumes with no issues and I was able to suspend and
> >resume fine with the 3.6-rt series. 
> 
> I think I figured out what is going on, or at least I think I did.
> 
> This log snippet is from the resume path (from suspend to mem):
> 
> [   15.052115] Enabling non-boot CPUs ...
> [   15.052115] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [   14.841378] Initializing CPU#1
> [   42.840017] [sched_delayed] sched: RT throttling activated
> [   42.842144] CPU1 is up
> [   42.842536] smpboot: Booting Node 0 Processor 2 APIC 0x2
> 
> Two things happen here:
> - the time goes backwards from 15.X to 14.X. This is okay because the
>   14.X is the timestamp from the secondary CPU, which is not yet
>   synchronized with the boot CPU.
> - the printk with "CPU1 is up" is coming from the boot CPU, and
>   according to the timestamp about 28 secs passed by. But this did not
>   really happen, as the whole procedure took less time.
> 
> The next thing that happens is that RCU assumes nobody is making any
> progress (for almost 28 secs) and triggers NMIs & printks to get some
> attention. I have a trace where
> - CPU0: arch_trigger_all_cpu_backtrace_handler() => printk()
>         has "lock" and is spinning for logbuf_lock
> 
> - CPU1: print_cpu_stall() => printk() (spinning for the lock) => NMI =>
>   arch_trigger_all_cpu_backtrace_handler()
>         it may have logbuf_lock and is spinning for "lock"
> 
> I can't tell if CPU1 got the logbuf_lock at this time, but it seemed that
> it made no progress until I ended it.
> This NMI-related deadlock is a problem which should also affect
> mainline, right?
> 
> Now, the time jump on the other hand is the real issue here, and it is
> RT-only. It looks like we get a big number of timer updates via
> tick_do_update_jiffies64() because, according to ktime_get(), that much
> time really has passed.
> 
> The solution seems as simple as:
> 
> From c27eb2e0ab0b5acd96a4b62288976f1b72789b3e Mon Sep 17 00:00:00 2001
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Date: Tue, 30 Apr 2013 18:53:55 +0200
> Subject: [PATCH] time/timekeeping: shadow tk->cycle_last together with
>  clock->cycle_last
> 
> Commit ("timekeeping: Store cycle_last value in timekeeper struct as
> well") introduced a tk-based cycle_last value which needs to be reset
> on the resume path as well, or else ktime_get() will think that time
> increased a lot.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  kernel/time/timekeeping.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 99f943b..688817f 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -777,6 +777,7 @@ static void timekeeping_resume(void)
>  	}
>  	/* re-base the last cycle value */
>  	tk->clock->cycle_last = tk->clock->read(tk->clock);
> +	tk->cycle_last = tk->clock->cycle_last;
>  	tk->ntp_error = 0;
>  	timekeeping_suspended = 0;
>  	timekeeping_update(tk, false, true);
> -- 
> 1.7.10.4
> 
> So Clark, does this patch fix your problem?
>

It does seem to! I've got both patches applied right now (your patch to
vprintk_emit() and the above patch) and it fixes the long delay on my
lab box. When I get done today (or have a break in the action) I'll try
it on my laptop to verify. 

Thanks Sebastian,
Clark

* Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)
  2013-04-30 19:18     ` Clark Williams
@ 2013-04-30 21:54       ` Clark Williams
  0 siblings, 0 replies; 19+ messages in thread
From: Clark Williams @ 2013-04-30 21:54 UTC
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users, Thomas Gleixner, LKML, rostedt

On Tue, 30 Apr 2013 14:18:24 -0500
Clark Williams <williams@redhat.com> wrote:

> On Tue, 30 Apr 2013 19:09:48 +0200
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> > * Clark Williams | 2013-04-29 16:19:25 [-0500]:
> > 
> > >On Mon, 29 Apr 2013 22:12:02 +0200
> > >Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > >>     - suspend / resume seems to program the timer wrong and wait
> > >>       ages until it continues.
> > >
> > >It has to be something we're doing when we apply RT to v3.8.x, since
> > >v3.8.x suspends/resumes with no issues and I was able to suspend and
> > >resume fine with the 3.6-rt series. 
> > 
> > I think I figured out what is going on, or at least I think I did.
> > 
> > This log snippet is from the resume path (from suspend to mem):
> > 
> > [   15.052115] Enabling non-boot CPUs ...
> > [   15.052115] smpboot: Booting Node 0 Processor 1 APIC 0x1
> > [   14.841378] Initializing CPU#1
> > [   42.840017] [sched_delayed] sched: RT throttling activated
> > [   42.842144] CPU1 is up
> > [   42.842536] smpboot: Booting Node 0 Processor 2 APIC 0x2
> > 
> > Two things happen here:
> > - the time goes backwards from 15.X to 14.X. This is okay because the
> >   14.X is the timestamp from the secondary CPU, which is not yet
> >   synchronized with the boot CPU.
> > - the printk with "CPU1 is up" is coming from the boot CPU, and
> >   according to the timestamp about 28 secs passed by. But this did not
> >   really happen, as the whole procedure took less time.
> > 
> > The next thing that happens is that RCU assumes nobody is making any
> > progress (for almost 28 secs) and triggers NMIs & printks to get some
> > attention. I have a trace where
> > - CPU0: arch_trigger_all_cpu_backtrace_handler() => printk()
> >         has "lock" and is spinning for logbuf_lock
> > 
> > - CPU1: print_cpu_stall() => printk() (spinning for the lock) => NMI =>
> >   arch_trigger_all_cpu_backtrace_handler()
> >         it may have logbuf_lock and is spinning for "lock"
> > 
> > I can't tell if CPU1 got the logbuf_lock at this time, but it seemed that
> > it made no progress until I ended it.
> > This NMI-related deadlock is a problem which should also affect
> > mainline, right?
> > 
> > Now, the time jump on the other hand is the real issue here, and it is
> > RT-only. It looks like we get a big number of timer updates via
> > tick_do_update_jiffies64() because, according to ktime_get(), that much
> > time really has passed.
> > 
> > The solution seems as simple as:
> > 
> > From c27eb2e0ab0b5acd96a4b62288976f1b72789b3e Mon Sep 17 00:00:00 2001
> > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Date: Tue, 30 Apr 2013 18:53:55 +0200
> > Subject: [PATCH] time/timekeeping: shadow tk->cycle_last together with
> >  clock->cycle_last
> > 
> > Commit ("timekeeping: Store cycle_last value in timekeeper struct as
> > well") introduced a tk-based cycle_last value which needs to be reset
> > on the resume path as well, or else ktime_get() will think that time
> > increased a lot.
> > 
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > ---
> >  kernel/time/timekeeping.c |    1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 99f943b..688817f 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -777,6 +777,7 @@ static void timekeeping_resume(void)
> >  	}
> >  	/* re-base the last cycle value */
> >  	tk->clock->cycle_last = tk->clock->read(tk->clock);
> > +	tk->cycle_last = tk->clock->cycle_last;
> >  	tk->ntp_error = 0;
> >  	timekeeping_suspended = 0;
> >  	timekeeping_update(tk, false, true);
> > -- 
> > 1.7.10.4
> > 
> > So Clark, does this patch fix your problem?
> >
> 
> It does seem to! I've got both patches applied right now (your patch to
> vprintk_emit() and the above patch) and it fixes the long delay on my
> lab box. When I get done today (or have a break in the action) I'll try
> it on my laptop to verify. 
> 
> Thanks Sebastian,
> Clark

Tested on my laptop, which now resumes.

Many thanks.

Clark

* Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)
  2013-04-30 17:09   ` Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6) Sebastian Andrzej Siewior
  2013-04-30 18:08     ` Steven Rostedt
  2013-04-30 19:18     ` Clark Williams
@ 2013-04-30 22:31     ` Borislav Petkov
  2013-05-02  7:59       ` Sebastian Andrzej Siewior
  2013-05-01  8:30     ` Bernhard Schiffner
  2013-05-01  8:32     ` Bernhard Schiffner
  4 siblings, 1 reply; 19+ messages in thread
From: Borislav Petkov @ 2013-04-30 22:31 UTC
  To: Sebastian Andrzej Siewior
  Cc: Clark Williams, linux-rt-users, Thomas Gleixner, LKML, rostedt

On Tue, Apr 30, 2013 at 07:09:48PM +0200, Sebastian Andrzej Siewior wrote:
> Now, the time jump on the other hand is the real issue here, and it is
> RT-only. It looks like we get a big number of timer updates via
> tick_do_update_jiffies64() because, according to ktime_get(), that much
> time really has passed.
> 
> The solution seems as simple as:
> 
> From c27eb2e0ab0b5acd96a4b62288976f1b72789b3e Mon Sep 17 00:00:00 2001
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Date: Tue, 30 Apr 2013 18:53:55 +0200
> Subject: [PATCH] time/timekeeping: shadow tk->cycle_last together with
>  clock->cycle_last
> 
> Commit ("timekeeping: Store cycle_last value in timekeeper struct as
> well") introduced a tk-based cycle_last value which needs to be reset
> on the resume path as well, or else ktime_get() will think that time
> increased a lot.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  kernel/time/timekeeping.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 99f943b..688817f 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -777,6 +777,7 @@ static void timekeeping_resume(void)
>  	}
>  	/* re-base the last cycle value */
>  	tk->clock->cycle_last = tk->clock->read(tk->clock);
> +	tk->cycle_last = tk->clock->cycle_last;
>  	tk->ntp_error = 0;
>  	timekeeping_suspended = 0;
>  	timekeeping_update(tk, false, true);

Didn't tglx fix a similar issue upstream already?

77c675ba18836.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)
  2013-04-30 22:31     ` Borislav Petkov
@ 2013-05-02  7:59       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-05-02  7:59 UTC
  To: Borislav Petkov
  Cc: Clark Williams, linux-rt-users, Thomas Gleixner, LKML, rostedt

On 05/01/2013 12:31 AM, Borislav Petkov wrote:
> Didn't tglx fix a similar issue upstream already?
> 
> 77c675ba18836.

It seems he did.

Sebastian

* Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)
  2013-04-30 17:09   ` Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6) Sebastian Andrzej Siewior
                       ` (2 preceding siblings ...)
  2013-04-30 22:31     ` Borislav Petkov
@ 2013-05-01  8:30     ` Bernhard Schiffner
  2013-05-01  8:32     ` Bernhard Schiffner
  4 siblings, 0 replies; 19+ messages in thread
From: Bernhard Schiffner @ 2013-05-01  8:30 UTC
  To: linux-rt-users

Am Dienstag, 30. April 2013, 19:09:48 schrieb Sebastian Andrzej Siewior:
> * Clark Williams | 2013-04-29 16:19:25 [-0500]:
> >On Mon, 29 Apr 2013 22:12:02 +0200
> >Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> >>     - suspend / resume seems to program the timer wrong and wait
> >>       ages until it continues.
> >
> >It has to be something we're doing when we apply RT to v3.8.x, since
> >v3.8.x suspends/resumes with no issues and I was able to suspend and
> >resume fine with the 3.6-rt series.
> 
> I think I figured out what is going on, or at least I think I did.
> 
> This log snippet is from the resume path (from suspend to mem):
> 
> [   15.052115] Enabling non-boot CPUs ...
> [   15.052115] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [   14.841378] Initializing CPU#1
> [   42.840017] [sched_delayed] sched: RT throttling activated
> [   42.842144] CPU1 is up
> [   42.842536] smpboot: Booting Node 0 Processor 2 APIC 0x2
> 
> Two things happen here:
> - the time goes backwards from 15.X to 14.X. This is okay because the
>   14.X is the timestamp from the secondary CPU, which is not yet
>   synchronized with the boot CPU.
> - the printk with "CPU1 is up" is coming from the boot CPU, and
>   according to the timestamp about 28 secs passed by. But this did not
>   really happen, as the whole procedure took less time.
> 
> The next thing that happens is that RCU assumes nobody is making any
> progress (for almost 28 secs) and triggers NMIs & printks to get some
> attention. I have a trace where
> - CPU0: arch_trigger_all_cpu_backtrace_handler() => printk()
>         has "lock" and is spinning for logbuf_lock
> 
> - CPU1: print_cpu_stall() => printk() (spinning for the lock) => NMI =>
>   arch_trigger_all_cpu_backtrace_handler()
>         it may have logbuf_lock and is spinning for "lock"
> 
> I can't tell if CPU1 got the logbuf_lock at this time, but it seemed that
> it made no progress until I ended it.
> This NMI-related deadlock is a problem which should also affect
> mainline, right?
> 
> Now, the time jump on the other hand is the real issue here, and it is
> RT-only. It looks like we get a big number of timer updates via
> tick_do_update_jiffies64() because, according to ktime_get(), that much
> time really has passed.
> 
> The solution seems as simple as:
> 
> From c27eb2e0ab0b5acd96a4b62288976f1b72789b3e Mon Sep 17 00:00:00 2001
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Date: Tue, 30 Apr 2013 18:53:55 +0200
> Subject: [PATCH] time/timekeeping: shadow tk->cycle_last together with
>  clock->cycle_last
> 
> Commit ("timekeeping: Store cycle_last value in timekeeper struct as
> well") introduced a tk-based cycle_last value which needs to be reset
> on the resume path as well, or else ktime_get() will think that time
> increased a lot.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  kernel/time/timekeeping.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 99f943b..688817f 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -777,6 +777,7 @@ static void timekeeping_resume(void)
>  	}
>  	/* re-base the last cycle value */
>  	tk->clock->cycle_last = tk->clock->read(tk->clock);
> +	tk->cycle_last = tk->clock->cycle_last;
>  	tk->ntp_error = 0;
>  	timekeeping_suspended = 0;
>  	timekeeping_update(tk, false, true);
> 
> >Clark
> 
> Sebastian
> --
This patch, together with the in_nmi() patch, solves the resume problem for me.
Architecture: x86_64, patched against 3.8.10-rt6.

THANKS!

Bernhard

* Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)
  2013-04-30 17:09   ` Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6) Sebastian Andrzej Siewior
                       ` (3 preceding siblings ...)
  2013-05-01  8:30     ` Bernhard Schiffner
@ 2013-05-01  8:32     ` Bernhard Schiffner
  2013-05-03 10:27       ` Sebastian Andrzej Siewior
  4 siblings, 1 reply; 19+ messages in thread
From: Bernhard Schiffner @ 2013-05-01  8:32 UTC
  To: linux-rt-users

Am Dienstag, 30. April 2013, 19:09:48 schrieb Sebastian Andrzej Siewior:
> * Clark Williams | 2013-04-29 16:19:25 [-0500]:
> >On Mon, 29 Apr 2013 22:12:02 +0200
> >Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> >>     - suspend / resume seems to program the timer wrong and wait
> >>       ages until it continues.
> >
> >It has to be something we're doing when we apply RT to v3.8.x, since
> >v3.8.x suspends/resumes with no issues and I was able to suspend and
> >resume fine with the 3.6-rt series.
> 
> I think I figured out what is going on, or at least I think I did.
> 
> This log snippet is from the resume path (from suspend to mem):
> 
> [   15.052115] Enabling non-boot CPUs ...
> [   15.052115] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [   14.841378] Initializing CPU#1
> [   42.840017] [sched_delayed] sched: RT throttling activated
> [   42.842144] CPU1 is up
> [   42.842536] smpboot: Booting Node 0 Processor 2 APIC 0x2
> 
> Two things happen here:
> - the time goes backwards from 15.X to 14.X. This is okay because
>   14.X is the timestamp from the secondary CPU, which is not yet
>   synchronized with the boot CPU.
> - the printk with "CPU1 is up" is coming from the boot CPU and,
>   according to the timestamp, about 28 secs passed. But this did not
>   really happen, as the whole procedure took less time.
> 
> The next thing that happens is that RCU assumes nobody is making any
> progress (for almost 28 secs) and triggers NMIs & printks to get some
> attention. I have a trace where
> - CPU0: arch_trigger_all_cpu_backtrace_handler() => printk()
>         has "lock" and is spinning for logbuf_lock
> 
> - CPU1: print_cpu_stall() => printk() (spinning for the lock) => NMI =>
>   arch_trigger_all_cpu_backtrace_handler()
>         it may have logbuf_lock and is spinning for "lock"
> 
> I can't tell if CPU1 got the logbuf_lock at this time, but it seemed that
> it made no progress until I ended it.
> This NMI-related deadlock is a problem which should also be triggerable
> in mainline, right?
> 
> Now, the time jump on the other hand is the real issue here and is
> RT-only. It looks like we get a big number of timer updates via
> tick_do_update_jiffies64() because according to ktime_get() that much
> time really passed by.
> 
> The solution seems as simple as:
> 
> From c27eb2e0ab0b5acd96a4b62288976f1b72789b3e Mon Sep 17 00:00:00 2001
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Date: Tue, 30 Apr 2013 18:53:55 +0200
> Subject: [PATCH] time/timekeeping: shadow tk->cycle_last together with
>  clock->cycle_last
> 
> Commit ("timekeeping: Store cycle_last value in timekeeper struct as
> well") introduced a cycle_last copy in the timekeeper (tk->cycle_last)
> which needs to be reset on the resume path as well, or else ktime_get()
> will think that a lot of time has passed.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  kernel/time/timekeeping.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 99f943b..688817f 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -777,6 +777,7 @@ static void timekeeping_resume(void)
>  	}
>  	/* re-base the last cycle value */
>  	tk->clock->cycle_last = tk->clock->read(tk->clock);
> +	tk->cycle_last = tk->clock->cycle_last;
>  	tk->ntp_error = 0;
>  	timekeeping_suspended = 0;
>  	timekeeping_update(tk, false, true);
> 
> >Clark
> 
> Sebastian
> --
This patch together with the in_nmi() patch solves the resume problem for me.
Architecture X64, patched against 3.8.10-rt6.

THANKS!

Bernhard

* Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)
  2013-05-01  8:32     ` Bernhard Schiffner
@ 2013-05-03 10:27       ` Sebastian Andrzej Siewior
  2013-05-03 17:46         ` Bernhard Schiffner
  0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-05-03 10:27 UTC (permalink / raw)
  To: Bernhard Schiffner; +Cc: linux-rt-users

* Bernhard Schiffner | 2013-05-01 10:32:43 [+0200]:

>This patch together with the in_nmi() patch solves the resume problem for me.
>Architecture X64, patched against 3.8.10-rt6.

You should be able to drop the in_nmi() patch and still have it working.

>THANKS!
>
>Bernhard

Sebastian

* Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)
  2013-05-03 10:27       ` Sebastian Andrzej Siewior
@ 2013-05-03 17:46         ` Bernhard Schiffner
  0 siblings, 0 replies; 19+ messages in thread
From: Bernhard Schiffner @ 2013-05-03 17:46 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Am Freitag, 3. Mai 2013, 12:27:03 schrieb Sebastian Andrzej Siewior:
> * Bernhard Schiffner | 2013-05-01 10:32:43 [+0200]:
> >This patch together with the in_nmi() patch solves the resume problem for
> >me. Architecture X64, patched against 3.8.10-rt6.
> 
> You should be able to drop the in_nmi() patch and still have it working.
Confirmed.
(Oops, the kernel is 3.8.11-rt6 now, but it works.)
> 
> >THANKS!
> >
> >Bernhard
> 
> Sebastian

* Re: [ANNOUNCE] 3.8.10-rt6
       [not found] ` <23187402.mkEEi1N7Lp@bs8>
@ 2013-04-30  7:26   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-04-30  7:26 UTC (permalink / raw)
  To: Bernhard Schiffner; +Cc: linux-rt-users, lkml

On 04/29/2013 10:46 PM, Bernhard Schiffner wrote:
>> Known issues:
>>
>>     - SLxB is broken on PowerPC.
>>     - suspend / resume seems to program the timer wrongly and waits
>>       ages until it continues.
> 
> Yes, it's an annoying problem here too.
> How can I help to solve it?

Are you referring to the PowerPC issue or suspend / resume?

> Bernhard

Sebastian

* RE: [ANNOUNCE] 3.8.10-rt6
  2013-04-29 20:12 [ANNOUNCE] 3.8.10-rt6 Sebastian Andrzej Siewior
  2013-04-29 21:19 ` Clark Williams
       [not found] ` <23187402.mkEEi1N7Lp@bs8>
@ 2013-05-03  4:40 ` Jain Priyanka-B32167
  2013-05-03  8:40   ` Sebastian Andrzej Siewior
  2 siblings, 1 reply; 19+ messages in thread
From: Jain Priyanka-B32167 @ 2013-05-03  4:40 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: LKML, Thomas Gleixner, rostedt@goodmis.org, linux-rt-users

Hello Sebastian,

It is mentioned below that SLxB is broken.
I assume it means both SLUB and SLAB are broken?
Can you please share the error details, logs, scenario and steps to reproduce?

Regards
Priyanka

> -----Original Message-----
> From: linux-rt-users-owner@vger.kernel.org [mailto:linux-rt-users-owner@vger.kernel.org] On Behalf Of Sebastian Andrzej Siewior
> Sent: Tuesday, April 30, 2013 1:42 AM
> To: linux-rt-users
> Cc: LKML; Thomas Gleixner; rostedt@goodmis.org
> Subject: [ANNOUNCE] 3.8.10-rt6
> 
> Dear RT Folks,
> 
> I'm pleased to announce the 3.8.10-rt6 release.
> 
> changes since v3.8.10-rt5:
> - the i915 compiles again after I broke it in the last release. A patch
>   was sent by Carsten Emde.
> 
> Known issues:
> 
>     - SLxB is broken on PowerPC.
>     - suspend / resume seems to program the timer wrongly and waits
>       ages until it continues.
> 
> The delta patch against v3.8.10-rt5 is appended below and can be found
> here:
> 
>   https://www.kernel.org/pub/linux/kernel/projects/rt/3.8/incr/patch-3.8.10-rt5-rt6.patch.xz
> 
> The RT patch against 3.8.10 can be found here:
> 
>   https://www.kernel.org/pub/linux/kernel/projects/rt/3.8/patch-3.8.10-rt6.patch.xz
> 
> The split quilt queue is available at:
> 
>   https://www.kernel.org/pub/linux/kernel/projects/rt/3.8/patches-3.8.10-rt6.tar.xz
> 
> Sebastian
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 81125de..eabd3dd 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -814,6 +814,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	struct intel_ring_buffer *ring;
>  	u32 ctx_id = i915_execbuffer2_get_context_id(*args);
>  	u32 exec_start, exec_len;
> +	u32 seqno;
>  	u32 mask;
>  	u32 flags;
>  	int ret, mode, i;
> @@ -1068,7 +1069,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  			goto err;
>  	}
> 
> -	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
> +	seqno = intel_ring_get_seqno(ring);
> +	trace_i915_gem_ring_dispatch(ring, seqno, flags);
>  	i915_trace_irq_get(ring, seqno);
> 
>  	i915_gem_execbuffer_move_to_active(&objects, ring);
> diff --git a/localversion-rt b/localversion-rt
> index 0efe7ba..8fc605d 100644
> --- a/localversion-rt
> +++ b/localversion-rt
> @@ -1 +1 @@
> --rt5
> +-rt6

* Re: [ANNOUNCE] 3.8.10-rt6
  2013-05-03  4:40 ` Jain Priyanka-B32167
@ 2013-05-03  8:40   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-05-03  8:40 UTC (permalink / raw)
  To: Jain Priyanka-B32167
  Cc: LKML, Thomas Gleixner, rostedt@goodmis.org, linux-rt-users

* Jain Priyanka-B32167 | 2013-05-03 04:40:33 [+0000]:

>Hello Sebastian,
Hello Jain,

>It is mentioned below that SLxB is broken.
>I assume it means both SLUB and SLAB are broken?

Yes. It looks like this is limited to Book-E / e500. I have an
MPC8572DS here which shows this:
|[27173.423355] ------------[ cut here ]------------
|[27173.423360] kernel BUG at mm/slab.c:3227!
|[27173.423364] Oops: Exception in kernel mode, sig: 5 [#1]
|[27173.423367] PREEMPT SMP NR_CPUS=2 MPC8572 DS
|[27173.423370] NIP: 800b236c LR: 800b2290 CTR: 802bd168
|[27173.423373] REGS: ba557b90 TRAP: 0700   Not tainted  (3.8.9-rt4-dirty)
|[27173.423378] MSR: 00029000 <CE,EE,ME>  CR: 24002444  XER: 00000000
|[27173.423402] TASK = ba101290[31018] 'hackbench' THREAD: ba556000 CPU: 0
|[27173.423402] GPR00: 800b2bb8 ba557c40 ba101290 b7374200 000106d0 00000000 00000000 00000200
|[27173.423402] GPR08: 00000001 00000008 00000008 b7a0fc60 24002462 1001a810 00000000 803c0000
|[27173.423402] GPR16: 00000001 bf002490 bf002488 803c0000 00100100 00200200 bf002480 803c32f0
|[27173.423402] GPR24: 00000000 bf0024a4 000106d0 ba556000 bf000540 bf00f200 00000003 81eb5ae0
|[27173.423412] NIP [800b236c] cache_alloc_refill+0x16c/0x7e8
|[27173.423414] LR [800b2290] cache_alloc_refill+0x90/0x7e8
|[27173.423415] Call Trace:
|[27173.423422] [ba557c40] [802d0c54] rt_spin_lock_slowlock+0x58/0x288 (unreliable)
|[27173.423426] [ba557c90] [800b2bb8] __kmalloc+0x1d0/0x204
|[27173.423432] [ba557cc0] [80236ee4] __kmalloc_reserve+0x28/0x84
|[27173.423435] [ba557ce0] [80236fc4] __alloc_skb+0x84/0x18c
|[27173.423439] [ba557d20] [802338ec] sock_alloc_send_pskb+0x1d8/0x36c
|[27173.423444] [ba557d80] [802bd414] unix_stream_sendmsg+0x2ac/0x3ec
|[27173.423453] [ba557de0] [8022e4c4] sock_aio_write+0x110/0x148
|[27173.423458] [ba557e40] [800b7030] do_sync_write+0x94/0x108
|[27173.423462] [ba557ef0] [800b7204] vfs_write+0x160/0x170
|[27173.423465] [ba557f10] [800b7308] sys_write+0x4c/0xa8
|[27173.423471] [ba557f40] [8000d3c0] ret_from_syscall+0x0/0x3c
|[27173.423473] --- Exception: c01 at 0xffad0ec
|[27173.423473]     LR = 0x100011c8
|[27173.423474] Instruction dump:
|[27173.423480] 3de0803c 62940100 62b50200 3a560008 83f60000 7f16f800 419a019c 813f0010
|[27173.423486] 815c0018 7d0a4810 39000000 7d084114 <0f080000> 7f0a4840 40990084 3bdeffff
|[27173.604492] ---[ end trace 0000000000000002 ]---

after 7:32 hours of runtime (according to the timestamp). It was
running
| cyclictest -m -n -S -p 80 -d 0 -i 500
in one shell and
|while ((1)); do hackbench; done
in the other.

This was done with SLAB; with SLUB the backtrace is different. I also
tried with one CPU, but it is the same thing.

I tried an MPC5200b-based board and it did not do anything stupid for over
two days while doing exactly the same thing.
The obvious difference is the MMU implementation of those two.
The other difference is the CPU clock: ~400 MHz vs 1.5 GHz.

>Can you please share the error details, logs, scenario and steps to reproduce?

As I wrote above, cyclictest + hackbench. My MPC8572 boots from hard
disk into an e500-based root file system (that means it uses its FPU for
floating point instead of software emulation).

>Regards
>Priyanka

Sebastian

Thread overview: 19+ messages
2013-04-29 20:12 [ANNOUNCE] 3.8.10-rt6 Sebastian Andrzej Siewior
2013-04-29 21:19 ` Clark Williams
2013-04-30  8:47   ` John Kacur
2013-04-30 10:35   ` Sebastian Andrzej Siewior
2013-04-30 17:09   ` Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6) Sebastian Andrzej Siewior
2013-04-30 18:08     ` Steven Rostedt
2013-05-03  9:59       ` Sebastian Andrzej Siewior
2013-05-03 15:31         ` Steven Rostedt
2013-04-30 19:18     ` Clark Williams
2013-04-30 21:54       ` Clark Williams
2013-04-30 22:31     ` Borislav Petkov
2013-05-02  7:59       ` Sebastian Andrzej Siewior
2013-05-01  8:30     ` Bernhard Schiffner
2013-05-01  8:32     ` Bernhard Schiffner
2013-05-03 10:27       ` Sebastian Andrzej Siewior
2013-05-03 17:46         ` Bernhard Schiffner
     [not found] ` <23187402.mkEEi1N7Lp@bs8>
2013-04-30  7:26   ` [ANNOUNCE] 3.8.10-rt6 Sebastian Andrzej Siewior
2013-05-03  4:40 ` Jain Priyanka-B32167
2013-05-03  8:40   ` Sebastian Andrzej Siewior