From: Gilles Carry <Gilles.Carry@bull.net>
To: Gregory Haskins <ghaskins@novell.com>
Cc: Chirag Jog <chirag@linux.vnet.ibm.com>,
	linux-rt-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	rostedt@goodmis.org, dvhltc@us.ibm.com, dino@in.ibm.com
Subject: Re: [PATCH 2/2] RT: remove "paranoid" limit in push_rt_task
Date: Fri, 03 Oct 2008 15:46:59 +0200
Message-ID: <48E62253.1090000@bull.net>
In-Reply-To: <20081003124305.17387.90233.stgit@dev.haskins.net>

Sorry Greg,

Neither PPC64 nor Intel64 boots with this patch.
At boot time, each stops at the BUG_ON you added:
0xc00000000004eca4 is in push_rt_task (kernel/sched_rt.c:1102)

I'll leave further investigation to you.
Have a good weekend in your garage ;)

Gilles.


PPC64:
cpu 0x2: Vector: 700 (Program Check) at [c0000000ee2877b0]
     pc: c00000000004eca4: .push_rt_task+0x1f4/0x2d0
     lr: c00000000004ec24: .push_rt_task+0x174/0x2d0
     sp: c0000000ee287a30
    msr: 8000000000021032
   current = 0xc0000000ee276fe0
   paca    = 0xc0000000005c3780
     pid   = 36, comm = sirq-block/2
kernel BUG at kernel/sched_rt.c:1102!
enter ? for help
[c0000000ee287a30] c00000000004ec78 .push_rt_task+0x1c8/0x2d0 (unreliable)
[c0000000ee287ae0] c00000000004eda4 .push_rt_tasks+0x24/0x44
[c0000000ee287b70] c00000000004edf0 .post_schedule_rt+0x2c/0x50
[c0000000ee287c00] c000000000052864 .finish_task_switch+0x100/0x1a8
[c0000000ee287cb0] c0000000002cdbd0 .__schedule+0x6a0/0x75c
[c0000000ee287d90] c0000000002cdedc .schedule+0xf4/0x128
[c0000000ee287e20] c000000000061700 .ksoftirqd+0x124/0x37c
[c0000000ee287f00] c000000000076dc0 .kthread+0x84/0xd4
[c0000000ee287f90] c000000000029368 .kernel_thread+0x4c/0x68
2:mon>

Intel64:
kernel BUG at kernel/sched_rt.c:1102!
invalid opcode: 0000 [1] PREEMPT SMP
CPU 4
Modules linked in: mptsas scsi_transport_sas
Pid: 61, comm: sirq-block/4 Not tainted 2.6.26.5-rt9-00002-g3b27927-dirty #26
RIP: 0010:[<ffffffff8022b307>]  [<ffffffff8022b307>] push_rt_task+0x15f/0x20b
RSP: 0018:ffff81007f4d5d70  EFLAGS: 00010097
RAX: 0000000000000000 RBX: ffff81007edf09d0 RCX: 000000000822b765
RDX: 000000000822b765 RSI: 0000000000000000 RDI: ffff81000103f280
RBP: ffff81007f4d5da0 R08: ffff81007f4d4000 R09: ffff81007edcbe20
R10: 00000000ffffffff R11: ffffffff8021fa2c R12: 0000000000000000
R13: ffff810001034280 R14: ffff81007edf09e0 R15: ffff81000103f280
FS:  00007f2f26e776f0(0000) GS:ffff81007fc0ccc0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006b9fb0 CR3: 00000001bf4c9000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sirq-block/4 (pid: 61, threadinfo ffff81007f4d4000, task ffff81007f4d0e10)
Stack:  000000007f4d5e00 ffff81000103f280 ffff81007edf09d0 ffff8101bf457540
  0000000000000001 0000000000000002 ffff81007f4d5dc0 ffffffff8022b3c7
  ffff81007f4d5de0 ffff81000103f280 ffff81007f4d5de0 ffffffff8022b3e8
Call Trace:
  [<ffffffff8022b3c7>] push_rt_tasks+0x14/0x1c
  [<ffffffff8022b3e8>] post_schedule_rt+0x19/0x25
  [<ffffffff8022d7ee>] finish_task_switch+0x73/0x121
  [<ffffffff805bbe3d>] thread_return+0x4f/0xdc
  [<ffffffff805bc066>] schedule+0xd4/0xf0
  [<ffffffff80237eeb>] ksoftirqd+0xb3/0x260
  [<ffffffff80237e38>] ? ksoftirqd+0x0/0x260
  [<ffffffff80245209>] ? kthread+0x47/0x76
  [<ffffffff8022e9f9>] ? schedule_tail+0x43/0x97
  [<ffffffff8020c3d8>] ? child_rip+0xa/0x12
  [<ffffffff802451c2>] ? kthread+0x0/0x76
  [<ffffffff8020c3ce>] ? child_rip+0x0/0x12


Code: 48 c7 c6 c0 1d 23 80 e8 83 b3 03 00 e9 ee fe ff ff 4c 89 e7 e8 b1 31 39 00 eb ba 48 8b 43 08 8b 40 18 41 3b 87 90 0e 00 00 74 04 <0f> 0b eb fe 48 89 de 4c 89 ff e8 5b fe ff ff f0 41 ff 0e 0f 94
RIP  [<ffffffff8022b307>] push_rt_task+0x15f/0x20b
  RSP <ffff81007f4d5d70>


Gregory Haskins wrote:
> A panic was discovered by Chirag Jog and investigated by Gilles Carry
> to be originating in the fact that a task being pushed away
> may get migrated away during a double_lock_balance.  The result was
> that the pushable_tasks list may become corrupted.
> 
> The root cause is that the "paranoid" retry limit could cause us to
> bail out of a retry, but still try to remove the item from the (now
> potentially incorrect) list.  There are numerous ways to correct the
> condition, but the paranoid feature is no longer relevant with the new
> pushable logic (since pushable naturally limits the loop anyway), so
> let's just remove it.
> 
> Reported By: Chirag Jog <chirag@linux.vnet.ibm.com>
> Found-by: Gilles Carry <gilles.carry@bull.net>
> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
> ---
> 
>  kernel/sched_rt.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> index 59ead84..5a754fe 100644
> --- a/kernel/sched_rt.c
> +++ b/kernel/sched_rt.c
> @@ -1056,7 +1056,6 @@ static int push_rt_task(struct rq *rq)
>  {
>  	struct task_struct *next_task;
>  	struct rq *lowest_rq;
> -	int paranoid = RT_MAX_TRIES;
>  
>  	if (!rq->rt.overloaded)
>  		return 0;
> @@ -1094,12 +1093,14 @@ static int push_rt_task(struct rq *rq)
>  		 * If it has, then try again.
>  		 */
>  		task = pick_next_pushable_task(rq);
> -		if (unlikely(task != next_task) && task && paranoid--) {
> +		if (unlikely(task != next_task) && task) {
>  			put_task_struct(next_task);
>  			next_task = task;
>  			goto retry;
>  		}
>  
> +		BUG_ON(task_cpu(next_task) != rq->cpu);
> +
>  		/*
>  		 * Once we have failed to push this task, we will not
>  		 * try again, since the other cpus will pull from us
> 
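For anyone following the quoted hunk, here is a rough userspace model of the retry logic, not kernel code. It is only a sketch under stated assumptions: struct task, struct rq, rq_init(), migrate_head_away() and push_falls_through_stale() are hypothetical names made up for this illustration, the array is a toy stand-in for the pushable_tasks list, the value of RT_MAX_TRIES is assumed, and reference counting (put_task_struct) is left out. It shows two things that can be read off the quoted condition: with the "paranoid" counter the loop can bail out while holding a task that has already moved to another CPU, and without the counter the same fall-through is still reachable when pick_next_pushable_task() returns NULL. Whether the second path is what actually fired the BUG_ON on the test machines is not established in this message.

```c
#include <stdbool.h>
#include <stddef.h>

#define RT_MAX_TRIES 3	/* name taken from the patch; value assumed for this model */
#define NR_TASKS 8

struct task { int cpu; };		/* hypothetical, not task_struct */

struct rq {				/* hypothetical, not the kernel's struct rq */
	int cpu;
	struct task *pushable[NR_TASKS];	/* toy stand-in for pushable_tasks */
	int nr;
};

/* Put the first n tasks on rq, all belonging to CPU 0. */
static void rq_init(struct rq *rq, struct task *tasks, int n)
{
	rq->cpu = 0;
	rq->nr = n;
	for (int i = 0; i < n; i++) {
		tasks[i].cpu = rq->cpu;
		rq->pushable[i] = &tasks[i];
	}
}

static struct task *pick_next_pushable_task(struct rq *rq)
{
	return rq->nr ? rq->pushable[0] : NULL;
}

/* Models the window where rq->lock is dropped: the head task migrates away. */
static void migrate_head_away(struct rq *rq)
{
	if (!rq->nr)
		return;
	rq->pushable[0]->cpu = rq->cpu + 1;
	for (int i = 1; i < rq->nr; i++)
		rq->pushable[i - 1] = rq->pushable[i];
	rq->nr--;
}

/*
 * Returns true when the fall-through path is reached with a next_task that
 * no longer belongs to rq, i.e. where BUG_ON(task_cpu(next_task) != rq->cpu)
 * from the patch would trigger. "races" is the number of lock-drop windows
 * in which the head task migrates.
 */
static bool push_falls_through_stale(struct rq *rq, bool use_limit, int races)
{
	int paranoid = RT_MAX_TRIES;
	struct task *next_task = pick_next_pushable_task(rq);
	struct task *task;

	if (!next_task)
		return false;
retry:
	if (races-- > 0)
		migrate_head_away(rq);

	task = pick_next_pushable_task(rq);
	if (task != next_task && task && (!use_limit || paranoid--)) {
		next_task = task;
		goto retry;
	}

	/* The original code would now dequeue next_task from rq's list. */
	return next_task->cpu != rq->cpu;
}
```

In this model, calling push_falls_through_stale(&rq, true, 8) on a queue of eight tasks gives up after the retry budget is spent and returns true with four tasks still queued. Calling it with use_limit set to false drains the queue and still returns true, because the last pick returns NULL and the retry condition is skipped.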
