From: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
To: Paul Mackerras <paulus@samba.org>
Cc: kernel list <linux-kernel@vger.kernel.org>,
	linuxppc-dev@ozlabs.org, linux-next@vger.kernel.org,
	nacc@us.ibm.com, Andrew Morton <akpm@linux-foundation.org>,
	Balbir Singh <balbir@linux.vnet.ibm.com>
Subject: Re: [BUG] 2.6.25-rc2-git4 - Regression Kernel oops  while running kernbench and tbench on powerpc
Date: Thu, 24 Apr 2008 11:35:15 +0530
Message-ID: <4810231B.6020105@linux.vnet.ibm.com>
In-Reply-To: <18446.61538.620549.715043@cargo.ozlabs.ibm.com>

Paul Mackerras wrote:
> Kamalesh Babulal writes:
> 
>> After applying the patch above and the patch posted on
>> http://lkml.org/lkml/2008/4/8/42
>> the bug had the following information,
> 
> Thanks.  The patch below, against Linus' current git tree, fixes one
> bug that might be the cause of the problem, and also attempts to
> detect the erroneous situation earlier and fix it up, and also print
> some debug information.  Please try to reproduce the problem with this
> patch applied, and if there are any console log messages starting with
> SLB: or FWNMI:, please send me the console log.
> 
> Paul.
> 
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index c0db5b7..f7f0962 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -439,6 +439,19 @@ END_FTR_SECTION_IFSET(CPU_FTR_1T_SEGMENT)
>  	mr	r1,r8		/* start using new stack pointer */
>  	std	r7,PACAKSAVE(r13)
> 
> +	/* check that SLB entry 2 contains the right thing */
> +	clrrdi	r6,r1,28
> +	clrldi.	r0,r6,2
> +	beq	3f
> +	li	r0,2
> +	slbmfee	r7,r0
> +	oris	r6,r6,SLB_ESID_V@h
> +	cmpd	r6,r7
> +	beq	3f
> +	bl	bad_slb_switch
> +	ld	r3,PACACURRENT(r13)
> +	addi	r3,r3,THREAD
> +3:
>  	ld	r6,_CCR(r1)
>  	mtcrf	0xFF,r6
> 
> @@ -540,6 +553,19 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_ISERIES)
>  	ld	r4,_XER(r1)
>  	mtspr	SPRN_XER,r4
> 
> +	/* check that SLB entry 2 contains the right thing */
> +	clrrdi	r6,r1,28	/* stack ESID */
> +	clrldi.	r0,r6,2
> +	beq	57f
> +	li	r0,2
> +	slbmfee	r7,r0
> +	oris	r6,r6,SLB_ESID_V@h
> +	cmpd	r6,r7
> +	beq	57f
> +	addi	r3,r1,STACK_FRAME_OVERHEAD
> +	bl	bad_slb_exc
> +	ld	r3,_MSR(r1)
> +57:
>  	REST_8GPRS(5, r1)
> 
>  	andi.	r0,r3,MSR_RI
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index be35ffa..c938134 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -45,6 +45,7 @@
>  #include <asm/system.h>
>  #include <asm/mpic.h>
>  #include <asm/vdso_datapage.h>
> +#include <asm/mmu.h>
>  #ifdef CONFIG_PPC64
>  #include <asm/paca.h>
>  #endif
> @@ -580,6 +581,10 @@ int __devinit start_secondary(void *unused)
>  	atomic_inc(&init_mm.mm_count);
>  	current->active_mm = &init_mm;
> 
> +	/* Bolt in the entry for the kernel stack now */
> +	if (cpu_has_feature(CPU_FTR_SLB))
> +		slb_flush_and_rebolt();
> +
>  	smp_store_cpu_info(cpu);
>  	set_dec(tb_ticks_per_jiffy);
>  	preempt_disable();
> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
> index 906daed..bb7765b 100644
> --- a/arch/powerpc/mm/slb.c
> +++ b/arch/powerpc/mm/slb.c
> @@ -309,3 +309,34 @@ void slb_initialize(void)
>  	 * one. */
>  	asm volatile("isync":::"memory");
>  }
> +
> +static void dump_slb(void)
> +{
> +	long entry;
> +	unsigned long esid, vsid;
> +
> +	printk(KERN_EMERG "SLB contents now:\n");
> +	for (entry = 0; entry < 64; ++entry) {
> +		asm volatile("slbmfee  %0,%1" : "=r" (esid) : "r" (entry));
> +		if (esid == 0)
> +			/* valid bit is clear along with everything else */
> +			continue;
> +		asm volatile("slbmfev  %0,%1" : "=r" (vsid) : "r" (entry));
> +		printk(KERN_EMERG "%ld: %.16lx %.16lx\n", entry, esid, vsid);
> +	}
> +}
> +
> +void bad_slb_exc(struct pt_regs *regs)
> +{
> +	printk(KERN_EMERG "SLB: stack not bolted on exception return\n");
> +	dump_slb();
> +	slb_flush_and_rebolt();
> +	show_regs(regs);
> +}
> +
> +void bad_slb_switch(void)
> +{
> +	printk(KERN_EMERG "SLB: stack not bolted on context switch\n");
> +	dump_slb();
> +	slb_flush_and_rebolt();
> +}
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index a1ab25c..ed68083 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -325,6 +325,8 @@ static int recover_mce(struct pt_regs *regs, struct rtas_error_log * err)
> 
>  	if (err->disposition == RTAS_DISP_FULLY_RECOVERED) {
>  		/* Platform corrected itself */
> +		printk(KERN_ALERT "FWNMI: platform corrected error %.16lx\n",
> +		       *(unsigned long *)err);
>  		nonfatal = 1;
>  	} else if ((regs->msr & MSR_RI) &&
>  		   user_mode(regs) &&

Hi Paul,

Thanks. With the patch applied, the oops is no longer reproducible on the machine,
and the console log had no messages starting with SLB: or FWNMI:. I have also
updated the bugzilla.

Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
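
For readers following the entry_64.S hunks, the check the patch adds before
returning to the new stack can be restated in C. This is only a sketch of the
logic in the quoted assembly, not code from the patch: the function name
stack_slb_entry_ok and its parameters are hypothetical, the SLB_ESID_V value
is assumed to match the kernel's definition (0x08000000), and the value that
the assembly reads from SLB entry 2 with slbmfee is passed in as a plain
argument so the sketch can run anywhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel's SLB_ESID_V (entry valid bit). */
#define SLB_ESID_V 0x0000000008000000ULL

/*
 * Hypothetical helper mirroring the assembly check.
 *   sp        - kernel stack pointer (r1)
 *   slb2_esid - ESID word of SLB entry 2, as slbmfee would return it
 */
static bool stack_slb_entry_ok(uint64_t sp, uint64_t slb2_esid)
{
	/* clrrdi r6,r1,28: keep only the 256MB segment base of the stack */
	uint64_t esid = sp & ~((1ULL << 28) - 1);

	/*
	 * clrldi. r0,r6,2: drop the top two bits; a zero result means the
	 * stack lies in the first segment of its region, so the check is
	 * skipped.
	 */
	if ((esid & ~(3ULL << 62)) == 0)
		return true;

	/* oris r6,r6,SLB_ESID_V@h ; cmpd r6,r7 */
	return (esid | SLB_ESID_V) == slb2_esid;
}
```

When the helper returns false, the patched assembly calls bad_slb_switch or
bad_slb_exc, which dump the SLB and call slb_flush_and_rebolt().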


Thread overview: 26+ messages
2008-02-20 15:01 [BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc Kamalesh Babulal
2008-02-20 15:01 ` Kamalesh Babulal
2008-04-08  8:21 ` Paul Mackerras
2008-04-08  8:21   ` Paul Mackerras
2008-04-08 11:51   ` Kamalesh Babulal
2008-04-08 11:51     ` Kamalesh Babulal
2008-04-08 12:53     ` Paul Mackerras
2008-04-08 12:53       ` Paul Mackerras
2008-04-08 17:45       ` Kamalesh Babulal
2008-04-08 17:45         ` Kamalesh Babulal
2008-04-08 23:26     ` Paul Mackerras
2008-04-08 23:26       ` Paul Mackerras
2008-04-09  5:20       ` Kamalesh Babulal
2008-04-09  5:20         ` Kamalesh Babulal
2008-04-14 10:04         ` Paul Mackerras
2008-04-14 10:04           ` Paul Mackerras
2008-04-14 13:28           ` Kamalesh Babulal
2008-04-14 13:28             ` Kamalesh Babulal
2008-04-23  8:16             ` Paul Mackerras
2008-04-23  8:16               ` Paul Mackerras
2008-04-24  6:05               ` Kamalesh Babulal [this message]
2008-04-24  6:05                 ` Kamalesh Babulal
2008-05-09  3:15                 ` Paul Mackerras
2008-05-09  3:15                   ` Paul Mackerras
2008-05-10 16:43                   ` Kamalesh Babulal
2008-05-10 16:43                     ` Kamalesh Babulal
