simulate a bad NAND block cause kernel hang

public inbox for linux-mtd@lists.infradead.org

* simulate a bad NAND block cause kernel hang
From: ahgu @ 2005-08-25 13:27 UTC
  To: ahgu, linux-mtd

I forced the flash_erase function to fail. I expected JFFS2 to pick up the
returned error, mark the block bad, and put it on a bad-block list. Instead,
I get a kernel failure (log below).
I get a similar error when I simulate a write error.
Am I doing the bad block simulation correctly? Is this the correct response?
What is supposed to happen when the NAND flash grows a bad block?
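One way to force an erase failure without touching the filesystem itself is to wrap the device's erase hook, the function pointer that the JFFS2 code calls as c->mtd->erase. The sketch below models that idea in userspace. It is an illustration under stated assumptions, not MTD code: struct fake_mtd, good_erase(), faulty_erase(), inject_erase_fault() and remove_erase_fault() are all hypothetical names, and the injected error is -EIO because kernel functions conventionally report failure as a negative errno value.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an MTD device: only the erase hook matters here. */
struct fake_mtd {
	int (*erase)(struct fake_mtd *mtd, unsigned int block);
};

/* Normal erase routine: always succeeds (stands in for the real driver). */
static int good_erase(struct fake_mtd *mtd, unsigned int block)
{
	(void)mtd;
	(void)block;
	return 0;
}

/* Fault-injection state: the original hook and the block that must fail. */
static int (*saved_erase)(struct fake_mtd *, unsigned int);
static unsigned int fail_block;

/* Replacement hook: report an I/O error for one block, pass others through. */
static int faulty_erase(struct fake_mtd *mtd, unsigned int block)
{
	if (block == fail_block)
		return -EIO;
	return saved_erase(mtd, block);
}

/* Swap the erase hook so that erasing `block` fails from now on. */
static void inject_erase_fault(struct fake_mtd *mtd, unsigned int block)
{
	assert(mtd->erase != faulty_erase); /* install only once */
	saved_erase = mtd->erase;
	fail_block = block;
	mtd->erase = faulty_erase;
}

/* Restore the original hook saved by inject_erase_fault(). */
static void remove_erase_fault(struct fake_mtd *mtd)
{
	mtd->erase = saved_erase;
}
```

Starting from struct fake_mtd m = { good_erase }, a call to inject_erase_fault(&m, 3) makes m.erase(&m, 3) return -EIO while other blocks still erase normally; remove_erase_fault(&m) undoes it. Hooking the pointer keeps the failure confined to the device layer, so the filesystem sees it exactly as it would see a real erase error.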


-ahgu

erasing 615
Erase at 0x0001c000 failed immediately: errno 7
jffs2_erase_failedScheduling in interrupt
kernel BUG at sched.c:676!
Unable to handle kernel paging request at virtual address 00000000, epc == 80112220, ra == 80112220
Oops in fault.c:do_page_fault, line 225:
$0 : 00000000 1000f800 0000001b 00000001 816ee000 00000000 00000001 00001893
$8 : 00001893 00000000 00000000 00000000 802a9459 fffffff9 0000000a 802edd0a
$16: 8028e260 802ec000 00000000 802ad928 818cbe2c 818cbe28 80106000 802ad924
$24: ffffffff 00000002                   802ec000 802ede38 802ede38 80112220
Hi : 000247ff
Lo : befc0000
epc  : 80112220    Not tainted
Status: 1000f803
Cause : 1080000c
Process kupdated (pid: 6, stackpage=802ec000)
Stack: 80245460 80245538 000002a4 00001875 802fe504 009a0000 818cbd10 802ad928
       818cbe2c 818cbe28 802fe4b0 8029ae8c 00000000 801db1f8 00000010 802edeb8
       802edea8 1000f801 00000000 802ec000 802fe508 802fe508 0000000a 00000266
       8107c720 818cbd10 818cbd10 816af120 818cbe2c 818cbe28 00000001 8029ae8c
       00000000 801d29b0 802fe400 8107c720 818cbd10 816af120 801852a0 801851fc
       80253518 ...
Call Trace: [<80245460>] [<80245538>] [<801db1f8>] [<801d29b0>] [<801852a0>] [<801851fc>]
 [<80253518>] [<801854ac>] [<801853f8>] [<80186738>] [<80186730>] [<8014004c>]
 [<8013f11c>] [<8013f610>] [<8013f3d8>] [<8013f3d8>] [<801089e8>] [<80140eb4>]
 [<801089d8>]

Code: 24a55538  0c0458b5  240602a4 <ac000000> 0012a940  3c0a8029  254a2040  01555021  40016000
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
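Even without decoded symbols, the "Code:" line can be read by hand using the standard MIPS32 field layout (6-bit opcode, two 5-bit register fields and a 16-bit immediate, or a 26-bit index for jumps). The sketch below handles only the opcodes that occur around the faulting word; decode_mips() is a hypothetical helper, not a kernel or binutils interface, and it prints register numbers and raw immediates rather than ABI register names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode one 32-bit MIPS instruction word located at address `pc` into `out`.
 * Only ADDIU (0x09), JAL (0x03), LUI (0x0f) and SW (0x2b) are recognised;
 * immediates are printed as raw 16-bit values, not sign-extended. */
static void decode_mips(uint32_t word, uint32_t pc, char *out, size_t n)
{
	unsigned opcode = word >> 26;
	unsigned rs = (word >> 21) & 0x1f;
	unsigned rt = (word >> 16) & 0x1f;
	unsigned imm = word & 0xffff;

	assert(out != NULL && n > 0);
	switch (opcode) {
	case 0x09: /* addiu rt, rs, imm */
		snprintf(out, n, "addiu $%u, $%u, 0x%x", rt, rs, imm);
		break;
	case 0x03: /* jal: top 4 bits of the delay-slot address | index << 2 */
		snprintf(out, n, "jal 0x%08x",
			 (unsigned)(((pc + 4) & 0xf0000000u) |
				    ((word & 0x03ffffffu) << 2)));
		break;
	case 0x0f: /* lui rt, imm */
		snprintf(out, n, "lui $%u, 0x%x", rt, imm);
		break;
	case 0x2b: /* sw rt, imm(rs) */
		snprintf(out, n, "sw $%u, 0x%x($%u)", rt, imm, rs);
		break;
	default:
		snprintf(out, n, "unknown (opcode 0x%02x)", opcode);
		break;
	}
}
```

Feeding it the words that end at epc (0x80112220 is the bracketed word) gives addiu $5, $5, 0x5538; jal 0x801162d4; addiu $6, $0, 0x2a4; and, at epc itself, sw $0, 0x0($0). Register $6 (a2) holds the third argument of the call, and 0x2a4 is 676, matching "kernel BUG at sched.c:676!"; ra == 80112220 is the address just past the jal's delay slot. This suggests the "paging request at virtual address 00000000" is the BUG() path's own store to address 0 rather than a separate fault, which points back at the preceding "Scheduling in interrupt" message.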

* Re: simulate a bad NAND block cause kernel hang
From: Thomas Gleixner @ 2005-08-25 13:44 UTC
  To: ahgu; +Cc: linux-mtd

On Thu, 2005-08-25 at 09:27 -0400, ahgu wrote:
> I forced the flash_erase function to fail. I expected JFFS2 to pick up the
> returned error, mark the block bad, and put it on a bad-block list. Instead,
> I get a kernel failure (log below).
> I get a similar error when I simulate a write error.
> Am I doing the bad block simulation correctly? Is this the correct response?
> What is supposed to happen when the NAND flash grows a bad block?

JFFS2 should handle this.

The oops trace is worthless, as it does not show the stack trace in
human-readable form (function names decoded).

Make sure that CONFIG_KALLSYMS is set in your kernel .config file. 
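"Decoding" here means mapping each raw address in the Call Trace to the function that contains it, using the kernel's symbol table (for example the System.map file produced by the kernel build); CONFIG_KALLSYMS lets the kernel do this itself when it prints the oops. A minimal sketch of that lookup, assuming a table already sorted by ascending address: struct ksym, ksym_lookup() and ksym_format() are hypothetical names, and the symbols used below are placeholders, not entries from this kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One symbol-table entry: start address and name (System.map style). */
struct ksym {
	unsigned long addr;
	const char *name;
};

/* Return the last entry whose start address is <= addr, i.e. the symbol
 * containing addr, or NULL if addr lies below the first symbol.
 * `tab` must be sorted by ascending address. */
static const struct ksym *ksym_lookup(const struct ksym *tab, size_t n,
				      unsigned long addr)
{
	size_t lo = 0, hi = n; /* binary search over [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* lo now counts the entries that start at or below addr */
	return lo ? &tab[lo - 1] : NULL;
}

/* Format addr as "name+0xoffset", or as a bare hex address if no symbol
 * covers it. */
static void ksym_format(const struct ksym *tab, size_t n, unsigned long addr,
			char *out, size_t len)
{
	const struct ksym *s = ksym_lookup(tab, n, addr);

	assert(out != NULL && len > 0);
	if (s)
		snprintf(out, len, "%s+0x%lx", s->name, addr - s->addr);
	else
		snprintf(out, len, "0x%08lx", addr);
}
```

With a table holding example_fn_a at 0x80112000 and example_fn_b at 0x80112400, ksym_format() renders 0x80112220 as example_fn_a+0x220. The sketch ignores symbol sizes, so an address beyond the end of the last function is still attributed to it.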

Information about the kernel version and any applied MTD/JFFS2 patches is
also missing.


tglx


* Re: simulate a bad NAND block cause kernel hang
From: ahgu @ 2005-08-25 15:41 UTC
  To: tglx; +Cc: linux-mtd

I am using a 2.4.18 kernel.

jffs2_erase_failed(c, jeb); is the last function called before the fault is
triggered.
Where can I find the diff between 2.4.18 and 2.4.20?
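The log line printed just before the BUG, "Scheduling in interrupt", suggests that something reached after jffs2_erase_failed() tried to schedule (sleep) while the CPU was considered to be in interrupt context. The userspace model below illustrates that kind of guard. It is only an illustration: the counters and every *_model() function are hypothetical stand-ins for per-CPU hard-IRQ and bottom-half nesting counts, and this thread does not establish which context the real call actually ran in.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical nesting counters: > 0 means "inside" that context. */
static int hardirq_depth;
static int bh_depth;

/* Counts as interrupt context if either hard-IRQ or bottom-half
 * processing is active or disabled-and-nested. */
static int in_interrupt_model(void)
{
	return hardirq_depth > 0 || bh_depth > 0;
}

/* Returns 0 if sleeping would be allowed. Otherwise logs the same message
 * seen in the log and returns -1; in the log above, the message is followed
 * directly by a BUG at sched.c:676. */
static int schedule_model(void)
{
	if (in_interrupt_model()) {
		printf("Scheduling in interrupt\n");
		return -1;
	}
	return 0;
}

/* Entering and leaving a bottom-half-disabled section. */
static void local_bh_disable_model(void)
{
	bh_depth++;
}

static void local_bh_enable_model(void)
{
	assert(bh_depth > 0);
	bh_depth--;
}
```

A call to schedule_model() succeeds normally but fails between local_bh_disable_model() and local_bh_enable_model(), which is the shape of bug the message reports: code that may sleep, called from a context that must not.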

void jffs2_erase_block(struct jffs2_sb_info *c, struct jffs2_eraseblock *jeb)
{
	int ret;
#ifdef __ECOS
	ret = jffs2_flash_erase(c, jeb);
	if (!ret) {
		jffs2_erase_succeeded(c, jeb);
		return;
	}
#else /* Linux */
	struct erase_info *instr;

	instr = kmalloc(sizeof(struct erase_info) + sizeof(struct erase_priv_struct), GFP_KERNEL);
	if (!instr) {
		printk(KERN_WARNING "kmalloc for struct erase_info in jffs2_erase_block failed. Refiling block for later\n");
		spin_lock(&c->erase_completion_lock);
		list_del(&jeb->list);
		list_add(&jeb->list, &c->erase_pending_list);
		c->erasing_size -= c->sector_size;
		c->dirty_size += c->sector_size;
		jeb->dirty_size = c->sector_size;
		spin_unlock(&c->erase_completion_lock);
		return;
	}

	memset(instr, 0, sizeof(*instr));

	instr->mtd = c->mtd;
	instr->addr = jeb->offset;
	instr->len = c->sector_size;
	instr->callback = jffs2_erase_callback;
	instr->priv = (unsigned long)(&instr[1]);

	((struct erase_priv_struct *)instr->priv)->jeb = jeb;
	((struct erase_priv_struct *)instr->priv)->c = c;

	/* NAND: read out the fail counter, if possible */
	if (!jffs2_can_mark_obsolete(c))
		jffs2_nand_read_failcnt(c, jeb);

	ret = c->mtd->erase(c->mtd, instr);
	if (!ret)
		return;

	kfree(instr);
#endif /* __ECOS */

	if (ret == -ENOMEM || ret == -EAGAIN) {
		/* Erase failed immediately. Refile it on the list */
		D1(printk(KERN_DEBUG "Erase at 0x%08x failed: %d. Refiling on erase_pending_list\n", jeb->offset, ret));
		spin_lock(&c->erase_completion_lock);
		list_del(&jeb->list);
		list_add(&jeb->list, &c->erase_pending_list);
		c->erasing_size -= c->sector_size;
		c->dirty_size += c->sector_size;
		jeb->dirty_size = c->sector_size;
		spin_unlock(&c->erase_completion_lock);
		return;
	}
	if (ret == -EROFS)
		printk(KERN_WARNING "Erase at 0x%08x failed immediately: -EROFS. Is the sector locked?\n", jeb->offset);
	else
		printk(KERN_WARNING "Erase at 0x%08x failed immediately: errno %d\n", jeb->offset, ret);

	jffs2_erase_failed(c, jeb);
}


* Re: simulate a bad NAND block cause kernel hang
From: Thomas Gleixner @ 2005-08-25 15:47 UTC
  To: ahgu; +Cc: linux-mtd

On Thu, 2005-08-25 at 11:41 -0400, ahgu wrote:
> I am using a 2.4.18 kernel.

Please read
http://www.linux-mtd.infradead.org/source.html#kernelversions


> jffs2_erase_failed(c, jeb); is the last function called before the fault is
> triggered.
> Where can I find the diff between 2.4.18 and 2.4.20?

diff -urN linux-2.4.18 linux-2.4.20 >veryvery_broken_vs_very_broken.diff


2.4 kernels have no working NAND support and will never get it.

tglx
