[BUG] Nand support broken with v2.6.36-rc1


* [BUG] Nand support broken with v2.6.36-rc1
@ 2010-08-17  8:52 Michael Guntsche
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Guntsche @ 2010-08-17  8:52 UTC (permalink / raw)
  To: Brian Norris, linux-mtd, linux-kernel

Hello,

First of all, please CC me on any replies, since I am not subscribed to either of the MLs.

I just tried compiling 2.6.36-rc1 for one of my embedded boards here and
noticed that NAND support is apparently broken with -rc1.
In the syslog I see:

[  231.039693] rbppc_nand_probe: MikroTik RouterBOARD 600 series NAND driver, version 0.0.2
[  231.048103] NAND device: Manufacturer ID: 0xad, Chip ID: 0x76 (Hynix NAND 64MiB 3,3V 8-bit)

This is the board I am using, and the NAND driver worked from 2.6.27 up
to 2.6.35 with no modifications.

[  231.056590] Scanning device for bad blocks
[  231.063908] Bad eraseblock 56 at 0x0000000e0000
[  231.068589] Bad eraseblock 57 at 0x0000000e4000
[  231.073194] Bad eraseblock 58 at 0x0000000e8000
[  231.077870] Bad eraseblock 59 at 0x0000000ec000
[  231.082482] Bad eraseblock 60 at 0x0000000f0000
[  231.087146] Bad eraseblock 61 at 0x0000000f4000
......
This continues for a long time
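
The addresses in this log step by 0x4000 from one block to the next, which is consistent with a 16 KiB erase block. A minimal sketch of that arithmetic follows; the erase size is an assumption inferred from the log, not taken from a datasheet, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Erase block size inferred from the log: consecutive bad blocks are
 * reported 0x4000 bytes apart. This is an assumption for the sketch. */
#define ERASEBLOCK_SIZE 0x4000u

/* Hypothetical helper: byte offset of the start of an erase block. */
static uint64_t eraseblock_offset(uint32_t block)
{
    return (uint64_t)block * ERASEBLOCK_SIZE;
}

/* Hypothetical helper: number of erase blocks on a chip of the given size. */
static uint32_t eraseblock_count(uint64_t chip_size)
{
    return (uint32_t)(chip_size / ERASEBLOCK_SIZE);
}
```

Calling eraseblock_offset(56) gives 0xe0000, matching the first line of the scan output, and a 64 MiB chip holds 4096 such blocks in total.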

I know that this device has two bad blocks, but with the new code almost
all blocks are marked as bad.
I tracked this down to commit:

c7b28e25cb9beb943aead770ff14551b55fa8c79
 mtd: nand: refactor BB marker detection

Reverting the code under drivers/mtd/nand to an earlier commit makes it
work again.
The only thing that might be special about this NAND driver is that it
uses a non-default OOB layout:

static struct nand_ecclayout rbppc_nand_oob_16 = {
  .eccbytes = 6,
  .eccpos = { 8, 9, 10, 13, 14, 15 },
  .oobavail = 9,
  .oobfree = { { 0, 4 }, { 6, 2 }, { 11, 2 }, { 4, 1 } }
};

I am not sure if a driver change is needed, but since the commit did not touch any driver-specific NAND code, I do not think the driver is the problem here.

Maybe someone more knowledgeable than me can take a look at it.

Kind regards,
Michael Guntsche


* [BUG] Nand support broken with v2.6.36-rc1
@ 2010-08-17 11:36 Michael Guntsche
  2010-08-17 17:00 ` Brian Norris
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Guntsche @ 2010-08-17 11:36 UTC (permalink / raw)
  To: Brian Norris, linux-mtd, linux-kernel

Hello again,

Answering my own question here. Yes, with the new code a driver change
does seem to be needed. The bad-block pattern used with this NAND is no
longer supported by the stock kernel code, so I added this to the NAND
driver itself:

static uint8_t scan_ff_pattern[] = { 0xff, 0xff };
static struct nand_bbt_descr rbppc_nand_smallpage = {
  .options = NAND_BBT_SCAN2NDPAGE,
  .offs = NAND_SMALL_BADBLOCK_POS,
  .len = 1,
  .pattern = scan_ff_pattern
};

and the driver is working again. But shouldn't this be supported by the stock level code as well?
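
A descriptor like the one above tells the scan which OOB byte to compare against the 0xff pattern; a block whose marker byte differs is treated as bad. The following is a simplified, self-contained sketch of that comparison, not the kernel's actual code. The struct and function names are hypothetical stand-ins, and the sample OOB contents are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct nand_bbt_descr; only the
 * fields used in this sketch are modelled. */
struct bbm_descr {
    size_t offs;            /* offset of the marker within the OOB area */
    size_t len;             /* number of marker bytes to compare */
    const uint8_t *pattern; /* expected bytes for a good block */
};

/* Returns 1 if the OOB bytes at the marker position differ from the
 * pattern (block treated as bad), 0 otherwise. */
static int block_looks_bad(const uint8_t *oob, const struct bbm_descr *d)
{
    for (size_t i = 0; i < d->len; i++) {
        if (oob[d->offs + i] != d->pattern[i])
            return 1;
    }
    return 0;
}

static const uint8_t ff_pattern[] = { 0xff, 0xff };

/* Marker at byte 5, as in the driver workaround above. */
static const struct bbm_descr small_pos = { 5, 1, ff_pattern };
/* Marker at byte 0, for comparison. */
static const struct bbm_descr large_pos = { 0, 1, ff_pattern };
```

If the bootloader stores data in OOB byte 0 (which the oobfree list permits) while byte 5 stays 0xff, block_looks_bad() reports the block as good with small_pos and as bad with large_pos. That would explain nearly every block being flagged once the marker offset changes.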

Kind regards,
Michael Guntsche


* Re: [BUG] Nand support broken with v2.6.36-rc1
  2010-08-17 11:36 [BUG] Nand support broken with v2.6.36-rc1 Michael Guntsche
@ 2010-08-17 17:00 ` Brian Norris
  2010-08-17 17:47   ` Michael Guntsche
                     ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Brian Norris @ 2010-08-17 17:00 UTC (permalink / raw)
  To: Michael Guntsche
  Cc: linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org,
	Brian Norris

Hello,

On 08/17/2010 01:52 AM, Michael Guntsche wrote:
 > The only thing that might be special with the nand driver that is being
 > used is that a different oob layout is being used.
 >
 > static struct nand_ecclayout rbppc_nand_oob_16 = {
 >    .eccbytes = 6,
 >    .eccpos = { 8, 9, 10, 13, 14, 15 },
 >    .oobavail = 9,
 >    .oobfree = { { 0, 4 }, { 6, 2 }, { 11, 2 }, { 4, 1 } }
 > };

On 08/17/2010 04:36 AM, Michael Guntsche wrote:
> I added this to the nand driver itself.
>
> static uint8_t scan_ff_pattern[] = { 0xff, 0xff };
> static struct nand_bbt_descr rbppc_nand_smallpage = {
>    .options = NAND_BBT_SCAN2NDPAGE,
>    .offs = NAND_SMALL_BADBLOCK_POS,
>    .len = 1,
>    .pattern = scan_ff_pattern
> };
>
> and the driver is working again. But shouldn't this be supported by the stock level code as well?

Why yes, it should! Somebody (probably me) goofed. Your nand_ecclayout 
is conflicting with the kernel's choice of bad block position. Recent 
changes must have affected which position is chosen automatically by the 
kernel.

One of the following two cases is likely the problem:
(1) Your chip is supposed to use offset 0, not 5, for the BBM (i.e., 
NAND_LARGE_BADBLOCK_POS, not NAND_SMALL_BADBLOCK_POS), and so your 
ecclayout should not be leaving byte 0 in the "oobfree" array (a design 
flaw since you first began using this chip)
(2) I made the commit that you mentioned 
(c7b28e25cb9beb943aead770ff14551b55fa8c79) too restrictive in allowing 
chips to use NAND_SMALL_BADBLOCK_POS.

Option 2 is likely the case, and in fact, I realized a stupid mistake I 
made in refactoring the detection here.

I have been studying data from hundreds of flash chips to find where the 
factory-determined markers should be stored. Unfortunately, I can't 
cover all of them, and so your Hynix chip is likely one that was 
overlooked. Could you send the full NAND ID string (8 bytes, not just 
the manufacturer and chip ID), an exact part number for the flash, and a 
datasheet? Any one of those could help (the datasheet being the most 
important), but whatever you can provide is helpful. More data on your 
chip would allow me to determine the problem for sure; I will send a 
patch ASAP once I get your information.

Sorry for the trouble!

On another note, it may be intelligent to have the kernel-specific 
systems check for such a conflict between bad-block markers and ECC 
layout. If a position needed by the bad-block marker is listed in
"oobfree" or "eccpos", then we have a problem. Does that sound like a
good idea to anybody? If so, what would be the best approach:
* print an error and quit detection
* try to modify the ecclayout, bbm info or both
* try to modify, and fall-back to error message and quit if necessary
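
The conflict check proposed here could look roughly like the sketch below. It is only an illustration under stated assumptions: the structs are simplified stand-ins for the kernel's nand_ecclayout and nand_oobfree, the array sizes are arbitrary, and bbm_conflicts_with_layout() is a hypothetical name, not an existing kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's struct nand_oobfree and
 * struct nand_ecclayout; array sizes are chosen for this sketch only. */
struct oob_free_range {
    uint32_t offset;
    uint32_t length;
};

struct ecc_layout {
    uint32_t eccbytes;
    uint32_t eccpos[16];
    struct oob_free_range oobfree[8]; /* a zero-length entry ends the list */
};

/* Hypothetical check: returns 1 if any byte of the bad-block marker
 * (bbm_len bytes starting at bbm_pos) is listed in eccpos or oobfree. */
static int bbm_conflicts_with_layout(const struct ecc_layout *l,
                                     uint32_t bbm_pos, uint32_t bbm_len)
{
    for (uint32_t b = bbm_pos; b < bbm_pos + bbm_len; b++) {
        for (uint32_t i = 0; i < l->eccbytes && i < 16; i++) {
            if (l->eccpos[i] == b)
                return 1;
        }
        for (size_t i = 0; i < 8 && l->oobfree[i].length != 0; i++) {
            const struct oob_free_range *f = &l->oobfree[i];
            if (b >= f->offset && b < f->offset + f->length)
                return 1;
        }
    }
    return 0;
}

/* The layout quoted earlier in this thread. */
static const struct ecc_layout rbppc_layout = {
    .eccbytes = 6,
    .eccpos = { 8, 9, 10, 13, 14, 15 },
    .oobfree = { { 0, 4 }, { 6, 2 }, { 11, 2 }, { 4, 1 } },
};
```

For the quoted layout, a one-byte marker at offset 0 conflicts (byte 0 is in oobfree), while a marker at offset 5 does not, since byte 5 is the only position the layout leaves unclaimed.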

Thanks,
Brian


* Re: [BUG] Nand support broken with v2.6.36-rc1
  2010-08-17 17:00 ` Brian Norris
@ 2010-08-17 17:47   ` Michael Guntsche
  2010-08-17 18:49     ` Brian Norris
  2010-08-17 20:59   ` Abdoulaye Walsimou GAYE
  2010-08-18 18:25   ` [PATCH] mtd: nand: Fix regression in BBM detection Brian Norris
  2 siblings, 1 reply; 13+ messages in thread
From: Michael Guntsche @ 2010-08-17 17:47 UTC (permalink / raw)
  To: Brian Norris; +Cc: linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org

On 17 Aug 10 10:00, Brian Norris wrote:
> One of the following two cases is likely the problem:
> (1) Your chip is supposed to use offset 0, not 5, for the BBM (i.e.,
> NAND_LARGE_BADBLOCK_POS, not NAND_SMALL_BADBLOCK_POS), and so your
> ecclayout should not be leaving byte 0 in the "oobfree" array (a
> design flaw since you first began using this chip)

First, I am just an end user, so I have no access to the datasheets etc. I
just got the code from the board manufacturer (2.6.27) and forward-port
it to recent kernels.

The reason I am using a specific layout is because the bootloader on
this board expects it this way. It formats it this way in the beginning
and I cannot change that. 


> Could you send the full NAND ID string (8 bytes, not

If you can tell me where I can find that, I'll be more than happy to send
it to you. But as I said, I think the reason for this is this special
bootloader.

Please tell me if you need more information.


Kind regards,
Michael


* Re: [BUG] Nand support broken with v2.6.36-rc1
  2010-08-17 17:47   ` Michael Guntsche
@ 2010-08-17 18:49     ` Brian Norris
  2010-08-17 20:06       ` Michael Guntsche
  0 siblings, 1 reply; 13+ messages in thread
From: Brian Norris @ 2010-08-17 18:49 UTC (permalink / raw)
  To: Michael Guntsche
  Cc: linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org

Hi,

On 08/17/2010 10:47 AM, Michael Guntsche wrote:
> First, I am just an end user so I have no access to the datasheets etc. I
> just got the code from the board manufactrurer (2.6.27) and forward
> port it to recent kernels.

I see. No problem. We'll work with what you can do:

If you can simply find the NAND chip part number (it would be printed on 
the chip itself), that will be helpful.

Also, there are a few things you can do under a working kernel (e.g., 
2.6.35?).

First, have you ever used any of the mtdutils? In particular, running 
the command "mtdinfo -a" and sending the output is helpful if you have 
the utility installed on your board.

Second, since you are doing the forward-porting, I assume you can do a 
little bit of coding/patching. To print the whole ID string, you can add 
a simple "printk" line to the code in "drivers/mtd/nand/nand_base.c". 
For example, on the 2.6.35 kernel, you can just apply the patch below. 
Then, on boot, the ID string will print (or at least show up in "dmesg" 
or "syslog"). That info can help a little.

> The reason I am using a specific layout is because the bootloader on
> this board expects it this way. It formats it this way in the beginning
> and I cannot change that.

Well, if the new commit that broke your board is getting the block
marker *correct* according to the factory specifications, then this
particular problem lies in your setup; perhaps there could be a
workaround, such as the check for this kind of conflict that I mentioned.
However, I'm still hypothesizing that I simply got the detection wrong,
and so my fix will solve your problem.

Thanks,
Brian

---
  drivers/mtd/nand/nand_base.c |    4 +++-
  1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 4a7b864..d2d1fab 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -2809,8 +2809,10 @@ static struct nand_flash_dev *nand_get_flash_type(struct mtd_info *mtd,

  	/* Read entire ID string */

-	for (i = 0; i < 8; i++)
+	for (i = 0; i < 8; i++) {
  		id_data[i] = chip->read_byte(mtd);
+		printk(KERN_INFO "ID byte %i: %#x\n", i, id_data[i]);
+	}

  	if (id_data[0] != *maf_id || id_data[1] != dev_id) {
  		printk(KERN_INFO "%s: second ID read did not match "
-- 
1.7.0.4


* Re: [BUG] Nand support broken with v2.6.36-rc1
  2010-08-17 18:49     ` Brian Norris
@ 2010-08-17 20:06       ` Michael Guntsche
  2010-08-17 21:42         ` Brian Norris
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Guntsche @ 2010-08-17 20:06 UTC (permalink / raw)
  To: Brian Norris; +Cc: linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org

On 17 Aug 10 11:49, Brian Norris wrote:
> First, have you ever used any of the mtdutils? In particular,
> running the command "mtdinfo -a" and sending the output is helpful
> if you have the utility installed on your board.
Hmm, mtdinfo tries to open /sys/class/mtd/mtd0/dev, which does not exist.
The device is working OK as a block device, on the other hand, so let's
try the next thing.

Output booting with a patched .36-rc1

[    0.279217] rbppc_nand_probe: MikroTik RouterBOARD 600 series NAND driver, version 0.0.2
[    0.287535] ID byte 0: 0xad
[    0.290373] ID byte 1: 0x76
[    0.293185] ID byte 2: 0xad
[    0.295985] ID byte 3: 0x76
[    0.298798] ID byte 4: 0xad
[    0.301610] ID byte 5: 0x76
[    0.304423] ID byte 6: 0xad
[    0.307223] ID byte 7: 0x76
[    0.310046] NAND device: Manufacturer ID: 0xad, Chip ID: 0x76 (Hynix NAND 64MiB 3,3V 8-bit)

Hope this helps...


* Re: [BUG] Nand support broken with v2.6.36-rc1
  2010-08-17 17:00 ` Brian Norris
  2010-08-17 17:47   ` Michael Guntsche
@ 2010-08-17 20:59   ` Abdoulaye Walsimou GAYE
  2010-08-17 22:07     ` Brian Norris
  2010-08-18 18:25   ` [PATCH] mtd: nand: Fix regression in BBM detection Brian Norris
  2 siblings, 1 reply; 13+ messages in thread
From: Abdoulaye Walsimou GAYE @ 2010-08-17 20:59 UTC (permalink / raw)
  To: Brian Norris, linux-mtd

On 08/17/2010 07:00 PM, Brian Norris wrote:
> Hello,
>
> On 08/17/2010 01:52 AM, Michael Guntsche wrote:
> > The only thing that might be special with the nand driver that is being
> > used is that a different oob layout is being used.
> >
> > static struct nand_ecclayout rbppc_nand_oob_16 = {
> >    .eccbytes = 6,
> >    .eccpos = { 8, 9, 10, 13, 14, 15 },
> >    .oobavail = 9,
> >    .oobfree = { { 0, 4 }, { 6, 2 }, { 11, 2 }, { 4, 1 } }
> > };
>
> On 08/17/2010 04:36 AM, Michael Guntsche wrote:
>> I added this to the nand driver itself.
>>
>> static uint8_t scan_ff_pattern[] = { 0xff, 0xff };
>> static struct nand_bbt_descr rbppc_nand_smallpage = {
>>    .options = NAND_BBT_SCAN2NDPAGE,
>>    .offs = NAND_SMALL_BADBLOCK_POS,
>>    .len = 1,
>>    .pattern = scan_ff_pattern
>> };
>>
>> and the driver is working again. But shouldn't this be supported by 
>> the stock level code as well?
>
> Why yes, it should! Somebody (probably me) goofed. Your nand_ecclayout 
> is conflicting with the kernel's choice of bad block position. Recent 
> changes must have affected which position is chosen automatically by 
> the kernel.
>
> One of the following two cases is likely the problem:
> (1) Your chip is supposed to use offset 0, not 5, for the BBM (i.e., 
> NAND_LARGE_BADBLOCK_POS, not NAND_SMALL_BADBLOCK_POS), and so your 
> ecclayout should not be leaving byte 0 in the "oobfree" array (a 
> design flaw since you first began using this chip)
> (2) I made the commit that you mentioned 
> (c7b28e25cb9beb943aead770ff14551b55fa8c79) too restrictive in allowing 
> chips to use NAND_SMALL_BADBLOCK_POS.
>
> Option 2 is likely the case, and in fact, I realized a stupid mistake 
> I made in refactoring the detection here.
>
> I have been studying data from hundreds of flash chips to find where 
> the factory-determined markers should be stored. Unfortunately, I 
> can't cover all of them, and so your Hynix chip is likely one that was 
> overlooked. Could you send the full NAND ID string (8 bytes, not just 
> the manufacturer and chip ID), an exact part number for the flash, and 
> a datasheet? Any one of those could help (the datasheet being the most 
> important), but whatever you can provide is helpful. More data on your 
> chip would allow me to determine the problem for sure; I will send a 
> patch ASAP once I get your information.
>
> Sorry for the trouble!
>
> On another note, it may be intelligent to have the kernel-specific 
> systems check for such a conflict between bad-block markers and ECC 
> layout. If a position needed by the bad-block marker is listed in 
> "oobfree" or "eccpos" then we have a problem. Sound like a good idea 
> anybody? If so, what would be the best approach:
> * print an error and quit detection
> * try to modify the ecclayout, bbm info or both
> * try to modify, and fall-back to error message and quit if necessary
>
> Thanks,
> Brian


Hello,
I don't know if this is the same issue reported here (sorry if not), but
when I use flash_eraseall to erase a partition of a NAND flash [1] with
linux-2.6.33.5 running on the target, this is the output:

# flash_eraseall  /dev/mtd3
Erasing 16 Kibyte @ 1270000 -- 31 % complete.
Skipping bad block at 0x01274000
Erasing 16 Kibyte @ 3aa0000 -- 100 % complete.

And with the latest Linus tree (v2.6.36-rc1):

# flash_eraseall  /dev/mtd3
Erasing 16 Kibyte @ 1274000 -- 31 % complete.
flash_eraseall: /dev/mtd3: MTD Erase failure: Input/output error
Erasing 16 Kibyte @ 3aa0000 -- 100 % complete.

Now 0x01274000 is not recognized as a bad block.
I use flash_eraseall from the latest mtd-utils git tree
(07a87aa599a8fc32e938d9987bd2b59eebcfcb76).

Do you think it's the same issue?

Thanks,
AWG

[1]
S3C24XX NAND Driver, (c) 2004 Simtec Electronics
s3c24xx-nand s3c2440-nand: Tacls=1, 9ns Twrph0=3 29ns, Twrph1=2 19ns
s3c24xx-nand s3c2440-nand: NAND hardware ECC
NAND device: Manufacturer ID: 0xec, Chip ID: 0x76 (Samsung NAND 64MiB 3,3V 8-bit)
Creating 4 MTD partitions on "NAND 64MiB 3,3V 8-bit":
0x000000000000-0x000000040000 : "u-boot"
0x000000040000-0x000000060000 : "u-boot-env"
0x000000060000-0x000000560000 : "kernel"
0x000000560000-0x000004000000 : "root"


* Re: [BUG] Nand support broken with v2.6.36-rc1
  2010-08-17 20:06       ` Michael Guntsche
@ 2010-08-17 21:42         ` Brian Norris
  2010-08-18  5:53           ` Michael Guntsche
  0 siblings, 1 reply; 13+ messages in thread
From: Brian Norris @ 2010-08-17 21:42 UTC (permalink / raw)
  To: Michael Guntsche
  Cc: linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org

On 08/17/2010 01:05 PM, Michael Guntsche wrote:
> On 17 Aug 10 11:49, Brian Norris wrote:
>> First, have you ever used any of the mtdutils? In particular,
>> running the command "mtdinfo -a" and sending the output is helpful
>> if you have the utility installed on your board.
> hmm mtdinfo tries to open /sys/class/mtd/mtd0/dev which das not exist
> the device is working ok as block device on the other hand so let's try
> the next thing.

I'm not an expert on the workings of mtdutils, so I don't know
why your device does not have the necessary sysfs entries. Perhaps
weird hardware features or a strange, incorrect driver (I can't work
on your board specific driver for you). I suppose it's OK to ignore
this problem for the moment.

> Output booting with a patched .36-rc1
> 
> [    0.279217] rbppc_nand_probe: MikroTik RouterBOARD 600 series NAND driver, version 0.0.2
> [    0.287535] ID byte 0: 0xad
> [    0.290373] ID byte 1: 0x76
> [    0.293185] ID byte 2: 0xad
> [    0.295985] ID byte 3: 0x76
> [    0.298798] ID byte 4: 0xad
> [    0.301610] ID byte 5: 0x76
> [    0.304423] ID byte 6: 0xad
> [    0.307223] ID byte 7: 0x76
> [    0.310046] NAND device: Manufacturer ID: 0xad, Chip ID: 0x76 (Hynix NAND 64MiB 3,3V 8-bit)
> 
> Hope this helps...
> 

Honestly, that doesn't really help :) I guess the device is old
enough it does not have an extended ID. In that case, I will need
the part number to be able to diagnose for sure. Can you find the
physical chip on the board and give me whatever labeling is on it?
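
The dump alternates between 0xad and 0x76, which is what one would expect if the chip only implements a two-byte ID and the read simply wraps around. A small sketch of how such a repeat could be detected follows; the helper name is hypothetical and this is not taken from the kernel source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the ID bytes simply repeat with the
 * given period, i.e. id[i] == id[i - period] for every i >= period. */
static int id_repeats_with_period(const uint8_t *id, size_t len, size_t period)
{
    if (period == 0 || period >= len)
        return 0;
    for (size_t i = period; i < len; i++) {
        if (id[i] != id[i - period])
            return 0;
    }
    return 1;
}
```

For the eight bytes printed above, id_repeats_with_period(id, 8, 2) returns 1, so bytes 2 to 7 carry no information beyond the manufacturer and chip ID.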

In place of that, though, you can just try this patch on 2.6.36-rc1.
I believe it should satisfy the intention of my previous (faulty)
commit while reverting the regression behavior. If this works OK, I
will submit it to be included in the mainline kernel.

Thanks for taking the time to debug this.

Brian

---
 drivers/mtd/nand/nand_base.c |   10 +++-------
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index a3c7473..a22ed7b 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -2934,14 +2934,10 @@ static struct nand_flash_dev *nand_get_flash_type(struct mtd_info *mtd,
 		chip->chip_shift = ffs((unsigned)(chip->chipsize >> 32)) + 32 - 1;
 
 	/* Set the bad block position */
-	if (!(busw & NAND_BUSWIDTH_16) && (*maf_id == NAND_MFR_STMICRO ||
-				(*maf_id == NAND_MFR_SAMSUNG &&
-				 mtd->writesize == 512) ||
-				*maf_id == NAND_MFR_AMD))
-		chip->badblockpos = NAND_SMALL_BADBLOCK_POS;
-	else
+	if (mtd->writesize > 512 || (busw & NAND_BUSWIDTH_16))
 		chip->badblockpos = NAND_LARGE_BADBLOCK_POS;
-
+	else
+		chip->badblockpos = NAND_SMALL_BADBLOCK_POS;
 
 	/* Get chip options, preserve non chip based options */
 	chip->options &= ~NAND_CHIPOPTIONS_MSK;
-- 
1.7.0.4


* Re: [BUG] Nand support broken with v2.6.36-rc1
  2010-08-17 20:59   ` Abdoulaye Walsimou GAYE
@ 2010-08-17 22:07     ` Brian Norris
  0 siblings, 0 replies; 13+ messages in thread
From: Brian Norris @ 2010-08-17 22:07 UTC (permalink / raw)
  To: Abdoulaye Walsimou GAYE; +Cc: linux-mtd@lists.infradead.org, Brian Norris

On 08/17/2010 01:59 PM, Abdoulaye Walsimou GAYE wrote:
> Hello,
> I don't know if it's the same issue reported here (sorry if not), but
> when I use flash_eraseall
> to erase a partition of a NAND flash[1] with linux-2.6.33.5 running on
> the target here is the output:
> 
> # flash_eraseall  /dev/mtd3
> Erasing 16 Kibyte @ 1270000 -- 31 % complete.
> Skipping bad block at 0x01274000
> Erasing 16 Kibyte @ 3aa0000 -- 100 % complete.
> 
> And if it is latest linus tree (v2.6.36-rc1):
> 
> # flash_eraseall  /dev/mtd3
> Erasing 16 Kibyte @ 1274000 -- 31 % complete.
> flash_eraseall: /dev/mtd3: MTD Erase failure: Input/output error
> Erasing 16 Kibyte @ 3aa0000 -- 100 % complete.
> 
> Now 0x01274000 is not recognized as bad block.
> I use flash_eraseall from latest mtd-utils git tree
> (07a87aa599a8fc32e938d9987bd2b59eebcfcb76)
> 
> Do you think it's the same issue?
> 
> Thanks,
> AWG
> 
> [1]
> S3C24XX NAND Driver, (c) 2004 Simtec Electronics
> s3c24xx-nand s3c2440-nand: Tacls=1, 9ns Twrph0=3 29ns, Twrph1=2 19ns
> s3c24xx-nand s3c2440-nand: NAND hardware ECC
> NAND device: Manufacturer ID: 0xec, Chip ID: 0x76 (Samsung NAND 64MiB
> 3,3V 8-bit)
> Creating 4 MTD partitions on "NAND 64MiB 3,3V 8-bit":
> 0x000000000000-0x000000040000 : "u-boot"
> 0x000000040000-0x000000060000 : "u-boot-env"
> 0x000000060000-0x000000560000 : "kernel"
> 0x000000560000-0x000004000000 : "root"

This is not the *same* problem (you have a Samsung, not Hynix, part);
however, it may be related, esp. considering they have the same
device ID (0x76). Can you isolate the problem down to that same
commit?
c7b28e25cb9beb943aead770ff14551b55fa8c79
mtd: nand: refactor BB marker detection

That commit may not be the problem; your problem is just as likely related to:
58373ff0afff4cc8ac40608872995f4d87eb72ec

If commit c7b28e... is the problem, then perhaps you can try the
following patch (applied to 2.6.36-rc1) that I recommended to Michael.
I can't see quite *why* this patch would fix this issue for your Samsung
part, whereas, it makes perfect sense for the Hynix part.
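
This remark can be checked by evaluating the removed and the added condition side by side. The sketch below assumes a 512-byte page and an 8-bit bus for both parts, which is inferred from the 16 KiB erase blocks and the 16-byte OOB layout seen in this thread rather than from a datasheet. The function names are hypothetical, and the constants are believed to mirror include/linux/mtd/nand.h of that era; please verify them against the header.

```c
#include <stdint.h>

/* Believed to match include/linux/mtd/nand.h; 0xad and 0xec also appear
 * as the manufacturer IDs in the boot logs quoted in this thread. */
#define NAND_MFR_AMD      0x01
#define NAND_MFR_STMICRO  0x20
#define NAND_MFR_HYNIX    0xad
#define NAND_MFR_SAMSUNG  0xec
#define NAND_BUSWIDTH_16  0x00000002
#define NAND_LARGE_BADBLOCK_POS 0
#define NAND_SMALL_BADBLOCK_POS 5

/* Selection logic removed by the patch (the -rc1 behaviour). */
static int badblockpos_rc1(uint8_t maf_id, uint32_t writesize, int busw)
{
    if (!(busw & NAND_BUSWIDTH_16) &&
        (maf_id == NAND_MFR_STMICRO ||
         (maf_id == NAND_MFR_SAMSUNG && writesize == 512) ||
         maf_id == NAND_MFR_AMD))
        return NAND_SMALL_BADBLOCK_POS;
    return NAND_LARGE_BADBLOCK_POS;
}

/* Selection logic added by the patch: manufacturer no longer matters. */
static int badblockpos_patched(uint32_t writesize, int busw)
{
    if (writesize > 512 || (busw & NAND_BUSWIDTH_16))
        return NAND_LARGE_BADBLOCK_POS;
    return NAND_SMALL_BADBLOCK_POS;
}
```

Under these assumptions the Hynix part moves from offset 0 to offset 5 with the patch, while a 512-byte-page Samsung part gets offset 5 both before and after, so the patch would not be expected to change anything for it.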

Let me know how that goes.

Thanks,
Brian

---
 drivers/mtd/nand/nand_base.c |   10 +++-------
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index a3c7473..a22ed7b 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -2934,14 +2934,10 @@ static struct nand_flash_dev *nand_get_flash_type(struct mtd_info *mtd,
 		chip->chip_shift = ffs((unsigned)(chip->chipsize >> 32)) + 32 - 1;
 
 	/* Set the bad block position */
-	if (!(busw & NAND_BUSWIDTH_16) && (*maf_id == NAND_MFR_STMICRO ||
-				(*maf_id == NAND_MFR_SAMSUNG &&
-				 mtd->writesize == 512) ||
-				*maf_id == NAND_MFR_AMD))
-		chip->badblockpos = NAND_SMALL_BADBLOCK_POS;
-	else
+	if (mtd->writesize > 512 || (busw & NAND_BUSWIDTH_16))
 		chip->badblockpos = NAND_LARGE_BADBLOCK_POS;
-
+	else
+		chip->badblockpos = NAND_SMALL_BADBLOCK_POS;
 
 	/* Get chip options, preserve non chip based options */
 	chip->options &= ~NAND_CHIPOPTIONS_MSK;
-- 
1.7.0.4


* Re: [BUG] Nand support broken with v2.6.36-rc1
  2010-08-17 21:42         ` Brian Norris
@ 2010-08-18  5:53           ` Michael Guntsche
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Guntsche @ 2010-08-18  5:53 UTC (permalink / raw)
  To: Brian Norris; +Cc: linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org

On 17 Aug 10 14:42, Brian Norris wrote:
> In place of that, though, you can just try this patch on 2.6.36-rc1.
> I believe it should satisfy the intention of my previous (faulty)
> commit while reverting the regression behavior. If this works OK, I
> will submit it to be included in the mainline kernel.

Hi Brian,

Applying this patch fixes the problem for me. I removed my workaround
and just tried your patch and was able to mount both mtdblock devices.

Thank you for fixing this,
Michael Guntsche


* [PATCH] mtd: nand: Fix regression in BBM detection
  2010-08-17 17:00 ` Brian Norris
  2010-08-17 17:47   ` Michael Guntsche
  2010-08-17 20:59   ` Abdoulaye Walsimou GAYE
@ 2010-08-18 18:25   ` Brian Norris
  2010-08-18 19:30     ` Abdoulaye Walsimou GAYE
  2 siblings, 1 reply; 13+ messages in thread
From: Brian Norris @ 2010-08-18 18:25 UTC (permalink / raw)
  To: linux-mtd
  Cc: Artem Bityutskiy, Linux Kernel, awg, mike, David Woodhouse,
	Brian Norris

Commit c7b28e25cb9beb943aead770ff14551b55fa8c79 caused a regression
in detection of factory-set bad block markers, especially for certain
small-page NAND. This fix removes some unneeded constraints on using
NAND_SMALL_BADBLOCK_POS, making the detection code more correct.

This regression can be seen, for example, in Hynix HY27US081G1M and
similar.

Signed-off-by: Brian Norris <norris@broadcom.com>
---
 drivers/mtd/nand/nand_base.c |   10 +++-------
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index a3c7473..a22ed7b 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -2934,14 +2934,10 @@ static struct nand_flash_dev *nand_get_flash_type(struct mtd_info *mtd,
 		chip->chip_shift = ffs((unsigned)(chip->chipsize >> 32)) + 32 - 1;
 
 	/* Set the bad block position */
-	if (!(busw & NAND_BUSWIDTH_16) && (*maf_id == NAND_MFR_STMICRO ||
-				(*maf_id == NAND_MFR_SAMSUNG &&
-				 mtd->writesize == 512) ||
-				*maf_id == NAND_MFR_AMD))
-		chip->badblockpos = NAND_SMALL_BADBLOCK_POS;
-	else
+	if (mtd->writesize > 512 || (busw & NAND_BUSWIDTH_16))
 		chip->badblockpos = NAND_LARGE_BADBLOCK_POS;
-
+	else
+		chip->badblockpos = NAND_SMALL_BADBLOCK_POS;
 
 	/* Get chip options, preserve non chip based options */
 	chip->options &= ~NAND_CHIPOPTIONS_MSK;
-- 
1.7.0.4


* Re: [PATCH] mtd: nand: Fix regression in BBM detection
  2010-08-18 18:25   ` [PATCH] mtd: nand: Fix regression in BBM detection Brian Norris
@ 2010-08-18 19:30     ` Abdoulaye Walsimou GAYE
  2010-08-19  0:04       ` Brian Norris
  0 siblings, 1 reply; 13+ messages in thread
From: Abdoulaye Walsimou GAYE @ 2010-08-18 19:30 UTC (permalink / raw)
  To: Brian Norris
  Cc: David Woodhouse, mike, linux-mtd, Linux Kernel, Artem Bityutskiy

On 08/18/2010 08:25 PM, Brian Norris wrote:
> Commit c7b28e25cb9beb943aead770ff14551b55fa8c79 caused a regression
> in detection of factory-set bad block markers, especially for certain
> small-page NAND. This fix removes some unneeded constraints on using
> NAND_SMALL_BADBLOCK_POS, making the detection code more correct.
>
> This regression can be seen, for example, in Hynix HY27US081G1M and
> similar.
>
> Signed-off-by: Brian Norris<norris@broadcom.com>
> ---
>   drivers/mtd/nand/nand_base.c |   10 +++-------
>   1 files changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index a3c7473..a22ed7b 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -2934,14 +2934,10 @@ static struct nand_flash_dev *nand_get_flash_type(struct mtd_info *mtd,
>   		chip->chip_shift = ffs((unsigned)(chip->chipsize>>  32)) + 32 - 1;
>
>   	/* Set the bad block position */
> -	if (!(busw&  NAND_BUSWIDTH_16)&&  (*maf_id == NAND_MFR_STMICRO ||
> -				(*maf_id == NAND_MFR_SAMSUNG&&
> -				 mtd->writesize == 512) ||
> -				*maf_id == NAND_MFR_AMD))
> -		chip->badblockpos = NAND_SMALL_BADBLOCK_POS;
> -	else
> +	if (mtd->writesize>  512 || (busw&  NAND_BUSWIDTH_16))
>   		chip->badblockpos = NAND_LARGE_BADBLOCK_POS;
> -
> +	else
> +		chip->badblockpos = NAND_SMALL_BADBLOCK_POS;
>
>   	/* Get chip options, preserve non chip based options */
>   	chip->options&= ~NAND_CHIPOPTIONS_MSK;
>    

Brian,
Sorry for the long delay!
I tested the above patch; unfortunately, it does not help in my case!
And when I go further, put a JFFS2 in that partition and boot the
board, I get:

(with S3C2410 NAND hardware ECC enabled):
mtd->read(0x400 bytes from 0x1274000) returned ECC error
mtd->read(0x3c08 bytes from 0x12743f8) returned ECC error

(without S3C2410 NAND hardware ECC enabled):
uncorrectable error :
uncorrectable error :
uncorrectable error :
uncorrectable error :
mtd->read(0x400 bytes from 0x1274000) returned ECC error
uncorrectable error :
uncorrectable error :
uncorrectable error :
[...]
uncorrectable error :
uncorrectable error :
mtd->read(0x3c08 bytes from 0x12743f8) returned ECC error

Despite these errors I can actually use the board (no kernel panic)!
The part is Samsung K9F1208U0C - PCB0

Hope that helps!

Thanks,
AWG


* Re: [PATCH] mtd: nand: Fix regression in BBM detection
  2010-08-18 19:30     ` Abdoulaye Walsimou GAYE
@ 2010-08-19  0:04       ` Brian Norris
  0 siblings, 0 replies; 13+ messages in thread
From: Brian Norris @ 2010-08-19  0:04 UTC (permalink / raw)
  To: Abdoulaye Walsimou GAYE
  Cc: David Woodhouse, mike@it-loops.com, linux-mtd@lists.infradead.org,
	Linux Kernel, Artem Bityutskiy

On 08/18/2010 12:30 PM, Abdoulaye Walsimou GAYE wrote:
> Brian,
> Sorry for the long delay!
> I tested the above patch unfortunately it does not help in my case!

Understood. That makes sense. In fact, your problem is most likely *not* 
related to this commit. As I mentioned before, please try narrowing down 
what specifically caused this; if I read correctly, you jumped from 
2.6.33 to 2.6.36-rc1. There have been several important changes between 
those releases. Notably, this commit may be giving Samsung chips problems:
426c457a3216fac74e

This thread is covering a few problems with Samsung:
http://lists.infradead.org/pipermail/linux-mtd/2010-August/031590.html

> And when I go further and put a JFFS2 in that partition and boot the
> board I have
<snip>
> Despite these errors I can actually use the board (no kernel panic)!
> The part is Samsung K9F1208U0C - PCB0

Unless you really know what you're doing, I wouldn't be writing/erasing 
the flash if it's not detecting bad blocks properly.

Let me know if you have trouble with narrowing down to the problem commit.

Brian
