public inbox for linux-mtd@lists.infradead.org
* Help needed with corruption detection/ubifs_wbuf_sync_nolock
From: Reginald Perrin @ 2012-06-25 13:58 UTC (permalink / raw)
  To: MTD Mailing List

Hi folks,

I'm tracking down a corruption issue and trying to trace where LEBs are getting randomly corrupted in our system (a very rare event, but it does happen).  I'm focusing on ubifs/io.c and trying to validate data before we send it to ubi_leb_write().

Can somebody please clarify something for me on ubifs_wbuf_sync_nolock()?  I'm trying to validate that the data we're writing hasn't been corrupted.  I thought I could just check that the node-type was valid, such as:

    if (((struct ubifs_ch *)wbuf->buf)->node_type > UBIFS_ORPH_NODE) {
        /* ABORT WRITE */
    }

    err = ubi_leb_write(c->ubi, wbuf->lnum, wbuf->buf, wbuf->offs,


This *seems* to work, but during our application start it actually triggers before we reach the troubled code, in code that really shouldn't have any issues.  I think this means I don't understand how wbuf relates to actual LEB nodes.

Can anybody help me understand how to check to see if the LEB is corrupted before we write?  I'm trying to get close enough to the corruption to get a backtrace.

TIA
RP


* Re: Help needed with corruption detection/ubifs_wbuf_sync_nolock
From: Reginald Perrin @ 2012-06-25 14:48 UTC (permalink / raw)
  To: MTD Mailing List

Update:  

I'm now doing this, and it seems to work.  Is this correct?

if ( wbuf->dtype != UBI_UNKNOWN && ((struct ubifs_ch *)wbuf->buf)->node_type > UBIFS_ORPH_NODE ) 

...

RP



* Re: Help needed with corruption detection/ubifs_wbuf_sync_nolock
From: Artem Bityutskiy @ 2012-06-27 14:22 UTC (permalink / raw)
  To: Reginald Perrin; +Cc: MTD Mailing List


Hi,

On Mon, 2012-06-25 at 06:58 -0700, Reginald Perrin wrote:
> I'm tracking down a corruption issue, and trying to trace back where
> LEB's are getting randomly corrupted in our system (a very rare event,
> but it can happen).  I'm focusing on ubifs/io.c, and trying to
> validate data before we send to ubi_leb_write().

You are not using MLC NAND, right? Did you validate your flash using MTD
tests?

> Can somebody please clarify something for me
> on ubifs_wbuf_sync_nolock()?  I'm trying to validate that the data
> we're writing hasn't been corrupted.  I thought I could just check
> that the node-type was valid, such as:
> 
>     if ( ((struct ubifs_ch *)wbuf->buf)->node_type > UBIFS_ORPH_NODE )
> {
> 
>         // ABORT WRITE
>     }
> 
>     err = ubi_leb_write(c->ubi, wbuf->lnum, wbuf->buf, wbuf->offs,
> 
The above code assumes that the contents of the write-buffer always start
with a UBIFS node, which is not true. 'wbuf->buf[0]' may be the middle
or the end of a node. If you want to add a check, you need to write a
helper function which _scans_ the write-buffer and searches for
UBIFS_NODE_MAGIC; what follows _then_ may be the start of a node. Then you
check the common header CRC. The write-buffer may contain more than
one node, so you need to iterate. You also need to take into account the
case when this is the end of the write-buffer and the common header does
not fit.
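
A rough user-space sketch of such a scanning helper might look like the
following. It is only an illustration under stated assumptions: the `sk_`
names are hypothetical, the constants are re-declared from
fs/ubifs/ubifs-media.h (24-byte common header, magic 0x06101831, nodes
aligned to 8 bytes), and the CRC is assumed to be a CRC-32 seeded with
0xFFFFFFFF over everything after the first 8 bytes of the node, with no
final inversion. A real in-kernel check would use 'struct ubifs_ch' and
'crc32()' directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror fs/ubifs/ubifs-media.h; re-declared so this builds
 * in user space. All sk_* names are hypothetical. */
#define SK_NODE_MAGIC 0x06101831u
#define SK_CRC32_INIT 0xFFFFFFFFu
#define SK_CH_SZ      24 /* common header: magic, crc, sqnum, len, types, pad */
#define SK_ALIGN      8  /* nodes are assumed to start on 8-byte boundaries */

/* Read a little-endian 32-bit value without alignment requirements. */
static uint32_t sk_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320), no final inversion. */
static uint32_t sk_crc32(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/*
 * Walk buf[0..used) and check every node that starts inside it.
 * Returns the offset of the first node with a bad length or CRC,
 * or -1 if no such node was found.
 */
static long sk_scan_wbuf(const uint8_t *buf, size_t used)
{
    size_t pos = 0;

    while (pos + SK_CH_SZ <= used) {       /* the common header must fit */
        if (sk_le32(buf + pos) != SK_NODE_MAGIC) {
            pos += SK_ALIGN;               /* tail of an earlier node or padding */
            continue;
        }
        uint32_t crc = sk_le32(buf + pos + 4);
        uint32_t len = sk_le32(buf + pos + 16);

        if (len < SK_CH_SZ)
            return (long)pos;              /* impossible node length */
        if (len > used - pos)
            break;                         /* node runs past the buffer: cannot verify */
        if (sk_crc32(SK_CRC32_INIT, buf + pos + 8, len - 8) != crc)
            return (long)pos;              /* header found, CRC mismatch */
        pos += (len + SK_ALIGN - 1) & ~(size_t)(SK_ALIGN - 1);
    }
    return -1;
}
```

Note that the leading bytes of the buffer may be payload from an earlier
node that happens to contain the magic value, so a reported offset is a
hint to investigate rather than proof of corruption.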
> 
> Can anybody help me understand how to check to see if the LEB is
> corrupted before we write?  I'm trying to get close enough to the
> corruption to get a backtrace.

Corrupted how - the CRC is corrupted? You can try to scan the previous
LEB using 'ubifs_scan()' before switching to the new one in the
'ubifs_wbuf_seek_nolock()' function, I guess.

-- 
Best Regards,
Artem Bityutskiy


* Re: Help needed with corruption detection/ubifs_wbuf_sync_nolock
From: Artem Bityutskiy @ 2012-06-27 14:23 UTC (permalink / raw)
  To: Reginald Perrin; +Cc: MTD Mailing List


On Mon, 2012-06-25 at 07:48 -0700, Reginald Perrin wrote:
> Update:  
> 
> I'm now doing this, and it seems to work.  Is this correct?
> 
> if ( wbuf->dtype != UBI_UNKNOWN && ((struct ubifs_ch *)wbuf->buf)->node_type > UBIFS_ORPH_NODE ) 

I do not think so. wbuf->buf may point to any byte of a node, not
necessarily to the beginning of one.

-- 
Best Regards,
Artem Bityutskiy


* Re: Help needed with corruption detection/ubifs_wbuf_sync_nolock
From: Reginald Perrin @ 2012-06-29 13:40 UTC (permalink / raw)
  To: dedekind1@gmail.com; +Cc: MTD Mailing List

Artem,

Our analysis shows that LEBs are corrupted due to a software bug we are trying to find.  It looks as if data is shifted in the buffer (as though by memcpy(buf, buf+2, node_size)), which we think is related to our power management.
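
To illustrate the kind of shift described above, here is a small user-space sketch (hypothetical demo_* names, not UBIFS code; the magic value is assumed to match UBIFS_NODE_MAGIC in fs/ubifs/ubifs-media.h). Once a node is shifted down by two bytes, the magic number no longer sits at offset 0, so a magic check at a known node start would notice it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NODE_MAGIC 0x06101831u /* assumed value of UBIFS_NODE_MAGIC */

/* True if the 4 bytes at p hold the little-endian node magic. */
static bool demo_has_magic(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return v == DEMO_NODE_MAGIC;
}

/* Reproduce the suspected corruption: drop the first `shift` bytes of a
 * node. memmove() is used because the source and destination overlap. */
static void demo_shift(uint8_t *buf, size_t size, size_t shift)
{
    memmove(buf, buf + shift, size - shift);
}
```

This only works where the start of a node is known, which, as discussed, is not the case for wbuf->buf.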

My goal was to try and trap on the bad write, and debug whatever is corrupting things before it happens.

You've convinced me that using wbuf probably isn't the right tactic.  I put checks into all the other ubi_leb_write() calls, and the corruption doesn't seem to come from those.

Still trying to find the best way to trap on it before it happens.

Thanks


