Re: ubifs_decompress: cannot decompress ...

public inbox for linux-mtd@lists.infradead.org

From: Artem Bityutskiy <dedekind1@gmail.com>
To: "Matthew L. Creech" <mlcreech@gmail.com>
Cc: Ben Gardiner <bengardiner@nanometrics.ca>,
	MTD list <linux-mtd@lists.infradead.org>
Subject: Re: ubifs_decompress: cannot decompress ...
Date: Mon, 06 Jun 2011 12:58:36 +0300
Message-ID: <1307354316.3112.19.camel@localhost>
In-Reply-To: <BANLkTimsanbjqD8ZqJ26wcp42iQWt+vFdw@mail.gmail.com>

On Thu, 2011-06-02 at 14:59 -0400, Matthew L. Creech wrote: 
> On Thu, Jun 2, 2011 at 12:30 AM, Matthew L. Creech <mlcreech@gmail.com> wrote:
> >
> > As for this specific error (ubifs_decompress): tomorrow I'll try to
> > gather & post additional log data for this device.  Thanks!
> >
> 
> Here is a console dump with more information enabled:
> 
> http://mcreech.com/work/ubifs-decompress-err.txt
> 
> I turned on mount & recovery debug messages, although it seems to
> mount & recover correctly, so presumably any info from past recovery
> actions is long gone.  The error actually occurs later on, when our
> main application accesses SQLite databases.
> 
> I dumped out the corrupted node data in 3 places:
> 
> 1. In ubifs_decompress(), I dumped the data buffer, prefixed with
> "compressed node" (redundant with #3, really)
> 
> 2. In read_block(), I page-align the starting offset & size, re-fetch
> the pertinent pages from flash, and dump those with corresponding OOB
> info, prefixed with "data" and "oob" respectively
> 
> 3. There's already a dbg_dump_node() call in read_block(), so I
> enabled that as well

So the corruption starts exactly at a NAND page boundary. This makes
me believe that the most probable cause is power cut recovery. But you
say your client assures you there were none...
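
For reference, the page arithmetic behind "starts exactly at the NAND
page boundary" (and behind page-aligning an offset and size before
re-reading from flash) can be sketched roughly as below. This is a
user-space illustration only, not UBIFS code: the helper names are
made up, and the page size is assumed to be a power of two (for
example 2048 bytes, which is a guess and not taken from the log).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers, not UBIFS code; page_size must be a power of two. */

/* Round offs down to the start of the NAND page that contains it. */
static size_t page_floor(size_t offs, size_t page_size)
{
	return offs & ~(page_size - 1);
}

/* Round offs up to the next page boundary (unchanged if already aligned). */
static size_t page_ceil(size_t offs, size_t page_size)
{
	return (offs + page_size - 1) & ~(page_size - 1);
}

/* True if corruption that starts at offs begins exactly on a page boundary. */
static bool starts_on_page_boundary(size_t offs, size_t page_size)
{
	return (offs & (page_size - 1)) == 0;
}
```

A node occupying [offs, offs + len) is covered by the pages from
page_floor(offs) up to page_ceil(offs + len).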

Yes, you are right that the info from the past is gone... What I'd like
to see is a dump of the whole LEB. Could you please add a
'dbg_dump_leb()' call? Basically, I want to see whether this LEB went
through GC.

Because my theory is:

1. You have LEB A which contains this data node, but it is not corrupted
   yet. Let's call this data node X.
2. GC moves valid data from LEB A to LEB B (lnum 3479).
3. We get a power cut while moving the data. We end up with node X
   corrupted in B.
4. UBIFS recovery has a bug and it decides that the copy of node X in
   LEB B is OK, commits, and LEB A is erased at some point.
5. And we are in your situation...

But this is just a theory.
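
Step 4 of the theory hinges on recovery accepting a damaged copy of
node X. As a rough illustration of the kind of check that should
reject such a copy, here is a user-space sketch that compares a
payload against a CRC recorded when it was written. It is not the
real UBIFS node check: the function names and the "payload plus
stored CRC" layout are assumptions made for the example, and the CRC
is the common CRC-32 with the standard initial value and final XOR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), standard init and final XOR. */
static uint32_t crc32_simple(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical check: does the payload still match the CRC stored at write time? */
static bool node_copy_is_intact(const uint8_t *payload, size_t len,
				uint32_t stored_crc)
{
	return crc32_simple(payload, len) == stored_crc;
}
```

If the tail of the copy in LEB B was damaged by the power cut, a check
of this kind fails for that copy, so a recovery path that still keeps
it must be skipping or mis-applying the check somewhere.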

I actually worked on power cut emulation testing improvements lately,
and the current state is that "integck -p" fails sometimes. I need to
investigate it - it might turn out to be a bug which causes the effect
you see.

Basically, I've improved UBIFS power cut testing so that it corrupts the
buffer with random data, not only with 0xFFs, and now integck -p has
started failing. See this commit:

http://git.infradead.org/ubifs-2.6.git/commit/96c32bb596c5a74362a6a825f66fde68b6c3487c

It contains several unrelated changes. Ignore the simple random part,
only the changes in 'cut_data()' are interesting.
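
To illustrate the idea only (this is not the kernel's 'cut_data()',
whose exact logic is in the commit above), a user-space sketch of
emulating a power cut on a write buffer might look like this. The
function name, the enum and the use of rand() are all assumptions
made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* How the part of the buffer "lost" to the emulated power cut is filled. */
enum cut_fill { CUT_FILL_FF, CUT_FILL_RANDOM };

/*
 * Hypothetical sketch: everything from offset 'from' to the end of the
 * buffer is treated as not written properly. It is replaced either with
 * 0xFF bytes (looks like erased flash) or with random garbage.
 */
static void emulate_cut(uint8_t *buf, size_t len, size_t from,
			enum cut_fill fill)
{
	size_t i;

	if (from > len)
		from = len;	/* nothing to corrupt */

	for (i = from; i < len; i++)
		buf[i] = (fill == CUT_FILL_FF) ? 0xFF : (uint8_t)(rand() & 0xFF);
}
```

The random variant is the harsher test: a tail of 0xFFs is easy to
recognize as empty space, while random bytes have to be caught by the
node checks.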

But I'll split that change into several smaller changes.

And I'll try to investigate the issue - it might turn out to be an
integck issue, we'll see.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
