From: Artem Bityutskiy <dedekind1@gmail.com>
To: "Matthew L. Creech" <mlcreech@gmail.com>
Cc: Ben Gardiner <bengardiner@nanometrics.ca>,
	MTD list <linux-mtd@lists.infradead.org>
Subject: Re: ubifs_decompress: cannot decompress ...
Date: Fri, 03 Jun 2011 07:32:20 +0300
Message-ID: <1307075540.4405.128.camel@localhost>
In-Reply-To: <BANLkTi=ATYnYKrRvZLyFPxfbW0_RVwTROA@mail.gmail.com>

On Thu, 2011-06-02 at 00:30 -0400, Matthew L. Creech wrote:
> On Wed, Jun 1, 2011 at 3:51 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> >
> > How does this happen? What do you do? Does it happen after mount when
> > you first read your data? Or does it happen at some point while you are
> > stress testing your system? Or does it happen after a power cut?
> >
> 
> So far there's no discernable pattern.  Most of the failed units are
> returns from the field, so we don't know what kind of conditions
> they've been placed in.  Some are from our test department, but we
> haven't found anything that might "trigger" the problem in any way.
> 
> The device works fine for some period of time (usually weeks /
> months), then we get complaints about various problems.  The reported
> symptoms eventually come down to one of these UBIFS errors.  Depending
> on the region which happens to go bad, it can result in breakage of a
> minor feature (because a file we try to read/write after mount
> triggers the error), all the way up to a completely non-functional
> device.  I'm not sure if we've ever seen it fail to mount altogether
> (I'll check into that), but we've had several cases in which U-Boot
> couldn't read the kernel image from UBIFS, so the device wouldn't boot
> Linux at all.
> 
> Power cuts are probably not common, though.  We have to expect them in
> the product of course, but practically speaking, our service guy
> assures me that a couple of the bad units he shipped me had stable
> power and were rarely/never rebooted.  But I can't rule it out with
> certainty.
> 
> Aside from that, it's just normal operation.  If the usage pattern
> matters, the only files ever written to in the persistent (UBIFS)
> filesystem are SQLite databases.  It's generally light usage, logging
> a variety of measurements once every 5 minutes.  I've tried
> stress-testing by running non-stop SQLite operations, recreating the
> normal usage pattern but with a _much_ higher frequency of writes than
> normal.  It didn't seem to help reproduce the error - we've yet to
> succeed in making this problem happen under controlled conditions.
> 
> As for this specific error (ubifs_decompress): tomorrow I'll try to
> gather & post additional log data for this device.  Thanks!

OK, then this is not about power cuts and unstable bits. The first thing
that comes to mind is that your kernel may have some non-UBIFS bugs which
cause memory corruption, so UBIFS writes corrupted data to the flash.

But the hexdump you sent shows that you have some non-0xFFs and then
many 0xFFs. Are those trailing 0xFFs part of the node data or not? If
yes, then it does not look like memory corruption, but more like some
driver/flash issues.
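
A minimal sketch of how that check could be scripted on a dumped node,
assuming the dump starts at the node's common header (little-endian, 24
bytes, magic 0x06101831, node length in the "len" field). The function
name is hypothetical; it only reports how many bytes at the end of the
node, within "len", are 0xFF:

```python
import struct

UBIFS_NODE_MAGIC = 0x06101831
# Common header: magic, crc, sqnum, len, node_type, group_type, 2 pad bytes
CH_FMT = "<IIQIBB2x"
CH_SIZE = struct.calcsize(CH_FMT)  # 24 bytes


def trailing_ff_in_node(buf):
    """Return (node_len, count of trailing 0xFF bytes inside the node)."""
    if len(buf) < CH_SIZE:
        raise ValueError("buffer shorter than the common header")
    magic, _crc, _sqnum, node_len, _ntype, _gtype = struct.unpack_from(
        CH_FMT, buf, 0
    )
    if magic != UBIFS_NODE_MAGIC:
        raise ValueError("no UBIFS node magic at offset 0")
    if len(buf) < node_len:
        raise ValueError("dump is shorter than the node length")
    # Only look at bytes that the header claims belong to this node.
    node = buf[:node_len]
    return node_len, len(node) - len(node.rstrip(b"\xff"))
```

A large count would mean the 0xFFs are part of the node data, which
points at the driver/flash rather than at memory corruption.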

BTW, have you run the mtd tests? Would you mind setting up the torture
test on one of your boards and letting it run for several weeks? I
remember we found a rare DMA bug on our board by running the torture test
for a long time. Also, it would be interesting to see how your HW and SW
behave when you continuously wear out a few eraseblocks.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
