linux-mtd.lists.infradead.org archive mirror
From: Ivan Djelic <ivan.djelic@parrot.com>
To: Cliff Brake <cliff.brake@gmail.com>
Cc: "linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>,
	"dedekind1@gmail.com" <dedekind1@gmail.com>
Subject: Re: JFFS2 loss of power expectations
Date: Wed, 4 May 2011 00:03:42 +0200
Message-ID: <20110503220342.GA3862@parrot.com>
In-Reply-To: <BANLkTinscG0a3mOwvCwQOLKRNoabStkOYg@mail.gmail.com>

On Tue, May 03, 2011 at 09:08:26PM +0100, Cliff Brake wrote:
> >> 2) any suggestions for debugging this?
> >
> > Some kind of device which can cut power is needed. Then you can write a
> > test program or script, cut power at a random point, boot up, and make
> > sure the FS looks OK.
> 
> Yes, we have a programmable power supply set up to cut power during
> boot, and we can reproduce JFFS2 file system corruption with a day or
> so of testing.  We are using a fairly old CPU board with a small SLC
> flash (128MB).
> 
> Now, the question is: how do we prevent it?
> 
> We are looking into mounting the root file system in RO and sync
> modes, etc., but don't have test results yet.
> 
> So, just looking for general ideas on how to improve this situation.

Hi Cliff,
Just a few debugging ideas that helped me a lot in the past:

1. Try to focus your random power cuts so that they happen precisely during a
nand write/erase operation; this will help reproduce bugs much faster.
Ideally, you could use a hardware timer or watchdog to trigger a software
reset with µs precision.

2. Using instrumentation and targeted power cuts as described above, you
should be able to isolate the last interrupted nand operation that caused a
corruption: is it an interrupted page programming, or a partially erased block?

3. During reboot after a power cut, look for nand read failures. Are they
located as expected in the last page/block that was programmed/erased? Or do
they appear in unrelated locations? Or do they not appear at all?

4. If the above steps do not lead to an obvious explanation, they may still
provide you with a way to dump nand contents (before and after corruption) and
systematically reproduce the bug on a Linux PC running nandsim. This makes
debugging much easier.
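The before/after comparison in steps 2 to 4 can be made mechanical with a
small helper. The sketch below is only an illustration under assumptions: both
images are raw data-area dumps of the same partition (no OOB bytes), the page
and erase-block geometry is passed in by the caller, and `compare_dumps` and
its `expected_page` argument (the page your instrumentation logged as the last
one being programmed, or any page of the block being erased) are hypothetical
names, not part of mtd-utils or nandsim. It lists the pages that changed and
flags those lying outside the erase block of the last logged operation, which
is the question raised in step 3.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not an MTD tool): compare two raw NAND images page by
 * page, data area only, no OOB. 'expected_page' is the page the (assumed)
 * instrumentation log recorded as the last one programmed, or a page inside
 * the block being erased, when power was cut.
 * Prints every differing page, stores in *outside how many of them lie
 * outside that page's erase block, and returns the number of differing pages.
 * Trailing bytes that do not fill a whole page are ignored.
 */
size_t compare_dumps(const unsigned char *before, const unsigned char *after,
                     size_t len, size_t page_size, size_t pages_per_block,
                     size_t expected_page, size_t *outside)
{
    assert(page_size > 0 && pages_per_block > 0 && outside != NULL);

    size_t npages = len / page_size;
    size_t expected_block = expected_page / pages_per_block;
    size_t diffs = 0;

    *outside = 0;
    for (size_t p = 0; p < npages; p++) {
        const unsigned char *a = before + p * page_size;
        const unsigned char *b = after + p * page_size;

        if (memcmp(a, b, page_size) == 0)
            continue; /* page unchanged */

        size_t block = p / pages_per_block;
        int unexpected = (block != expected_block);

        diffs++;
        if (unexpected)
            (*outside)++;
        printf("page %zu (block %zu) differs%s\n", p, block,
               unexpected ? "  <-- outside last programmed/erased block" : "");
    }
    return diffs;
}
```

On the target, the geometry can be read from /sys/class/mtd/mtdN/writesize
and erasesize (pages per block = erasesize / writesize); many large-page SLC
parts use 2048-byte pages and 128 KiB blocks, but the actual chip should be
checked. Leaving OOB out keeps the comparison simple; if JFFS2 cleanmarkers or
ECC bytes are suspected, the OOB area would need its own comparison.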

On the improvement side, I was going to suggest mounting as much as possible
as RO, but you mentioned that already.

Hope that helps,
Regards,

Ivan

Thread overview: 4+ messages
2011-04-22  5:05 JFFS2 loss of power expectations Cliff Brake
2011-04-22  7:36 ` Artem Bityutskiy
2011-05-03 20:08   ` Cliff Brake
2011-05-03 22:03     ` Ivan Djelic [this message]