* JFFS2 loss of power expectations
@ 2011-04-22 5:05 Cliff Brake
2011-04-22 7:36 ` Artem Bityutskiy
0 siblings, 1 reply; 4+ messages in thread
From: Cliff Brake @ 2011-04-22 5:05 UTC (permalink / raw)
To: linux-mtd
Hello,
I'm helping debug a system that is running the 2.6.27 kernel with
JFFS2 (with summary) on SLC NAND. The CPU is a PXA270 (cm-x270), and
appears to be using NAND_ECC_SOFT. We are experiencing some file
system corruption if we lose power when the system is booting up.
There is some amount of file system activity when udev and other
system components start, but overall there is not much that is being
written to the file system.
A few questions:
1) should this combination be fairly robust to power failure (are
failures expected, or possible)?
2) any suggestions for debugging this?
We have not ruled out hardware problems (power supply, etc).
Thanks,
Cliff
--
=================
http://bec-systems.com
* Re: JFFS2 loss of power expectations
From: Artem Bityutskiy @ 2011-04-22 7:36 UTC (permalink / raw)
To: Cliff Brake; +Cc: linux-mtd
Hi,
On Fri, 2011-04-22 at 01:05 -0400, Cliff Brake wrote:
> I'm helping debug a system that is running the 2.6.27 kernel with
> JFFS2 (with summary) on SLC NAND. The CPU is a PXA270 (cm-x270), and
> appears to be using NAND_ECC_SOFT. We are experiencing some file
> system corruption if we lose power when the system is booting up.
> There is some amount of file system activity when udev and other
> system components start, but overall there is not much that is being
> written to the file system.
>
> A few questions:
>
> 1) should this combination be fairly robust to power failure (are
> failures expected, or possible)?
Probably, but you have to test this anyway.
1. Although JFFS2 is considered old and robust, it has not been maintained
very well for the last couple of years.
2. SLC NANDs in the past were more robust than modern SLCs, and new
challenges like unstable bits may have changed the situation with JFFS2
robustness.
So, test it.
> 2) any suggestions for debugging this?
Some kind of device that can cut power is needed. Then you can write a
test program or script, cut power at a random point, boot up, and make
sure the FS looks OK.
Or simply cut the power at a random point in the [0-N] second range, then
boot up all the way to the end, check that everything is OK, reboot, cut
power, and so on.
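
The cut/boot/check loop described above could be sketched roughly as follows. This is only an illustration: `power_on`, `power_off` and `check_fs` are hypothetical callables standing in for whatever controls the programmable power supply and inspects the target (for example over a serial console), and `max_delay_s` and `boot_time_s` are assumed tuning values, not anything specified in this thread.

```python
import random
import time
from typing import Callable, List, Optional


def power_cut_cycles(
    power_on: Callable[[], None],    # hypothetical: switch the supply on
    power_off: Callable[[], None],   # hypothetical: switch the supply off
    check_fs: Callable[[], bool],    # hypothetical: True if the FS looks OK
    max_delay_s: float,              # the "N" in the [0-N] second window
    cycles: int,
    boot_time_s: float,              # assumed time needed for a full boot
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Run power-cut cycles and return the indices of cycles that failed."""
    if rng is None:
        rng = random.Random()
    failures = []
    for cycle in range(cycles):
        # Interrupted boot: power up, then cut power at a random moment.
        power_on()
        sleep(rng.uniform(0.0, max_delay_s))
        power_off()
        # Clean boot: let the system come all the way up, then check it.
        power_on()
        sleep(boot_time_s)
        if not check_fs():
            failures.append(cycle)
        power_off()
    return failures
```

Passing `sleep` and `rng` as parameters is just a convenience so the loop can be dry-run on a desk without real hardware or real delays.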
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
* Re: JFFS2 loss of power expectations
From: Cliff Brake @ 2011-05-03 20:08 UTC (permalink / raw)
To: dedekind1; +Cc: linux-mtd
On Fri, Apr 22, 2011 at 3:36 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> Hi,
>
> On Fri, 2011-04-22 at 01:05 -0400, Cliff Brake wrote:
>> I'm helping debug a system that is running the 2.6.27 kernel with
>> JFFS2 (with summary) on SLC NAND. The CPU is a PXA270 (cm-x270), and
>> appears to be using NAND_ECC_SOFT. We are experiencing some file
>> system corruption if we lose power when the system is booting up.
>> There is some amount of file system activity when udev and other
>> system components start, but overall there is not much that is being
>> written to the file system.
>>
>> A few questions:
>>
>> 1) should this combination be fairly robust to power failure (are
>> failures expected, or possible)?
>
> Probably, but you have to test this anyway.
>
> 1. Although JFFS2 is considered old and robust, it has not been maintained
> very well for the last couple of years.
> 2. SLC NANDs in the past were more robust than modern SLCs, and new
> challenges like unstable bits may have changed the situation with JFFS2
> robustness.
>
> So, test it.
>
>> 2) any suggestions for debugging this?
>
> Some kind of device that can cut power is needed. Then you can write a
> test program or script, cut power at a random point, boot up, and make
> sure the FS looks OK.
Yes, we have a programmable PS set up to cut power during boot, and we
can reproduce JFFS2 file system corruption with a day or so of
testing. We are using a fairly old CPU board with a small SLC flash
(128MB).
Now, the question is how do we prevent it?
We are looking into mounting the root file system in RO and sync
modes, etc, but don't have test results yet.
So, just looking for general ideas how to improve this situation.
Thanks,
Cliff
--
=================
http://bec-systems.com
* Re: JFFS2 loss of power expectations
From: Ivan Djelic @ 2011-05-03 22:03 UTC (permalink / raw)
To: Cliff Brake; +Cc: linux-mtd@lists.infradead.org, dedekind1@gmail.com
On Tue, May 03, 2011 at 09:08:26PM +0100, Cliff Brake wrote:
> >> 2) any suggestions for debugging this?
> >
> > Some kind of device that can cut power is needed. Then you can write a
> > test program or script, cut power at a random point, boot up, and make
> > sure the FS looks OK.
>
> Yes, we have a programmable PS set up to cut power during boot, and we
> can reproduce JFFS2 file system corruption with a day or so of
> testing. We are using a fairly old CPU board with a small SLC flash
> (128MB).
>
> Now, the question is how do we prevent it?
>
> We are looking into mounting the root file system in RO and sync
> modes, etc, but don't have test results yet.
>
> So, just looking for general ideas how to improve this situation.
Hi Cliff,
Just a few debugging ideas that helped me a lot in the past:
1. Try to focus your random power cuts so that they happen precisely during a
nand write/erase operation; this will help reproduce bugs much faster.
Ideally you could try to use a hw timer or watchdog to trigger a software
reset with µs precision.
2. Using instrumentation and targeted power cuts as described above, you
should be able to isolate the last interrupted nand operation that caused a
corruption: is it an interrupted page programming, or a partially erased block?
3. During reboot after a power cut, look for nand read failures. Are they
located as expected in the last page/block that was programmed/erased? Or do
they appear in unrelated locations? Or do they not appear at all?
4. If the above steps do not lead to an obvious explanation, they may still
provide you with a way to dump nand contents (before and after corruption) and
systematically reproduce the bug on a Linux PC running nandsim. This makes
debugging much easier.
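
As a rough illustration of points 3 and 4, two NAND dumps (before and after a corruption) can be compared erase block by erase block to see whether the damage sits in the block that was last programmed/erased or somewhere unrelated. The sketch below is not an existing tool: the function name is made up, and it assumes both dumps have the same length, contain no OOB data, and that the caller knows the erase block size of the part.

```python
from typing import List, Tuple


def changed_blocks(before: bytes, after: bytes,
                   block_size: int) -> List[Tuple[int, bool]]:
    """Return (block_index, now_erased) for every erase block that differs.

    now_erased is True when the block in the 'after' dump reads back as
    all 0xFF, i.e. it looks like a fully erased block.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if len(before) != len(after):
        raise ValueError("dumps must have the same length")
    report = []
    for offset in range(0, len(before), block_size):
        old = before[offset:offset + block_size]
        new = after[offset:offset + block_size]
        if old != new:
            now_erased = new == b"\xff" * len(new)
            report.append((offset // block_size, now_erased))
    return report
```

A changed block that is neither the one being written nor the one being erased at the time of the power cut would be a hint that the problem is in an unrelated location.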
On the improvement side, I was going to suggest mounting as much as possible
as RO, but you mentioned that already.
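
When testing RO mounts, it may be worth confirming from the running system that the root file system really ended up read-only. A minimal sketch, assuming the usual /proc/mounts line layout (device, mount point, type, comma-separated options, two numeric fields); the function name is only illustrative:

```python
from typing import Set


def root_mount_options(mounts_text: str) -> Set[str]:
    """Return the mount options of "/" from /proc/mounts-style text.

    If "/" is listed more than once (for example an initial rootfs entry
    followed by the real root device), the last entry wins.
    """
    options: Set[str] = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "/":
            options = set(fields[3].split(","))
    return options
```

On the target this could be called as `root_mount_options(open("/proc/mounts").read())` and checked for `"ro"` in the result.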
Hope that helps,
Regards,
Ivan