JFFS2 as transactional FS (in other words: how to be sure that data have been written correctly from userspace)

public inbox for linux-mtd@lists.infradead.org

* JFFS2 as transactional FS (in other words: how to be sure that data have been writtent correctly from userspace)
@ 2007-03-08  9:49 R&D4
  2007-03-08 10:51 ` David Woodhouse
  0 siblings, 1 reply; 10+ messages in thread
From: R&D4 @ 2007-03-08  9:49 UTC (permalink / raw)
  To: mtd_mailinglist

Hi all MTD developers,

we are currently using an MTD partition on a NAND device, of course with
JFFS2 on it ;-) , for transaction logging purposes.
This transaction is mission critical and we cannot afford to lose data
(or, even worse, have corrupted data!)

For this reason we also use a battery-backed SRAM as temporary storage
for the transaction state machine. After the transaction has been
completed we flush the content of the SRAM to a file and (after the
write has completed) we can overwrite the temporary storage with new data.
Of course the machine can be interrupted at any moment without notice
(e.g. watchdog, power failure). Only the content of the SRAM is
guaranteed to be valid at any time.

The "main" problem, of course, is to know "when" we can say "ok the data
has been _completely_ written to the final storage".

By reading back on this mailing list, "goooogling" on the internet and
reading the JFFS2 FAQ
(http://www.linux-mtd.infradead.org/faq/jffs2.html#L_writewell) I think
I have found some kind of solution (I'm currently running some tests on
it) depending on the storage medium (NOR vs NAND):

- on *NOR*: in our understanding, we can just use a simple fwrite()
followed by fsync() or sync(). After sync() returns control to
the user's program, we can be sure that the data has been written to the
device. So

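file = fopen(file_on_jffs2_nor);
while (isneeded) {
	while (space_available(SRAM)) {
		fill(SRAM);
	}

	fread(buffer, SRAM);		/* copy the completed transaction out of SRAM */
	fwrite(buffer, file);
	fflush(file);			/* stdio buffer -> kernel */
	fsync(fileno(file));		/* fsync() takes a file descriptor */
	invalidate_SRAM();		/* SRAM may be reused from here on */
}
fclose(file);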

(Of course I have intentionally omitted the code for resuming from a warm
reset.)

QUESTION: Is this pseudo code correct? Is fsync() needed (O_SYNC is not
supported by JFFS2, AFAIK), or has the data been _completely_ written by
the time fwrite() returns (so no sync() is required)?

- on *NAND*: things are a bit tricky ;-). Even if you call fsync() data
may not have been written to storage, due to the fact that "it's better to
fill a NAND page before commit".
For this reason the (dirty) page is written to storage only after "a
while", even if it's not full. In the FAQ you say that this "a while" is
controlled by the standard kernel vm functions by setting
/proc/sys/vm/dirty_writeback_centisecs.

With this in mind, I am thinking about using this code:

at system startup:
`echo smallvalue > /proc/sys/vm/dirty_writeback_centisecs`

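file = fopen(file_on_jffs2_nand);
while (isneeded) {
	while (space_available(SRAM)) {
		fill(SRAM);
	}

	fread(buffer, SRAM);		/* copy the completed transaction out of SRAM */
	fwrite(buffer, file);
	fflush(file);			/* stdio buffer -> kernel */
	fsync(fileno(file));		/* fsync() takes a file descriptor */
	sleep(smallvalue + anothervalue);
	invalidate_SRAM();
}
fclose(file);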

'smallvalue' should be something less than the standard 5 secs but
something that will not waste too much CPU or NAND storage (by using
not-completely-filled pages, correct me if I'm wrong about this point).
I was thinking about 500 ms.

'anothervalue' should be something '>>smallvalue' and it should be used
(IMHO) because Linux is not an RTOS, so timings are not tightly guaranteed.
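
Since the tunable is expressed in centiseconds, the 500 ms mentioned above corresponds to a value of 50. A minimal sketch of setting it from C, assuming hypothetical helper names (ms_to_centisecs, set_writeback_interval_ms) and taking the file path as a parameter; on a real system the path would be /proc/sys/vm/dirty_writeback_centisecs and writing it needs root. This only shortens the periodic writeback interval; by itself it says nothing about when a particular write has reached the flash.

```c
#include <stdio.h>

/* Hypothetical helper: convert milliseconds to centiseconds (1 cs = 10 ms).
 * Rounds up so that a small nonzero request does not become 0, which would
 * switch periodic writeback off instead of speeding it up. */
static unsigned ms_to_centisecs(unsigned ms)
{
    return (ms + 9) / 10;
}

/* Hypothetical helper: write the interval to the tunable found at 'path'.
 * Returns 0 on success, -1 on any error. */
static int set_writeback_interval_ms(const char *path, unsigned ms)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fprintf(f, "%u\n", ms_to_centisecs(ms)) > 0;
    if (fclose(f) != 0)     /* fclose() reports deferred write errors */
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, set_writeback_interval_ms("/proc/sys/vm/dirty_writeback_centisecs", 500) would write "50".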

Is this approach correct, or is there something better that can be done?

Of course you can still flush buffers and dirty pages by unmounting the
partition, but this takes too long for our needs.

BTW I have seen, in my current tests, that without the sleep() my "last"
data is sometimes not written correctly.

Hope this (long ;-) ) email can lead to a useful discussion about this
problem! :-)

Best Regards,

Andrea


* Re: JFFS2 as transactional FS (in other words: how to be sure that data have been writtent correctly from userspace)
  2007-03-08  9:49 JFFS2 as transactional FS (in other words: how to be sure that data have been writtent correctly from userspace) R&D4
@ 2007-03-08 10:51 ` David Woodhouse
  2007-03-08 12:54   ` Jörn Engel
  0 siblings, 1 reply; 10+ messages in thread
From: David Woodhouse @ 2007-03-08 10:51 UTC (permalink / raw)
  To: R&D4; +Cc: mtd_mailinglist

On Thu, 2007-03-08 at 10:49 +0100, R&D4 wrote:
> Hi all MTD developers,
> 
> we are currently using an MTD partition on a NAND device, of course with
> JFFS2 on it ;-) , for transaction logging purposes.
> This transaction is mission critical and we cannot afford to lose data
> (or, even worse, have corrupted data!)
> 
> For this reason we also use a battery-backed SRAM as temporary storage
> for the transaction state machine. After the transaction has been
> completed we flush the content of the SRAM to a file and (after the
> write has completed) we can overwrite the temporary storage with new data.
> Of course the machine can be interrupted at any moment without notice
> (e.g. watchdog, power failure). Only the content of the SRAM is
> guaranteed to be valid at any time.
> 
> The "main" problem, of course, is to know "when" we can say "ok the data
> has been _completely_ written to the final storage".
> 
> By reading back on this mailing list, "goooogling" on the internet and
> reading the JFFS2 FAQ
> (http://www.linux-mtd.infradead.org/faq/jffs2.html#L_writewell) I think
> I have found some kind of solution (I'm currently running some tests on
> it) depending on the storage medium (NOR vs NAND):
> 
> - on *NOR*: in our understanding, we can just use a simple fwrite()
> followed by fsync() or sync(). After sync() returns control to
> the user's program, we can be sure that the data has been written to the
> device. So

...

> QUESTION: Is this pseudo code correct? Is fsync() needed (O_SYNC is not
> supported by JFFS2, AFAIK), or has the data been _completely_ written by
> the time fwrite() returns (so no sync() is required)?

On NOR you don't need the sync(). At least, if you're using write() you
don't need the sync. I make no claims about what glibc does with
fwrite(), but I believe fsync() ought to be perfectly sufficient.

JFFS2 doesn't support O_SYNC because it's _already_ synchronous.
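
The caveat about fwrite() matters because stdio keeps its own buffer in userspace, so fsync() on the descriptor cannot see data that has not been passed to the kernel yet. A minimal sketch of the ordering, assuming a hypothetical helper name (stdio_write_durable); it shows the general POSIX pattern and is not specific to JFFS2:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: append 'len' bytes through stdio and ask the kernel
 * to commit them. Returns 0 only if every step succeeded. */
static int stdio_write_durable(FILE *fp, const void *buf, size_t len)
{
    if (fwrite(buf, 1, len, fp) != len)
        return -1;                  /* short write into the stdio buffer */
    if (fflush(fp) != 0)
        return -1;                  /* stdio buffer -> kernel */
    if (fsync(fileno(fp)) != 0)
        return -1;                  /* kernel -> storage device */
    return 0;
}
```

With plain write() on a file descriptor the fflush() step disappears, which is the case David describes first.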

> 
> - on *NAND*: things are a bit tricky ;-). Even if you call fsync() data
> may not have been written to storage, due to the fact that "it's better to
> fill a NAND page before commit".

If you call fsync() and the data for the given file isn't actually
written to the NAND before the system call returns, that's a very
serious bug. We went to great lengths to ensure that fsync() works as it
should. If you think this is misbehaving, please show JFFS2 debugging
output demonstrating the error.

Your proposal of using 'sleep()' really ought to fill you with dread.
Adding an extra sleep is almost _never_ the way to achieve reliable
operation. I hope you did that only to draw attention to the problem and
weren't _honestly_ considering it in production :)

-- 
dwmw2


* Re: JFFS2 as transactional FS (in other words: how to be sure that data have been writtent correctly from userspace)
  2007-03-08 10:51 ` David Woodhouse
@ 2007-03-08 12:54   ` Jörn Engel
  2007-03-08 13:04     ` David Woodhouse
  0 siblings, 1 reply; 10+ messages in thread
From: Jörn Engel @ 2007-03-08 12:54 UTC (permalink / raw)
  To: David Woodhouse; +Cc: R&D4, mtd_mailinglist

On Thu, 8 March 2007 10:51:38 +0000, David Woodhouse wrote:
> 
> On NOR you don't need the sync(). At least, if you're using write() you
> don't need the sync. I make no claims about what glibc does with
> fwrite(), but I believe fsync() ought to be perfectly sufficient.
> 
> JFFS2 doesn't support O_SYNC because it's _already_ synchronous.

It used to be.  And then came wbuf.c.  With write buffer enabled, JFFS2
is still serialized, but no longer synchronous (in the meaning of having
an implicit sync() after each operation).

Last time I checked, JFFS2 got one case wrong.  Don't remember which one,
but there are not that many to check:
 - mount -o sync
 - open(..., O_SYNC);
 - fsync()
 - sync()

[ Side note - NOR flash can still have a write buffer and require
explicit sync. ]

Jörn

-- 
Prosperity makes friends, adversity tries them.
-- Publilius Syrus


* Re: JFFS2 as transactional FS (in other words: how to be sure that data have been writtent correctly from userspace)
  2007-03-08 12:54   ` Jörn Engel
@ 2007-03-08 13:04     ` David Woodhouse
  2007-03-08 13:12       ` Jörn Engel
  0 siblings, 1 reply; 10+ messages in thread
From: David Woodhouse @ 2007-03-08 13:04 UTC (permalink / raw)
  To: Jörn Engel; +Cc: R&D4, mtd_mailinglist

On Thu, 2007-03-08 at 13:54 +0100, Jörn Engel wrote:
> 
> Last time I checked, JFFS2 got one case wrong.  Don't remember which
> one,
> but there are not that many to check:
>  - mount -o sync
>  - open(..., O_SYNC);

I don't think we've implemented those two at all.

>  - fsync()
>  - sync()

These two should work.

> [ Side note - NOR flash can still have a write buffer and require
> explicit sync. ]

True. For 'NOR' read 'flash without wbuf' and for 'NAND' read 'flash
with wbuf'.

-- 
dwmw2


* Re: JFFS2 as transactional FS (in other words: how to be sure that data have been writtent correctly from userspace)
  2007-03-08 13:04     ` David Woodhouse
@ 2007-03-08 13:12       ` Jörn Engel
  2007-03-08 13:22         ` David Woodhouse
  0 siblings, 1 reply; 10+ messages in thread
From: Jörn Engel @ 2007-03-08 13:12 UTC (permalink / raw)
  To: David Woodhouse; +Cc: R&D4, mtd_mailinglist

On Thu, 8 March 2007 13:04:04 +0000, David Woodhouse wrote:
> On Thu, 2007-03-08 at 13:54 +0100, Jörn Engel wrote:
> > 
> > Last time I checked, JFFS2 got one case wrong.  Don't remember which
> > one,
> > but there are not that many to check:
> >  - mount -o sync
> >  - open(..., O_SYNC);
> 
> I don't think we've implemented those two at all.

Then you have your bug report.  Ignoring those was ok in 2001, but with
wbuf I would at least expect JFFS2 to return an error.

One option for implementing this would be to do the write, call schedule
to let other processes write something, then check whether the buffer
has been flushed out by other processes.  Or do some GC to fill the
buffer with useful data.  Anything but padding it.

Jörn

-- 
The strong give up and move away, while the weak give up and stay.
-- unknown


* Re: JFFS2 as transactional FS (in other words: how to be sure that data have been writtent correctly from userspace)
  2007-03-08 13:12       ` Jörn Engel
@ 2007-03-08 13:22         ` David Woodhouse
  2007-03-08 13:44           ` Josh Boyer
  0 siblings, 1 reply; 10+ messages in thread
From: David Woodhouse @ 2007-03-08 13:22 UTC (permalink / raw)
  To: Jörn Engel; +Cc: R&D4, mtd_mailinglist

On Thu, 2007-03-08 at 14:12 +0100, Jörn Engel wrote:
> Then you have your bug report.  Ignoring those was ok in 2001, but with
> wbuf I would at least expect JFFS2 to return an error.

Not quite, because that's not what he was trying. He said he was using
fsync().

I would hope that JFFS2 doesn't _have_ to return an error -- the VFS
should do so if the filesystem doesn't support a given option. That
doesn't seem to be the case though, so I agree that we should make JFFS2
check for those two flags and refuse the operation (mount/open) for
write-buffered flash.
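
The check being proposed is small. The following is a userspace model of the decision only, not kernel code: struct flash_model and check_open_flags are hypothetical names, and -EINVAL is an assumed error code, since the thread does not say which error should be returned.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-filesystem state; not a kernel structure. */
struct flash_model {
    bool write_buffered;    /* true for 'flash with wbuf' */
};

/* Returns 0 if the open flags are acceptable, or -EINVAL (an assumed
 * choice) when O_SYNC is requested on write-buffered flash. */
static int check_open_flags(const struct flash_model *fm, int flags)
{
    /* Compare against the full O_SYNC value: on Linux O_SYNC contains
     * the O_DSYNC bit, so a plain bit test would also match O_DSYNC. */
    if (fm->write_buffered && (flags & O_SYNC) == O_SYNC)
        return -EINVAL;
    return 0;
}
```

On flash without a write buffer the same flags pass through, matching the "already synchronous" behaviour described earlier in the thread.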

> One option for implementing this would be to do the write, call schedule
> to let other processes write something, then check whether the buffer
> has been flushed out by other processes.  Or do some GC to fill the
> buffer with useful data.  Anything but padding it. 

Better just to say 'no', I think :)

-- 
dwmw2


* Re: JFFS2 as transactional FS (in other words: how to be sure that data have been writtent correctly from userspace)
  2007-03-08 13:22         ` David Woodhouse
@ 2007-03-08 13:44           ` Josh Boyer
  2007-03-08 13:58             ` David Woodhouse
  0 siblings, 1 reply; 10+ messages in thread
From: Josh Boyer @ 2007-03-08 13:44 UTC (permalink / raw)
  To: David Woodhouse; +Cc: R&D4, Jörn Engel, mtd_mailinglist

On Thu, 2007-03-08 at 13:22 +0000, David Woodhouse wrote:
> On Thu, 2007-03-08 at 14:12 +0100, Jörn Engel wrote:
> > Then you have your bug report.  Ignoring those was ok in 2001, but with
> > wbuf I would at least expect JFFS2 to return an error.
> 
> Not quite, because that's not what he was trying. He said he was using
> fsync().
> 
> I would hope that JFFS2 doesn't _have_ to return an error -- the VFS
> should do so if the filesystem doesn't support a given option. That
> doesn't seem to be the case though, so I agree that we should make JFFS2
> check for those two flags and refuse the operation (mount/open) for
> write-buffered flash.

Wait... why?  Rejecting mount -o sync seems sane, but why can't O_SYNC
support be handled?  If users want to open their files like that and it
causes JFFS2 to flush a bunch of padding out, well they asked for it.

josh


* Re: JFFS2 as transactional FS (in other words: how to be sure that data have been writtent correctly from userspace)
  2007-03-08 13:44           ` Josh Boyer
@ 2007-03-08 13:58             ` David Woodhouse
  2007-03-08 14:35               ` Josh Boyer
  0 siblings, 1 reply; 10+ messages in thread
From: David Woodhouse @ 2007-03-08 13:58 UTC (permalink / raw)
  To: Josh Boyer; +Cc: R&D4, Jörn Engel, mtd_mailinglist

On Thu, 2007-03-08 at 07:44 -0600, Josh Boyer wrote:
> Wait... why?  Rejecting mount -o sync seems sane, but why can't O_SYNC
> support be handled?  If users want to open their files like that and it
> causes JFFS2 to flush a bunch of padding out, well they asked for it. 

Users ask for shared writable mmap too. That isn't a good idea either.

Contemplate what Linus always says about using 'volatile' vs. proper
barriers. Now consider it for O_SYNC vs. fsync(). And factor in the fact
that O_SYNC is going to be massively suboptimal if it causes syncs when
they otherwise didn't need to happen.

-- 
dwmw2


* Re: JFFS2 as transactional FS (in other words: how to be sure that data have been writtent correctly from userspace)
  2007-03-08 13:58             ` David Woodhouse
@ 2007-03-08 14:35               ` Josh Boyer
  2007-03-08 14:43                 ` Jörn Engel
  0 siblings, 1 reply; 10+ messages in thread
From: Josh Boyer @ 2007-03-08 14:35 UTC (permalink / raw)
  To: David Woodhouse; +Cc: R&D4, Jörn Engel, mtd_mailinglist

On Thu, 2007-03-08 at 13:58 +0000, David Woodhouse wrote:
> On Thu, 2007-03-08 at 07:44 -0600, Josh Boyer wrote:
> > Wait... why?  Rejecting mount -o sync seems sane, but why can't O_SYNC
> > support be handled?  If users want to open their files like that and it
> > causes JFFS2 to flush a bunch of padding out, well they asked for it. 
> 
> Users ask for shared writable mmap too. That isn't a good idea either.
> 
> Contemplate what Linus always says about using 'volatile' vs. proper
> barriers. Now consider it for O_SYNC vs. fsync(). And factor in the fact
> that O_SYNC is going to be massively suboptimal if it causes syncs when
> they otherwise didn't need to happen.

Ok, good point.  You've convinced me.

josh


* Re: JFFS2 as transactional FS (in other words: how to be sure that data have been writtent correctly from userspace)
  2007-03-08 14:35               ` Josh Boyer
@ 2007-03-08 14:43                 ` Jörn Engel
  0 siblings, 0 replies; 10+ messages in thread
From: Jörn Engel @ 2007-03-08 14:43 UTC (permalink / raw)
  To: Josh Boyer; +Cc: R&D4, David Woodhouse, mtd_mailinglist

On Thu, 8 March 2007 08:35:17 -0600, Josh Boyer wrote:
> On Thu, 2007-03-08 at 13:58 +0000, David Woodhouse wrote:
> > On Thu, 2007-03-08 at 07:44 -0600, Josh Boyer wrote:
> > > Wait... why?  Rejecting mount -o sync seems sane, but why can't O_SYNC
> > > support be handled?  If users want to open their files like that and it
> > > causes JFFS2 to flush a bunch of padding out, well they asked for it. 
> > 
> > Users ask for shared writable mmap too. That isn't a good idea either.
> > 
> > Contemplate what Linus always says about using 'volatile' vs. proper
> > barriers. Now consider it for O_SYNC vs. fsync(). And factor in the fact
> > that O_SYNC is going to be massively suboptimal if it causes syncs when
> > they otherwise didn't need to happen.
> 
> Ok, good point.  You've convinced me.

I can imagine cases when O_SYNC makes sense.  Logfiles, basically.
When debugging it is useful to have as much information in them as
possible, especially right before the crash.  O_SYNC is a useful
relaxation here on systems where fsync() is identical to sync() - O_SYNC
will not flush every other file in the system.

Whether any applications actually do this is anyone's guess, though.
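
For reference, the logfile case would look roughly like this. A minimal sketch, assuming a hypothetical helper name (open_sync_log); on a filesystem that honours O_SYNC each write() returns only after the data has been committed, whereas earlier in this thread O_SYNC is described as not implemented in JFFS2, so there an explicit fsync() would still be needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: open (or create) a log file for appending with
 * O_SYNC. Returns a file descriptor, or -1 on error as open() does. */
static int open_sync_log(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
}
```

Only writes through this descriptor are affected; other files on the same filesystem are not flushed as a side effect.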

Jörn

-- 
Write programs that do one thing and do it well. Write programs to work
together. Write programs to handle text streams, because that is a
universal interface.
-- Doug MacIlroy
