public inbox for linux-mtd@lists.infradead.org
* JFFS2 and nodes checking
@ 2004-09-28 12:29 Artem B. Bityuckiy
  2004-09-28 12:37 ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Artem B. Bityuckiy @ 2004-09-28 12:29 UTC (permalink / raw)
  To: linux-mtd

Hello. I have a question that I have wanted to ask ever since I first
learned about JFFS2, but for some reason I never did.


When JFFS2 is being mounted, all the node header CRC checksums (hdr_crc
and node_crc) are checked, but the data CRC checksums (data_crc) are not.

The data CRCs are checked either by the Garbage Collector or on an iget()
request, when the inode cache is being built.

To check a regular file inode's CRCs, the whole of its content must be
read. This may be time-expensive for a big file, especially if a NAND
flash device is used.

Why, when a user opens a big file, does he need to wait while all of its
contents are read? Moreover, the CRCs are checked anyway on any read
request (for the nodes involved in that request, of course). Is this
delay really needed? Why not just leave the inode in the
INO_STATE_UNCHECKED state and let the Garbage Collector check it some
time later?

It seems fine to me for the Garbage Collector to check CRCs in the
background, but the delay on the iget() request doesn't seem necessary.

Did I miss something?

Thanks in advance.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: JFFS2 and nodes checking
  2004-09-28 12:29 JFFS2 and nodes checking Artem B. Bityuckiy
@ 2004-09-28 12:37 ` David Woodhouse
  2004-09-28 13:17   ` Artem B. Bityuckiy
  0 siblings, 1 reply; 20+ messages in thread
From: David Woodhouse @ 2004-09-28 12:37 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd

On Tue, 2004-09-28 at 16:29 +0400, Artem B. Bityuckiy wrote:
> Why, when a user opens a big file, does he need to wait while all of its
> contents are read? Moreover, the CRCs are checked anyway on any read
> request (for the nodes involved in that request, of course). Is this
> delay really needed? Why not just leave the inode in the
> INO_STATE_UNCHECKED state and let the Garbage Collector check it some
> time later?

Bear in mind that we discard nodes with invalid data_crc. So they do not
obsolete _older_ nodes which cover the same area. If we don't check the
CRC we can't build up the red-black tree for the inode in the same way
we do at the moment, because we don't actually know which nodes are
valid. We'd have to invent another kind of data structure, in which we
could keep information about overlapping nodes, some of which may turn
out not to be valid. 

At the moment we just have the unsorted raw->next_in_ino list, and the
rbtree in the jffs2_inode_info. 

But yes, it might make some sense to perform this optimisation if it's
not _too_ painful to do it.

-- 
dwmw2


* Re: JFFS2 and nodes checking
  2004-09-28 12:37 ` David Woodhouse
@ 2004-09-28 13:17   ` Artem B. Bityuckiy
  2004-09-28 13:22     ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Artem B. Bityuckiy @ 2004-09-28 13:17 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

David Woodhouse wrote:
> Bear in mind that we discard nodes with invalid data_crc. So they do not
> obsolete _older_ nodes which cover the same area. If we don't check the
> CRC we can't build up the red-black tree for the inode in the same way
> we do at the moment, because we don't actually know which nodes are
> valid. We'd have to invent another kind of data structure, in which we
> could keep information about overlapping nodes, some of which may turn
> out not to be valid. 

Oh, I missed this aspect! Thanks.

As I understand it, if we disable the CRC checking at the iget() request,
JFFS2 will lose some of its "logging" capabilities.

For example, if the power goes off during a new node write operation, we
don't lose the older data, since the older node's data will be used. But
if we don't check the data CRC, we will have all zeros in the
corresponding piece of the file (the whole node).

Ok. But what if we check only the last node (the one with the highest
version) and, if it is OK, don't check anything more? If it is bad, we
check the previous one, and so on. It seems to me these checks are
enough: they cover the situation with unexpected power losses.

There are some scenarios where we will lose data. For example, a sector
on the flash device becomes bad, and we lose a node situated on that
sector. If there is an older node for this data range, the current
implementation will use it. But in this case I am not sure that the data
from the older node is any better than just zeros. Are such flash faults
JFFS2's business?

What do you think?


-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


* Re: JFFS2 and nodes checking
  2004-09-28 13:17   ` Artem B. Bityuckiy
@ 2004-09-28 13:22     ` David Woodhouse
  2004-09-28 13:37       ` Artem B. Bityuckiy
  0 siblings, 1 reply; 20+ messages in thread
From: David Woodhouse @ 2004-09-28 13:22 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd

On Tue, 2004-09-28 at 17:17 +0400, Artem B. Bityuckiy wrote:
> Ok. But what if we check only the last node (the one with the highest
> version) and, if it is OK, don't check anything more? If it is bad, we
> check the previous one, and so on. It seems to me these checks are
> enough: they cover the situation with unexpected power losses.

There's no reason to presume that the only node with broken CRC will be
the _latest_ node, especially on NAND flash.

-- 
dwmw2


* Re: JFFS2 and nodes checking
  2004-09-28 13:22     ` David Woodhouse
@ 2004-09-28 13:37       ` Artem B. Bityuckiy
  2004-09-28 13:45         ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Artem B. Bityuckiy @ 2004-09-28 13:37 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

David Woodhouse wrote:
> On Tue, 2004-09-28 at 17:17 +0400, Artem B. Bityuckiy wrote:
> 
>>Ok. But what if we check only the last node (the one with the highest
>>version) and, if it is OK, don't check anything more? If it is bad, we
>>check the previous one, and so on. It seems to me these checks are
>>enough: they cover the situation with unexpected power losses.
> 
> 
> There's no reason to presume that the only node with broken CRC will be
> the _latest_ node, especially on NAND flash.
> 
If this isn't the last node, the reason is media errors, and it isn't
JFFS2's job to restore such data. Moreover, this isn't guaranteed in the
current implementation anyway: the previous node(s) may already have
been Garbage Collected.

Another possibility is to teach the GC not to delete obsolete nodes for
not-yet-checked inodes. A fall-back procedure would then have to be
performed if a bad node is read after the file was opened but before its
data was checked (iput() and iget() again).

Comments?

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


* Re: JFFS2 and nodes checking
  2004-09-28 13:37       ` Artem B. Bityuckiy
@ 2004-09-28 13:45         ` David Woodhouse
  2004-09-28 13:57           ` Artem B. Bityuckiy
  0 siblings, 1 reply; 20+ messages in thread
From: David Woodhouse @ 2004-09-28 13:45 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd

On Tue, 2004-09-28 at 17:37 +0400, Artem B. Bityuckiy wrote:
> If this isn't the last node, the reason is media errors.

No. What about an unclean reboot followed by more valid writes? You end
up with the broken node in the middle.

> Another possibility is to teach the GC not to delete obsolete nodes for
> not-yet-checked inodes.

The GC already doesn't delete _anything_ until all inodes have been
checked. In fact I suppose it _could_ proceed, checking each inode only
as and when it encounters a node belonging to that inode... but that
would generally screw up the accounting totals and make my head hurt so
it wasn't done that way.

-- 
dwmw2


* Re: JFFS2 and nodes checking
  2004-09-28 13:45         ` David Woodhouse
@ 2004-09-28 13:57           ` Artem B. Bityuckiy
  2004-09-28 14:04             ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Artem B. Bityuckiy @ 2004-09-28 13:57 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

David Woodhouse wrote:
> On Tue, 2004-09-28 at 17:37 +0400, Artem B. Bityuckiy wrote:
> 
>>If this isn't the last node, the reason is media errors.
> 
> 
> No. What about an unclean reboot followed by more valid writes? You end
> up with the broken node in the middle.
Sorry, I don't understand. Suppose a bad last node appears after an
unclean reboot. Before any write, this last node will be detected
*before the write*, since iget() will be called first. Isn't that so?

> The GC already doesn't delete _anything_ until all inodes have been
> checked. In fact I suppose it _could_ proceed, checking each inode only
> as and when it encounters a node belonging to that inode... but that
> would generally screw up the accounting totals and make my head hurt so
> it wasn't done that way.
>
Hm, yes.

Ok, anyway, there are possibilities to improve iget().

What do you think is the best way to make such a change (no check on
iget())?

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


* Re: JFFS2 and nodes checking
  2004-09-28 13:57           ` Artem B. Bityuckiy
@ 2004-09-28 14:04             ` David Woodhouse
  2004-09-28 14:26               ` Artem B. Bityuckiy
  2004-09-28 14:31               ` Josh Boyer
  0 siblings, 2 replies; 20+ messages in thread
From: David Woodhouse @ 2004-09-28 14:04 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd

On Tue, 2004-09-28 at 17:57 +0400, Artem B. Bityuckiy wrote:
> Sorry, I don't understand. Suppose a bad last node appears after an
> unclean reboot. Before any write, this last node will be detected
> *before the write*, since iget() will be called first. Isn't that so?

True, but it won't necessarily be _deleted_ so it could still be there
on the _next_ boot.

> Ok, anyway, there are possibilities to improve iget().
> 
> What do you think is the best way to make such a change (no check on
> iget())?

You have to make the read/write code capable of dealing with the fact
that the full rbtree hasn't been built. Possibly you sort through the
raw nodes and put them into _pools_, each covering something like 256KiB
of the file. Then when you get a read or write of any such range,
you check the CRCs for the nodes in that _range_ and build the full map
only for that range of the file. It sounds painful, to be honest -- I'd
rather just tell people not to use such large files on JFFS2 -- split it
up into multiple files instead.

-- 
dwmw2


* Re: JFFS2 and nodes checking
  2004-09-28 14:04             ` David Woodhouse
@ 2004-09-28 14:26               ` Artem B. Bityuckiy
  2004-09-28 14:37                 ` David Woodhouse
  2004-09-28 14:31               ` Josh Boyer
  1 sibling, 1 reply; 20+ messages in thread
From: Artem B. Bityuckiy @ 2004-09-28 14:26 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

David Woodhouse wrote:
> On Tue, 2004-09-28 at 17:57 +0400, Artem B. Bityuckiy wrote:
> 
>>Sorry, I don't understand. Suppose a bad last node appears after an
>>unclean reboot. Before any write, this last node will be detected
>>*before the write*, since iget() will be called first. Isn't that so?
> 
> 
> True, but it won't necessarily be _deleted_ so it could still be there
> on the _next_ boot.
Yes, you are right. But we could handle this situation. Suppose we've
found that the last node is bad. In this case we write a new node
covering the same data area (the range is known, since the header CRC is
good). Thus the last bad node will be obsoleted and the new last node
will be good. If there is another unclean reboot, we will have two bad
nodes at the end.
Only after obsoleting the last bad node do we allow new writes.

What do you think now? :-)

> 
> You have to make the read/write code capable of dealing with the fact
> that the full rbtree hasn't been built. Possibly you sort through the
> raw nodes and put them into _pools_, each covering something like 256KiB
> of the file. Then when you get a read or write of any such range,
> you check the CRCs for the nodes in that _range_ and build the full map
> only for that range of the file. It sounds painful, to be honest -- I'd
> rather just tell people not to use such large files on JFFS2 -- split it
> up into multiple files instead.
> 

I really am going to do this optimization, and your idea is rather
complicated. I'd like to find an easier solution and not introduce
additional complicated things into an already complicated file system :-)
Ok, I'll think about this problem.

For now, the idea of checking only the last node seems better to me. But
I haven't thought enough about it yet.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


* Re: JFFS2 and nodes checking
  2004-09-28 14:04             ` David Woodhouse
  2004-09-28 14:26               ` Artem B. Bityuckiy
@ 2004-09-28 14:31               ` Josh Boyer
  2004-09-28 14:47                 ` Artem B. Bityuckiy
  1 sibling, 1 reply; 20+ messages in thread
From: Josh Boyer @ 2004-09-28 14:31 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

On Tue, 2004-09-28 at 09:04, David Woodhouse wrote:
> You have to make the read/write code capable of dealing with the fact
> that the full rbtree hasn't been built. Possibly you sort through the
> raw nodes and put them into _pools_, each covering something like 256KiB
> of the file. Then when you get a read or write of any such range,
> you check the CRCs for the nodes in that _range_ and build the full map
> only for that range of the file. It sounds painful, to be honest -- I'd
> rather just tell people not to use such large files on JFFS2 -- split it
> up into multiple files instead.

Yes, large files are bad, but they aren't the only source of such a time
delay.  IIRC, the problem is really just a matter of the number of nodes
per file.  So couldn't you have a small file with a large number of
writes within that file that has the same effect?  (Until the obsoleted
nodes are actually deleted that is.)

I'm thinking of fifos here too...

josh


* Re: JFFS2 and nodes checking
  2004-09-28 14:26               ` Artem B. Bityuckiy
@ 2004-09-28 14:37                 ` David Woodhouse
  2004-09-28 14:58                   ` Artem B. Bityuckiy
  0 siblings, 1 reply; 20+ messages in thread
From: David Woodhouse @ 2004-09-28 14:37 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd

On Tue, 2004-09-28 at 18:26 +0400, Artem B. Bityuckiy wrote:
> Yes, you are right. But we could handle this situation. Suppose we've
> found that the last node is bad. In this case we write a new node
> covering the same data area (the range is known, since the header CRC is
> good). Thus the last bad node will be obsoleted and the new last node
> will be good. If there is another unclean reboot, we will have two bad
> nodes at the end.
> Only after obsoleting the last bad node do we allow new writes.
> 
> What do you think now? :-)

It doesn't work unless you _know_ that the last time the fs was mounted
it was with this new code instead of with the old code which _doesn't_
do that. Unless you call it JFFS3 it doesn't seem like the answer :)

> I really am going to do this optimization, and your idea is rather
> complicated. I'd like to find an easier solution and not introduce
> additional complicated things into an already complicated file system :-)
> Ok, I'll think about this problem.

The file system is simple -- in fact it's _trivial_. Only the
optimisations are complex. It's a bit like chess. :)

-- 
dwmw2


* Re: JFFS2 and nodes checking
  2004-09-28 14:31               ` Josh Boyer
@ 2004-09-28 14:47                 ` Artem B. Bityuckiy
  2004-09-28 14:58                   ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Artem B. Bityuckiy @ 2004-09-28 14:47 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linux-mtd, David Woodhouse

> Yes, large files are bad, but they aren't the only source of such a time
> delay.  IIRC, the problem is really just a matter of the number of nodes
> per file.  So couldn't you have a small file with a large number of
> writes within that file that has the same effect?  (Until the obsoleted
> nodes are actually deleted that is.)
> 
> I'm thinking of fifos here too...
> 
> josh
> 

I suppose you don't mean the Unix FIFO file type (like block devices,
sockets, etc.).

Yes, if we have a file with a lot of small nodes, we will read the node
headers anyway, so this case won't be optimized very much even if we
don't check the data CRC.

But after the GC pass this file will be optimized and split into 4K
pieces :-)


-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


* Re: JFFS2 and nodes checking
  2004-09-28 14:47                 ` Artem B. Bityuckiy
@ 2004-09-28 14:58                   ` David Woodhouse
  2004-09-28 16:48                     ` Artem B. Bityuckiy
  0 siblings, 1 reply; 20+ messages in thread
From: David Woodhouse @ 2004-09-28 14:58 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd

On Tue, 2004-09-28 at 18:47 +0400, Artem B. Bityuckiy wrote:
> > I'm thinking of fifos here too...
> 
> I suppose you don't mean the Unix FIFO file type (like block devices,
> sockets, etc.).

Yes, he does. The problem is that each time you access a FIFO you update
its ctime and mtime on the flash... which leads to a _lot_ of nodes,
only one of which is non-obsolete. That's a problem on NAND flash.

-- 
dwmw2


* Re: JFFS2 and nodes checking
  2004-09-28 14:37                 ` David Woodhouse
@ 2004-09-28 14:58                   ` Artem B. Bityuckiy
  2004-09-28 15:04                     ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Artem B. Bityuckiy @ 2004-09-28 14:58 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

> It doesn't work unless you _know_ that the last time the fs was mounted
> it was with this new code instead of with the old code which _doesn't_
> do that. Unless you call it JFFS3 it doesn't seem like the answer :)
>
Hmm, you mean that I'd lose backward compatibility in the case of an
unclean reboot. Yes, you are right. Thanks.

But... on the other hand, this matters only in the case of unclean
reboots... Moreover, we could do this only for _big files_ ...

What would you say if somebody offered you a patch with such an
optimization? Would you reject it? Or would you accept it and rename
JFFS2 to JFFS3 :-) ?

> The file system is simple -- in fact it's _trivial_. Only the
> optimisations are complex. It's a bit like chess. :)
:-)
Hmm, yes. I meant JFFS2 as a whole, with all its optimizations.

Anyway, a complex solution with partial fragtrees is a potential source
of new bugs, etc.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


* Re: JFFS2 and nodes checking
  2004-09-28 14:58                   ` Artem B. Bityuckiy
@ 2004-09-28 15:04                     ` David Woodhouse
  0 siblings, 0 replies; 20+ messages in thread
From: David Woodhouse @ 2004-09-28 15:04 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd

On Tue, 2004-09-28 at 18:58 +0400, Artem B. Bityuckiy wrote:
> What would you say if somebody offered you a patch with such an
> optimization? Would you reject it? Or would you accept it and rename
> JFFS2 to JFFS3 :-) ?

I'd probably reject it unless I was completely convinced it was
necessary, and it was the only way to do it. We're not ready for JFFS3
yet.

How about introducing a new node type which can be ignored by older
JFFS2 implementations? Consider an 'inode checkpoint' node, which tells
you instantly the state of all the nodes with versions smaller than it,
so you don't have to check all those. You only have to read and check
the CRC on the nodes _newer_ than the checkpoint.

Then you can write these whenever you feel like it, to large files. 

> Anyway, a complex solution with partial fragtrees is a potential
> source of new bugs, etc.

True. It took long enough to get the first fragtree stuff right, and I
still have the mental scars :)

-- 
dwmw2


* Re: JFFS2 and nodes checking
  2004-09-28 14:58                   ` David Woodhouse
@ 2004-09-28 16:48                     ` Artem B. Bityuckiy
  2004-09-28 16:57                       ` Josh Boyer
  2004-09-28 16:58                       ` David Woodhouse
  0 siblings, 2 replies; 20+ messages in thread
From: Artem B. Bityuckiy @ 2004-09-28 16:48 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

> Yes, he does. The problem is that each time you access a FIFO you update
> its ctime and mtime on the flash... which leads to a _lot_ of nodes,
> only one of which is non-obsolete. That's a problem on NAND flash.

Things are the same for regular files. This is from SUSv3 for the write() call:

"Upon successful completion, where nbyte is greater than 0, write() 
shall mark for update the st_ctime and st_mtime fields of the file"

This is from SUSv3 for the read() call:

"Upon successful completion, where nbyte is greater than 0, read() shall 
mark for update the st_atime field of the file, and shall return the 
number of bytes read."

What's special about FIFOs?

Why is that a problem especially on NAND flash?


-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


* Re: JFFS2 and nodes checking
  2004-09-28 16:48                     ` Artem B. Bityuckiy
@ 2004-09-28 16:57                       ` Josh Boyer
  2004-09-28 16:58                       ` David Woodhouse
  1 sibling, 0 replies; 20+ messages in thread
From: Josh Boyer @ 2004-09-28 16:57 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd, David Woodhouse

On Tue, 2004-09-28 at 11:48, Artem B. Bityuckiy wrote:
> > Yes, he does. The problem is that each time you access a FIFO you update
> > its ctime and mtime on the flash... which leads to a _lot_ of nodes,
> > only one of which is non-obsolete. That's a problem on NAND flash.
> 
> Things are the same for regular files. This is from SUSv3 for the write() call:
> 
> "Upon successful completion, where nbyte is greater than 0, write() 
> shall mark for update the st_ctime and st_mtime fields of the file"

Yes, but in that case you are writing data anyway.  So updates of
st_ctime and st_mtime are free.

> 
> This is from SUSv3 for the read() call:
> 
> "Upon successful completion, where nbyte is greater than 0, read() shall 
> mark for update the st_atime field of the file, and shall return the 
> number of bytes read."

I think we have the "noatime" mount option to avoid this for JFFS2.  Or
some similar mount flag.  I don't remember offhand.

> 
> What's special about FIFOs?

Fifos don't really hold data, they are just named pipes.  When you write
to one, it's mostly handled by the VFS; the actual data isn't written
out by JFFS2.  Except that we have to update st_ctime and st_mtime,
which causes more nodes.

> 
> Why is that a problem especially on NAND flash?

Because you can't directly obsolete a node on NAND flash (and some weird
versions of NOR flash as well).  So obsolete nodes are actually written
out to flash instead of just flipping a bit in the existing node.

josh


* Re: JFFS2 and nodes checking
  2004-09-28 16:48                     ` Artem B. Bityuckiy
  2004-09-28 16:57                       ` Josh Boyer
@ 2004-09-28 16:58                       ` David Woodhouse
  2004-09-28 17:15                         ` Artem B. Bityuckiy
  2004-09-28 18:24                         ` Josh Boyer
  1 sibling, 2 replies; 20+ messages in thread
From: David Woodhouse @ 2004-09-28 16:58 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd

On Tue, 2004-09-28 at 20:48 +0400, Artem B. Bityuckiy wrote:
> > Yes, he does. The problem is that each time you access a FIFO you update
> > its ctime and mtime on the flash... which leads to a _lot_ of nodes,
> > only one of which is non-obsolete. That's a problem on NAND flash.
> 
> Things are the same for regular files. This is from SUSv3 for the write() call:
> 
> "Upon successful completion, where nbyte is greater than 0, write() 
> shall mark for update the st_ctime and st_mtime fields of the file"

Yeah, but we _know_ we're going to write to the flash when we write to
regular files. That's not necessarily intuitively true for FIFOs. You
expect your data to get to the other end of the FIFO... you don't
necessarily expect anything to be written to the flash.

> This is from SUSv3 for the read() call:
> 
> "Upon successful completion, where nbyte is greater than 0, read() shall 
> mark for update the st_atime field of the file, and shall return the 
> number of bytes read."

We don't do atime on JFFS2.

> Why is that a problem especially on NAND flash?

On NOR we can scribble over the old nodes with the old mtime/ctime. On
NAND we can't so we end up with lots of nodes which are _potentially_
valid and which all have to be compared.

-- 
dwmw2


* Re: JFFS2 and nodes checking
  2004-09-28 16:58                       ` David Woodhouse
@ 2004-09-28 17:15                         ` Artem B. Bityuckiy
  2004-09-28 18:24                         ` Josh Boyer
  1 sibling, 0 replies; 20+ messages in thread
From: Artem B. Bityuckiy @ 2004-09-28 17:15 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd



David Woodhouse wrote:
> Yeah, but we _know_ we're going to write to the flash when we write to
> regular files. That's not necessarily intuitively true for FIFOs. You
> expect your data to get to the other end of the FIFO... you don't
> necessarily expect anything to be written to the flash.
 >
Josh Boyer wrote:
 > Fifos don't really hold data, they are just named pipes.  When you write
 > to it, it's mostly handled by the VFS.  The actual data isn't written
 > out by JFFS2.  Except that we have to update st_ctime and st_mtime,
 > which causes more nodes.
Yes... I was thinking in the context of the optimization I spoke about
and tried to understand this problem in that context. I spoke about the
iget() delay, but the FIFO issue is a different one. Ok, thanks for the
replies!

David Woodhouse wrote:
 > On NOR we can scribble over the old nodes with the old mtime/ctime. On
 > NAND we can't so we end up with lots of nodes which are _potentially_
 > valid and which all have to be compared.
Josh Boyer wrote:
 > Because you can't directly obsolete a node on NAND flash (and some weird
 > versions of NOR flash as well).  So obsolete nodes are actually written
 > out to flash instead of just flipping a bit in the existing node.

Yes, I should have guessed this myself. :-)

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


* Re: JFFS2 and nodes checking
  2004-09-28 16:58                       ` David Woodhouse
  2004-09-28 17:15                         ` Artem B. Bityuckiy
@ 2004-09-28 18:24                         ` Josh Boyer
  1 sibling, 0 replies; 20+ messages in thread
From: Josh Boyer @ 2004-09-28 18:24 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

On Tue, 2004-09-28 at 11:58, David Woodhouse wrote:
> > This is from SUSv3 for the read() call:
> > 
> > "Upon successful completion, where nbyte is greater than 0, read() shall 
> > mark for update the st_atime field of the file, and shall return the 
> > number of bytes read."
> 
> We don't do atime on JFFS2.

Thinking more about this...  why can't we do something similar for
[cm]time and special files, like fifos?  I remember a conversation on
IRC, but I don't remember the result.

josh


end of thread, other threads:[~2004-09-28 18:24 UTC | newest]

Thread overview: 20+ messages
2004-09-28 12:29 JFFS2 and nodes checking Artem B. Bityuckiy
2004-09-28 12:37 ` David Woodhouse
2004-09-28 13:17   ` Artem B. Bityuckiy
2004-09-28 13:22     ` David Woodhouse
2004-09-28 13:37       ` Artem B. Bityuckiy
2004-09-28 13:45         ` David Woodhouse
2004-09-28 13:57           ` Artem B. Bityuckiy
2004-09-28 14:04             ` David Woodhouse
2004-09-28 14:26               ` Artem B. Bityuckiy
2004-09-28 14:37                 ` David Woodhouse
2004-09-28 14:58                   ` Artem B. Bityuckiy
2004-09-28 15:04                     ` David Woodhouse
2004-09-28 14:31               ` Josh Boyer
2004-09-28 14:47                 ` Artem B. Bityuckiy
2004-09-28 14:58                   ` David Woodhouse
2004-09-28 16:48                     ` Artem B. Bityuckiy
2004-09-28 16:57                       ` Josh Boyer
2004-09-28 16:58                       ` David Woodhouse
2004-09-28 17:15                         ` Artem B. Bityuckiy
2004-09-28 18:24                         ` Josh Boyer
