inode checkpoints

public inbox for linux-mtd@lists.infradead.org
 help / color / mirror / Atom feed

* inode checkpoints
@ 2004-10-04 10:14 Artem Bityuckiy
  2004-10-04 10:22 ` David Woodhouse
  0 siblings, 1 reply; 29+ messages in thread
From: Artem Bityuckiy @ 2004-10-04 10:14 UTC (permalink / raw)
  To: linux-mtd; +Cc: dwmw2

Hello. We are working on the possibility to increase JFFS2 on NAND 
flashes and want to decrease the iget() call delay. We are going to 
introduce "inode checkpoints" which are special type of nodes. 
Checkpoints are created for the regular file inodes and directory inodes.

Here is the idea description.

-----------------------------------------------------------------------

Currently when JFFS2 builds the inode cache for a regular file (iget() 
call), it performs the following tasks (among others):

1 reads headers of all the file nodes;
2 reads all the file nodes data and checks CRC (if the data CRC wasn~Rt 
yet checked because this activity is only performed once for each file 
until unmount).

If the JFFS2 is used with the NAND flash type, the access to all the 
node headers which are distributed over different NAND flash pages might 
be very expensive (especially if the file is relatively big). It is more 
effective to have all these headers together in one inode checkpoint 
node and to read them all from few pages.

The second task is mainly needed to handle bad nodes and to perform roll 
backs. The roll back means, that if there are bad nodes, JFFS2 discards 
them and uses older ones. Thus, in case of unclean reboots, file data 
which have already been successfully written can not be corrupted by the 
subsequent broken writes. This mechanism doesn't guaranty the roll back 
in case of flash media corruption, just unclean reboots are handled.

Thus, it is important to check file data only once, and if it is OK, it 
isn't necessary to check it anymore. But currently, after every reboot 
all the files are rechecked. But with checkpoints, JFSS2 doesn't need to 
re-check nodes which are referred by the checkpoint, since they have 
already been checked when the checkpoint node was created. Thus, with 
checkpoints, fewer data have to be read and checked. This also decreases 
the iget() call delay.

When JFFS2 builds the inode cache for the directory inodes, it currently 
reads all the direntries, which are located in this directory. This also 
means that in case of directories with lots of direntries, JFFS2 reads 
lots of small direntry nodes from different NAND pages. Having the 
checkpoint for the directory, JFFS2 might read that information from 
much fewer pages.

The drawback of checkpoints is that the additional flash space is used. 
To save more space checkpoints are compressed.

-----------------------------------------------------------------------

Wha do JFFS2 folks think about the idea?

-- 
Best regards, Artem B. Bityuckiy
Oktet Labs (St. Petersburg), Software Engineer.
+78124286709 (office) +79112449030 (mobile)
E-mail: dedekind@oktetlabs.ru, web: http://www.oktetlabs.ru

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 10:14 inode checkpoints Artem Bityuckiy
@ 2004-10-04 10:22 ` David Woodhouse
  2004-10-04 10:36   ` [OBORONA-SPAM] " Artem B. Bityuckiy
  2004-10-04 11:44   ` Artem Bityuckiy
  0 siblings, 2 replies; 29+ messages in thread
From: David Woodhouse @ 2004-10-04 10:22 UTC (permalink / raw)
  To: dedekind; +Cc: linux-mtd

On Mon, 2004-10-04 at 14:14 +0400, Artem Bityuckiy wrote:
> Wha do JFFS2 folks think about the idea?

It seems to make a lot of sense. We could write such checkpoints only on
files which are large enough to warrant them, and we could tune the
frequency of such checkpoints. 

Of course as with many cute ideas it may turn out to be utterly
impractical when we implement it :) It does look sane though.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [OBORONA-SPAM]  Re: inode checkpoints
  2004-10-04 10:22 ` David Woodhouse
@ 2004-10-04 10:36   ` Artem B. Bityuckiy
  2004-10-04 12:34     ` Josh Boyer
  2004-10-04 11:44   ` Artem Bityuckiy
  1 sibling, 1 reply; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-04 10:36 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

> It seems to make a lot of sense. We could write such checkpoints only on
> files which are large enough to warrant them, and we could tune the
> frequency of such checkpoints.

The following are required features (as I think) of the checkpoint 
creation algorithm and strategy.
1 New checkpoint nodes must not be created if there is little free space 
left on the file system.
2 New checkpoint nodes ought to be created only for large inodes.
3 The checkpoint nodes creation process must not decrease the JFFS2 
performance very much.

Checkpoints should be created in background by the GC thread. GC begins 
creating checkpoints if the following is true:
1 there is no other work to do for GC (i.e., there is no need to check 
inodes and to garbage collect the dirt);
2 there is sufficient amount of free space on the file system

The checkpoint creation process consists of two major steps:
1 locating the most appropriate inode;
2 providing that the inode is found, write the checkpoint(s) for it.

Also, when there are few space left on the file system, GC may begin 
obsoleting checkpoints, starting from checkpoints for smaller files, 
etc. Thus, the spase may appear.

> 
> Of course as with many cute ideas it may turn out to be utterly
> impractical when we implement it :) It does look sane though.

I hope this will be practical if we implement this.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [OBORONA-SPAM]  Re: inode checkpoints
  2004-10-04 10:36   ` [OBORONA-SPAM] " Artem B. Bityuckiy
@ 2004-10-04 12:34     ` Josh Boyer
  2004-10-04 13:07       ` Artem B. Bityuckiy
  0 siblings, 1 reply; 29+ messages in thread
From: Josh Boyer @ 2004-10-04 12:34 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd, David Woodhouse

On Mon, 2004-10-04 at 05:36, Artem B. Bityuckiy wrote:
> > It seems to make a lot of sense. We could write such checkpoints only on
> > files which are large enough to warrant them, and we could tune the
> > frequency of such checkpoints.
> 
> The following are required features (as I think) of the checkpoint 
> creation algorithm and strategy.
> 1 New checkpoint nodes must not be created if there is little free space 
> left on the file system.
> 2 New checkpoint nodes ought to be created only for large inodes.

Can we extend this to be "files with a large amount of nodes"?  I'm
thinking of the fifo problem here too...

> 3 The checkpoint nodes creation process must not decrease the JFFS2 
> performance very much.
> 
> Checkpoints should be created in background by the GC thread. GC begins 
> creating checkpoints if the following is true:

eCos doesn't have a GC thread IIRC.  Not a big deal in my opinion, but
others might care :).

josh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 12:34     ` Josh Boyer
@ 2004-10-04 13:07       ` Artem B. Bityuckiy
  2004-10-04 13:18         ` David Woodhouse
  0 siblings, 1 reply; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-04 13:07 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linux-mtd, David Woodhouse

> Can we extend this to be "files with a large amount of nodes"?  I'm
> thinking of the fifo problem here too...
>
:-)
FIFOs, blk/chr dvice files *always have only one valid node at a time* 
(right?). Thus, when building the inode cache, we may only read the node 
with the highest version (which is found by scanning the correspondent 
node_ref list) and read *only* this node. If it is OK (CRC is right), no 
need to read more. If CRC is bad, read the older one...  Checkpoints 
aren't needed, this is *another* improvement...

How about this scheme?

> 
> eCos doesn't have a GC thread IIRC.  Not a big deal in my opinion, but
> others might care :).
> 
> josh
.....
David Woodhouse:
 > I agree that it shouldn't be left to the GC thread alone -- there isn't
 > one on eCos and it's optional on Linux too.

Hmm, true.
Does eCos support threads (I didn't work with it)? If not, the situation 
is more complicated... I want to use thread (low-priority) in order to 
*not affect the system performance much*.

 >We can write out snapshots
 > in the normal course of garbage collection though, when we have
 > obsoleted a node belonging to the inode in question.
> 
I think the checkpoints shouldn't be created by GC (when it is actually 
garbage collecting) because of it will decrease the JFFS2 performance. 
GC is called on some write request when there is no space (excluding the 
situation when it is called in background by the GC thread). This, it is 
bad to load GC by additional work of creating checkpoints since it is 
long enough process....

What do you think?

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 13:07       ` Artem B. Bityuckiy
@ 2004-10-04 13:18         ` David Woodhouse
  2004-10-04 13:32           ` Artem B. Bityuckiy
                             ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: David Woodhouse @ 2004-10-04 13:18 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd

On Mon, 2004-10-04 at 17:07 +0400, Artem B. Bityuckiy wrote:
> FIFOs, blk/chr dvice files *always have only one valid node at a time* 
> (right?). Thus, when building the inode cache, we may only read the node 
> with the highest version (which is found by scanning the correspondent 
> node_ref list) and read *only* this node. If it is OK (CRC is right), no 
> need to read more. If CRC is bad, read the older one...  Checkpoints 
> aren't needed, this is *another* improvement...
>
> How about this scheme?

Seems sane.
 
> David Woodhouse:
>  > I agree that it shouldn't be left to the GC thread alone -- there isn't
>  > one on eCos and it's optional on Linux too.
> 
> Hmm, true.
> Does eCos support threads (I didn't work with it)?

It can, but it's optional. Like the GC thread in JFFS2 is.

>  If not, the situation  is more complicated... I want to use thread
>  (low-priority) in order to *not affect the system performance much*.

It won't affect the system performance much _anyway_. Write it only when
you're _already_ writing out a node belonging to the inode in question,
and only when any existing snapshot for that inode is old enough that
it's worth writing a new one.
 
> I think the checkpoints shouldn't be created by GC (when it is actually 
> garbage collecting) because of it will decrease the JFFS2 performance. 
> GC is called on some write request when there is no space (excluding the 
> situation when it is called in background by the GC thread). This, it is 
> bad to load GC by additional work of creating checkpoints since it is 
> long enough process....
> 
> What do you think?

Hmmm. Maybe. But we can write a checkpoint node when we GC a previous
checkpoint node perhaps? And we can still write checkpoint nodes when
appropriate from the actual JFFS2 write routines.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 13:18         ` David Woodhouse
@ 2004-10-04 13:32           ` Artem B. Bityuckiy
  2004-10-04 13:46           ` Artem B. Bityuckiy
  2004-10-04 14:18           ` Artem B. Bityuckiy
  2 siblings, 0 replies; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-04 13:32 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

David Woodhouse wrote:
> On Mon, 2004-10-04 at 17:07 +0400, Artem B. Bityuckiy wrote:
> 
>>FIFOs, blk/chr dvice files *always have only one valid node at a time* 
>>(right?). Thus, when building the inode cache, we may only read the node 
>>with the highest version (which is found by scanning the correspondent 
>>node_ref list) and read *only* this node. If it is OK (CRC is right), no 
>>need to read more. If CRC is bad, read the older one...  Checkpoints 
>>aren't needed, this is *another* improvement...
>>
>>How about this scheme?
> 
> 
> Seems sane.
>
So, this isn't the subject of my improvement. But it is nice we 
discussed this too - somebody could implement this. :-)

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 13:18         ` David Woodhouse
  2004-10-04 13:32           ` Artem B. Bityuckiy
@ 2004-10-04 13:46           ` Artem B. Bityuckiy
  2004-10-04 14:18           ` Artem B. Bityuckiy
  2 siblings, 0 replies; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-04 13:46 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

>>I think the checkpoints shouldn't be created by GC (when it is actually 
>>garbage collecting) because of it will decrease the JFFS2 performance. 
>>GC is called on some write request when there is no space (excluding the 
>>situation when it is called in background by the GC thread). This, it is 
>>bad to load GC by additional work of creating checkpoints since it is 
>>long enough process....
>>
>>What do you think?
> 
> 
> Hmmm. Maybe. But we can write a checkpoint node when we GC a previous
> checkpoint node perhaps? And we can still write checkpoint nodes when
> appropriate from the actual JFFS2 write routines.
> 
I have the following reasons why it is bad to include checkpoint 
writings when garbage collecting:

1. If there is no (or few) space and the GC is called, it is better to 
*obsolete* checkpoints, not to create new ones. :-) So, if there are no 
threads, I don't know how to create checkpoints from GC... In this case 
it is better to create them while writing... But this is additional 
delay for write operations...

So, I propose either:
1. Introduce the option which enables the checkpoint support and 
automatically enables the GC thread.
2. Just use distinct thread for checkpoints.

Do you have other ideas?

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 13:18         ` David Woodhouse
  2004-10-04 13:32           ` Artem B. Bityuckiy
  2004-10-04 13:46           ` Artem B. Bityuckiy
@ 2004-10-04 14:18           ` Artem B. Bityuckiy
  2004-10-04 14:23             ` David Woodhouse
  2 siblings, 1 reply; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-04 14:18 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

Of course, ICPs shouldn't contain full node headers, only those 
information which is needed for the inode cache creation should be includes.

I'm imaging the following data structures for ICPs...

------------------------------------------------------------
      Regular file inode checkpoints (FICP)
------------------------------------------------------------

The FICP are represented by the following structure:

struct jffs2_raw_ficp {
     /* standard JFFS2 node header fields */
     jint32_t magic;
     jint32_t nodetype;
     jint32_t totlen;
     jint32_t hdr_crc;
     jint32_t version;  /* the checkpoint version */
     jint32_t highest_version;  /* the highest version of node, 
described by this checkpoint */
     jint32_t lowest_version;  /* the lowest version of node, described 
by this checkpoint */
     jint32_t dsize;  /* checkpoint's data size in uncompressed form */
     jint32_t ino;  /* the inode described by this checkpoint */
     jint32_t num;  /* the number of nodes, described by this checkpoint */
     uint8_t compr;  /* the checkpoint node data compression type; 
standard values are used */
     uint8_t unused[3];  /* just padding */
     jint32_t data_crc;  /* the CRC checksum of the checkpoint data */
     jint32_t node_crc;  /* the CRC checksum of the checkpoint object 
without data */
     struct jffs2_raw_ficp_entry data[0];  /* */
};

Each node, described by the FICP is represented by the following 
structure (fields are the same as in the jffs2_raw_inode structure).

struct jffs2_raw_ficp_entry {
     jint32_t offset; /* the offset in the regular file, to which the 
changes which are introduced by this node belong */
     jint32_t version; /* the node version  */
     jint32_t size; /* size of the data belonging to the node (in 
uncompressed form) */
};

------------------------------------------------------------
Directory inode checkpoints (DICP)
------------------------------------------------------------

The directory checkpoint is represented by the following structure:
struct jffs2_raw_dicp {
     /* standard JFFS2 node header fields */
     jint32_t magic;
     jint32_t nodetype;
     jint32_t totlen;
     jint32_t hdr_crc;
     jint32_t version; /* the checkpoint version */
     jint32_t highest_version; /* the lowest version of the direntry 
node, described by this checkpoint */
     jint32_t lowest_version; /* the lowest version of the direntry 
node, described by this checkpoint */
     jint32_t dsize; /* checkpoint's data size in uncompressed form */
     jint32_t ino /* the inode number of directory, which is described 
by this checkpoint */;
     jint32_t num; /* the number of direntry nodes, described by this 
checkpoint */
     uint8_t compr; /* the checkpoint node data compression type; 
standard values are used */
     uint8_t unused[3]; /* ust padding */
     jint32_t data_crc; /* the CRC checksum of the checkpoint data */
     jint32_t node_crc; /* the CRC checksum of the checkpoint object 
without data */
     struct jffs2_raw_dicp_entry data[0];
};

Each direntry node, described by the directory checkpoint is represented 
by the following structure (fields are the same as in the 
jffs2_raw_dirent structure).

struct jffs2_raw_dicp_entry {
     jint32_t ino; /* the inode number of the target file (i.e., of the 
file to which this direntry */
     jint32_t version; /* the direntry node version */
     uint8_t size; /* the direntry name length */
     uint8_t type; /* the type of the target file (the same as the type 
field in the struct */
     uint8_t name[0] /* the direntry name */
} __attribute__((packed));



----------------------------------------------------

For FICP and DICP new node type identifiers ought to be introduced. 
These identifiers are defined as:

#define JFFS2_NODETYPE_FICP \
     (JFFS2_FEATURE_RWCOMPAT_DELETE | JFFS2_NODE_ACCURATE | 5)

#define JFFS2_NODETYPE_DICP \
     (JFFS2_FEATURE_RWCOMPAT_DELETE | JFFS2_NODE_ACCURATE | 6)

----------------------------------------------------

Comments ?

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 14:18           ` Artem B. Bityuckiy
@ 2004-10-04 14:23             ` David Woodhouse
  2004-10-04 15:07               ` Artem B. Bityuckiy
  2004-10-05 14:07               ` Artem B. Bityuckiy
  0 siblings, 2 replies; 29+ messages in thread
From: David Woodhouse @ 2004-10-04 14:23 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd

On Mon, 2004-10-04 at 18:18 +0400, Artem B. Bityuckiy wrote:
>      jint32_t data_crc;  /* the CRC checksum of the checkpoint data */
>      jint32_t node_crc;  /* the CRC checksum of the checkpoint object without data */

You probably don't need to separate these. You'll only ever want to read
the _whole_ thing anyway, surely?

Other than that it looks sane enough at first glance. It looks like it
_will_ be useful enough for NOR flash too. I'd rather not have two
implementations if this is good enough for both.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 14:23             ` David Woodhouse
@ 2004-10-04 15:07               ` Artem B. Bityuckiy
  2004-10-05 14:07               ` Artem B. Bityuckiy
  1 sibling, 0 replies; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-04 15:07 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

David Woodhouse wrote:
> On Mon, 2004-10-04 at 18:18 +0400, Artem B. Bityuckiy wrote:
> 
>>     jint32_t data_crc;  /* the CRC checksum of the checkpoint data */
>>     jint32_t node_crc;  /* the CRC checksum of the checkpoint object without data */
> 
> 
> You probably don't need to separate these. You'll only ever want to read
> the _whole_ thing anyway, surely?

No...

Let's discuss when ICP is obsolete...

Introduce definition of ICP entry: checkpoints describe nodes and 
contain a number of checkpoint entries; each such entry describes one 
node. FICP entries are described by the struct jffs2_raw_ficp_entry 
structure (see the prev. letter :-) ). DICP entries are described by the 
struct jffs2_raw_dicp_entry structure. ICP entry is said to be valid or 
obsolete if the correspondent node is valid or obsolete.

Also, a pre-note: each ICP has the correspondent jffs2_node_ref 
structure which is always in-core. Since the list of node_refs belonging 
to one inode isn't sorted, the ICP node_ref object may be anywhere in 
the list. And to find the ICP's node_ref, the list may be scanned.

One more global idea is that I don't want to introduce more data 
structures which are in-core. This is the JFFS2 drawback that it eats 
more memory if more files/nodes are added to the file system... So, I 
don't want to enlarge this drawback... So, I want to have in-core only 
one more node_ref object for each ICP node... Don't sure this is 
possible, but it would be nice (agree?)

OK, A checkpoint node may be obsolete because of two reasons:

1 Newer checkpoint node exists which obsoletes the checkpoint; such 
checkpoints are detected by the version number;
2 All the nodes which are described by the checkpoint were updated or 
deleted and hence, the checkpoint isn't valid anymore; to detect such 
obsolete checkpoints, the node_ref array is scanned; if there is no more 
then ICP_MINNODES (some constant) valid ICP entries, the checkpoint is 
obsolete; to check how many valid nodes are described by the checkpoint, 
the lowest_version and highest_version fields are used;

ICP_MINNODES means the minumum number of ICP entries.

So, let us suppose we are going to garbage collect ICP. First, we want 
to detect, if the ICP valid or obsolete. For this purpose, we need to 
know the highest_version and lowest_version (thus, read the ICP header). 
Then run through the node_ref list and find out how many *valid* node 
entries the old ICP has (I suppose there is no inode cache for the 
inode, referred by the ICP). If this number is too small, don't remove 
the ICP, just move it to the other block...

We would keep the highest_version and lowest_version (may be other too) 
fields in-core, but again, it seems bad to me... So, we need to read ICP 
headers sometimes.

Thus, I think it will be needed to have the separate data_crc and node_crc.

> 
> Other than that it looks sane enough at first glance. It looks like it
> _will_ be useful enough for NOR flash too. I'd rather not have two
> implementations if this is good enough for both.

Too many differences... For example, the direntry ICPs *aren't needed* 
at all. Data structure for regfiles is much more simpler (no data[]) 
array at all if to see to my structures).

There is a big issue how to split big ICPs... for NOR ICPs are always 
small. May be it is better to write them directly on iget() request 
since they are small...

And so on so on.
:-)

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 14:23             ` David Woodhouse
  2004-10-04 15:07               ` Artem B. Bityuckiy
@ 2004-10-05 14:07               ` Artem B. Bityuckiy
  2004-10-05 16:45                 ` David Woodhouse
  1 sibling, 1 reply; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-05 14:07 UTC (permalink / raw)
  To: David Woodhouse, Josh Boyer; +Cc: linux-mtd

Hello David, Josh. What did we decided about the GC thread?

I offered to create checkpoints from the GC thread. Are there some 
reasons not to do so? There was one - the GC thread isn't always 
enabled. But we may require it be enabled if checkpoints are enabled...

What do you think?

Or may be its better to use distinct thread ?

May be we create ICP during GC process, or on iget() call, or on iput() 
call? But I have arguments against these...

Are there some comments/advices?

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-05 14:07               ` Artem B. Bityuckiy
@ 2004-10-05 16:45                 ` David Woodhouse
  2004-10-05 17:20                   ` Josh Boyer
  0 siblings, 1 reply; 29+ messages in thread
From: David Woodhouse @ 2004-10-05 16:45 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd

On Tue, 2004-10-05 at 18:07 +0400, Artem B. Bityuckiy wrote:
> Hello David, Josh. What did we decided about the GC thread?
> 
> I offered to create checkpoints from the GC thread. Are there some 
> reasons not to do so? There was one - the GC thread isn't always 
> enabled. But we may require it be enabled if checkpoints are enabled...

Checkpoints are an optimisation -- so is the GC thread. I'm perfectly
happy with the idea that killing the GC thread also stops checkpoints
from happening.

> May be we create ICP during GC process, or on iget() call, or on
> iput() call? But I have arguments against these...

I was thinking of doing it during a normal write to the file, or during
garbage collection of a node belonging to the file.

But I think we can play with ideas on _when_ to call this once you've
actually implemented the functions which does it :)

-- 
dwmw2

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-05 16:45                 ` David Woodhouse
@ 2004-10-05 17:20                   ` Josh Boyer
  2004-10-06  9:07                     ` Artem B. Bityuckiy
  2004-10-09 11:45                     ` Artem B. Bityuckiy
  0 siblings, 2 replies; 29+ messages in thread
From: Josh Boyer @ 2004-10-05 17:20 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

On Tue, 2004-10-05 at 11:45, David Woodhouse wrote:
> On Tue, 2004-10-05 at 18:07 +0400, Artem B. Bityuckiy wrote:
> > Hello David, Josh. What did we decided about the GC thread?
> > 
> > I offered to create checkpoints from the GC thread. Are there some 
> > reasons not to do so? There was one - the GC thread isn't always 
> > enabled. But we may require it be enabled if checkpoints are enabled...
> 
> Checkpoints are an optimisation -- so is the GC thread. I'm perfectly
> happy with the idea that killing the GC thread also stops checkpoints
> from happening.

Yep, me too.

> 
> > May be we create ICP during GC process, or on iget() call, or on
> > iput() call? But I have arguments against these...
> 
> I was thinking of doing it during a normal write to the file, or during
> garbage collection of a node belonging to the file.
> 
> But I think we can play with ideas on _when_ to call this once you've
> actually implemented the functions which does it :)

Right.  Lots of places it could be done.  Start with one and work from
there :).

josh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-05 17:20                   ` Josh Boyer
@ 2004-10-06  9:07                     ` Artem B. Bityuckiy
  2004-10-09 11:45                     ` Artem B. Bityuckiy
  1 sibling, 0 replies; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-06  9:07 UTC (permalink / raw)
  To: Josh Boyer, David Woodhouse; +Cc: linux-mtd

>>>May be we create ICP during GC process, or on iget() call, or on
>>>iput() call? But I have arguments against these...
>>
>>I was thinking of doing it during a normal write to the file, or during
>>garbage collection of a node belonging to the file.
>>
>>But I think we can play with ideas on _when_ to call this once you've
>>actually implemented the functions which does it :)
> 
> 
> Right.  Lots of places it could be done.  Start with one and work from
> there :).
> 
Ok, I got it. I'll try to keep in my mind the possibility to call the 
checkpoint creation function from different places.

Thanks.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-05 17:20                   ` Josh Boyer
  2004-10-06  9:07                     ` Artem B. Bityuckiy
@ 2004-10-09 11:45                     ` Artem B. Bityuckiy
  2004-10-09 11:58                       ` Artem B. Bityuckiy
                                         ` (2 more replies)
  1 sibling, 3 replies; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-09 11:45 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

Hello Guys.

Unfortunately, I hit on very painful trouble while trying to 
design/implement checkpoints (ICP) ...

Obviously, ICP must contain all information, which is required to build 
the inode cache. These are: offset, size and *version*. The problem is 
in version.

At first I thought that in order to create ICP for some regfile inode, 
we just need to be sure that the inode cache for this regfile exists. 
But! I've overseen the fact that *there is no versions in the inode 
cache* :-(

The inode cache (struct jffs2_inode_info) contains fragtree which has 
struct jffs2_full_dnode objects. But these objects have no version field...

So, I see two ways out:

1. Add the version field to the struct jffs2_full_dnode objects.
This also means that the struct jffs2_tmp_dnode_info structure won't be 
needed anymore.

The advantage of such approach is that the JFFS2 will be more simple 
since one data structure will be removed. I like the KISS principle.

The drawback is that the inode cache will eat more memory. But this 
isn't in-core object, just cache, so I don't think this is a big 
disadvantage.

2. Don't use the inode cache at all. This is bad because in order to 
build the inode cache we'll need to read *all* the node headers, even if 
there is existing inode cache.

I believe this approach is bad and too heavyweight. Moreover, just 
imagine the situation when the GC has fount an ICP ant wants to 
determine if it valid or obsolete. Obsolete means that (1) there is 
newer ICP with higher version (simple case) or (2) all (or most) the 
nodes which are described by the ICP aren't valid anymore (complicated 
case). The second case means that we must read the ICPs lowest_version 
and highest_version and count how many valid nodes with versions within 
this interval exist. But we have NO versions even if the inode cache is 
present. I don't think it is a good idea if the GC will head all the 
node's headers...

What do you guys think? Any advices/ideas?

Thanks.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-09 11:45                     ` Artem B. Bityuckiy
@ 2004-10-09 11:58                       ` Artem B. Bityuckiy
  2004-10-09 13:01                       ` Artem B. Bityuckiy
  2004-10-09 13:22                       ` Artem B. Bityuckiy
  2 siblings, 0 replies; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-09 11:58 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd, David Woodhouse

Artem B. Bityuckiy wrote:
> Hello Guys.
> 
> Unfortunately, I hit on very painful trouble while trying to 
> design/implement checkpoints (ICP) ...
> 
> Obviously, ICP must contain all information, which is required to build 
> the inode cache. These are: offset, size and *version*. The problem is 
> in version.
> 
> At first I thought that in order to create ICP for some regfile inode, 
> we just need to be sure that the inode cache for this regfile exists. 
> But! I've overseen the fact that *there is no versions in the inode 
> cache* :-(
> 
> The inode cache (struct jffs2_inode_info) contains fragtree which has 
> struct jffs2_full_dnode objects. But these objects have no version field...
> 
> So, I see two ways out:
> 
> 1. Add the version field to the struct jffs2_full_dnode objects.
> This also means that the struct jffs2_tmp_dnode_info structure won't be 
> needed anymore.
> 
> The advantage of such approach is that the JFFS2 will be more simple 
> since one data structure will be removed. I like the KISS principle.
> 
> The drawback is that the inode cache will eat more memory. But this 
> isn't in-core object, just cache, so I don't think this is a big 
> disadvantage.
> 
> 2. Don't use the inode cache at all
when building ICP.
> This is bad because in order to 
> build the inode cache
 > soory, not inode cache, I meant "build ICP"
> we'll need to read *all* the node headers, even if 
> there is existing inode cache.
> 
> I believe this approach is bad and too heavyweight. Moreover, just 
> imagine the situation when the GC has fount an ICP ant wants to 
> determine if it valid or obsolete. Obsolete means that (1) there is 
> newer ICP with higher version (simple case) or (2) all (or most) the 
> nodes which are described by the ICP aren't valid anymore (complicated 
> case). The second case means that we must read the ICPs lowest_version 
> and highest_version and count how many valid nodes with versions within 
> this interval exist. But we have NO versions even if the inode cache is 
> present. I don't think it is a good idea if the GC will head all the 
> node's headers...
> 
> What do you guys think? Any advices/ideas?
> 
> Thanks.
> 

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-09 11:45                     ` Artem B. Bityuckiy
  2004-10-09 11:58                       ` Artem B. Bityuckiy
@ 2004-10-09 13:01                       ` Artem B. Bityuckiy
  2004-10-09 14:48                         ` Artem B. Bityuckiy
  2004-10-09 13:22                       ` Artem B. Bityuckiy
  2 siblings, 1 reply; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-09 13:01 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

Artem B. Bityuckiy wrote:
> Hello Guys.
> 
> Unfortunately, I hit on very painful trouble while trying to 
> design/implement checkpoints (ICP) ...
> 
> Obviously, ICP must contain all information, which is required to build 
> the inode cache. These are: offset, size and *version*. The problem is 
> in version.
> 
> At first I thought that in order to create ICP for some regfile inode, 
> we just need to be sure that the inode cache for this regfile exists. 
> But! I've overseen the fact that *there is no versions in the inode 
> cache* :-(
> 
> The inode cache (struct jffs2_inode_info) contains fragtree which has 
> struct jffs2_full_dnode objects. But these objects have no version field...
> 
> So, I see two ways out:
> 
> 1. Add the version field to the struct jffs2_full_dnode objects.
> This also means that the struct jffs2_tmp_dnode_info structure won't be 
> needed anymore.
> 
> The advantage of such approach is that the JFFS2 will be more simple 
> since one data structure will be removed. I like the KISS principle.
> 
> The drawback is that the inode cache will eat more memory. But this 
> isn't in-core object, just cache, so I don't think this is a big 
> disadvantage.
Moreover, there is the 'ofs' field in the jffs2_full_dnode which isn't 
really needed since the sabe information is in the raw->flash_offset 
field. So, we can for example add 'version' and remove 'ofs' field. The 
size won't change, the JFFS2 will be a bit simpler. Only small overhead 
(need to dereference pointer to get the ofs) will be introduced.

> 
> 2. Don't use the inode cache at all. This is bad because in order to 
> build the inode cache we'll need to read *all* the node headers, even if 
> there is existing inode cache.
> 
> I believe this approach is bad and too heavyweight. Moreover, just 
> imagine the situation when the GC has fount an ICP ant wants to 
> determine if it valid or obsolete. Obsolete means that (1) there is 
> newer ICP with higher version (simple case) or (2) all (or most) the 
> nodes which are described by the ICP aren't valid anymore (complicated 
> case). The second case means that we must read the ICPs lowest_version 
> and highest_version and count how many valid nodes with versions within 
> this interval exist. But we have NO versions even if the inode cache is 
> present. I don't think it is a good idea if the GC will head all the 
> node's headers...
> 
> What do you guys think? Any advices/ideas?
> 
> Thanks.
> 

-- 
Best Regards
Artem B. Bityckiy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-09 13:01                       ` Artem B. Bityuckiy
@ 2004-10-09 14:48                         ` Artem B. Bityuckiy
  0 siblings, 0 replies; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-09 14:48 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

> Moreover, there is the 'ofs' field in the jffs2_full_dnode which isn't 
> really needed since the sabe information is in the raw->flash_offset 
> field. So, we can for example add 'version' and remove 'ofs' field. The 
> size won't change, the JFFS2 will be a bit simpler. Only small overhead 
> (need to dereference pointer to get the ofs) will be introduced.
> 
Sorry again (:-)), this is false. 'ofs' field is another offset and is 
needed.
I was just confused by the comment "/* Don't really need this, but 
optimisation */"...

Here is the patch (just for demonstration) to move the version field 
from tmp_dnode_info to full_dnode structure. It is quite simple. But it 
would be very useful fir checkpoints. It works at least with my simple 
JFFS2 test.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.



diff -auNr mtd-snapshot-20041008/fs/jffs2/nodelist.c 
mtd-version-move/fs/jffs2/nodelist.c
--- mtd-snapshot-20041008/fs/jffs2/nodelist.c	2003-11-01 
02:00:35.000000000 +0300
+++ mtd-version-move/fs/jffs2/nodelist.c	2004-10-09 17:50:43.325035342 +0400
@@ -62,7 +62,7 @@
  {
  	struct jffs2_tmp_dnode_info **prev = list;
  	
-	while ((*prev) && (*prev)->version < tn->version) {
+	while ((*prev) && (*prev)->fn->version < tn->fn->version) {
  		prev = &((*prev)->next);
  	}
  	tn->next = (*prev);
@@ -363,7 +363,7 @@
  				jffs2_free_tmp_dnode_info(tn);
  				goto free_out;
  			}
-			tn->version = je32_to_cpu(node.i.version);
+			tn->fn->version = je32_to_cpu(node.i.version);
  			tn->fn->ofs = je32_to_cpu(node.i.offset);
  			/* There was a bug where we wrote hole nodes out with
  			   csize/dsize swapped. Deal with it */
diff -auNr mtd-snapshot-20041008/fs/jffs2/nodelist.h 
mtd-version-move/fs/jffs2/nodelist.h
--- mtd-snapshot-20041008/fs/jffs2/nodelist.h	2004-10-08 
02:00:14.000000000 +0400
+++ mtd-version-move/fs/jffs2/nodelist.h	2004-10-09 17:54:20.785755572 +0400
@@ -167,8 +167,8 @@
  	uint32_t size;
  	uint32_t frags; /* Number of fragments which currently refer
  			to this node. When this reaches zero,
-			the node is obsolete.
-		     */
+			the node is obsolete.  */
+	uint32_t version;
  };

  /*
@@ -180,7 +180,6 @@
  {
  	struct jffs2_tmp_dnode_info *next;
  	struct jffs2_full_dnode *fn;
-	uint32_t version;
  };

  struct jffs2_full_dirent
diff -auNr mtd-snapshot-20041008/fs/jffs2/readinode.c 
mtd-version-move/fs/jffs2/readinode.c
--- mtd-snapshot-20041008/fs/jffs2/readinode.c	2003-11-04 
02:00:36.000000000 +0300
+++ mtd-version-move/fs/jffs2/readinode.c	2004-10-09 17:48:25.354322429 
+0400
@@ -526,7 +526,7 @@
  		fn = tn->fn;

  		if (f->metadata) {
-			if (likely(tn->version >= mdata_ver)) {
+			if (likely(tn->fn->version >= mdata_ver)) {
  				D1(printk(KERN_DEBUG "Obsoleting old metadata at 0x%08x\n", 
ref_offset(f->metadata->raw)));
  				jffs2_mark_node_obsolete(c, f->metadata->raw);
  				jffs2_free_full_dnode(f->metadata);
@@ -536,7 +536,7 @@
  			} else {
  				/* This should never happen. */
  				printk(KERN_WARNING "Er. New metadata at 0x%08x with ver %d is 
actually older than previous ver %d at 0x%08x\n",
-					  ref_offset(fn->raw), tn->version, mdata_ver, 
ref_offset(f->metadata->raw));
+					  ref_offset(fn->raw), tn->fn->version, mdata_ver, 
ref_offset(f->metadata->raw));
  				jffs2_mark_node_obsolete(c, fn->raw);
  				jffs2_free_full_dnode(fn);
  				/* Fill in latest_node from the metadata, not this one we're about 
to free... */
@@ -549,9 +549,9 @@
  			jffs2_add_full_dnode_to_inode(c, f, fn);
  		} else {
  			/* Zero-sized node at end of version list. Just a metadata update */
-			D1(printk(KERN_DEBUG "metadata @%08x: ver %d\n", 
ref_offset(fn->raw), tn->version));
+			D1(printk(KERN_DEBUG "metadata @%08x: ver %d\n", 
ref_offset(fn->raw), tn->fn->version));
  			f->metadata = fn;
-			mdata_ver = tn->version;
+			mdata_ver = tn->fn->version;
  		}
  	next_tn:
  		tn_list = tn->next;


-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-09 11:45                     ` Artem B. Bityuckiy
  2004-10-09 11:58                       ` Artem B. Bityuckiy
  2004-10-09 13:01                       ` Artem B. Bityuckiy
@ 2004-10-09 13:22                       ` Artem B. Bityuckiy
  2 siblings, 0 replies; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-09 13:22 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

Artem B. Bityuckiy wrote:
> Hello Guys.
> 
> Unfortunately, I hit on very painful trouble while trying to 
> design/implement checkpoints (ICP) ...
> 
> Obviously, ICP must contain all information, which is required to build 
> the inode cache. These are: offset, size and *version*. The problem is 
> in version.
> 
> At first I thought that in order to create ICP for some regfile inode, 
> we just need to be sure that the inode cache for this regfile exists. 
> But! I've overseen the fact that *there is no versions in the inode 
> cache* :-(
> 
> The inode cache (struct jffs2_inode_info) contains fragtree which has 
> struct jffs2_full_dnode objects. But these objects have no version field...
> 
> So, I see two ways out:
> 
> 1. Add the version field to the struct jffs2_full_dnode objects.
> This also means that the struct jffs2_tmp_dnode_info structure won't be 
> needed anymore.
> 
> The advantage of such approach is that the JFFS2 will be more simple 
> since one data structure will be removed. I like the KISS principle.
Ok, guys, sorry, I've hurried saying that the struct 
jffs2_tmp_dnode_info won't be needed... It is needed to construct lists 
of struct jffs2_full_dnode objects...

Anyway, my problem is still actual.

> 
> The drawback is that the inode cache will eat more memory. But this 
> isn't in-core object, just cache, so I don't think this is a big 
> disadvantage.
> 
> 2. Don't use the inode cache at all. This is bad because in order to 
> build the inode cache we'll need to read *all* the node headers, even if 
> there is existing inode cache.
> 
> I believe this approach is bad and too heavyweight. Moreover, just 
> imagine the situation when the GC has fount an ICP ant wants to 
> determine if it valid or obsolete. Obsolete means that (1) there is 
> newer ICP with higher version (simple case) or (2) all (or most) the 
> nodes which are described by the ICP aren't valid anymore (complicated 
> case). The second case means that we must read the ICPs lowest_version 
> and highest_version and count how many valid nodes with versions within 
> this interval exist. But we have NO versions even if the inode cache is 
> present. I don't think it is a good idea if the GC will head all the 
> node's headers...
> 
> What do you guys think? Any advices/ideas?
> 
> Thanks.
> 

-- 
Best Regards
Artem B. Bityckiy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 10:22 ` David Woodhouse
  2004-10-04 10:36   ` [OBORONA-SPAM] " Artem B. Bityuckiy
@ 2004-10-04 11:44   ` Artem Bityuckiy
  2004-10-04 12:36     ` Josh Boyer
  1 sibling, 1 reply; 29+ messages in thread
From: Artem Bityuckiy @ 2004-10-04 11:44 UTC (permalink / raw)
  To: linux-mtd; +Cc: David Woodhouse

David Woodhouse wrote:
> On Mon, 2004-10-04 at 14:14 +0400, Artem Bityuckiy wrote:
> 
>>Wha do JFFS2 folks think about the idea?
> 
> 
> It seems to make a lot of sense. We could write such checkpoints only on
> files which are large enough to warrant them, and we could tune the
> frequency of such checkpoints. 
> 
> Of course as with many cute ideas it may turn out to be utterly
> impractical when we implement it :) It does look sane though.
> 
It should also be noted, that the checkpoints support will be added 
under the "#ifdef CONFIG_JFFS2_FS_NAND" directive since it only makes 
sence for the NAND flash devices...


-- 
Best regards, Artem B. Bityuckiy
Oktet Labs (St. Petersburg), Software Engineer.
+78124286709 (office) +79112449030 (mobile)
E-mail: dedekind@oktetlabs.ru, web: http://www.oktetlabs.ru

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 11:44   ` Artem Bityuckiy
@ 2004-10-04 12:36     ` Josh Boyer
  2004-10-04 12:43       ` David Woodhouse
  2004-10-04 13:26       ` [OBORONA-SPAM] " Artem B. Bityuckiy
  0 siblings, 2 replies; 29+ messages in thread
From: Josh Boyer @ 2004-10-04 12:36 UTC (permalink / raw)
  To: Artem Bityuckiy; +Cc: David Woodhouse, linux-mtd

On Mon, 2004-10-04 at 06:44, Artem Bityuckiy wrote:
> > 
> It should also be noted, that the checkpoints support will be added 
> under the "#ifdef CONFIG_JFFS2_FS_NAND" directive since it only makes 
> sence for the NAND flash devices...
> 

Yes and no.  It makes sense for any flash device that you can't write to
the same page multiple times without an erase in between.  NAND is
certainly the largest group of such devices.  There are others that have
this behavior as well.  See cfi_cmdset_0020.c.

Although, JFFS2 doesn't support those chips right now, so I guess I am
just rambling :).

josh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 12:36     ` Josh Boyer
@ 2004-10-04 12:43       ` David Woodhouse
  2004-10-04 13:26       ` [OBORONA-SPAM] " Artem B. Bityuckiy
  1 sibling, 0 replies; 29+ messages in thread
From: David Woodhouse @ 2004-10-04 12:43 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Artem Bityuckiy, linux-mtd

On Mon, 2004-10-04 at 07:36 -0500, Josh Boyer wrote:
> Yes and no.  It makes sense for any flash device that you can't write to
> the same page multiple times without an erase in between.  NAND is
> certainly the largest group of such devices.  There are others that have
> this behavior as well.  See cfi_cmdset_0020.c.
> 
> Although, JFFS2 doesn't support those chips right now, so I guess I am
> just rambling :).

There's more chips like that on the way too. It'll be 

	if (!jffs2_can_mark_obsolete(c))

and it'll probably also be in the ifdef, although we may actually change
the name of that configuration option.

I agree that it shouldn't be left to the GC thread alone -- there isn't
one on eCos and it's optional on Linux too. We can write out snapshots
in the normal course of garbage collection though, when we have
obsoleted a node belonging to the inode in question.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [OBORONA-SPAM]  Re: inode checkpoints
  2004-10-04 12:36     ` Josh Boyer
  2004-10-04 12:43       ` David Woodhouse
@ 2004-10-04 13:26       ` Artem B. Bityuckiy
  2004-10-04 13:39         ` Josh Boyer
  1 sibling, 1 reply; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-04 13:26 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Artem Bityuckiy, David Woodhouse, linux-mtd

> 
> Yes and no.  It makes sense for any flash device that you can't write to
> the same page multiple times without an erase in between.  NAND is
> certainly the largest group of such devices.  There are others that have
> this behavior as well.  See cfi_cmdset_0020.c.

The original idea of ICPs (inode checkpoint) is to improve JFFS2 
performance on NAND not because the page can't be written lots of times.
The NAND flash specific is that there is constant delay when accessing 
any page.

For example consider TOSHIBA TC5DxM82A1xxxx 8-bit 256 Mbit NAND flash.
Page access delay is 25 microsecs.
Read cycle is 50 nanosecs.

When you read 3 bytes from this NAND flash and these bytes are situated 
in one page (consequently), you will spent about 25 microsecs + 50 * 3 
nanosecs.
But when you read those 3 bytes from the same NAND flash and these bytes 
are situated in three different pages, you will spent about 25*3 
microsecs + 50 * 3 nanosecs.

So, when the inode cache is built, we read node headers from *different 
pages*. But if there is checkpoint, we read from few pages. This is the 
main idea.

But in case of regular file inodes we also save *additional* time since 
we don't check the *data* CRC for those nodes, which are in 
checkpoint(s). In case of the directory inode there is no such 
additional time savings.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [OBORONA-SPAM]  Re: inode checkpoints
  2004-10-04 13:26       ` [OBORONA-SPAM] " Artem B. Bityuckiy
@ 2004-10-04 13:39         ` Josh Boyer
  2004-10-04 13:56           ` Artem Bityuckiy
  0 siblings, 1 reply; 29+ messages in thread
From: Josh Boyer @ 2004-10-04 13:39 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: Artem Bityuckiy, David Woodhouse, linux-mtd

On Mon, 2004-10-04 at 08:26, Artem B. Bityuckiy wrote:
> > 
> > Yes and no.  It makes sense for any flash device that you can't write to
> > the same page multiple times without an erase in between.  NAND is
> > certainly the largest group of such devices.  There are others that have
> > this behavior as well.  See cfi_cmdset_0020.c.
> 
> The original idea of ICPs (inode checkpoint) is to improve JFFS2 
> performance on NAND not because the page can't be written lots of times.
> The NAND flash specific is that there is constant delay when accessing 
> any page.
> 
> For example consider TOSHIBA TC5DxM82A1xxxx 8-bit 256 Mbit NAND flash.
> Page access delay is 25 microsecs.
> Read cycle is 50 nanosecs.
> 
> When you read 3 bytes from this NAND flash and these bytes are situated 
> in one page (consequently), you will spent about 25 microsecs + 50 * 3 
> nanosecs.
> But when you read those 3 bytes from the same NAND flash and these bytes 
> are situated in three different pages, you will spent about 25*3 
> microsecs + 50 * 3 nanosecs.

Yep.

> 
> So, when the inode cache is built, we read node headers from *different 
> pages*. But if there is checkpoint, we read from few pages. This is the 
> main idea.
> 
> But in case of regular file inodes we also save *additional* time since 
> we don't check the *data* CRC for those nodes, which are in 
> checkpoint(s). In case of the directory inode there is no such 
> additional time savings.

I agree that these changes will benefit NAND the most, but they are also
valuable for all flashes.  

Say I have a 16 MiB file on my NOR flash.  When a stat or open is done,
iget is called, which calls readinode, which then goes and reads all the
nodes off of the flash to build up the inode information.  I've seen
that take upwards of 2 seconds, just because of the number of data nodes
that have to be read.

Now, same situation, but I have a checkpoint node out there.  I don't
have to read _all_ the data nodes because the checkpoint gives me a
snapshot of at least some of them, right?

Or maybe I am extending your idea a bit further than what you
intended...

josh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 13:39         ` Josh Boyer
@ 2004-10-04 13:56           ` Artem Bityuckiy
  2004-10-04 14:06             ` Artem B. Bityuckiy
  0 siblings, 1 reply; 29+ messages in thread
From: Artem Bityuckiy @ 2004-10-04 13:56 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linux-mtd, David Woodhouse

> 
> 
> I agree that these changes will benefit NAND the most, but they are also
> valuable for all flashes.  
> 
> Say I have a 16 MiB file on my NOR flash.  When a stat or open is done,
> iget is called, which calls readinode, which then goes and reads all the
> nodes off of the flash to build up the inode information.  I've seen
> that take upwards of 2 seconds, just because of the number of data nodes
> that have to be read.
> 
> Now, same situation, but I have a checkpoint node out there.  I don't
> have to read _all_ the data nodes because the checkpoint gives me a
> snapshot of at least some of them, right?
> 
> Or maybe I am extending your idea a bit further than what you
> intended...
> 
> josh
> 
You are right in general. I would like to say that the NAND ICPs idea is 
*superset* for NOR ICP idea, not vice versa... (ICP=inode check point)

But I believe the NOR ICPs shouldn't be implemented together with NAND 
cips since NOR ICPs are much simpler. For example, the need bych simpler 
data structures, algorithms, etc. I dont think it is a good Idea to have 
hunderds of #ifdefs ...

-- 
Best regards, Artem B. Bityuckiy
Oktet Labs (St. Petersburg), Software Engineer.
+78124286709 (office) +79112449030 (mobile)
E-mail: dedekind@oktetlabs.ru, web: http://www.oktetlabs.ru

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 13:56           ` Artem Bityuckiy
@ 2004-10-04 14:06             ` Artem B. Bityuckiy
  2004-10-04 14:17               ` Josh Boyer
  0 siblings, 1 reply; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-04 14:06 UTC (permalink / raw)
  To: dedekind; +Cc: David Woodhouse, linux-mtd

Sorry:
> But I believe the NOR ICPs shouldn't be implemented together with NAND 
> ICPs since NOR ICPs are much simpler. For example, they need much simpler 
> data structures, algorithms, etc. I don't think it is a good idea to have 
> hunderds of #ifdefs ... It's better to implement NOR ICPs as distinct thing...
 > Don't you think so?


-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 14:06             ` Artem B. Bityuckiy
@ 2004-10-04 14:17               ` Josh Boyer
  2004-10-04 14:22                 ` Artem B. Bityuckiy
  0 siblings, 1 reply; 29+ messages in thread
From: Josh Boyer @ 2004-10-04 14:17 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: dedekind, linux-mtd, David Woodhouse

On Mon, 2004-10-04 at 09:06, Artem B. Bityuckiy wrote:
> Sorry:
> > But I believe the NOR ICPs shouldn't be implemented together with NAND 
> > ICPs since NOR ICPs are much simpler. For example, they need much simpler 
> > data structures, algorithms, etc. I don't think it is a good idea to have 
> > hunderds of #ifdefs ... It's better to implement NOR ICPs as distinct thing...
>  > Don't you think so?
> 

Ok, good point.  #ifdefs are evil :).  But maybe the functions to access
them could be commonly named with different implementations depending on
the config options.  I'm thinking along the lines of how the wbuf stuff
is handled for NOR in os-linux.h.

Or maybe that doesn't make sense.  I think I should just wait and see
what you come up with :).

josh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode checkpoints
  2004-10-04 14:17               ` Josh Boyer
@ 2004-10-04 14:22                 ` Artem B. Bityuckiy
  0 siblings, 0 replies; 29+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-04 14:22 UTC (permalink / raw)
  To: Josh Boyer; +Cc: David Woodhouse, linux-mtd

> Ok, good point.  #ifdefs are evil :).  But maybe the functions to access
> them could be commonly named with different implementations depending on
> the config options.  I'm thinking along the lines of how the wbuf stuff
> is handled for NOR in os-linux.h.
> 
> Or maybe that doesn't make sense.  I think I should just wait and see
> what you come up with :).
> 

Yes, I'm fully agree. I believe It would be rather easy to add NOR ICPs if
the NAND ICPs were implemented :-)

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2004-10-09 14:49 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-10-04 10:14 inode checkpoints Artem Bityuckiy
2004-10-04 10:22 ` David Woodhouse
2004-10-04 10:36   ` [OBORONA-SPAM] " Artem B. Bityuckiy
2004-10-04 12:34     ` Josh Boyer
2004-10-04 13:07       ` Artem B. Bityuckiy
2004-10-04 13:18         ` David Woodhouse
2004-10-04 13:32           ` Artem B. Bityuckiy
2004-10-04 13:46           ` Artem B. Bityuckiy
2004-10-04 14:18           ` Artem B. Bityuckiy
2004-10-04 14:23             ` David Woodhouse
2004-10-04 15:07               ` Artem B. Bityuckiy
2004-10-05 14:07               ` Artem B. Bityuckiy
2004-10-05 16:45                 ` David Woodhouse
2004-10-05 17:20                   ` Josh Boyer
2004-10-06  9:07                     ` Artem B. Bityuckiy
2004-10-09 11:45                     ` Artem B. Bityuckiy
2004-10-09 11:58                       ` Artem B. Bityuckiy
2004-10-09 13:01                       ` Artem B. Bityuckiy
2004-10-09 14:48                         ` Artem B. Bityuckiy
2004-10-09 13:22                       ` Artem B. Bityuckiy
2004-10-04 11:44   ` Artem Bityuckiy
2004-10-04 12:36     ` Josh Boyer
2004-10-04 12:43       ` David Woodhouse
2004-10-04 13:26       ` [OBORONA-SPAM] " Artem B. Bityuckiy
2004-10-04 13:39         ` Josh Boyer
2004-10-04 13:56           ` Artem Bityuckiy
2004-10-04 14:06             ` Artem B. Bityuckiy
2004-10-04 14:17               ` Josh Boyer
2004-10-04 14:22                 ` Artem B. Bityuckiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox