JFFS3 and RAM consumption reincarnated

* JFFS3 and RAM consumption reincarnated
@ 2005-03-01 16:28 Artem B. Bityuckiy
  2005-03-02 14:44 ` Jörn Engel
  0 siblings, 1 reply; 8+ messages in thread
From: Artem B. Bityuckiy @ 2005-03-01 16:28 UTC
  To: MTD List

Hello,

I'd like to continue the JFFS3 RAM consumption discussion.
Please take a look at
http://lists.infradead.org/pipermail/linux-mtd/2005-January/011671.html
and follow the conversation at
http://lists.infradead.org/pipermail/linux-mtd/2005-January/011716.html
to refresh your memory.

I assume the reader knows what Summary and ICP are. If not, follow
the links above.

We stopped at a design like this: each inode has an ICP which is stored
on flash. The ICP becomes outdated through GC and user writes. Since it
is inefficient to write a new ICP for the inode every time, we won't
rewrite it but add a node_ref instead. From time to time we'll flush the
ICP and free the node_refs.

Example:
1. We have inode X with 10 nodes (node 1, node 2, ... node 10) and an ICP
node which describes them. So far so good. On an iget() call we read the
ICP and have the inode built.

2. Suppose GC has moved node 1 of our inode. Its position has changed, so
the corresponding ICP entry is now obsolete. In this case JFFS3 does not
write a new ICP. Instead, it allocates a node_ref for node 1 and keeps it
in-core.
Now, on an iget() call, we read the ICP and node 1's node_ref, and this is
sufficient to build our inode.

Analogously, if other nodes are moved, we just allocate more node_refs.

At some point (I don't specify when), we may rewrite the obsolete ICP and
free the node_refs. And so on.

I think it makes sense to introduce a JFFS3 parameter that limits the
amount of in-core RAM that JFFS3 may consume. Let's call this parameter
JFFS3_MAXRAM.

If JFFS3_MAXRAM = 0, we write a new ICP every time any of its entries
becomes obsolete.

If JFFS3_MAXRAM is very large, JFFS3 equals JFFS2 with respect to memory
consumption.
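
A minimal sketch of the scheme described above, assuming a fixed-size
in-memory stand-in for the on-flash ICP. All names here (struct icp,
struct node_ref, node_moved(), icp_flush(), and the maxram argument that
plays the role of JFFS3_MAXRAM) are hypothetical and not taken from the
JFFS2 or JFFS3 sources, and the byte accounting is deliberately crude.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ICP_MAX_NODES 16          /* hypothetical cap, keeps the sketch small */

struct icp {                      /* stand-in for the on-flash checkpoint */
	uint32_t offs[ICP_MAX_NODES]; /* flash offset recorded for each node */
	int nr_nodes;
	int flushes;                  /* times a new ICP was written out */
};

struct node_ref {                 /* in-core override for one moved node */
	int node;
	uint32_t offs;
	struct node_ref *next;
};

struct inode_info {
	struct icp icp;
	struct node_ref *refs;
};

static size_t ram_used;           /* bytes currently held by node_refs */

/* Fold every in-core node_ref into the ICP, free them, write a new ICP. */
void icp_flush(struct inode_info *ino)
{
	while (ino->refs) {
		struct node_ref *r = ino->refs;

		ino->icp.offs[r->node] = r->offs;
		ino->refs = r->next;
		ram_used -= sizeof(*r);
		free(r);
	}
	ino->icp.flushes++;
}

/* GC or a user write moved `node` to `new_offs`; maxram plays JFFS3_MAXRAM. */
int node_moved(struct inode_info *ino, int node, uint32_t new_offs,
	       size_t maxram)
{
	struct node_ref *r;

	if (node < 0 || node >= ino->icp.nr_nodes)
		return -1;
	for (r = ino->refs; r; r = r->next)
		if (r->node == node) {
			r->offs = new_offs;   /* moved again: reuse its ref */
			return 0;
		}
	r = ram_used + sizeof(*r) <= maxram ? malloc(sizeof(*r)) : NULL;
	if (!r) {                         /* over budget: rewrite the ICP */
		ino->icp.offs[node] = new_offs;
		icp_flush(ino);
		return 0;
	}
	r->node = node;
	r->offs = new_offs;
	r->next = ino->refs;
	ino->refs = r;
	ram_used += sizeof(*r);
	return 0;
}

/* What iget() would consult: a node_ref wins over the stale ICP entry. */
uint32_t node_offset(const struct inode_info *ino, int node)
{
	const struct node_ref *r;

	for (r = ino->refs; r; r = r->next)
		if (r->node == node)
			return r->offs;
	return ino->icp.offs[node];
}
```

With maxram = 0 every move rewrites the ICP; with a very large maxram the
node_refs simply accumulate, which is the JFFS2-like end of the range.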

Comments?

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS3 and RAM consumption reincarnated
  2005-03-01 16:28 JFFS3 and RAM consumption reincarnated Artem B. Bityuckiy
@ 2005-03-02 14:44 ` Jörn Engel
  2005-03-03 11:09   ` Artem B. Bityuckiy
  2005-03-03 18:34   ` Jared Hulbert
  0 siblings, 2 replies; 8+ messages in thread
From: Jörn Engel @ 2005-03-02 14:44 UTC
  To: Artem B. Bityuckiy; +Cc: MTD List

On Tue, 1 March 2005 16:28:36 +0000, Artem B. Bityuckiy wrote:
> 
> We stopped at a design like this: each inode has an ICP which is stored
> on flash. The ICP becomes outdated through GC and user writes. Since it
> is inefficient to write a new ICP for the inode every time, we won't
> rewrite it but add a node_ref instead. From time to time we'll flush the
> ICP and free the node_refs.

I agree with tglx, your approach is complicated (aka horrible).  How
about something much simpler:
o The ICP is just a list of erase blocks.
o For any non-obsolete node belonging to an inode, the containing
  erase block number *must* be part of ICP.
o If an erase block doesn't contain non-obsolete nodes for this inode,
  its number *should* not be part of ICP.
o The ICP *can* be stored in flash.

Advantages over current design:
o Lower memory consumption, as we don't track individual nodes anymore.

Advantages over your old ICP concept:
o GC and write are simple.  They simply add the current eraseblock to
  the ICP list, if it isn't part of it already.

Disadvantages:
o Whenever we need to check the full node list, we take a few more
  indirections.
o Removal of erase blocks from the ICP list is done on lookup.
  "Whoops, this erase block doesn't contain any of my nodes."

Ultimate advantage: Design can be explained in fewer words. :)
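
The rules above can be sketched roughly as follows, assuming a small
fixed-capacity array per inode. The names (struct block_list, blist_add(),
blist_lookup(), live_in_bitmap()) are hypothetical, and the bitmap
predicate only stands in for whatever check the file system would really
perform on an erase block.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLIST_MAX 64              /* hypothetical capacity for the sketch */

struct block_list {               /* per-inode list of erase block numbers */
	uint32_t blocks[BLIST_MAX];
	int nr;
};

/* Write and GC path: record that erase block `eb` holds a node of the inode. */
int blist_add(struct block_list *l, uint32_t eb)
{
	for (int i = 0; i < l->nr; i++)
		if (l->blocks[i] == eb)
			return 0;         /* already part of the list */
	if (l->nr == BLIST_MAX)
		return -1;
	l->blocks[l->nr++] = eb;
	return 0;
}

/* Predicate type: does `eb` still hold a non-obsolete node of the inode? */
typedef bool (*has_live_nodes_fn)(uint32_t eb, void *ctx);

/* Stand-in predicate: ctx points to a 64-bit map of blocks with live nodes. */
bool live_in_bitmap(uint32_t eb, void *ctx)
{
	return eb < 64 && ((*(const uint64_t *)ctx >> eb) & 1);
}

/*
 * Lookup path: visit every listed block and drop the ones that turn out
 * to hold none of our nodes. Returns the number of blocks kept.
 */
int blist_lookup(struct block_list *l, has_live_nodes_fn live, void *ctx)
{
	int kept = 0;

	for (int i = 0; i < l->nr; i++)
		if (live(l->blocks[i], ctx))
			l->blocks[kept++] = l->blocks[i];
	l->nr = kept;
	return kept;
}
```

The list is allowed to be a superset of the truly relevant blocks, which
is what lets write and GC stay this simple: stale entries cost an extra
check at lookup time and nothing more.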

Jörn

-- 
My second remark is that our intellectual powers are rather geared to
master static relations and that our powers to visualize processes
evolving in time are relatively poorly developed.
-- Edsger W. Dijkstra

* Re: JFFS3 and RAM consumption reincarnated
  2005-03-02 14:44 ` Jörn Engel
@ 2005-03-03 11:09   ` Artem B. Bityuckiy
  2005-03-04 16:24     ` Jörn Engel
  2005-03-03 18:34   ` Jared Hulbert
  1 sibling, 1 reply; 8+ messages in thread
From: Artem B. Bityuckiy @ 2005-03-03 11:09 UTC
  To: Jörn Engel; +Cc: MTD List

Dear Joern,

> I agree with tglx, your approach is complicated (aka horrible).  How
> about something much simpler:
Well, this is one of the approaches I've already proposed :-)

> o The ICP is just a list of erase blocks.
> o For any non-obsolete node belonging to an inode, the containing
>   erase block number *must* be part of ICP.
> o If an erase block doesn't contain non-obsolete nodes for this inode,
>   its number *should* not be part of ICP.
> o The ICP *can* be stored in flash.
Ok.

> Advantages over current design:
> o Lower memory consumption, as we don't track individual nodes anymore.
Right.

> Advantages over your old ICP concept:
> o GC and write are simple.  They simply add the current eraseblock to
>   the ICP list, if it isn't part of it already.
Right.

> Disadvantages:
> o Whenever we need to check the full node list, we take a few more
>   indirections.
I'll elaborate on your "whenever": this is the iget() call at most. The
per-inode list might also be required when we deal with
deletion/deleted direntries.

> o Removal of erase blocks from the ICP list is done on lookup.
>   "Whoops, this erase block doesn't contain any of my nodes."
Right.

> Ultimate advantage: Design can be explained in fewer words. :)
I don't think this is so important :-)

Again:
> o Whenever we need to check the full node list, we take a few more
>   indirections.
Well, this might be a serious disadvantage. Conceivably, there is a large
file which is spread over a lot of blocks. The iget() of the corresponding
inode implies:

for (all blocks relating to our inode)
{
	Read_the_block_summary();
	Identify_the_position_of_all_the_inode's_nodes();
	for (all the nodes found)
	{
		Read_the_node();
	}
}

We risk ending up with an extremely slow iget().
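
To put a rough number on that loop, here is a sketch that counts flash
reads, assuming one read per summary node and one read per node of the
inode. struct summary and iget_flash_reads() are hypothetical names used
only for this estimate.

```c
#include <stddef.h>

/* Hypothetical digest of one block summary: how many of our nodes it lists. */
struct summary {
	unsigned nodes_of_inode;
};

/*
 * Flash reads performed by the loop above: one summary read per block
 * related to the inode, plus one read for every node found there.
 */
unsigned long iget_flash_reads(const struct summary *s, size_t nr_blocks)
{
	unsigned long reads = 0;

	for (size_t b = 0; b < nr_blocks; b++) {
		reads += 1;                    /* Read_the_block_summary() */
		reads += s[b].nodes_of_inode;  /* Read_the_node() per node */
	}
	return reads;
}
```

A block that no longer holds any node of the inode still costs one summary
read, so the total grows with the number of blocks visited as well as with
the number of nodes.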

But yes, as I wrote earlier and as you have confirmed, this is a fairly
simple and elegant idea. ICP is not needed here at all, while summary
nodes are mandatory. And this fits well with the idea of a superblock
that is distributed and encompasses the summaries.

BTW, I assume that the technique we're talking about is applied *only* to
inodes that aren't in the inode cache (i.e., iget() hasn't been called
for them yet). (Is there a term for them?) Inodes that are in the
inode cache do not need this, since they keep track of their nodes in the
fragtree/dirent list.

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS3 and RAM consumption reincarnated
  2005-03-03 11:09   ` Artem B. Bityuckiy
@ 2005-03-04 16:24     ` Jörn Engel
  2005-03-05 11:15       ` Artem B. Bityuckiy
  0 siblings, 1 reply; 8+ messages in thread
From: Jörn Engel @ 2005-03-04 16:24 UTC
  To: Artem B. Bityuckiy; +Cc: MTD List

On Thu, 3 March 2005 11:09:12 +0000, Artem B. Bityuckiy wrote:
> 
> > I agree with tglx, your approach is complicated (aka horrible).  How
> > about something much simpler:
> Well, this is one of the approaches I've already proposed :-)

Ok, sorry.  -ENOTIME, didn't read all previous postings, just your
references.

> > Disadvantages:
> > o Whenever we need to check the full node list, we take a few more
> >   indirections.
> I'll elaborate on your "whenever": this is the iget() call at most. The
> per-inode list might also be required when we deal with
> deletion/deleted direntries.

Yup.  Thanks for checking, I was too lazy.

> > Ultimate advantage: Design can be explained in fewer words. :)
> I don't think this is so important :-)

Not in English, no.  But a simple design is an indication of a simple
implementation - more robust, less buggy, pick your favorite
attribute.

> Again:
> > o Whenever we need to check the full node list, we take a few more
> >   indirections.
> Well, this might be a serious disadvantage. Conceivably, there is a large
> file which is spread over a lot of blocks. The iget() of the corresponding
> inode implies:
> 
> for (all blocks relating to our inode)
> {
> 	Read_the_block_summary();
> 	Identify_the_position_of_all_the_inode's_nodes();
> 	for (all the nodes found)
> 	{
> 		Read_the_node();
> 	}
> }

Almost.  Unless I misread Read_the_block_summary() and you mean "take
the list of erase blocks from ICP"

> We risk ending up with an extremely slow iget().

Hence, we should cache this.  Extremely slow iget() under memory
pressure is fine, still much faster than OOM.  Without memory
pressure, we'd have current performance.

> But yes, as I wrote earlier and as you have confirmed, this is a fairly
> simple and elegant idea. ICP is not needed here at all, while summary
> nodes are mandatory. And this fits well with the idea of a superblock
> that is distributed and encompasses the summaries.

Well, I'd still store *some* information, namely the full list of
erase blocks containing nodes.  Not sure if that is necessary, maybe
you're right and we should get rid of this information as well.  Time
to code and test things.

Jörn

-- 
Measure. Don't tune for speed until you've measured, and even then
don't unless one part of the code overwhelms the rest.
-- Rob Pike

* Re: JFFS3 and RAM consumption reincarnated
  2005-03-04 16:24     ` Jörn Engel
@ 2005-03-05 11:15       ` Artem B. Bityuckiy
  2005-03-05 11:27         ` Jörn Engel
  0 siblings, 1 reply; 8+ messages in thread
From: Artem B. Bityuckiy @ 2005-03-05 11:15 UTC
  To: Jörn Engel; +Cc: MTD List

On Fri, 2005-03-04 at 17:24 +0100, Jörn Engel wrote:
> Not in English, no.  But a simple design is an indication of a simple
> implementation - more robust, less buggy, pick your favorite
> attribute.
A simple and clear design is certainly a high priority. But again, I'm
afraid the performance will suffer too much.

> Almost.  Unless I misread Read_the_block_summary() and you mean "take
> the list of erase blocks from ICP"
The ICP contains per-inode information. Physical nodes are placed
everywhere.
A summary node contains per-block information, i.e., one summary node
describes all the nodes in its block. We assume JFFS3 supports
summaries.

Consequently, Read_the_block_summary() means reading the summary node;
there is no need to scan the block.

> 
> > We risk ending up with an extremely slow iget().
> 
> Hence, we should cache this.  Extremely slow iget() under memory
> pressure is fine, still much faster than OOM.  Without memory
> pressure, we'd have current performance.
Hmm. Do you know whether it is possible to register a JFFS2-specific
"reap" function?

> 
> > But yes, as I wrote earlier and as you have confirmed, this is a fairly
> > simple and elegant idea. ICP is not needed here at all, while summary
> > nodes are mandatory. And this fits well with the idea of a superblock
> > that is distributed and encompasses the summaries.
> 
> Well, I'd still store *some* information, namely the full list of
> erase blocks containing nodes.  Not sure if that is necessary, maybe
> you're right and we should get rid of this information as well. 
No need to create an ICP to store this, IMO.

> Time to code and test things.
I think it is a bit early. We need to discuss and agree on something.
Then document it and agree again. :-) If there are several approaches,
I'd like to design them all :-)
I'll try to gather everything together and document it. I'd be happy to
get some help :=)

Furthermore, I'd like to discuss several extra ideas, e.g.:
*	Separate user writes and GC writes into different blocks.
*	Deletion direntry processing. It is far from good in JFFS3.


-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS3 and RAM consumption reincarnated
  2005-03-05 11:15       ` Artem B. Bityuckiy
@ 2005-03-05 11:27         ` Jörn Engel
  0 siblings, 0 replies; 8+ messages in thread
From: Jörn Engel @ 2005-03-05 11:27 UTC
  To: Artem B. Bityuckiy; +Cc: MTD List

On Sat, 5 March 2005 14:15:40 +0300, Artem B. Bityuckiy wrote:
> On Fri, 2005-03-04 at 17:24 +0100, Jörn Engel wrote:
> > Not in English, no.  But a simple design is an indication of a simple
> > implementation - more robust, less buggy, pick your favorite
> > attribute.
> A simple and clear design is certainly a high priority. But again, I'm
> afraid the performance will suffer too much.

I don't care.  Code, test.  If tests agree with you, you win.

> > Almost.  Unless I misread Read_the_block_summary() and you mean "take
> > the list of erase blocks from ICP"
> The ICP contains per-inode information. Physical nodes are placed
> everywhere.
> A summary node contains per-block information, i.e., one summary node
> describes all the nodes in its block. We assume JFFS3 supports
> summaries.
> 
> Consequently, Read_the_block_summary() means reading the summary node;
> there is no need to scan the block.

Ok, sorry.  s/ICP/IBL/
IBL is the Inode Block List, which contains all erase blocks
containing valid nodes.  It may contain more, but not less.

> > Hence, we should cache this.  Extremely slow iget() under memory
> > pressure is fine, still much faster than OOM.  Without memory
> > pressure, we'd have current performance.
> Hmm. Do you know whether it is possible to register a JFFS2-specific
> "reap" function?

set_shrinker
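
For illustration, a userspace sketch of the kind of callback one would
hand to set_shrinker(). As far as I recall the 2.6-era convention, the
callback receives (nr_to_scan, gfp_mask); nr_to_scan == 0 only asks how
many objects could be freed, otherwise up to nr_to_scan objects are freed
and the number remaining is returned. Everything below is a mock that
follows that convention without touching the kernel; struct cached_list,
cache_insert() and jffs3_shrink() are hypothetical names.

```c
#include <stdlib.h>

/* Stand-in for one cached, rebuildable in-core per-inode node list. */
struct cached_list {
	struct cached_list *next;
};

static struct cached_list *cache_head;
static int cache_count;

/* Add one entry to the cache; returns 0 on success, -1 on allocation failure. */
int cache_insert(void)
{
	struct cached_list *c = malloc(sizeof(*c));

	if (!c)
		return -1;
	c->next = cache_head;
	cache_head = c;
	cache_count++;
	return 0;
}

/*
 * Shrinker-style callback: with nr_to_scan == 0 just report the cache
 * size; otherwise free up to nr_to_scan entries and report what is left.
 */
int jffs3_shrink(int nr_to_scan, unsigned int gfp_mask)
{
	(void)gfp_mask;               /* unused in this mock */
	while (nr_to_scan-- > 0 && cache_head) {
		struct cached_list *c = cache_head;

		cache_head = c->next;
		free(c);
		cache_count--;
	}
	return cache_count;
}
```

Entries dropped this way would have to be rebuilt by a slow iget() later,
which is the tradeoff described above: slow under memory pressure, current
performance otherwise.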

> > Well, I'd still store *some* information, namely the full list of
> > erase blocks containing nodes.  Not sure if that is necessary, maybe
> > you're right and we should get rid of this information as well. 
> No need to create an ICP to store this, IMO.

Could be.  The IBL will reduce the number of erase blocks to test
(performance), but increase the memory consumption again.  As usual,
we need to test to see if the tradeoff is worth it.

> > Time to code and test things.
> I think it is a bit early. We need to discuss and agree on something.
> Then document it and agree again. :-) If there are several approaches,
> I'd like to design them all :-)
> I'll try to gather everything together and document it. I'd be happy to
> get some help :=)

I personally hate to discuss, agree, document, carve in stone and have
the pope sprinkle holy water over something that may not survive the
first contact with reality.  Get some cold hard numbers, no one will
disagree with those.

If the case were obvious, yes, we could go down your path.  But our
discussion already proves it's not.

> Furthermore, I'd like to discuss several extra ideas, e.g.:
> *	Separate user writes and GC writes into different blocks.
> *	Deletion direntry processing. It is far from good in JFFS3.

Sure.  Separate threads?

Jörn

-- 
It's not whether you win or lose, it's how you place the blame.
-- unknown

* Re: JFFS3 and RAM consumption reincarnated
  2005-03-02 14:44 ` Jörn Engel
  2005-03-03 11:09   ` Artem B. Bityuckiy
@ 2005-03-03 18:34   ` Jared Hulbert
  2005-03-04 16:07     ` Jörn Engel
  1 sibling, 1 reply; 8+ messages in thread
From: Jared Hulbert @ 2005-03-03 18:34 UTC
  To: Jörn Engel; +Cc: MTD List

On Wed, 2 Mar 2005 15:44:07 +0100, Jörn Engel
<joern@wohnheim.fh-wedel.de> wrote:
> I agree with tglx, your approach is complicated (aka horrible).

Can you point me to the arguments where you and tglx explain why it's
'horrible'?  I was thinking it was a rather nice idea myself.

* Re: JFFS3 and RAM consumption reincarnated
  2005-03-03 18:34   ` Jared Hulbert
@ 2005-03-04 16:07     ` Jörn Engel
  0 siblings, 0 replies; 8+ messages in thread
From: Jörn Engel @ 2005-03-04 16:07 UTC
  To: Jared Hulbert; +Cc: MTD List

On Thu, 3 March 2005 10:34:52 -0800, Jared Hulbert wrote:
> On Wed, 2 Mar 2005 15:44:07 +0100, Jörn Engel
> <joern@wohnheim.fh-wedel.de> wrote:
> > I agree with tglx, your approach is complicated (aka horrible).
> 
> Can you point me to the arguments where you and tglx explain why it's
> 'horrible'?  I was thinking it was a rather nice idea myself.

Follow one of the links Artem had in the original mail to this thread.
One of tglx's comments was "this is horrible" or similar.

Jörn

-- 
Good warriors cause others to come to them and do not go to others.
-- Sun Tzu
