JFFS2 is broken

public inbox for linux-mtd@lists.infradead.org

* JFFS2 is broken
@ 2001-06-29  0:14 Vipin Malik
  2001-06-29  2:32 ` Nicolas Pitre
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Vipin Malik @ 2001-06-29  0:14 UTC (permalink / raw)
  To: David Woodhouse; +Cc: jffs-dev, MTD for Linux, elw_dev_list

For all practical purposes, JFFS2, in its present form, IMHO,  is
broken.

I've been doing a lot of "jitter" or "blocking" time testing for various
tasks running on a system where there is JFFS2 activity going on (info
for those that have not been following my posts).
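
For readers who have not followed the earlier posts, "jitter" here can be read as how late a periodic task wakes up compared with the period it asked for. The following is a minimal user-space sketch of that measurement, not the JitterTest program used for these numbers; the function names (worst_jitter_ns, sample_intervals) and the use of nanosleep() with CLOCK_MONOTONIC are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Nanoseconds elapsed from timestamp a to timestamp b. */
static long long ts_diff_ns(struct timespec a, struct timespec b)
{
    return (long long)(b.tv_sec - a.tv_sec) * 1000000000LL
         + (b.tv_nsec - a.tv_nsec);
}

/* Worst-case lateness over n measured wake-to-wake intervals:
 * how far the longest interval overshot the requested period. */
long long worst_jitter_ns(const long long *interval_ns, size_t n,
                          long long period_ns)
{
    long long worst = 0;

    assert(period_ns > 0);
    for (size_t i = 0; i < n; i++) {
        long long late = interval_ns[i] - period_ns;
        if (late > worst)
            worst = late;
    }
    return worst;
}

/* Sleep for period_ns, n times, recording each actual interval. */
void sample_intervals(long long *interval_ns, size_t n, long long period_ns)
{
    struct timespec req = {
        .tv_sec  = period_ns / 1000000000LL,
        .tv_nsec = period_ns % 1000000000LL,
    };
    struct timespec prev, now;

    clock_gettime(CLOCK_MONOTONIC, &prev);
    for (size_t i = 0; i < n; i++) {
        nanosleep(&req, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);
        interval_ns[i] = ts_diff_ns(prev, now);
        prev = now;
    }
}
```

Feeding worst_jitter_ns() intervals of 10 ms, 59.9 ms and 9 ms against a 10 ms period gives a worst case of 49.9 ms.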

Here are the results:

Task interacting with JFFS2 fs directly. JFFS2 compression enabled. (the
latest code in CVS):

Worst case jitter on a POSIX real time task interacting with
JFFS2: ~>30 *seconds*

POSIX RT task NOT directly interacting with JFFS2. JFFS2 compression
enabled, but another task reading/writing to the JFFS2 file system.

Worst case jitter on *task NOT interacting with JFFS2*: ~>30 seconds!
(same as for the task interacting with JFFS2).

Ok, so I turned compression off (hacked the code. There is no option to
do this).

Worst case jitter on task interacting with JFFS2, ~>4 seconds! Quite an
improvement!

Worst case jitter on task NOT interacting with JFFS2, ~>4 seconds! :(

So, in other words, if you use JFFS2 in your embedded system, you cannot
expect a guaranteed response to anything in less than 30 seconds if you
use the stock code.
If you turn compression off, that time is ~4 seconds.

Note that these times are HIGHLY system speed dependent. My test system
is an AMD SC520 (486 DX4 w/16KB L1 cache) @133MHz w/ 64MB 66MHz SDRAM
(~61 VAX MIPS), with 8MB of AMD flash connected 32 bits wide.

The problem is that JFFS2 tries to be a good guy and tries its hand at
GC'ing dirty flash, _from within a write() system call_

Now, I don't know if this can be made schedulable or not, but at this
time, *all other* activity in the system stops.
When the GC is complete, life resumes as before, but more than 30-40
seconds may have elapsed.

To test my hypothesis, I hacked the code to refuse to try to GC from
within a write() to the JFFS2 fs. All GC is now done by the gc thread
(as it should be).
In the compression turned off case, my block times for the task not
interacting with JFFS2 WENT DOWN TO 49.9 *ms* worst case, with the test
going from an empty JFFS2 to a completely full JFFS2 fs (as in all cases
above).

Unfortunately, there is a problem with this approach. If write() cannot
find space and we now refuse to GC inside the write and return
-ENOSPC, a lot of stock programs may break. I am returning -ENOSPC
because I just didn't take the time to figure out how to return 0, which
IMHO is the right thing to do.

Under POSIX, write() can return 0 without it being an error. The system is
not ready for the write yet, exactly as in our case.
However, I think stock programs will break with this too.

The only solution that I think will work is to find a way to block the
write() to JFFS2 but allow kernel scheduling to go on. I really don't
know if this is possible under Linux as it exists today; maybe someone else
can answer this question.
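
To illustrate why a 0 or -ENOSPC return is awkward for stock programs, here is a hedged sketch of what a caller would need to do to survive it. write_all() and its max_stalls parameter are hypothetical, the 10 ms back-off is an arbitrary choice, and nothing here is part of JFFS2.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Write all len bytes, retrying short writes and EINTR. A 0 or ENOSPC
 * result is retried up to max_stalls times with a short sleep, on the
 * (hypothetical) theory that background GC may free space meanwhile.
 * Returns 0 on success or a negative errno value. */
int write_all(int fd, const void *buf, size_t len, int max_stalls)
{
    const char *p = buf;
    const struct timespec backoff = { .tv_sec = 0, .tv_nsec = 10000000L };
    int stalls = 0;

    assert(max_stalls >= 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n > 0) {
            p += n;                 /* partial progress: keep going */
            len -= (size_t)n;
            stalls = 0;
        } else if (n < 0 && errno == EINTR) {
            continue;               /* interrupted before writing anything */
        } else if (n == 0 || errno == ENOSPC) {
            if (stalls++ >= max_stalls)
                return -ENOSPC;     /* gave up waiting for space */
            nanosleep(&backoff, NULL);
        } else {
            return -errno;          /* any other error is final */
        }
    }
    return 0;
}
```

Most existing programs do none of this: they treat the first short or failed write() as fatal, which is the breakage described above.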

Comments welcome

Vipin


* Re: JFFS2 is broken
  2001-06-29  0:14 Vipin Malik
@ 2001-06-29  2:32 ` Nicolas Pitre
  2001-06-29  9:00 ` Alan Cox
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Nicolas Pitre @ 2001-06-29  2:32 UTC (permalink / raw)
  To: Vipin Malik; +Cc: David Woodhouse, jffs-dev, MTD for Linux

On Thu, 28 Jun 2001, Vipin Malik wrote:

> So, in other words, if you use JFFS2 in your embedded system, you cannot
> expect a guaranteed response to anything in less than 30 seconds if you
> use the stock code.
> If you turn compression off, that time is ~4 seconds.
>
> Note that these times are HIGHLY system speed dependent. My test system
> is an AMD SC520 (486 DX4 w/16KB L1 cache) @133MHz w/ 64MB 66MHz SDRAM
> (~61 VAX MIPS), with 8MB of AMD flash connected 32 bits wide.
>
> The problem is that JFFS2 tries to be a good guy and tries its hand at
> GC'ing dirty flash, _from within a write() system call_
>
> Now, I don't know if this can be made schedulable or not, but at this
> time, *all other* activity in the system stops.
> When the GC is complete, life resumes as before, but more than 30-40
> seconds may have elapsed.

This is completely wrong.  There is no excuse for the compression code to
monopolize the CPU that way.  This, of course, might be solved by the patch
that makes the kernel preemptive.  You could try the patch from
ftp://ftp.mvista.com/pub/Area51/preemptible_kernel/ and see the difference.
Of course the compression will still take a significant amount of CPU time,
but the rest of the system won't be starved.  Without the preemptive kernel
patch, the code executing in kernel mode is following the cooperative model
i.e. you must give up the CPU voluntarily after a while.  So the alternative
to the preemptive kernel would be something like inserting this construct
within the inner loop in the compression code:

	if (current->need_resched) schedule();

This should solve the problem with all other activities stalling while
compression is going on.
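
As a user-space illustration of the cooperative model described above (not kernel code): need_resched and schedule() below are stand-ins for the 2.4 kernel's current->need_resched and schedule(), the timer tick is simulated by hand, and the "compression" loop is just a checksum.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's current->need_resched and schedule().
 * In the kernel the flag is set when the time slice has expired;
 * here the loop below sets it by hand to simulate a timer tick. */
volatile int need_resched;
unsigned long schedule_calls;

void schedule(void)
{
    need_resched = 0;       /* flag is cleared once we have yielded */
    schedule_calls++;
}

/* Toy "compression" inner loop: a byte checksum with a voluntary
 * preemption point on every iteration. */
unsigned long checksum_with_yield(const unsigned char *buf, size_t len,
                                  size_t tick_every)
{
    unsigned long sum = 0;

    assert(tick_every > 0);
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
        if ((i + 1) % tick_every == 0)
            need_resched = 1;           /* simulated timer tick */
        if (need_resched)
            schedule();                 /* the construct from this mail */
    }
    return sum;
}
```

The check costs one flag test per iteration when no reschedule is pending, so placing it in an inner loop is cheap.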

> To test my hypothesis, I hacked the code to refuse to try to GC from
> within a write() to the JFFS2 fs. All GC is now done by the gc thread
> (as it should be).
> In the compression turned off case, my block times for the task not
> interacting with JFFS2 WENT DOWN TO 49.9 *ms* worst case, with the test
> going from an empty JFFS2 to a completely full JFFS2 fs (as in all cases
> above).
>
> Unfortunately, there is a problem with this approach. If write() cannot
> find space and we now refuse to GC inside the write and return
> -ENOSPC, a lot of stock programs may break. I am returning -ENOSPC
> because I just didn't take the time to figure out how to return 0, which
> IMHO is the right thing to do.

In fact, if you're not using aio's, you can't expect any predictable delay
for a write operation.  Even on floppies a write may take several seconds to
complete.  If you really want your process to keep running even while the
write is going on then dispatch the write operation to a separate thread.
HOWEVER the fact that all system activities are stopped while compression is
going on is actually a bug and should be solved by the introduction of the
if(...)  schedule() above into the compression loop at strategic places.

> The only solution that I think will work is to find a way to block the
> write() to JFFS2 but allow kernel scheduling to go on. I really don't
> know if this is possible under Linux as it exists today; maybe someone else
> can answer this question.

Not only is it possible, it is mandatory for long operations, as I said above.
You can try inserting the magic line in the compression code.  It
shouldn't hurt even if you add it too often or don't put it at the
best places, as long as schedule() isn't called from interrupt handlers or
bottom halves, which the JFFS2 compression code doesn't have anyway.  This
will certainly give you a different behavior.

Nicolas


* Re: JFFS2 is broken
  2001-06-29  0:14 Vipin Malik
  2001-06-29  2:32 ` Nicolas Pitre
@ 2001-06-29  9:00 ` Alan Cox
  2001-06-29 14:13   ` Vipin Malik
  2001-07-01 19:10   ` David Woodhouse
  2001-06-29 22:11 ` Vipin Malik
  2001-08-09 17:00 ` A. Craig West
  3 siblings, 2 replies; 18+ messages in thread
From: Alan Cox @ 2001-06-29  9:00 UTC (permalink / raw)
  To: Vipin Malik; +Cc: David Woodhouse, jffs-dev, MTD for Linux, elw_dev_list

> The problem is that JFFS2 tries to be a good guy and tries its hand at
> GC'ing dirty flash, _from within a write() system call_

That in itself is probably fine; write should only hold the inode lock
for that file.

> Unfortunately, there is a problem with this approach. If write() cannot
> find space and we now refuse to GC inside the write and return
> -ENOSPC, a lot of stock programs may break. I am returning -ENOSPC
> because I just didn't take the time to figure out how to return 0, which

One thing you can do here is to wake the gc thread and sleep politely on it.

> The only solution that I think will work is to find a way to block the
> write() to JFFS2 but allow kernel scheduling to go on. I really don't
> know

Well there are two things there.

1.	You could wake the GC and sleep on it  using sleep/wakeup or
	semaphores

2.	Profile the kernel and find out where it is tight looping. I can't
	see any reason for tight loops except for the compression itself
	so it suggests a code bug.

Finally within the compression loops you can check current->need_resched and
if it is set call schedule() to allow the compression to switch to other
tasks at the point it has used its time slice.
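
A rough user-space analogue of option 1, using pthread condition variables in place of kernel sleep/wakeup or semaphores. struct toy_fs, gc_thread() and reserve_space() are invented names for this sketch and do not correspond to JFFS2 code; the "GC" simply turns all dirty space into free space.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy model: free_space is what a writer can use right now,
 * dirty_space is what a GC pass could reclaim. */
struct toy_fs {
    pthread_mutex_t lock;
    pthread_cond_t  gc_wanted;    /* writer -> GC thread */
    pthread_cond_t  space_freed;  /* GC thread -> writer */
    unsigned free_space, dirty_space;
    int gc_requested, stop;
};

/* Background collector: sleeps until asked, then reclaims dirty space. */
void *gc_thread(void *arg)
{
    struct toy_fs *fs = arg;

    pthread_mutex_lock(&fs->lock);
    for (;;) {
        while (!fs->gc_requested && !fs->stop)
            pthread_cond_wait(&fs->gc_wanted, &fs->lock);
        if (fs->stop)
            break;
        fs->free_space += fs->dirty_space;   /* the "collection" */
        fs->dirty_space = 0;
        fs->gc_requested = 0;
        pthread_cond_broadcast(&fs->space_freed);
    }
    pthread_mutex_unlock(&fs->lock);
    return NULL;
}

/* Reserve `want` units. Instead of collecting in the caller's context,
 * wake the GC thread and sleep until it reports progress.
 * Returns 0 on success, -1 if even a full GC could not satisfy it. */
int reserve_space(struct toy_fs *fs, unsigned want)
{
    int ret = 0;

    assert(fs != NULL);
    pthread_mutex_lock(&fs->lock);
    while (fs->free_space < want) {
        if (fs->free_space + fs->dirty_space < want) {
            ret = -1;                         /* genuinely out of space */
            break;
        }
        fs->gc_requested = 1;
        pthread_cond_signal(&fs->gc_wanted);
        pthread_cond_wait(&fs->space_freed, &fs->lock);  /* sleep politely */
    }
    if (ret == 0)
        fs->free_space -= want;
    pthread_mutex_unlock(&fs->lock);
    return ret;
}
```

While the writer is blocked in pthread_cond_wait() it holds no lock and uses no CPU, which is the property wanted here: the write blocks, but everything else keeps getting scheduled.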

Alan


* Re: JFFS2 is broken
  2001-06-29  9:00 ` Alan Cox
@ 2001-06-29 14:13   ` Vipin Malik
  2001-07-01 19:10   ` David Woodhouse
  1 sibling, 0 replies; 18+ messages in thread
From: Vipin Malik @ 2001-06-29 14:13 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Alan Cox, jffs-dev, MTD for Linux, elw_dev_list

Alan Cox wrote:

>
> One thing you can do here is to wake the gc thread and sleep politely on it
>
> > The only solution that I think will work is to find a way to block the
> > write() to JFFS2 but allow kernel scheduling to go on. I really don't
> > know
>
> Well there are two things there.
>
> 1.      You could wake the GC and sleep on it  using sleep/wakeup or
>         semaphores
>
> 2.      Profile the kernel and find out where it is tight looping. I can't
>         see any reason for tight loops except for the compression itself
>         so it suggests a code bug.
>
> Finally within the compression loops you can check current->need_resched and
> if it is set call schedule() to allow the compression to switch to other
> tasks at the point it has used its time slice.

If David is already looking at this, maybe I'll wait for a few days before
bumbling through the code myself ;) David? You've been awfully quiet!

Vipin


* Re: JFFS2 is broken
  2001-06-29  0:14 Vipin Malik
  2001-06-29  2:32 ` Nicolas Pitre
  2001-06-29  9:00 ` Alan Cox
@ 2001-06-29 22:11 ` Vipin Malik
  2001-08-09 17:00 ` A. Craig West
  3 siblings, 0 replies; 18+ messages in thread
From: Vipin Malik @ 2001-06-29 22:11 UTC (permalink / raw)
  To: David Woodhouse; +Cc: jffs-dev, MTD for Linux, elw_dev_list

Just as a follow-up to this last email, I just confirmed the results with
my "I_refuse_to_do_a_GC_from_within_a_write()" hack test *with* compression
enabled.

I get the same results: namely, max jitter on a task NOT directly
interacting with the JFFS2 fs is ~50ms worst case, with the JFFS2 going
from empty to full in the background (another task is filling it up) (vs. >40
secs w/o the hack).

So, this confirms that the excessive blocking time is somewhere inside the
function "jffs2_garbage_collect_pass(c)".

Here is the trivial hack that I used to
"refuse_to_gc_from_within_a_write()".
(Note: This is against the patched nodemgmt.c with the patch that David
sent me, not against the code in CVS.)

Vipin


--- nodemgmt.origpatched.c      Thu Jun 28 17:12:05 2001
+++ nodemgmt.c  Thu Jun 28 17:16:41 2001
@@ -116,6 +116,17 @@
                        int ret;

                        up(&c->alloc_sem);
+
+                       /* Try to see what happens if we refuse to do GC when we have been
+                          requested to do just a simple write().
+                          This is to test if our blocking times on "other" tasks (that
+                          are not interacting with the fs) are improved. -Vipin 06/28/2001
+                        */
+                       printk("jffs2_reserve_space(): Refusing to GC! ret -ENOSPC\n");
+
+                       spin_unlock_bh(&c->erase_completion_lock);
+                       return -ENOSPC;
+
                        if (c->dirty_size < c->sector_size) {
                                D1(printk(KERN_DEBUG "Short on space, but total dirty size 0x%08x < sector size 0x%08x, so -ENOSPC\n", c->dirty_size, c->sector_size));
                                spin_unlock_bh(&c->erase_completion_lock);








* Re: JFFS2 is broken
  2001-06-29  9:00 ` Alan Cox
  2001-06-29 14:13   ` Vipin Malik
@ 2001-07-01 19:10   ` David Woodhouse
  1 sibling, 0 replies; 18+ messages in thread
From: David Woodhouse @ 2001-07-01 19:10 UTC (permalink / raw)
  To: Alan Cox; +Cc: Vipin Malik, jffs-dev, MTD for Linux, elw_dev_list

alan@lxorguk.ukuu.org.uk said:
> Well there are two things there.
> 1.	You could wake the GC and sleep on it  using sleep/wakeup or
> 	semaphores

I've tried to keep the GC thread optional. That's not set in stone - but we 
ought to be able to fix this - there's no real reason why the GC in the 
write() path should behave like this.

Thinks... is the BKL still held in write()? Should we be releasing it?

> 2.	Profile the kernel and find out where it is tight looping. I can't
> 	see any reason for tight loops except for the compression itself
> 	so it suggests a code bug.

AFAIR the compression routines aren't turning up on the profiles. But then 
we're getting 71 timer ticks in a 30-second period, according to the 
profiles I saw.
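
To put the 71 ticks in perspective: assuming HZ=100 (the usual timer frequency on x86 for 2.4 kernels; an assumption, since the thread does not state it), 30 seconds should yield about 3000 ticks. A small sketch of the arithmetic, with a hypothetical helper name:

```c
#include <assert.h>

/* Fraction of the expected timer ticks that never showed up
 * in the profile over the given interval. */
double lost_tick_fraction(unsigned long seen, unsigned long hz,
                          unsigned long seconds)
{
    unsigned long expected = hz * seconds;

    assert(expected > 0 && seen <= expected);
    return 1.0 - (double)seen / (double)expected;
}
```

Under that assumption, lost_tick_fraction(71, 100, 30) comes out at roughly 0.976, i.e. nearly all expected ticks are missing, which would make the profile data hard to trust.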

> Finally within the compression loops you can check current->
> need_resched and if it is set call schedule() to allow the compression
> to switch to other tasks at the point it has used its time slice. 

We already do this between each node we move, in the garbage collection
loop. Doing it inside the compression loops would be possible - although
it'd confuse mkfs.jffs2, which just imports those files directly too.

Better still to avoid decompressing and recompressing data when we move 
nodes intact.

I'll poke at this on Monday. 

--
dwmw2


* Re: JFFS2 is broken
  2001-06-29  0:14 Vipin Malik
                   ` (2 preceding siblings ...)
  2001-06-29 22:11 ` Vipin Malik
@ 2001-08-09 17:00 ` A. Craig West
  2001-08-09 22:36   ` Vipin Malik
  3 siblings, 1 reply; 18+ messages in thread
From: A. Craig West @ 2001-08-09 17:00 UTC (permalink / raw)
  To: Vipin Malik; +Cc: David Woodhouse, jffs-dev, MTD for Linux, elw_dev_list

Is there any more word on this problem? I'm planning on porting jffs2 to the
Agenda VR3, but have been holding off for a resolution of this issue...
By the way, has anybody else noticed that there has been no traffic on this list
in at least a week?

On Thu, 28 Jun 2001, Vipin Malik wrote:

> For all practical purposes, JFFS2, in its present form, IMHO,  is
> broken.
> 
> I've been doing a lot of "jitter" or "blocking" time testing for various
> tasks running on a system where there is JFFS2 activity going on (info
> for those that have not been following my posts).
-Snipped, but boils down to kernel blocks for up to 30 seconds-

-- 
Craig West         Ph: (416) 213-0300	|  It's not a bug,
acwest-sig@mail.bdkw.yi.org          	|  It's a feature...


* Re: JFFS2 is broken
  2001-08-09 17:00 ` A. Craig West
@ 2001-08-09 22:36   ` Vipin Malik
  0 siblings, 0 replies; 18+ messages in thread
From: Vipin Malik @ 2001-08-09 22:36 UTC (permalink / raw)
  To: A. Craig West; +Cc: David Woodhouse, jffs-dev, MTD for Linux, elw_dev_list

Yes,

As it stands right now, a task (not reading/writing to JFFS2) marked POSIX
real time (via the sched_setscheduler(2) call) is blocked for ~50ms on my 133MHz
486 DX4 while another _non-POSIX_ task is writing to JFFS2.
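
For reference, marking a task POSIX real time is normally done with sched_setscheduler(2). A minimal sketch follows; clamp_fifo_priority() and make_realtime() are names made up here, SCHED_FIFO is assumed as the policy (the thread does not say which one the test used), and the call usually needs root privileges.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Clamp a requested priority into the valid SCHED_FIFO range. */
int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    assert(lo >= 0 && hi >= lo);
    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/* Switch the calling process to SCHED_FIFO at the given priority.
 * Returns 0 on success or a negative errno value
 * (typically -EPERM when run without the needed privilege). */
int make_realtime(int prio)
{
    struct sched_param sp = { .sched_priority = clamp_fifo_priority(prio) };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -errno;
    return 0;
}
```

A task promoted this way is not preempted by ordinary tasks, which is why a long operation run in its context can stall the rest of the system.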

The blocking time is highly processor performance dependent.

However, any task writing to JFFS2 _can_ be blocked for up to tens of seconds
(in my test case >40 seconds).
This happens when there is no more free space left on the JFFS2 fs and
garbage collection needs to be done to make space for the write.

If a POSIX task is writing to JFFS2, then that AND ALL OTHER tasks will get 
blocked as the POSIX task has priority and the GC is done in the context of 
the writing task.

So the moral of the story is that at the very least do not write to JFFS2 
from a POSIX real time task under Linux.

I really haven't done any tests re. blocking times of a _regular_ task not 
writing to JFFS2, when there is another _regular_ task writing to JFFS2.

That should be easy to set up and run with my jitter test program. At this
time I haven't released that program under the GPL, but I have permission from
my company to release it, so if anyone wants it just ask and I'll send it
to you or put it in the MTD CVS repository.

Re. why it takes that long for GC to happen on a full JFFS2 fs? David had 
some thoughts that a particular singly linked list should be doubly linked- 
which would help quite a lot. David?
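
The thread does not say which list is meant, so the following is only a generic illustration of why such a change would help: unlinking a node from a singly linked list means walking from the head to find its predecessor, while a doubly linked node can be unlinked in constant time. The struct and function names are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct snode { struct snode *next; int id; };
struct dnode { struct dnode *prev, *next; int id; };

/* Singly linked: walk from the head to find the link pointing at the
 * victim, O(n). Returns the number of nodes visited so the cost shows. */
size_t slist_remove(struct snode **head, struct snode *victim)
{
    size_t visited = 0;

    assert(head && victim);
    for (struct snode **pp = head; *pp; pp = &(*pp)->next) {
        visited++;
        if (*pp == victim) {
            *pp = victim->next;     /* bypass the victim */
            victim->next = NULL;
            break;
        }
    }
    return visited;
}

/* Doubly linked: the node knows both neighbours, so unlinking is O(1). */
void dlist_remove(struct dnode **head, struct dnode *victim)
{
    assert(head && victim);
    if (victim->prev)
        victim->prev->next = victim->next;
    else
        *head = victim->next;       /* victim was the first node */
    if (victim->next)
        victim->next->prev = victim->prev;
    victim->prev = victim->next = NULL;
}
```

If GC removes many nodes from a long list, the singly linked walk is repeated for each one, so the total cost grows roughly with the square of the list length.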

Vipin
P.S. There's been lots of traffic on the MTD list. Are you referring to the 
JFFS list?



* RE: JFFS2 is broken
@ 2001-08-13 18:34 Frederic Giasson
  0 siblings, 0 replies; 18+ messages in thread
From: Frederic Giasson @ 2001-08-13 18:34 UTC (permalink / raw)
  To: 'Vipin Malik', MTD mailing list (E-mail)

Hi again Vipin,

I ran your JitterTest program on my MPC860T 50MHz embedded platform.  I
experienced a worst case of around 40 seconds, like you did.  I think that such
jitter is unacceptable in a real time embedded system like mine, so I
wonder what can be done to patch JFFS2 so it relinquishes the CPU from time to
time... I sprinkled "if (current->need_resched) schedule();" into a lot of
loops in the JFFS2 compression files, and the worst case dropped to 4 seconds.

After sprinkling the code with schedule() calls, I ran another application
that also accesses JFFS2 at the same time as JitterTest, and the
execution of my program took 17% more time, which in my opinion is
acceptable because now JFFS2 has to cope with 2 applications accessing it at
the same time.

Do you think that something sharp and clean could be done with JFFS2 to fix
that problem once and for all? 

Regards,

Frédéric Giasson

-----Original Message-----
From: Vipin Malik [mailto:vipin@embeddedlinuxworks.com]
Sent: Friday, August 10, 2001 10:47 PM
To: Frederic Giasson
Subject: RE: JFFS2 is broken

At 08:31 AM 8/10/2001 -0400, Frederic Giasson wrote:
>Hi Vipin,
>
>I'd be interested in having your testcase programs.  It would be much
>appreciated if you send me the sources.

Hi Frederic,

I've released the code under GPL and sent the entire CVS tree for the 
jitter test program to David to put under the MTD tree.

He should have that done (hopefully) sometime this weekend or by Monday.

Let me know if it's not there by Monday and I'll send you a private copy.

Vipin


* RE: JFFS2 is broken
       [not found] <F1BED55F35F4D3118C0F00E0295CFF4D9955F7@webmail.mediatrix.com>
@ 2001-08-13 22:27 ` Vipin Malik
  2001-08-14  0:00   ` David Woodhouse
  0 siblings, 1 reply; 18+ messages in thread
From: Vipin Malik @ 2001-08-13 22:27 UTC (permalink / raw)
  To: Frederic Giasson, MTD mailing list (E-mail)

At 02:34 PM 8/13/2001 -0400, Frederic Giasson wrote:
>Hi again Vipin,
>
>I ran your JitterTest program on my MPC860T 50MHz embedded platform.  I
>experienced a worst case of around 40 seconds, like you did.  I think that such
>jitter is unacceptable in a real time embedded system like mine, so I
>wonder what can be done to patch JFFS2 so it relinquishes the CPU from time to
>time... I sprinkled "if (current->need_resched) schedule();" into a lot of
>loops in the JFFS2 compression files, and the worst case dropped to 4 seconds.

Did the 40 become 4 for the task writing to JFFS2? Or was it some "other" task
that had the 40 seconds of jitter?
Could you please describe your setup? Were you running JitterTest as a POSIX
real time task or a regular task?

In my test, the task writing to JFFS2 experienced the 40-second block
times, but another task (not interacting with JFFS2) experienced a jitter
of "only" ~50ms.

>After sprinkling the code with schedule() calls, I ran another application
>that also accesses JFFS2 at the same time as JitterTest, and the
>execution of my program took 17% more time, which in my opinion is
>acceptable because now JFFS2 has to cope with 2 applications accessing it at
>the same time.
>
>Do you think that something sharp and clean could be done with JFFS2 to fix
>that problem once and for all?

I hope something can be done. Those excessive blocking times are just a killer!
David once mentioned that there are some obvious optimizations that can be
carried out in the JFFS2 code. Maybe he could list his favorite ones
and maybe someone could volunteer to tackle them one by one.

Vipin


* Re: JFFS2 is broken
  2001-08-13 22:27 ` JFFS2 is broken Vipin Malik
@ 2001-08-14  0:00   ` David Woodhouse
  2001-08-14  2:39     ` Vipin Malik
  0 siblings, 1 reply; 18+ messages in thread
From: David Woodhouse @ 2001-08-14  0:00 UTC (permalink / raw)
  To: Vipin Malik; +Cc: Frederic Giasson, MTD mailing list (E-mail)

vipin@embeddedlinuxworks.com said:
> David once mentioned that there are some obvious optimizations that
> can be carried out in the JFFS2 code. Maybe he could list out his
> favorite ones  and maybe someone could volunteer to tackle them one by
> one. 

I committed the cleanup to jffs2_remove_node_refs_from_ino_list() (Christ, 
did I name that function?), which showed up near the top of the profiles 
you did. What's next? ISTR we got strange data from the profiling. 

--
dwmw2


* Re: JFFS2 is broken
  2001-08-14  0:00   ` David Woodhouse
@ 2001-08-14  2:39     ` Vipin Malik
  0 siblings, 0 replies; 18+ messages in thread
From: Vipin Malik @ 2001-08-14  2:39 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Frederic Giasson, MTD mailing list (E-mail)

At 01:00 AM 8/14/2001 +0100, David Woodhouse wrote:

>vipin@embeddedlinuxworks.com said:
> > David once mentioned that there are some obvious optimizations that
> > can be carried out in the JFFS2 code. Maybe he could list out his
> > favorite ones  and maybe someone could volunteer to tackle them one by
> > one.
>
>I committed the cleanup to jffs2_remove_node_refs_from_ino_list()

Great! Frederic, since you just set up your system to do the jitter tests,
could you please run them again (with the new code from CVS) and see what
difference it made (my own system is busy running the mird db power fail
tests).

Also enable profiling and see what else is on the "CPU hog" list.

The LTT (Linux Trace Toolkit) seems to be an excellent tool to get more
fine-grained performance stats out of the system, and
a new version was just released for 2.4.5 (though it may provide us with
more info than we need/can handle :)

Vipin


* RE: JFFS2 is broken
@ 2001-08-14 16:46 Frederic Giasson
  0 siblings, 0 replies; 18+ messages in thread
From: Frederic Giasson @ 2001-08-14 16:46 UTC (permalink / raw)
  To: 'Vipin Malik', David Woodhouse; +Cc: MTD mailing list (E-mail)

[-- Attachment #1: Type: text/plain, Size: 1495 bytes --]

I guess it is.  But is it possible (and not too time-consuming) for you to
take the new one I just created with ssh-keygen and put in
/root/.ssh/identity.pub, along with /root/.ssh/identity and the config file
you asked me to create?  Here it is, should you decide to use it....

Frédéric Giasson







[-- Attachment #2: identity.pub --]
[-- Type: application/octet-stream, Size: 333 bytes --]

1024 35 164358453018075670455790203200706540984393749504204243836036681422744008324009655150232134223034277415336613711914311755267437950808937418173965122857400339606006019692108436584787607289775427783920015394610085223007334556116082199981839573028213537755795029530359907389853830876722709464615820281008293377021 root@samlinux


* RE: JFFS2 is broken
@ 2001-08-14 17:39 Frederic Giasson
  0 siblings, 0 replies; 18+ messages in thread
From: Frederic Giasson @ 2001-08-14 17:39 UTC (permalink / raw)
  To: 'Vipin Malik'; +Cc: MTD mailing list (E-mail)

Good news!  

The jitter is now down to 3 seconds. 
In my test I ran JitterTest as a RT task reading I/P from jffs2.

When I run JitterTest with another task running in background which writes
to JFFS2, the jitter is up to 5 seconds.  No more 40 seconds waiting.

Frédéric Giasson



* RE: JFFS2 is broken
@ 2001-08-14 17:57 Frederic Giasson
  0 siblings, 0 replies; 18+ messages in thread
From: Frederic Giasson @ 2001-08-14 17:57 UTC (permalink / raw)
  To: 'Vipin Malik', MTD mailing list (E-mail)

My other program running at the same time is my testcase which fills JFFS2
to 80% with various files, deletes them and continue to do on.  The Jitter
given by JitterTest is 5 seconds, and my testcase now takes 63 to 65 seconds
to complete, compared to 38-43 seconds when it runs alone.  It looks
resonable to me because 2 applications accesses JFFS2 at the same time.

I removed my schedule()'s and I gathered the new erase.c on the MTD before
doing my tests.

There is something else that I did and maybe you didn't.  In my chip driver,
I put an "if( current->need_resched ) schedule();" in these functions:
atmel_0001_write(), atmel_0001_read() and atmel_0001_erase_varsize(), which
are the functions called by the mtdblock driver through the mtd_info structure.
I put the schedule() calls there because I thought that if an application
requests a write of 1MB to JFFS2, atmel_0001_write() would not return until
that 1MB of data was compressed and written to JFFS2.  Same thing for reads
and writes.
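
The idea of yielding inside a long flash operation can be sketched in
user space. This is only an analogue under stated assumptions, not the
driver code: chunked_copy_with_yield() and its chunk size are hypothetical
names, memcpy() stands in for the flash write, and sched_yield() stands in
for the kernel-side "if( current->need_resched ) schedule();" check.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space analogue of a chip driver write routine.
 * Copies len bytes in pieces of at most `chunk` bytes and offers the
 * CPU to other runnable tasks between pieces, so one large request
 * does not monopolise the processor until it completes.
 * Returns the number of yield points that were taken.
 */
size_t chunked_copy_with_yield(unsigned char *dst, const unsigned char *src,
                               size_t len, size_t chunk)
{
    size_t done = 0;
    size_t yields = 0;

    assert(chunk > 0);

    while (done < len) {
        size_t n = len - done;
        if (n > chunk)
            n = chunk;

        memcpy(dst + done, src + done, n);   /* stand-in for the flash write */
        done += n;

        if (done < len) {
            /* Kernel code of that era would test current->need_resched here. */
            sched_yield();
            yields++;
        }
    }
    return yields;
}
```

A smaller chunk gives more frequent yield points (lower worst case latency
for other tasks) at the cost of more scheduling overhead for the writer.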

Frédéric Giasson

-----Original Message-----
From: Vipin Malik [mailto:vipin@embeddedlinuxworks.com]
Sent: Tuesday, August 14, 2001 1:58 PM
To: Frederic Giasson
Cc: MTD mailing list (E-mail)
Subject: RE: JFFS2 is broken 

At 01:39 PM 8/14/2001 -0400, Frederic Giasson wrote:
>Good news!
>
>The jitter is now down to 3 seconds.
>In my test I ran JitterTest as a RT task reading I/P from jffs2.
>
>When I run JitterTest with another task running in background which writes
>to JFFS2, the jitter is up to 5 seconds.  No more 40 seconds waiting.

But what about this other task _writing_ to JFFS2? How long does it block?
Also, is this still your code with liberally sprinkled schedule() calls?

Vipin



* RE: JFFS2 is broken
       [not found] <F1BED55F35F4D3118C0F00E0295CFF4DD0DADF@webmail.mediatrix.com>
@ 2001-08-14 17:57 ` Vipin Malik
  0 siblings, 0 replies; 18+ messages in thread
From: Vipin Malik @ 2001-08-14 17:57 UTC (permalink / raw)
  To: Frederic Giasson; +Cc: MTD mailing list (E-mail)

At 01:39 PM 8/14/2001 -0400, Frederic Giasson wrote:
>Good news!
>
>The jitter is now down to 3 seconds.
>In my test I ran JitterTest as a RT task reading I/P from jffs2.
>
>When I run JitterTest with another task running in background which writes
>to JFFS2, the jitter is up to 5 seconds.  No more 40 seconds waiting.



But what about this other task _writing_ to JFFS2? How long does it block?
Also, is this still your code with liberally sprinkled schedule() calls?

Vipin






* RE: JFFS2 is broken
       [not found] <F1BED55F35F4D3118C0F00E0295CFF4DD0DAE1@webmail.mediatrix.com>
@ 2001-08-14 21:57 ` Vipin Malik
  0 siblings, 0 replies; 18+ messages in thread
From: Vipin Malik @ 2001-08-14 21:57 UTC (permalink / raw)
  To: Frederic Giasson, MTD mailing list (E-mail)

At 01:57 PM 8/14/2001 -0400, Frederic Giasson wrote:
>My other program running at the same time is my testcase, which fills JFFS2
>to 80% with various files, deletes them, and continues to do so.

Be careful: in my tests, the worst case blocking times came when the flash
was more than 90% full. You may want to increase the fill percentage in your
flash-fill program and retest.

As a matter of fact, here is how I tested: I ran JitterTest itself and asked
it to put its "fill file" (not the log file, which I redirected to
/dev/console) on my JFFS2 partition. Then I let JitterTest fill up the entire
flash partition and measured the worst case jitter in this case.

This way you can measure the jitter time for a task writing to JFFS2 itself.

I also ran another test where, in addition to the above, I ran another
instance of JitterTest (this time as a POSIX RT task) and did not make it
interact with JFFS2 at all. Its worst case jitter told me what to expect in
a task NOT interacting with JFFS2 while another task was filling up the
JFFS2 partition in the background. The results I got for this test were
~50ms.
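
For anyone who wants to reproduce this kind of measurement without
JitterTest, the basic approach can be sketched as follows. This is not
JitterTest's source, only a minimal illustration under my own assumptions:
worst_case_jitter_ns() is a hypothetical name, and jitter is taken to mean
how much later than requested a periodic sleep returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static long long ts_to_ns(const struct timespec *t)
{
    return (long long)t->tv_sec * 1000000000LL + t->tv_nsec;
}

/*
 * Sleep for period_ns (must be under one second) `iterations` times and
 * return the largest observed lateness in nanoseconds, i.e. how much
 * longer than requested the worst sleep took. Returns -1 if the clock
 * cannot be read.
 */
long long worst_case_jitter_ns(long period_ns, int iterations)
{
    struct timespec req, before, after;
    long long worst = 0;
    int i;

    assert(period_ns > 0 && period_ns < 1000000000L && iterations > 0);

    req.tv_sec = 0;
    req.tv_nsec = period_ns;

    for (i = 0; i < iterations; i++) {
        long long late;

        if (clock_gettime(CLOCK_MONOTONIC, &before) != 0)
            return -1;
        nanosleep(&req, NULL);
        if (clock_gettime(CLOCK_MONOTONIC, &after) != 0)
            return -1;

        late = (ts_to_ns(&after) - ts_to_ns(&before)) - period_ns;
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Run it once while another task fills the flash partition and once on an
idle system, then compare the two worst case values. To approximate the
POSIX RT case, the process would also need a real-time scheduling policy,
which usually requires root and is left out of this sketch.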

>There is something else that I did and maybe you didn't.  In my chip driver,
>I put an "if( current->need_resched ) schedule();" in these functions:
>atmel_0001_write(), atmel_0001_read() and atmel_0001_erase_varsize(), which
>are the functions called by the mtdblock driver through the mtd_info structure.
>I put the schedule() calls there because I thought that if an application
>requests a write of 1MB to JFFS2, atmel_0001_write() would not return until
>that 1MB of data was compressed and written to JFFS2.  Same thing for reads
>and writes.

Interesting point. Are your flash chips in my MTD flash database on 
www.embeddedlinuxworks.com?

Vipin


* RE: JFFS2 is broken
@ 2001-08-15 15:09 Frederic Giasson
  0 siblings, 0 replies; 18+ messages in thread
From: Frederic Giasson @ 2001-08-15 15:09 UTC (permalink / raw)
  To: 'Vipin Malik'; +Cc: MTD mailing list (E-mail)

-----Original Message-----
From: Vipin Malik [mailto:vipin@embeddedlinuxworks.com]
Sent: Tuesday, August 14, 2001 5:57 PM
To: Frederic Giasson; MTD mailing list (E-mail)
Subject: RE: JFFS2 is broken 

>Be careful: in my tests, the worst case blocking times came when the flash
>was more than 90% full. You may want to increase the fill percentage in your
>flash-fill program and retest.

I ran JitterTest with -w 1000 bytes and after a while, it began to slow down
a lot.  Note that I had the kernel image file stored in jffs2 during that
test, so JitterTest had 50% of the flash left for itself.  When I
stopped JitterTest, the worst case was 24 seconds and the instant average
jitter was probably around 5 seconds... What is happening?  Is JFFS2
constantly garbage collecting after a while?  I looked at the size of the
file created by the -f option, and along with my kernel file it was using
95% of the flash.
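
Since the blocking times seem to depend on how full the flash is, it may
help to log the fill level alongside each jitter reading. A minimal sketch
using statvfs() follows; percent_used() and fs_percent_used() are
hypothetical helper names, and how accurately the filesystem reports its
free space is an assumption here.

```c
#include <assert.h>
#include <sys/statvfs.h>

/* Integer percentage of blocks in use, given total and free block counts. */
int percent_used(unsigned long long total, unsigned long long free_blocks)
{
    assert(total > 0 && free_blocks <= total);
    return (int)(((total - free_blocks) * 100ULL) / total);
}

/*
 * Percentage of the filesystem containing `path` that is in use,
 * or -1 if the filesystem cannot be queried or reports no blocks.
 */
int fs_percent_used(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0 || sv.f_blocks == 0 ||
        sv.f_bfree > sv.f_blocks)
        return -1;

    return percent_used(sv.f_blocks, sv.f_bfree);
}
```

Calling fs_percent_used() on the JFFS2 mount point (for example a
hypothetical /mnt/jffs2) before each measurement would show whether the
24 second worst case lines up with the flash passing the 90% mark.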

Frédéric Giasson


Thread overview: 18+ messages
     [not found] <F1BED55F35F4D3118C0F00E0295CFF4D9955F7@webmail.mediatrix.com>
2001-08-13 22:27 ` JFFS2 is broken Vipin Malik
2001-08-14  0:00   ` David Woodhouse
2001-08-14  2:39     ` Vipin Malik
2001-08-15 15:09 Frederic Giasson
     [not found] <F1BED55F35F4D3118C0F00E0295CFF4DD0DAE1@webmail.mediatrix.com>
2001-08-14 21:57 ` Vipin Malik
     [not found] <F1BED55F35F4D3118C0F00E0295CFF4DD0DADF@webmail.mediatrix.com>
2001-08-14 17:57 ` Vipin Malik
  -- strict thread matches above, loose matches on Subject: below --
2001-08-14 17:57 Frederic Giasson
2001-08-14 17:39 Frederic Giasson
2001-08-14 16:46 Frederic Giasson
2001-08-13 18:34 Frederic Giasson
2001-06-29  0:14 Vipin Malik
2001-06-29  2:32 ` Nicolas Pitre
2001-06-29  9:00 ` Alan Cox
2001-06-29 14:13   ` Vipin Malik
2001-07-01 19:10   ` David Woodhouse
2001-06-29 22:11 ` Vipin Malik
2001-08-09 17:00 ` A. Craig West
2001-08-09 22:36   ` Vipin Malik
