public inbox for linux-mtd@lists.infradead.org
 help / color / mirror / Atom feed
* RE: Disk blocks for long periods
@ 2002-08-06 20:16 Dave Ellis
  2002-08-06 21:06 ` Joakim Tjernlund
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Ellis @ 2002-08-06 20:16 UTC (permalink / raw)
  To: 'joakim.tjernlund@lumentis.se'; +Cc: linux-mtd

joakim.tjernlund@lumentis.se said:

> hmm, I just noticed that the BIG kernel lock is held while 
> the erasing thread executes, since kupdate takes it in 
> sync_old_buffers() (if I understand this code correctly). 
> Maybe that is causing current->need_resched to be set?
> 
> Also, is it not a bad idea to hold the kernel lock for, 
> possibly, seconds while erasing sectors?

If it is doing that, it doesn't seem right to me. But I 
thought jffs2/mtd bypassed the kernel buffers, so maybe 
kupdate and sync_old_buffers() are not involved here? I don't
know much about this.
 
> I assume your chip does not support buffered writes? or is it 
> just do_write_one_word() that is causing the log delays?

No, it doesn't support buffered writes. The long delay I saw
comes from sleeping for a jiffie during each call to
do_write_oneword().

Dave Ellis
dge@sixnetio.com

^ permalink raw reply	[flat|nested] 28+ messages in thread
* RE: Disk blocks for long periods
@ 2002-08-07 16:42 Dave Ellis
  2002-08-08  7:08 ` Joakim Tjernlund
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Ellis @ 2002-08-07 16:42 UTC (permalink / raw)
  To: 'David Woodhouse'; +Cc: linux-mtd

David Woodhouse said:
> DGE@sixnetio.com said:
> > The cfi_udelay() call is part of the problem, since it lets other 
> > processes run, and the one that runs is the erasing process, but it 
> > can't do anything except slow things down since the flash  is busy 
> > writing.
> 
> The erasing process should be sitting in TASK_UNINTERRUPTIBLE and on the 
> chip's wait queue, waiting to be woken when the chip becomes available.
Why 
> is it running?

It is running because of the
        wake_up(&chip->wq);
at the end of do_write_oneword(). So it wakes up for every word written.

With the old code (cfi_cmdset_0002.c 1.56 and before) the next chance it 
gets to run is when cfi_udelay() is called during the next 
do_write_oneword(). Unfortunately, it can't start erasing because 
chip->state is FL_WRITING, so it puts itself back on the wait queue. The
wake_up() at the end of do_write_oneword() wakes it up again, but it
doesn't run until the cfi_delay() in the _next_ call to do_write_oneword().
So it runs and puts itself back on the wait queue for every call to 
do_write_oneword(). Eventually all the words are written, it wakes up to 
find FL_READY and starts the erase. Without the one jiffie delay in 
cfi_delay() each time it wouldn't have been as bad, but erase going off 
and back on the wait queue several thousand times still doesn't seem right.

> > What the erasing process does seem to do (and  I don't understand why) 
> > is cause current->need_resched to be set for every call to 
> > cfi_udelay(), so it calls schedule_timeout(1) and sleeps for a whole 
> > jiffie.
> 
> Sleeping for a whole jiffie is bad. We should udelay() and yield() as you 
> point out; there's no reason to use schedule_timeout() and hence stop 
> ourself from being woken immediately if there's not really anything else
to 
> run. I've committed that change to CVS.

Thanks, I looked at the change and seems much better to me.

> Adding cfi_udelay() to do_write_oneword() is dangerous. You 
> may trigger a bug that's actually already there. ISTR there's 
> a time limit on the 'unlock bypass' mode, and if you 
> schedule() you may find that the chip is no longer 
> in that mode when you return.

Actually I didn't add it, I just moved it to the end, where chip->state
is FL_READY, so the erase can actually start if it wakes up. I think
both the old code and your latest change to cfi_cmdset_0002.c
would have the same problem with a time limit, since they call 
cfi_udelay() or yield() while each word is writing. I didn't
find any time limit for my chip.

I think my change (which really should be cond_resched(), not 
cfi_udelay() since I don't need the delay) would cause a 
different problem with 'unlock bypass' (which I am not using).
The chip is tagged as FL_READY when it really (I guess) 
isn't. But if I schedule() between words with the chip not 
FL_READY, I am right back to the original problem. And if I
don't schedule() until the whole write is done nothing else
can run for a long time.

Maybe if 'unlock bypass' is used the
  chip->state = FL_READY; wake_up(&chip->wq);
should only happen after leaving 'unlock bypass' mode?

I wonder if 'unlock bypass' is worth all the trouble. I think
I'll just not use it and call cond_resched() instead of 
cfi_udelay() at the end of do_write_oneword().
 
> Another thing you might want to do if you really care about write 
> performance is copy the logic for calculating word_write_time 
> from the cfi_cmdset_0001 driver.

Thanks for the suggestion, but the write performance with my
fix is fine so I may not try it for a while. The long delays
(up to 60 seconds) on single write() were the problem. They 
were making a communications protocol time out and making total
load times very long. Without them, the average performance is 
fine.

I am very impressed with jffs2 and mtd. With this one problem
fixed it is exactly what we need and works very well.

Dave Ellis
dge@sixnetio.com

^ permalink raw reply	[flat|nested] 28+ messages in thread
* RE: Disk blocks for long periods
@ 2002-08-06 14:53 Dave Ellis
  2002-08-06 15:09 ` Joakim Tjernlund
  2002-08-07 11:11 ` David Woodhouse
  0 siblings, 2 replies; 28+ messages in thread
From: Dave Ellis @ 2002-08-06 14:53 UTC (permalink / raw)
  To: 'joakim.tjernlund@lumentis.se'; +Cc: linux-mtd

joakim.tjernlund@lumentis.se said:

> Have you tried to use cfi_udelay(chip->word_write_time) 
> instead of udelay(chip->word_write_time)? That will at least 
> let other processes run, won't it? 

The cfi_udelay() call is part of the problem, since it lets other
processes run, and the one that runs is the erasing process, but
it can't do anything except slow things down since the flash 
is busy writing. What the erasing process does seem to do (and 
I don't understand why) is cause current->need_resched to be set
for every call to cfi_udelay(), so it calls schedule_timeout(1) 
and sleeps for a whole jiffie.

By calling cfi_udelay() at the _end_ of the write of each word,
it still allows the call to schedule_timeout() to happen, but
at a time when the chip state allows other mtd processes operations,
such as erase, to use it, so the one jiffie doesn't happen
for every word written. The word writes are short (10 us typical,
360 us worst case for my chip), so not scheduling during the write
is not so bad.

My change seems to solve the problem for me, but I don't 
understand the scheduling well enough to understand why 
current->need_resched gets set each time.

Maybe cfi_udelay() should be changed so for short delays it
does schedule(); udelay(delay); instead of schedule_timeout(1).
If I get some time later I will try it.
 
Dave Ellis
dge@sixnetio.com

> > --- cfi_cmdset_0002.c	Mon Jul 15 11:13:25 2002
> > +++ cfi_cmdset_0002.fixed.c	Mon Aug  5 14:28:40 2002
> > @@ -386,9 +384,7 @@
> >  
> >  	cfi_write(map, datum, adr);
> >  
> > -	cfi_spin_unlock(chip->mutex);
> > -	cfi_udelay(chip->word_write_time);
> > -	cfi_spin_lock(chip->mutex);
> > +	udelay(chip->word_write_time);
> >  
> >  
> >  	/* Polling toggle bits instead of reading back many 
> times @@ -447,6 
> > +444,7 @@
> >  	chip->state = FL_READY;
> >  	wake_up(&chip->wq);
> >  	cfi_spin_unlock(chip->mutex);
> > +	cfi_udelay(1);	/* just a chance to schedule() */
> >  	
> >  	return ret;
> >  }

^ permalink raw reply	[flat|nested] 28+ messages in thread
* RE: Disk blocks for long periods
@ 2002-08-05 18:50 Dave Ellis
  2002-08-06 10:06 ` Joakim Tjernlund
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Ellis @ 2002-08-05 18:50 UTC (permalink / raw)
  To: linux-mtd

joakim.tjernlund@lumentis.se said:

> I have noticed that if i copy a "big" file(580K) it sometimes 
> take up to 42 seconds before it's finished. Normally it takes 
> about 3-4 seconds. When this long copy happen, top reports 
> that kupdate and the copy(cp) process is in D state, the rest 
> is sleeping. The FS is a 45% usage. FS is about 63MB in size. 
> Using the stable branch.
> 
> Why does it block for so long time?

I've been working on what may be the same problem and I think I 
finally understand it. I've seen it with 2.4.4, 2.4.18 and
with 2.4.18 with the 2.4.19 jffs2 and mtd code. I am using an 
AM29LV641, which uses cfi_cmdset_0002.c, but the code in 
cfi_cmdset_0001.c is similar. I have a possible solution
but I'd like some feedback on it.

The problem occurs when do_erase_oneblock() tries to lock the 
flash while cfi_amdstd_write() is writing a lot of data. The 
erasing thread locks the chip mutex when do_write_oneword() does the
    cfi_spin_unlock(chip->mutex);
    cfi_udelay(chip->word_write_time);
    cfi_spinlock(chip->mutex);
sequence. It sees the state is FL_WRITING, so it puts itself 
back on the wait queue. do_write_oneword() continues and eventually
sets the state back to FL_READY and wakes up the queue, but the
erasing thread doesn't actually run until the cfi_udelay() in 
do_write_oneword() calls schedule() while writing the next word
to the flash. The state is FL_WRITING, so the erasing thread goes
back on the wait queue. This continues until the entire write is
finished, then the erasing thread finally starts the erase.

The effect of all this sheduling is to write exactly one word
to flash for each jiffie! My flash is 16 bits wide, so a single
write of 2400 bytes was sometimes taking 1200 jiffies, or
12 seconds.

This patch to cfi_cmdset_0002.c 1.56 seems solve the problem, but
I am not sure if this is right way to do it. Comments?

Dave Ellis
dge@sixnetio.com

BTW - If I make similar changes to 1.55 or before it solves this
problem, but the write fails occasionally. I am guessing that
with my change it gets to the write completion check faster and the
old check fails, but the new write completion polling works better.

--- cfi_cmdset_0002.c	Mon Jul 15 11:13:25 2002
+++ cfi_cmdset_0002.fixed.c	Mon Aug  5 14:28:40 2002
@@ -386,9 +384,7 @@
 
 	cfi_write(map, datum, adr);
 
-	cfi_spin_unlock(chip->mutex);
-	cfi_udelay(chip->word_write_time);
-	cfi_spin_lock(chip->mutex);
+	udelay(chip->word_write_time);
 
 
 	/* Polling toggle bits instead of reading back many times
@@ -447,6 +444,7 @@
 	chip->state = FL_READY;
 	wake_up(&chip->wq);
 	cfi_spin_unlock(chip->mutex);
+	cfi_udelay(1);	/* just a chance to schedule() */
 	
 	return ret;
 }

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: MTD Partition problems
@ 2002-08-05 10:35 David Woodhouse
  2002-08-05 13:35 ` Disk blocks for long periods Joakim Tjernlund
  0 siblings, 1 reply; 28+ messages in thread
From: David Woodhouse @ 2002-08-05 10:35 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Vipin Malik, linux-mtd

joern@wohnheim.fh-wedel.de said:
> > Sorry for the stupid question: If it's by design, why is it going to
> > be ripped out sometime soon?

> It might have made sense, when it was designed. It might just have
> been a relatively new developer making the usual mistakes. Whatever.

> Fact is that it is useless, the superuser can create a read only
> device simply by 'chmod -w /dev/mtd0', if she wishes. That is all
> those extra devices do, except for causing confusion and eating up
> minor numbers. 

I inherited the major number from elsewhere, and it was already set up like 
that. I haven't changed it because people have only just started to read 
the bloody documentation and set it up right -- with the occasional 
exception :)

>  Currently, there is no branch that tackles this problem. There was
> one, but David never took a look, was busy, and rescheduled to 'when
> 2.4.19 is out'. Now I am busy,...

I want to get the remaining changes in the tree into 2.4.20-pre, then 
update the code for 2.5 and start doing new exciting stuff. 

--
dwmw2

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2002-08-08  9:18 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-08-06 20:16 Disk blocks for long periods Dave Ellis
2002-08-06 21:06 ` Joakim Tjernlund
  -- strict thread matches above, loose matches on Subject: below --
2002-08-07 16:42 Dave Ellis
2002-08-08  7:08 ` Joakim Tjernlund
2002-08-08  8:08   ` David Woodhouse
2002-08-08  9:15     ` Joakim Tjernlund
2002-08-08  9:18       ` David Woodhouse
2002-08-06 14:53 Dave Ellis
2002-08-06 15:09 ` Joakim Tjernlund
2002-08-07 11:11 ` David Woodhouse
2002-08-05 18:50 Dave Ellis
2002-08-06 10:06 ` Joakim Tjernlund
2002-08-06 12:14   ` David Woodhouse
2002-08-06 13:52     ` Jörn Engel
2002-08-06 13:53       ` David Woodhouse
2002-08-06 16:45     ` Joakim Tjernlund
2002-08-07  9:51       ` David Woodhouse
2002-08-08  7:23         ` Joakim Tjernlund
2002-08-08  8:02           ` David Woodhouse
2002-08-08  8:32             ` Joakim Tjernlund
2002-08-08  8:40               ` David Woodhouse
2002-08-05 10:35 MTD Partition problems David Woodhouse
2002-08-05 13:35 ` Disk blocks for long periods Joakim Tjernlund
2002-08-05 13:44   ` David Woodhouse
2002-08-05 13:59     ` Joakim Tjernlund
2002-08-05 14:12       ` David Woodhouse
2002-08-05 14:32         ` Joakim Tjernlund
2002-08-05 14:42           ` David Woodhouse
2002-08-05 21:45             ` Jasmine Strong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox