* CPU caching of flash regions.
From: David Woodhouse @ 2001-05-14 14:15 UTC (permalink / raw)
To: linux-mtd; +Cc: ajlennon
I've just seen profiling of a system mounting a JFFS2 filesystem which shows
that the majority of the time is spent in the map driver's copy_from
function.
The copy_from() functions are currently using a completely uncached mapping
of the flash chip, but in fact for reading the chip that's not strictly
necessary. This is especially true during the initial scan.
I think we ought to allow map drivers to do intelligent caching of bus
accesses. Suggested semantics:
1. Only the copy_from() and copy_to() functions can use a cacheable mapping.
2. Any access to the chip through one of the other ({read,write}{8,16,32})
functions causes the cache to be flushed for the entire mapping.
If a cache flush is expensive, a mapping driver may optimise the flushes and
perform a cache flush only if the cache is expected to be non-empty.
This approach is fairly simple, and allows mapping drivers to do something
closely approximating the "right thing" without adding complexity to the
chip driver code. An alternative, which I'm dubious about, is to add
explicit cache management functionality to the methods exported by the
mapping drivers, and to have the chip driver explicitly turn the cache
on/off and flush parts of it when writing/erasing.
Comments?
--
dwmw2
* Re: CPU caching of flash regions.
From: Eric W. Biederman @ 2001-05-14 15:51 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd, ajlennon
David Woodhouse <dwmw2@infradead.org> writes:
> I've just seen profiling of a system mounting a JFFS2 filesystem which shows
> that the majority of the time is spent in the map driver's copy_from
> function.
>
> The copy_from() functions are currently using a completely uncached mapping
> of the flash chip, but in fact for reading the chip that's not strictly
> necessary. This is especially true during the initial scan.
>
> I think we ought to allow map drivers to do intelligent caching of bus
> accesses. Suggested semantics:
>
> 1. Only the copy_from() and copy_to() functions can use a cacheable mapping.
>
> 2. Any access to the chip through one of the other ({read,write}{8,16,32})
> functions causes the cache to be flushed for the entire mapping.
>
> If a cache flush is expensive, a mapping driver may optimise the flushes and
> perform a cache flush only if the cache is expected to be non-empty.
>
> This approach is fairly simple, and allows mapping drivers to do something
> closely approximating the "right thing" without adding complexity to the
> chip driver code. An alternative, which I'm dubious about, is to add
> explicit cache management functionality to the methods exported by the
> mapping drivers, and to have the chip driver explicitly turn the cache
> on/off and flush parts of it when writing/erasing.
>
> Comments?
What kind of scenario are we talking about? Do the pages get read
multiple times? Or is it just that copy_from needs to be more
highly optimized, like memcpy? I suspect that before the whole interface
changes you should experiment and see what really needs to be done.
As for interface changes I would suggest an additional operation,
memory_barrier, that forces the flush if needed.
But I really think you should be able to get it working faster simply
by optimizing the copy_from routine.
Eric
* Re: CPU caching of flash regions.
From: David Woodhouse @ 2001-05-14 16:17 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: linux-mtd, ajlennon
ebiederman@lnxi.com said:
> What kind of scenario are we talking about? Do the pages get read
> multiple times? Or is it just that copy_from needs to be more
> highly optimized, like memcpy? I suspect that before the whole
> interface changes you should experiment and see what really needs to
> be done.
This is during the initial mount of JFFS2. Nothing should be read twice -
but we should at least be able to fill cache lines and do burst reads from
the flash chips, shouldn't we?
> As for interface changes I would suggest an additional operation,
> memory_barrier, that forces the flush if needed.
The original plan involved no interface changes - I was suggesting that the
map driver would DTRT with the caches internally.
> But I really think you should be able to get it working faster simply
> by optimizing the copy_from routine.
Most of the copy_from routines use memcpy_fromio(), which on i386 is just
a memcpy(). It ought to be fairly close to optimal.
Actually, the board used for the offending profile is a board with paged
access to the flash, so it's slightly slower than some others - but the
overhead shouldn't be too high. And the cache benefit would be more limited.
--
dwmw2
* Re: CPU caching of flash regions.
From: Eric W. Biederman @ 2001-05-14 16:32 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd, ajlennon
David Woodhouse <dwmw2@infradead.org> writes:
> ebiederman@lnxi.com said:
> > What kind of scenario are we talking about? Do the pages get read
> > multiple times? Or is it just that copy_from needs to be more
> > highly optimized, like memcpy? I suspect that before the whole
> > interface changes you should experiment and see what really needs to
> > be done.
>
> This is during the initial mount of JFFS2. Nothing should be read twice -
> but we should at least be able to fill cache lines and do burst reads from
> the flash chips, shouldn't we?
Definitely. To date I've only had a real hard look at the write
case, so I can't answer off the top of my head what needs to happen.
> > But I really think you should be able to get it working faster simply
> > by optimizing the copy_from routine.
>
> Most of the copy_from routines use memcpy_fromio(), which on i386 is just
> a memcpy(). It ought to be fairly close to optimal.
O.k. So that shouldn't be an issue if the kernel is properly
optimized.
> Actually, the board used for the offending profile is a board with paged
> access to the flash, so it's slightly slower than some others - but the
> overhead shouldn't be too high. And the cache benefit would be more limited.
First. What kind of chip is being used? What bus is it on? And how
fast is it?
Second. What kind of processor, and what kind of chipset are being used?
Getting bandwidth numbers out of the memcpy would be a useful
debugging technique. I really suspect the overhead is in the chip
itself. Flash chips are not known for their speed.
If the chip is out on the ISA bus, then unless you set up appropriate
decoders for it, the PCI->ISA bridge will be doing subtractive
decode, which will slow you down.
If we could start with some theoretical bandwidth numbers for
the chip, and compare them to what memcpy_fromio is giving, we can see
how much room there is for optimization.
Eric
* RE: CPU caching of flash regions.
From: Alex Lennon @ 2001-05-15 10:46 UTC (permalink / raw)
To: Eric W. Biederman, David Woodhouse; +Cc: linux-mtd
Eric,
> Actually, the board used for the offending profile is a board with paged
> access to the flash, so it's slightly slower than some others - but the
> overhead shouldn't be too high. And the cache benefit would be more
limited.
> What kind of chip is being used?
Two contiguous Intel StrataFlash 28F640s, giving 16MB total
> What bus is it on?
ISA
> And how fast is it?
8MHz
> Second. What kind of processor, and what kind of chipset are being used?
National Geode GX1 300MHz with CS5530 support chipset
To generate some figures I knocked together code which reads the 16MB from
the flash, paging as it goes. Nothing is done with the data. This takes
about 16s.
With the hardcoded value of CONFIG_JFFS2_FS_DEBUG set to 2 in
fs/jffs2/nodelist.h, I get jffs2 root fs mount times in excess of 34s.
When I remove the debugging, I get mount times of around 26s.
Obviously the figures obtained from df need some massaging to take account
of compression, but I get:
/dev/root 14336 3760 10576 26% /
/dev/mtdblock1 1280 644 636 50% /var
/dev/ram0 3963 26 3733 1% /var/tmp
So what does this mean? Can I expect a fourfold increase in mount time with
a full f/s?
Should I be comparing a 26s jffs2 mount to an idealistic 4s 4MB flash read?
Regards,
Alex
* Re: CPU caching of flash regions.
From: Eric W. Biederman @ 2001-05-15 14:32 UTC (permalink / raw)
To: Alex Lennon; +Cc: Eric W. Biederman, David Woodhouse, linux-mtd
"Alex Lennon" <ajlennon@arcom.co.uk> writes:
> Eric,
>
> > Actually, the board used for the offending profile is a board with paged
> > access to the flash, so it's slightly slower than some others - but the
> > overhead shouldn't be too high. And the cache benefit would be more
> limited.
>
> > What kind of chip is being used?
>
> Two contiguous Intel StrataFlash 28F640s, giving 16MB total
>
> > What bus is it on?
>
> ISA
>
> > And how fast is it?
>
> 8MHz
>
> > Second. What kind of processor, and what kind of chipset are being used?
>
> National Geode GX1 300MHz with CS5530 support chipset
>
> To generate some figures I knocked together code which reads the 16MB from
> the flash, paging as it goes. Nothing is done with the data. This takes
> about 16s.
O.k. A really crude estimate places ISA at about 8 megabytes/sec.
With protocol overhead you can probably only get maybe 1/3 of that, but
I suspect your pure read case is bottlenecking on the chip read speed,
and not the ISA bus.
With these numbers it is easy to see that jffs2 is not being
bottlenecked by the flash chip. There is internal overhead, and a
300MHz processor should be fast enough that mildly inefficient
algorithms shouldn't show up.
> With the hardcoded value of CONFIG_JFFS2_FS_DEBUG set to 2 in
> fs/jffs2/nodelist.h
> I get jffs2 root fs mount times in excess of 34s.
>
> When I remove the debugging I get mount times of around 26s
>
> Obviously the figures obtained from df need some massaging to take account
> of compression
> but I get:
>
> /dev/root 14336 3760 10576 26% /
> /dev/mtdblock1 1280 644 636 50% /var
> /dev/ram0 3963 26 3733 1% /var/tmp
>
>
> So what does this mean? Can I expect a fourfold increase in mount time with
> a full f/s?
> Should I be comparing a 26s jffs2 mount to an idealistic 4s 4MB
> flash read?
Someone who knows jffs2 will have to comment on what is going on, but
there is some significant overhead here.
Eric