* please unsubscribe me
From: yiftachf @ 2002-11-10 7:46 UTC (permalink / raw)
To: linux-mtd
^ permalink raw reply
* Re: Q on MTD support for NOR flash
From: Kevin Kaichuan He @ 2002-11-10 7:49 UTC (permalink / raw)
To: Jörn Engel; +Cc: linux-mtd
In-Reply-To: <20021109205730.GB16704@wohnheim.fh-wedel.de>
[-- Attachment #1: Type: text/plain; charset=us-ascii, Size: 1797 bytes --]
Jorn,
Thank you very much !
One further question: how do I configure the MTD driver to support
my NOR flash? For example, the AM29LV800 flash has two widths, 8 bits
or 16 bits; how do I let the driver know which word width I chose?
Or can MTD magically figure out the configuration of my NOR
flash?
thanks!
Kevin
--- Jörn Engel <joern@wohnheim.fh-wedel.de> wrote:
> On Sat, 9 November 2002 12:36:50 -0800, Kevin Kaichuan He wrote:
> >
> > We are considering using NOR flash in an embedded Linux
> > system. But it seems that NAND flash support is mentioned
> > a lot more in MTD than NOR flash. I'm wondering if there
> > is extensive NOR flash support in MTD, specifically if
> > AMD's boot sector NOR flash
> > (AM29LV800B, http://www.amd.com/us-en/FlashMemory/ProductInformation/0,,37_1447_1623_1468%5E1532,00.html)
> > is supported.
>
> Yupp.
> Nand flash support is not too old and thus is getting a lot more
> development now. Nor, in almost all cases, simply works.
>
> > Also, can we partition the AM29LV800B into multiple partitions and
> > mount different filesystems on them (e.g. JFFS on a RW partition and
> > Cramfs on a RO partition)?
>
> Yupp.
>
> > How about the boot sector of NOR flash, is it supported too ?
>
> Kinda. If you have to access the small fragments separately, you might
> run into problems. But that is usually only done from a bootloader,
> not from Linux.
> For all practical purposes, yupp.
>
> Jörn
>
> --
> Fancy algorithms are slow when n is small, and n is usually small.
> Fancy algorithms have big constants. Until you know that n is
> frequently going to be big, don't get fancy.
> -- Rob Pike
^ permalink raw reply
* Re: Q on MTD support for NOR flash
From: Jörn Engel @ 2002-11-10 8:20 UTC (permalink / raw)
To: Kevin Kaichuan He; +Cc: linux-mtd
In-Reply-To: <20021110074920.4607.qmail@web14812.mail.yahoo.com>
On Sat, 9 November 2002 23:49:20 -0800, Kevin Kaichuan He wrote:
>
> One further question: how do I configure the MTD driver to support
> my NOR flash? For example, the AM29LV800 flash has two widths, 8 bits
> or 16 bits; how do I let the driver know which word width I chose?
> Or can MTD magically figure out the configuration of my NOR
> flash?
You need something like this:
CONFIG_MTD=y
CONFIG_MTD_PARTITIONS=y
CONFIG_MTD_CHAR=y
CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_GEN_PROBE=y
CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_PHYSMAP=y
CONFIG_MTD_PHYSMAP...=...
Details might vary, have fun figuring those out. Buswidth must be set
for physmap.
Jörn
--
Good warriors cause others to come to them and do not go to others.
-- Sun Tzu
^ permalink raw reply
* Re: Building mkfs.jffs2
From: Darren Freeman @ 2002-11-10 19:27 UTC (permalink / raw)
To: Jörn Engel; +Cc: Linux MTD List
In-Reply-To: <20021107153857.GC31118@wohnheim.fh-wedel.de>
Jorn,
On Thu, 2002-11-07 at 15:38, Jörn Engel wrote:
> On Thu, 7 November 2002 22:49:15 +0000, Darren Freeman wrote:
> >
> > > I simply can't get mtd/utils to build on either host. I don't want a
> > > cross-compiled binary, just something to use on the host. I have tried
> > > using ./configure --with-kernel= pointing to the host kernel or target
> > > but no success. I guess the target kernel is the correct one to use
> > > here.
> >
> > Yeah I tried and tried and tried some more. Still gave up.
>
> Dunno. I've never had any problems, compiling for x86 or
> crosscompiling for ppc. Provide ssh login and I will have a look. If
> company policy forbids this, let the company pay your time. :-)
University policy forbids it from even being networked to anything other
than the RedHat box next to it. Let me tell you, it's a real *pain* =)
Imagine the suffering I go through just to get the kernel source on that
box =)
> Jörn
Thanks anyway =)
Have fun,
Darren
^ permalink raw reply
* Is D323DB90VI CFI compliant??...
From: vijay vijay @ 2002-11-10 14:27 UTC (permalink / raw)
To: linux-mtd; +Cc: vijay.peshkar
Hi Friends,
Is AMD's D323DB90VI flash CFI compliant? I couldn't find any
data sheets for it on AMD's site. Searching through Google, I found
only one reference to it, and that was also a query.
Any suggestions on how to proceed?
Thanks and Regards,
Vijay
^ permalink raw reply
* crc32() optimization
From: Joakim Tjernlund @ 2002-11-10 15:28 UTC (permalink / raw)
To: David Woodhouse; +Cc: jffs-dev, linux-mtd
In-Reply-To: <24987.1036797874@passion.cambridge.redhat.com>
Hi David
This patch improves my scan time by 22% (from 2.39 to 1.86 seconds).
Maybe you want to include it in the 2.4 branch.
I will put this in my next backport of the crc32 stuff from 2.5.
Jocke
Index: fs/jffs2/crc32.h
===================================================================
RCS file: /home/cvs/mtd/fs/jffs2/crc32.h,v
retrieving revision 1.3
diff -u -b -r1.3 crc32.h
--- fs/jffs2/crc32.h 26 Feb 2001 14:44:37 -0000 1.3
+++ fs/jffs2/crc32.h 10 Nov 2002 15:25:11 -0000
@@ -13,7 +13,16 @@
crc32(__u32 val, const void *ss, int len)
{
const unsigned char *s = ss;
- while (--len >= 0)
+ while (len >= 6){
+ val = crc32_table[(val ^ *s++) & 0xff] ^ (val >> 8);
+ val = crc32_table[(val ^ *s++) & 0xff] ^ (val >> 8);
+ val = crc32_table[(val ^ *s++) & 0xff] ^ (val >> 8);
+ val = crc32_table[(val ^ *s++) & 0xff] ^ (val >> 8);
+ val = crc32_table[(val ^ *s++) & 0xff] ^ (val >> 8);
+ val = crc32_table[(val ^ *s++) & 0xff] ^ (val >> 8);
+ len -= 6;
+ }
+ while (len--)
val = crc32_table[(val ^ *s++) & 0xff] ^ (val >> 8);
return val;
}
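
For anyone who wants to try the unrolling outside the kernel, here is a minimal userspace sketch; it is not the JFFS2 code itself. It assumes the standard reflected CRC-32 table (polynomial 0xEDB88320) built at run time rather than the static crc32_table[] from the kernel tree, and the names crc32_init, crc32_bytewise, crc32_unrolled6, crc32_selfcheck and CRC_STEP are made up for this sketch. The tail loop uses "len-- > 0" instead of "len--" so that a negative len still terminates immediately, as it did with the original "--len >= 0" loop.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: reflected CRC-32 table, polynomial 0xEDB88320, generated
 * at run time instead of using the kernel's static crc32_table[]. */
static uint32_t crc32_table[256];

static void crc32_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc32_table[i] = c;
    }
}

/* One table-driven CRC step; consumes one byte from s. */
#define CRC_STEP(val, s) \
    ((val) = crc32_table[((val) ^ *(s)++) & 0xff] ^ ((val) >> 8))

/* Reference version: one byte per loop iteration. */
static uint32_t crc32_bytewise(uint32_t val, const void *ss, int len)
{
    const unsigned char *s = ss;
    while (--len >= 0)
        CRC_STEP(val, s);
    return val;
}

/* Unrolled by six, following the shape of the patch. */
static uint32_t crc32_unrolled6(uint32_t val, const void *ss, int len)
{
    const unsigned char *s = ss;
    while (len >= 6) {
        CRC_STEP(val, s);
        CRC_STEP(val, s);
        CRC_STEP(val, s);
        CRC_STEP(val, s);
        CRC_STEP(val, s);
        CRC_STEP(val, s);
        len -= 6;
    }
    while (len-- > 0)   /* remaining 0..5 bytes; safe for len < 0 */
        CRC_STEP(val, s);
    return val;
}

/* Checks that both versions agree for every prefix of a sample buffer. */
static void crc32_selfcheck(void)
{
    static const char msg[] = "The quick brown fox jumps over the lazy dog";
    crc32_init();
    for (int n = 0; n < (int)sizeof(msg); n++)
        assert(crc32_unrolled6(0xFFFFFFFFu, msg, n) ==
               crc32_bytewise(0xFFFFFFFFu, msg, n));
}
```

Calling crc32_selfcheck() from a small main() is enough to confirm that the unrolled loop computes the same value as the byte-wise one; timing the two against each other is left to the target hardware.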
^ permalink raw reply
* Re: crc32() optimization
From: Marc Singer @ 2002-11-10 18:43 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linux-mtd
In-Reply-To: <002301c288cd$c11a6180$0200a8c0@telia.com>
As it should. I wonder if you'd do better changing the loop slightly.
Check for len == 0 and do a short-circuit return. Then do this
for (++len; len & 0x7; len >>= 3) {
ONCE(); // repeat eight times
...
len >>= 3;
}
while (--len > 0)
ONCE();
This is the implementation I've written for another project which
we've found to be relatively optimal. Note that len *must* be an int
even though contemporary convention is to use the size_t type.
^ permalink raw reply
* Re: crc32() optimization
From: Wolfgang Denk @ 2002-11-10 19:25 UTC (permalink / raw)
To: Marc Singer; +Cc: Joakim Tjernlund, linux-mtd
In-Reply-To: <20021110184321.GB16087@buici.com>
In message <20021110184321.GB16087@buici.com> you wrote:
> As it should. I wonder if you'd do better changing the loop slightly.
>
> Check for len == 0 and do a short-circuit return. Then do this
>
> for (++len; len & 0x7; len >>= 3) {
> ONCE(); // repeat eight times
> ...
> len >>= 3;
> }
> while (--len > 0)
> ONCE();
>
> This is the implementation I've written for another project which
Seems broken to me, since you "len >>= 3" twice.
Also, Duff's Device comes to mind.
Best regards,
Wolfgang Denk
--
Software Engineering: Embedded and Realtime Systems, Embedded Linux
Phone: (+49)-8142-4596-87 Fax: (+49)-8142-4596-88 Email: wd@denx.de
See us @ electronica 2002 in Munich, Nov 12-15, Hall A3, Booth A3.325
^ permalink raw reply
* Re: crc32() optimization
From: Joakim Tjernlund @ 2002-11-10 20:04 UTC (permalink / raw)
To: Marc Singer; +Cc: linux-mtd
In-Reply-To: <20021110184321.GB16087@buici.com>
Hmm, maybe. I tried 16, 8 and 4 as well, but 6 was a little faster for me.
What would be great is if someone who understands CRC better than I do could
take a look at Algorithm 4 at http://www.cl.cam.ac.uk/Research/SRG/bluebook/21/crc/node6.html#SECTION00060000000000000000
and apply it to the Linux CRC32 code. I tried but failed to get it correct.
Jocke
^ permalink raw reply
* Re: crc32() optimization
From: Joakim Tjernlund @ 2002-11-10 20:05 UTC (permalink / raw)
To: Marc Singer, Wolfgang Denk; +Cc: linux-mtd
In-Reply-To: <20021110192544.0A34D10162@denx.denx.de>
Hi Wolfgang
What's "Duff's Device"?
Jocke
^ permalink raw reply
* Re: crc32() optimization
From: Wolfgang Denk @ 2002-11-10 21:00 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Marc Singer, linux-mtd
In-Reply-To: <006901c288f4$79e2ce20$0200a8c0@telia.com>
In message <006901c288f4$79e2ce20$0200a8c0@telia.com> you wrote:
>
> What's "Duff's Device"?
It's a tricky way to implement general loop unrolling directly in C.
Applied to your problem, the code looks like this (any other unroll
count may be used instead of 8, but then you need to adjust the "case"
statements):
register int n = (len + (8-1)) / 8;
switch (len % 8) {
case 0: do { val = crc32_table ... ;
case 7: val = crc32_table ... ;
case 6: val = crc32_table ... ;
case 5: val = crc32_table ... ;
case 4: val = crc32_table ... ;
case 3: val = crc32_table ... ;
case 2: val = crc32_table ... ;
case 1: val = crc32_table ... ;
} while (--n > 0);
}
BTW: this is strictly legal ANSI C!
For an explanation see http://www.lysator.liu.se/c/duffs-device.html
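
To make the fragment above concrete, here is a self-contained userspace sketch of the same idea. It is only an illustration under a few assumptions: the table is the standard reflected CRC-32 table (polynomial 0xEDB88320) built at run time, and crc32_init, crc32_bytewise, crc32_duff, duff_selfcheck and STEP are names invented for the sketch. One detail the fragment leaves open is len == 0: without a guard, "case 0" would enter the loop and consume eight bytes, so the sketch returns early in that case.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: standard reflected CRC-32 table, built at run time. */
static uint32_t crc32_table[256];

static void crc32_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc32_table[i] = c;
    }
}

/* One CRC step on the local variables val and s. */
#define STEP (val = crc32_table[(val ^ *s++) & 0xff] ^ (val >> 8))

/* Reference version: one byte per loop iteration. */
static uint32_t crc32_bytewise(uint32_t val, const void *ss, int len)
{
    const unsigned char *s = ss;
    while (--len >= 0)
        STEP;
    return val;
}

/* Duff's device, unrolled by eight. */
static uint32_t crc32_duff(uint32_t val, const void *ss, int len)
{
    const unsigned char *s = ss;
    int n;

    if (len <= 0)               /* guard: case 0 would run 8 steps */
        return val;

    n = (len + (8 - 1)) / 8;    /* number of passes, first may be partial */
    switch (len % 8) {          /* dispatched once, into the loop body */
    case 0: do { STEP;
    case 7:      STEP;
    case 6:      STEP;
    case 5:      STEP;
    case 4:      STEP;
    case 3:      STEP;
    case 2:      STEP;
    case 1:      STEP;
            } while (--n > 0);
    }
    return val;
}

/* Checks the device against the byte-wise loop for lengths 0..43. */
static void duff_selfcheck(void)
{
    static const char msg[] = "The quick brown fox jumps over the lazy dog";
    crc32_init();
    for (int n = 0; n < (int)sizeof(msg); n++)
        assert(crc32_duff(0, msg, n) == crc32_bytewise(0, msg, n));
}
```

The switch selects the entry point for the first, possibly partial, pass; every later pass starts at the top of the do-loop and runs all eight steps.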
Best regards,
Wolfgang Denk
^ permalink raw reply
* Re: crc32() optimization
From: Joakim Tjernlund @ 2002-11-10 21:22 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Marc Singer, linux-mtd
In-Reply-To: <20021110210008.7CFF210162@denx.denx.de>
Cool! I will give it a try (tomorrow afternoon) and see what happens.
Jocke
^ permalink raw reply
* Re: crc32() optimization
From: Joakim Tjernlund @ 2002-11-10 22:35 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Marc Singer, linux-mtd
In-Reply-To: <007b01c288ff$3fc665c0$0200a8c0@telia.com>
I could not wait until tomorrow, so I did it now instead.
The result was worse. The best I got was 7% improvement.
I tried 16, 8, 6 and 4 as unrolling steps.
Jocke
^ permalink raw reply
* Re: crc32() optimization
From: Wolfgang Denk @ 2002-11-10 22:41 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Marc Singer, linux-mtd
In-Reply-To: <001301c28909$743f1f40$0200a8c0@telia.com>
In message <001301c28909$743f1f40$0200a8c0@telia.com> you wrote:
> I could not wait until tomorrow, so I did it now instead.
> The result was worse. The best I got was 7% improvement.
> I tried 16, 8, 6 and 4 as unrolling steps.
Makes no sense to me. Should be at least as efficient as your
original code (marginally better).
Best regards,
Wolfgang Denk
^ permalink raw reply
* Re: crc32() optimization
From: Joakim Tjernlund @ 2002-11-10 23:00 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Marc Singer, linux-mtd
In-Reply-To: <20021110224142.2F39310162@denx.denx.de>
> In message <001301c28909$743f1f40$0200a8c0@telia.com> you wrote:
> > I could not wait until tomorrow, so I did it now instead.
> > The result was worse. The best I got was 7% improvement.
> > I tried 16, 8, 6 and 4 as unrolling steps.
>
> Makes no sense to me. Should be at least as efficient as your
> original code (marginally better).
I don't understand this either.
Anyone?
Jocke
^ permalink raw reply
* Re: crc32() optimization
From: Eric W. Biederman @ 2002-11-10 23:56 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Wolfgang Denk, Marc Singer, linux-mtd
In-Reply-To: <002b01c2890c$f2c058e0$0200a8c0@telia.com>
"Joakim Tjernlund" <Joakim.Tjernlund@lumentis.se> writes:
> > In message <001301c28909$743f1f40$0200a8c0@telia.com> you wrote:
> > > I could not wait until tomorrow, so I did it now instead.
> > > The result was worse. The best I got was 7% improvement.
> > > I tried 16, 8, 6 and 4 as unrolling steps.
> >
> > Makes no sense to me. Should be at least as efficient as your
> > original code (marginally better).
>
> I don't understand this either.
> Anyone?
You might try it with 6. But a lot depends on what gcc can do with
it, and gcc may not like all of those potential entry points.
Running gcc -S and checking to see the difference in the generated
assembly might be instructive.
Eric
^ permalink raw reply
* Re: crc32() optimization
From: Marc Singer @ 2002-11-11 0:50 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Joakim Tjernlund, linux-mtd
In-Reply-To: <20021110192544.0A34D10162@denx.denx.de>
On Sun, Nov 10, 2002 at 08:25:38PM +0100, Wolfgang Denk wrote:
> In message <20021110184321.GB16087@buici.com> you wrote:
> > As it should. I wonder if you'd do better changing the loop slightly.
> >
> > Check for len == 0 and do a short-circuit return. Then do this
> >
> > for (++len; len & 0x7; len >>= 3) {
> > ONCE(); // repeat eight times
> > ...
> > len >>= 3;
> > }
> > while (--len > 0)
> > ONCE();
> >
> > This is the implementation I've written for another project which
>
> Seems broken to me, since you "len >>= 3" twice.
That's what I get for writing it from memory.
> Also, Duff's Device comes to mind.
What would that be?
^ permalink raw reply
* Re: crc32() optimization
From: Marc Singer @ 2002-11-11 1:31 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Joakim Tjernlund, linux-mtd
In-Reply-To: <20021110210008.7CFF210162@denx.denx.de>
On Sun, Nov 10, 2002 at 10:00:03PM +0100, Wolfgang Denk wrote:
> In message <006901c288f4$79e2ce20$0200a8c0@telia.com> you wrote:
> >
> > What's "Duff's Device"?
>
> It's a tricky way to implement general loop unrolling directly in C.
> Applied to your problem, code that looks like this (instead of 8 any
> other loop count may be used, but you need to adjust the "case"
> statements then):
>
> register int n = (len + (8-1)) / 8;
>
> switch (len % 8) {
> case 0: do { val = crc32_table ... ;
> case 7: val = crc32_table ... ;
> case 6: val = crc32_table ... ;
> case 5: val = crc32_table ... ;
> case 4: val = crc32_table ... ;
> case 3: val = crc32_table ... ;
> case 2: val = crc32_table ... ;
> case 1: val = crc32_table ... ;
> } while (--n > 0);
> }
This doesn't look right to me. You are decrementing n but using the
modulus of len in the switch. The len modulus is correct when n == 1,
but not when n > 1. The idea makes sense, but the implementation
appears to be missing a detail.
As for performance problems, I believe that the trouble is evident
from the assembler output. The reason that the unrolled loop is more
efficient than the simple loop is mainly because you don't jump as
often. We all know that jumps tend to perturb the instruction fetch
queue and cache.
Here is an example. I know it is wrong, but it shows how the compiler
implements the switch. I've marked the problem jump. It is executed
for every iteration of the loop so it tends to negate other changes
made to the algorithm.
So, the version I posted may not be significantly better than the
original one that unrolled 6 times. It is just that unrolling to a
power of two sometimes makes the math simpler and sometimes improves
the performance. The present crop of CPUs performs all simple ALU
functions in a single cycle, so there is little reason to worry about
how many loops to unroll.
BTW, I checked my code and found I made several errors. The increment
step in for() is a decrement by 8, not a shift.
Cheers.
--------------------------------------------------
#define ONCE do { ++i; } while (0)

int foo ()
{
  int i = 0;
  int c;

  for (c = 20 ; c > 0; c -= 8) {
    switch (c % 8) {
    case 0: ONCE;
    case 7: ONCE;
    case 6: ONCE;
    case 5: ONCE;
    case 4: ONCE;
    case 3: ONCE;
    case 2: ONCE;
    case 1: ONCE;
    }
  }
  return i;
}

int main ()
{
  foo ();
}
--------------------------------------------------
.file "unroll.c"
.text
.align 2
.p2align 2,,3
.globl foo
.type foo,@function
foo:
pushl %ebp
movl %esp, %ebp
xorl %eax, %eax
pushl %ebx
movl $20, %ecx
.p2align 2,,3
.L26:
testl %ecx, %ecx
movl %ecx, %edx
js .L29
.L25:
andl $-8, %edx
movl %ecx, %ebx
subl %edx, %ebx
cmpl $7, %ebx
ja .L4
;;; **** This is the problem jump.
jmp *.L23(,%ebx,4)
;;; **** This is the problem jump.
.section .rodata
.align 4
.align 4
.L23:
.long .L7
.long .L21
.long .L19
.long .L17
.long .L15
.long .L13
.long .L11
.long .L9
.text
.L7:
incl %eax
.L9:
incl %eax
.L11:
incl %eax
.L13:
incl %eax
.L15:
incl %eax
.L17:
incl %eax
.L19:
incl %eax
.L21:
incl %eax
.L4:
subl $8, %ecx
testl %ecx, %ecx
jg .L26
popl %ebx
leave
ret
.p2align 2,,3
.L29:
leal 7(%ecx), %edx
jmp .L25
.Lfe1:
.size foo,.Lfe1-foo
.align 2
.p2align 2,,3
.globl main
.type main,@function
main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
call foo
leave
ret
.Lfe2:
.size main,.Lfe2-main
.ident "GCC: (GNU) 3.2.1 20020830 (Debian prerelease)"
--------------------------------------------------
^ permalink raw reply
* Re: crc32() optimization
From: Wolfgang Denk @ 2002-11-11 1:37 UTC (permalink / raw)
To: Marc Singer; +Cc: Joakim Tjernlund, linux-mtd
In-Reply-To: <20021111013114.GB27214@buici.com>
In message <20021111013114.GB27214@buici.com> you wrote:
>
> > > What's "Duff's Device"?
> >
> > It's a tricky way to implement general loop unrolling directly in C.
> > Applied to your problem, code that looks like this (instead of 8 any
> > other loop count may be used, but you need to adjust the "case"
> > statements then):
> >
> > register int n = (len + (8-1)) / 8;
> >
> > switch (len % 8) {
> > case 0: do { val = crc32_table ... ;
> > case 7: val = crc32_table ... ;
> > case 6: val = crc32_table ... ;
> > case 5: val = crc32_table ... ;
> > case 4: val = crc32_table ... ;
> > case 3: val = crc32_table ... ;
> > case 2: val = crc32_table ... ;
> > case 1: val = crc32_table ... ;
> > } while (--n > 0);
> > }
>
> This doesn't look right to me. You are decrementing n but using the
> modulus of len in the switch. The len modulus is correct when n == 1,
> but not when n > 1. The idea makes sense, but the implementation
> appears to be missing a detail.
You don't understand. The switch is only needed for the first,
partial loop where we want fewer than N statements; then we're running
the remaining fully unrolled loops in the do{}while loop.
> As for performance problems, I believe that the trouble is evident
> from the assembler output. The reason that the unrolled loop is more
> efficient than the simple loop is mainly because you don't jump as
> often. We all know that jumps tend to perturb the instruction fetch
> queue and cache.
Did you enable optimization?
> Here is an example. I know it is wrong, but it shows how the compiler
> implements the switch. I've marked the problem jump. It is executed
Irrelevant here.
> So, the version I posted may not be significantly better than the
> original one that unrolled 6 times. It is just that unrolling to a
You save the extra while{} loop.
Best regards,
Wolfgang Denk
^ permalink raw reply
* Re: crc32() optimization
From: Marc Singer @ 2002-11-11 4:42 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Joakim Tjernlund, linux-mtd
In-Reply-To: <20021111013738.4FEB410162@denx.denx.de>
On Mon, Nov 11, 2002 at 02:37:33AM +0100, Wolfgang Denk wrote:
> In message <20021111013114.GB27214@buici.com> you wrote:
> >
> > > > What's "Duff's Device"?
> > >
> > > It's a tricky way to implement general loop unrolling directly in C.
> > > Applied to your problem, code that looks like this (instead of 8 any
> > > other loop count may be used, but you need to adjust the "case"
> > > statements then):
> > >
> > > register int n = (len + (8-1)) / 8;
> > >
> > > switch (len % 8) {
> > > case 0: do { val = crc32_table ... ;
> > > case 7: val = crc32_table ... ;
> > > case 6: val = crc32_table ... ;
> > > case 5: val = crc32_table ... ;
> > > case 4: val = crc32_table ... ;
> > > case 3: val = crc32_table ... ;
> > > case 2: val = crc32_table ... ;
> > > case 1: val = crc32_table ... ;
> > > } while (--n > 0);
> > > }
> >
> > This doesn't look right to me. You are decrementing n but using the
> > modulus of len in the switch. The len modulus is correct when n == 1,
> > but not when n > 1. The idea makes sense, but the implementation
> > appears to be missing a detail.
>
> You don't understand. The switch is only needed for the first,
> > partial loop where we want fewer than N statements; then we're running
> > the remaining fully unrolled loops in the do{}while loop.
I see. I misread the code. I cannot see why this would not be better
than the original poster's version. I'll test it on my code to see if
there is an improvement.
> > As for performance problems, I believe that the trouble is evident
> > from the assembler output. The reason that the unrolled loop is more
> > efficient than the simple loop is mainly because you don't jump as
> > often. We all know that jumps tend to perturb the instruction fetch
> > queue and cache.
>
> Did you enable optimization?
Indeed. But it doesn't matter since it executes the switch jump only
one time.
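
The difference between the two shapes discussed in this thread can be checked with a small counting sketch. It replaces the CRC step with a counter and compares a switch placed inside a for loop (the shape of the unroll.c test program) with Duff's device, where the switch is dispatched once and then falls into a do-while. The names struct counts, switch_in_loop, duff and counts_selfcheck are invented for this illustration.

```c
#include <assert.h>

struct counts {
    int steps;        /* how many times the loop body "step" ran */
    int dispatches;   /* how many times the switch was evaluated */
};

#define STEP (c.steps++)

/* Switch re-evaluated on every pass of the for loop. */
static struct counts switch_in_loop(int len)
{
    struct counts c = {0, 0};

    for (int rem = len; rem > 0; rem -= 8) {
        c.dispatches++;
        switch (rem % 8) {
        case 0: STEP; /* fall through */
        case 7: STEP; /* fall through */
        case 6: STEP; /* fall through */
        case 5: STEP; /* fall through */
        case 4: STEP; /* fall through */
        case 3: STEP; /* fall through */
        case 2: STEP; /* fall through */
        case 1: STEP;
        }
    }
    return c;
}

/* Duff's device: switch evaluated once, then full passes of eight. */
static struct counts duff(int len)
{
    struct counts c = {0, 0};
    int n = (len + 7) / 8;

    if (len <= 0)
        return c;

    c.dispatches++;
    switch (len % 8) {
    case 0: do { STEP;
    case 7:      STEP;
    case 6:      STEP;
    case 5:      STEP;
    case 4:      STEP;
    case 3:      STEP;
    case 2:      STEP;
    case 1:      STEP;
            } while (--n > 0);
    }
    return c;
}

/* For len = 20 the first shape runs 4 + 4 + 4 steps, the second 4 + 8 + 8. */
static void counts_selfcheck(void)
{
    struct counts a = switch_in_loop(20);
    struct counts b = duff(20);

    assert(a.steps == 12 && a.dispatches == 3);
    assert(b.steps == 20 && b.dispatches == 1);
}
```

With len = 20 the switch-in-loop form performs only 12 steps over 3 dispatches, because 20, 12 and 4 all leave a remainder of 4, while Duff's device performs all 20 steps after a single dispatch.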
^ permalink raw reply
* jedec_probe.c
From: Holger Speck @ 2002-11-11 8:18 UTC (permalink / raw)
To: Linux-MTD
Hello,
The patch below sets some NumEraseRegions fields to their correct values
and adds an entry for the AM29F032B:
--- jedec_probe.c.orig 2002-11-11 09:01:07.000000000 +0100
+++ jedec_probe.c 2002-10-31 11:30:31.000000000 +0100
@@ -43,6 +43,7 @@
#define AM29F080 0x00D5
#define AM29F040 0x00A4
#define AM29LV040B 0x004F
+#define AM29F032B 0x0041
/* Atmel */
#define AT49BV512 0x0003
@@ -147,6 +148,15 @@
static const struct amd_flash_info jedec_table[] = {
{
mfr_id: MANUFACTURER_AMD,
+ dev_id: AM29F032B,
+ name: "AMD AM29F032B",
+ DevSize: SIZE_4MiB,
+ CmdSet: P_ID_AMD_STD,
+ NumEraseRegions: 1,
+ regions: {ERASEINFO(0x10000,64)
+ }
+ }, {
+ mfr_id: MANUFACTURER_AMD,
dev_id: AM29LV160DT,
name: "AMD AM29LV160DT",
DevSize: SIZE_2MiB,
@@ -199,7 +209,7 @@
name: "Toshiba TC58FVB321",
DevSize: SIZE_4MiB,
CmdSet: P_ID_AMD_STD,
- NumEraseRegions: 4,
+ NumEraseRegions: 2,
regions: {ERASEINFO(0x02000,8),
ERASEINFO(0x10000,63)
}
@@ -209,7 +219,7 @@
name: "Toshiba TC58FVT321",
DevSize: SIZE_4MiB,
CmdSet: P_ID_AMD_STD,
- NumEraseRegions: 4,
+ NumEraseRegions: 2,
regions: {ERASEINFO(0x10000,63),
ERASEINFO(0x02000,8)
}
@@ -219,7 +229,7 @@
name: "Toshiba TC58FVB641",
DevSize: SIZE_8MiB,
CmdSet: P_ID_AMD_STD,
- NumEraseRegions: 4,
+ NumEraseRegions: 2,
regions: {ERASEINFO(0x02000,8),
ERASEINFO(0x10000,127)
}
@@ -229,7 +239,7 @@
name: "Toshiba TC58FVT641",
DevSize: SIZE_8MiB,
CmdSet: P_ID_AMD_STD,
- NumEraseRegions: 4,
+ NumEraseRegions: 2,
regions: {ERASEINFO(0x10000,127),
ERASEINFO(0x02000,8)
}
Holger
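
As a quick sanity check on the numbers in the patch, the region counts and sizes can be verified with a small standalone sketch. It is not kernel code: struct erase_region, struct flash_geometry, geometry_is_consistent and the examples[] table are hypothetical stand-ins for the ERASEINFO(size, blocks) entries, and it assumes SIZE_4MiB and SIZE_8MiB mean 0x400000 and 0x800000 bytes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGIONS 4

/* Hypothetical stand-in for one ERASEINFO(size, blocks) entry. */
struct erase_region {
    unsigned long size;     /* bytes per erase block */
    unsigned long blocks;   /* number of blocks of that size */
};

struct flash_geometry {
    const char *name;
    unsigned long dev_size; /* total device size in bytes */
    int num_regions;        /* value given as NumEraseRegions */
    struct erase_region regions[MAX_REGIONS];
};

/* True if the listed regions match num_regions and add up to dev_size. */
static int geometry_is_consistent(const struct flash_geometry *g)
{
    unsigned long total = 0;
    int used = 0;

    for (int i = 0; i < MAX_REGIONS && g->regions[i].blocks != 0; i++) {
        total += g->regions[i].size * g->regions[i].blocks;
        used++;
    }
    return used == g->num_regions && total == g->dev_size;
}

/* Values copied from the patch; sizes in bytes are an assumption. */
static const struct flash_geometry examples[] = {
    { "AMD AM29F032B",      0x400000UL, 1, { {0x10000, 64} } },
    { "Toshiba TC58FVB321", 0x400000UL, 2, { {0x02000, 8}, {0x10000, 63} } },
    { "Toshiba TC58FVT641", 0x800000UL, 2, { {0x10000, 127}, {0x02000, 8} } },
};

static void geometry_selfcheck(void)
{
    for (size_t i = 0; i < sizeof(examples) / sizeof(examples[0]); i++)
        assert(geometry_is_consistent(&examples[i]));
}
```

For the TC58FVB321, 8 blocks of 0x2000 bytes plus 63 blocks of 0x10000 bytes give 0x10000 + 0x3F0000 = 0x400000 bytes, which matches 4 MiB with two regions; a count of 4 would not match the two ERASEINFO entries that are actually listed.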
^ permalink raw reply
* UMSDOS now works in the 2.4.19 kernel, I wonder what else
From: Gregg C Levine @ 2002-11-12 4:30 UTC (permalink / raw)
To: linux-mtd
Hello from Gregg C Levine
Just a quick heads up. It looks as if the bug that caused UMSDOS file
systems to be broken in the 2.4 series of kernels has been fixed by now,
at least in 2.4.19, which I am using. I am going to try to get the MTD
drivers working on my system, using the one meant for testing
different configurations, that is, the system memory one.
-------------------
Gregg C Levine hansolofalcon@worldnet.att.net
------------------------------------------------------------
"The Force will be with you...Always." Obi-Wan Kenobi
"Use the Force, Luke." Obi-Wan Kenobi
(This company dedicates this E-Mail to General Obi-Wan Kenobi )
(This company dedicates this E-Mail to Master Yoda )
^ permalink raw reply
* [PATCH] small fix for drivers/mtd/chips/map_ram.c in 2.4
From: Ian Campbell @ 2002-11-12 10:22 UTC (permalink / raw)
To: Linux MTD Mailing List
[-- Attachment #1: Type: text/plain, Size: 879 bytes --]
Hi,
The attached one-liner was necessary to allow me to mount a JFFS2 file
system using an MTD device based on the map_ram map driver.
It sets the state to MTD_ERASE_DONE in mapram_erase. If memory serves (I
did it a couple of weeks ago), not having this was causing badness in
jffs2_erase_callback.
Cheers,
Ian.
--
Ian Campbell
Design Engineer
Arcom Control Systems Ltd,
Clifton Road,
Cambridge CB1 7EA
United Kingdom
Tel: +44 (0)1223 403465
E-Mail: icampbell@arcomcontrols.com
Web: http://www.arcomcontrols.com
[-- Attachment #2: mtd.map_ram.patch --]
[-- Type: text/plain, Size: 289 bytes --]
--- kernel-2.4.18.orig/drivers/mtd/chips/map_ram.c
+++ kernel-2.4.18/drivers/mtd/chips/map_ram.c
@@ -107,5 +107,7 @@
for (i=0; i<instr->len; i++)
map->write8(map, 0xFF, instr->addr + i);
+ instr->state = MTD_ERASE_DONE;
+
if (instr->callback)
instr->callback(instr);
^ permalink raw reply
* [PATCH] Add a map_sram driver to drivers/chips/
From: Ian Campbell @ 2002-11-12 11:05 UTC (permalink / raw)
To: Linux MTD Mailing List; +Cc: David Woodhouse
[-- Attachment #1: Type: text/plain, Size: 1194 bytes --]
Howdy,
The attached patch adds "map_sram" as a second chip driver in
drivers/mtd/chips/map_ram.c. The two are basically identical apart from
the use of the MTD_VOLATILE flag.
I have another patch which makes an entirely separate map_sram.c, but
there was so much code duplication with map_ram.c I didn't see the
point. I guess if you are using modules then you will need 'alias
map_sram map_ram' in /etc/modules.conf or something to get autoloading
to work.
The patch also includes a previous patch to set the state to
MTD_ERASE_DONE in mapram_erase. This is needed to mount a JFFS2 f/s on
an SRAM or RAM device.
Cheers,
Ian.
--
Ian Campbell
Design Engineer
Arcom Control Systems Ltd,
Clifton Road,
Cambridge CB1 7EA
United Kingdom
Tel: +44 (0)1223 403465
E-Mail: icampbell@arcomcontrols.com
Web: http://www.arcomcontrols.com
[-- Attachment #2: mtd.map_sram-2.patch --]
[-- Type: text/x-patch, Size: 3370 bytes --]
diff -urN kernel-2.4.18.orig/include/linux/mtd/mtd.h kernel-2.4.18/include/linux/mtd/mtd.h
--- kernel-2.4.18.orig/include/linux/mtd/mtd.h Tue Jun 12 18:30:27 2001
+++ kernel-2.4.18/include/linux/mtd/mtd.h Wed Nov 6 14:05:30 2002
@@ -39,5 +39,6 @@
#define MTD_NORFLASH 3
#define MTD_NANDFLASH 4
#define MTD_PEROM 5
+#define MTD_SRAM 6
#define MTD_OTHER 14
#define MTD_UNKNOWN 15
diff -urN kernel-2.4.18.orig/drivers/mtd/chips/Config.in kernel-2.4.18/drivers/mtd/chips/Config.in
--- kernel-2.4.18.orig/drivers/mtd/chips/Config.in Thu Oct 4 23:13:18 2001
+++ kernel-2.4.18/drivers/mtd/chips/Config.in Tue Nov 12 10:29:27 2002
@@ -44,7 +44,7 @@
dep_tristate ' Support for Intel/Sharp flash chips' CONFIG_MTD_CFI_INTELEXT $CONFIG_MTD_GEN_PROBE
dep_tristate ' Support for AMD/Fujitsu flash chips' CONFIG_MTD_CFI_AMDSTD $CONFIG_MTD_GEN_PROBE
-dep_tristate ' Support for RAM chips in bus mapping' CONFIG_MTD_RAM $CONFIG_MTD
+dep_tristate ' Support for RAM/SRAM chips in bus mapping' CONFIG_MTD_RAM $CONFIG_MTD
dep_tristate ' Support for ROM chips in bus mapping' CONFIG_MTD_ROM $CONFIG_MTD
dep_tristate ' Support for absent chips in bus mapping' CONFIG_MTD_ABSENT $CONFIG_MTD
diff -urN kernel-2.4.18.orig/drivers/mtd/chips/map_ram.c kernel-2.4.18/drivers/mtd/chips/map_ram.c
--- kernel-2.4.18.orig/drivers/mtd/chips/map_ram.c Thu Oct 4 23:14:59 2001
+++ kernel-2.4.18/drivers/mtd/chips/map_ram.c Tue Nov 12 10:52:11 2002
@@ -20,6 +20,7 @@
static int mapram_erase (struct mtd_info *, struct erase_info *);
static void mapram_nop (struct mtd_info *);
static struct mtd_info *map_ram_probe(struct map_info *map);
+static struct mtd_info *map_sram_probe(struct map_info *map);
static struct mtd_chip_driver mapram_chipdrv = {
@@ -27,6 +28,11 @@
 	name: "map_ram",
 	module: THIS_MODULE
 };
+static struct mtd_chip_driver mapsram_chipdrv = {
+	probe: map_sram_probe,
+	name: "map_sram",
+	module: THIS_MODULE
+};
static struct mtd_info *map_ram_probe(struct map_info *map)
{
@@ -78,6 +84,34 @@
 	return mtd;
 }
+static struct mtd_info *map_sram_probe(struct map_info *map)
+{
+	struct mtd_info *mtd;
+
+	mtd = kmalloc(sizeof(*mtd), GFP_KERNEL);
+	if (!mtd)
+		return NULL;
+
+	memset(mtd, 0, sizeof(*mtd));
+
+	map->fldrv = &mapram_chipdrv;
+	mtd->priv = map;
+	mtd->name = map->name;
+	mtd->type = MTD_SRAM;
+	mtd->size = map->size;
+	mtd->erase = mapram_erase;
+	mtd->read = mapram_read;
+	mtd->write = mapram_write;
+	mtd->sync = mapram_nop;
+	mtd->flags = MTD_CAP_RAM;
+
+	mtd->erasesize = PAGE_SIZE;
+	while(mtd->size & (mtd->erasesize - 1))
+		mtd->erasesize >>= 1;
+
+	MOD_INC_USE_COUNT;
+	return mtd;
+}
static int mapram_read (struct mtd_info *mtd, loff_t from, size_t len, size_t *retlen, u_char *buf)
{
@@ -107,6 +141,8 @@
 	for (i=0; i<instr->len; i++)
 		map->write8(map, 0xFF, instr->addr + i);
+	instr->state = MTD_ERASE_DONE;
+
 	if (instr->callback)
 		instr->callback(instr);
@@ -121,12 +157,14 @@
 int __init map_ram_init(void)
 {
 	register_mtd_chip_driver(&mapram_chipdrv);
+	register_mtd_chip_driver(&mapsram_chipdrv);
 	return 0;
 }
 static void __exit map_ram_exit(void)
 {
 	unregister_mtd_chip_driver(&mapram_chipdrv);
+	unregister_mtd_chip_driver(&mapsram_chipdrv);
 }
 module_init(map_ram_init);
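The erase-size loop in map_sram_probe is easy to misread, so here is a small userspace sketch of the same arithmetic. `DEMO_PAGE_SIZE` and `demo_pick_erasesize` are hypothetical names, and a 4096-byte page is assumed; the real value depends on the architecture.

```c
/* Assumed page size for illustration; PAGE_SIZE is architecture-dependent. */
#define DEMO_PAGE_SIZE 4096UL

/* Mirrors the loop in the patch: start from one page and halve the
 * erase size until the device size is an exact multiple of it. */
static unsigned long demo_pick_erasesize(unsigned long size)
{
	unsigned long erasesize = DEMO_PAGE_SIZE;

	/* erasesize is always a power of two, so (erasesize - 1) is a mask
	 * of the low bits; a non-zero result means size is not a multiple. */
	while (size & (erasesize - 1))
		erasesize >>= 1;

	return erasesize;
}
```

Under that assumption, a 1 MiB device (0x100000) keeps the full 4096-byte erase size, a 6 KiB device (0x1800) drops to 2048, and an odd size such as 3 bytes falls all the way to 1. The loop always terminates, because once the erase size reaches 1 the mask is zero.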
^ permalink raw reply
* Re: [PATCH] Add a map_sram driver to drivers/chips/
From: Jörn Engel @ 2002-11-12 16:02 UTC (permalink / raw)
To: Ian Campbell; +Cc: Linux MTD Mailing List
In-Reply-To: <1037099125.26491.81.camel@LinuxDev>
On Tue, 12 November 2002 11:05:25 +0000, Ian Campbell wrote:
>
> The attached patch adds "map_sram" as a second chip driver in
> drivers/mtd/chips/map_ram.c. The two are basically identical apart from
> the use of the MTD_VOLATILE flag.
Slightly OT.
Have you tried the slram driver? I have always used that one to access
plain old memory and was very happy. And now you make me wonder whether
there is another way that might even be better.
Jörn
--
A victorious army first wins and then seeks battle.
-- Sun Tzu
^ permalink raw reply