From: Jörn Engel
Subject: Re: [PATCH 06/14] Pramfs: Include files
Date: Mon, 22 Jun 2009 23:41:55 +0200
Message-ID: <20090622214155.GA19332@logfs.org>
References: <4A33A7EC.6070008@gmail.com> <200906221317.04166.arnd@arndb.de> <4A3FC7F1.5050108@gmail.com> <200906222033.20883.arnd@arndb.de> <4A3FDBFE.8050509@2net.co.uk>
In-Reply-To: <4A3FDBFE.8050509@2net.co.uk>
Sender: linux-embedded-owner@vger.kernel.org
To: Chris Simmonds
Cc: Arnd Bergmann, Marco, Sam Ravnborg, Linux FS Devel, Linux Embedded, Linux Kernel

On Mon, 22 June 2009 20:31:10 +0100, Chris Simmonds wrote:
>
> I disagree: that adds an unnecessary overhead for those architectures
> where the cpu byte order does not match the data structure ordering. I
> think the data structures should be native endian and when mkpramfs is
> written it can take a flag (e.g. -r) in the same way mkcramfs does.

Just to quantify this point, I've written a small crap program:

#include <byteswap.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

/* Microseconds elapsed between t1 and t2. */
long long delta(struct timeval *t1, struct timeval *t2)
{
	long long delta;

	delta = 1000000ull * t2->tv_sec + t2->tv_usec;
	delta -= 1000000ull * t1->tv_sec + t1->tv_usec;
	return delta;
}

#define LOOPS 100000000

int main(void)
{
	long native = 0;
	uint32_t narrow = 0;
	uint64_t wide = 0, native_wide = 0;
	struct timeval t1, t2, t3, t4, t5;
	int i;

	/* native long increment */
	gettimeofday(&t1, NULL);
	for (i = 0; i < LOOPS; i++)
		native++;
	/* 32bit wrong-endian increment: swap in, add, swap back */
	gettimeofday(&t2, NULL);
	for (i = 0; i < LOOPS; i++)
		narrow = bswap_32(bswap_32(narrow) + 1);
	/* native 64bit increment */
	gettimeofday(&t3, NULL);
	for (i = 0; i < LOOPS; i++)
		native_wide++;
	/* 64bit wrong-endian increment */
	gettimeofday(&t4, NULL);
	for (i = 0; i < LOOPS; i++)
		wide = bswap_64(bswap_64(wide) + 1);
	gettimeofday(&t5, NULL);

	printf("long: %9lld us\n", delta(&t1, &t2));
	printf("we32: %9lld us\n", delta(&t2, &t3));
	printf("u64: %9lld us\n", delta(&t3, &t4));
	printf("we64: %9lld us\n", delta(&t4, &t5));
	printf("loops: %9d\n", LOOPS);
	return 0;
}

Four loops doing the same increment with different data types: long,
u64, we32 (wrong-endian) and we64. Compile with _no_ optimizations.
Results on my i386 notebook:

long:    453953 us
we32:    880273 us
u64:    504214 us
we64:   2259953 us
loops: 100000000

Or thereabouts, not completely stable. Widening the data from long to
u64 is about 10% slower, 32bit endianness conversion is 2x slower, 64bit
conversion is 5x slower.

However, even the we64 loop still munches through 353MB/s (100M
conversions in 2.2s, 8 bytes per conversion; double the number if you
count both conversions, to and from wrong endianness). Elsewhere in this
thread someone claimed the filesystem peaks out at 13MB/s. One might
further note that only filesystem metadata has to go through endianness
conversion, so on this particular machine it is completely lost in the
noise.

Feel free to run the program on any machine you care about. If you get
numbers to back up your position, I'm willing to be convinced.
Until then, I consider the alleged overhead of endianness conversion a
prime example of premature optimization.

Jörn

-- 
Joern's library part 7:
http://www.usenix.org/publications/library/proceedings/neworl/full_papers/mckusick.a
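
P.S. The throughput figure and the slowdown factors quoted above are
simple enough to recheck by hand, but here is a minimal sketch in C for
anyone who wants to plug in their own timings. The function names
throughput_mb_per_s() and slowdown() are invented for this illustration
and are not part of the benchmark; MB is assumed to mean 10^6 bytes.

```c
#include <stdint.h>

/*
 * Decimal megabytes per second for `count` conversions of `width_bytes`
 * each, finished in `elapsed_us` microseconds. Bytes per microsecond is
 * numerically the same as MB/s (10^6 bytes per 10^6 us).
 * Hypothetical helper, named for this example only.
 */
double throughput_mb_per_s(uint64_t count, unsigned width_bytes,
			   long long elapsed_us)
{
	return (double)count * width_bytes / (double)elapsed_us;
}

/*
 * How many times slower a loop ran than the baseline loop.
 * Hypothetical helper, named for this example only.
 */
double slowdown(long long loop_us, long long baseline_us)
{
	return (double)loop_us / (double)baseline_us;
}
```

With the we64 numbers above, throughput_mb_per_s(100000000, 8, 2259953)
comes out just under 354, and slowdown(2259953, 453953) is a little
under 5. Substitute the timings from your own machine to compare against
whatever the filesystem can actually sustain there.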