* Fixing ipkg-make-index slowness
@ 2007-01-05  7:33 Paul Sokolovsky
  2007-01-05  9:21 ` Koen Kooi
  2007-01-05 11:52 ` Richard Purdie
  0 siblings, 2 replies; 5+ messages in thread
From: Paul Sokolovsky @ 2007-01-05  7:33 UTC (permalink / raw)
  To: openembedded-devel

Hello,

      I bragged some time ago on IRC that I had sped up ipkg-make-index
several times over, and recently the question of ipkg-make-index
slowness was brought up on the ML too. So I dug out my old patches,
wanting to submit them, but of course it turned out not to be that easy.
I've identified two causes of slowness, each discussed separately below.

      All benchmarking was done on an ipk repo of 5886 files and 245MB
total size, by running "bitbake package-index".


1. md5sum thrashing

In the summer, RP introduced a patch for i-m-i, ironically called
index_speedup.patch:
http://www.openembedded.org/bonsai/view/rev/17477/
It does the following: if a Packages file already exists, i-m-i
takes metadata from it instead of parsing the ipk's themselves. Before
this patch, such a cache was used simply when the ipk file's name matched
the filename recorded in Packages. RP added a check for file size, and
also for the md5sum of the ipk's content. That means that if you have a
Packages file and want to index a few new ipk's, i-m-i will happily
thrash over each byte of the entire package repo you have (like 26Gb).
On my repo, running "bitbake package-index" with an already existing and
up-to-date Packages led to:

i-m-i/md5sum
real    2m1.219s

i-m-i/no-md5sum
real    0m53.294s
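
For illustration, the cache check described above amounts to roughly
the following. This is a minimal sketch, not the actual i-m-i code: the
CachedEntry record and the can_reuse() helper are hypothetical names,
and modern Python 3 is assumed.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class CachedEntry:
    """Hypothetical record of what an existing Packages file holds per ipk."""
    filename: str
    size: int
    md5: str


def file_md5(path, chunk_size=1 << 20):
    """Hash the whole file; this is the step that reads every byte of the repo."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def can_reuse(entry, path, check_md5=True):
    """Return True if cached metadata may be used instead of parsing the ipk."""
    if os.path.basename(path) != entry.filename:
        return False  # the original, filename-only criterion
    if os.path.getsize(path) != entry.size:
        return False  # cheap: answered by a stat() call
    if check_md5 and file_md5(path) != entry.md5:
        return False  # expensive: reads the entire file
    return True
```

Calling can_reuse(entry, path, check_md5=False) corresponds to the
"no-md5sum" timing above: only the name and size comparisons remain.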


Richard, what were the reasons for such conservative file matching?
Filename matching should be quite enough: as per OE convention, any
package source change leading to a change in the package metadata must
lead to a bump of the recipe's PR, and that in turn updates the package
filename. Whoever doesn't follow the PR update convention is either
asking for trouble, and no checks could really help them, or knows what
they are doing (e.g. bothers to rebuild Packages from scratch). In this
regard, checking the ipk size is a great convenience for the
adventurous, because there is a really high probability that an update
of any package metadata will change the file size due to compression.
In other words, I propose to remove the md5sum check.


2. Unix process thrashing

Ok, that was the cause of slowdown with an already made Packages. Now,
the major annoyance is (re)creating it or adding a large number of
packages. This is due to ar, tar and gzip being spawned for each ipk.
Multiply Unix process handling inefficiency by thousands of files, and
we get what we have.

So, I just took the tarfile module from Python 2.3+, wrote an arfile
module (which is not in Python, and which I even failed to google), and
made them work recursively, one on top of the other. Results:

i-m-i/spawn
real    14m30.239s

i-m-i/tarfile
real    5m21.950s

(Btw, I swear I was getting a 6-7 times speedup when I initially tried
it on a Familiar buildtree.)
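
To make the in-process approach concrete, here is a minimal sketch of
reading the control file of a deb-style ipk without spawning ar, tar or
gzip. It is not the arfile module from the patch: ar_members() and
read_control() are hypothetical helpers written for modern Python 3,
they read the whole ipk into memory for simplicity, and they handle only
short member names (no GNU/BSD long-name extensions).

```python
import io
import tarfile

AR_MAGIC = b"!<arch>\n"
AR_HEADER_LEN = 60


def ar_members(data):
    """Yield (name, payload) for each member of an ar archive held in memory."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + AR_HEADER_LEN <= len(data):
        header = data[pos:pos + AR_HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError("bad ar member header")
        # Name is a 16-byte space-padded field; GNU ar appends a "/".
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        # Size is a 10-byte space-padded decimal field.
        size = int(header[48:58].decode("ascii"))
        start = pos + AR_HEADER_LEN
        yield name, data[start:start + size]
        pos = start + size + (size % 2)  # members start on even offsets


def read_control(ipk_path):
    """Return the control file text of a deb-style ipk, all in one process."""
    with open(ipk_path, "rb") as f:
        data = f.read()
    for name, payload in ar_members(data):
        if name != "control.tar.gz":
            continue
        # The ar member is itself a gzipped tarball: open it from memory.
        with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as tar:
            for member in tar:
                if member.isfile() and member.name in ("control", "./control"):
                    return tar.extractfile(member).read().decode("utf-8")
    raise ValueError("no control file found in %s" % ipk_path)
```

The point of the design is that the nested containers (ar, then gzip,
then tar) are unpacked through file-like objects in a single
interpreter, so the per-package cost is parsing rather than process
creation.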

There's a small regression though: with this change, only deb-style
ipk's are supported. This shouldn't be an issue, as OE (and current
ipkg) generates exactly such ipks. And I envy people who used to know
and still remember what the other ipk format is ;-).

The patch is posted as:
http://bugs.openembedded.org/show_bug.cgi?id=1751


I hope that this analysis and changes/patches proposed will be of use
to someone.

-- 
Best regards,
 Paul                          mailto:pmiscml@gmail.com





* Re: Fixing ipkg-make-index slowness
  2007-01-05  7:33 Fixing ipkg-make-index slowness Paul Sokolovsky
@ 2007-01-05  9:21 ` Koen Kooi
  2007-01-05  9:33   ` Paul Sokolovsky
  2007-01-05 11:52 ` Richard Purdie
  1 sibling, 1 reply; 5+ messages in thread
From: Koen Kooi @ 2007-01-05  9:21 UTC (permalink / raw)
  To: openembedded-devel

Paul Sokolovsky wrote:

> Richard, what were the reasons for such conservative file matching?
> Filename matching should be just enough, as per OE convention, any
> package source changes leading to changes in the package metadata must
> lead to bumping of package recipe's PR, and that in turn updates package
> filename.

It isn't enough. If you repackage the exact same files at a later date, the timestamps in
the tarball will have changed, and hence the md5 will be different. When a user downloads a
package, ipkg will check the md5sum present in the index and abort if it doesn't match.

regards,

Koen
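
The effect described in this reply is easy to reproduce. The following
is a small sketch in Python 3 (make_tarball() is a hypothetical helper,
and the path and timestamps are arbitrary): the same payload is packed
twice with different modification times. An uncompressed tarball is used
to keep the example deterministic; real ipk members are gzipped, and the
gzip header carries a timestamp field of its own.

```python
import hashlib
import io
import tarfile


def make_tarball(payload, mtime):
    """Pack one file into an in-memory tarball with the given mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("./usr/bin/hello")
        info.size = len(payload)
        info.mtime = mtime
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


payload = b"#!/bin/sh\necho hello\n"
first = make_tarball(payload, mtime=1167982380)
second = make_tarball(payload, mtime=1167982380 + 3600)  # repackaged an hour later

same_size = len(first) == len(second)
same_md5 = hashlib.md5(first).hexdigest() == hashlib.md5(second).hexdigest()
print("same size:", same_size)  # True: only a header field differs
print("same md5:", same_md5)    # False: the mtime is part of the hashed bytes
```

So an index entry keyed on filename (and even size) can still carry an
md5sum that no longer matches the file being served.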




* Re: Fixing ipkg-make-index slowness
  2007-01-05  9:21 ` Koen Kooi
@ 2007-01-05  9:33   ` Paul Sokolovsky
  2007-01-05  9:59     ` Koen Kooi
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Sokolovsky @ 2007-01-05  9:33 UTC (permalink / raw)
  To: Koen Kooi

Hello Koen,

Friday, January 5, 2007, 11:21:51 AM, you wrote:

> Paul Sokolovsky wrote:

>> Richard, what were the reasons for such conservative file matching?
>> Filename matching should be just enough, as per OE convention, any
>> package source changes leading to changes in the package metadata must
>> lead to bumping of package recipe's PR, and that in turn updates package
>> filename.

> It isn't enough. If you repackage the exact same files at a later date, the timestamps in
> the tarball will have changed, and hence the md5 will be different. When a user downloads a
> package, ipkg will check the md5sum present in the index and abort if it doesn't match.

  Then we probably need to separate image building from feed
creation/setup. IIRC, i-m-i is currently run during image building
too, and it steals precious minutes from each developer (vs
distro/feed maintainer) multiple times a day. Having Packages
updated only by an explicit "bitbake package-index" seems
like a good compromise and corresponds to the docs, which suggest
doing that explicitly.

> regards,

> Koen

-- 
Best regards,
 Paul                            mailto:pmiscml@gmail.com





* Re: Fixing ipkg-make-index slowness
  2007-01-05  9:33   ` Paul Sokolovsky
@ 2007-01-05  9:59     ` Koen Kooi
  0 siblings, 0 replies; 5+ messages in thread
From: Koen Kooi @ 2007-01-05  9:59 UTC (permalink / raw)
  To: openembedded-devel

Paul Sokolovsky wrote:
> Hello Koen,
> 
> Friday, January 5, 2007, 11:21:51 AM, you wrote:
> 
>> Paul Sokolovsky wrote:
> 
>>> Richard, what were the reasons for such conservative file matching?
>>> Filename matching should be just enough, as per OE convention, any
>>> package source changes leading to changes in the package metadata must
>>> lead to bumping of package recipe's PR, and that in turn updates package
>>> filename.
> 
>> It isn't enough. If you repackage the exact same files at a later date, the timestamps in
>> the tarball will have changed, and hence the md5 will be different. When a user downloads a
>> package, ipkg will check the md5sum present in the index and abort if it doesn't match.
> 
>   Then we probably need to separate image building from feed
> creation/setup. IIRC, i-m-i is currently run during image building
> too, and it steals precious minutes from each developer (vs
> distro/feed maintainer) multiple times a day. Having Packages
> updated only by an explicit "bitbake package-index" seems
> like a good compromise and corresponds to the docs, which suggest
> doing that explicitly.

You just broke image creation with your suggestion.




* Re: Fixing ipkg-make-index slowness
  2007-01-05  7:33 Fixing ipkg-make-index slowness Paul Sokolovsky
  2007-01-05  9:21 ` Koen Kooi
@ 2007-01-05 11:52 ` Richard Purdie
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Purdie @ 2007-01-05 11:52 UTC (permalink / raw)
  To: openembedded-devel

Paul,

On Fri, 2007-01-05 at 09:33 +0200, Paul Sokolovsky wrote:
> 1. md5sum thrashing
> 
> In the summer, RP introduced a patch for i-m-i, ironically called
> index_speedup.patch:

There is nothing ironic about it, it does speed up the process ;-).
md5suming the files was faster than actually parsing them.

> Richard, what were the reasons for such conservative file matching?
> Filename matching should be quite enough: as per OE convention, any
> package source change leading to a change in the package metadata must
> lead to a bump of the recipe's PR, and that in turn updates the package
> filename. Whoever doesn't follow the PR update convention is either
> asking for trouble, and no checks could really help them, or knows what
> they are doing (e.g. bothers to rebuild Packages from scratch). In this
> regard, checking the ipk size is a great convenience for the
> adventurous, because there is a really high probability that an update
> of any package metadata will change the file size due to compression.
> In other words, I propose to remove the md5sum check.

See Koen's reply.

The way forward is to store more information in the index, namely the
last modified timestamp for the file. If you add that and compare
timestamp+size, you can safely skip the md5sum check.

I can't remember why I didn't do that; it could have been a time thing.
It does mean adding a new field to the Packages file, and that could also
have had implications elsewhere, I can't remember...
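
As a rough sketch of that idea (not an existing i-m-i feature): assuming
the index stored the file size and an integer last-modified timestamp
per package, the check could look like the following in Python 3. The
unchanged() helper and its cached_size/cached_mtime parameters are
hypothetical.

```python
import os


def unchanged(path, cached_size, cached_mtime):
    """Compare size and last-modified time using a single stat() call.

    cached_mtime is assumed to be whole seconds since the epoch, as it
    would be if written to and read back from a text index.
    """
    st = os.stat(path)
    return st.st_size == cached_size and int(st.st_mtime) == cached_mtime
```

Unlike the md5sum check, this never opens the file, so its cost does not
grow with the size of the repo in bytes, only with the number of files.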

Richard



