* content addressable fetching of packages, through MD5SUM or SHA1SUM?
@ 2008-03-02 19:04 Leon Woestenberg
2008-03-02 19:33 ` Koen Kooi
2008-03-03 5:39 ` Rod Whitby
0 siblings, 2 replies; 8+ messages in thread
From: Leon Woestenberg @ 2008-03-02 19:04 UTC (permalink / raw)
To: openembedded-devel
Hello,
I wonder if someone knows about a content addressable file server, or
content addressable URL server?
The idea is that the bitbake fetcher does not use the SRC_URI provided
URL per se, but can fall back to use the SHA1SUM or MD5SUM to ask
"something" for alternative locations of the package.
Such a "server" would implement a two table database:
|authentic file name|md5sum|sha1sum|
(the primary key is |md5sum|sha1sum|)
In a different table, all instances of valid URI that provide this
content are collected
|md5sum|sha1sum|URI|
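A minimal sketch of the second table, assuming a tab-separated flat file
stands in for a real database. The file name uris.tsv and the function
names add_uri and lookup_uris are made up for illustration only:

```shell
#!/bin/bash
# Sketch only: the URI table as a flat file, one row per known location:
#   md5sum<TAB>sha1sum<TAB>URI
# The default file name is hypothetical.
URI_TABLE=${URI_TABLE:-uris.tsv}

# add_uri MD5 SHA1 URI -- record a location, skipping exact duplicates.
add_uri() {
    local row
    row=$(printf '%s\t%s\t%s' "$1" "$2" "$3")
    grep -qxF -- "$row" "$URI_TABLE" 2>/dev/null ||
        printf '%s\n' "$row" >> "$URI_TABLE"
}

# lookup_uris MD5 SHA1 -- print every URI recorded for this checksum pair.
# Appending "" forces awk to compare the hashes as strings, not numbers.
lookup_uris() {
    awk -F'\t' -v m="$1" -v s="$2" \
        '($1 "") == (m "") && ($2 "") == (s "") { print $3 }' "$URI_TABLE"
}
```

The (md5sum, sha1sum) pair acts as the key, and one pair may map to any
number of URI rows.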
Now, this might be a Google SoC project, and something that Google
should have already.
Regards,
--
Leon
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: content addressable fetching of packages, through MD5SUM or SHA1SUM?
2008-03-02 19:04 content addressable fetching of packages, through MD5SUM or SHA1SUM? Leon Woestenberg
@ 2008-03-02 19:33 ` Koen Kooi
2008-03-04 9:19 ` Leon Woestenberg
2008-03-03 5:39 ` Rod Whitby
1 sibling, 1 reply; 8+ messages in thread
From: Koen Kooi @ 2008-03-02 19:33 UTC (permalink / raw)
To: openembedded-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Leon Woestenberg wrote:
| Hello,
|
|
| I wonder if someone knows about a content addressable file server, or
| content addressable URL server?
|
| The idea is that the bitbake fetcher does not use the SRC_URI provided
| URL per se, but can fall back to use the SHA1SUM or MD5SUM to ask
| "something" for alternative locations of the package.
|
|
| Such a "server" would implement a two table database:
|
| |authentic file name|md5sum|sha1sum|
|
| (the primary key is |md5sum|sha1sum|)
|
| In a different table, all instances of valid URI that provide this
| content are collected
|
| |md5sum|sha1sum|URI|
|
|
| Now, this might be a Google SoC project, and something that Google
| should have already.
what about:
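for archive in * ; do
    # skip directories and the links created by an earlier run
    [ -f "$archive" ] && [ ! -L "$archive" ] || continue
    # md5sum prints "hash  name"; keep only the hash as the link name
    ln -sf "$archive" "$(md5sum "$archive" | cut -d' ' -f1)"
    ln -sf "$archive" "$(sha256sum "$archive" | cut -d' ' -f1)"
done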
That's the quick and dirty way :) If someone comes up with a script, I
can run it on the angstrom source mirror for people to test against.
regards,
Koen
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)
iD8DBQFHywEmMkyGM64RGpERAk6GAKC7coKtqUP26q5zkV9pRc3881A5FACff8hi
XlTJA8PXuOnBxCVRWNNQCt0=
=Kg6V
-----END PGP SIGNATURE-----
* Re: content addressable fetching of packages, through MD5SUM or SHA1SUM?
2008-03-02 19:04 content addressable fetching of packages, through MD5SUM or SHA1SUM? Leon Woestenberg
2008-03-02 19:33 ` Koen Kooi
@ 2008-03-03 5:39 ` Rod Whitby
1 sibling, 0 replies; 8+ messages in thread
From: Rod Whitby @ 2008-03-03 5:39 UTC (permalink / raw)
To: openembedded-devel
Leon Woestenberg wrote:
> I wonder if someone knows about a content addressable file server, or
> content addressable URL server?
>
> The idea is that the bitbake fetcher does not use the SRC_URI provided
> URL per se, but can fall back to use the SHA1SUM or MD5SUM to ask
> "something" for alternative locations of the package.
Perhaps just put all the sources in a big git repository. Then you can
get them directly by SHA1.
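One caveat, sketched here assuming a repository that uses git's SHA-1
object format: the id git assigns to a file is not what sha1sum prints
for it, because git hashes a short header in front of the contents. The
helper name git_blob_id is made up for illustration:

```shell
#!/bin/bash
# git_blob_id FILE -- print the object id git would assign to FILE's contents.
# Git hashes the header "blob <size>\0" followed by the data, so the result
# differs from plain "sha1sum FILE".
git_blob_id() {
    local size
    size=$(( $(wc -c < "$1") ))   # arithmetic strips any padding from wc
    { printf 'blob %d\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

So a recipe that carries a plain SHA1SUM would still need a mapping from
that value to the git object id before the repository could serve the file.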
-- Rod
* Re: content addressable fetching of packages, through MD5SUM or SHA1SUM?
2008-03-02 19:33 ` Koen Kooi
@ 2008-03-04 9:19 ` Leon Woestenberg
2008-03-04 10:08 ` Leon Woestenberg
0 siblings, 1 reply; 8+ messages in thread
From: Leon Woestenberg @ 2008-03-04 9:19 UTC (permalink / raw)
To: openembedded-devel
Hello Koen,
On Sun, Mar 2, 2008 at 8:33 PM, Koen Kooi
<koen@dominion.kabel.utwente.nl> wrote:
> for archive in * ; do
>     [ -f "$archive" ] && [ ! -L "$archive" ] || continue
>     ln -sf "$archive" "$(md5sum "$archive" | cut -d' ' -f1)"
>     ln -sf "$archive" "$(sha256sum "$archive" | cut -d' ' -f1)"
> done
>
Neat, but this is a one-to-one mapping from a single hash to a single URI.
I meant an extra level of indirection, where
(md5sum,sha1sum) => table of URIs
I think this means the fetcher class must be changed so that it first
tries the proposed SRC_URI,
and then falls back to querying the above database for alternative URIs.
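A rough sketch of that fallback order, written as a standalone shell
function rather than as a change to the bitbake fetcher. The names
download and fetch_verified are hypothetical, and the candidate URIs are
assumed to be passed in with the SRC_URI first:

```shell
#!/bin/bash
# Sketch only: try each candidate URI in order and keep the first download
# whose MD5 and SHA-1 both match. These names are not bitbake API.

# download URI OUTFILE -- fetch one URI; override to use another tool.
download() {
    wget --timeout=10 -t1 -q -O "$2" "$1"
}

# fetch_verified MD5 SHA1 OUTFILE URI...
# Prints the URI that matched and returns 0; returns 1 if none matched.
fetch_verified() {
    local want_md5=$1 want_sha1=$2 out=$3 uri got_md5 got_sha1
    shift 3
    for uri in "$@"; do
        download "$uri" "$out" 2>/dev/null || continue
        got_md5=$(md5sum "$out" | cut -d' ' -f1)
        got_sha1=$(sha1sum "$out" | cut -d' ' -f1)
        if [ "$got_md5" = "$want_md5" ] && [ "$got_sha1" = "$want_sha1" ]; then
            echo "$uri"
            return 0
        fi
    done
    rm -f "$out"   # nothing matched; do not leave a bad file behind
    return 1
}
```

Checking both sums on every candidate means a mirror that serves a
different file under the same name is skipped rather than trusted.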
The neat thing would be to have a crawler that Googles the file
name for alternate locations,
fetches them, md5sums them, and inserts each one into the URI table once
it matches the (md5sum,sha1sum).
> That's the quick and dirty way :) If someone comes up with a script, I
> can run it on the angstrom source mirror for people to test against.
>
In fact, your approach is how we track datasheets here :-)
Regards,
--
Leon
* Re: content addressable fetching of packages, through MD5SUM or SHA1SUM?
2008-03-04 9:19 ` Leon Woestenberg
@ 2008-03-04 10:08 ` Leon Woestenberg
2008-03-04 10:31 ` Leon Woestenberg
0 siblings, 1 reply; 8+ messages in thread
From: Leon Woestenberg @ 2008-03-04 10:08 UTC (permalink / raw)
To: openembedded-devel
As a quick proof-of-concept for the package crawler that builds a
table of URIs:
#!/bin/bash
# Requires links, sed, grep, wget, md5sum and Google :-)
# Given a file name as argument, Googles for alternative links
# @todo Add MD5SUM,SHA1SUM to check against, reject false URIs etc
FILE=$1
# Keep the directory part of each hit on a "Cached" result line
LIST=`links -dump \
    'http://www.google.com/search?q=intitle%3A%22index+of%22+'"$FILE" |
    grep -e 'Cached' | sed 's@[ \t]*\(.*\)/[ \t]\(.*\)@\1/@' | grep -v Cached`
for URI in $LIST
do
    # Print the candidate URI, then the MD5 of whatever it serves
    echo -n "$URI$FILE" -
    wget --timeout=2 -t1 -O- "$URI$FILE" 2>/dev/null | md5sum
done
* Re: content addressable fetching of packages, through MD5SUM or SHA1SUM?
2008-03-04 10:08 ` Leon Woestenberg
@ 2008-03-04 10:31 ` Leon Woestenberg
2008-03-04 10:59 ` Koen Kooi
2008-03-04 11:07 ` Leon Woestenberg
0 siblings, 2 replies; 8+ messages in thread
From: Leon Woestenberg @ 2008-03-04 10:31 UTC (permalink / raw)
To: openembedded-devel
In trying to improve our package fetching robustness, I wrote a package
crawler. (I still wonder whether such a thing already exists...)
A new version is inlined below; usage is as follows:
Use mode #1, generate URI checksums given a package file name:
leon@precise:/tmp$ ./crawl.sh popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.netsw.org/system/libs/options/popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b su-se.lunar-linux.org/lunar/mirrors/popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.aoloser.com/dist/popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.netmax.org/Download/SOURCE/packages/extras/SOURCE/popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b be.lunar-linux.org/lunar/mirrors/popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b mirrors.isc.org/pub/MidnightBSD/distfiles/popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.ibiblio.org/pub/packages/solaris/freeware/SOURCES/popt-1.7.tar.gz
Use mode #2, find URIs that have a matching MD5SUM and SHA1SUM:
leon@precise:/tmp$ ./crawl.sh popt-1.7.tar.gz 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.netsw.org/system/libs/options/popt-1.7.tar.gz
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b su-se.lunar-linux.org/lunar/mirrors/popt-1.7.tar.gz
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.aoloser.com/dist/popt-1.7.tar.gz
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.netmax.org/Download/SOURCE/packages/extras/SOURCE/popt-1.7.tar.gz
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b be.lunar-linux.org/lunar/mirrors/popt-1.7.tar.gz
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b mirrors.isc.org/pub/MidnightBSD/distfiles/popt-1.7.tar.gz
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.ibiblio.org/pub/packages/solaris/freeware/SOURCES/popt-1.7.tar.gz
leon@precise:/tmp$
#!/bin/bash
# Requires links, wget, sed, grep, md5sum, sha1sum
# $1 package file name
# $2 md5sum to check against (optional)
# $3 sha1sum to check against (optional)
if [ -z "$1" ]; then
    echo "Provide a filename, for example: popt-1.7.tar.gz"
    exit 1
fi
FILE=`basename "$1"`
# Keep the directory part of each hit on a "Cached" result line
LIST=`links -dump \
    'http://www.google.com/search?q=intitle%3A%22index+of%22+'"$FILE" |
    grep -e 'Cached' | sed 's@[ \t]*\(.*\)/[ \t]\(.*\)@\1/@' | grep -v Cached`
cd /tmp || exit 1
for URI in $LIST
do
    rm -rf "/tmp/$FILE"
    if wget --timeout=2 -t1 -O "$FILE" "$URI$FILE" 2>/dev/null; then
        FILE_MD=`md5sum "$FILE" | sed 's@\([0-9a-f]*\)\(.*\)@\1@'`
        FILE_SHA=`sha1sum "$FILE" | sed 's@\([0-9a-f]*\)\(.*\)@\1@'`
        if [ -n "$2" ] && [ -n "$3" ]; then
            # Mode #2: report only URIs whose checksums both match
            if [ "$2" == "$FILE_MD" ] && [ "$3" == "$FILE_SHA" ]; then
                echo "[VALIDATED]" "$FILE_MD" "$FILE_SHA" "$URI$FILE"
            fi
        else
            # Mode #1: report the checksums of every file fetched
            echo "[GENERATED]" "$FILE_MD" "$FILE_SHA" "$URI$FILE"
        fi
    fi
done
* Re: content addressable fetching of packages, through MD5SUM or SHA1SUM?
2008-03-04 10:31 ` Leon Woestenberg
@ 2008-03-04 10:59 ` Koen Kooi
2008-03-04 11:07 ` Leon Woestenberg
1 sibling, 0 replies; 8+ messages in thread
From: Koen Kooi @ 2008-03-04 10:59 UTC (permalink / raw)
To: openembedded-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Leon Woestenberg wrote:
| In trying to improve our package fetching robustness, I wrote a package
| crawler. (I still wonder whether such a thing already exists...)
A small addition:
| if [ $2 == $FILE_MD -a $3 == $FILE_SHA ]; then
|     echo "[VALIDATED]" $FILE_MD $FILE_SHA $URI$FILE
else
    echo "[REJECTED]" $FILE_MD $FILE_SHA $URI$FILE
| fi
You can view the output of it running against the angstrom source mirror at:
http://www.angstrom-distribution.org/unstable/sources/likewise-crawler-output.txt
regards,
Koen
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)
iD8DBQFHzSulMkyGM64RGpERAgieAJ9KoOZfZJcSDOkul/a7wkCDX18GSQCeLRKj
X+d8btJ0J/yB2f0YjhgUWCI=
=mjr7
-----END PGP SIGNATURE-----
* Re: content addressable fetching of packages, through MD5SUM or SHA1SUM?
2008-03-04 10:31 ` Leon Woestenberg
2008-03-04 10:59 ` Koen Kooi
@ 2008-03-04 11:07 ` Leon Woestenberg
1 sibling, 0 replies; 8+ messages in thread
From: Leon Woestenberg @ 2008-03-04 11:07 UTC (permalink / raw)
To: openembedded-devel
Using a much enhanced script that iterates over my downloads/ has
resulted in my IP address being temporarily blocked by Google.
Do not try this at home.
The idea works though, and we will spend some time trying to optimize this.
Regards,
Leon.