* content addressable fetching of packages, through MD5SUM or SHA1SUM?
From: Leon Woestenberg @ 2008-03-02 19:04 UTC (permalink / raw)
To: openembedded-devel
Hello,
I wonder if someone knows about a content addressable file server, or
content addressable URL server?
The idea is that the bitbake fetcher does not use the SRC_URI provided
URL per se, but can fall back to use the SHA1SUM or MD5SUM to ask
"something" for alternative locations of the package.
Such a "server" would implement a two-table database. The first table
maps each checksum pair to the authentic file name:
|authentic file name|md5sum|sha1sum|
(the primary key is |md5sum|sha1sum|)
A second table collects every valid URI that provides this content:
|md5sum|sha1sum|URI|
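The two tables described above could be prototyped with nothing more than tab-separated text files. The following is a minimal sketch under that assumption, not a proposal for the real server; the file names names.tsv and uris.tsv and the functions lookup_uris and lookup_name are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Sketch only: the two "tables" are tab-separated text files.
#   names.tsv: <md5sum> TAB <sha1sum> TAB <authentic file name>
#   uris.tsv:  <md5sum> TAB <sha1sum> TAB <URI>
# File and function names are hypothetical, not part of bitbake.

# Print every URI recorded for the given (md5sum, sha1sum) pair.
lookup_uris() {
    local md5=$1 sha1=$2 table=${3:-uris.tsv}
    # Appending "" forces awk to compare as strings, never as numbers.
    awk -F '\t' -v m="$md5" -v s="$sha1" \
        '($1 "") == (m "") && ($2 "") == (s "") { print $3 }' "$table"
}

# Print the authentic file name recorded for the given pair.
lookup_name() {
    local md5=$1 sha1=$2 table=${3:-names.tsv}
    awk -F '\t' -v m="$md5" -v s="$sha1" \
        '($1 "") == (m "") && ($2 "") == (s "") { print $3 }' "$table"
}
```

With such files in place, `lookup_uris <md5sum> <sha1sum>` prints one candidate URI per line, which is the list a fetcher could walk when the SRC_URI location fails.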
Now, this might be a Google SoC project, and something that Google
should have already.
Regards,
--
Leon
* Re: content addressable fetching of packages, through MD5SUM or SHA1SUM?
From: Koen Kooi @ 2008-03-02 19:33 UTC (permalink / raw)
To: openembedded-devel
Leon Woestenberg wrote:
| Hello,
|
|
| I wonder if someone knows about a content addressable file server, or
| content addressable URL server?
|
| The idea is that the bitbake fetcher does not use the SRC_URI provided
| URL per se, but can fall back to use the SHA1SUM or MD5SUM to ask
| "something" for alternative locations of the package.
|
|
| Such a "server" would implement a two table database:
|
| |authentic file name|md5sum|sha1sum|
|
| (the primary key is |md5sum|sha1sum|)
|
| In a different table, all instances of valid URI that provide this
| content are collected
|
| |md5sum|sha1sum|URI|
|
|
| Now, this might be a Google SoC project, and something that Google
| should have already.
What about:
for archive in * ; do
    # Skip directories and the symlinks created by earlier runs
    [ -f "$archive" ] && [ ! -L "$archive" ] || continue
    # md5sum prints "<hash>  <name>"; keep only the hash as the link name
    ln -sf "$archive" "$(md5sum "$archive" | cut -d' ' -f1)"
    ln -sf "$archive" "$(sha256sum "$archive" | cut -d' ' -f1)"
done
That's the quick and dirty way :) If someone comes up with a script, I
can run it on the Angstrom source mirror for people to test against.
Regards,
Koen
* Re: content addressable fetching of packages, through MD5SUM or SHA1SUM?
From: Leon Woestenberg @ 2008-03-04 9:19 UTC (permalink / raw)
To: openembedded-devel
Hello Koen,
On Sun, Mar 2, 2008 at 8:33 PM, Koen Kooi
<koen@dominion.kabel.utwente.nl> wrote:
> for archive in * ; do
>   ln -sf $archive $(md5sum $archive)
>   ln -sf $archive $(sha256sum $archive)
> done
>
Neat, but this is a one-to-one mapping from a single hash to a single URI.
I meant an extra level of indirection, where
(md5sum,sha1sum) => table of URIs
I think this means the fetcher class must be changed so that it first
tries the proposed SRC_URI, and then falls back to querying the above
database for alternative URIs.
The neat thing would be to have a crawler that Googles the file name
for alternate locations, fetches them, checksums them, and inserts each
URI into the table once its content matches the (md5sum,sha1sum).
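The fallback order described above (SRC_URI first, then the alternatives from the database) can be sketched as a small shell function. This is only an illustration under stated assumptions: fetch_one, fetch_verified and the FETCH_CMD override are hypothetical names, nothing here is bitbake's actual fetcher code, and GNU md5sum and sha1sum are assumed to be available.

```shell
#!/bin/bash
# Sketch of "try SRC_URI first, then fall back to alternatives".
# fetch_one, fetch_verified and FETCH_CMD are hypothetical names for this
# example; they are not part of bitbake.

# Default downloader: fetch URI $1 into file $2 with wget.
fetch_one() {
    wget --timeout=10 -t1 -q -O "$2" "$1"
}

# fetch_verified <md5sum> <sha1sum> <dest> <uri>...
# Tries each URI in order and keeps the first download whose MD5 and SHA-1
# both match. Prints the URI that worked and returns 0, or returns 1 if no
# URI delivers matching content.
fetch_verified() {
    local want_md5=$1 want_sha1=$2 dest=$3
    shift 3
    local uri got_md5 got_sha1
    for uri in "$@"; do
        if "${FETCH_CMD:-fetch_one}" "$uri" "$dest" 2>/dev/null; then
            got_md5=$(md5sum "$dest" | cut -d' ' -f1)
            got_sha1=$(sha1sum "$dest" | cut -d' ' -f1)
            if [ "$got_md5" = "$want_md5" ] && [ "$got_sha1" = "$want_sha1" ]; then
                echo "$uri"
                return 0
            fi
        fi
        rm -f "$dest"   # discard failed or mismatching downloads
    done
    return 1
}
```

A caller would pass the SRC_URI location as the first URI and the database's alternatives after it, so the original location always gets the first chance.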
> That's the quick and dirty way :) If someone comes up with a script, I
> can run it on the angstrom source mirror for people to test against.
>
In fact, your approach is how we track datasheets here :-)
Regards,
--
Leon
* Re: content addressable fetching of packages, through MD5SUM or SHA1SUM?
From: Leon Woestenberg @ 2008-03-04 10:08 UTC (permalink / raw)
To: openembedded-devel
As a quick proof of concept for the package crawler that builds a
table of URIs:
#!/bin/bash
# Requires links, sed, grep, wget, md5sum and Google :-)
# Given a file name as argument, Googles for alternative links
# @todo Add MD5SUM,SHA1SUM to check against, reject false URIs etc
FILE=$1
# Keep only the directory part of each result line that has a "Cached" link
LIST=$(links -dump "http://www.google.com/search?q=intitle%3A%22index+of%22+$FILE" |
    grep -e 'Cached' | sed 's@[ \t]*\(.*\)/[ ^t]\(.*\)@\1/@' | grep -v Cached)
for URI in $LIST
do
    echo -n "$URI$FILE" -
    wget --timeout=2 -t1 -O- "$URI$FILE" 2>/dev/null | md5sum
done
* Re: content addressable fetching of packages, through MD5SUM or SHA1SUM?
From: Leon Woestenberg @ 2008-03-04 10:31 UTC (permalink / raw)
To: openembedded-devel
In trying to improve our package fetching robustness, I wrote a package
crawler. (I still wonder whether such a thing already exists...)
A new version is inlined below; usage is as follows:
Usage mode #1: generate checksums for each URI, given a package file name:
leon@precise:/tmp$ ./crawl.sh popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.netsw.org/system/libs/options/popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b su-se.lunar-linux.org/lunar/mirrors/popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.aoloser.com/dist/popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.netmax.org/Download/SOURCE/packages/extras/SOURCE/popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b be.lunar-linux.org/lunar/mirrors/popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b mirrors.isc.org/pub/MidnightBSD/distfiles/popt-1.7.tar.gz
[GENERATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.ibiblio.org/pub/packages/solaris/freeware/SOURCES/popt-1.7.tar.gz
Usage mode #2: find URIs whose content matches a given MD5SUM and SHA1SUM:
leon@precise:/tmp$ ./crawl.sh popt-1.7.tar.gz 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.netsw.org/system/libs/options/popt-1.7.tar.gz
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b su-se.lunar-linux.org/lunar/mirrors/popt-1.7.tar.gz
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.aoloser.com/dist/popt-1.7.tar.gz
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.netmax.org/Download/SOURCE/packages/extras/SOURCE/popt-1.7.tar.gz
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b be.lunar-linux.org/lunar/mirrors/popt-1.7.tar.gz
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b mirrors.isc.org/pub/MidnightBSD/distfiles/popt-1.7.tar.gz
[VALIDATED] 5988e7aeb0ae4dac8d83561265984cc9 66f3c77b87a160951b180447f4a6dce68ad2f71b www.ibiblio.org/pub/packages/solaris/freeware/SOURCES/popt-1.7.tar.gz
leon@precise:/tmp$
#!/bin/bash
# Requires links, sed, grep, wget, md5sum, sha1sum
# $1 package file name
# $2 md5sum to check against (optional)
# $3 sha1sum to check against (optional)
if [ -z "$1" ]; then
    echo "Provide a filename, for example: popt-1.7.tar.gz"
    exit 1
fi
FILE=$(basename "$1")
# Candidate directory URIs, scraped from Google's "index of" results
LIST=$(links -dump "http://www.google.com/search?q=intitle%3A%22index+of%22+$FILE" |
    grep -e 'Cached' | sed 's@[ \t]*\(.*\)/[ ^t]\(.*\)@\1/@' | grep -v Cached)
cd /tmp || exit 1
for URI in $LIST
do
    rm -f "/tmp/$FILE"
    # Download one candidate; skip it if the fetch fails
    if wget --timeout=2 -t1 -O "$FILE" "$URI$FILE" 2>/dev/null; then
        # md5sum/sha1sum print "<hash>  <name>"; keep only the hash
        FILE_MD=$(md5sum "$FILE" | cut -d' ' -f1)
        FILE_SHA=$(sha1sum "$FILE" | cut -d' ' -f1)
        if [ -n "$2" ] && [ -n "$3" ]; then
            # Mode #2: report only URIs whose content matches both checksums
            if [ "$2" = "$FILE_MD" ] && [ "$3" = "$FILE_SHA" ]; then
                echo "[VALIDATED]" "$FILE_MD" "$FILE_SHA" "$URI$FILE"
            fi
        else
            # Mode #1: report the checksums of whatever was downloaded
            echo "[GENERATED]" "$FILE_MD" "$FILE_SHA" "$URI$FILE"
        fi
    fi
done
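To connect the crawler with the |md5sum|sha1sum|URI| table proposed at the start of this thread, its [VALIDATED] lines could be appended to a flat file. The following is a rough sketch under that assumption; record_validated and the tab-separated uris.tsv layout are hypothetical and not part of crawl.sh.

```shell
#!/bin/bash
# Sketch: turn crawler output into rows of the URI table.
# record_validated and the uris.tsv layout are hypothetical.

# Reads crawl.sh output on stdin and appends one
# "<md5sum> TAB <sha1sum> TAB <URI>" row per [VALIDATED] line to the
# table file $1, skipping rows that are already present.
record_validated() {
    local table=$1 tag md5 sha1 uri row
    touch "$table"
    while read -r tag md5 sha1 uri; do
        [ "$tag" = "[VALIDATED]" ] || continue
        row=$(printf '%s\t%s\t%s' "$md5" "$sha1" "$uri")
        # -x matches whole lines, -F treats the row as a fixed string
        grep -qxF -- "$row" "$table" || printf '%s\n' "$row" >> "$table"
    done
}
```

It would be used as a filter, for example `./crawl.sh popt-1.7.tar.gz <md5sum> <sha1sum> | record_validated uris.tsv`. Lines tagged [GENERATED] are ignored on purpose, since their checksums have not been compared against a trusted value.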
* Re: content addressable fetching of packages, through MD5SUM or SHA1SUM?
From: Rod Whitby @ 2008-03-03 5:39 UTC (permalink / raw)
To: openembedded-devel
Leon Woestenberg wrote:
> I wonder if someone knows about a content addressable file server, or
> content addressable URL server?
>
> The idea is that the bitbake fetcher does not use the SRC_URI provided
> URL per se, but can fall back to use the SHA1SUM or MD5SUM to ask
> "something" for alternative locations of the package.
Perhaps just put all the sources in a big git repository. Then you can
get them directly by SHA1.
-- Rod
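One detail matters if sources are looked up in a git repository by SHA1: the ID git gives a file is not the value sha1sum prints for it. git hashes a short header ("blob", the size in bytes, and a NUL byte) followed by the content. The sketch below computes that ID without git; git_blob_id is a hypothetical helper name, and GNU sha1sum is assumed.

```shell
#!/bin/bash
# Sketch: compute the object ID git assigns to a file's content.
# git hashes "blob <size>" plus a NUL byte, followed by the file bytes,
# so the result differs from the plain sha1sum of the file.
# git_blob_id is a hypothetical helper name.

git_blob_id() {
    local file=$1 size
    size=$(( $(wc -c < "$file") ))   # arithmetic strips any padding wc adds
    { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}
```

The result should equal what `git hash-object <file>` reports, so a recipe that only carries the plain SHA1SUM would still need a mapping from that value to the blob ID before it could ask a git repository for the file.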