From: Tom Lord <lord@emf.net>
To: git@vger.kernel.org
Cc: torvalds@osdl.org
Subject: on when to checksum
Date: Wed, 20 Apr 2005 15:25:07 -0700 (PDT)
Message-ID: <200504202225.PAA15992@emf.net>



Linus, 

I think you have made a mistake by moving the sha1 checksum from the
zipped form to the inflated form.  Here is why:

What you have set in motion with `git' is an ad-hoc p2p network for
sharing filesystem trees -- a global distributed filesystem.  I
believe your starter here has a good chance of taking off to be much,
much larger than just a tool for the kernel.

A subset of your work, blobs and blob databases, has much wider application
than just sharing trees:  Those parts of `git' can form a very solid 
foundation for many other applications as well.   To the extent `git'
succeeds in the context of the kernel, it will be invested in and
extended and generalized --- and the kernel project will benefit.
So don't ignore those wider applications even though they are not your
focus today: they will generate investment that feeds back to your project.

Your `git' is silent on transports and mirroring of blob databases --
tasks for scripting, sure -- but those elements won't be far behind.

Eventually, slinging around blobs as atomic elements
of payloads will become very common.

The blob handle (aka "address")/payload model of a blob db is very
clean and simple.   In a network of nodes speaking to one another
by exchanging blobs, I foresee a prominent need for intermediate
nodes that process blobs "blindly" and as quickly as possible.

Blob compression is mostly goofy if regarded just as a way to 
save on (diminishingly cheap) disk space but it is mostly 
sane if regarded as a way to cut the cost of network bandwidth
roughly in half.

Must intermediate nodes inflate the payloads that pass through them,
or that they cache, just to validate them?   That's not a desirable
outcome, for many obvious reasons.
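
To make the cost concrete, here is a minimal sketch (Python, standard
library only) of the two ways a relay could validate a blob.  The
function names are hypothetical and the blob layout is deliberately
simplified; this is not `git's actual object format.

```python
import hashlib
import zlib


def make_blob(content: bytes) -> bytes:
    """Compress content into the payload that travels between nodes."""
    return zlib.compress(content)


def id_of_compressed(payload: bytes) -> str:
    """Handle computed over the zipped bytes exactly as they are sent."""
    return hashlib.sha1(payload).hexdigest()


def id_of_inflated(payload: bytes) -> str:
    """Handle computed over the inflated content."""
    return hashlib.sha1(zlib.decompress(payload)).hexdigest()


def relay_validate_blind(handle: str, payload: bytes) -> bool:
    """Validate without inflating: one hash pass over the bytes received."""
    return hashlib.sha1(payload).hexdigest() == handle


def relay_validate_inflating(handle: str, payload: bytes) -> bool:
    """Validate a content-based handle: the relay must inflate first."""
    try:
        content = zlib.decompress(payload)
    except zlib.error:
        return False
    return hashlib.sha1(content).hexdigest() == handle
```

With a handle over the zipped form, the relay hashes only the bytes it
already holds.  With a handle over the inflated form, every relay and
cache pays for a decompress before it can check anything.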

There *are* concerns about checksumming zips: it is necessary to nail
down the zip process and make sure it is absolutely and permanently
deterministic for this application.   But *that* is the problem to
solve, not one to avoid by moving what the checksum refers to.
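
The determinism concern can be shown in a few lines.  This is only a
sketch: it uses Python's zlib binding, and the helper names are made
up for illustration.  The same content compressed at two different
levels yields different zipped bytes, so a handle over the zipped form
is stable only if the compression parameters are pinned down.

```python
import hashlib
import zlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def handles_for(content: bytes, level: int) -> tuple[str, str]:
    """Return (handle over zipped form, handle over inflated form)."""
    zipped = zlib.compress(content, level)
    return sha1_hex(zipped), sha1_hex(zlib.decompress(zipped))


content = b"the same tree entry, over and over\n" * 100

zipped_fast, inflated_fast = handles_for(content, 1)
zipped_best, inflated_best = handles_for(content, 9)

# The zlib stream header records the compression level class, so the
# zipped bytes differ between level 1 and level 9 for any input.
print("zipped handles match:  ", zipped_fast == zipped_best)
# The inflated content is identical either way.
print("inflated handles match:", inflated_fast == inflated_best)
```

Pinning the level is not enough on its own: the window size, strategy
and compressor implementation would all have to be fixed as well.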

Thanks,
-t
