Git development
* RE: Mercurial 0.4b vs git patchbomb benchmark
From: Andrew Timberlake-Newell @ 2005-04-29 20:57 UTC
  To: 'Tom Lord'; +Cc: noel, seanlkml, git
In-Reply-To: <200504292026.NAA28131@emf.net>

>   > It looks to me like he did read carefully.
> 
>   > There were two different ideas:
>   >   TL)  Passing tree & diff and trusting diff to create tree
>   >   NM)  Passing tree and generating diff versus local tree for review
> 
> Well, I guess *you* didn't read carefully.  I also spoke about the
> value of passing around triples: ancestry, diff, and tree.  The
> question is about linking signatures to things that humans can
> reasonably *intend* and be reasonably held accountable for, hence one
> of the values of signed diffs.  (I cited other practical reasons to
> value signed diffs and use them in specific ways, too.)

I know that you mentioned other things.  That doesn't invalidate that Noel
was talking about your starting point description of how git works and
suggesting that it isn't how git actually works.  The relevance of your
other points depends upon having the base model correct.

You can argue that glass houses are inherently brittle, but why should I
care if mine is already made of bricks instead of glass?  If the model
against which you are arguing is not the model which is used by git, then
the model isn't a relevant basis for claiming problems with git.



* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Horst von Brand @ 2005-04-29 21:45 UTC
  To: Tom Lord; +Cc: seanlkml, git
In-Reply-To: <200504291928.MAA27145@emf.net>

Tom Lord <lord@emf.net> said:
> Think of it this way:
> 
>   (a) Joe, the mainline maintainer, gets a trusted message containing
>       a diff.
> 
>   (b) Joe reads the diff, it makes great sense, he wants to merge.
> 
>   (c) Joe downloads a tree.  Supposedly that tree is the result of
>       applying this diff.   The tree, not the diff, is used for
>       merging.
> 
> You can see the logical whole there... now the practical one:
> 
> 
>    (d) Joe is repeating (a..c) at an unfathomably high rate.
>        At a low rate, he could be double-checking enough that
>        the diff-vs-tree problem isn't that serious.  But
>        at the rate he operates, exploits appear all along the
>        patch-flow pipeline because so much stuff goes unchecked.
> 
>        Joe may scan the changes he's merged before committing, but,
>        if his rate is high, that scan *must*, out of biological and
>        physical necessity, be shallow.   Exploits can occur on the
>        submitter machine, in the communication channel, and on Joe's 
>        machine.   Social exploits can occur because of the separation
>        between a submitter saying "this is what I'm doing" vs. the reality
>        of what the submitter is doing.

Now pray tell how Joe signing one, two, three, or none of the things he is
juggling makes any difference here.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

* Re: More problems...
From: Daniel Barkalow @ 2005-04-29 21:27 UTC
  To: Junio C Hamano
  Cc: Linus Torvalds, Ryan Anderson, Petr Baudis, Russell King, git
In-Reply-To: <7vhdhp47hq.fsf@assigned-by-dhcp.cox.net>

On Fri, 29 Apr 2005, Junio C Hamano wrote:

> >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
> 
> LT> Absolutely. I use the same "git-pull-script" between two local directories 
> LT> on disk...
> LT> Of course, I don't bother with the linking. But that's the trivial part.
> 
> Would it be useful if somebody wrote local-pull.c similar to
> http-pull.c, which clones one local SHA_FILE_DIRECTORY to
> another, with an option to (1) try hardlink and if it fails
> fail; (2) try hardlink and if it fails try symlink and if it
> fails fail; (3) try hardlink and if it fails try copy and if it
> fails fail?

If someone does this, they should make a pull.c out of http-pull and
rpull; the logic for determining what you need to copy, given what you
have and what the user wants to have, should be shared.

(Note that some usage patterns only require the latest commit, or at least
can deal with fetching other stuff only when needed.)

	-Daniel
*This .sig left intentionally blank*


* [PATCH] The big git command renaming fallout fix.
From: Junio C Hamano @ 2005-04-29 21:41 UTC
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504291416190.18901@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Ok, I hate to do this, ...

Well, it was time.  This fixes the git-export which calls diff-tree.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
cd /opt/packrat/playpen/public/in-place/git/git.linus/
show-diff -p export.c
--- k/export.c  (mode:100644)
+++ l/export.c  (mode:100644)
@@ -18,7 +18,7 @@ void show_commit(struct commit *commit)
 		char *against = sha1_to_hex(commit->parents->item->object.sha1);
 		printf("\n\n======== diff against %s ========\n", against);
 		fflush(NULL);
-		sprintf(cmdline, "diff-tree -p %s %s", against, hex);
+		sprintf(cmdline, "git-diff-tree -p %s %s", against, hex);
 		system(cmdline);
 	}
 	printf("======== end ========\n\n");





* Re: [PATCh] jit-trackdown
From: Daniel Barkalow @ 2005-04-29 21:40 UTC
  To: Junio C Hamano; +Cc: David Greaves, GIT Mailing Lists
In-Reply-To: <7voebx4dyd.fsf@assigned-by-dhcp.cox.net>

On Fri, 29 Apr 2005, Junio C Hamano wrote:

> Have toilet side gitters reached a consensus (or semi-consensus)
> on how things under .git/ should be organized?  Is there a
> summary somewhere, something along the following lines?

I've made a proposal like the following:

.git/
  objects/    (traditional)
  refs/       Directories of hex SHA1 + newline files
    heads/    Commits which are heads of various sorts
    tags/     Tags, by the tag name (or some local renaming of it)
  info/       Other shared information
    remotes
  ...         Everything else isn't shared
  HEAD        Symlink to refs/heads/<something>

The plumbing doesn't care what you name heads or tags, but expects things
in heads to be commit objects and things in tags to be tag objects (which
can tag whatever).

AFAICT, there is general consensus that this is how things should be, but
I haven't convinced Linus that the plumbing should know about anything
other than objects/.

	-Daniel
*This .sig left intentionally blank*
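Daniel's layout stores each ref as a file holding a hex SHA1 followed by a newline. As a minimal C sketch of what reading such a file involves, assuming exactly 40 hex digits plus one newline and accepting either letter case, the contents can be validated and turned into the 20-byte binary id. `parse_ref` and `hexval` are hypothetical names, not functions from the git tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if not hex. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical reader for the contents of a refs/heads or refs/tags
 * file: exactly 40 hex digits followed by one newline.  Fills sha1[]
 * and returns 0 on success, returns -1 if the contents are malformed.
 */
int parse_ref(const char *buf, size_t len, unsigned char sha1[20])
{
	size_t i;

	assert(buf != NULL);
	if (len != 41 || buf[40] != '\n')
		return -1;
	for (i = 0; i < 20; i++) {
		int hi = hexval(buf[2 * i]);
		int lo = hexval(buf[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		sha1[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}
```

Rejecting anything but exactly 41 bytes is a deliberately strict choice in this sketch, so a truncated ref file is reported rather than misread.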


* Re: The big git command renaming..
From: Dave Jones @ 2005-04-29 21:35 UTC
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504291416190.18901@ppc970.osdl.org>

On Fri, Apr 29, 2005 at 02:24:43PM -0700, Linus Torvalds wrote:
 > 
 > Ok, I hate to do this, since my fingers have already gotten used to the 
 > old names, but we clearly can't continue to use command names like 
 > "update-cache" or "read-tree" that are totally non-git-specific.
 > 
 > So I just pushed out a change that renames the commands to always have a 
 > "git-" prefix. In addition, I renamed "show-diff" to "diff-files", which
 > together with the prefix means that it becomes "git-diff-files" when used.
 > 
 > Since I end up using tab-completion for almost all my work, and since
 > -within- the source directory there's no confusion, I didn't actually name
 > the source files with any git- prefix. Quite the reverse: I removed the
 > prefix from the two .c files that already had it (so git-mktag.c is now
 > just "mktag.c"), and the general rule for building the executable from a C 
 > file is now
 > 
 > 	git-%: %.c $(LIB_FILE)
 > 		$(CC) $(CFLAGS) -o $@ $(filter %.c,$^) $(LIBS)
 > 
 > 
 > this seemed to be a nice regular interface that means that binaries get 
 > installed with clear "git-" prefixes, but that I don't have to look at 
 > them when I edit the sources.

Can you push out a new tarball to kernel.org too please, to kill
some potential confusion in documentation/scripts?

		Dave



* Re: More problems...
From: Russell King @ 2005-04-29 21:19 UTC
  To: Junio C Hamano; +Cc: Linus Torvalds, Ryan Anderson, Petr Baudis, git
In-Reply-To: <7vhdhp47hq.fsf@assigned-by-dhcp.cox.net>

On Fri, Apr 29, 2005 at 02:07:29PM -0700, Junio C Hamano wrote:
> >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
> 
> LT> Absolutely. I use the same "git-pull-script" between two local directories 
> LT> on disk...
> LT> Of course, I don't bother with the linking. But that's the trivial part.
> 
> Would it be useful if somebody wrote local-pull.c similar to
> http-pull.c, which clones one local SHA_FILE_DIRECTORY to
> another, with an option to (1) try hardlink and if it fails
> fail; (2) try hardlink and if it fails try symlink and if it
> fails fail; (3) try hardlink and if it fails try copy and if it
> fails fail?

What would be nice is if it finds an existing file for the one it's
trying to hard link, it compares the contents (maybe - is this actually
necessary?) and if identical, it removes the original file replacing
it with a hard link.

This means that you'll always be trying to maintain the hard linked
structure between various working trees in the background.

But maybe this should have an option to enable this behaviour.

-- 
Russell King
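Russell's relink idea can be sketched in C with POSIX calls. This is a minimal sketch, assuming both paths are on the same filesystem: if the two names already share an inode nothing is done; otherwise the bytes are compared and, only if identical, the duplicate is replaced by a hard link. Linking under a temporary name and then calling rename(), so the duplicate path never goes missing, is an assumption of this sketch rather than something proposed in the thread; `relink_pair`, `same_contents` and the `.relink-tmp` suffix are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

enum relink_result { RELINK_ALREADY, RELINK_DONE, RELINK_DIFFER, RELINK_ERROR };

/* Byte-by-byte comparison; returns 1 only if both files are identical. */
static int same_contents(const char *a, const char *b)
{
	FILE *fa = fopen(a, "rb");
	FILE *fb = fopen(b, "rb");
	int same = 0;

	if (fa && fb) {
		int ca, cb;

		do {
			ca = getc(fa);
			cb = getc(fb);
		} while (ca == cb && ca != EOF);
		same = (ca == cb);	/* true only if both hit EOF together */
	}
	if (fa)
		fclose(fa);
	if (fb)
		fclose(fb);
	return same;
}

/* Hypothetical: replace 'dup' with a hard link to 'keep' if the
 * two files have identical contents. */
enum relink_result relink_pair(const char *keep, const char *dup)
{
	struct stat sk, sd;
	char tmp[4096];

	assert(keep != NULL && dup != NULL);
	if (stat(keep, &sk) < 0 || stat(dup, &sd) < 0)
		return RELINK_ERROR;
	if (sk.st_dev == sd.st_dev && sk.st_ino == sd.st_ino)
		return RELINK_ALREADY;	/* expected to be the common case */
	if (!same_contents(keep, dup)) {
		fprintf(stderr, "warning: %s and %s differ\n", keep, dup);
		return RELINK_DIFFER;
	}
	if (snprintf(tmp, sizeof(tmp), "%s.relink-tmp", dup) >= (int)sizeof(tmp))
		return RELINK_ERROR;
	/* Link under a temporary name, then rename() over 'dup' so the
	 * path is never missing. */
	if (link(keep, tmp) < 0)
		return RELINK_ERROR;
	if (rename(tmp, dup) < 0) {
		unlink(tmp);
		return RELINK_ERROR;
	}
	return RELINK_DONE;
}
```

Running it a second time on the same pair returns RELINK_ALREADY after a single stat() of each file, which is what makes a background relink pass over mostly-linked trees cheap.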


* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Matt Mackall @ 2005-04-29 21:20 UTC
  To: Linus Torvalds; +Cc: Sean, linux-kernel, git
In-Reply-To: <Pine.LNX.4.58.0504291338540.18901@ppc970.osdl.org>

On Fri, Apr 29, 2005 at 01:49:18PM -0700, Linus Torvalds wrote:
> 
> 
> On Fri, 29 Apr 2005, Matt Mackall wrote:
> > 
> > The changeset log (and everything else) has an external index.
> 
> I don't actually know exactly how the BK changeset file works, but your 
> explanation really sounds _very_ much like it.

I've never used BK, but I got the impression that it was all SCCS
under the covers, which means adding stuff and reconstructing random
versions is expensive (just as it is in CVS). The split between index
and data in Mercurial is intended to address that.
 
> I didn't want to do anything that even smelled of BK. Of course, part of
> my reason for that is that I didn't feel comfortable with a delta model at
> all (I wouldn't know where to start, and I hate how they always end up
> having different rules for "delta"ble and "non-delta"ble objects).

There aren't really any such rules here. While the index contains a
full DAG, the deltas are done opportunistically on a linearized
(topologically sorted) version of it. We try to make a delta against
the previous tip (regardless of whether or not it's the parent), and
if that is a win, we store it.

> So it sounds like it could work fine, but it in fact sounds so much like 
> the ChangeSet file that I'd personally not have done it that way. 

Well I originally set out to do it differently, but I decided my
current approach was the fastest route to something that actually
worked.

-- 
Mathematics is the supreme nostalgia of our time.
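Matt's description of opportunistic deltas against the previous tip can be illustrated with a toy C sketch. It assumes that "a win" means the encoded delta is smaller than the full text, and it uses a crude common-prefix/common-suffix delta with an assumed 12-byte header. This is not Mercurial's actual delta format, and `make_delta`, `apply_delta` and `store_as_delta` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy delta: keep a prefix and a suffix shared with the previous tip
 * and replace the bytes in between.  Not Mercurial's real format. */
struct toy_delta {
	size_t start;		/* length of the shared prefix */
	size_t end;		/* offset in the previous tip where the shared suffix starts */
	const char *data;	/* replacement bytes, pointing into the new text */
	size_t len;		/* number of replacement bytes */
};

/* Assumed on-disk cost of storing start, end and len. */
#define TOY_DELTA_HEADER 12

struct toy_delta make_delta(const char *prev, size_t plen,
			    const char *next, size_t nlen)
{
	struct toy_delta d;
	size_t pre = 0, suf = 0;

	while (pre < plen && pre < nlen && prev[pre] == next[pre])
		pre++;
	/* The suffix may not overlap the prefix in either text. */
	while (suf < plen - pre && suf < nlen - pre &&
	       prev[plen - 1 - suf] == next[nlen - 1 - suf])
		suf++;
	d.start = pre;
	d.end = plen - suf;
	d.data = next + pre;
	d.len = nlen - pre - suf;
	return d;
}

/* Rebuild the new text; 'out' needs start + len + (plen - end) + 1 bytes. */
void apply_delta(const char *prev, size_t plen, const struct toy_delta *d,
		 char *out)
{
	assert(d->start <= d->end && d->end <= plen);
	memcpy(out, prev, d->start);
	memcpy(out + d->start, d->data, d->len);
	memcpy(out + d->start + d->len, prev + d->end, plen - d->end);
	out[d->start + d->len + plen - d->end] = '\0';
}

/* Returns 1 if a delta against the previous tip is a win (smaller
 * than the full text), 0 if the full text should be stored instead. */
int store_as_delta(const char *prev_tip, const char *next)
{
	size_t plen = strlen(prev_tip), nlen = strlen(next);
	struct toy_delta d = make_delta(prev_tip, plen, next, nlen);

	return TOY_DELTA_HEADER + d.len < nlen;
}
```

In this sketch the same test applies to every revision: a revision unrelated to the previous tip simply yields a large delta and falls back to full-text storage, so no separate rules for "delta-able" objects are needed.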

* The big git command renaming..
From: Linus Torvalds @ 2005-04-29 21:24 UTC
  To: Git Mailing List


Ok, I hate to do this, since my fingers have already gotten used to the 
old names, but we clearly can't continue to use command names like 
"update-cache" or "read-tree" that are totally non-git-specific.

So I just pushed out a change that renames the commands to always have a 
"git-" prefix. In addition, I renamed "show-diff" to "diff-files", which
together with the prefix means that it becomes "git-diff-files" when used.

Since I end up using tab-completion for almost all my work, and since
-within- the source directory there's no confusion, I didn't actually name
the source files with any git- prefix. Quite the reverse: I removed the
prefix from the two .c files that already had it (so git-mktag.c is now
just "mktag.c"), and the general rule for building the executable from a C 
file is now

	git-%: %.c $(LIB_FILE)
		$(CC) $(CFLAGS) -o $@ $(filter %.c,$^) $(LIBS)


this seemed to be a nice regular interface that means that binaries get 
installed with clear "git-" prefixes, but that I don't have to look at 
them when I edit the sources.

Sorry to everybody else whose fingers have already learnt the old names. 
The good news is that if you use cogito, you won't care.

		Linus

* Re: git network protocol
From: Daniel Barkalow @ 2005-04-29 21:15 UTC
  To: David Lang; +Cc: git
In-Reply-To: <Pine.LNX.4.62.0504291333550.7439@qynat.qvtvafvgr.pbz>

On Fri, 29 Apr 2005, David Lang wrote:

> would it make sense for the network git protocol to be something along the 
> lines of
> 
> client contacts server and sends
> the tag you want to sync with (defaults to head)
> the local index file

Actually, you really want to have a bidirectional interaction, where the
client first fetches the info to determine where to start, and then goes
through the reachable space, asking for anything it doesn't already have.

(In the long run, we want to keep track of some things we already have all
of, or know we're missing, etc., so the receiver side doesn't have to
look over its whole tree.)

git already includes two versions of this protocol; the first runs against
a static HTTP server, and the second uses ssh to get a socket. At some
point, I'm going to enable these programs to read and write
.git/refs/?/? to figure out what they're supposed to get.

	-Daniel
*This .sig left intentionally blank*
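The client-driven walk Daniel describes can be sketched in C on a toy in-memory graph. Assumptions: objects are numbered 0..n-1 rather than named by SHA1, the graph being fully known stands in for fetching each object to learn what it references, and the walk covers the whole reachable space (the "already have all of" shortcut Daniel mentions is left out). `struct toy_obj` and `collect_missing` are hypothetical.

```c
#include <assert.h>

#define MAX_OBJS 64
#define MAX_REFS 4

/* Toy object graph: each object (commit, tree or blob) names up to
 * MAX_REFS other objects by index.  Real git names them by SHA1. */
struct toy_obj {
	int nrefs;
	int refs[MAX_REFS];
};

/*
 * Walk everything reachable from 'start' and record, in walk order,
 * each object the local side does not have (have[i] == 0).  'want'
 * needs room for nobjs entries.  Returns the number written.
 */
int collect_missing(const struct toy_obj *objs, int nobjs, int start,
		    const int *have, int *want)
{
	int seen[MAX_OBJS] = { 0 };
	int stack[MAX_OBJS];
	int sp = 0, nwant = 0;

	assert(nobjs <= MAX_OBJS && start >= 0 && start < nobjs);
	stack[sp++] = start;
	seen[start] = 1;
	while (sp > 0) {
		int id = stack[--sp];
		int i;

		if (!have[id])
			want[nwant++] = id;	/* ask the other side for it */
		for (i = 0; i < objs[id].nrefs; i++) {
			int r = objs[id].refs[i];

			if (!seen[r]) {		/* each object is pushed once */
				seen[r] = 1;
				stack[sp++] = r;
			}
		}
	}
	return nwant;
}
```

For a new commit whose parent the client already has, only the new commit, its new tree and the new blobs end up in `want`; shared objects are visited but not requested.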


* Re: More problems...
From: Junio C Hamano @ 2005-04-29 21:07 UTC
  To: Linus Torvalds; +Cc: Ryan Anderson, Petr Baudis, Russell King, git
In-Reply-To: <Pine.LNX.4.58.0504291311320.18901@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Absolutely. I use the same "git-pull-script" between two local directories 
LT> on disk...
LT> Of course, I don't bother with the linking. But that's the trivial part.

Would it be useful if somebody wrote local-pull.c similar to
http-pull.c, which clones one local SHA_FILE_DIRECTORY to
another, with an option to (1) try hardlink and if it fails
fail; (2) try hardlink and if it fails try symlink and if it
fails fail; (3) try hardlink and if it fails try copy and if it
fails fail?

Then from a source repository that contains good stuff plus
throwaway experimental commits you can prepare a pruned for-public
tree.  Of course you can do it today by copying and then running
git-prune in the destination, though.
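Junio's three options map onto a small POSIX helper. This is a minimal C sketch, not an existing git program: `install_object`, `enum fallback` and `copy_file` are hypothetical, and treating an already existing destination (EEXIST) as success is an assumption based on object files being named by their content.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

enum fallback { FALLBACK_NONE, FALLBACK_SYMLINK, FALLBACK_COPY };

/* Plain byte-for-byte copy, used by the third option. */
static int copy_file(const char *src, const char *dst)
{
	FILE *in, *out;
	char buf[8192];
	size_t n;
	int err = 0;

	in = fopen(src, "rb");
	if (!in)
		return -1;
	out = fopen(dst, "wb");
	if (!out) {
		fclose(in);
		return -1;
	}
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			err = -1;
			break;
		}
	}
	if (ferror(in))
		err = -1;
	fclose(in);
	if (fclose(out) != 0)
		err = -1;
	return err;
}

/*
 * Hypothetical local-pull step: make object 'src' available at 'dst'.
 * A hard link is always tried first; 'mode' picks the fallback.
 * Returns 0 on success, -1 on failure.
 */
int install_object(const char *src, const char *dst, enum fallback mode)
{
	assert(src != NULL && dst != NULL);
	if (link(src, dst) == 0 || errno == EEXIST)
		return 0;	/* linked, or the object is already there */
	switch (mode) {
	case FALLBACK_SYMLINK:
		return symlink(src, dst) == 0 ? 0 : -1;
	case FALLBACK_COPY:
		return copy_file(src, dst);
	default:
		return -1;	/* option (1): hard link or nothing */
	}
}
```

For the symlink option, `src` should be an absolute path, since a relative symlink target is resolved against the directory containing `dst`. A more careful copy would write to a temporary name and rename it into place, so that an interrupted copy cannot leave a truncated object behind.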



* Next problem: cg-commit
From: Russell King @ 2005-04-29 20:51 UTC
  To: git

Unfortunately, cg-commit seems to return the wrong exit status, returning
1 on success.  Eg:

$ cg-commit
arch/arm/mach-ixp2000/pci.c
include/asm-arm/arch-ixp2000/platform.h
Enter commit message, terminated by ctrl-D on a separate line:
blah blah blah
Committed as fafb525292acc9c0818b91b1d8e58cf770616542.
$ echo $?
1

It appears that [ "$merging" ] towards the end of cg-commit is the
cause of this odd behaviour: when $merging is empty the test fails,
and its status of 1 becomes the script's exit status.  Force zero
exit status, since we successfully completed.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>

--- cg-commit.old	2005-04-26 04:02:01.000000000 +0100
+++ cg-commit	2005-04-29 21:47:57.161333483 +0100
@@ -114,6 +114,7 @@
 	echo "Committed as $newhead."
 	echo $newhead >.git/HEAD
 	[ "$merging" ] && rm .git/merging .git/merging-sym .git/merge-base
+	exit 0
 else
 	die "error during commit (oldhead $oldhead, treeid $treeid)"
 fi


-- 
Russell King


* Re: Odd decision of git-pasky-0.7 to do a merge
From: Russell King @ 2005-04-29 20:52 UTC
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0504291043060.18901@ppc970.osdl.org>

On Fri, Apr 29, 2005 at 10:44:29AM -0700, Linus Torvalds wrote:
> On Fri, 29 Apr 2005, Russell King wrote:
> > Why it decided that a merge was necessary is beyond me.  Any ideas?
> > Did Linus forget to merge his tree properly?
> 
> It looks like it was unable to find the right common ancestor.
> 
> If you only had my stuff in it, the common ancestor _should_ have been the 
> parent (c60c390620e0abb60d4ae8c43583714bda27763f), which _should_ have 
> been your old top.
> 
> But maybe merge-base didn't work right?

Yup - pasky-0.7 came out with some weird commit-id, but cogito-0.8
got it right.  Now using cogito-0.8 here, so I'm no longer concerned
about this particular problem.

-- 
Russell King


* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Linus Torvalds @ 2005-04-29 20:49 UTC
  To: Matt Mackall; +Cc: Sean, linux-kernel, git
In-Reply-To: <20050429202341.GB21897@waste.org>



On Fri, 29 Apr 2005, Matt Mackall wrote:
> 
> The changeset log (and everything else) has an external index.

I don't actually know exactly how the BK changeset file works, but your 
explanation really sounds _very_ much like it.

I didn't want to do anything that even smelled of BK. Of course, part of
my reason for that is that I didn't feel comfortable with a delta model at
all (I wouldn't know where to start, and I hate how they always end up
having different rules for "delta"ble and "non-delta"ble objects).

But another was that exactly since I've been using BK for so long, I
wanted to make sure that my model just emulated the way I've been _using_
BK, rather than any BK technical details.

So it sounds like it could work fine, but it in fact sounds so much like 
the ChangeSet file that I'd personally not have done it that way. 

			Linus

* Re: Val Henson's critique of hash-based content storage systems
From: Morten Welinder @ 2005-04-29 20:47 UTC
  To: Rob Jellinghaus; +Cc: git
In-Reply-To: <loom.20050429T015434-928@post.gmane.org>

On 4/28/05, Rob Jellinghaus <robj@unrealities.com> wrote:
> I assume most people here have read this, but just in case:
> 
> http://www.usenix.org/events/hotos03/tech/full_papers/henson/henson.pdf

The math in section 3 is bogus.  1-(1-2^-b)^n  isn't hard to compute and
even if it were, it is the wrong formula.  (Set n==2^b; you obviously should
get probability 1 for collision.)

The right formula is 1-B!/B^n/(B-n)! where B=2^b.  For n=2^80 and b=160
you get about 39%.

Morten
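Written out, Morten's birthday-problem formula and the approximation behind his 39% figure are as follows, with b-bit hashes, B = 2^b possible values and n stored objects. The last step uses the standard approximation 1 - x ≈ e^{-x} for small x, which is an addition here rather than something stated in the message.

```latex
\[
  P_{\mathrm{collision}}(n) = 1 - \prod_{k=0}^{n-1}\left(1 - \frac{k}{B}\right)
                            = 1 - \frac{B!}{B^{n}\,(B-n)!},
  \qquad B = 2^{b}
\]
\[
  P_{\mathrm{collision}}(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2B}\right)
\]
\[
  b = 160,\; n = 2^{80}:\qquad
  \frac{n(n-1)}{2B} \approx \frac{2^{160}}{2^{161}} = \frac{1}{2},
  \qquad
  P_{\mathrm{collision}} \approx 1 - e^{-1/2} \approx 0.39
\]
```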

* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Tom Lord @ 2005-04-29 20:44 UTC
  To: noel; +Cc: noel, seanlkml, git
In-Reply-To: <20050429202117.GA15417@uglybox.localnet>



  > Your example had Joe reviewing a signed diff, and then applying changes
  > from a tree that "supposedly" had the diff applied correctly, but may
  > have been corrupted. If the tree was not an accurate representation of
  > applying the diff, then the changes Joe applied to his tree will be
  > different than those that he reviewed.

That's right.   I'm saying that Joe needn't rely on the tree at all since
he should be having his tools verify its contents anyway.  Given that, 
he may as well have his tools *generate* the tree.  Having generated the tree,
it's gravy to then verify that it matches the tree the submitter thought he
was sending -- that's a *secondary* checksum where `git' currently uses
it as primary.


  > My example had Joe downloading a remote signed tree, reviewing the changes
  > locally between his own trusted tree and the remote tree, 

In the real world, that "review" step is the weak link.  When it goes
wrong, the first step is to make sure we are reviewing a tree everyone
involved *intended* -- and it's only with signed diffs adding up to
that tree that we get there.

-t

* git network protocol
From: David Lang @ 2005-04-29 20:42 UTC
  To: git
In-Reply-To: <20050429202117.GA15417@uglybox.localnet>

would it make sense for the network git protocol to be something along the 
lines of

client contacts server and sends
the tag you want to sync with (defaults to head)
the local index file

then the server can use the git tools locally to figure out what objects 
need to be sent to do the merge and only send those objects.

No, this isn't as efficient as only sending diffs, but it avoids sending
any objects that aren't needed (which would be sent if you just did a
straight rsync).

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Matt Mackall @ 2005-04-29 20:39 UTC
  To: Andrea Arcangeli; +Cc: Linus Torvalds, linux-kernel, git
In-Reply-To: <20050429203027.GK17379@opteron.random>

On Fri, Apr 29, 2005 at 10:30:27PM +0200, Andrea Arcangeli wrote:
> On Thu, Apr 28, 2005 at 11:01:57PM -0700, Matt Mackall wrote:
> > change nodes so you've got to potentially traverse all the commits to
> > reconstruct a file's history. That's gonna be O(top-level changes)
> > seeks. This introduces a number of problems:
> > 
> > - no way to easily find previous revisions of a file
> >   (being able to see when a particular change was introduced is a
> >   pretty critical feature)
> > - no way to do bandwidth-efficient delta transfer
> > - no way to do efficient delta storage
> > - no way to do merges based on the file's history[1]
> 
> And IMHO there's also no way to implement an efficient git-on-the-fly
> network protocol if tons of clients connect at the same time; it would be
> DoS-able etc... At the very least such a system would require a huge
> amount of RAM. So I see the only efficient way to design a network
> protocol for git is not to use git, but to import the data into mercurial
> and to implement the network protocol on top of mercurial.
> 
> The one downside is that git is sort of rock solid in the way it stores
> data on disk, it makes rsync usage trivial too, the git fsck is reliable
> and you can just sign the hash of the root of the tree and you sign
> everything including file contents. And of course the checkin is
> absolutely trivial and fast too.

Mercurial is amenable to rsync provided you devote a read-only
repository to it on the client side. In other words, you rsync from
kernel.org/mercurial/linus to local/linus and then you merge from
local/linus to your own branch. Mercurial's hashing hierarchy is
similar to git's (and Monotone's), so you can sign a single hash of
the tree as well.

> With a more efficient diff-based storage like mercurial we'd be losing
> those fsck properties etc., but those reliability properties aren't worth
> the network and disk space they take IMHO, and the checkin time
> shouldn't be substantially different (still running in O(1) when
> appending at the head). And we could always store the hash of the
> changeset, to give it some basic self-checking.

I think I can implement a decent repository check similar to git, it's
just not been a priority.

-- 
Mathematics is the supreme nostalgia of our time.

* Re: Val Henson's critique of hash-based content storage systems
From: C. Scott Ananian @ 2005-04-29 20:41 UTC
  To: Tom Lord; +Cc: git, robj
In-Reply-To: <200504292037.NAA28344@emf.net>

On Fri, 29 Apr 2005, Tom Lord wrote:

> My point is simply that blob-db implementations should assume that the
> mathematicians will succeed and take the small steps necessary to make
> sure that those bitstrings can't be used to crash a distributed
> blob-db infrastructure.

And my point is that you haven't *begun* to describe how one might use an 
arbitrary hash collision to "crash a distributed blob-db infrastructure".

Remember, first you've got to get some reference to your collision into 
the db...  (and if you can do that, why are you mucking around with hash 
collisions?)
   --scott

Philadelphia PBPRIME STANDEL for Dummies milita Richard Tomlinson 
ESSENCE SUMAC Nader KUCLUB WSHOOFS QKENCHANT AK-47 AMQUACK supercomputer
                          ( http://cscott.net/ )

* Re: Val Henson's critique of hash-based content storage systems
From: Tom Lord @ 2005-04-29 20:37 UTC
  To: cscott; +Cc: git, robj
In-Reply-To: <Pine.LNX.4.61.0504291608410.32145@cag.csail.mit.edu>


  lord:

  > I would expect someone to have on hand a small number of blobs that are
  > different but have the same hashes and, eventually, to drop said files
  > into a blob-based infrastructure to wreak havoc.

  cscott:
  
  This is just ridiculous.  The number of known collisions in SHA1 is 
  *exactly zero* at this point in time --- not guaranteed to stay that way, 
  of course, but generating collisions is likely to remain relatively 
  expensive for some time.

Blob-dbs and the low-level object system (trees, file-contents, and
changesets) are pretty fundamental things.  It is likely (and
desirable) -- not guaranteed but likely (and desirable) -- that people
will invest heavily in building infrastructure that operates solely at
that level of abstraction.  Arguably, that is already happening.

Simultaneously, it is very desirable that some mathematician somewhere
will discover two bitstrings which are different but have the same SHA1
checksum, and then tell everyone in the world about their discovery.

My point is simply that blob-db implementations should assume that the
mathematicians will succeed and take the small steps necessary to make
sure that those bitstrings can't be used to crash a distributed
blob-db infrastructure.

-t
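Tom doesn't spell out what the "small steps" are. One common defensive step, sketched here in C as an assumption rather than as his proposal, is never to trust a hash match alone when storing: if an object with the same name already exists, compare the bytes and report a collision instead of silently treating the new data as already present. The toy hash below is deliberately weak so that a collision is easy to demonstrate; `store_blob` and `toy_hash` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define STORE_SLOTS 256

enum store_result { STORE_NEW, STORE_DUP, STORE_COLLISION };

/* Deliberately weak stand-in for SHA1 (sum of bytes), so that
 * "ab" and "ba" collide. */
static unsigned toy_hash(const char *s)
{
	unsigned h = 0;

	while (*s)
		h += (unsigned char)*s++;
	return h % STORE_SLOTS;
}

/* Toy content-addressed store: one blob per hash value.  Only the
 * pointer is kept, so callers must keep the blob alive. */
static const char *store[STORE_SLOTS];

/*
 * Insert a blob, but never trust the hash alone: if the slot is
 * taken, compare the bytes and report a collision instead of
 * silently treating the new blob as already stored.
 */
enum store_result store_blob(const char *blob)
{
	unsigned key;

	assert(blob != NULL);
	key = toy_hash(blob);
	if (!store[key]) {
		store[key] = blob;
		return STORE_NEW;
	}
	if (strcmp(store[key], blob) == 0)
		return STORE_DUP;
	return STORE_COLLISION;
}
```

Storing "ab" returns STORE_NEW, storing it again returns STORE_DUP, and storing "ba" afterwards returns STORE_COLLISION rather than being mistaken for the existing object.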



* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Morgan Schweers @ 2005-04-29 20:16 UTC
  To: git; +Cc: Tom Lord
In-Reply-To: <200504291954.MAA27561@emf.net>

Greetings,

On 4/29/05, Tom Lord <lord@emf.net> wrote:
> 
> 
>   > Call me a naive git, but seems to me the "git way" is a little
>   > different. It's tree-based rather than diff-based, and doesn't involve
>   > passing diffs around, right?
> 
> Isn't that a significant part of what I said?  Go back and read more
> carefully, is my suggestion.
> 
>   > Or am I missing something?
> 
> Very much so.

It doesn't appear that he is.  You appeared to predicate your argument
on the 'auditor' believing a diff looks good, but getting a tree
instead, that might not reflect the diff.

Instead, in the git-world, the auditor actually gets a tree, and
produces the diff themselves, and then decides whether the diff looks
good enough to keep.

The argument about the high velocity of git-transfers causing the
inability to check doesn't appear to apply here, because the
distributed development environment of Linux says that the
'gatekeepers' ARE in fact validating that the changes from people in
their area of expertise are good (or are relying on sub-gatekeepers), and
then Linus is trusting them completely.

This seems like the methodology that was used previously with BK.
Git doesn't change that, and in fact supports that method
of development.

Your further suggestion that Linus could be replaced by a
patch-manager, in that case, got a chuckle from me at least, but the
more serious point is that Linus is necessary as the arbiter of who
actually receives the absolute trust of a gatekeeper.  He is, in
effect, a meta-gatekeeper.

> -t

In reading this conversation, it seems you're looking for a more
absolute standard of trust than the kernel developers are working
with.  I believe this is an example of 'good enough' process being
accepted, versus 'perfect' process.

--  Morgan Schweers

* Re: More problems...
From: Linus Torvalds @ 2005-04-29 20:21 UTC
  To: Ryan Anderson; +Cc: Petr Baudis, Russell King, git
In-Reply-To: <20050429195055.GE1233@mythryan2.michonline.com>



On Fri, 29 Apr 2005, Ryan Anderson wrote:
> 
> Why not just use "rsync" for both remote and local synchronization, and
> provide a "relink" command to scan two .git/objects/ repositories and
> hardlink matching files together?

Absolutely. I use the same "git-pull-script" between two local directories 
on disk. The only issue there is that you have to give the ".git" 
directory, ie you should do

	git-pull-script ~/by/other/repository/.git

instead of pointing to the other repo's root.

Of course, I don't bother with the linking. But that's the trivial part.

> With the SHA1 hash, you can even have a --unsafe option that just
> compares the hash names and does a link based purely off of that and the
> stat(2) results of both files.  (I'd expect that a ... safer variant
> would extract both files and compare them, but the --unsafe should be
> sufficient, in practice, I would think.)

I don't think there is any point to unsafe. The assumption is that if you 
do things this way, the "unlinked" files will be the uncommon case, so
what you do is

 - remember the list of files you copied when you did the pull (you had to 
   have this list at some point anyway). Sort by name,
 - create a list of names of both repositories, sorted by name
 - do the union of those three lists (cheap, thanks to the sorting)
 - stat each name to see if it's already linked (which it will be, most of 
   the time), continue to the next one..
 - if they aren't linked, just do a "cmp" on them, and warn if they aren't 
   the same, continue to the next one.
 - else link them.

And if you want to, you can skip the first stage, and just relink two
trees without looking at a list of "known new" files - it's going to be
expensive to link two big repositories the _first_ time, but hey even the
"expensive" part is likely to be pretty cheap in the end. If it takes an
hour or two to relink some years of history, big deal. Do it overnight,
you only need it once.

		Linus
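The cheap union in Linus's third step is a single linear merge of sorted name lists. A minimal C sketch, assuming the names are sorted with strcmp(); `sorted_union` is a hypothetical helper, and the three lists are handled by applying it twice.

```c
#include <assert.h>
#include <string.h>

/*
 * Merge two name lists, each sorted with strcmp(), into 'out' while
 * dropping duplicates.  One linear pass; 'out' needs room for
 * na + nb entries.  Returns the number of names written.
 */
size_t sorted_union(const char **a, size_t na, const char **b, size_t nb,
		    const char **out)
{
	size_t i = 0, j = 0, n = 0;

	while (i < na || j < nb) {
		const char *next;

		if (j == nb || (i < na && strcmp(a[i], b[j]) < 0)) {
			next = a[i++];
		} else if (i == na || strcmp(a[i], b[j]) > 0) {
			next = b[j++];
		} else {
			next = a[i++];	/* equal names: take it once */
			j++;
		}
		/* skip repeats, including repeats within one list */
		if (n == 0 || strcmp(out[n - 1], next) != 0)
			out[n++] = next;
	}
	return n;
}
```

With the copied-files list and the two repositories' name lists, `sorted_union(copied, nc, repo1, n1, tmp)` followed by `sorted_union(tmp, n, repo2, n2, all)` yields every object name once, in order, ready for the stat/cmp/link pass.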

* Signed commit vulnerabilities? (was: Mercurial 0.4b vs git patchbomb benchmark)
From: Kevin Smith @ 2005-04-29 20:29 UTC
  To: Tom Lord; +Cc: git
In-Reply-To: <200504291954.MAA27561@emf.net>

Tom Lord wrote:
>   > Call me a naive git, but seems to me the "git way" is a little
>   > different. It's tree-based rather than diff-based, and doesn't involve
>   > passing diffs around, right?
> 
> Isn't that a significant part of what I said?  Go back and read more
> carefully, is my suggestion.
> 
>   > Or am I missing something?
> 
> Very much so.

So far, this is a frustrating conversation to watch. Here's my own
interpretation, presented to help the participants understand whether or
not their intended messages are getting through clearly.

Originally, Tom seemed to claim that the problem was that git requires
you to sign an entire tree, rather than a diff, even though the signer
is only vouching for their diff.

Linus responded by saying that a git signature of a tree would match
that description, but signing a commit is different. I think he claimed
that (by convention) signing a commit ONLY means you are signing the
most recent change, which turned tree A into tree B.

Tom then appeared to propose some specific attacks that could work
against the git model. The precondition seems to be if the patch
receiver does not exhaustively analyze each and every patch. The
receiver trusts the contents based solely on who signed the commit object.

One category of attacks were that a computer or communication channel
was broken. It's not immediately clear to me how git's model contributes
any weakness to these cases, compared to other signing strategies.

The other category of attack mentioned was social, such as a signer
creating a patch that claims to do one thing, but actually does another.
Again, I don't see how git is weaker in this case than any other tool.

Noel then pointed out that in practice, someone receiving a signed
commit in git would view the commit comments and the diff, so the effect
is similar to having the diff itself be signed.

And that's where we are right now. So, from here, it looks like Tom
needs to be more specific about which attacks might be more effective
against git's signing strategy than against signed diffs.

Kevin

* RE: Mercurial 0.4b vs git patchbomb benchmark
From: Tom Lord @ 2005-04-29 20:26 UTC
  To: Andrew.Timberlake-Newell; +Cc: noel, seanlkml, git
In-Reply-To: <000e01c54cf7$f61ee4a0$9b11a8c0@allianceoneinc.com>



  > It looks to me like he did read carefully.

  > There were two different ideas:
  >   TL)  Passing tree & diff and trusting diff to create tree
  >   NM)  Passing tree and generating diff versus local tree for review

Well, I guess *you* didn't read carefully.  I also spoke about the
value of passing around triples: ancestry, diff, and tree.  The
question is about linking signatures to things that humans can
reasonably *intend* and be reasonably held accountable for, hence one
of the values of signed diffs.  (I cited other practical reasons to
value signed diffs and use them in specific ways, too.)

-t

* RE: Mercurial 0.4b vs git patchbomb benchmark
From: Andrew Timberlake-Newell @ 2005-04-29 20:13 UTC
  To: 'Tom Lord', noel; +Cc: seanlkml, git
In-Reply-To: <200504291954.MAA27561@emf.net>

Tom Lord responded to Noel Maddy: 
>   > Call me a naive git, but seems to me the "git way" is a little
>   > different. It's tree-based rather than diff-based, and doesn't involve
>   > passing diffs around, right?
> 
> Isn't that a significant part of what I said?  Go back and read more
> carefully, is my suggestion.

It looks to me like he did read carefully.

There were two different ideas:
   TL)  Passing tree & diff and trusting diff to create tree
   NM)  Passing tree and generating diff versus local tree for review

Maybe I'm reading them wrong, but that certainly looks like what each was
expressing and they don't look like the same thing.


