Git development

* cogito cg-update fails
From: Benjamin Herrenschmidt @ 2005-05-03  3:19 UTC (permalink / raw)
  To: Git Mailing List

Hi Folks !

I have something weird happening with cogito. What I did is:

 - d/l & install 0.8 archive
 - cg-init <rsync path>
 - built & installed that, removed 0.8 files
 - a bit later: cg-update origin to check for new stuff

The last one fails with:

benh@pogo:~/cogito$ cg-update origin
MOTD:
MOTD:   .../.. stripped kernel.org legal blurb

receiving file list ... done
.git/refs/heads/origin

sent 119 bytes  received 857 bytes  390.40 bytes/sec
total size is 41  speedup is 0.04
rsync: link_stat "/home/benh/cogito/origin/objects/." failed: No such file or directory (2)
building file list ... done
rsync error: some files could not be transferred (code 23) at main.c(702)

sent 17 bytes  received 20 bytes  74.00 bytes/sec
total size is 0  speedup is 0.00
cg-pull: objects pull failed

So it looks like it's trying to rsync to a bogus destination ...

Ben.

* cogito "origin" vs. HEAD
From: Benjamin Herrenschmidt @ 2005-05-03  3:24 UTC (permalink / raw)
  To: Git Mailing List

Hi !

So it's my understanding that Linus only uses this HEAD that symlinks
to .git/refs/heads/master as his head of tree.

However, when using cogito, it creates another one here called "origin"
that matches the "origin" branch (I don't like "branch" here, it's more
like a source of objects than a branch...) locally. Is this actually the
content of the remote's HEAD or is git also looking for a remote
"refs/heads/origin" ?

So when I later do cg-pull or cg-update origin to update, my "origin"
pointer is updated, I suppose, to the new head of the remote repository.
Does it also update my local "refs/heads/master" ? Or not ? What happens
to it ? Will anything ever use my local HEAD -> refs/heads/master ?
If I want to publish my tree, what will remote cogitos try to
rsync down ? HEAD ? origin ?

Ben.

* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Matt Mackall @ 2005-05-03  3:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Bill Davidsen, Morten Welinder, Sean, linux-kernel, git
In-Reply-To: <Pine.LNX.4.58.0505021932270.3594@ppc970.osdl.org>

On Mon, May 02, 2005 at 07:48:29PM -0700, Linus Torvalds wrote:
> 
> 
> On Mon, 2 May 2005, Matt Mackall wrote:
> > 
> > Umm.. I am _not_ calculating the SHA of the delta itself. That'd be
> > silly.
> 
> It's not silly.

The delta is not the object I care about and its representation is
arbitrary. In fact different branches will store different deltas
depending on how their DAGs get topologically sorted. The object I
care about is the original text, so that's the hash I store.

> In other words, you need to hash the metadata too. Otherwise how do you
> consistency-check the _collection_ of files?

Well naturally, I hash the metadata too. For every change, there's a
toplevel changeset hash that is the hash of the entire project state
at that time. And it's all signable and so on. Just like git and just
like Monotone.

-- 
Mathematics is the supreme nostalgia of our time.

* RFC: adding xdelta compression to git
From: Alon Ziv @ 2005-05-03  3:57 UTC (permalink / raw)
  To: git

Looking for novel methods of wasting my time :), I am considering adding 
xdelta to git.

I have two concrete proposals, both of which (IMO) are consistent with the git 
philosophy:

1. Add a git-deltify command, which will take two trees and replace the second 
tree's blobs with delta-blobs referring to the first tree. Each delta-blob is 
self-contained; from the outside it looks like any other blob, but internally 
it contains another blob reference + an xdelta. The only function which would 
need to understand the new format would be unpack_sha1_file.
The scripting level will be in charge of deciding which trees to deltify (or 
undeltify--we could also have a "git-undeltify" command). A sane 
deltification schedule, for example, could be to always keep tagged versions 
as stand-alone objects, and deltify intermediate versions against the latest 
tag. It would also do its best to avoid delta chains (i.e. a delta referring 
to another delta).
Pros:
* Interoperates with the existing structure (including pull/push) with almost 
no changes to existing infrastructure.
Cons:
* Changes the repository format.
* Some performance impact (probably quite small).
* Same blob may have different representation in two repositories (one 
compressed, one deltified). [I am not sure this is really a bad thing...]

2. Add a completely external framework which manages a "deltas repository" of 
deltas. The shadow repository will contain delta objects between selected 
trees; again the scripts will need to populate it.
Pros:
* No changes at all to existing code.
Cons:
* Push/pull tools will need to be taught to talk with the new "deltas  
repository".
* Synchronization between the deltas repository and the real one may be lost, 
leading to odd failures.

Personally I'm rooting for #1 above... I would like to begin implementation in 
a few days, so any discussion will be useful.

	-az
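
The deltification schedule suggested in proposal 1 can be sketched as a small planning pass. This is only an illustration under stated assumptions: `struct version`, `plan_deltas`, and the index-based history are hypothetical names, not part of git, and versions that precede any tag are assumed to stay stand-alone. Tagged versions are kept whole; every other version is deltified against the most recent tag before it, so a delta never refers to another delta.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one version, listed oldest first. */
struct version {
	const char *name;
	int tagged;	/* non-zero if this version carries a tag */
};

/*
 * Fill base[i] with the index of the version that version i should be
 * deltified against, or -1 if it should stay a stand-alone object.
 * Bases are always tagged (stand-alone) versions, so no delta chains form.
 */
void plan_deltas(const struct version *v, size_t n, int *base)
{
	int latest_tag = -1;
	size_t i;

	assert(v != NULL && base != NULL);
	for (i = 0; i < n; i++) {
		if (v[i].tagged) {
			base[i] = -1;		/* tagged: keep whole */
			latest_tag = (int)i;
		} else {
			base[i] = latest_tag;	/* -1 until a tag is seen */
		}
	}
}
```

A script driving a deltify command could walk the resulting `base` array and deltify each version whose entry is not -1.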

* Re: cogito cg-update fails
From: Steven Cole @ 2005-05-03  3:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Git Mailing List
In-Reply-To: <1115090374.6030.50.camel@gaston>

On Monday 02 May 2005 09:19 pm, Benjamin Herrenschmidt wrote:
> Hi Folks !
> 
> I have something weird happening with cogito. What I did is:
> 
>  - d/l & install 0.8 archive
>  - cg-init <rsync path>
>  - built & installed that, removed 0.8 files
>  - a bit later: cg-update origin to check for new stuff
> 
> The last one fails with:
> 
> benh@pogo:~/cogito$ cg-update origin
> MOTD:
> MOTD:   .../.. stripped kernel.org legal blurb
> 
> receiving file list ... done
> .git/refs/heads/origin
> 
> sent 119 bytes  received 857 bytes  390.40 bytes/sec
> total size is 41  speedup is 0.04
> rsync: link_stat "/home/benh/cogito/origin/objects/." failed: No such file or directory (2)
> building file list ... done
> rsync error: some files could not be transferred (code 23) at main.c(702)
> 
> sent 17 bytes  received 20 bytes  74.00 bytes/sec
> total size is 0  speedup is 0.00
> cg-pull: objects pull failed
> 
> So it looks like it's trying to rsync to a bogus destination ...
> 
> Ben.
> 

Yeah, I got exactly the same behavior a little while ago, but thanks
to www.kernel.org/git, I saw that the problem had been found and fixed.

I had an older backup copy of all the cogito scripts, so I used that to update.

I believe the fix is this patch:

===== cd7a12e5d569d59a04823114c275a83d65b9f37e vs 437167273f77c0d5f039280d158b43324a79f820 =====
--- a/cg-pull
+++ b/cg-pull
@@ -48,7 +48,7 @@ fetch_rsync () {
 }
 
 pull_rsync () {
-	fetch_rsync -s -u -d "$1/objects" ".git/objects"
+	fetch_rsync -s -u -d "$2/objects" ".git/objects"
 }
 
 
Hope this helps,
Steven

* Re: RFC: adding xdelta compression to git
From: Nicolas Pitre @ 2005-05-03  4:12 UTC (permalink / raw)
  To: Alon Ziv; +Cc: git
In-Reply-To: <200505030657.38309.alonz@nolaviz.org>

On Tue, 3 May 2005, Alon Ziv wrote:

> Looking for novel methods of wasting my time :), I am considering adding 
> xdelta to git.
> 
> I have two concrete proposals, both of which (IMO) are consistent with the git 
> philosophy:
> 
> 1. Add a git-deltify command, which will take two trees and replace the second 
> tree's blobs with delta-blobs referring to the first tree. Each delta-blob is 
> self-contained; from the outside it looks like any other blob, but internally 
> it contains another blob reference + an xdelta. The only function which would 
> need to understand the new format would be unpack_sha1_file.
[....]

Guess what?

That's exactly what I did, except that I used libxdiff, stripped it to 
the bare minimum and even optimized it to be as efficient (i.e. fast) as 
possible given the git environment.

I'm finalizing the code right now.


Nicolas

* Re: [PATCH] Add exclude file support to cg-status
From: Junio C Hamano @ 2005-05-03  4:15 UTC (permalink / raw)
  To: Matt Porter; +Cc: Petr Baudis, git
In-Reply-To: <20050502193343.A25462@cox.net>

>>>>> "MP" == Matt Porter <mporter@kernel.crashing.org> writes:

MP> My reasoning for not doing something like this was that there is
MP> only ever one exclude file.  In other instances of cogito specific
MP> data in the .git directory, there is a subdir named for the class
MP> of data being stored there (i.e. branches, refs).  In this case,
MP> it didn't seem necessary.  On the other hand, this made me
MP> wonder whether there should just be a .git/cginfo subdir where
MP> exclude, branches, refs, etc. all live under since they are
MP> cogito specific functionality. Something like:

MP> .git/cginfo/

MP> 	    exclude
MP> 	    branches/
MP> 	    refs/

MP> and so on...

You may want to check the past thread, like this one:

    From: Daniel Barkalow <barkalow@iabervon.org>
    To: Junio C Hamano <junkio@cox.net>
    cc: David Greaves <david@dgreaves.com>, 
        GIT Mailing Lists <git@vger.kernel.org>
    Subject: Re: [PATCh] jit-trackdown
    Message-ID: <Pine.LNX.4.21.0504291730000.30848-100000@iabervon.org>

    On Fri, 29 Apr 2005, Junio C Hamano wrote:

    > Have toilet side gitters reached a consensus (or semi-consensus)
    > on how things under .git/ should be organized?  Is there a
    > summary somewhere, something along the following lines?

    I've made a proposal like the following:

    .git/
      objects/    (traditional)
      refs/       Directories of hex SHA1 + newline files
        heads/    Commits which are heads of various sorts
        tags/     Tags, by the tag name (or some local renaming of it)
      info/       Other shared information
        remotes
      ...         Everything else isn't shared
      HEAD        Symlink to refs/heads/<something>

    The plumbing doesn't care what you name heads or tags, but expects things
    in heads to be commit objects and things in tags to be tag objects (which
    can tag whatever).

    AFAICT, there is general consensus that this is how things should be, but
    I haven't convinced Linus that the plumbing should know about anything
    other than objects/.

            -Daniel
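
The "hex SHA1 + newline" convention for files under refs/ can be checked with a few lines of C. This is a minimal sketch, not code from git or cogito: `parse_ref` is a hypothetical helper, and accepting only lowercase hex digits is an assumption made for the example.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if buf holds exactly 40 lowercase hex digits followed by a
 * newline, copying the digits (NUL-terminated) into out; return 0 otherwise.
 */
int parse_ref(const char *buf, size_t len, char out[41])
{
	size_t i;

	assert(buf != NULL && out != NULL);
	if (len != 41 || buf[40] != '\n')
		return 0;
	for (i = 0; i < 40; i++) {
		char c = buf[i];
		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return 0;
	}
	memcpy(out, buf, 40);
	out[40] = '\0';
	return 1;
}
```

A tool reading refs/heads/<something> would pass the whole file content to such a check before trusting it as an object name.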


* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Linus Torvalds @ 2005-05-03  4:18 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Bill Davidsen, Morten Welinder, Sean, linux-kernel, git
In-Reply-To: <20050503032916.GE22038@waste.org>



On Mon, 2 May 2005, Matt Mackall wrote:
> 
> The delta is not the object I care about and its representation is
> arbitrary. In fact different branches will store different deltas
> depending on how their DAGs get topologically sorted. The object I
> care about is the original text, so that's the hash I store.

Ok. In that case, it sounds like you're really doing everything git is
doing, except your "blob" objects effectively can have pointers to a
previous object (and you have a different on-disk representation)?  Is
that correct?

			Linus

* Re: [RFC] git-diff-cache sans --cached and unmerged paths
From: Linus Torvalds @ 2005-05-03  4:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vr7gpnria.fsf@assigned-by-dhcp.cox.net>

On Mon, 2 May 2005, Junio C Hamano wrote:
> 
>     git-diff-cache without --cached says "U filename" (or "unmerged
> filename") when working with an unmerged cache entry. Since the form
> without --cached is to mean "look at the work tree", I think it should
> be changed to report the mode and the magic 0{40} SHA1.  What do you
> think?

Hmm.. I like the "it's unmerged, so we report it that way" thing, but on 
the other hand, what you describe as your workflow:

> I was manually fixing up a merge and I wanted to compare the
> merge result in the work tree with the pre-merge HEAD version
> from either heads, but this behaviour (yes I am the guilty one)
> makes it cumbersome, and that is the reason behind this
> question.

This seems to make sense from a workflow perspective, so I guess I'll have 
to agree. An unmerged file that exists in the working tree should probably 
get reported as the working tree version, not as unmerged.

If it's not in the working tree at all, I'd assume that "U" is the right 
thing to do (and obviously, with "--cached" there is no question about 
it).

> BTW, when you have a chance, could you please give the
> executable bit to git-apply-patch-script, pretty please.  This
> is my fourth attempt ;-).

I just sent you a description of the problem I had with your tree, so 
maybe fifth time lucky..

		Linus
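
The reporting rule agreed on above can be written down as a tiny decision function. This is a sketch only: `classify_unmerged` and `null_sha1_hex` are hypothetical names and do not reflect how git-diff-cache is actually structured. With --cached, or when the path is missing from the working tree, the entry is reported as unmerged; otherwise it is reported as the working tree version, using the 40-zero SHA1.

```c
#include <assert.h>
#include <string.h>

enum unmerged_report { REPORT_UNMERGED, REPORT_WORKTREE };

/* Decide how an unmerged cache entry should be reported. */
enum unmerged_report classify_unmerged(int cached, int in_worktree)
{
	if (cached || !in_worktree)
		return REPORT_UNMERGED;	/* the "U filename" form */
	return REPORT_WORKTREE;		/* mode plus the all-zero SHA1 */
}

/* Write the "magic" SHA1 of forty '0' characters, NUL-terminated. */
void null_sha1_hex(char out[41])
{
	assert(out != NULL);
	memset(out, '0', 40);
	out[40] = '\0';
}
```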

* Re: [PATCH] Add exclude file support to cg-status
From: Matt Porter @ 2005-05-03  4:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, git
In-Reply-To: <7v4qdlndw0.fsf@assigned-by-dhcp.cox.net>

On Mon, May 02, 2005 at 09:15:43PM -0700, Junio C Hamano wrote:
> You may want to check the past thread, like this one:

<snip>

>     .git/
>       objects/    (traditional)
>       refs/       Directories of hex SHA1 + newline files
>         heads/    Commits which are heads of various sorts
>         tags/     Tags, by the tag name (or some local renaming of it)
>       info/       Other shared information
>         remotes
>       ...         Everything else isn't shared
>       HEAD        Symlink to refs/heads/<something>

Ok, I see I skimmed the archives way too fast after vacation. :)

I'll update the patch to match.

Thanks,
Matt

* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Linus Torvalds @ 2005-05-03  4:24 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Bill Davidsen, Morten Welinder, Sean, linux-kernel, git
In-Reply-To: <20050503000011.GA22038@waste.org>

On Mon, 2 May 2005, Matt Mackall wrote:
> 
> It's still simple in Mercurial, but more importantly Mercurial _won't
> need it_. Dropping history is a work-around, not a feature.

Side note: this is what Larry thought about BK too. Until three years had
passed, and the ChangeSet file was many megabytes in size. Even slow
growth ends up being big growth in the end..

We had been talking about pruning the BK history as long back as a year 
ago.

		Linus

* Re: cogito cg-update fails
From: Benjamin Herrenschmidt @ 2005-05-03  4:23 UTC (permalink / raw)
  To: Steven Cole; +Cc: Git Mailing List
In-Reply-To: <200505022157.07800.elenstev@mesatop.com>


> 
> Yeah, I got exactly the same behavior a little while ago, but thanks
> to www.kernel.org/git, I saw that the problem had been found and fixed.
> 
> I had an older backup copy of all the cogito scripts, so I used that to update.
> 
> I believe the fix is this patch:

Thanks, fixed it ! Current top of tree seems to be fine too.

Cheers,
Ben.

> ===== cd7a12e5d569d59a04823114c275a83d65b9f37e vs 437167273f77c0d5f039280d158b43324a79f820 =====
> --- a/cg-pull
> +++ b/cg-pull
> @@ -48,7 +48,7 @@ fetch_rsync () {
>  }
>  
>  pull_rsync () {
> -	fetch_rsync -s -u -d "$1/objects" ".git/objects"
> +	fetch_rsync -s -u -d "$2/objects" ".git/objects"
>  }
>  
> 
> Hope this helps,
> Steven
-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>


* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Matt Mackall @ 2005-05-03  4:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Bill Davidsen, Morten Welinder, Sean, linux-kernel, git
In-Reply-To: <Pine.LNX.4.58.0505022123270.3594@ppc970.osdl.org>

On Mon, May 02, 2005 at 09:24:54PM -0700, Linus Torvalds wrote:
> 
> 
> On Mon, 2 May 2005, Matt Mackall wrote:
> > 
> > It's still simple in Mercurial, but more importantly Mercurial _won't
> > need it_. Dropping history is a work-around, not a feature.
> 
> Side note: this is what Larry thought about BK too. Until three years had
> passed, and the ChangeSet file was many megabytes in size. Even slow
> growth ends up being big growth in the end..
> 
> We had been talking about pruning the BK history as long back as a year 
> ago.

Ok, I'll implement it on my red eye flight tonight. But Mercurial
won't suffer from the O(filesize) problem of BK.

-- 
Mathematics is the supreme nostalgia of our time.

* Re: [PATCH] Add exclude file support to cg-status
From: Matt Porter @ 2005-05-03  4:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, git
In-Reply-To: <7vd5s9nmio.fsf@assigned-by-dhcp.cox.net>

On Mon, May 02, 2005 at 06:09:19PM -0700, Junio C Hamano wrote:
> >>>>> "MP" == Matt Porter <mporter@kernel.crashing.org> writes:
> 
> MP> Adds a trivial per-repository exclude file implementation for
> MP> cg-status on top of the new git-ls-files option.
> 
>  
> MP> +EXCLUDEFILE=.git/exclude
> 
> Good intentions, but shouldn't the file be .git/info/exclude
> (i.e. under .git/info)?

Ok, here is the updated version.

Signed-off-by: Matt Porter <mporter@kernel.crashing.org>

--- aa6233be6d1b8bf42797c409a7c23b50593afc99/cg-status  (mode:100755 sha1:9e7f0e59284a3d15cda35bbd5579c44d8eda05d5)
+++ d69eece260c0c4fcd53991c1b37ac91b99962681/cg-status  (mode:100755 sha1:874504aa8cf9ab7076eb405e19995615b4f59eab)
@@ -7,8 +7,14 @@
 
 . cg-Xlib
 
+EXCLUDEFILE=.git/info/exclude
+EXCLUDE=
+if [ -f $EXCLUDEFILE ]; then
+	EXCLUDE="--exclude-from=$EXCLUDEFILE"
+fi
+
 {
-	git-ls-files -z -t --others --deleted --unmerged
+	git-ls-files -z -t --others --deleted --unmerged $EXCLUDE
 } | sort -z -k 2 | xargs -0 sh -c '
 while [ "$1" ]; do
 	tag=${1% *};

* Re: RFC: adding xdelta compression to git
From: Linus Torvalds @ 2005-05-03  4:52 UTC (permalink / raw)
  To: Alon Ziv; +Cc: git
In-Reply-To: <200505030657.38309.alonz@nolaviz.org>

On Tue, 3 May 2005, Alon Ziv wrote:
> 
> 1. Add a git-deltify command, which will take two trees and replace the second 
> tree's blobs with delta-blobs referring to the first tree.

If you do something like this, you want such a delta-blob to be named by 
the sha1 of the result, so that things that refer to it can transparently 
see either the original blob _or_ the "deltified" one, and will never 
care.

It seems that is your plan:

> from the outside it looks like any other blob, but internally it
> contains another blob reference + an xdelta.

Yes. git doesn't much care, as long as the objects unpack to the right 
format. That's all hidden away.

> The only function which would need to understand the new format would be
> unpack_sha1_file.

Yes. EXCEPT for one thing. fsck. I'd _really_ like fsck to be able to know
something about any xdelta objects, if only because if/when things go
wrong, it's really nasty to suddenly see a million "blob" objects not work
any more, with no indication of _why_ they don't work. The core reason may
be that one original object (that just got used as a base for tons of
other objects through deltas) is corrupt or missing. And then you want to
show that _one_ object.

> Cons:
> * Changes the repository format.

It wouldn't necessarily. You should be able to do this with _zero_ changes 
to existing objects what-so-ever.

What you do is introduce an "xdelta" object, which has a reference to a 
blob object and the delta. The git object model already names all objects 
by a simple ascii name, so adding a new object type in _no_ way changes 
any existing objects.

So you can just make "unpack_sha1_file()" notice that it unpacked a xdelta 
object, and then do the proper delta application, and nobody will ever be 
the wiser.

> * Some performance impact (probably quite small).

If you limit the depth of deltas, probably not too bad.

> * Same blob may have different representation in two repositories (one 
> compressed, one deltified). [I am not sure this is really a bad thing...]

THIS, I think, is the real issue. fsck-cache and pull etc, that needs to
know about references to other objects, would have to be able to see the
xdelta object, so that they can build up the reference graph. So you'd
need to basically make a "raw_unpack_sha1_file()" interface (the current
regular unpack_sha1_file()) for that.

Also, the fact is, since git saves things as separate files, you'd not win
as much as you would with some other backing store. So the second step is
to start packing the objects etc. I think there is actually a very steep
complexity edge here - not because any of the individual steps necessarily
add a whole lot, but because they all lead to the "next step".

I personally clearly feel that simplicity (and the resulting robustness)
is worth a _lot_ of disk-space.

So I think that what you suggest is likely to actually be pretty easy, but 
I'm not entirely convinced it's worth the slide into complexity.

		Linus
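
The transparent unpacking discussed in this message can be sketched with a toy in-memory object store. Everything here is an assumption made for illustration: the `struct object` layout, the short names standing in for SHA1s, the "keep a prefix of the base, then append literal bytes" delta format, and the depth limit of 4. `raw_lookup` plays the role of the raw interface that fsck-like tools would use to see delta objects as they are stored, while `unpack` hides them from everyone else.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum obj_type { OBJ_BLOB, OBJ_DELTA };

struct object {
	const char *name;	/* stands in for the SHA1 of the unpacked content */
	enum obj_type type;
	const char *base;	/* OBJ_DELTA only: name of the base object */
	size_t keep;		/* OBJ_DELTA only: bytes copied from the base */
	const char *data;	/* blob content, or bytes the delta appends */
};

#define MAX_DELTA_DEPTH 4	/* assumed limit on delta chain length */

/* Raw view: return the object as stored, delta or not. */
const struct object *raw_lookup(const struct object *store, size_t n,
				const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(store[i].name, name) == 0)
			return &store[i];
	return NULL;
}

/*
 * Transparent view: return malloc'd, NUL-terminated content, applying
 * deltas as needed. Returns NULL if an object is missing, a delta is
 * malformed, or the chain is deeper than MAX_DELTA_DEPTH.
 */
char *unpack(const struct object *store, size_t n, const char *name, int depth)
{
	const struct object *o = raw_lookup(store, n, name);
	char *base, *out;
	size_t dlen;

	if (!o)
		return NULL;
	if (o->type == OBJ_BLOB) {
		out = malloc(strlen(o->data) + 1);
		if (out)
			strcpy(out, o->data);
		return out;
	}
	if (depth >= MAX_DELTA_DEPTH)
		return NULL;
	base = unpack(store, n, o->base, depth + 1);
	if (!base)
		return NULL;
	if (o->keep > strlen(base)) {
		free(base);
		return NULL;
	}
	dlen = strlen(o->data);
	out = malloc(o->keep + dlen + 1);
	if (out) {
		memcpy(out, base, o->keep);		/* copied from base */
		memcpy(out + o->keep, o->data, dlen + 1);	/* appended bytes */
	}
	free(base);
	return out;
}
```

Callers of `unpack` never learn whether an object was stored whole or as a delta. A checker that wants to name the one missing base behind many broken blobs would instead walk the store through `raw_lookup`.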

* Re: RFC: adding xdelta compression to git
From: Davide Libenzi @ 2005-05-03  5:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alon Ziv, git
In-Reply-To: <Pine.LNX.4.58.0505022131380.3594@ppc970.osdl.org>

On Mon, 2 May 2005, Linus Torvalds wrote:

> Yes. EXCEPT for one thing. fsck. I'd _really_ like fsck to be able to know
> something about any xdelta objects, if only because if/when things go
> wrong, it's really nasty to suddenly see a million "blob" objects not work
> any more, with no indication of _why_ they don't work. The core reason may
> be that one original object (that just got used as a base for tons of
> other objects through deltas) is corrupt or missing. And then you want to
> show that _one_ object.

Linus, xdelta-based algorithms already store information about the 
object that originated the diff. Since they have no context (unlike 
text-based diffs) and are simply based on offset-driven copy/insert 
operations, this is a requirement. Libxdiff uses an adler32+size of the 
original object, but you can get as fancy as you like in your own 
implementation. Before a delta is applied, the stored information is 
cross-checked against the input base object, and the patch fails on a 
mismatch. So an fsck is simply a walk backward (or forward, 
depending on your metadata model) along the whole delta chain.

- Davide
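
The base check described here can be sketched as follows. This is not LibXDiff's code: `struct base_stamp`, `stamp_base`, and `base_matches` are hypothetical names, and the checksum is a plain byte-at-a-time Adler-32 with the usual initial value of 1. The idea is simply that the delta records a checksum and size of its base, and patching refuses to proceed if the supplied base does not match.

```c
#include <assert.h>
#include <stddef.h>

#define ADLER_BASE 65521UL	/* largest prime smaller than 65536 */

/* What a delta would remember about the object it was made against. */
struct base_stamp {
	unsigned long adler;
	unsigned long size;
};

/* Straightforward Adler-32, one byte at a time. */
unsigned long adler32_simple(const unsigned char *buf, unsigned long len)
{
	unsigned long s1 = 1, s2 = 0, i;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++) {
		s1 = (s1 + buf[i]) % ADLER_BASE;
		s2 = (s2 + s1) % ADLER_BASE;
	}
	return (s2 << 16) | s1;
}

/* Record checksum and size of the base at delta-creation time. */
struct base_stamp stamp_base(const unsigned char *buf, unsigned long len)
{
	struct base_stamp s;

	s.adler = adler32_simple(buf, len);
	s.size = len;
	return s;
}

/* Before patching: return 1 only if the supplied base is the expected one. */
int base_matches(const struct base_stamp *s, const unsigned char *buf,
		 unsigned long len)
{
	return s->size == len && s->adler == adler32_simple(buf, len);
}
```

An fsck-style walk along a delta chain would call a check like `base_matches` at each step and report the first base that fails it.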

* Re: cogito cg-update fails
From: Petr Baudis @ 2005-05-03  6:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Steven Cole, Git Mailing List
In-Reply-To: <1115094227.6031.62.camel@gaston>

Dear diary, on Tue, May 03, 2005 at 06:23:46AM CEST, I got a letter
where Benjamin Herrenschmidt <benh@kernel.crashing.org> told me that...
> 
> > 
> > Yeah, I got exactly the same behavior a little while ago, but thanks
> > to www.kernel.org/git, I saw that the problem had been found and fixed.
> > 
> > I had an older backup copy of all the cogito scripts, so I used that to update.
> > 
> > I believe the fix is this patch:
> 
> Thanks, fixed it ! Current top of tree seems to be fine too.

Yes, sorry about this. I fixed it right away but then forgot to push it,
so it appears some people got it in the meantime. :-(

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

* Re: cogito "origin" vs. HEAD
From: Petr Baudis @ 2005-05-03  6:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Git Mailing List
In-Reply-To: <1115090660.6156.56.camel@gaston>

Dear diary, on Tue, May 03, 2005 at 05:24:19AM CEST, I got a letter
where Benjamin Herrenschmidt <benh@kernel.crashing.org> told me that...
> Hi !

Hi,

> So when I later do cg-pull or cg-update origin to update, my "origin"
> pointer is updated, I suppose, to the new head of the remote repository.
> Does it also update my local "refs/heads/master" ? Or not ? What happens
> to it ? Will anything ever use my local HEAD -> refs/heads/master ?
> If I want to publish my tree, what will remote cogitos try to
> rsync down ? HEAD ? origin ?

when accessing the remote repository, Cogito always looks for remote
refs/heads/master first - if that one isn't there, it takes HEAD, but
there is no correlation between the local and remote branch name. If you
want to fetch a different branch from the remote repository, use the
fragment identifier (see cg-help cg-branch-add).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
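
The lookup order described in this reply (remote refs/heads/master first, then HEAD) can be sketched as a small fallback function. Cogito itself is a set of shell scripts, so this C version is purely illustrative: `pick_remote_head`, `exists_fn`, and the `list_has` stand-in for "does this path exist on the remote" are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback answering whether a path exists in the remote repository. */
typedef int (*exists_fn)(const char *path, void *ctx);

/*
 * Return the remote path to read the head from: refs/heads/master if it
 * exists, otherwise HEAD, otherwise NULL.
 */
const char *pick_remote_head(exists_fn exists, void *ctx)
{
	assert(exists != NULL);
	if (exists("refs/heads/master", ctx))
		return "refs/heads/master";
	if (exists("HEAD", ctx))
		return "HEAD";
	return NULL;
}

/* Example predicate: ctx is a NULL-terminated array of existing paths. */
int list_has(const char *path, void *ctx)
{
	const char **p;

	for (p = ctx; p && *p; p++)
		if (strcmp(*p, path) == 0)
			return 1;
	return 0;
}
```

Note that the result says nothing about the local branch name, which matches the point that local and remote branch names are not correlated.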

* Re: cogito "origin" vs. HEAD
From: Benjamin Herrenschmidt @ 2005-05-03  7:13 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Git Mailing List
In-Reply-To: <20050503064943.GB10244@pasky.ji.cz>

> when accessing the remote repository, Cogito always looks for remote
> refs/heads/master first - if that one isn't there, it takes HEAD, but
> there is no correlation between the local and remote branch name. If you
> want to fetch a different branch from the remote repository, use the
> fragment identifier (see cg-help cg-branch-add).

Ok, that I'm getting. So then, what happens to my local
refs/heads/<branchname> and refs/heads/master ? I'm still a bit
confused by the whole branch mechanism... It's my understanding that when
I cg-init, it creates both "master" (a head without matching branch)
and "origin" (a branch + a head), both having the same sha1. It also
checks out the tree.

Now, when I cg-update origin, what happens exactly ? I mean, I know it
pulls all objects, then gets the master from the remote pointed to by the
origin branch, but then, I suppose it updates both my local "origin" and
my local "master" pointer, right ? I mean, they are always in sync ? Or
is this related to what branch my current checkout is tracking ?

Ben.

* [PATCH] add the ability to create and retrieve delta objects
From: Nicolas Pitre @ 2005-05-03  8:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alon Ziv, git
In-Reply-To: <Pine.LNX.4.58.0505022131380.3594@ppc970.osdl.org>

On Mon, 2 May 2005, Linus Torvalds wrote:

> If you do something like this, you want such a delta-blob to be named by 
> the sha1 of the result, so that things that refer to it can transparently 
> see either the original blob _or_ the "deltified" one, and will never 
> care.

Yep, that's what I did last weekend (and just made it actually 
work since people are getting interested).

==========

This patch adds the necessary functionality to perform delta 
compression on objects.  It adds a git-mkdelta command which can replace 
any object with its deltified version given a reference object.

Access to a delta object will transparently fetch the reference object 
and apply the transformation.  Scripts can be used to perform any sort 
of compression policy on top of it.

The delta generator has been extracted from libxdiff and optimized for 
git usage in order to avoid as much data copy as possible, and the delta 
storage format modified to be even more compact.  Therefore no need to 
rely on any external library.  The test-delta program can be used to 
test it.

The fsck tool doesn't know about delta objects and their relation with 
other objects yet.  But if one doesn't use git-mkdelta it should not be 
a problem.  Many refinements are possible but better merge them 
separately.  Loop detection and recursion limit are a few examples.
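
As a small illustration of the storage format: diff_delta() in this patch starts each delta with the source size and then the target size, each written as four bytes, least significant byte first. The sketch below mirrors that header layout. The helper names (`put_le32`, `get_le32`, `write_delta_header`, `read_delta_header`) are hypothetical, and the reading side is an assumption about how a consumer might parse the header, not a copy of patch_delta().

```c
#include <assert.h>
#include <stddef.h>

#define DELTA_HEADER_SIZE 8

/* Store the low 32 bits of v, least significant byte first. */
void put_le32(unsigned char *out, unsigned long v)
{
	out[0] = v & 0xff;
	out[1] = (v >> 8) & 0xff;
	out[2] = (v >> 16) & 0xff;
	out[3] = (v >> 24) & 0xff;
}

/* Read back a value stored by put_le32(). */
unsigned long get_le32(const unsigned char *in)
{
	return (unsigned long)in[0] |
	       ((unsigned long)in[1] << 8) |
	       ((unsigned long)in[2] << 16) |
	       ((unsigned long)in[3] << 24);
}

/* Write source size then target size; returns the header length. */
size_t write_delta_header(unsigned char *out, unsigned long from_size,
			  unsigned long to_size)
{
	assert(out != NULL);
	put_le32(out, from_size);
	put_le32(out + 4, to_size);
	return DELTA_HEADER_SIZE;
}

/* Parse the header; returns 0 on success, -1 if the buffer is too short. */
int read_delta_header(const unsigned char *in, size_t len,
		      unsigned long *from_size, unsigned long *to_size)
{
	if (len < DELTA_HEADER_SIZE)
		return -1;
	*from_size = get_le32(in);
	*to_size = get_le32(in + 4);
	return 0;
}
```

Knowing both sizes up front lets a consumer check that it was handed the right base and allocate the result buffer in one go.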

Signed-off-by: Nicolas Pitre <nico@cam.org>

--- k/delta.h
+++ l/delta.h
@@ -0,0 +1,6 @@
+extern void *diff_delta(void *from_buf, unsigned long from_size,
+			void *to_buf, unsigned long to_size,
+		        unsigned long *delta_size);
+extern void *patch_delta(void *src_buf, unsigned long src_size,
+			 void *delta_buf, unsigned long delta_size,
+			 unsigned long *dst_size);
--- k/diff-delta.c
+++ l/diff-delta.c
@@ -0,0 +1,315 @@
+/*
+ * diff-delta.c: generate a delta between two buffers
+ *
+ *  Many parts of this file have been lifted from LibXDiff version 0.10.
+ *  http://www.xmailserver.org/xdiff-lib.html
+ *
+ *  LibXDiff was written by Davide Libenzi <davidel@xmailserver.org>
+ *  Copyright (C) 2003	Davide Libenzi
+ *
+ *  Many mods for GIT usage by Nicolas Pitre <nico@cam.org>, (C) 2005.
+ *
+ *  This file is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU Lesser General Public
+ *  License as published by the Free Software Foundation; either
+ *  version 2.1 of the License, or (at your option) any later version.
+ */
+
+#include <stdlib.h>
+#include "delta.h"
+
+
+/* block size: min = 16, max = 64k, power of 2 */
+#define BLK_SIZE 16
+
+#define MIN(a, b) ((a) < (b) ? (a) : (b))
+
+#define GR_PRIME 0x9e370001
+#define HASH(v, b) (((unsigned int)(v) * GR_PRIME) >> (32 - (b)))
+	
+/* largest prime smaller than 65536 */
+#define BASE 65521
+
+/* NMAX is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1 */
+#define NMAX 5552
+
+#define DO1(buf, i)  { s1 += buf[i]; s2 += s1; }
+#define DO2(buf, i)  DO1(buf, i); DO1(buf, i + 1);
+#define DO4(buf, i)  DO2(buf, i); DO2(buf, i + 2);
+#define DO8(buf, i)  DO4(buf, i); DO4(buf, i + 4);
+#define DO16(buf)    DO8(buf, 0); DO8(buf, 8);
+
+static unsigned int adler32(unsigned int adler, const unsigned char *buf, int len)
+{
+	int k;
+	unsigned int s1 = adler & 0xffff;
+	unsigned int s2 = adler >> 16;
+
+	while (len > 0) {
+		k = MIN(len, NMAX);
+		len -= k;
+		while (k >= 16) {
+			DO16(buf);
+			buf += 16;
+			k -= 16;
+		}
+		if (k != 0)
+			do {
+				s1 += *buf++;
+				s2 += s1;
+			} while (--k);
+		s1 %= BASE;
+		s2 %= BASE;
+	}
+
+	return (s2 << 16) | s1;
+}
+
+static unsigned int hashbits(unsigned int size)
+{
+	unsigned int val = 1, bits = 0;
+	while (val < size && bits < 32) {
+		val <<= 1;
+	       	bits++;
+	}
+	return bits ? bits: 1;
+}
+
+typedef struct s_chanode {
+	struct s_chanode *next;
+	int icurr;
+} chanode_t;
+
+typedef struct s_chastore {
+	chanode_t *head, *tail;
+	int isize, nsize;
+	chanode_t *ancur;
+	chanode_t *sncur;
+	int scurr;
+} chastore_t;
+
+static void cha_init(chastore_t *cha, int isize, int icount)
+{
+	cha->head = cha->tail = NULL;
+	cha->isize = isize;
+	cha->nsize = icount * isize;
+	cha->ancur = cha->sncur = NULL;
+	cha->scurr = 0;
+}
+
+static void *cha_alloc(chastore_t *cha)
+{
+	chanode_t *ancur;
+	void *data;
+
+	ancur = cha->ancur;
+	if (!ancur || ancur->icurr == cha->nsize) {
+		ancur = malloc(sizeof(chanode_t) + cha->nsize);
+		if (!ancur)
+			return NULL;
+		ancur->icurr = 0;
+		ancur->next = NULL;
+		if (cha->tail)
+			cha->tail->next = ancur;
+		if (!cha->head)
+			cha->head = ancur;
+		cha->tail = ancur;
+		cha->ancur = ancur;
+	}
+
+	data = (void *)ancur + sizeof(chanode_t) + ancur->icurr;
+	ancur->icurr += cha->isize;
+	return data;
+}
+
+static void cha_free(chastore_t *cha)
+{
+	chanode_t *cur = cha->head;
+	while (cur) {
+		chanode_t *tmp = cur;
+		cur = cur->next;
+		free(tmp);
+	}
+}
+
+typedef struct s_bdrecord {
+	struct s_bdrecord *next;
+	unsigned int fp;
+	const unsigned char *ptr;
+} bdrecord_t;
+
+typedef struct s_bdfile {
+	const unsigned char *data, *top;
+	chastore_t cha;
+	unsigned int fphbits;
+	bdrecord_t **fphash;
+} bdfile_t;
+
+static int delta_prepare(const unsigned char *buf, int bufsize, bdfile_t *bdf)
+{
+	unsigned int fphbits;
+	int i, hsize;
+	const unsigned char *base, *data, *top;
+	bdrecord_t *brec;
+	bdrecord_t **fphash;
+
+	fphbits = hashbits(bufsize / BLK_SIZE + 1);
+	hsize = 1 << fphbits;
+	fphash = malloc(hsize * sizeof(bdrecord_t *));
+	if (!fphash)
+		return -1;
+	for (i = 0; i < hsize; i++)
+		fphash[i] = NULL;
+	cha_init(&bdf->cha, sizeof(bdrecord_t), hsize / 4 + 1);
+
+	bdf->data = data = base = buf;
+	bdf->top = top = buf + bufsize;
+	data += (bufsize / BLK_SIZE) * BLK_SIZE;
+	if (data == top)
+		data -= BLK_SIZE;
+
+	for ( ; data >= base; data -= BLK_SIZE) {
+		brec = cha_alloc(&bdf->cha);
+		if (!brec) {
+			cha_free(&bdf->cha);
+			free(fphash);
+			return -1;
+		}
+		brec->fp = adler32(0, data, MIN(BLK_SIZE, top - data));
+		brec->ptr = data;
+		i = HASH(brec->fp, fphbits);
+		brec->next = fphash[i];
+		fphash[i] = brec;
+	}
+
+	bdf->fphbits = fphbits;
+	bdf->fphash = fphash;
+
+	return 0;
+}
+
+static void delta_cleanup(bdfile_t *bdf)
+{
+	free(bdf->fphash);
+	cha_free(&bdf->cha);
+}
+
+#define COPYOP_SIZE(o, s) \
+    (!!(o & 0xff) + !!(o & 0xff00) + !!(o & 0xff0000) + !!(o & 0xff000000) + \
+     !!(s & 0xff) + !!(s & 0xff00) + 1)
+
+void *diff_delta(void *from_buf, unsigned long from_size,
+		 void *to_buf, unsigned long to_size,
+		 unsigned long *delta_size)
+{
+	int i, outpos, outsize, inscnt, csize, msize, moff;
+	unsigned int fp;
+	const unsigned char *data, *top, *ptr1, *ptr2;
+	unsigned char *out, *orig;
+	bdrecord_t *brec;
+	bdfile_t bdf;
+
+	if (delta_prepare(from_buf, from_size, &bdf))
+		return NULL;
+	
+	outpos = 0;
+	outsize = 4096;
+	out = malloc(outsize);
+	if (!out) {
+		delta_cleanup(&bdf);
+		return NULL;
+	}
+
+	data = to_buf;
+	top = to_buf + to_size;
+
+	out[outpos++] = from_size; from_size >>= 8;
+	out[outpos++] = from_size; from_size >>= 8;
+	out[outpos++] = from_size; from_size >>= 8;
+	out[outpos++] = from_size;
+	out[outpos++] = to_size; to_size >>= 8;
+	out[outpos++] = to_size; to_size >>= 8;
+	out[outpos++] = to_size; to_size >>= 8;
+	out[outpos++] = to_size;
+
+	inscnt = 0;
+	moff = 0;
+	while (data < top) {
+		msize = 0;
+		fp = adler32(0, data, MIN(top - data, BLK_SIZE));
+		i = HASH(fp, bdf.fphbits);
+		for (brec = bdf.fphash[i]; brec; brec = brec->next) {
+			if (brec->fp == fp) {
+				csize = bdf.top - brec->ptr;
+				if (csize > top - data)
+					csize = top - data;
+				for (ptr1 = brec->ptr, ptr2 = data; 
+				     csize && *ptr1 == *ptr2;
+				     csize--, ptr1++, ptr2++);
+
+				csize = ptr1 - brec->ptr;
+				if (csize > msize) {
+					moff = brec->ptr - bdf.data;
+					msize = csize;
+					if (msize >= 0x10000) {
+						msize = 0x10000;
+						break;
+					}
+				}
+			}
+		}
+
+		if (!msize || msize < COPYOP_SIZE(moff, msize)) {
+			if (!inscnt)
+				outpos++;
+			out[outpos++] = *data++;
+			inscnt++;
+			if (inscnt == 0x7f) {
+				out[outpos - inscnt - 1] = inscnt;
+				inscnt = 0;
+			}
+		} else {
+			if (inscnt) {
+				out[outpos - inscnt - 1] = inscnt;
+				inscnt = 0;
+			}
+
+			data += msize;
+			orig = out + outpos++;
+			i = 0x80;
+
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x01; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x02; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x04; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x08; }
+
+			if (msize & 0xff) { out[outpos++] = msize; i |= 0x10; }
+			msize >>= 8;
+			if (msize & 0xff) { out[outpos++] = msize; i |= 0x20; }
+
+			*orig = i;
+		}
+
+		/* next time around the largest possible output is 1 + 4 + 3 */
+		if (outpos > outsize - 8) {
+			void *tmp = out;
+			outsize = outsize * 3 / 2;
+			out = realloc(out, outsize);
+			if (!out) {
+				free(tmp);
+				delta_cleanup(&bdf);
+				return NULL;
+			}
+		}
+	}
+
+	if (inscnt)
+		out[outpos - inscnt - 1] = inscnt;
+
+	delta_cleanup(&bdf);
+	*delta_size = outpos;
+	return out;
+}
--- k/mkdelta.c
+++ l/mkdelta.c
@@ -0,0 +1,95 @@
+#include "cache.h"
+#include "delta.h"
+
+static int write_delta_file(char *buf, unsigned long len, unsigned char *sha1_ref, unsigned char *path)
+{
+	int size;
+	char *compressed;
+	z_stream stream;
+	char hdr[50];
+	int fd, hdrlen;
+
+	/* Generate the header */
+	hdrlen = sprintf(hdr, "delta %lu", len+20)+1;
+	memcpy(hdr + hdrlen, sha1_ref, 20);
+	hdrlen += 20;
+
+	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
+	if (fd < 0)
+		return -1;
+
+	/* Set it up */
+	memset(&stream, 0, sizeof(stream));
+	deflateInit(&stream, Z_BEST_COMPRESSION);
+	size = deflateBound(&stream, len+hdrlen);
+	compressed = xmalloc(size);
+
+	/* Compress it */
+	stream.next_out = compressed;
+	stream.avail_out = size;
+
+	/* First header.. */
+	stream.next_in = hdr;
+	stream.avail_in = hdrlen;
+	while (deflate(&stream, 0) == Z_OK)
+		/* nothing */;
+
+	/* Then the data itself.. */
+	stream.next_in = buf;
+	stream.avail_in = len;
+	while (deflate(&stream, Z_FINISH) == Z_OK)
+		/* nothing */;
+	deflateEnd(&stream);
+	size = stream.total_out;
+
+	if (write(fd, compressed, size) != size)
+		die("unable to write file");
+	close(fd);
+		
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	unsigned char sha1_ref[20], sha1_trg[20];
+	char type_ref[20], type_trg[20];
+	void *buf_ref, *buf_trg, *buf_delta;
+	unsigned long size_ref, size_trg, size_delta;
+	char *filename, tmpname[100];
+
+	if (argc != 3 || get_sha1(argv[1], sha1_ref) || get_sha1(argv[2], sha1_trg))
+		usage("git-mkdelta <reference_sha1> <target_sha1>");
+
+	buf_ref = read_sha1_file(sha1_ref, type_ref, &size_ref);
+	if (!buf_ref) {
+		fprintf(stderr, "%s: unable to read reference object\n", argv[0]);
+		exit(1);
+	}
+	buf_trg = read_sha1_file(sha1_trg, type_trg, &size_trg);
+	if (!buf_trg) {
+		fprintf(stderr, "%s: unable to read target object\n", argv[0]);
+		exit(1);
+	}
+	if (strcmp(type_ref, type_trg)) {
+		fprintf(stderr, "%s: reference and target are of different type\n", argv[0]);
+		exit(2);
+	}
+	buf_delta = diff_delta(buf_ref, size_ref, buf_trg, size_trg, &size_delta);
+	if (!buf_delta) {
+		fprintf(stderr, "%s: unable to create delta\n", argv[0]);
+		exit(3);
+	}
+
+	filename = sha1_file_name(sha1_trg);
+	sprintf(tmpname, "%s.delta.tmp", filename);
+	if (write_delta_file(buf_delta, size_delta, sha1_ref, tmpname)) {
+		perror(tmpname);
+		exit(1);
+	}
+	if (rename(tmpname, filename)) {
+		perror("rename");
+		exit(1);
+	}
+
+	return 0;
+}
--- k/patch-delta.c
+++ l/patch-delta.c
@@ -0,0 +1,73 @@
+/*
+ * patch-delta.c:
+ * recreate a buffer from a source and the delta produced by diff-delta.c
+ *
+ * (C) 2005 Nicolas Pitre <nico@cam.org>
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <stdlib.h>
+#include <string.h>
+#include "delta.h"
+
+void *patch_delta(void *src_buf, unsigned long src_size,
+		  void *delta_buf, unsigned long delta_size,
+		  unsigned long *dst_size)
+{
+	const unsigned char *data, *top;
+	unsigned char *dst, *out;
+	int size;
+
+	/* the smallest delta size possible is 10 bytes */
+	if (delta_size < 10)
+		return NULL;
+
+	data = delta_buf;
+	top = delta_buf + delta_size;
+
+	/* make sure the orig file size matches what we expect */
+	size = data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24);
+	data += 4;
+	if (size != src_size)
+		return NULL;
+
+	/* now the result size */
+	size = data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24);
+	data += 4;
+	dst = malloc(size);
+	if (!dst)
+		return NULL;
+
+	out = dst;
+	while (data < top) {
+		unsigned char cmd = *data++;
+		if (cmd & 0x80) {
+			unsigned int cp_off = 0, cp_size = 0;
+			if (cmd & 0x01) cp_off = *data++;
+			if (cmd & 0x02) cp_off |= (*data++ << 8);
+			if (cmd & 0x04) cp_off |= (*data++ << 16);
+			if (cmd & 0x08) cp_off |= (*data++ << 24);
+			if (cmd & 0x10) cp_size = *data++;
+			if (cmd & 0x20) cp_size |= (*data++ << 8);
+			if (cp_size == 0) cp_size = 0x10000;
+			memcpy(out, src_buf + cp_off, cp_size);
+			out += cp_size;
+		} else {
+			memcpy(out, data, cmd);
+			out += cmd;
+			data += cmd;
+		}
+	}
+
+	/* sanity check */
+	if (data != top || out - dst != size) {
+		free(dst);
+		return NULL;
+	}
+
+	*dst_size = size;
+	return dst;
+}
Binary files k/test-delta and l/test-delta differ
--- k/test-delta.c
+++ l/test-delta.c
@@ -0,0 +1,79 @@
+/*
+ * test-delta.c: test code to exercise diff-delta.c and patch-delta.c
+ *
+ * (C) 2005 Nicolas Pitre <nico@cam.org>
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include "delta.h"
+
+static const char *usage =
+	"test-delta (-d|-p) <from_file> <data_file> <out_file>";
+
+int main(int argc, char *argv[])
+{
+	int fd;
+	struct stat st;
+	void *from_buf, *data_buf, *out_buf;
+	unsigned long from_size, data_size, out_size;
+
+	if (argc != 5 || (strcmp(argv[1], "-d") && strcmp(argv[1], "-p"))) {
+		fprintf(stderr, "Usage: %s\n", usage);
+		return 1;
+	}
+
+	fd = open(argv[2], O_RDONLY);
+	if (fd < 0 || fstat(fd, &st)) {
+		perror(argv[2]);
+		return 1;
+	}
+	from_size = st.st_size;
+	from_buf = mmap(NULL, from_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (from_buf == MAP_FAILED) {
+		perror(argv[2]);
+		return 1;
+	}
+	close(fd);
+
+	fd = open(argv[3], O_RDONLY);
+	if (fd < 0 || fstat(fd, &st)) {
+		perror(argv[3]);
+		return 1;
+	}
+	data_size = st.st_size;
+	data_buf = mmap(NULL, data_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (data_buf == MAP_FAILED) {
+		perror(argv[3]);
+		return 1;
+	}
+	close(fd);
+
+	if (argv[1][1] == 'd')
+		out_buf = diff_delta(from_buf, from_size,
+				     data_buf, data_size, &out_size);
+	else
+		out_buf = patch_delta(from_buf, from_size,
+				      data_buf, data_size, &out_size);
+	if (!out_buf) {
+		fprintf(stderr, "delta operation failed (returned NULL)\n");
+		return 1;
+	}
+
+	fd = open (argv[4], O_WRONLY|O_CREAT|O_TRUNC, 0666);
+	if (fd < 0 || write(fd, out_buf, out_size) != out_size) {
+		perror(argv[4]);
+		return 1;
+	}
+
+	return 0;
+}
--- k/Makefile
+++ l/Makefile
@@ -21,7 +21,7 @@ PROG=   git-update-cache git-diff-files 
 	git-check-files git-ls-tree git-merge-base git-merge-cache \
 	git-unpack-file git-export git-diff-cache git-convert-cache \
 	git-http-pull git-rpush git-rpull git-rev-list git-mktag \
-	git-diff-tree-helper git-tar-tree git-local-pull
+	git-diff-tree-helper git-tar-tree git-local-pull git-mkdelta
 
 all: $(PROG)
 
@@ -29,7 +29,7 @@ install: $(PROG) $(SCRIPTS)
 	install $(PROG) $(SCRIPTS) $(HOME)/bin/
 
 LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o \
-	 tag.o date.o
+	 tag.o date.o diff-delta.o patch-delta.o
 LIB_FILE=libgit.a
 LIB_H=cache.h object.h blob.h tree.h commit.h tag.h
 
@@ -63,6 +63,9 @@ $(LIB_FILE): $(LIB_OBJS)
 test-date: test-date.c date.o
 	$(CC) $(CFLAGS) -o $@ test-date.c date.o
 
+test-delta: test-delta.c diff-delta.o patch-delta.o
+	$(CC) $(CFLAGS) -o $@ $^
+
 git-%: %.c $(LIB_FILE)
 	$(CC) $(CFLAGS) -o $@ $(filter %.c,$^) $(LIBS)
 
@@ -92,6 +95,7 @@ git-rpush: rsh.c
 git-rpull: rsh.c pull.c
 git-rev-list: rev-list.c
 git-mktag: mktag.c
+git-mkdelta: mkdelta.c
 git-diff-tree-helper: diff-tree-helper.c
 git-tar-tree: tar-tree.c
 
--- k/sha1_file.c
+++ l/sha1_file.c
@@ -8,6 +8,7 @@
  */
 #include <stdarg.h>
 #include "cache.h"
+#include "delta.h"
 
 const char *sha1_file_directory = NULL;
 
@@ -186,7 +187,8 @@ void * unpack_sha1_file(void *map, unsig
 	int ret, bytes;
 	z_stream stream;
 	char buffer[8192];
-	char *buf;
+	char *buf, *delta_ref;
+	unsigned long delta_ref_sz;
 
 	/* Get the data stream */
 	memset(&stream, 0, sizeof(stream));
@@ -201,8 +203,15 @@ void * unpack_sha1_file(void *map, unsig
 		return NULL;
 	if (sscanf(buffer, "%10s %lu", type, size) != 2)
 		return NULL;
-
 	bytes = strlen(buffer) + 1;
+
+	if (!strcmp(type, "delta")) {
+		delta_ref = read_sha1_file(buffer + bytes, type, &delta_ref_sz);
+		if (!delta_ref)
+			return NULL;
+	} else
+		delta_ref = NULL;
+
 	buf = xmalloc(*size);
 
 	memcpy(buf, buffer + bytes, stream.total_out - bytes);
@@ -214,6 +223,17 @@ void * unpack_sha1_file(void *map, unsig
 			/* nothing */;
 	}
 	inflateEnd(&stream);
+
+	if (delta_ref) {
+		char *newbuf;
+		unsigned long newsize;
+		newbuf = patch_delta(delta_ref, delta_ref_sz, buf+20, *size-20, &newsize);
+		free(delta_ref);
+		free(buf);
+		buf = newbuf;
+		*size = newsize;
+	}
+
 	return buf;
 }
 

^ permalink raw reply
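
For readers following the delta format used by the patch above, here is a
minimal sketch of the decoding side. It is not the code from the patch:
apply_delta_sketch(), get_le32() and delta_demo() are hypothetical names
invented for this illustration, and the bounds checks are an addition that
patch_delta() in the patch does not have. The layout it assumes is the one
read off the patch: two 32-bit little-endian sizes, then a stream of copy
commands (high bit set) and insert commands (high bit clear).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Read a 32-bit little-endian size, as stored at the start of a delta. */
static unsigned long get_le32(const unsigned char *p)
{
	return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
	       ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/*
 * Hypothetical bounds-checked variant of the patch_delta() loop.
 * Returns a malloc'ed buffer of *dst_size bytes, or NULL on any error.
 */
unsigned char *apply_delta_sketch(const unsigned char *src, unsigned long src_size,
				  const unsigned char *delta, unsigned long delta_size,
				  unsigned long *dst_size)
{
	const unsigned char *data = delta, *top = delta + delta_size;
	unsigned char *dst;
	unsigned long size, outpos = 0;

	if (delta_size < 8 || get_le32(data) != src_size)
		return NULL;
	size = get_le32(data + 4);
	data += 8;
	dst = malloc(size ? size : 1);
	if (!dst)
		return NULL;

	while (data < top) {
		unsigned char cmd = *data++;
		if (cmd & 0x80) {
			/* copy: bits 0x01..0x08 flag offset bytes, 0x10/0x20 size bytes */
			unsigned long off = 0, len = 0;
			int i;
			for (i = 0; i < 4; i++)
				if (cmd & (1 << i)) {
					if (data >= top)
						goto bad;
					off |= (unsigned long)*data++ << (8 * i);
				}
			for (i = 0; i < 2; i++)
				if (cmd & (0x10 << i)) {
					if (data >= top)
						goto bad;
					len |= (unsigned long)*data++ << (8 * i);
				}
			if (!len)
				len = 0x10000;	/* a size of zero encodes 64KB */
			if (off > src_size || len > src_size - off || len > size - outpos)
				goto bad;
			memcpy(dst + outpos, src + off, len);
			outpos += len;
		} else {
			/* insert: the command byte is the number of literal bytes */
			if (cmd > (unsigned long)(top - data) || cmd > size - outpos)
				goto bad;
			memcpy(dst + outpos, data, cmd);
			outpos += cmd;
			data += cmd;
		}
	}
	if (outpos != size)
		goto bad;
	*dst_size = size;
	return dst;
bad:
	free(dst);
	return NULL;
}

/* Rebuild "hello, git!" from "hello, world" with one copy and one insert. */
int delta_demo(void)
{
	static const unsigned char src[] = "hello, world";
	static const unsigned char delta[] = {
		12, 0, 0, 0,		/* source size */
		11, 0, 0, 0,		/* target size */
		0x90, 7,		/* copy: offset 0 (no offset bytes), size 7 */
		4, 'g', 'i', 't', '!'	/* insert 4 literal bytes */
	};
	unsigned long n = 0;
	unsigned char *out;
	int ok;

	assert(sizeof(delta) == 15);
	out = apply_delta_sketch(src, 12, delta, sizeof(delta), &n);
	ok = out && n == 11 && !memcmp(out, "hello, git!", 11);
	free(out);
	return ok;
}
```

The copy command in the demo is 0x90 because the offset is zero, so no
offset bytes are emitted and only the low size byte (flag 0x10) follows the
command byte. That matches how diff_delta() skips zero bytes when it
encodes a copy.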

* [PATCH] fsck-cache segfaults on a tag referring to a missing object.
From: Junio C Hamano @ 2005-05-03  8:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

I do not understand the comment about ignoring tag reachability
in check_connectivity(), but fsck_tag() fails to notice that a
parsing of tag may have failed in the parse_object() call in
fsck_sha1() before it is called, in which case it can get a tag
object with NULL in the tagged field and segfault.  Here is a
patch to fix this.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

P.S. Since this is probably more urgent than the other fixes I've
been bugging you about, I am sending this via e-mail, not as a
GIT pull request, but I have a couple more updates there, along
with an updated HEAD.

--- a/fsck-cache.c
+++ b/fsck-cache.c
@@ -136,6 +136,12 @@ static int fsck_tag(struct tag *tag)
 	if (!show_tags)
 		return 0;

+	if (!tag->tagged) {
+		printf("bad referenced object in tag %s\n",
+		       sha1_to_hex(tag->object.sha1));
+		return 0;
+	}
+
 	printf("tagged %s %s",
 	       tag->tagged->type,
 	       sha1_to_hex(tag->tagged->sha1));

----------------------------------------------------------------

^ permalink raw reply

* gitweb on kernel.org lies?
From: David Woodhouse @ 2005-05-03  8:40 UTC (permalink / raw)
  To: git

http://www.kernel.org/git/gitweb.cgi?p=linux%2Fkernel%2Fgit%2Fdwmw2%2Faudit-2.6.git;a=log
doesn't seem to show the commits which were just put there. Why?

-- 
dwmw2


^ permalink raw reply

* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Chris Wedgwood @ 2005-05-03  8:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matt Mackall, Bill Davidsen, Morten Welinder, Sean, linux-kernel,
	git
In-Reply-To: <Pine.LNX.4.58.0505022123270.3594@ppc970.osdl.org>

On Mon, May 02, 2005 at 09:24:54PM -0700, Linus Torvalds wrote:

> We had been talking about pruning the BK history as long back as a
> year ago.

Was that the history or all the deletes/renames that were painful
though?

^ permalink raw reply

* Re: cogito "origin" vs. HEAD
From: Alexey Nezhdanov @ 2005-05-03  9:06 UTC (permalink / raw)
  To: git; +Cc: Benjamin Herrenschmidt
In-Reply-To: <1115104408.6156.100.camel@gaston>

At Tuesday, 03 May 2005 11:13 Benjamin Herrenschmidt wrote:
> > when accessing the remote repository, Cogito always looks for remote
> > refs/heads/master first - if that one isn't there, it takes HEAD, but
> > there is no correlation between the local and remote branch name. If you
> > want to fetch a different branch from the remote repository, use the
> > fragment identifier (see cg-help cg-branch-add).
>
> Ok, that I'm getting. So then, what happen of my local
> refs/heads/<branchname> and refs/heads/master/ ? I'm still a bit
> confused by the whole branch mecanism... It's my understanding than when
> I cg-init, it creates both "master" (a head without matching branch)
> and "origin" (a branch  + a head) both having the same sha1. It also
> checks out the tree.
>
> Now, when I cg-update origin, what happens exactly ? I mean, I know it's
> pulls all objects, then get the master from the remote pointed by the
> origin branch, but then, I suppose it updates both my local "origin" and
> my local "master" pointer, right ? I mean, they are always in sync ? Or
> is this related to what branch my current checkout is tracking ?
If I understand the mechanics correctly, the "master" head always tracks your 
local tree (i.e. with all remote and local patches applied), and the "origin" 
head always tracks the head of the remote branch you are getting objects from.

I.e. it really is a tree, not a source of objects. The tree can be stored on 
many different hosts, but it is the same across them. The master tree, 
however, has no source to sync from: you create it yourself locally, so there 
is no "master branch", only a head.

So if you are just tracking some other tree and do not do any merges or 
patches yourself, your master head will always match your remote source head 
("origin" in most cases).

-- 
Respectfully
Alexey Nezhdanov


^ permalink raw reply

* Re: cogito "origin" vs. HEAD
From: Petr Baudis @ 2005-05-03  9:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Git Mailing List
In-Reply-To: <1115104408.6156.100.camel@gaston>

Dear diary, on Tue, May 03, 2005 at 09:13:28AM CEST, I got a letter
where Benjamin Herrenschmidt <benh@kernel.crashing.org> told me that...
> > when accessing the remote repository, Cogito always looks for remote
> > refs/heads/master first - if that one isn't there, it takes HEAD, but
> > there is no correlation between the local and remote branch name. If you
> > want to fetch a different branch from the remote repository, use the
> > fragment identifier (see cg-help cg-branch-add).
> 
> Ok, that I'm getting. So then, what happen of my local
> refs/heads/<branchname> and refs/heads/master/ ? I'm still a bit
> confused by the whole branch mecanism... It's my understanding than when
> I cg-init, it creates both "master" (a head without matching branch)
> and "origin" (a branch  + a head) both having the same sha1. It also
> checks out the tree.
> 
> Now, when I cg-update origin, what happens exactly ? I mean, I know it's
> pulls all objects, then get the master from the remote pointed by the
> origin branch, but then, I suppose it updates both my local "origin" and
> my local "master" pointer, right ? I mean, they are always in sync ? Or
> is this related to what branch my current checkout is tracking ?

They are in sync as long as you update only from that given branch.
The moment you make a local commit, they get out of sync, at least
until your master branch is merged into the origin branch on the other
side. Every cg-update will then generate a merge commit, so it will
look like this:

     [origin]    [master]
            commit1
              |
            commit2               Both heads are in sync so far...
              |
            commit3
             /    \
            /     commit4         Now heads/master is commit4, but
           /        |             heads/origin is still commit3
          /         |
      commit5-.     |             heads/master:commit4, heads/origin:commit5
          |    \    |
          |     `-commit6         commit6 merges origin to master
          |       /
          |     /
          |   /
      commit6                     origin merged your master; since it
                                  contained all the commits on the origin
               |                  branch, it just took over the commit6
             commit6              commit pointer as its new head; so both
                                  heads are again in sync now

This is the reason why there are always at least two branches, origin
and master. The checked out tree is always of the master branch (unless
you do cg-seek, which is somewhat special anyway). [*] "Normally", when
you make no local changes and just always cg-update the origin branch,
the two branches are always in sync. At the point you start to "mix"
several remote branches besides origin in your tree, or at the point you
make a local commit, the master branch stands on its own - until the
origin merges your changes as drawn in the diagram.

There is one other situation when the head pointers may not be in sync -
when you do cg-pull instead of cg-update. If you want to see what the
changes in the origin branch are, but you are not sure you want them to
appear in your master branch, you do cg-pull origin. Your origin head
pointer is updated, but your master pointer stays where it is. If you
decide it's ok to bring the changes in, you do either cg-update, or only
cg-merge to avoid re-pulling.
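
The head movements described above can be sketched as a toy model. This is
only an illustration under stated assumptions, not how Cogito is
implemented: struct commit, struct repo, pull(), merge(), update() and
heads_demo() are all hypothetical names, a "commit" here is just a label
with up to two parents, and the merge commit is supplied by the caller
instead of being created.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a commit is a label plus up to two parents. */
struct commit {
	const char *name;
	const struct commit *parent[2];
};

struct repo {
	const struct commit *origin;	/* stands in for refs/heads/origin */
	const struct commit *master;	/* stands in for refs/heads/master */
};

/* Is 'anc' reachable from 'c' by following parent links? */
int is_ancestor(const struct commit *anc, const struct commit *c)
{
	if (!c)
		return 0;
	if (c == anc)
		return 1;
	return is_ancestor(anc, c->parent[0]) || is_ancestor(anc, c->parent[1]);
}

/* Models cg-pull: only the origin head moves. */
void pull(struct repo *r, const struct commit *remote_head)
{
	r->origin = remote_head;
}

/*
 * Models cg-merge: master just follows origin when it has nothing of its
 * own; otherwise a merge commit with both heads as parents is recorded.
 */
void merge(struct repo *r, struct commit *merge_commit)
{
	if (is_ancestor(r->master, r->origin)) {
		r->master = r->origin;		/* heads are in sync again */
	} else if (!is_ancestor(r->origin, r->master)) {
		assert(merge_commit != NULL);
		merge_commit->parent[0] = r->master;
		merge_commit->parent[1] = r->origin;
		r->master = merge_commit;
	}
}

/* Models cg-update: a pull followed by a merge. */
void update(struct repo *r, const struct commit *remote_head,
	    struct commit *merge_commit)
{
	pull(r, remote_head);
	merge(r, merge_commit);
}

/* Walk through the diagram; returns 1 if every step matches it. */
int heads_demo(void)
{
	static struct commit c1 = { "commit1", { NULL, NULL } };
	static struct commit c2 = { "commit2", { &c1, NULL } };
	static struct commit c3 = { "commit3", { &c2, NULL } };
	static struct commit c4 = { "commit4", { &c3, NULL } };	/* local commit */
	static struct commit c5 = { "commit5", { &c3, NULL } };	/* remote commit */
	static struct commit c6 = { "commit6", { NULL, NULL } };	/* merge */
	struct repo r;
	int ok = 1;

	r.origin = r.master = &c3;		/* both heads in sync */
	r.master = &c4;				/* local commit: master moves alone */
	ok &= (r.origin == &c3);

	pull(&r, &c5);				/* pull: origin moves, master stays */
	ok &= (r.origin == &c5 && r.master == &c4);

	merge(&r, &c6);				/* commit6 merges origin into master */
	ok &= (r.master == &c6 && c6.parent[0] == &c4 && c6.parent[1] == &c5);

	update(&r, &c6, NULL);			/* origin took over commit6 */
	ok &= (r.origin == &c6 && r.master == &c6);
	return ok;
}
```

In this model the last update() needs no merge commit, because master is
already contained in the new origin head; that is the "both heads are again
in sync" step at the bottom of the diagram.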

[*] Technically, you can have multiple local branches and your tree can
be based on any of them, not only "master". Cogito supports that
internally, but (deliberately) provides no UI to set that up, at least
until we devise a way to do it without confusing people even more.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply
