Git development


* [PATCH] Alter git-rebase command line options.
From: sean @ 2006-04-26 11:51 UTC (permalink / raw)
  To: git


  git rebase [--branch <branch>] <newbase>
  git rebase --continue
  git rebase --abort

Add "--continue" to restart the rebase process after
manually resolving conflicts.  The user is warned if
there are still differences between the index and the
working files.

Add "--abort" to restore the original branch, and
remove the .dotest working files.

Change the order that branch and newbase are specified
as per comments from Linus.  Also remove the need to
specify both an upstream branch _and_ a new merge base.

The documentation is updated to reflect this new command
line format but the script still quietly supports the
existing command line options completely.

This fixes a minor bug in the current version where:
"git rebase master^ master" doesn't notice that there
is no need to perform the rebase.

---

 Documentation/git-rebase.txt |   95 ++++++++++++++++++++++--------
 git-rebase.sh                |  133 ++++++++++++++++++++++--------------------
 2 files changed, 139 insertions(+), 89 deletions(-)

d8366d9de1aecf3143af646f49e7f7bc0f924ae6
diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 4a7e67a..f1e83ea 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -3,76 +3,121 @@ git-rebase(1)
 
 NAME
 ----
-git-rebase - Rebase local commits to new upstream head
+git-rebase - Rebase local commits to a new upstream head
 
 SYNOPSIS
 --------
-'git-rebase' [--onto <newbase>] <upstream> [<branch>]
+'git-rebase' [--branch <branch>] <newbase>
+
+'git-rebase' --continue
+
+'git-rebase' --abort
 
 DESCRIPTION
 -----------
-git-rebase applies to <upstream> (or optionally to <newbase>) commits
-from <branch> that do not appear in <upstream>. When <branch> is not
-specified it defaults to the current branch (HEAD).
+git-rebase replaces <branch> with a new branch of the same name having
+a HEAD of <newbase>.  It then attempts to make a new commit for each
+commit from the original <branch> that does not yet exist in this new
+<branch>.
+
+It is possible that a merge failure will prevent this process from being
+completely automatic.  You will have to resolve any such merge failure
+and run `git rebase --continue`.  If you can not resolve the merge
+failure, running `git rebase --abort` will restore the original <branch>
+and remove the working files found in the .dotest directory.
 
-When git-rebase is complete, <branch> will be updated to point to the
-newly created line of commit objects, so the previous line will not be
-accessible unless there are other references to it already.
+Note that if <branch> is not specified on the command line, the currently
+checked out branch is used.
 
 Assume the following history exists and the current branch is "topic":
 
+------------
           A---B---C topic
          /
     D---E---F---G master
+------------
+
+From this point, the result of running the following command:
+
 
-From this point, the result of either of the following commands:
+    git rebase --branch topic master
 
-    git-rebase master
-    git-rebase master topic
 
 would be:
 
+------------
                   A'--B'--C' topic
                  /
     D---E---F---G master
+------------
 
 While, starting from the same point, the result of either of the following
 commands:
 
-    git-rebase --onto master~1 master
-    git-rebase --onto master~1 master topic
+    git rebase master~1
+    git rebase --branch topic master~1
+
 
 would be:
 
+------------
               A'--B'--C' topic
              /
     D---E---F---G master
+------------
 
 In case of conflict, git-rebase will stop at the first problematic commit
-and leave conflict markers in the tree.  After resolving the conflict manually
-and updating the index with the desired resolution, you can continue the
-rebasing process with
+and leave conflict markers in the tree.  You can use git diff to locate
+the markers (<<<<<<<) and make edits to resolve the conflict.  For each
+file you edit, you need to tell git that the conflict has been resolved;
+typically this would be done with
+
+
+    git update-index <filename>
+
+
+After resolving the conflict manually and updating the index with the
+desired resolution, you can continue the rebasing process with
+
+
+    git rebase --continue
 
-    git am --resolved --3way
 
 Alternatively, you can undo the git-rebase with
 
-    git reset --hard ORIG_HEAD
-    rm -r .dotest
+
+    git rebase --abort
 
 OPTIONS
 -------
 <newbase>::
-	Starting point at which to create the new commits. If the
-	--onto option is not specified, the starting point is
-	<upstream>.
-
-<upstream>::
-	Upstream branch to compare against.
+	Starting point at which to create the new commits.
 
 <branch>::
 	Working branch; defaults to HEAD.
 
+--continue::
+	Restart the rebasing process after having resolved a merge conflict.
+
+--abort::
+	Restore the original branch and abort the rebase operation.
+
+NOTES
+-----
+When you rebase a branch, you are changing its history in a way that
+will cause problems for anyone who already has a copy of the branch
+in their repository and tries to pull updates from you.  You should
+understand the implications of using 'git rebase' on a repository that
+you share.
+
+When the git rebase command is run, it will first execute a "pre-rebase"
+hook if one exists.  You can use this hook to do sanity checks and
+reject the rebase if it isn't appropriate.  Please see the template
+pre-rebase hook script for an example.
+
+You must be in the top directory of your project to start (or continue)
+a rebase.  Upon completion, <branch> will be the current branch.
+
 Author
 ------
 Written by Junio C Hamano <junkio@cox.net>
diff --git a/git-rebase.sh b/git-rebase.sh
index 86dfe9c..5a4e33b 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -3,40 +3,61 @@ #
 # Copyright (c) 2005 Junio C Hamano.
 #
 
-USAGE='[--onto <newbase>] <upstream> [<branch>]'
-LONG_USAGE='git-rebase applies to <upstream> (or optionally to <newbase>) commits
-from <branch> that do not appear in <upstream>. When <branch> is not
-specified it defaults to the current branch (HEAD).
-
-When git-rebase is complete, <branch> will be updated to point to the
-newly created line of commit objects, so the previous line will not be
-accessible unless there are other references to it already.
-
-Assuming the following history:
-
-          A---B---C topic
-         /
-    D---E---F---G master
-
-The result of the following command:
-
-    git-rebase --onto master~1 master topic
-
-  would be:
-
-              A'\''--B'\''--C'\'' topic
-             /
-    D---E---F---G master
+USAGE='[--branch <branch>] <newbase>'
+LONG_USAGE='git-rebase replaces <branch> with a new one of the
+same name having a HEAD of <newbase>.  It then attempts to create
+a new commit for each commit from the original <branch> that does
+not yet exist on this new <branch>.
+
+It is possible that a merge failure will prevent this process
+from being completely automatic.  You will have to resolve any
+such merge failure and run git-rebase --continue.  If you can
+not resolve the merge failure, running git-rebase --abort will
+restore the original <branch> and remove the working files found
+in the .dotest directory.
+
+Note that if <branch> is not specified on the command line, the
+currently checked out branch is used.  You must be in the top
+directory of your project to start (or continue) a rebase.
+
+Example:       git-rebase --branch topic master~1
+
+        A---B---C topic                   A'\''--B'\''--C'\'' topic
+       /                   -->           /
+  D---E---F---G master          D---E---F---G master
 '
 
 . git-sh-setup
 
 unset newbase
+unset branch_name
 while case "$#" in 0) break ;; esac
 do
 	case "$1" in
+	--continue)
+		diff=$(git-diff-files)
+		case "$diff" in
+		?*)	echo "You must edit all merge conflicts and then"
+			echo "mark them as resolved using git update-index"
+			exit 1
+			;;
+		esac
+		git am --resolved --3way
+		exit
+		;;
+	--abort)
+		[ -d .dotest ] || die "No rebase in progress?"
+		git reset --hard ORIG_HEAD
+		rm -r .dotest
+		exit
+		;;
+	--branch)
+		test $# -ne 3 -o -n "$newbase" && usage
+		branch_name="$2"
+		shift
+		;;
 	--onto)
-		test 2 -le "$#" || usage
+		test $# -lt 2 -o -n "$branch_name" && usage
 		newbase="$2"
 		shift
 		;;
@@ -49,6 +70,20 @@ do
 	esac
 	shift
 done
+# Quietly support the historic command line: [--onto <newbase>] <upstream> [<branch>]
+test $# -lt 1 && usage
+test -z "$newbase" && newbase="$1"
+shift
+if [ -z "$branch_name" ]; then
+	if [ $# -gt 0 ]; then
+		branch_name="$1"
+		shift
+	else	branch_name=`git symbolic-ref HEAD` || die "No current branch"
+		branch_name=`expr "z$branch_name" : 'zrefs/heads/\(.*\)'`
+	fi
+fi
+test $# -gt 0 && usage
+git checkout "$branch_name" || usage
 
 # Make sure we do not have .dotest
 if mkdir .dotest
@@ -72,11 +107,6 @@ case "$diff" in
 	;;
 esac
 
-# The upstream head must be given.  Make sure it is valid.
-upstream_name="$1"
-upstream=`git rev-parse --verify "${upstream_name}^0"` ||
-    die "invalid upstream $upstream_name"
-
 # If a hook exists, give it a chance to interrupt
 if test -x "$GIT_DIR/hooks/pre-rebase"
 then
@@ -86,47 +116,22 @@ then
 	}
 fi
 
-# If the branch to rebase is given, first switch to it.
-case "$#" in
-2)
-	branch_name="$2"
-	git-checkout "$2" || usage
-	;;
-*)
-	branch_name=`git symbolic-ref HEAD` || die "No current branch"
-	branch_name=`expr "z$branch_name" : 'zrefs/heads/\(.*\)'`
-	;;
-esac
-branch=$(git-rev-parse --verify "${branch_name}^0") || exit
-
 # Make sure the branch to rebase onto is valid.
-onto_name=${newbase-"$upstream_name"}
-onto=$(git-rev-parse --verify "${onto_name}^0") || exit
-
-# Now we are rebasing commits $upstream..$branch on top of $onto
+branch=$(git-rev-parse --verify "${branch_name}^0") || exit
+onto=$(git-rev-parse --verify "${newbase}^0") || exit
 
 # Check if we are already based on $onto, but this should be
 # done only when upstream and onto are the same.
-if test "$upstream" = "onto"
-then
-	mb=$(git-merge-base "$onto" "$branch")
-	if test "$mb" = "$onto"
-	then
-		echo >&2 "Current branch $branch_name is up to date."
-		exit 0
-	fi
-fi
-
-# Rewind the head to "$onto"; this saves our current head in ORIG_HEAD.
-git-reset --hard "$onto"
-
-# If the $onto is a proper descendant of the tip of the branch, then
-# we just fast forwarded.
+mb=$(git-merge-base "$onto" "$branch")
 if test "$mb" = "$onto"
 then
-	echo >&2 "Fast-forwarded $branch to $newbase."
+	echo >&2 "Current branch $branch_name already has $newbase as a base!"
 	exit 0
 fi
 
-git-format-patch -k --stdout --full-index "$upstream" ORIG_HEAD |
+# Rewind the head to "$onto"; this saves our current head in ORIG_HEAD.
+git-reset --hard "$newbase"
+
+# Now we are rebasing commits $newbase..$branch on top of $newbase
+git-format-patch -k --stdout --full-index "$newbase" ORIG_HEAD |
 git am --binary -3 -k
-- 
1.3.0.gd8366
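
The option handling described in this patch can be sketched as a standalone
shell function.  This is only an illustration under stated assumptions, not
the code from git-rebase.sh: parse_rebase_args and the CURRENT_BRANCH
variable (standing in for `git symbolic-ref HEAD`) are hypothetical names,
and --branch is the option proposed in this patch.

```shell
#!/bin/sh
# Hypothetical helper mirroring the option handling proposed in the patch.
# CURRENT_BRANCH is an assumption that replaces `git symbolic-ref HEAD`,
# so the sketch runs without a repository.
parse_rebase_args() {
	newbase= branch=
	while [ $# -gt 0 ]
	do
		case "$1" in
		--branch)
			# New style; may not be combined with --onto.
			[ $# -ge 2 ] && [ -z "$newbase" ] || return 1
			branch=$2
			shift
			;;
		--onto)
			# Historic style; may not be combined with --branch.
			[ $# -ge 2 ] && [ -z "$branch" ] || return 1
			newbase=$2
			shift
			;;
		-*)
			return 1
			;;
		*)
			break
			;;
		esac
		shift
	done
	# The first positional argument is <newbase>, unless --onto already
	# supplied one (then it is the historic <upstream> and is ignored).
	[ $# -ge 1 ] || return 1
	[ -n "$newbase" ] || newbase=$1
	shift
	# Historic trailing <branch>, else fall back to the current branch.
	if [ -z "$branch" ]
	then
		if [ $# -gt 0 ]
		then
			branch=$1
			shift
		else
			branch=${CURRENT_BRANCH:-HEAD}
		fi
	fi
	[ $# -eq 0 ] || return 1
	echo "branch=$branch newbase=$newbase"
}
```

With CURRENT_BRANCH=topic, `parse_rebase_args master~1` prints
"branch=topic newbase=master~1", and the historic form
`parse_rebase_args --onto master~1 master topic` yields the same pair.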


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Andreas Ericsson @ 2006-04-26 11:25 UTC (permalink / raw)
  To: sean; +Cc: Linus Torvalds, junkio, git, jnareb
In-Reply-To: <BAYC1-PASMTP086A906CFB378AB229C2D8AEBF0@CEZ.ICE>

sean wrote:
> On Tue, 25 Apr 2006 08:40:25 -0700 (PDT)
> Linus Torvalds <torvalds@osdl.org> wrote:
> 
> 
>>On Tue, 25 Apr 2006, Linus Torvalds wrote:
>>
>>>I want the git objects to have clear and unambiguous semantics. I want 
>>>people to be able to explain exactly what the fields _mean_. No "this 
>>>random field could be used this random way" crud, please.
>>
>>Btw, if the whole point is a "leave random porcelain a field that they can 
>>use any way they want", then I say "Hell NO!".
>>
>>Random porcelain can already just maintain their own lists of "related" 
>>stuff, any way they want: you can keep it in a file in ".git/porcelain", 
>>called "list-commit-relationships", or you could use a git blob for it and 
>>have a reference to it in .git/refs/porcelain/relationships or whatever. 
>>
>>If it has no clear and real semantic meaning for core git, then it 
>>shouldn't be in the core git objects.
>>
>>The absolute last thing we want is a "random out" that starts to mean 
>>different things to different people, groups and porcelains.
>>
>>That's just crazy, and it's how you end up with a backwards compatibility 
>>mess five years from now that is totally unresolvable, because different 
>>projects end up having different meanings or uses for the fields, so 
>>converting the database (if we ever find a better format, or somebody 
>>notices that SHA1 can be broken by a five-year-old-with-a-crayon).
>>
>>There's a reason "minimalist" actually ends up _working_. I'll take a UNIX 
>>"system calls have meanings" approach over a Windows "there's fifteen 
>>different flavors of 'open()', and we also support magic filenames with 
>>specific meaning" kind of thing.
>>
> 
> 
> It's a fair point.  But adding a separate database to augment the core 
> information has some downsides.  That is, that information isn't pulled, 
> cloned, or pushed automatically; it doesn't get to ride for free on top 
> of the core.
> 
> Accommodating extra git headers (or "note"'s in Junio's example) would allow
> a developer to record the fact that he is integrating a patch taken 
> from a commit in the devel branch and backporting it to the release 
> branch.   Either by adding a note that references the bug tracking #, or 
> a commit sha1 from the devel branch that is already associated with the bug.
> 

This information is something I, as a human, would definitely want to 
read. What's the point of recording it in the commit-header if we're not 
going to show it to users anyway? I'm with Linus on this one. Keep 
headers as simple as possible.

> Of course that information could be embedded in the free text area, but 
> you yourself have argued vigorously that it is brain damaged to try and rely
> on parsing free form text for these types of situations.

Why would there be a need to parse it? The entire *point* of history is 
to present it to readers in as accessible and understandable a way as 
possible. Git's sha1 hashes mean absolutely nothing, so a note saying 
something was cherry-picked from commit 
"89987987ad987aef987987aff987987d" on branch "devel" will be pointless 
unless the one doing the committing states the why as well as the what 
in the commit-message anyways.

Besides, only developers will likely ever look at the commit-messages, 
and they will likely only ever do it when they are bisecting or looking 
for the implementation date of a certain feature or other.

>  Most of the potential 
> uses aren't really meant for a human to read while looking at the log anyway, 
> they just get in the way.

I still fail to see a use case for this. Could you give me some examples 
of when recorded information isn't meant to be presented to the user?

> 
> But if the information is in the actual commit header it gets to tag along
> for free with never any worry it will be separated from the commit in question.
> So when the developer above updates his official repo the bug tracker system 
> can notice that the bug referenced in its system has had a patch backported 
> and take whatever action is desired.  
> 

We already have something like this. Every commit with a top-line message 
containing "bug #" followed by a number automatically updates our 
bugtracking system with the commit-message in its entirety. If the word 
before "bug #" matches "fix.*" then the status of the bug is set to that.

This might seem cumbersome to some but it's really very straightforward, 
and for a couple of reasons it's a very good solution:
1. Devs who Do It Right don't have to fiddle with their browser just to 
enter the info twice, so they learn fast. :)
2. BT history (viewed by non-devs too) gets updated with accurate 
information promptly.
3. No matter how you solve the problem you're going to need to write a 
custom commit/update hook anyway, so this is as good as having the info 
in the note.
4. The info going to the BT is easily modifiable, so if someone screws 
up they can fix it later. Fixing an already written git commit takes 
some doing if there are commits on top.
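
The subject-line matching described above could look roughly like the
following.  This is a minimal sketch, not the hook we actually run:
bug_reference is a hypothetical name, matching "fix" case-insensitively on
its first letter is an assumption, and "-" is just a placeholder for "no
status word found".

```shell
#!/bin/sh
# Hypothetical helper: classify the first line of a commit message.
# Prints "<bug-number> <status>", where <status> is the word directly
# before "bug #" if it starts with "fix"/"Fix", or "-" otherwise.
# Prints nothing when the subject has no "bug #<number>" reference.
bug_reference() {
	number=$(printf '%s\n' "$1" |
		sed -n 's/.*bug #\([0-9][0-9]*\).*/\1/p')
	[ -n "$number" ] || return 0
	# A leading space is added so a status word at the very start of
	# the subject is still preceded by a non-letter.
	status=$(printf ' %s\n' "$1" |
		sed -n 's/.*[^A-Za-z]\([Ff]ix[A-Za-z]*\) bug #[0-9].*/\1/p')
	echo "$number ${status:--}"
}
```

A real update hook would feed the first line of each new commit message to
such a function and pass the result on to the bug tracker.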

> Of course there are other ways to do this, but integrating it into git means it
> gets a free ride on the core, and it shouldn't really get in the way of core 
> any more than email X- headers get in the way of email flowing.
> 

True. I've suggested before that arbitrary headers could be added to git 
commits by prefixing them with X- (preferably followed by an abbrev of 
the porcelain name adding the note). This way it's easy to filter, you 
get the free ride, and porcelains can do whatever they want while core 
git can strip everything following the sequence "\nX-" up to and 
including the next newline.

This way you have only one special byte-sequence with special meaning 
that the plumbing has to know it should ignore, which is a lot more 
extensible (not to mention easier to code).
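
To illustrate how little the plumbing would need to know, here is a rough
sketch of such a filter.  It is an assumption-laden example, not existing
git code: strip_x_headers is a hypothetical name, and restricting the
filter to the header block (everything before the first blank line) is my
own choice so that message text starting with "X-" survives.

```shell
#!/bin/sh
# Hypothetical filter: read a commit object's text on stdin and drop
# header lines that start with "X-".  Lines after the first blank line
# (the commit message) are passed through untouched.
strip_x_headers() {
	awk '
		in_body { print; next }
		/^$/    { in_body = 1; print; next }
		/^X-/   { next }
		        { print }
	'
}
```

Something like `git cat-file commit HEAD | strip_x_headers` would then show
the commit as core git would see it.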

In addition, if those X- lines aren't included in the sha1 computation 
they can easily be removed and added to without affecting the ancestry 
chain. This would probably have quite a performance impact though.

That said, I don't think even "X-" headers are a very good idea. Perhaps 
I've just got a poor imagination, but I can't think of a good use for them.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* What's in git.git
From: Junio C Hamano @ 2006-04-26 11:09 UTC (permalink / raw)
  To: git; +Cc: linux-kernel

* The 'maint' branch has fixes mentioned in the 1.3.1 
  announcement.

  As I outlined in the 1.3.1 maintenance release announcement,
  people with that release will soon be missing many
  improvements.  The following is a list of what to expect.


* In addition to the above, the 'master' branch has these since
  the last announcement:

  - git-update-index --chmod=+x now affects all the subsequent
    files (Alex Riesen).

  - git-update-index --unresolve paths...; this needs
    documentation (hint).

  - minor "diff --stat" and "show --stat" fixes.

  - Makefile dependency fixes.  This fixes the infamous
    "libgit.a still contains stale diff.o" problem.

  - contrib has colordiff that understands --cc output.

  - beginning of libified "git diff" family.

  - git-commit-tree <ent> -p <parent> now takes extended SHA1
    expression, not limited to 40-byte SHA1, for <ent> (it
    already did so for <parent>).

  - updated gitk to handle repositories with large number of
    tags and heads (Paul).


* The 'next' branch, in addition, has these.

  - internal log/show/whatchanged family (Linus and me).

  - beginning of internal format-patch.

  - Geert's similarity code in contrib/

  - cache-tree optimization to speed up git-apply + write-tree
    cycles.

    Initially I was getting close to 50% improvement, but
    re-benching suggests it is more like 16%.  An earlier
    version in 'next' used a separate .git/index.aux to record
    the cache-tree information but now it is stored as part of
    the index.  If you used previous 'next' (ha, ha) version and
    see tmp-indexXXXX.aux or next-indexXXXX.aux files left in
    your $GIT_DIR, they can safely be removed.

  - more "diff --stat" fixes.

  - git-cvsserver: typofixes.

  - diff-delta interface reorganization (Nico)

  - git-repo-config --list (Pasky)


* The 'pu' branch, in addition, has these.

  - resurrect "bind commit"; this has been done only partially.

    I have not updated the rev-list/fsck-objects yet.  Probably
    need to drop the specific "bind " line and replace it with
    "link object bind" in the commit objects before going
    forward.

  - get_sha1(): :path and :[0-3]:path to extract from index.

  - Loosening path argument check a little bit in revision.c.

    I've been meaning to do the opposite of this, the tightening
    of the ambiguous case mentioned by Linus, but haven't got around
    to it yet (I haven't got around to too many things, hint hint).

  - reverse the pack-objects delta window logic (Nico)

    This is in theory the right thing to do, but things are not
    quite there yet.  But Nico is on top of it so we will see
    quite an improvement in the pack generation hopefully very
    soon.


* new gitk feature
From: Paul Mackerras @ 2006-04-26 10:59 UTC (permalink / raw)
  To: git

I just pushed some changes to gitk which add a new feature, the
ability to have multiple "views" of a repository.  Each view is a
subgraph of the full graph.  At the moment the only subgraph that you
can specify is the subgraph containing the commits that affect a
specified set of files or directories.  You can switch between views
quickly, and if the currently selected commit exists in the new view
when you switch views, it is selected in the new view.  There is one
view which always exists, the "All files" view.  If files or
directories are specified on the command line, a "Command line" view
is automatically created and selected at startup.

Thus, for the kernel repository I can have a "PPC" view which shows
changes to arch/powerpc, include/asm-powerpc etc.  When looking at a
commit in that view, I can switch to the "All files" view to see where
that commit fits in the overall history.

There is a "View" menu which contains the menu items for creating,
deleting, editing and selecting views.  If you check the "Remember
this view" box, gitk will write the definition of the view to your
~/.gitk file, and it will be automatically put in the list on startup.

I plan to add various other kinds of views, for example, a view that
shows only the commits that affect a selected file (or part of a file,
perhaps), and a view that shows just the current commit together with
all the commits that have tags.  (The latter will require some help
from git-rev-list. :)

Paul.


* Re: [PATCH] Make die() and error() prefix line with binary name if set
From: Rocco Rutte @ 2006-04-26 10:43 UTC (permalink / raw)
  To: git
In-Reply-To: <7vejzkrb2y.fsf@assigned-by-dhcp.cox.net>

* Junio C Hamano <junkio@cox.net>:

>... what's wrong with your mailer?

I don't know. I recall having seen this earlier.

And while I'll look at it (I bet this is an f=f issue), the patch is at:

   <http://user.cs.tu-berlin.de/~pdmef/0001-Make-die-and-error-prefix-line-with-binary-name-if-set.txt>

   bye, Rocco
-- 
:wq!


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Jakub Narebski @ 2006-04-26  9:28 UTC (permalink / raw)
  To: git
In-Reply-To: <7virowrd1y.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> And a subproject commit, unless it contains subsubproject, would
> look like just an ordinary commit.  Its tree would match the
> entry in the tree of the toplevel commit at the path in "bind" line
> of the top-level commit.
> 
> Some reading material, from newer to older:
> 
>   * http://www.kernel.org/git/?p=git/git.git;a=blob;hb=todo;f=Subpro.txt
> 
>   This talks about the overall "vision" on what the user-level
>   interaction might look like, with a sketch on how the core-level
>   would help Porcelain to implement that interaction.  Most of the
>   core-level support described there is in the "bind commit"
>   changes, except "update-index --bind/-unbind" to record the
>   information on bound subprojects in the index file.

By the way, this file talks about (1) "using"/"userspace"/"embedder"
subproject holding 'appliance/', and toplevel (master) holding toplevel
Makefile, or (2) 'using' subproject holding both 'appliance/' and toplevel
Makefile with the help of --exclude. 

Another option would be to have only the "embedded"/"used"/"requirement"
part be a subproject holding 'kernel-2.6', with 'appliance/' held by the
toplevel (master) commit.  Perhaps not the best solution for the 'kernel +
userspace tools' example, but it might be a better workflow for the
'application + library' or 'application + engine' examples.

-- 
Jakub Narebski
Warsaw, Poland


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Junio C Hamano @ 2006-04-26  9:21 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e2nbrl$p6l$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

>> Notice two .git directories?  That's right.
> [...] 
>> Meta/.git is a separate repository that is a clone of "todo"
>> branch of git.git repository.  The top-level .git repository
>> does not even have "todo" branch.  I just happen to push into
>> the same public repository git.git at kernel.org from these two
>> separate repositories.
>
> And top-level .git repository is told to ignore Meta directory?

Yes, I have .git/info/exclude that says something like this:

/.mailmap
*~
/Meta
+*


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Jakub Narebski @ 2006-04-26  8:44 UTC (permalink / raw)
  To: git
In-Reply-To: <7virowrd1y.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> Jakub Narebski <jnareb@gmail.com> writes:
> 
>> BTW. I have lately stumbled upon (somewhat Vault and Subversion biased)
>>  http://software.ericsink.com/Beyond_CheckOut_and_CheckIn.html
>> Read about Share and Pin -- it's about subprojects (when you edit out the
>> flawed "branch as folder" approach of author).

By the way, I mentioned this link only because it *might* be interesting to
see what others need subproject support for, and how they think of it and
implement it.

> Not really.  You can easily do that by checking out another
> project in a separate subdirectory.
> 
> My private working area for git.git is structured like this:
> 
> /home/junio/git.junio/.git
>                       Makefile
>                       COPYING
>                       Documentation/
>                       ...
>                       Meta/.git
>                       Meta/TODO
>                       Meta/Make
>                       Meta/TO
>                       Meta/WI
>                       ...
> 
> Notice two .git directories?  That's right.
[...] 
> Meta/.git is a separate repository that is a clone of "todo"
> branch of git.git repository.  The top-level .git repository
> does not even have "todo" branch.  I just happen to push into
> the same public repository git.git at kernel.org from these two
> separate repositories.

And top-level .git repository is told to ignore Meta directory?

Interesting idea...

-- 
Jakub Narebski
Warsaw, Poland


* Re: [PATCH] Make die() and error() prefix line with binary name if set
From: Junio C Hamano @ 2006-04-26  8:32 UTC (permalink / raw)
  To: Rocco Rutte; +Cc: git
In-Reply-To: <20060425101207.GC5482@bolero.cs.tu-berlin.de>

Rocco Rutte <pdmef@gmx.net> writes:

> Now, git_set_appname() can be used to set the name of the binary
> as first call in a binary's main() routine which will be used
> as prefix in die() and error(). If it was not called, no prefix
> will be printed.

I agree with the general direction, but...

> @@ -1960,6 +1960,8 @@ int main(int argc, char **argv)
>  	int read_stdin = 1;
>  	const char *whitespace_option = NULL;
>  +	git_set_appname("git-apply");
> +
>  	for (i = 1; i < argc; i++) {
>  		const char *arg = argv[i];
>  		char *end;

... what's wrong with your mailer?


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Junio C Hamano @ 2006-04-26  7:50 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e2n72h$aqe$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> Do I understand correctly that toplevel (master project) commits have a tree
> which points to the combined tree, and "bind" links which point to the
> subproject commits whose trees make up the overall tree, or does the
> master tree point to a tree containing only toplevel files (overall Makefile
> for example, INSTALL or README for the whole project including
> subprojects,...)?

The plan for "bind commit" was to have the toplevel commit to
contain:

	tree -- this covers the whole tree including subprojects
        parent -- list of parents in the toplevel project
        bind -- commit object name of subproject, plus which
	        directory to graft its tree onto.

And a subproject commit, unless it contains subsubproject, would
look like just an ordinary commit.  Its tree would match the
entry in the tree of the toplevel commit at the path in "bind" line
of the top-level commit.

Some reading material, from newer to older:

  * http://www.kernel.org/git/?p=git/git.git;a=blob;hb=todo;f=Subpro.txt

  This talks about the overall "vision" on what the user-level
  interaction might look like, with a sketch on how the core-level
  would help Porcelain to implement that interaction.  Most of the
  core-level support described there is in the "bind commit"
  changes, except "update-index --bind/-unbind" to record the
  information on bound subprojects in the index file.

  * http://thread.gmane.org/gmane.comp.version-control.git/15072

  This was the thread that led to the above proposal.

  * http://thread.gmane.org/gmane.comp.version-control.git/14486

  This is older.  It touches an alternative "gitlink" approach,
  which I meant to prototype but never got around to.

  Surprisingly, these two threads are mostly noise-free and
  literally every message is worth reading.

Some old but working core-side code is available at jc/bind
branch of public git.git repository.

> BTW. I have lately stumbled upon (somewhat Vault and Subversion biased)
>  http://software.ericsink.com/Beyond_CheckOut_and_CheckIn.html
> Read about Share and Pin -- it's about subprojects (when you edit out the
> flawed "branch as folder" approach of author).

Not really.  You can easily do that by checking out another
project in a separate subdirectory.

My private working area for git.git is structured like this:

	/home/junio/git.junio/.git
        		      Makefile
                              COPYING
                              Documentation/
                              ...
                              Meta/.git
                              Meta/TODO
                              Meta/Make
                              Meta/TO
                              Meta/WI
                              ...

Notice two .git directories?  That's right.  

The top-level .git repository has the familiar branches like
"maint", "master", "next", "pu", in addition to various topic
branches.

Meta/.git is a separate repository that is a clone of "todo"
branch of git.git repository.  The top-level .git repository
does not even have "todo" branch.  I just happen to push into
the same public repository git.git at kernel.org from these two
separate repositories.

The Meta/ repository is "pinned" to a specific version, without
having any funky "Pin feature", no thank you, because I have
full control of when I update what is checked out in the Meta/
directory.

What you _might_ want is a reverse of Pinning.  Sometimes, you
would want to make sure subproject part is at least this version
or later to build other parts of the whole.

But for my particular "Meta/" directory, I do not need such a
linkage.  The major reason I do not keep TODO in the main
project is because it is supposed to be a task list for me
across "maint", "master" and "next".  I do not want it to
fluctuate whenever I work on different branches.

-jc


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Jakub Narebski @ 2006-04-26  7:22 UTC (permalink / raw)
  To: git
In-Reply-To: <7vlktssudl.fsf_-_@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> (On topic again)
> 
> A link from a subproject commit back to the toplevel might work
> for some kinds of subprojects, but it would not work for the
> subproject support that frequently comes up on this list.  Take
> the development of an embedded Linux device, where a Linux kernel
> source tree is grafted at the kernel/ subdirectory of the toplevel
> project.  The "prior" link would be placed in a commit that
> belongs to the kernel subproject, but that commit would never be
> merged to the Linus kernel (why should he care about one particular
> embedded device's development history?).  The link must go from
> the toplevel to the generic parts that are reusable outside the
> context of the combined project.

Yes, I guess subproject support is most needed for the "third-party embedded
(sub)project" case, when one sometimes has to modify (sub)project files, and
perhaps has to watch for the (sub)project version. Hmmm... if one used
Tailor (to allow for projects not managed under GIT, though I wonder if it
would be possible to link up a project without an [externally available] SCM)
one could use this approach for managing distribution packages, like RPMs
or debs...

Do I understand correctly that toplevel (master project) commits have a tree
which points to the combined tree, and "bind" links which point to the
subproject commits whose trees make up the overall tree, or does the
master tree point to a tree containing only toplevel files (the overall
Makefile for example, INSTALL or README for the whole project including
subprojects,...)?

BTW, I have lately stumbled upon (somewhat Vault and Subversion biased)
 http://software.ericsink.com/Beyond_CheckOut_and_CheckIn.html
Read about Share and Pin -- it's about subprojects (when you edit out the
flawed "branch as folder" approach of the author). I wonder if it could be
easily implemented in the "subprojects for GIT" proposal... Of course we can
do better, i.e. the original subproject repository doesn't need to be on the
same machine; we can use a remote repository.

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Sam Vilain @ 2006-04-26  6:51 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e2mv30$k08$1@sea.gmane.org>

Jakub Narebski wrote:

>>It would still support that. Each commit to the sub-project involves a
>>change to the tree of the "main" commit line (a copy of the commit into
>>a sub-directory of it). The advantage is that the "tree" in the main
>>commit is the combined tree, you don't need to treat the case specially
>>to just get the contents out.
>>    
>>
>
>As far as I understand, for a subproject commit the "bind" link (and perhaps
>the keyword/name "link" or "ref" would be better than "related") points to
>other subprojects' commits (trees), while Sam's "prior (3)" example link would
>point to the toplevel project (gathering all subprojects) commit, and it
>would probably be named/noted "toplevel", not "prior".
>
>Am I correct?
>  
>

I don't think you quite get my meaning.

What I'm saying is that with the right kind of general purpose relation
between commits, you don't need "bind" at all.

Firstly, you would have your sub-project as its own commit line. That is
a fairly straightforward thing.

Secondly, the project that includes it has a corresponding commit for
each commit on the sub-project. This commit changes the portion of the
outer project's tree where the sub-project is bound.

This means that you don't need to understand this "bind" relation to be
able to extract the tree, and keeps the model simple at the expense of
an extra tree object or three per commit. It also does not restrict the
manner of the "binding"; porcelains or users are free to do it
selectively, for instance.

Actually there is a large similarity between this and cherry-picking. In
essence you're cherry-picking every single commit from a different
commit hierarchy, except that you are applying the patches into a
sub-directory.

Sam.

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Junio C Hamano @ 2006-04-26  6:50 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e2n4am$1vn$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> Junio C Hamano wrote:
>
>> Jakub Narebski <jnareb@gmail.com> writes:
>> 
>>> Jakub Narebski wrote:
>>>
>>>> [...] Sam's "prior (3)" example
>>>> link would point to the toplevel project (gathering all subprojects)
>>>> commit, and it would probably be named/noted "toplevel", not "prior".
>>>
>>> Or "master" (like "master document" in DTP).
>> 
>> (Offtopic) isn't "master" in DTP more like a template?
>
> Well, in (La)TeX a "master document" is a document in its own right;
> subdocuments are transcluded using some kind of "include"-like command.

(Offtopic) Ah, the hard-core stuff.  I had something else in
mind ("master page" in "DTP for dummies"), sorry for the
confusion.

(On topic again)

A link from a subproject commit back to the toplevel might work
for some kinds of subprojects, but it would not work for the
subproject support that frequently comes up on this list.  Take
the development of an embedded Linux device, where a Linux kernel
source tree is grafted at the kernel/ subdirectory of the toplevel
project.  The "prior" link would be placed in a commit that
belongs to the kernel subproject, but that commit would never be
merged to the Linus kernel (why should he care about one particular
embedded device's development history?).  The link must go from
the toplevel to the generic parts that are reusable outside the
context of the combined project.

^ permalink raw reply

* Re: [OT] Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Jakub Narebski @ 2006-04-26  6:35 UTC (permalink / raw)
  To: git
In-Reply-To: <7vzmi8sxt1.fsf_-_@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> Jakub Narebski <jnareb@gmail.com> writes:
> 
>> Jakub Narebski wrote:
>>
>>> [...] Sam's "prior (3)" example
>>> link would point to the toplevel project (gathering all subprojects)
>>> commit, and it would probably be named/noted "toplevel", not "prior".
>>
>> Or "master" (like "master document" in DTP).
> 
> (Offtopic) isn't "master" in DTP more like a template?

Well, in (La)TeX a "master document" is a document in its own right;
subdocuments are transcluded using some kind of "include"-like command.

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: [PATCH/RFC] reverse the pack-objects delta window logic
From: Junio C Hamano @ 2006-04-26  5:45 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604252330190.18520@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

> Note, this is a RFC particularly to Junio since the resulting pack is 
> larger than without the patch with git-repack -a -f.  However using a 
> subsequent git-repack -a brings the pack size down to expected size.  So 
> I'm not sure I've got everything right.

I haven't tested it seriously yet, but there is nothing that
looks obviously wrong that might cause the inflation problem,
from the cursory look after applying the patch on top of your
last round.

> +	if (nr_objects == nr_result && trg_entry->delta_limit >= max_depth)
> +		return 0;

The older code was loosening this check only for a delta chain
that is already in pack (which is limited to its previous
max_depth).  The end result is almost the same -- a thin pack
recipient would have deeper deltas than it asked for. The difference
is that the earlier code had an implicit 2*max_depth limit, but
this one makes the chain length unbounded, which I do not think
is necessarily a bad change.  In any case it does not explain
why you are getting a larger resulting pack, though.

> +	/* Now some size filtering euristics. */
> +	size = trg_entry->size;
>  	if (size < 50)
> -		return -1;
> -	if (old_entry->depth >= max_depth)
>  		return 0;

This is necessary because you are scanning from smaller to
larger, and I think it is a good change.

> -	/*
> -	 * NOTE!
> -	 *
> -	 * We always delta from the bigger to the smaller, since that's
> -	 * more space-efficient (deletes don't have to say _what_ they
> -	 * delete).
> -	 */

This comment by Linus still applies, even though the scan order
is now reversed; no need to remove it.

> +
> +	if (trg_entry->delta) {
> +		/*
> +		 * The target object already has a delta base but we just
> +		 * found a better one.  Remove it from its former base
> +		 * childhood and redetermine the base delta_limit (if used).
> +		 */

Since you are making the delta chain unbounded for the thin case,
you can probably omit this using the same if() condition here; the
recomputation seems rather expensive.

> +			die("object %s inconsistent object length (%lu vs %lu)",
> +			    sha1_to_hex(entry->sha1), size, entry->size);
> +		if (!size)
> +			continue;
> +		delta_index = create_delta_index(n->data, size);
> +		if (!delta_index)
> +			die("out of memory");

It might be worth saying "if (size < 50)" here as well; no point
wasting the delta window for small sources.

> -#if 0
> -		/* if we made n a delta, and if n is already at max
> -		 * depth, leaving it in the window is pointless.  we
> -		 * should evict it first.
> -		 * ... in theory only; somehow this makes things worse.
> -		 */
> -		if (entry->delta && depth <= entry->depth)
> -			continue;
> -#endif

I was almost tempted to suggest that the degradation you are
seeing might be related to this mystery I did not get around to
solving.  By giving it a chance to try deltas against less
optimal candidates, it appeared that we ended up making the
final pack size bigger than otherwise, which suggests that our
choice between plain undeltified and a delta half its size might
be favoring delta too much.  But it does not appear to be
related to the inflation you are seeing.

With the object list taken between v1.2.3..v1.3.0 in the git.git
repository and without delta reuse, 3054 objects are packed
(delta 1734) with this code.  The "next" version makes 1818 deltas
(only 5% more), which makes me suspect that it is making a bad choice
of delta base, because the final pack size is 1.5M vs 1.9M.

The chain length distribution is a bit different (run
"git-verify-pack -v" and look at the end of its output).

The "next" version:

chain length = 1: 257 objects
chain length = 2: 189 objects
chain length = 3: 156 objects
chain length = 4: 149 objects
chain length = 5: 113 objects
chain length = 6: 105 objects
chain length = 7: 105 objects
chain length = 8: 102 objects
chain length = 9: 103 objects
chain length = 10: 539 objects

this version:

chain length = 1: 415 objects
chain length = 2: 333 objects
chain length = 3: 259 objects
chain length = 4: 197 objects
chain length = 5: 155 objects
chain length = 6: 134 objects
chain length = 7: 106 objects
chain length = 8: 69 objects
chain length = 9: 47 objects
chain length = 10: 19 objects

The resulting pack would be faster to access (it has much
shorter median chain length).
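
The median claim can be checked directly from the two histograms
above.  What follows is a minimal sketch, not part of the patch:
median_chain_length() and the two array names are hypothetical, and
"median" is taken to mean the lower median over all deltified objects.

```c
#include <assert.h>
#include <stddef.h>

/*
 * counts[i] is the number of objects whose delta chain length is i + 1,
 * in the form printed at the end of "git-verify-pack -v" output.
 * Returns the lower median chain length, or 0 for an empty histogram.
 */
unsigned median_chain_length(const unsigned *counts, size_t n)
{
	unsigned long total = 0, seen = 0, half;
	size_t i;

	assert(counts != NULL || n == 0);
	for (i = 0; i < n; i++)
		total += counts[i];
	if (!total)
		return 0;
	half = (total + 1) / 2;	/* rank of the lower median */
	for (i = 0; i < n; i++) {
		seen += counts[i];
		if (seen >= half)
			return (unsigned)(i + 1);
	}
	return (unsigned)n;	/* not reached when total > 0 */
}

/* Histograms copied from the two runs without delta reuse shown above. */
const unsigned next_hist[10] = { 257, 189, 156, 149, 113, 105, 105, 102, 103, 539 };
const unsigned patched_hist[10] = { 415, 333, 259, 197, 155, 134, 106, 69, 47, 19 };
```

With these inputs the function returns 6 for the "next" histogram
(1818 deltas) and 3 for the patched one (1734 deltas).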

BTW, have you tried it without --no-reuse-delta on an object list
that is not thin?  It appears you are busting the depth limit.

Using the same "git rev-list --objects v1.2.3..v1.3.0" as input,
git-pack-objects without --no-reuse-delta gives this
distribution:

chain length = 1: 364 objects
chain length = 2: 269 objects
chain length = 3: 198 objects
chain length = 4: 164 objects
chain length = 5: 148 objects
chain length = 6: 123 objects
chain length = 7: 122 objects
chain length = 8: 103 objects
chain length = 9: 92 objects
chain length = 10: 234 objects
chain length = 11: 12 objects
chain length = 12: 1 object
chain length = 13: 2 objects

So it _might_ be that the depth limiting code is subtly broken,
which is causing you to throw away a perfectly good delta base,
which in turn results in a bad pack.  The distribution from the
"next" version looks like this:

chain length = 1: 358 objects
chain length = 2: 250 objects
chain length = 3: 214 objects
chain length = 4: 169 objects
chain length = 5: 150 objects
chain length = 6: 122 objects
chain length = 7: 126 objects
chain length = 8: 100 objects
chain length = 9: 101 objects
chain length = 10: 232 objects

-- >8 --

Summary of the experiment.

# test dataset
git rev-list --objects v1.2.3..v1.3.0 >RL-1.2.3--1.3.0

# baseline: "next" version is what is on my $PATH
git-pack-objects --no-reuse-delta test-next-pack-nr <RL-1.2.3--1.3.0
git-verify-pack -v test-next-pack-nr-*.pack | tail -n 20
git-pack-objects test-next-pack <RL-1.2.3--1.3.0
git-verify-pack -v test-next-pack-*.pack | tail -n 20

# freshly compiled version with the patch in question
./git-pack-objects --no-reuse-delta test-nico-pack-nr <RL-1.2.3--1.3.0
git-verify-pack -v test-nico-pack-nr-*.pack | tail -n 20
./git-pack-objects test-nico-pack <RL-1.2.3--1.3.0
git-verify-pack -v test-nico-pack-*.pack | tail -n 20

^ permalink raw reply

* [OT] Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Junio C Hamano @ 2006-04-26  5:36 UTC (permalink / raw)
  To: git; +Cc: jnareb
In-Reply-To: <e2n01t$m8j$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> Jakub Narebski wrote:
>
>> [...] Sam's "prior (3)" example
>> link would point to the toplevel project (gathering all subprojects)
>> commit, and it would probably be named/noted "toplevel", not "prior".
>
> Or "master" (like "master document" in DTP).

(Offtopic) isn't "master" in DTP more like a template?

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Jakub Narebski @ 2006-04-26  5:22 UTC (permalink / raw)
  To: git
In-Reply-To: <e2mv30$k08$1@sea.gmane.org>

Jakub Narebski wrote:

> [...] Sam's "prior (3)" example
> link would point to the toplevel project (gathering all subprojects)
> commit, and it would probably be named/noted "toplevel", not "prior".

Or "master" (like "master document" in DTP).

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Jakub Narebski @ 2006-04-26  5:06 UTC (permalink / raw)
  To: git
In-Reply-To: <444EAE7C.5010402@vilain.net>

Sam Vilain wrote:

> Junio C Hamano wrote:

>>> 3. sub-projects
>>>
>>>    In this case, the commit on the "main" commit line would have a
>>>    "prior" link to the commit on the sub-project.  The sub-project
>>>    would effectively be its own head with copied commits objects on
>>>    the main head.
>>>
>>
>>You say you can have only one "prior" per commit, which makes
>>this unsuitable to bind multiple subprojects into a larger
>>project (the earlier "bind" proposal allows zero or more).
> 
> It would still support that. Each commit to the sub-project involves a
> change to the tree of the "main" commit line (a copy of the commit into
> a sub-directory of it). The advantage is that the "tree" in the main
> commit is the combined tree, you don't need to treat the case specially
> to just get the contents out.

As far as I understand, for a subproject commit the "bind" link (and perhaps
the keyword/name "link" or "ref" would be better than "related") points to
other subprojects' commits (trees), while Sam's "prior (3)" example link would
point to the toplevel project (gathering all subprojects) commit, and it
would probably be named/noted "toplevel", not "prior".

Am I correct?

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* [PATCH/RFC] reverse the pack-objects delta window logic
From: Nicolas Pitre @ 2006-04-26  3:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

This allows for keeping a single delta index constant while delta 
targets are tested against the same base object.

Signed-off-by: Nicolas Pitre <nico@cam.org>

---

Note, this is an RFC particularly to Junio since the resulting pack is
larger than without the patch with git-repack -a -f.  However using a
subsequent git-repack -a brings the pack size down to the expected size.
So I'm not sure I've got everything right.

diff --git a/pack-objects.c b/pack-objects.c
index c0acc46..33027a8 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -19,19 +19,17 @@ struct object_entry {
 	unsigned long offset;	/* offset into the final pack file;
 				 * nonzero if already written.
 				 */
-	unsigned int depth;	/* delta depth */
-	unsigned int delta_limit;	/* base adjustment for in-pack delta */
+	unsigned int delta_limit;	/* deepest delta from this object */
 	unsigned int hash;	/* name hint hash */
 	enum object_type type;
 	enum object_type in_pack_type;	/* could be delta */
 	unsigned long delta_size;	/* delta data size (uncompressed) */
 	struct object_entry *delta;	/* delta base object */
-	struct packed_git *in_pack; 	/* already in pack */
-	unsigned int in_pack_offset;
 	struct object_entry *delta_child; /* delitified objects who bases me */
 	struct object_entry *delta_sibling; /* other deltified objects who
-					     * uses the same base as me
-					     */
+					       uses the same base as me */
+	struct packed_git *in_pack; 	/* already in pack */
+	unsigned int in_pack_offset;
 	int preferred_base;	/* we do not pack this, but is encouraged to
 				 * be used as the base objectto delta huge
 				 * objects against.
@@ -906,11 +904,11 @@ static void get_object_details(void)
 	for (i = 0, entry = objects; i < nr_objects; i++, entry++)
 		check_object(entry);
 
-	if (nr_objects == nr_result) {
+	if (!no_reuse_delta && nr_objects == nr_result) {
 		/*
-		 * Depth of objects that depend on the entry -- this
-		 * is subtracted from depth-max to break too deep
-		 * delta chain because of delta data reusing.
+		 * We must determine the maximum depth of reused deltas
+		 * for those objects used as their base before find_deltas()
+		 * starts considering them as potential delta targets.
 		 * However, we loosen this restriction when we know we
 		 * are creating a thin pack -- it will have to be
 		 * expanded on the other end anyway, so do not
@@ -1004,64 +1002,78 @@ struct unpacked {
  * more importantly, the bigger file is likely the more recent
  * one.
  */
-static int try_delta(struct unpacked *cur, struct unpacked *old, unsigned max_depth)
+static int try_delta(struct unpacked *trg, struct unpacked *src,
+		     struct delta_index *src_index, unsigned max_depth)
 {
-	struct object_entry *cur_entry = cur->entry;
-	struct object_entry *old_entry = old->entry;
-	unsigned long size, oldsize, delta_size, sizediff;
-	long max_size;
+	struct object_entry *trg_entry = trg->entry;
+	struct object_entry *src_entry = src->entry;
+	unsigned long size, src_size, delta_size, sizediff, max_size;
 	void *delta_buf;
 
 	/* Don't bother doing diffs between different types */
-	if (cur_entry->type != old_entry->type)
+	if (trg_entry->type != src_entry->type)
 		return -1;
 
 	/* We do not compute delta to *create* objects we are not
 	 * going to pack.
 	 */
-	if (cur_entry->preferred_base)
-		return -1;
+	if (trg_entry->preferred_base)
+		return 0;
 
-	/* If the current object is at pack edge, take the depth the
-	 * objects that depend on the current object into account --
-	 * otherwise they would become too deep.
+	/*
+	 * Make sure deltifying this object won't make its deepest delta
+	 * too deep, but only when not producing a thin pack.
 	 */
-	if (cur_entry->delta_child) {
-		if (max_depth <= cur_entry->delta_limit)
-			return 0;
-		max_depth -= cur_entry->delta_limit;
-	}
-
-	size = cur_entry->size;
-	oldsize = old_entry->size;
-	sizediff = oldsize > size ? oldsize - size : size - oldsize;
+	if (nr_objects == nr_result && trg_entry->delta_limit >= max_depth)
+		return 0;
 
+	/* Now some size filtering euristics. */
+	size = trg_entry->size;
 	if (size < 50)
-		return -1;
-	if (old_entry->depth >= max_depth)
 		return 0;
-
-	/*
-	 * NOTE!
-	 *
-	 * We always delta from the bigger to the smaller, since that's
-	 * more space-efficient (deletes don't have to say _what_ they
-	 * delete).
-	 */
 	max_size = size / 2 - 20;
-	if (cur_entry->delta)
-		max_size = cur_entry->delta_size-1;
+	if (trg_entry->delta)
+		max_size = trg_entry->delta_size-1;
+	src_size = src_entry->size;
+	sizediff = src_size < size ? size - src_size : 0;
 	if (sizediff >= max_size)
 		return 0;
-	delta_buf = diff_delta(old->data, oldsize,
-			       cur->data, size, &delta_size, max_size);
+
+	delta_buf = create_delta(src_index, trg->data, size, &delta_size, max_size);
 	if (!delta_buf)
 		return 0;
-	cur_entry->delta = old_entry;
-	cur_entry->delta_size = delta_size;
-	cur_entry->depth = old_entry->depth + 1;
+
+	if (trg_entry->delta) {
+		/*
+		 * The target object already has a delta base but we just
+		 * found a better one.  Remove it from its former base
+		 * childhood and redetermine the base delta_limit (if used).
+		 */
+		struct object_entry *base = trg_entry->delta;
+		struct object_entry **child_link = &base->delta_child;
+		base->delta_limit = 0;
+		while (*child_link) {
+			if (*child_link == trg_entry) {
+				*child_link = trg_entry->delta_sibling;
+				if (nr_objects != nr_result)
+					break;
+				continue;
+			}
+			if (base->delta_limit <= (*child_link)->delta_limit)
+				base->delta_limit =
+					(*child_link)->delta_limit + 1;
+			child_link = &(*child_link)->delta_sibling;
+		}
+	}
+
+	trg_entry->delta = src_entry;
+	trg_entry->delta_size = delta_size;
+	trg_entry->delta_sibling = src_entry->delta_child;
+	src_entry->delta_child = trg_entry;
+	if (src_entry->delta_limit <= trg_entry->delta_limit)
+		src_entry->delta_limit = trg_entry->delta_limit + 1;
 	free(delta_buf);
-	return 0;
+	return 1;
 }
 
 static void progress_interval(int signum)
@@ -1078,14 +1090,15 @@ static void find_deltas(struct object_en
 	unsigned last_percent = 999;
 
 	memset(array, 0, array_size);
-	i = nr_objects;
+	i = 0;
 	idx = 0;
 	if (progress)
 		fprintf(stderr, "Deltifying %d objects.\n", nr_result);
 
-	while (--i >= 0) {
-		struct object_entry *entry = list[i];
+	while (i < nr_objects) {
+		struct object_entry *entry = list[i++];
 		struct unpacked *n = array + idx;
+		struct delta_index *delta_index;
 		unsigned long size;
 		char type[10];
 		int j;
@@ -1113,7 +1126,13 @@ static void find_deltas(struct object_en
 		n->entry = entry;
 		n->data = read_sha1_file(entry->sha1, type, &size);
 		if (size != entry->size)
-			die("object %s inconsistent object length (%lu vs %lu)", sha1_to_hex(entry->sha1), size, entry->size);
+			die("object %s inconsistent object length (%lu vs %lu)",
+			    sha1_to_hex(entry->sha1), size, entry->size);
+		if (!size)
+			continue;
+		delta_index = create_delta_index(n->data, size);
+		if (!delta_index)
+			die("out of memory");
 
 		j = window;
 		while (--j > 0) {
@@ -1124,18 +1143,10 @@ static void find_deltas(struct object_en
 			m = array + other_idx;
 			if (!m->entry)
 				break;
-			if (try_delta(n, m, depth) < 0)
+			if (try_delta(m, n, delta_index, depth) < 0)
 				break;
 		}
-#if 0
-		/* if we made n a delta, and if n is already at max
-		 * depth, leaving it in the window is pointless.  we
-		 * should evict it first.
-		 * ... in theory only; somehow this makes things worse.
-		 */
-		if (entry->delta && depth <= entry->depth)
-			continue;
-#endif
+		free_delta_index(delta_index);
 		idx++;
 		if (idx >= window)
 			idx = 0;

^ permalink raw reply related

* Re: [PATCH] send-email: Change from Mail::Sendmail to Net::SMTP
From: Martin Langhoff @ 2006-04-26  0:45 UTC (permalink / raw)
  To: Eric Wong; +Cc: Junio C Hamano, git, Ryan Anderson
In-Reply-To: <1143336048205-git-send-email-normalperson@yhbt.net>

On 3/26/06, Eric Wong <normalperson@yhbt.net> wrote:
> Net::SMTP is in the base Perl distribution, so users are more
> likely to have it.  Net::SMTP also allows reusing the SMTP
> connection, so sending multiple emails is faster.

This is causing problems for me on my Debian sarge dev box.

 * If I have to believe strace(), Net::SMTP is trying to look up
"localhost" via DNS. Sketchy workaround: use 127.0.0.1.

 * This box has nothing listening on port 25. It doesn't get email
from the net, being a LAN machine, so I've told the Debian config
system that we don't need an SMTP daemon. Net::SMTP doesn't know how
to use /usr/bin/sendmail.

 * That nasty @@GIT_VERSION@@ thing isn't valid Perl, so working on this
code is a pain. Something like this (warning! broken diff ahead!)
fixes it for me.

@@ -292,6 +292,11 @@ sub send_message
        @recipients = unique_email_list(@recipients,@cc);
        my $date = strftime('%a, %d %b %Y %H:%M:%S %z', localtime($time++));

+       my $gitversion = '@@GIT_VERSION@@';
+       if ($gitversion eq '@@'.'GIT_VERSION@@') {
+           $gitversion = `git --version`;
+       }
+
        my $header = "From: $from
 To: $to
 Cc: $cc
@@ -299,11 +304,11 @@ Subject: $subject
 Reply-To: $from
 Date: $date
 Message-Id: $message_id
-X-Mailer: git-send-email @@GIT_VERSION@@
+X-Mailer: git-send-email $gitversion
 ";
        $header .= "In-Reply-To: $reply_to\n" if $reply_to;

cheers,

martin

^ permalink raw reply

* Proposal: git-based dependency tracking build system
From: Matt McCutchen @ 2006-04-26  0:13 UTC (permalink / raw)
  To: git

Dear git people,

I have been thinking for some time about how to write a foolproof
general-use build system that automatically tracks dependencies.  (Make
+ depcomp is decent as long as source files aren't added/removed or
generated often.  Cons is good but not general-purpose.)  I know there's
been some work on tracing the compiler to see which files it actually
opens.  Another possibility is to layer a FUSE filesystem over the build
tree and note which files in the virtual filesystem are opened; this has
the advantage of missing most of the boring files (e.g. shared libraries
that make up the compiler).

So I was thinking, why not write a build system that uses git's
excellent hash-based object storage support to store the files in the
virtual build tree?  Hashing the files makes it easy to notice when a
file is rewritten with the same contents, meaning files that depend on
it don't actually have to be rebuilt.  I also envision the build system
automatically marking generated files as git-ignored.
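
A minimal sketch of the "rewritten with the same contents" check,
under stated assumptions: FNV-1a stands in for git's SHA-1 object hash
only to keep the example short, and struct input_record and
input_changed() are hypothetical names, not an existing git interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, used here only as a short stand-in for git's SHA-1 hash. */
uint64_t content_hash(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint64_t h = 14695981039346656037ULL;	/* FNV offset basis */
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;		/* FNV prime */
	}
	return h;
}

/* Hypothetical record kept per build input. */
struct input_record {
	uint64_t hash;	/* hash of the content used by the last build */
	int valid;	/* 0 until the input has been seen once */
};

/*
 * Returns 1 if targets depending on this input must be rebuilt, and
 * updates the record; returns 0 if the content is unchanged, even when
 * the file itself was rewritten.
 */
int input_changed(struct input_record *rec, const void *data, size_t len)
{
	uint64_t h = content_hash(data, len);

	if (rec->valid && rec->hash == h)
		return 0;
	rec->hash = h;
	rec->valid = 1;
	return 1;
}
```

The first call for an input always reports a change; a later call with
identical bytes reports none, which is what lets dependents skip the
rebuild regardless of the file's timestamp.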

Thoughts?

-- 
Matt McCutchen
hashproduct@verizon.net
http://hashproduct.metaesthetics.net/

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Sam Vilain @ 2006-04-25 23:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vwtde2q1z.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

>> 2. revising published commits / re-basing
>>
>>    This is what "stg" et al do.  The tools allow you to commit,
>>    rewind, revise, recommit, fast forward, etc.
>>    
>>
>
>stg wants to have a link to the fork-point commit.  I do not
>know if it is absolutely necessary (you might be able to figure
>it out using merge-base, I dunno).
>  
>

"stg pull" and "stg pick" could conceivably link individual patches in a
patchset to their precedent in a previous series. This would make
looking at the evolution of individual patches over time more feasible.

>>    In this case, the "prior" link would point to the last revision of
>>    a patch.  Tools would probably
>>    
>>
>
>Probably what...???
>  
>

...probably support this as an explicit operation, i.e. "publish", so
that winding whilst developing is not tracked.

>> 3. sub-projects
>>
>>    In this case, the commit on the "main" commit line would have a
>>    "prior" link to the commit on the sub-project.  The sub-project
>>    would effectively be its own head with copied commits objects on
>>    the main head.
>>    
>>
>
>You say you can have only one "prior" per commit, which makes
>this unsuitable to bind multiple subprojects into a larger
>project (the earlier "bind" proposal allows zero or more).
>  
>

It would still support that. Each commit to the sub-project involves a
change to the tree of the "main" commit line (a copy of the commit into
a sub-directory of it). The advantage is that the "tree" in the main
commit is the combined tree, you don't need to treat the case specially
to just get the contents out.

This is kind of like how SVK works by default - you have one local
repository, inside which you track remote repositories. Each commit on
the upstream repository is copied individually into your own repository.
So your local repository numbers easily reach into tens of thousands
(small numbers in git land, I know) while the upstream revisions are
just in the thousands.

>There may be some narrower concrete use case for which you can
>devise coherent semantics, and teach tools and humans how to
>interpret such inter-commit relationship that are _not_
>parent-child ancestry.  For example, if you have one special
>link to point at a "cherry-picked" commit, rebasing _could_ take
>advantage of it.  When your side branch tip is at D, and commit
>D has "this was cherry-picked from commit E" note, and if you
>are rebasing your work on top of F:
>
>        A---B---C---D
>       /
>  o---o---E---F
>
>the tool can notice that F can reach E and carry forward only A,
>B, and C on top of F, omitting D.  So having such a link might
>be useful.  But if that is what you are going to do, I do not
>think you would want to conflate that with other inter-commit
>relationships, such as "previous hydra cap".
>  
>

Right, I see the problem; it is a strong argument for a more generic
solution such as the one you presented.
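
The rebase example quoted above can be sketched roughly as follows.
This is an illustration under assumptions, not an existing git
interface: struct commit here is a simplified single-parent model, and
picked_from stands for the hypothetical "this was cherry-picked from"
link.

```c
#include <assert.h>
#include <stddef.h>

struct commit {
	const char *name;
	const struct commit *parent;		/* first parent; NULL at the root */
	const struct commit *picked_from;	/* hypothetical cherry-pick link, may be NULL */
};

/* Can we reach 'target' from 'tip' by following parent links? */
int reaches(const struct commit *tip, const struct commit *target)
{
	for (; tip; tip = tip->parent)
		if (tip == target)
			return 1;
	return 0;
}

/*
 * Walk from the branch tip back to the fork point and collect the commits
 * that still need to be replayed on top of 'onto', newest first.  Returns
 * the number of commits stored in out[].
 */
size_t commits_to_replay(const struct commit *tip, const struct commit *fork,
			 const struct commit *onto,
			 const struct commit **out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);
	for (; tip && tip != fork; tip = tip->parent) {
		if (tip->picked_from && reaches(onto, tip->picked_from))
			continue;	/* already present in the new base */
		if (n < max)
			out[n++] = tip;
	}
	return n;
}
```

For the graph in the quoted message, with D marked as picked from E
and F as the new base, this yields C, B and A and leaves D out.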

Sam.

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Sam Vilain @ 2006-04-25 23:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jnareb
In-Reply-To: <7v7j5e2jv7.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

>Here is a related but not necessarily competing idle thought.
>
>How about an ability to "attach" arbitrary objects to commit
>objects?  The commit object would look like:
>
>    tree 0aaa3fecff73ab428999cb9156f8abc075516abe
>    parent 5a6a8c0e012137a3f0059be40ec7b2f4aa614355
>    parent e1cbc46d12a0524fd5e710cbfaf3f178fc3da504
>    related a0e7d36193b96f552073558acf5fcc1f10528917 key
>    related 0032d548db56eac9ea09b4ba05843365f6325b85 cherrypick
>    author Junio C Hamano <junkio@cox.net> 1145943079 -0700
>    committer Junio C Hamano <junkio@cox.net> 1145943079 -0700
>  
>

I agree with the criticisms of the patchset, and I think this is
probably a more comprehensive and less ambiguous solution. I originally
thought that the use cases were close enough together that they could be
called the same thing, but I see now that they are not.

IMHO one important goal is to stop "parent" from meaning anything other
than:

1. for a regular commit, the base for this change. The change consists
of the differences between the two trees.
2. for a "merge", the merge parents for this change. The change consists
of all differences between the index merges (allowing duplicate blobs at
each location) and the final merged tree.

If you were to, for a moving merge head, just record the previous merge
as a "parent", then it would make it difficult to look at the commit
history to figure out which parent links represent the last merge, and
which represent the merge bases.

This suggestion fixes that problem nicely, while being nice and flexible
for solving the other problems too.
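
As a rough illustration of the proposed header line (keyword
"related", SP, 40 hexadecimal digits, SP, then the nature of relation
up to the LF), here is a minimal parsing sketch.  parse_related() is a
hypothetical helper, not an existing git function, and accepting an
empty nature string is an assumption the proposal does not settle.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Parse one NUL-terminated commit header line of the proposed form
 *   "related" SP <40 hex digits> SP <nature of relation> [LF]
 * On success, copies the object name into hex[] and points *nature at
 * the nature string (length in *nature_len, LF excluded); returns 0.
 * Returns -1 if the line is not a well-formed "related" line.
 */
int parse_related(const char *line, char hex[41],
		  const char **nature, size_t *nature_len)
{
	static const char keyword[] = "related ";
	const size_t klen = sizeof(keyword) - 1;
	size_t i;

	assert(line && hex && nature && nature_len);
	if (strncmp(line, keyword, klen) != 0)
		return -1;
	line += klen;
	for (i = 0; i < 40; i++) {
		/* isxdigit() is false for NUL, so short lines stop here */
		if (!isxdigit((unsigned char)line[i]))
			return -1;
		hex[i] = line[i];
	}
	hex[40] = '\0';
	if (line[40] != ' ')
		return -1;
	*nature = line + 41;
	*nature_len = strcspn(*nature, "\n");
	return 0;
}
```

A porcelain would then compare the nature string against the keywords
it understands (for example "cherrypick") and ignore the rest.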

>    Merge branch 'pb/config' into next
>
>    * pb/config:
>      Deprecate usage of git-var -l for getting config vars list
>      git-repo-config --list support
>
>The format of "related" attribute is, keyword "related", SP, 40-byte
>hexadecimal object name, SP, and arbitrary sequence of bytes
>except LF and NUL.  Let's call this arbitrary sequence of bytes
>"the nature of relation".
>
>The semantics I would attach to these "related" links are as
>follows:
>
> * To the "core" level git, they do not mean anything other than
>   "you must have these objects, and objects reachable from
>   them, if you are going to have this commit and claim your
>   repository is without missing objects".
>  
>

This is essentially correct, however you have already described a use
case where you want the behaviour to be to lose the previous commit chain:

>The reason I do not include the previous head when I reconstruct
>"pu" is because I explicitly *want* to drop history -- not
>having to carry forward a failed experiment is what is desired
>there.  Otherwise I would manage "pu" just like I currently do
>"next" and "master".  So this is not a justification to add
>something new.
>  
>

In this case, I think that there are types of relations that are more
along the lines of "don't bother following this link by default, but
warn/fail if it is unavailable depending on the user preferences".

git-fsck could then have options to prune (or archive) certain types of
optional relations. This way people can still record complete history if
they like. And people who want to mark portions of history as bad (such
as violating copyright law) have a clear way to state that intent.
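
A rough sketch of what such a per-relation policy might look like. Everything here is hypothetical: the check_related function, the on_missing preference values, and the choice of "prior" as an optional nature are assumptions for illustration, not anything the thread has settled.

```python
# Hypothetical classification: which natures count as optional is an
# assumption of this sketch, not something defined in the thread.
OPTIONAL_NATURES = {b"prior"}


def check_related(nature, present, on_missing="warn"):
    """Decide what to do with one "related" link.

    present:    whether the target object exists in the repository.
    on_missing: user preference for absent optional targets,
                one of "ignore", "warn" or "fail".
    Returns "ok" or "warn", or raises LookupError.
    """
    if present:
        return "ok"
    if nature not in OPTIONAL_NATURES:
        raise LookupError("missing object for required relation")
    if on_missing == "fail":
        raise LookupError("missing object for optional relation")
    return "warn" if on_missing == "warn" else "ok"
```

With this shape, pruning an optional relation only changes the outcome for users who asked for "fail"; required relations always fail when their target is absent.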

>That means "git-rev-list --objects" needs to list these objects
>(and if they are tags, commits, and trees, then what are
>reachable from them), and "git-fsck" needs to consider these
>related objects and objects reachable from them are reachable
>from this commit.  NOTHING ELSE NEEDS TO BE DONE by the core
>(obviously, cat-file needs to show them, and commit-tree needs to
>record them, but that goes without saying).
>  
>

Ok, I'll investigate that.
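
As a starting point, the reachability rule quoted above can be sketched over a toy object model. The dict layout and the function name reachable are assumptions for illustration; trees are treated as leaves to keep it short, and this is not how rev-list or fsck are actually implemented.

```python
def reachable(start, commits):
    """Return every object name reachable from `start`.

    `commits` maps a commit name to a dict with "tree", "parents" and
    "related" (a list of (object_name, nature) pairs).
    """
    seen = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        commit = commits.get(name)
        if commit is None:  # not a commit in this toy model
            continue
        stack.append(commit["tree"])
        stack.extend(commit["parents"])
        # Related objects count as reachable, whatever their nature.
        stack.extend(obj for obj, _nature in commit["related"])
    return seen
```

The core never looks at the nature of relation here; it only follows the link.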

>Then porcelains can agree on what different kinds of nature of
>relation mean and do sensible things.  The earlier "omit the
>cherry-picked ones" example I gave can examine "cherrypick".
>  
>

Sounds good. Let things evolve.

Sam.


* Re: maintenance of cache-tree data
From: Junio C Hamano @ 2006-04-25 23:05 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds
In-Reply-To: <7vk69e61s4.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> Well, I was blind ;-).  As long as the whole-file SHA1 matches,
> read_cache() does not care if we have extra data after the
> series of active_nr cache entry data in the index file.
>
> I'm working on a patch now.

So I did.
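
To illustrate the idea with a toy container (not the actual index file layout): a header with an entry count, fixed-size entries, optional trailing data, and a SHA1 over everything before it. The names write_index, read_index and ENTRY_SIZE are made up for this sketch.

```python
import hashlib
import struct

ENTRY_SIZE = 8  # toy fixed-size entries, chosen for simplicity


def write_index(entries, extra=b""):
    """Serialize entries, optional extra data, then a SHA1 trailer."""
    body = struct.pack(">I", len(entries)) + b"".join(entries) + extra
    return body + hashlib.sha1(body).digest()


def read_index(data):
    """Verify the whole-file SHA1, then read exactly `count` entries."""
    body, checksum = data[:-20], data[-20:]
    if hashlib.sha1(body).digest() != checksum:
        raise ValueError("index file SHA1 mismatch")
    (count,) = struct.unpack_from(">I", body, 0)
    end = 4 + count * ENTRY_SIZE
    entries = [body[i:i + ENTRY_SIZE] for i in range(4, end, ENTRY_SIZE)]
    # Anything after the entries is extension data a reader may ignore.
    return entries, body[end:]
```

A reader that only wants the entries can drop the second return value and still accept the file, because the checksum covers the extra bytes too.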

There is one bad thing; so far "write-tree" was a read-only
consumer of the index file, but now it primes the cache-tree
structure and needs to update the index.  But that is minor.

While I was at it, I made this "stuffing extra cruft in the
index" slightly more generic than I needed it for this
particular application.  What I see this _might_ be useful for
are:

 - We would want to store which commit of a subproject a
   particular subdirectory came from.  This was one missing
   piece from the "bind commit" proposal that wasn't implemented
   in the jc/bind branch.

 - We might want to record "at this path there is a directory,
   albeit empty"; this cannot be expressed with a usual index
   entry.

   We might be able to use cache-tree for that, but I think this
   is something different at the logical level.  While
   cache-tree is to be fully populated (by write-tree and
   perhaps read-tree later) and invalidated partially when
   update-index and friends smudge part of the tree, this is not
   something we would want to even invalidate (IOW, it should
   always be up-to-date), so they serve different purposes.

I still haven't looked at the read-tree yet, but as I outlined
in a previous message, its intra-index merge could take
advantage of cache-tree.  "diff-index", especially "--cached"
kind, also could use it to skip unchanged subtrees altogether.
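
A toy model of the prime/invalidate/skip cycle described above. The class CacheTree and its methods are hypothetical names for this sketch and do not reflect the data structure in the patch.

```python
class CacheTree:
    """Toy cache: directory path -> tree object name; absent means invalid."""

    def __init__(self):
        self.tree_ids = {}

    def prime(self, tree_ids):
        # What write-tree would do after computing every subtree.
        self.tree_ids = dict(tree_ids)

    def invalidate(self, path):
        # Changing a path invalidates each directory that contains it,
        # up to and including the root "".
        parts = path.split("/")[:-1]
        while True:
            self.tree_ids.pop("/".join(parts), None)
            if not parts:
                break
            parts.pop()

    def can_skip(self, directory, other_tree_id):
        # A subtree can be skipped when its cached name is still valid
        # and equal to the tree on the other side of the comparison.
        return self.tree_ids.get(directory) == other_tree_id
```

For example, after priming with entries for "", "Documentation" and "t", invalidating "Documentation/git-rebase.txt" leaves only "t" eligible for skipping.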


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Jason Riedy @ 2006-04-25 22:17 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e2lrk5$ed5$1@sea.gmane.org>

And Jakub Narebski writes:
 - I don't mean we shouldn't define semantics for each use of "related" or
 - "note" header. Just like email X-* headers have detailed form and semantics
 - (long, long time ago Sender was X-Sender for example ;-). It's just a
 - toolkit.

You just proved Linus's point.  Ever have to parse
archives of old mail?  There are many different ways
of saying the same thing, and many cases where the same
wording says different things.  It's pure hell.

And people expect you to get the X-* headers correct
for whatever definition of correct they happen to have
at the moment.  ugh.  You have many de-facto semantics
for the same headers, and no way to disambiguate them.

People will need to parse and understand git archives
thirty+ years from now.  Don't place this curse on
them.

Jason
