Git development


* [PATCH] Fix up remaining man pages that use asciidoc "callouts".
From: Sean Estabrooks @ 2006-04-28 13:15 UTC
  To: git; +Cc: Sean Estabrooks
In-Reply-To: <11462301063885-git-send-email-seanlkml@sympatico.ca>

Unfortunately docbook does not allow a callout to be
referenced from inside a callout list description.
Rewrite one paragraph in git-reset man page to work
around this limitation.

Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>


---

 Documentation/everyday.txt         |   45 +++++++++++------------
 Documentation/git-checkout.txt     |   18 +++++----
 Documentation/git-diff.txt         |   38 ++++++++++---------
 Documentation/git-init-db.txt      |    8 ++--
 Documentation/git-reset.txt        |   72 ++++++++++++++++++------------------
 Documentation/git-update-index.txt |   31 ++++++++--------
 6 files changed, 104 insertions(+), 108 deletions(-)

481f9838c408f36fe74a44197865b54842174546
diff --git a/Documentation/everyday.txt b/Documentation/everyday.txt
index 3ab9b91..4b56370 100644
--- a/Documentation/everyday.txt
+++ b/Documentation/everyday.txt
@@ -61,7 +61,8 @@ Check health and remove cruft.::
 $ git count-objects <2>
 $ git repack <3>
 $ git prune <4>
-
+------------
++
 <1> running without "--full" is usually cheap and assures the
 repository health reasonably well.
 <2> check how many loose objects there are and how much
@@ -69,17 +70,16 @@ diskspace is wasted by not repacking.
 <3> without "-a" repacks incrementally.  repacking every 4-5MB
 of loose objects accumulation may be a good rule of thumb.
 <4> after repack, prune removes the duplicate loose objects.
-------------
 
 Repack a small project into single pack.::
 +
 ------------
 $ git repack -a -d <1>
 $ git prune
-
+------------
++
 <1> pack all the objects reachable from the refs into one pack
 and remove unneeded other packs
-------------
 
 
 Individual Developer (Standalone)[[Individual Developer (Standalone)]]
@@ -129,10 +129,10 @@ Extract a tarball and create a working t
 $ git add . <1>
 $ git commit -m 'import of frotz source tree.'
 $ git tag v2.43 <2>
-
+------------
++
 <1> add everything under the current directory.
 <2> make a lightweight, unannotated tag.
-------------
 
 Create a topic branch and develop.::
 +
@@ -153,7 +153,8 @@ Create a topic branch and develop.::
 $ git pull . alsa-audio <10>
 $ git log --since='3 days ago' <11>
 $ git log v2.43.. curses/ <12>
-
+------------
++
 <1> create a new topic branch.
 <2> revert your botched changes in "curses/ux_audio_oss.c".
 <3> you need to tell git if you added a new file; removal and
@@ -170,7 +171,6 @@ you originally wrote.
 combined and include --max-count=10 (show 10 commits), --until='2005-12-10'.
 <12> view only the changes that touch what's in curses/
 directory, since v2.43 tag.
-------------
 
 
 Individual Developer (Participant)[[Individual Developer (Participant)]]
@@ -208,7 +208,8 @@ Clone the upstream and work on it.  Feed
 $ git reset --hard ORIG_HEAD <6>
 $ git prune <7>
 $ git fetch --tags <8>
-
+------------
++
 <1> repeat as needed.
 <2> extract patches from your branch for e-mail submission.
 <3> "pull" fetches from "origin" by default and merges into the
@@ -221,7 +222,6 @@ area we are interested in.
 <7> garbage collect leftover objects from reverted pull.
 <8> from time to time, obtain official tags from the "origin"
 and store them under .git/refs/tags/.
-------------
 
 
 Push into another repository.::
@@ -239,7 +239,8 @@ satellite$ git push origin <4>
 mothership$ cd frotz
 mothership$ git checkout master
 mothership$ git pull . satellite <5>
-
+------------
++
 <1> mothership machine has a frotz repository under your home
 directory; clone from it to start a repository on the satellite
 machine.
@@ -252,7 +253,6 @@ to local "origin" branch.
 mothership machine.  You could use this as a back-up method.
 <5> on mothership machine, merge the work done on the satellite
 machine into the master branch.
-------------
 
 Branch off of a specific tag.::
 +
@@ -262,12 +262,12 @@ Branch off of a specific tag.::
 $ git checkout master
 $ git format-patch -k -m --stdout v2.6.14..private2.6.14 |
   git am -3 -k <2>
-
+------------
++
 <1> create a private branch based on a well known (but somewhat behind)
 tag.
 <2> forward port all changes in private2.6.14 branch to master branch
 without a formal "merging".
-------------
 
 
 Integrator[[Integrator]]
@@ -317,7 +317,8 @@ My typical GIT day.::
 $ git fetch ko && git show-branch master maint 'tags/ko-*' <11>
 $ git push ko <12>
 $ git push ko v0.99.9x <13>
-
+------------
++
 <1> see what I was in the middle of doing, if any.
 <2> see what topic branches I have and think about how ready
 they are.
@@ -346,7 +347,6 @@ In the output from "git show-branch", "m
 everything "ko-master" has.
 <12> push out the bleeding edge.
 <13> push the tag out, too.
-------------
 
 
 Repository Administration[[Repository Administration]]
@@ -367,7 +367,6 @@ example of managing a shared central rep
 
 Examples
 ~~~~~~~~
-
 Run git-daemon to serve /pub/scm from inetd.::
 +
 ------------
@@ -388,13 +387,13 @@ cindy:x:1002:1002::/home/cindy:/usr/bin/
 david:x:1003:1003::/home/david:/usr/bin/git-shell
 $ grep git /etc/shells <2>
 /usr/bin/git-shell
-
+------------
++
 <1> log-in shell is set to /usr/bin/git-shell, which does not
 allow anything but "git push" and "git pull".  The users should
 get an ssh access to the machine.
 <2> in many distributions /etc/shells needs to list what is used
 as the login shell.
-------------
 
 CVS-style shared repository.::
 +
@@ -419,7 +418,8 @@ git:x:9418:alice,bob,cindy,david
 refs/heads/master	alice\|cindy
 refs/heads/doc-update	bob
 refs/tags/v[0-9]*	david
-
+------------
++
 <1> place the developers into the same git group.
 <2> and make the shared repository writable by the group.
 <3> use update-hook example by Carl from Documentation/howto/
@@ -427,7 +427,6 @@ for branch policy control.
 <4> alice and cindy can push into master, only bob can push into doc-update.
 david is the release manager and is the only person who can
 create and push version tags.
-------------
 
 HTTP server to support dumb protocol transfer.::
 +
@@ -435,7 +434,7 @@ HTTP server to support dumb protocol tra
 dev$ git update-server-info <1>
 dev$ ftp user@isp.example.com <2>
 ftp> cp -r .git /home/user/myproject.git
-
+------------
++
 <1> make sure your info/refs and objects/info/packs are up-to-date
 <2> upload to public HTTP server hosted by your ISP.
-------------
diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index 985bb2f..78f2fe6 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -66,19 +66,19 @@ the `Makefile` to two revisions back, de
 mistake, and gets it back from the index.
 +
 ------------
-$ git checkout master <1>
-$ git checkout master~2 Makefile <2>
+$ git checkout master             <1>
+$ git checkout master~2 Makefile  <2>
 $ rm -f hello.c
-$ git checkout hello.c <3>
-
+$ git checkout hello.c            <3>
+------------
++
 <1> switch branch
 <2> take out a file out of other commit
-<3> or "git checkout -- hello.c", as in the next example.
-------------
+<3> restore hello.c from HEAD of current branch
 +
-If you have an unfortunate branch that is named `hello.c`, the
-last step above would be confused as an instruction to switch to
-that branch.  You should instead write:
+If you have an unfortunate branch that is named `hello.c`, this
+step would be confused as an instruction to switch to that branch.  
+You should instead write:
 +
 ------------
 $ git checkout -- hello.c
diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index 890931c..7267bcd 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -46,40 +46,41 @@ EXAMPLES
 Various ways to check your working tree::
 +
 ------------
-$ git diff <1>
-$ git diff --cached <2>
-$ git diff HEAD <3>
-
+$ git diff            <1>
+$ git diff --cached   <2>
+$ git diff HEAD       <3>
+------------
++
 <1> changes in the working tree since your last git-update-index.
 <2> changes between the index and your last commit; what you
 would be committing if you run "git commit" without "-a" option.
 <3> changes in the working tree since your last commit; what you
 would be committing if you run "git commit -a"
-------------
 
 Comparing with arbitrary commits::
 +
 ------------
-$ git diff test <1>
-$ git diff HEAD -- ./test <2>
-$ git diff HEAD^ HEAD <3>
-
+$ git diff test            <1>
+$ git diff HEAD -- ./test  <2>
+$ git diff HEAD^ HEAD      <3>
+------------
++
 <1> instead of using the tip of the current branch, compare with the
 tip of "test" branch.
 <2> instead of comparing with the tip of "test" branch, compare with
 the tip of the current branch, but limit the comparison to the
 file "test".
 <3> compare the version before the last commit and the last commit.
-------------
 
 
 Limiting the diff output::
 +
 ------------
-$ git diff --diff-filter=MRC <1>
-$ git diff --name-status -r <2>
-$ git diff arch/i386 include/asm-i386 <3>
-
+$ git diff --diff-filter=MRC            <1>
+$ git diff --name-status -r             <2>
+$ git diff arch/i386 include/asm-i386   <3>
+------------
++
 <1> show only modification, rename and copy, but not addition
 nor deletion.
 <2> show only names and the nature of change, but not actual
@@ -88,18 +89,17 @@ which in turn also disables recursive be
 you would only see the directory name if there is a change in a
 file in a subdirectory.
 <3> limit diff output to named subtrees.
-------------
 
 Munging the diff output::
 +
 ------------
-$ git diff --find-copies-harder -B -C <1>
-$ git diff -R <2>
-
+$ git diff --find-copies-harder -B -C  <1>
+$ git diff -R                          <2>
+------------
++
 <1> spend extra cycles to find renames, copies and complete
 rewrites (very expensive).
 <2> output diff in reverse.
-------------
 
 
 Author
diff --git a/Documentation/git-init-db.txt b/Documentation/git-init-db.txt
index aeb1115..8a150d8 100644
--- a/Documentation/git-init-db.txt
+++ b/Documentation/git-init-db.txt
@@ -60,12 +60,12 @@ Start a new git repository for an existi
 +
 ----------------
 $ cd /path/to/my/codebase
-$ git-init-db <1>
-$ git-add . <2>
-
+$ git-init-db   <1>
+$ git-add .     <2>
+----------------
++
 <1> prepare /path/to/my/codebase/.git directory
 <2> add all existing file to the index
-----------------
 
 
 Author
diff --git a/Documentation/git-reset.txt b/Documentation/git-reset.txt
index b7b9798..b17cdba 100644
--- a/Documentation/git-reset.txt
+++ b/Documentation/git-reset.txt
@@ -49,10 +49,11 @@ Undo a commit and redo::
 +
 ------------
 $ git commit ...
-$ git reset --soft HEAD^ <1>
-$ edit <2>
-$ git commit -a -c ORIG_HEAD <3>
-
+$ git reset --soft HEAD^      <1>
+$ edit                        <2>
+$ git commit -a -c ORIG_HEAD  <3>
+------------
++
 <1> This is most often done when you remembered what you
 just committed is incomplete, or you misspelled your commit
 message, or both.  Leaves working tree as it was before "reset".
@@ -60,43 +61,43 @@ message, or both.  Leaves working tree a
 <3> "reset" copies the old head to .git/ORIG_HEAD; redo the
 commit by starting with its log message.  If you do not need to
 edit the message further, you can give -C option instead.
-------------
 
 Undo commits permanently::
 +
 ------------
 $ git commit ...
-$ git reset --hard HEAD~3 <1>
-
+$ git reset --hard HEAD~3   <1>
+------------
++
 <1> The last three commits (HEAD, HEAD^, and HEAD~2) were bad
 and you do not want to ever see them again.  Do *not* do this if
 you have already given these commits to somebody else.
-------------
 
 Undo a commit, making it a topic branch::
 +
 ------------
-$ git branch topic/wip <1>
-$ git reset --hard HEAD~3 <2>
-$ git checkout topic/wip <3>
-
+$ git branch topic/wip     <1>
+$ git reset --hard HEAD~3  <2>
+$ git checkout topic/wip   <3>
+------------
++
 <1> You have made some commits, but realize they were premature
 to be in the "master" branch.  You want to continue polishing
 them in a topic branch, so create "topic/wip" branch off of the
 current HEAD.
 <2> Rewind the master branch to get rid of those three commits.
 <3> Switch to "topic/wip" branch and keep working.
-------------
 
 Undo update-index::
 +
 ------------
-$ edit <1>
+$ edit                                     <1>
 $ git-update-index frotz.c filfre.c
-$ mailx <2>
-$ git reset <3>
-$ git pull git://info.example.com/ nitfol <4>
-
+$ mailx                                    <2>
+$ git reset                                <3>
+$ git pull git://info.example.com/ nitfol  <4>
+------------
++
 <1> you are happily working on something, and find the changes
 in these files are in good order.  You do not want to see them
 when you run "git diff", because you plan to work on other files
@@ -109,12 +110,11 @@ index changes for these two files.  Your
 remain there.
 <4> then you can pull and merge, leaving frotz.c and filfre.c
 changes still in the working tree.
-------------
 
 Undo a merge or pull::
 +
 ------------
-$ git pull <1>
+$ git pull                         <1>
 Trying really trivial in-index merge...
 fatal: Merge requires file-level merging
 Nope.
@@ -122,20 +122,19 @@ Nope.
 Auto-merging nitfol
 CONFLICT (content): Merge conflict in nitfol
 Automatic merge failed/prevented; fix up by hand
-$ git reset --hard <2>
-
+$ git reset --hard                 <2>
+$ git pull . topic/branch          <3>
+Updating from 41223... to 13134...
+Fast forward
+$ git reset --hard ORIG_HEAD       <4>
+------------
++
 <1> try to update from the upstream resulted in a lot of
 conflicts; you were not ready to spend a lot of time merging
 right now, so you decide to do that later.
 <2> "pull" has not made merge commit, so "git reset --hard"
 which is a synonym for "git reset --hard HEAD" clears the mess
 from the index file and the working tree.
-
-$ git pull . topic/branch <3>
-Updating from 41223... to 13134...
-Fast forward
-$ git reset --hard ORIG_HEAD <4>
-
 <3> merge a topic branch into the current branch, which resulted
 in a fast forward.
 <4> but you decided that the topic branch is not ready for public
@@ -143,7 +142,6 @@ consumption yet.  "pull" or "merge" alwa
 tip of the current branch in ORIG_HEAD, so resetting hard to it
 brings your index file and the working tree back to that state,
 and resets the tip of the branch to that commit.
-------------
 
 Interrupted workflow::
 +
@@ -155,21 +153,21 @@ need to get to the other branch for a qu
 ------------
 $ git checkout feature ;# you were working in "feature" branch and
 $ work work work       ;# got interrupted
-$ git commit -a -m 'snapshot WIP' <1>
+$ git commit -a -m 'snapshot WIP'                 <1>
 $ git checkout master
 $ fix fix fix
 $ git commit ;# commit with real log
 $ git checkout feature
-$ git reset --soft HEAD^ ;# go back to WIP state <2>
-$ git reset <3>
-
+$ git reset --soft HEAD^ ;# go back to WIP state  <2>
+$ git reset                                       <3>
+------------
++
 <1> This commit will get blown away so a throw-away log message is OK.
 <2> This removes the 'WIP' commit from the commit history, and sets
     your working tree to the state just before you made that snapshot.
-<3> After <2>, the index file still has all the WIP changes you
-    committed in <1>.  This sets it to the last commit you were
-    basing the WIP changes on.
-------------
+<3> At this point the index file still has all the WIP changes you
+    committed as 'snapshot WIP'.  This updates the index to show your 
+    WIP files as uncommitted.
 
 Author
 ------
diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index 0a1b0ad..d4137fc 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -247,34 +247,33 @@ To update and refresh only the files alr
 $ git-checkout-index -n -f -a && git-update-index --ignore-missing --refresh
 ----------------
 
-On an inefficient filesystem with `core.ignorestat` set:
-
+On an inefficient filesystem with `core.ignorestat` set::
++
 ------------
-$ git update-index --really-refresh <1>
-$ git update-index --no-assume-unchanged foo.c <2>
-$ git diff --name-only <3>
+$ git update-index --really-refresh              <1>
+$ git update-index --no-assume-unchanged foo.c   <2>
+$ git diff --name-only                           <3>
 $ edit foo.c
-$ git diff --name-only <4>
+$ git diff --name-only                           <4>
 M foo.c
-$ git update-index foo.c <5>
-$ git diff --name-only <6>
+$ git update-index foo.c                         <5>
+$ git diff --name-only                           <6>
 $ edit foo.c
-$ git diff --name-only <7>
-$ git update-index --no-assume-unchanged foo.c <8>
-$ git diff --name-only <9>
+$ git diff --name-only                           <7>
+$ git update-index --no-assume-unchanged foo.c   <8>
+$ git diff --name-only                           <9>
 M foo.c
-
-<1> forces lstat(2) to set "assume unchanged" bits for paths
-    that match index.
+------------
++
+<1> forces lstat(2) to set "assume unchanged" bits for paths that match index.
 <2> mark the path to be edited.
 <3> this does lstat(2) and finds index matches the path.
-<4> this does lstat(2) and finds index does not match the path.
+<4> this does lstat(2) and finds index does *not* match the path.
 <5> registering the new version to index sets "assume unchanged" bit.
 <6> and it is assumed unchanged.
 <7> even after you edit it.
 <8> you can tell about the change after the fact.
 <9> now it checks with lstat(2) and finds it has been changed.
-------------
 
 
 Configuration
-- 
1.3.1.gc672


* [PATCH] Fix trivial typo in git-log man page.
From: Sean Estabrooks @ 2006-04-28 13:15 UTC
  To: git; +Cc: Sean Estabrooks

Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>


---

 Documentation/git-log.txt |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

0afa7822c6a0d6ffa82f9d9b64c78df8587e190d
diff --git a/Documentation/git-log.txt b/Documentation/git-log.txt
index 76cb894..af378ff 100644
--- a/Documentation/git-log.txt
+++ b/Documentation/git-log.txt
@@ -14,13 +14,12 @@ DESCRIPTION
 -----------
 Shows the commit logs.
 
-The command takes options applicable to the gitlink::git-rev-list[1]
+The command takes options applicable to the gitlink:git-rev-list[1]
 command to control what is shown and how, and options applicable to
-the gitlink::git-diff-tree[1] commands to control how the change
+the gitlink:git-diff-tree[1] commands to control how the change
 each commit introduces are shown.
 
-This manual page describes only the most frequently used
-options.
+This manual page describes only the most frequently used options.
 
 
 OPTIONS
-- 
1.3.1.gc672


* [PATCH] Properly render asciidoc "callouts" in git man pages.
From: Sean Estabrooks @ 2006-04-28 13:15 UTC
  To: git; +Cc: Sean Estabrooks
In-Reply-To: <1146230106696-git-send-email-seanlkml@sympatico.ca>

Adds an xsl fragment to render docbook callouts when
converting to man page format.  Update the Makefile
to have "xmlto" use it when generating man pages.

Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>


---

 Documentation/Makefile     |    2 +-
 Documentation/callouts.xsl |   16 ++++++++++++++++
 2 files changed, 17 insertions(+), 1 deletions(-)
 create mode 100644 Documentation/callouts.xsl

82ee912fb2a58194cac3d65b15abc98190a3359a
diff --git a/Documentation/Makefile b/Documentation/Makefile
index f4cbf7e..c1af22c 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -79,7 +79,7 @@ clean:
 	asciidoc -b xhtml11 -d manpage -f asciidoc.conf $<
 
 %.1 %.7 : %.xml
-	xmlto man $<
+	xmlto -m callouts.xsl man $<
 
 %.xml : %.txt
 	asciidoc -b docbook -d manpage -f asciidoc.conf $<
diff --git a/Documentation/callouts.xsl b/Documentation/callouts.xsl
new file mode 100644
index 0000000..ad03755
--- /dev/null
+++ b/Documentation/callouts.xsl
@@ -0,0 +1,16 @@
+<!-- callout.xsl: converts asciidoc callouts to man page format -->
+<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
+<xsl:template match="co">
+	<xsl:value-of select="concat('\fB(',substring-after(@id,'-'),')\fR')"/>
+</xsl:template>
+<xsl:template match="calloutlist">
+	<xsl:text>.sp&#10;</xsl:text>
+	<xsl:apply-templates/>
+	<xsl:text>&#10;</xsl:text>
+</xsl:template>
+<xsl:template match="callout">
+	<xsl:value-of select="concat('\fB',substring-after(@arearefs,'-'),'. \fR')"/>
+	<xsl:apply-templates/>
+	<xsl:text>.br&#10;</xsl:text>
+</xsl:template>
+</xsl:stylesheet>
-- 
1.3.1.gc672


* [PATCH] Update the git-branch man page to include the "-r" option,
From: Sean Estabrooks @ 2006-04-28 13:15 UTC
  To: git; +Cc: Sean Estabrooks
In-Reply-To: <11462301062278-git-send-email-seanlkml@sympatico.ca>

and fix up asciidoc "callouts"

Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>


---

 Documentation/git-branch.txt |   57 +++++++++++++++++++++++++++++-------------
 1 files changed, 39 insertions(+), 18 deletions(-)

5f70eb7e8c318528885cdd9b35bfa1d92cbf6782
diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index 71ecd85..050e1f7 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -3,22 +3,27 @@ git-branch(1)
 
 NAME
 ----
-git-branch - Create a new branch, or remove an old one
+git-branch - List, create, or delete branches.
 
 SYNOPSIS
 --------
 [verse]
-'git-branch' [[-f] <branchname> [<start-point>]]
-'git-branch' (-d | -D) <branchname>
+'git-branch' [-r]
+'git-branch' [-f] <branchname> [<start-point>]
+'git-branch' (-d | -D) <branchname>...
 
 DESCRIPTION
 -----------
-If no argument is provided, show available branches and mark current
-branch with star. Otherwise, create a new branch of name <branchname>.
-If a starting point is also specified, that will be where the branch is
-created, otherwise it will be created at the current HEAD.
+With no arguments given (or just `-r`) a list of available branches
+will be shown, the current branch will be highlighted with an asterisk.
 
-With a `-d` or `-D` option, `<branchname>` will be deleted.
+In its second form, a new branch named <branchname> will be created.
+It will start out with a head equal to the one given as <start-point>.
+If no <start-point> is given, the branch will be created with a head
+equal to that of the currently checked out branch.
+
+With a `-d` or `-D` option, `<branchname>` will be deleted.  You may
+specify more than one branch for deletion.
 
 
 OPTIONS
@@ -30,40 +35,56 @@ OPTIONS
 	Delete a branch irrespective of its index status.
 
 -f::
-	Force a reset of <branchname> to <start-point> (or current head).
+	Force the creation of a new branch even if it means deleting
+	a branch that already exists with the same name.
+
+-r::
+	List only the "remote" branches.
 
 <branchname>::
 	The name of the branch to create or delete.
 
 <start-point>::
-	Where to create the branch; defaults to HEAD. This
-	option has no meaning with -d and -D.
+	The new branch will be created with a HEAD equal to this.  It may
+	be given as a branch name, a commit-id, or a tag.  If this option 
+	is omitted, the current branch is assumed.
+
 
 
 Examples
-~~~~~~~~
+--------
 
 Start development off of a known tag::
 +
 ------------
 $ git clone git://git.kernel.org/pub/scm/.../linux-2.6 my2.6
 $ cd my2.6
-$ git branch my2.6.14 v2.6.14 <1>
+$ git branch my2.6.14 v2.6.14   <1>
 $ git checkout my2.6.14
-
-<1> These two steps are the same as "checkout -b my2.6.14 v2.6.14".
 ------------
++
+<1> This step and the next one could be combined into a single step with 
+"checkout -b my2.6.14 v2.6.14".
 
 Delete unneeded branch::
 +
 ------------
 $ git clone git://git.kernel.org/.../git.git my.git
 $ cd my.git
-$ git branch -D todo <1>
-
+$ git branch -D todo    <1>
+------------
++
 <1> delete todo branch even if the "master" branch does not have all
 commits from todo branch.
-------------
+
+
+Notes
+-----
+
+If you are creating a branch that you want to immediately checkout, it's 
+easier to use the git checkout command with its `-b` option to create
+a branch and check it out with a single command.
+
 
 Author
 ------
-- 
1.3.1.gc672


* Fix asciidoc callouts in git man pages
From: Sean Estabrooks @ 2006-04-28 13:13 UTC
  To: git

Started out just wanting to update the git-branch man page
to include the "-r" option but noticed that the asciidoc 
callouts weren't being rendered in its man page.  Then 
noticed the same was true for all the man pages where
they are used. 

It turns out we've not been following the guidelines 
properly on how to use them.  The fact that they show up
in a useful way in the html docs is really an accident.
Even there they're not showing up as intended.

Unfortunately, even after all the docs are fixed up to use
the proper format, they still don't render properly in the
man format.   Seems this is a missing feature in the "xmlto"
command.

The final patch in this series adds an xsl fragment which
is passed to xmlto so that the callouts appear properly in 
the man pages.

Sean

 Documentation/Makefile             |    2 +
 Documentation/callouts.xsl         |   16 ++++++++
 Documentation/everyday.txt         |   45 +++++++++++------------
 Documentation/git-branch.txt       |   57 ++++++++++++++++++++---------
 Documentation/git-checkout.txt     |   18 +++++----
 Documentation/git-diff.txt         |   38 ++++++++++---------
 Documentation/git-init-db.txt      |    8 ++--
 Documentation/git-log.txt          |    7 ++--
 Documentation/git-reset.txt        |   72 ++++++++++++++++++------------------
 Documentation/git-update-index.txt |   31 ++++++++--------
 10 files changed, 163 insertions(+), 131 deletions(-)


* [PATCH] annotate: display usage information if no filename was given
From: Matthias Kestenholz @ 2006-04-28  8:41 UTC
  To: junkio; +Cc: git


Signed-off-by: Matthias Kestenholz <matthias@spinlock.ch>

---

 git-annotate.perl |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

ec31877f02673ed1db9d1485ac0065d51cbb7039
diff --git a/git-annotate.perl b/git-annotate.perl
index 9df72a1..bf920a5 100755
--- a/git-annotate.perl
+++ b/git-annotate.perl
@@ -10,9 +10,10 @@ use warnings;
 use strict;
 use Getopt::Long;
 use POSIX qw(strftime gmtime);
+use File::Basename qw(basename dirname);
 
 sub usage() {
-	print STDERR 'Usage: ${\basename $0} [-s] [-S revs-file] file [ revision ]
+	print STDERR "Usage: ${\basename $0} [-s] [-S revs-file] file [ revision ]
 	-l, --long
 			Show long rev (Defaults off)
 	-t, --time
@@ -23,7 +24,7 @@ sub usage() {
 			Use revs from revs-file instead of calling git-rev-list
 	-h, --help
 			This message.
-';
+";
 
 	exit(1);
 }
@@ -35,7 +36,7 @@ my $rc = GetOptions(	"long|l" => \$longr
 			"help|h" => \$help,
 			"rename|r" => \$rename,
 			"rev-file|S=s" => \$rev_file);
-if (!$rc or $help) {
+if (!$rc or $help or !@ARGV) {
 	usage();
 }
 
-- 
1.3.1.gc4586


* [PATCH] annotate: fix warning about uninitialized scalar
From: Matthias Kestenholz @ 2006-04-28  8:42 UTC
  To: junkio; +Cc: git

Use of uninitialized value in scalar chomp at
./git-annotate.perl line 212, <$kid> chunk 4.

Signed-off-by: Matthias Kestenholz <matthias@spinlock.ch>

---

 git-annotate.perl |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

5322aefa820eef2d635d5dbf621269aefce52135
diff --git a/git-annotate.perl b/git-annotate.perl
index bf920a5..5f8a766 100755
--- a/git-annotate.perl
+++ b/git-annotate.perl
@@ -209,6 +209,9 @@ sub find_parent_renames {
 	while (my $change = <$patch>) {
 		chomp $change;
 		my $filename = <$patch>;
+		if(!$filename) {
+			next;
+		}
 		chomp $filename;
 
 		if ($change =~ m/^[AMD]$/ ) {
-- 
1.3.1.gc4586


* Re: fatal: git-write-tree: not able to write tree
From: Junio C Hamano @ 2006-04-28  9:01 UTC
  To: Brown, Len; +Cc: git
In-Reply-To: <CFF307C98FEABE47A452B27C06B85BB64A432C@hdsmsx411.amr.corp.intel.com>

"Brown, Len" <len.brown@intel.com> writes that running "git
am -3 --resolved" after hand merging, _but_ without update-index
to actually mark the paths that have been resolved, results in
a "write-tree" failure.

> I'm okay with git being conservative and not doing the update-index
> for me.  Perhaps the thing to do here is to make the failure message
> more useful?
>
> "fatal: git-write-tree: not able to write tree"
>
> everything after "fatal" here is effectively a string
> of random characters to the hapless user.

That's very true.  Perhaps something like this?

-- >8 --
git-am --resolved: more usable error message.

After doing the hard work of hand resolving the conflicts in the
working tree, if the user forgets to run update-index to mark
the paths that have been resolved, the command gave an
unfriendly "fatal: git-write-tree: not able to write tree" error
message.  Catch the situation early and give a more meaningful
message and suggestion.

Noticed and suggested by Len Brown.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff --git a/git-am.sh b/git-am.sh
index eab4aa8..872145b 100755
--- a/git-am.sh
+++ b/git-am.sh
@@ -376,6 +376,13 @@ do
 			echo "No changes - did you forget update-index?"
 			stop_here $this
 		fi
+		unmerged=$(git-ls-files -u)
+		if test -n "$unmerged"
+		then
+			echo "You still have unmerged paths in your index"
+			echo "did you forget update-index?"
+			stop_here $this
+		fi
 		apply_status=0
 		;;
 	esac


* RE: fatal: git-write-tree: not able to write tree
From: Brown, Len @ 2006-04-28  8:43 UTC
  To: Junio C Hamano; +Cc: git

>> git am --3way --interactive --signoff --utf8 --resolved

>Please say "--resolved" after you have actually resolved them,
>eh, meaning, (1) edit the working tree file into a desired
>shape, and (2) git-update-index drivers/acpi/thermal.c.

Thanks Junio, once again, for your help, we're up and running!

I'm okay with git being conservative and not doing the update-index
for me.  Perhaps the thing to do here is to make the failure message
more useful?

"fatal: git-write-tree: not able to write tree"

everything after "fatal" here is effectively a string
of random characters to the hapless user.

thanks,
-Len


* Re: fatal: git-write-tree: not able to write tree
From: Junio C Hamano @ 2006-04-28  8:32 UTC
  To: Len Brown; +Cc: git
In-Reply-To: <200604280430.33100.len.brown@intel.com>

Len Brown <len.brown@intel.com> writes:

> I'm trying to use git-am to apply a patch series in a mailbox.
>
> The first patch has a conflict, which I edit to fix, and then invoke
> git am --3way --interactive --signoff --utf8 --resolved
>
> but it bails out with this:
>
> drivers/acpi/thermal.c: unmerged (4829f067a3e7acfbeed8b230caac00b1ed4b8554)
> drivers/acpi/thermal.c: unmerged (528d198c28512af1627cce481575f37a599c0fe8)
> drivers/acpi/thermal.c: unmerged (db3bef1a3e51801128e7553f3e546c8272cc9ee1)
> fatal: git-write-tree: not able to write tree
>
> I've tried various incantations of git reset on the theory that there is some 
> old state hanging around someplace, but have not been able to check in this 
> file.
>
> clues?

Please say "--resolved" after you have actually resolved them,
eh, meaning, (1) edit the working tree file into a desired
shape, and (2) git-update-index drivers/acpi/thermal.c.

I've considered making --resolved to do update-index for all
paths that are unmerged in the index, but that risks going
forward by mistake when you still have other paths to resolve,
so...
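
To see which paths still need update-index before saying "--resolved", the output of "git ls-files -u" can be reduced to a plain list of names. What follows is only a minimal sketch, assuming each line of that output has the form "<mode> <object> <stage><TAB><path>" (one line per stage, so a conflicted file shows up to three times). The function names unmerged_paths and ready_for_resolved are made up for illustration and are not part of git.

```shell
#!/bin/sh
# Sketch: list the paths that are still unmerged, reading the output of
# `git ls-files -u` on stdin.  Each input line is assumed to look like
#   <mode> <object> <stage><TAB><path>
# so the path is everything after the first tab.
unmerged_paths() {
	cut -f2- | sort -u
}

# Hypothetical helper: succeeds only when no unmerged path is left.
ready_for_resolved() {
	test -z "$(unmerged_paths)"
}

# Possible use inside a repository (not executed by this sketch):
#   git ls-files -u | unmerged_paths
#   git ls-files -u | ready_for_resolved && git am --resolved
```

Keeping the check separate from "--resolved" itself leaves the decision with the user, which matches the concern above about going forward by mistake while other paths are still unresolved.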


* fatal: git-write-tree: not able to write tree
From: Len Brown @ 2006-04-28  8:30 UTC (permalink / raw)
  To: git

I'm trying to use git-am to apply a patch series in a mailbox.

The first patch has a conflict, which I edit to fix, and then invoke
git am --3way --interactive --signoff --utf8 --resolved

but it bails out with this:

drivers/acpi/thermal.c: unmerged (4829f067a3e7acfbeed8b230caac00b1ed4b8554)
drivers/acpi/thermal.c: unmerged (528d198c28512af1627cce481575f37a599c0fe8)
drivers/acpi/thermal.c: unmerged (db3bef1a3e51801128e7553f3e546c8272cc9ee1)
fatal: git-write-tree: not able to write tree

I've tried various incantations of git reset on the theory that there is some 
old state hanging around someplace, but have not been able to check in this 
file.

clues?

thanks,
-Len


* Re: [PATCH] Add a test case for rerere
From: Uwe Zeisberger @ 2006-04-28  8:02 UTC (permalink / raw)
  To: git
In-Reply-To: <20060428075604.GA30714@digi.com>

Hello,

Uwe Zeisberger wrote:
> +echo "added in branch" >> file-common &&
> +git add file-branch file-common &&
> +git commit -m "branch1" -i file-base file-branch file-common &&
> +git branch branch1'
> +
> ...
> + 
> +test_expect_failure 'pull branch1' \
> +'git pull . branch1'

When typing the test I first tried to pull branch^, but this failed with
"no such remote ref refs/heads/branch^".  Is it intended that one can
only pull branches and not any rev?

Best regards
Uwe

PS: I added a double blank line in the file.  Sorry for that...

-- 
Uwe Zeisberger

http://www.google.com/search?q=Planck%27s+constant%3D


* [PATCH] Add a test case for rerere
From: Uwe Zeisberger @ 2006-04-28  7:56 UTC (permalink / raw)
  To: git

Currently this test fails because rerere is not able to record
resolutions for a file that doesn't exist in the merge base but
exists in both branches being merged.

Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com>

---

 t/t8003-rerere.sh |   66 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 66 insertions(+), 0 deletions(-)
 create mode 100644 t/t8003-rerere.sh

It's the last command that fails because rerere didn't record the
conflict between branch1:file-common and master:file-common.

Please feel free to change the filename as I don't know/see the naming
scheme of the tests.

Best regards
Uwe

ff012a80cafa3fe905de72d0db8b616ff76d0038
diff --git a/t/t8003-rerere.sh b/t/t8003-rerere.sh
new file mode 100644
index 0000000..1bb66ff
--- /dev/null
+++ b/t/t8003-rerere.sh
@@ -0,0 +1,66 @@
+#!/bin/sh
+
+test_description='git-rerere'
+. ./test-lib.sh
+
+
+test_expect_success 'prepare repository' \
+'mkdir .git/rr-cache &&
+echo "content" > file-base &&
+git add file-base &&
+git commit -m "Initial commit" &&
+git branch branch &&
+echo "added after branch" >> file-base &&
+echo "added after branch" >> file-common &&
+git add file-common &&
+git commit -m "master1" -i file-base file-common &&
+git checkout branch &&
+echo "added in branch" >> file-base &&
+echo "only in branch" > file-branch &&
+echo "added in branch" >> file-common &&
+git add file-branch file-common &&
+git commit -m "branch1" -i file-base file-branch file-common &&
+git branch branch1'
+
+test_expect_failure 'pull master' \
+'git pull . master'
+
+cat >> file-base-expect << EOF
+content
+<<<<<<< HEAD/file-base
+added in branch
+=======
+added after branch
+>>>>>>> `git rev-parse master`/file-base
+EOF
+
+test_expect_success 'merge result' \
+'cmp file-base file-base-expect &&
+git cat-file blob HEAD:file-common | cmp file-common~HEAD - &&
+git cat-file blob master:file-common | cmp file-common~`git rev-parse master` - &&
+git cat-file blob HEAD:file-branch | cmp file-branch -'
+
+test_expect_success 'record and resolve conflicts' \
+'git rerere &&
+echo "content
+added in branch
+added after branch" > file-base &&
+echo "added in branch
+added after branch" > file-common &&
+git rerere &&
+git-ls-files -o | xargs rm &&
+git commit -m "resolved conflicts" -i file-base file-common file-branch &&
+git-checkout master
+'
+ 
+test_expect_failure 'pull branch1' \
+'git pull . branch1'
+
+test_expect_success 'reuse recorded resolve' \
+'git rerere &&
+git cat-file blob branch:file-branch | cmp file-branch - &&
+git cat-file blob branch:file-base | cmp file-base - &&
+git cat-file blob branch:file-common | cmp file-common -'
+
+test_done
+
-- 
1.3.1.gac92


-- 
Uwe Zeisberger
FS Forth-Systeme GmbH, A Digi International Company
Kueferstrasse 8, D-79206 Breisach, Germany
Phone: +49 (7667) 908 0 Fax: +49 (7667) 908 200
Web: www.fsforth.de, www.digi.com


* Re: new gitk feature
From: Linus Torvalds @ 2006-04-28  5:11 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git
In-Reply-To: <17489.22838.502099.575465@cargo.ozlabs.ibm.com>



On Fri, 28 Apr 2006, Paul Mackerras wrote:
> Linus Torvalds writes:
> > Any possibility of something like that? I'd _love_ to be able to see the 
> > whole tree, but with things that touch certain files or things that are 
> > newer highlighted.
> 
> That should be quite doable.  How about I show the commits that are in
> the highlight view in bold?  That won't conflict with the existing
> yellow background for commits that match the find criteria.

Bold sounds good to me.

> > (Btw, the "revision information" is also cool things like "--unpacked". I 
> > actually use "gitk --unpacked" every once in a while, just because it's 
> > such a cool way to say "show me everything I've added since I packed the 
> > repo last").
> 
> OK, I didn't know about --unpacked. :)  I plan to add stuff to the
> view definition window to allow you to select commits to
> include/exclude by reachability from given commits (by head/tag/ID)
> and when I do I can add a way to say --unpacked too.

It's more of a gimmick, but I find myself using it occasionally just to 
decide whether it's time to repack. It falls out automatically - not 
because I thought I'd ever want it, but because the --unpacked semantics 
for git-rev-list are what incremental packing needed.

(Of course, sane people probably just do "git count-objects" to decide to 
repack).

		Linus


* Re: PATCH: New diff-delta.c implementation (updated)
From: Junio C Hamano @ 2006-04-28  4:28 UTC (permalink / raw)
  To: Geert Bosch; +Cc: git
In-Reply-To: <7v1wvigzka.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> In the kernel repository (what I have checked out is near the tip
> of the source tree), the largest files are fs/nls/nls_cp949.c (900kB
> Korean character encoding), drivers/usb/misc/emi62_fw_s.h
> (800kB, Emagic firmware blob), arch/m68k/ifpsp060/src/fpsp.S
> (750kB, floating point emulation?), which is nowhere near the size
> where your algorithm really should shine.
>
> We would probably want some internal logic that says "if we see
> that blobs larger than X MB are involved in the packing, we
> should use this version of diff-delta, otherwise the other one."

Third impression, synthetic workload.  A sequence of revisions of a
single-file project, where the file is a tarball of the git.git tree
(that is, "git-tar-tree vX.Y.Z >tarball"), 120 objects or so (1 commit
per rev, 1 tree to hold 1 blob).  The (uncompressed) sizes of the 40
blobs in the pack are between 2.06MB and 2.86MB (average 2.30MB).

(Nico)
Total 123, written 123 (delta 38), reused 0 (delta 0)
67.26user 1.03system 1:08.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+136066minor)pagefaults 0swaps

1822079 pack-nico-26989d516c62197592d0d52db24dfc6a58b633eb.pack

(Geert)
Total 123, written 123 (delta 38), reused 0 (delta 0)
67.23user 1.35system 1:09.25elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+164124minor)pagefaults 0swaps

1683139 pack-geert-26989d516c62197592d0d52db24dfc6a58b633eb.pack

That's an 8% improvement in the same time, which is quite
impressive.  But I am _very_ unhappy about this particular
synthetic workload.  I wonder if there are projects with many
large blobs that are updated often, so that we can use one as a
yardstick.  Maybe Wine people have icons, background images and
sounds perhaps?  But I suspect you would not update them that
often.
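
The 8% above can be checked from the two pack sizes listed. A minimal
sketch of the arithmetic follows; percent_smaller is a made-up helper
name for illustration, not anything in git.

```c
#include <assert.h>

/*
 * Relative size reduction of `after` versus `before`, in percent.
 * Hypothetical helper, only meant to reproduce the arithmetic.
 */
static double percent_smaller(unsigned long before, unsigned long after)
{
	return 100.0 * ((double)before - (double)after) / (double)before;
}
```

With the sizes above (1822079 and 1683139 bytes) this comes out a bit
over 7.6%, which rounds to the quoted 8%.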

Thinking about it, it does not make much sense, at least to me,
to store large tarballs or binary blobs or whatnot in an SCM (we
are _not_ in the archival business) and keep track of their
changes.  The tarball is out of the question -- it is not a source
(in the GPL sense of the word -- it is not the preferred form for
making modifications; you modify constituent files and bundle up the
result as a new tarball).  Graphics images, perhaps.


* Re: PATCH: New diff-delta.c implementation (updated)
From: Junio C Hamano @ 2006-04-28  3:16 UTC (permalink / raw)
  To: Geert Bosch; +Cc: git
In-Reply-To: <Pine.GSO.4.60.0604272132170.9650@nile.gnat.com>

Geert Bosch <bosch@gnat.com> writes:

> Even though the previous version did really well on large files
> with many changes, performance was lacking for the many small
> files with very few changes that are so common for a VCS.
>...
> The result has been only a slight increase in delta size for
> very large test cases (but with better performance), and
> both smaller deltas and faster execution speed for repacking
> git.git. I had trouble cloning the Linux kernel repository,
> but am now reasonably confident this will outperform the
> existing algorithm pretty consistently.

Interesting.

Initial impression, the same test as before (a full packing of
the git.git repository that does not have _any_ pack -- all 18k
objects are loose).

First, the incumbent, with the "reusing delta-index" patch applied.

Total 17724, written 17724 (delta 12002), reused 0 (delta 0)
34.02user 6.48system 0:42.87elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+434478minor)pagefaults 0swaps

 6188418 pack-nico-f1fac077a093ffdaf094aab2b7f11859ec0c18f1.pack

Then diff-delta.c replaced with your version.

Total 17724, written 17724 (delta 12012), reused 0 (delta 0)
44.87user 6.54system 0:54.01elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+441124minor)pagefaults 0swaps

 6099183 pack-geert-f1fac077a093ffdaf094aab2b7f11859ec0c18f1.pack

Second impression, in a recent kernel tree which is mostly
packed.  Packing 41k objects (v2.6.16..v2.6.17-rc3), with
"git-pack-objects --no-reuse-delta".

(Nico)
Total 41591, written 41591 (delta 29285), reused 8563 (delta 0)
169.08user 12.60system 3:27.68elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2major+1099928minor)pagefaults 0swaps

37363966 pack-nico-b9e4339c482cb7d787a2117e6da6eb2114053abc.pack

(Geert)
Total 41591, written 41591 (delta 29347), reused 8427 (delta 0)
243.71user 12.32system 4:28.11elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1077843minor)pagefaults 0swaps

37165890 pack-geert-b9e4339c482cb7d787a2117e6da6eb2114053abc.pack

Of course, the absolute numbers do not matter, but for the
record these are on my Duron 750, 760MB or so RAM and with
relatively slow disks.

In the kernel repository (what I have checked out is near the tip
of the source tree), the largest files are fs/nls/nls_cp949.c (900kB
Korean character encoding), drivers/usb/misc/emi62_fw_s.h
(800kB, Emagic firmware blob), arch/m68k/ifpsp060/src/fpsp.S
(750kB, floating point emulation?), which is nowhere near the size
where your algorithm really should shine.

We would probably want some internal logic that says "if we see
that blobs larger than X MB are involved in the packing, we
should use this version of diff-delta, otherwise the other one."


* Re: PATCH: New diff-delta.c implementation (updated)
From: Geert Bosch @ 2006-04-28  2:07 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <Pine.GSO.4.60.0604272132170.9650@nile.gnat.com>

On Apr 27, 2006, at 21:59, Geert Bosch wrote:

> The result has been only a slight increase in delta size for
> very large test cases (but with better performance),

Just to clarify: this is compared to my initial implementation.
For very large test cases, both delta size and execution time
are much less than the current implementation.

   -Geert


* PATCH: New diff-delta.c implementation (updated)
From: Geert Bosch @ 2006-04-28  1:59 UTC (permalink / raw)
  To: git

Even though the previous version did really well on large files
with many changes, performance was lacking for the many small
files with very few changes that are so common for a VCS.

For example, it turns out that, for packing the 17005 objects in
my git.git repository, diff_delta processes 240 MB worth of target
data in about 12s on my powerbook. (There's even a little more
source data, and the 12s includes compression/decompression time.)

So the fancy fingerprint calculations really take too much time.
Fortunately, it turns out that of the 240M, 120M matches directly
at the start or the end of the source data. After this trivial
matching, most remaining matches are quite small. The overhead
of setting up buffers, computing longest runs of the same character
and computing 64-bit fingerprints becomes very noticeable and
can't be regained later.
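
The "trivial matching" at the start and end of the data can be sketched
as below. This only illustrates the idea and is not the code from the
patch; common_head and common_tail are names invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the common prefix of a[0..alen) and b[0..blen). */
static size_t common_head(const unsigned char *a, size_t alen,
			  const unsigned char *b, size_t blen)
{
	size_t n = alen < blen ? alen : blen;
	size_t i = 0;

	while (i < n && a[i] == b[i])
		i++;
	return i;
}

/*
 * Length of the common suffix, never reaching into the first `head`
 * bytes of either buffer.  The caller must pass head <= min(alen, blen),
 * for instance the value returned by common_head().
 */
static size_t common_tail(const unsigned char *a, size_t alen,
			  const unsigned char *b, size_t blen, size_t head)
{
	size_t n = (alen < blen ? alen : blen) - head;
	size_t i = 0;

	while (i < n && a[alen - 1 - i] == b[blen - 1 - i])
		i++;
	return i;
}
```

Only the bytes between head and tail then need indexing and
fingerprinting, which is where the saving for mostly-unchanged files
would come from.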

As a result I implemented special indexing and matching routines
for "small" files. Here a fixed hash table size and index step
are used. The fingerprint window has been reduced to be equal to
the step size, which essentially gets rid of computation for
characters leaving the window. Finally, the fingerprint size
has been reduced to 32 bits with a polynomial of degree 31.
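
For readers wondering where a 256-entry reduction table for such a
polynomial comes from, here is a sketch of one way to derive it. It
assumes the constant 0xf3a03ce5 (RABIN_POLY in the source below) with
bit 31 as the x^31 term, and a shift of 23 as in RABIN_SHIFT. The
function names are invented for illustration. The first few entries it
produces agree with the start of the T table in the posted source, but
this is not claimed to be how that table was generated.

```c
#include <assert.h>
#include <stdint.h>

#define POLY 0xf3a03ce5u	/* degree-31 polynomial, bit 31 = x^31 */

/* Multiply v (degree < 31) by x and reduce modulo POLY. */
static uint32_t mulx_mod(uint32_t v)
{
	v <<= 1;
	if (v & 0x80000000u)
		v ^= POLY;
	return v;
}

/* table[j] reduces the 8 bits that overflow when fp is shifted by a byte. */
static void build_reduce_table(uint32_t table[256])
{
	uint32_t pw[8];
	unsigned i, j;

	pw[0] = POLY & 0x7fffffffu;		/* x^31 mod POLY */
	for (i = 1; i < 8; i++)
		pw[i] = mulx_mod(pw[i - 1]);	/* x^(31+i) mod POLY */

	for (j = 0; j < 256; j++) {
		uint32_t v = 0;

		for (i = 0; i < 8; i++)
			if (j & (1u << i))
				v ^= pw[i];
		/* Bit 0 of j ends up in bit 31 after the shift; clear it. */
		table[j] = v ^ ((uint32_t)(j & 1u) << 31);
	}
}

/* Append one byte to a fingerprint, as in the indexing loop. */
static uint32_t append_byte(const uint32_t table[256], uint32_t fp,
			    unsigned char m)
{
	return ((fp << 8) | m) ^ table[fp >> 23];
}
```

When the window equals the index step, each block can start from a
zero fingerprint and only append bytes, so no second table is needed
to remove bytes that leave the window.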

The result has been only a slight increase in delta size for
very large test cases (but with better performance), and
both smaller deltas and faster execution speed for repacking
git.git. I had trouble cloning the Linux kernel repository,
but am now reasonably confident this will outperform the
existing algorithm pretty consistently.

On PPC, the trivial matching in head and tail, and for long
matching runs now shows up high in the profile. On x86,
byte operations are very fast, so I think things should
be at least equally good there.

Please play around with this and let me know of any results.

   -Geert

Signed-off-by: Geert Bosch <bosch@gnat.com>

#include <unistd.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
#include <sys/types.h>

#undef assert
#define assert(x) do { } while (0)

/*
  * MIN_HTAB_SIZE is a fixed amount to be added to the size of the hash table
  * used for indexing and must be a power of two. This allows for small files
  * to have a sparse hash table, since in that case it's cheap.
  * Hash table sizes are rounded up to a power of two to avoid integer division.
  */
#define MIN_HTAB_SIZE 8192
#define MAX_HTAB_SIZE (1024*1024*1024)
#define SMALL_HTAB_SIZE 8192
#define SMALL_INDEX_STEP 16

/*
  * Diffing files of gigabyte range is impractical with the current
  * algorithm, so we're assuming 32-bit sizes everywhere.
  * Size leaves some room for expansion when diffing random files.
  */
#define MAX_SIZE (0x7eff0000)

/* For small files, indices are represented in 16 bits.
  * Since indices are always a multiple of the index_step, they
  * can be shifted right a few bits to accommodate files larger than 64K
  */
#define SMALL_SHIFT 4
#define MAX_SMALL_SIZE (0xff00<<SMALL_SHIFT)

/* Initial size of copies table, dynamically extended as needed. */
#define MAX_COPIES 512

/*
  * Matching is done using a sliding window for which a Rabin
  * polynomial is computed. The advantage of such polynomials is
  * that they can efficiently be updated at every position.
  * The tables needed for this are precomputed, as it is desirable
  * to use the same polynomial all the time for repeatable results.
  * The 16 byte window is convenient for indexing with index_step 16.
  * In that special case, the U table is not needed during indexing.
  * The 32-bit hash helps on register-starved 32-bit architectures.
  */

#define RABIN_POLY 0xf3a03ce5
#define RABIN_DEGREE 31
#define RABIN_SHIFT 23
#define RABIN_WINDOW_SIZE 16

unsigned T[256] =
{ 0x00000000, 0xf3a03ce5, 0x14e0452f,
   0xe74079ca, 0x29c08a5e, 0xda60b6bb, 0x3d20cf71, 0xce80f394, 0x538114bc,
   0xa0212859, 0x47615193, 0xb4c16d76, 0x7a419ee2, 0x89e1a207, 0x6ea1dbcd,
   0x9d01e728, 0x54a2159d, 0xa7022978, 0x404250b2, 0xb3e26c57, 0x7d629fc3,
   0x8ec2a326, 0x6982daec, 0x9a22e609, 0x07230121, 0xf4833dc4, 0x13c3440e,
   0xe06378eb, 0x2ee38b7f, 0xdd43b79a, 0x3a03ce50, 0xc9a3f2b5, 0x5ae417df,
   0xa9442b3a, 0x4e0452f0, 0xbda46e15, 0x73249d81, 0x8084a164, 0x67c4d8ae,
   0x9464e44b, 0x09650363, 0xfac53f86, 0x1d85464c, 0xee257aa9, 0x20a5893d,
   0xd305b5d8, 0x3445cc12, 0xc7e5f0f7, 0x0e460242, 0xfde63ea7, 0x1aa6476d,
   0xe9067b88, 0x2786881c, 0xd426b4f9, 0x3366cd33, 0xc0c6f1d6, 0x5dc716fe,
   0xae672a1b, 0x492753d1, 0xba876f34, 0x74079ca0, 0x87a7a045, 0x60e7d98f,
   0x9347e56a, 0x4668135b, 0xb5c82fbe, 0x52885674, 0xa1286a91, 0x6fa89905,
   0x9c08a5e0, 0x7b48dc2a, 0x88e8e0cf, 0x15e907e7, 0xe6493b02, 0x010942c8,
   0xf2a97e2d, 0x3c298db9, 0xcf89b15c, 0x28c9c896, 0xdb69f473, 0x12ca06c6,
   0xe16a3a23, 0x062a43e9, 0xf58a7f0c, 0x3b0a8c98, 0xc8aab07d, 0x2feac9b7,
   0xdc4af552, 0x414b127a, 0xb2eb2e9f, 0x55ab5755, 0xa60b6bb0, 0x688b9824,
   0x9b2ba4c1, 0x7c6bdd0b, 0x8fcbe1ee, 0x1c8c0484, 0xef2c3861, 0x086c41ab,
   0xfbcc7d4e, 0x354c8eda, 0xc6ecb23f, 0x21accbf5, 0xd20cf710, 0x4f0d1038,
   0xbcad2cdd, 0x5bed5517, 0xa84d69f2, 0x66cd9a66, 0x956da683, 0x722ddf49,
   0x818de3ac, 0x482e1119, 0xbb8e2dfc, 0x5cce5436, 0xaf6e68d3, 0x61ee9b47,
   0x924ea7a2, 0x750ede68, 0x86aee28d, 0x1baf05a5, 0xe80f3940, 0x0f4f408a,
   0xfcef7c6f, 0x326f8ffb, 0xc1cfb31e, 0x268fcad4, 0xd52ff631, 0x7f701a53,
   0x8cd026b6, 0x6b905f7c, 0x98306399, 0x56b0900d, 0xa510ace8, 0x4250d522,
   0xb1f0e9c7, 0x2cf10eef, 0xdf51320a, 0x38114bc0, 0xcbb17725, 0x053184b1,
   0xf691b854, 0x11d1c19e, 0xe271fd7b, 0x2bd20fce, 0xd872332b, 0x3f324ae1,
   0xcc927604, 0x02128590, 0xf1b2b975, 0x16f2c0bf, 0xe552fc5a, 0x78531b72,
   0x8bf32797, 0x6cb35e5d, 0x9f1362b8, 0x5193912c, 0xa233adc9, 0x4573d403,
   0xb6d3e8e6, 0x25940d8c, 0xd6343169, 0x317448a3, 0xc2d47446, 0x0c5487d2,
   0xfff4bb37, 0x18b4c2fd, 0xeb14fe18, 0x76151930, 0x85b525d5, 0x62f55c1f,
   0x915560fa, 0x5fd5936e, 0xac75af8b, 0x4b35d641, 0xb895eaa4, 0x71361811,
   0x829624f4, 0x65d65d3e, 0x967661db, 0x58f6924f, 0xab56aeaa, 0x4c16d760,
   0xbfb6eb85, 0x22b70cad, 0xd1173048, 0x36574982, 0xc5f77567, 0x0b7786f3,
   0xf8d7ba16, 0x1f97c3dc, 0xec37ff39, 0x39180908, 0xcab835ed, 0x2df84c27,
   0xde5870c2, 0x10d88356, 0xe378bfb3, 0x0438c679, 0xf798fa9c, 0x6a991db4,
   0x99392151, 0x7e79589b, 0x8dd9647e, 0x435997ea, 0xb0f9ab0f, 0x57b9d2c5,
   0xa419ee20, 0x6dba1c95, 0x9e1a2070, 0x795a59ba, 0x8afa655f, 0x447a96cb,
   0xb7daaa2e, 0x509ad3e4, 0xa33aef01, 0x3e3b0829, 0xcd9b34cc, 0x2adb4d06,
   0xd97b71e3, 0x17fb8277, 0xe45bbe92, 0x031bc758, 0xf0bbfbbd, 0x63fc1ed7,
   0x905c2232, 0x771c5bf8, 0x84bc671d, 0x4a3c9489, 0xb99ca86c, 0x5edcd1a6,
   0xad7ced43, 0x307d0a6b, 0xc3dd368e, 0x249d4f44, 0xd73d73a1, 0x19bd8035,
   0xea1dbcd0, 0x0d5dc51a, 0xfefdf9ff, 0x375e0b4a, 0xc4fe37af, 0x23be4e65,
   0xd01e7280, 0x1e9e8114, 0xed3ebdf1, 0x0a7ec43b, 0xf9def8de, 0x64df1ff6,
   0x977f2313, 0x703f5ad9, 0x839f663c, 0x4d1f95a8, 0xbebfa94d, 0x59ffd087,
   0xaa5fec62
};

unsigned U[256] =
{ 0x00000000, 0x302a7c89, 0x6054f912,
   0x507e859b, 0x3309cec1, 0x0323b248, 0x535d37d3, 0x63774b5a, 0x66139d82,
   0x5639e10b, 0x06476490, 0x366d1819, 0x551a5343, 0x65302fca, 0x354eaa51,
   0x0564d6d8, 0x3f8707e1, 0x0fad7b68, 0x5fd3fef3, 0x6ff9827a, 0x0c8ec920,
   0x3ca4b5a9, 0x6cda3032, 0x5cf04cbb, 0x59949a63, 0x69bee6ea, 0x39c06371,
   0x09ea1ff8, 0x6a9d54a2, 0x5ab7282b, 0x0ac9adb0, 0x3ae3d139, 0x7f0e0fc2,
   0x4f24734b, 0x1f5af6d0, 0x2f708a59, 0x4c07c103, 0x7c2dbd8a, 0x2c533811,
   0x1c794498, 0x191d9240, 0x2937eec9, 0x79496b52, 0x496317db, 0x2a145c81,
   0x1a3e2008, 0x4a40a593, 0x7a6ad91a, 0x40890823, 0x70a374aa, 0x20ddf131,
   0x10f78db8, 0x7380c6e2, 0x43aaba6b, 0x13d43ff0, 0x23fe4379, 0x269a95a1,
   0x16b0e928, 0x46ce6cb3, 0x76e4103a, 0x15935b60, 0x25b927e9, 0x75c7a272,
   0x45eddefb, 0x0dbc2361, 0x3d965fe8, 0x6de8da73, 0x5dc2a6fa, 0x3eb5eda0,
   0x0e9f9129, 0x5ee114b2, 0x6ecb683b, 0x6bafbee3, 0x5b85c26a, 0x0bfb47f1,
   0x3bd13b78, 0x58a67022, 0x688c0cab, 0x38f28930, 0x08d8f5b9, 0x323b2480,
   0x02115809, 0x526fdd92, 0x6245a11b, 0x0132ea41, 0x311896c8, 0x61661353,
   0x514c6fda, 0x5428b902, 0x6402c58b, 0x347c4010, 0x04563c99, 0x672177c3,
   0x570b0b4a, 0x07758ed1, 0x375ff258, 0x72b22ca3, 0x4298502a, 0x12e6d5b1,
   0x22cca938, 0x41bbe262, 0x71919eeb, 0x21ef1b70, 0x11c567f9, 0x14a1b121,
   0x248bcda8, 0x74f54833, 0x44df34ba, 0x27a87fe0, 0x17820369, 0x47fc86f2,
   0x77d6fa7b, 0x4d352b42, 0x7d1f57cb, 0x2d61d250, 0x1d4baed9, 0x7e3ce583,
   0x4e16990a, 0x1e681c91, 0x2e426018, 0x2b26b6c0, 0x1b0cca49, 0x4b724fd2,
   0x7b58335b, 0x182f7801, 0x28050488, 0x787b8113, 0x4851fd9a, 0x1b7846c2,
   0x2b523a4b, 0x7b2cbfd0, 0x4b06c359, 0x28718803, 0x185bf48a, 0x48257111,
   0x780f0d98, 0x7d6bdb40, 0x4d41a7c9, 0x1d3f2252, 0x2d155edb, 0x4e621581,
   0x7e486908, 0x2e36ec93, 0x1e1c901a, 0x24ff4123, 0x14d53daa, 0x44abb831,
   0x7481c4b8, 0x17f68fe2, 0x27dcf36b, 0x77a276f0, 0x47880a79, 0x42ecdca1,
   0x72c6a028, 0x22b825b3, 0x1292593a, 0x71e51260, 0x41cf6ee9, 0x11b1eb72,
   0x219b97fb, 0x64764900, 0x545c3589, 0x0422b012, 0x3408cc9b, 0x577f87c1,
   0x6755fb48, 0x372b7ed3, 0x0701025a, 0x0265d482, 0x324fa80b, 0x62312d90,
   0x521b5119, 0x316c1a43, 0x014666ca, 0x5138e351, 0x61129fd8, 0x5bf14ee1,
   0x6bdb3268, 0x3ba5b7f3, 0x0b8fcb7a, 0x68f88020, 0x58d2fca9, 0x08ac7932,
   0x388605bb, 0x3de2d363, 0x0dc8afea, 0x5db62a71, 0x6d9c56f8, 0x0eeb1da2,
   0x3ec1612b, 0x6ebfe4b0, 0x5e959839, 0x16c465a3, 0x26ee192a, 0x76909cb1,
   0x46bae038, 0x25cdab62, 0x15e7d7eb, 0x45995270, 0x75b32ef9, 0x70d7f821,
   0x40fd84a8, 0x10830133, 0x20a97dba, 0x43de36e0, 0x73f44a69, 0x238acff2,
   0x13a0b37b, 0x29436242, 0x19691ecb, 0x49179b50, 0x793de7d9, 0x1a4aac83,
   0x2a60d00a, 0x7a1e5591, 0x4a342918, 0x4f50ffc0, 0x7f7a8349, 0x2f0406d2,
   0x1f2e7a5b, 0x7c593101, 0x4c734d88, 0x1c0dc813, 0x2c27b49a, 0x69ca6a61,
   0x59e016e8, 0x099e9373, 0x39b4effa, 0x5ac3a4a0, 0x6ae9d829, 0x3a975db2,
   0x0abd213b, 0x0fd9f7e3, 0x3ff38b6a, 0x6f8d0ef1, 0x5fa77278, 0x3cd03922,
   0x0cfa45ab, 0x5c84c030, 0x6caebcb9, 0x564d6d80, 0x66671109, 0x36199492,
   0x0633e81b, 0x6544a341, 0x556edfc8, 0x05105a53, 0x353a26da, 0x305ef002,
   0x00748c8b, 0x500a0910, 0x60207599, 0x03573ec3, 0x337d424a, 0x6303c7d1,
   0x5329bb58
};


static unsigned char rabin_window[RABIN_WINDOW_SIZE];
static unsigned rabin_pos = 0;

#ifndef MIN
#define MIN(x,y) ((y)<(x) ? (y) : (x))
#endif
#ifndef MAX
#define MAX(x,y) ((y)>(x) ? (y) : (x))
#endif

/*
  * The copies array is the central data structure for diff generation.
  * Data statements are implicit, for ranges not covered by any copy command.
  *
  * The sum of tgt and length for each entry must be monotonically increasing,
  * and data ranges must be non-overlapping. This is accomplished by not
  * extending matches backwards during initial matching.
  *
  * Copies may have zero length, to make it quick to delete copies during
  * optimization. However, the last copy in the list must always be a
  * non-trivial copy.
  *
  * Before committing copies, an important optimization is performed: during
  * a backward pass through the copies array, each entry is extended backwards,
  * and redundant copies are eliminated.
  *
  * If each match were extended backwards on insertion, the same data may be
  * matched an arbitrary number of times, resulting in potentially quadratic
  * time behavior.
  */

typedef struct copyinfo {
 	unsigned src;
 	unsigned tgt;
 	unsigned length;
} CopyInfo;

static CopyInfo *copies;
static int copy_count = 0;
static unsigned max_copies = 0; /* Dynamically increased */

static unsigned *idx;
static unsigned idx_size;
static unsigned char *idx_data;
static unsigned idx_data_len;

typedef unsigned poly_t;

static void rabin_reset(void)
{
 	memset(rabin_window, 0, sizeof(rabin_window));
}

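/* Slide the window by one byte: drop the oldest byte and append m. */
static poly_t rabin_slide (poly_t fp, unsigned char m)
{
 	unsigned char om;
 	if (++rabin_pos == RABIN_WINDOW_SIZE) rabin_pos = 0;
 	om = rabin_window[rabin_pos];	/* byte leaving the window */
 	fp ^= U[om];			/* cancel its contribution */
 	rabin_window[rabin_pos] = m;
 	/* shift in m and reduce modulo the polynomial via T */
 	fp = ((fp << 8) | m) ^ T[fp >> RABIN_SHIFT];
 	return fp;
}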

static int add_copy (unsigned src, unsigned tgt, unsigned length)
{
 	if (copy_count == max_copies) {
 		max_copies *= 2;

 		if (!max_copies) {
 			max_copies = MAX_COPIES;
 			copies = malloc (max_copies * sizeof (CopyInfo));
 		} else
 			copies = realloc(copies,
 			   max_copies * sizeof (CopyInfo));
 		if (!copies)
 			return 0;
 	}

 	copies[copy_count].src = src;
 	copies[copy_count].tgt = tgt;
 	copies[copy_count].length = length;
 	return ++copy_count;
}

static unsigned maxofs[256];
static unsigned maxlen[256];
static unsigned maxfp[256];

static const unsigned small_idx_size = SMALL_HTAB_SIZE;
static short unsigned small_idx[SMALL_HTAB_SIZE];

static void small_init_idx (unsigned char * data, unsigned len,
                      	    unsigned head, unsigned tail)
{
 	const unsigned index_step = SMALL_INDEX_STEP;
 	unsigned j = head - head % index_step;

 	if (len < index_step) return;

 	idx_data = data;
 	idx_data_len = len;
 	len -= MIN (len, tail + (index_step - 1));

 	memset (small_idx, 0, sizeof(small_idx));

 	while (j < len) {
 		poly_t fp = 0;
 		do
 			fp = ((fp << 8) | data[j++]) ^ T[fp >> RABIN_SHIFT];
 		while (j % index_step);
 		small_idx[fp % small_idx_size] = j >> SMALL_SHIFT;
 	}
}

static void init_idx (unsigned char *data, unsigned len, int level,
 		      unsigned head, unsigned tail)
{
 	unsigned index_step
 	  = RABIN_WINDOW_SIZE / sizeof(unsigned) * sizeof(unsigned);
 	unsigned j, k;
 	unsigned char ch = 0;
 	unsigned runlen = 0;
 	poly_t fp = 0;

 	/* Special case small files at low optimization levels */
 	if (level <= 1 && len < MAX_SMALL_SIZE
 	  && len - head - tail < (SMALL_HTAB_SIZE * SMALL_INDEX_STEP)) {
 		small_init_idx(data, len, head, tail);
 		return;
 	}

 	assert (len <= MAX_SIZE);
 	assert (head < len);
 	assert (level >= 0 && level <= 9);
 	memset(maxofs, 0, sizeof(maxofs));
 	memset(maxlen, 0, sizeof(maxlen));
 	memset(maxfp, 0, sizeof(maxfp));

 	/* Smaller step size for higher optimization levels.
 	   The index_step must be a multiple of the word size */
 	if (level >= 1)
 		index_step = MIN(index_step, 4 * sizeof (unsigned));
 	if (level >= 3)
 		index_step = MIN (index_step, 3 * sizeof (unsigned));
 	if (level >= 4)
 		index_step = MIN (index_step, 2 * sizeof (unsigned));
 	if (level >= 6)
 		index_step = MIN (index_step, 1 * sizeof (unsigned));
 	assert (index_step && !(index_step % sizeof (unsigned)));

 	/* Add fixed amount to hash table size, as small files will benefit
 	   a lot without using significantly more memory or time. */
 	idx_size = (level + 1) * ((len - head - tail) / index_step) / 2;
 	idx_size = MIN (idx_size + MIN_HTAB_SIZE, MAX_HTAB_SIZE - 1);

 	/* Round up to next power of two, but limit to MAX_HTAB_SIZE. */
 	{
 		unsigned s = MIN_HTAB_SIZE;
 		while (s < idx_size) s += s;
 		idx_size = s;
 	}

 	idx_data = data;
 	idx_data_len = len;
 	idx = calloc(idx_size, sizeof(unsigned));

 	/* It is tempting to first index higher addresses, so hashes of lower
 	   addresses will get preference in the hash table. However, for
 	   repetitive patterns with a period that is a divisor of the
 	   fingerprint window, this may mean the match is not anchored at
 	   the end. Furthermore, even when using a window length that is
 	   prime, the benefits are small and the irregularity of the first
 	   matches being more important is not worth it. */

 	rabin_reset();

 	ch = 0;
 	runlen = 0;

 	if (head < RABIN_WINDOW_SIZE + index_step)
 		head = 0;
 	else {
 		head -= head % index_step;
 		for (j = head - RABIN_WINDOW_SIZE + 1; j < head; j++)
 			fp = rabin_slide (fp, data[j]);
 	}

 	for (j = head; j + index_step < len - tail; j += index_step) {
 		unsigned char pch = 0;
 		unsigned hash;

 		for (k = 0; k < index_step; k++) {
 			pch = ch;
 			ch = data[j + k];
 			if (ch != pch)
 				runlen = 0;
 			runlen++;
 			fp = rabin_slide(fp, ch);
 		}

 		/* See if there is a word-aligned window-sized run of
 		   equal characters */
 		if (runlen >= RABIN_WINDOW_SIZE + sizeof(unsigned) - 1) {
 			/* Skip ahead to end of run */
 			while (j + k < len && data[j + k] == ch) {
 				k++;
 				runlen++;
 			}

 			/* Although matches are usually anchored at the end,
 			   in the case of extended runs of equal characters
 			   it is better to anchor after the first
 			   RABIN_WINDOW_SIZE bytes. This allows for quick
 			   skip ahead while matching such runs, avoiding
 			   unneeded fingerprint calculations.
 			   Also, when anchoring at the end, matches will be
 			   generated after every word, because the fingerprint
 			   stays constant. Even though all matches would get
 			   combined during match optimization, it wastes time
 			   and space. */
 			if (runlen > maxlen[pch] + 4) {
 				unsigned ofs;
 				/* ofs points RABIN_WINDOW_SIZE bytes after
 				   the start of the run, rounded up to the
 				   next word */
 				ofs = j + k - runlen + RABIN_WINDOW_SIZE
 				   + (sizeof (unsigned) - 1);
 				ofs -= ofs % sizeof(unsigned);
 				maxofs[pch] = ofs;
 				maxlen [pch] = runlen;
 				assert(maxfp[pch] == 0
 				  || maxfp[pch] == (unsigned)fp);
 				maxfp[pch] = (unsigned)fp;
 			}
 			/* Keep input aligned as if no special run
 			   processing had taken place */
 			j += k - (k % index_step) - index_step;
 			k = index_step;
 		}

 		/* Testing showed that avoiding collisions using secondary
 		   hashing, or hash chaining had little effect and is not
 		   worth the time. */
 		hash = ((unsigned)fp) & (idx_size - 1);
 		idx[hash] = j + k;
 	}

 	/* Lastly, index the longest runs of equal characters found before.
 	   This ensures we always match the longest such runs available.  */
 	for (j = 0; j < 256; j++)
 		if (maxlen[j])
 			idx[maxfp[j] % idx_size] = maxofs[j];
}

/* Match data against the current index and record all possible copies */
static int small_find_copies(unsigned char *data, unsigned len, unsigned head)
{
 	unsigned j = head < RABIN_WINDOW_SIZE ? 0 : head - RABIN_WINDOW_SIZE;
 	poly_t fp = 0;

 	while (j < MAX (head, RABIN_WINDOW_SIZE) && j < len)
 		fp = ((fp << 8) | data[j++]) ^ T[fp >> RABIN_SHIFT];

 	while (j < len) {
 		unsigned ofs, src, tgt, runlen, maxrun;

 		fp ^= U[data[j - RABIN_WINDOW_SIZE]];
 		fp = ((fp << 8) | data[j++]) ^ T[fp >> RABIN_SHIFT];

 		ofs = small_idx[fp & (small_idx_size - 1)] << SMALL_SHIFT;

 		/* Invariant:
 		   data[0] .. data[j-1] has been processed
 		   fp is fingerprint of sliding window ending at j-1
 		   ofs is zero or points just past tentative match
 		   ofs is a multiple of index_step */

 		if (!ofs)
 			continue;

 		runlen = 0;
 		tgt = j - 4;
 		src = ofs - 4;
 		maxrun = MIN(idx_data_len - src, len - tgt);

 		/* Hot loop */
 		while (runlen < maxrun &&
 		       data[tgt + runlen] == idx_data[src + runlen])
 			runlen++;
 		if (runlen < 4)
 			continue;

 		if (!add_copy(src, tgt, runlen)) return 0;

 		/* For runs extending more than RABIN_WINDOW_SIZE bytes past j,
 		   skip ahead to prevent useless fingerprint computations. */
 		if (tgt + runlen > j + RABIN_WINDOW_SIZE)
 		{
 			fp = 0;
 			j = tgt + runlen - RABIN_WINDOW_SIZE;
 			while (j < tgt + runlen)
 				fp = ((fp << 8) | data[j++])
 				      ^ T[fp >> RABIN_SHIFT];
 		}

 		/* Quickly scan ahead without looking for matches
 		   until the end of this run */
 		while (j < tgt + runlen) {
 			fp ^= U[data[j - RABIN_WINDOW_SIZE]];
 			fp = ((fp << 8) | data[j++]) ^ T[fp >> RABIN_SHIFT];
 		}
 	}

 	return 1;
}

/* Match data against the current index and record all possible copies */
static int find_copies(unsigned char *data, unsigned len, unsigned head)
{
 	unsigned j = head < RABIN_WINDOW_SIZE ? 0 : head - RABIN_WINDOW_SIZE;
 	poly_t fp = 0;

 	assert (idx_data);

 	if (!idx) return small_find_copies (data, len, head);

 	rabin_reset();

 	while (j < head + RABIN_WINDOW_SIZE && j < len)
 		fp = rabin_slide(fp, data[j++]);

 	while (j < len) {
 		unsigned ofs, src, tgt, runlen, maxrun;

 		fp = rabin_slide(fp, data[j++]);
 		ofs = idx[fp & (idx_size - 1)];

 		/* Invariant:
 		   data[0] .. data[j-1] has been processed
 		   fp is fingerprint of sliding window ending at j-1
 		   ofs is zero or points just past tentative match
 		   ofs is a multiple of index_step */

 		if (!ofs)
 			continue;

 		runlen = 0;
 		tgt = j - 4;
 		src = ofs - 4;
 		maxrun = MIN(idx_data_len - src, len - tgt);

 		/* Hot loop */
 		while (runlen < maxrun &&
 		       data[tgt + runlen] == idx_data[src + runlen])
 			runlen++;
 		if (runlen < 4)
 			continue;

 		if (!add_copy(src, tgt, runlen)) return 0;

 		/* For runs extending more than RABIN_WINDOW_SIZE bytes past j,
 		   skip ahead to prevent useless fingerprint computations. */
 		if (tgt + runlen > j + RABIN_WINDOW_SIZE)
 			j = tgt + runlen - RABIN_WINDOW_SIZE;

 		/* Quickly scan ahead without looking for matches
 		   until the end of this run */
 		while (j < tgt + runlen)
 			fp = rabin_slide(fp, data[j++]);
 	}

 	return 1;
}

static unsigned header_length(unsigned srclen, unsigned tgtlen)
{
 	unsigned len = 0;
 	assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE);

 	/* GIT headers start with the length of the source and target,
 	   with 7 bits per byte, least significant byte first, and
 	   the high bit indicating continuation. */
 	do { len++; srclen >>= 7; } while (srclen);
 	do { len++; tgtlen >>= 7; } while (tgtlen);

 	return len;
}

static unsigned char *
write_header(unsigned char *patch, unsigned srclen, unsigned tgtlen)
{
 	assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE);

 	while (srclen >= 0x80) {
 		*patch++ = srclen | 0x80;
 		srclen >>= 7;
 	}
 	*patch++ = srclen;

 	while (tgtlen >= 0x80) {
 		*patch++ = tgtlen | 0x80;
 		tgtlen >>= 7;
 	}
 	*patch++ = tgtlen;

 	return patch;
}

static unsigned data_length(unsigned length)
{
 	/* Can only include 0x7f data bytes per command */
 	unsigned partial = length % 0x7f;
 	assert (length > 0 && length <= MAX_SIZE);
 	if (partial) partial++;
 	return partial + (length / 0x7f) * 0x80;
}

static unsigned char *
write_data(unsigned char *patch, unsigned char *data, unsigned size)
{
 	assert (size > 0 && size < MAX_SIZE);
 	/* The return value must be equal to patch + data_length (size).
 	   This correspondence is essential for calculating the patch size.  */

 	/* GIT has no data commands for large data, rest is same as GDIFF */
 	do {
 		unsigned s = size;
 		if (s > 0x7f)
 			s = 0x7f;
 		*patch++ = s;
 		memcpy(patch, data, s);
 		data += s;
 		patch += s;
 		size -= s;
 	} while (size);

 	return patch;
}

static unsigned copy_length (unsigned offset, unsigned length)
{
 	unsigned size = 0;

 	assert (offset < MAX_SIZE && length < MAX_SIZE);

 	/* For now we only copy a maximum of 0x10000 bytes per command.
 	   Longer copies are broken into pieces of that size. */
 	do {
 		signed s = length;
 		if (s > 0x10000)
 			s = 0x10000;
 		size += !!(s & 0xff) + !!(s & 0xff00);
 		size += !!(offset & 0xff) + !!(offset & 0xff00) +
 			!!(offset & 0xff0000) + !!(offset & 0xff000000);
 		size += 1;
 		offset += s;
 		length -= s;
 	} while (length);

 	return size;
}

static unsigned char *
write_copy(unsigned char *patch, unsigned offset, unsigned size)
{
 	/* The return value must be equal to patch + copy_length
 	   (offset, size). This correspondence is essential
 	   for calculating the patch size.  */

 	do {
 		unsigned char c = 0x80, *cmd = patch++;
 		unsigned v, s = size;
 		if (s > 0x10000)
 			s = 0x10000;

 		v = offset;
 		if (v & 0xff) c |= 0x01, *patch++ = v;
 		v >>= 8;
 		if (v & 0xff) c |= 0x02, *patch++ = v;
 		v >>= 8;
 		if (v & 0xff) c |= 0x04, *patch++ = v;
 		v >>= 8;
 		if (v & 0xff) c |= 0x08, *patch++ = v;

 		v = s;
 		if (v & 0xff) c |= 0x10, *patch++ = v;
 		v >>= 8;
 		if (v & 0xff) c |= 0x20, *patch++ = v;

 		*cmd = c;
 		offset += s;
 		size -= s;
 	} while (size);

 	return patch;
}

static unsigned
process_copies (unsigned char *data, unsigned length, unsigned maxlen)
{
 	int j;
 	unsigned ptr = length;
 	unsigned patch_bytes = header_length(idx_data_len, length);

 	/* Work through the copies backwards, extending each one backwards. */
 	for (j = copy_count - 1; j >= 0; j--) {
 		CopyInfo *copy = copies+j;
 		unsigned src = copy->src;
 		unsigned tgt = copy->tgt;
 		unsigned len = copy->length;
 		int data_follows;

 		if (tgt + len > ptr) {
 			/* Part of copy already covered by later one,
 			   so shorten copy. */
 			if (ptr < tgt) {
 				/* Copy completely disappeared, but guess
 				   that a backward extension might still be
 				   useful. This extension is non-contiguous,
 				   as it is irrelevant whether the skipped
 				   data would have matched or not. Be careful
 				   to not extend past the beginning of
 				   the source. */
 				unsigned adjust = tgt - ptr;

 				tgt = ptr;
 				src = (src < adjust) ? 0 : src - adjust;

 				copy->tgt = tgt;
 				copy->src = src;
 			}

 			len = ptr - tgt;
 		}

 		while (src && tgt && idx_data[src - 1] == data[tgt - 1]) {
 			src--;
 			tgt--;
 		}
 		len += copy->tgt - tgt;

 		data_follows = (tgt + len < ptr);

 		/* A short copy may cost as much as 6 bytes for the copy and
 		   5 as result of an extra data command. It's not worth
 		   having extra copies in order to just save a byte or two.
 		   Being too smart here may hurt later compression as well. */
 		if (len < (data_follows ? 16 : 10))
 			len = 0;

 		/* Some target data is not covered by the copies, account for
 		   the DATA command that will follow the copy. */
 		if (len && data_follows)
 			patch_bytes += data_length(ptr - (tgt + len));

 		/* Everything about the copy is known and will not change.
 		   Write back the new information and update the patch size
 		   with the size of the copy instruction. */
 		copy->length = len;
 		copy->src = src;
 		copy->tgt = tgt;

 		if (len) {
 			/* update patch size for copy command */
 			patch_bytes += copy_length (src, len);
 			ptr = tgt;
 		} else if (j == copy_count - 1) {
 			/* Remove empty copies at end of list. */
 			copy_count--;
 		}

 		if (patch_bytes > maxlen)
 			return 0;
 	}

 	/* Account for data before first copy */
 	if (ptr != 0)
 		patch_bytes += data_length(ptr);

 	if (patch_bytes > maxlen)
 		return 0;
 	return patch_bytes;
}

static void *
create_delta (unsigned char *data, unsigned len,
 	      unsigned char *delta, unsigned delta_size)
{
 	unsigned char *ptr = delta;
 	unsigned offset = 0;
 	int j;

 	ptr = write_header(ptr, idx_data_len, len);

 	for (j = 0; j < copy_count; j++) {
 		CopyInfo *copy = copies + j;
 		unsigned copylen = copy->length;

 		if (!copylen)
 			continue;

 		if (copy->tgt > offset) {
 			ptr = write_data(ptr, data + offset,
 			   copy->tgt - offset);
 		}

 		ptr = write_copy(ptr, copy->src, copylen);
 		offset = copy->tgt + copylen;
 	}

 	if (offset < len)
 		ptr = write_data(ptr, data + offset, len - offset);

 	assert(ptr - delta == delta_size);

 	return delta;
}

static void finalize_idx()
{
 	if (max_copies > 8 * MAX_COPIES) {
 		free(copies);
 		copies = 0;
 		max_copies = 0;
 	}
 	copy_count = 0;
 	if (idx) free(idx);
 	idx = 0;
 	idx_size = 0;
 	idx_data = 0;
 	idx_data_len = 0;
}

static unsigned
match_head (unsigned char *from, unsigned char *to, unsigned size)
{
 	unsigned head = 0;
 	while (head < size && from[head] == to[head]) head++;
 	return head;
}

static unsigned
match_tail (unsigned char *from, unsigned char *to, unsigned size)
{
 	unsigned tail = 0;
 	while (tail < size && *(from - tail) == *(to - tail)) tail++;
 	return tail;
}

void *diff_delta(void *from_buf, unsigned long from_size,
 		 void *to_buf, unsigned long to_size,
 		 unsigned long *delta_size, unsigned long max_size)
{
 	unsigned char *delta = 0;
 	unsigned dsize;
 	unsigned head = 0;
 	unsigned tail = 0;

 	assert (from_size <= MAX_SIZE && to_size <= MAX_SIZE);

 	/* The following actually takes care of about half of all target
 	   data. This is performance critical, and may need some work. */
 	head = match_head(from_buf, to_buf, MIN(from_size, to_size));
 	tail = match_tail(from_buf + (from_size - 1), to_buf + (to_size - 1),
 	                  MIN(from_size, to_size - head));

 	if (head <= RABIN_WINDOW_SIZE) head = 0;
 	if (tail <= RABIN_WINDOW_SIZE) tail = 0;

 	if (!max_size)
 		max_size = from_size;

 	init_idx (from_buf, from_size, 1, head, tail);

 	if (head) add_copy (0, 0, head);

 	if (head + tail + RABIN_WINDOW_SIZE < from_size) {
 		if (!find_copies(to_buf, to_size - tail, head))
 			return 0;
 	}
 	if (tail) add_copy (from_size - tail, to_size - tail, tail);

 	dsize = process_copies(to_buf, to_size, max_size);
 	if (dsize)
 	{
 		delta = malloc (dsize);
 		delta = create_delta (to_buf, to_size, delta, dsize);
 	}
 	finalize_idx ();
 	if (delta)
 		*delta_size = dsize;
 	return delta;
}
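
The size header described in header_length() and write_header() above
(7 bits per byte, least significant group first, high bit set when more
bytes follow) can be sketched in isolation as below. The names
encode_size() and decode_size() are hypothetical helpers for
illustration; they are not part of the patch.

```c
#include <stddef.h>

/* Encode value using 7 bits per byte, least significant group first.
   The high bit of each byte signals that another byte follows.
   Returns the number of bytes written to out. */
static size_t encode_size(unsigned char *out, unsigned long value)
{
	size_t n = 0;

	while (value >= 0x80) {
		out[n++] = (unsigned char)(value | 0x80);
		value >>= 7;
	}
	out[n++] = (unsigned char)value;
	return n;
}

/* Inverse of encode_size(): read bytes until one has the high bit clear.
   Returns the number of bytes consumed from in. */
static size_t decode_size(const unsigned char *in, unsigned long *value)
{
	size_t n = 0;
	unsigned shift = 0;
	unsigned long v = 0;
	unsigned char c;

	do {
		c = in[n++];
		v |= (unsigned long)(c & 0x7f) << shift;
		shift += 7;
	} while (c & 0x80);

	*value = v;
	return n;
}
```

With this scheme a length of 300 takes two bytes (0xAC, 0x02), which is
the per-length count that header_length() adds up before the delta
buffer is allocated.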

^ permalink raw reply

* Re: [PATCH] use delta index data when finding best delta matches
From: Nicolas Pitre @ 2006-04-28  1:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vy7xqh5g6.fsf@assigned-by-dhcp.cox.net>

On Thu, 27 Apr 2006, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > This patch allows for computing the delta index for each base object 
> > only once and reuse it when trying to find the best delta match.
> >
> > This should set the mark and pave the way for possibly better delta 
> > generator algorithms.
> >
> > Signed-off-by: Nicolas Pitre <nico@cam.org>
> 
> My understanding is that theoretically this should not make any
> difference to the result, and should run faster when the memory
> pressure does not cause the machine to thrash.  However,....
> 
> I am seeing some differences.  Even with the smallish "git.git"
> repository, packing is slightly slower, and the end result is
> smaller.

Well, I changed some heuristics a bit.

> Not that I am complaining that it produces better results with a
> small performance penalty.  I am curious because I do not
> understand where the differences are coming from, and I was
> reluctant to merge it in "next" until I understand what is going
> on.
> 
> But I think I know where the differences come from:
> 
> -	sizediff = oldsize > size ? oldsize - size : size - oldsize;
> +	sizediff = src_size < size ? size - src_size : 0;

Right.  The idea is that when the delta source index has to be computed 
each time, if the target buffer is really small then we spend more time 
computing that index than anything else.

But when the delta index is computed only once and already available 
anyway, we don't lose much attempting a delta with a small target buffer 
since the delta computation is non-existent at that point and the actual 
delta generation will be quick if the target buffer is small.

> There is another "omit smaller than 50" difference but that
> should not trigger -- we do not have files that small.

Right.  And if such small files show up they won't waste window space.


Nicolas

^ permalink raw reply

* Re: [PATCH] use delta index data when finding best delta matches
From: Junio C Hamano @ 2006-04-28  1:08 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604262351221.18520@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

> This patch allows for computing the delta index for each base object 
> only once and reuse it when trying to find the best delta match.
>
> This should set the mark and pave the way for possibly better delta 
> generator algorithms.
>
> Signed-off-by: Nicolas Pitre <nico@cam.org>

My understanding is that theoretically this should not make any
difference to the result, and should run faster when the memory
pressure does not cause the machine to thrash.  However,....

I am seeing some differences.  Even with the smallish "git.git"
repository, packing is slightly slower, and the end result is
smaller.

Here are full packing experiments in a fully unpacked git.git
repository.

("next" version)
Total 17724, written 17724 (delta 11779), reused 0 (delta 0)
31.61user 6.24system 0:37.97elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+431995minor)pagefaults 0swaps

 6520520 pack-next-f1fac077a093ffdaf094aab2b7f11859ec0c18f1.pack

(with "use delta index" patch)
Total 17724, written 17724 (delta 12002), reused 0 (delta 0)
33.26user 6.00system 0:39.33elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+434451minor)pagefaults 0swaps

 6188418 pack-nico-f1fac077a093ffdaf094aab2b7f11859ec0c18f1.pack

Not that I am complaining that it produces better results with a
small performance penalty.  I am curious because I do not
understand where the differences are coming from, and I was
reluctant to merge it in "next" until I understand what is going
on.

But I think I know where the differences come from:

-	sizediff = oldsize > size ? oldsize - size : size - oldsize;
+	sizediff = src_size < size ? size - src_size : 0;

There is another "omit smaller than 50" difference but that
should not trigger -- we do not have files that small.

The size-diff change sort-of makes sense -- you are counting how
much the target grew, which you are likely to need to represent
as additions of literal data, and there is no reason to limit
the diff if the size difference that is greater than maxsize is
in the other direction (deletion).
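
As a minimal sketch of the difference between the two heuristics
(function names here are hypothetical; only the two expressions come
from the diff quoted above):

```c
/* Old heuristic: symmetric size difference, penalizes shrinking too. */
static unsigned long sizediff_old(unsigned long oldsize, unsigned long size)
{
	return oldsize > size ? oldsize - size : size - oldsize;
}

/* New heuristic: only count how much the target grew over the source. */
static unsigned long sizediff_new(unsigned long oldsize, unsigned long size)
{
	return oldsize < size ? size - oldsize : 0;
}

/* try_delta() gives up when sizediff >= max_size. */
static int worth_trying(unsigned long sizediff, unsigned long max_size)
{
	return sizediff < max_size;
}
```

For a 10000-byte source and a 1000-byte target, max_size is
1000 / 2 - 20 = 480.  The old expression yields 9000 and the pair is
skipped; the new one yields 0 and the delta is attempted.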

So, I "backported" that part of the change on top of "next" and
tried the same experiment.

(without "use delta index" but the size heuristics part ported to "next")
Total 17724, written 17724 (delta 12002), reused 0 (delta 0)
36.92user 6.55system 0:43.75elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+431860minor)pagefaults 0swaps

 6188418 pack-size-f1fac077a093ffdaf094aab2b7f11859ec0c18f1.pack

And now the resulting pack is the same as what you produce.

So comparing 31.61 seconds vs 33.26 seconds and complaining you
made it slower is not fair.  You fixed the size heuristic logic
in the current code to produce 5% smaller pack (which made
things slower to spend 36.92 seconds while doing so -- that's
15% slowdown), and then reusing delta-index brought that penalty
down to 5% or so.
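
The percentages can be rechecked from the numbers above with a small
helper (percent_change() is a hypothetical name used only for this
illustration):

```c
/* Relative change of value against base, in percent. */
static double percent_change(double base, double value)
{
	return 100.0 * (value - base) / base;
}

/* Figures taken from the three runs quoted in this message. */
static double pack_change(void)      { return percent_change(6520520.0, 6188418.0); }
static double heuristic_only(void)   { return percent_change(31.61, 36.92); }
static double with_delta_index(void) { return percent_change(31.61, 33.26); }
```

Measured against the 31.61-second baseline, the pack shrinks by about
5.1%, the size heuristic alone costs roughly 17% in user time (the same
ballpark as the 15% quoted), and reusing the delta index brings that
down to about 5.2%.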

-- >8 --

This patch applies on top of "next" to match the size heuristics
used in the "reuse delta index" patch.

 pack-objects.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/pack-objects.c b/pack-objects.c
index c0acc46..6604338 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -1032,12 +1032,6 @@ static int try_delta(struct unpacked *cu
 		max_depth -= cur_entry->delta_limit;
 	}

-	size = cur_entry->size;
-	oldsize = old_entry->size;
-	sizediff = oldsize > size ? oldsize - size : size - oldsize;
-
-	if (size < 50)
-		return -1;
 	if (old_entry->depth >= max_depth)
 		return 0;

@@ -1048,9 +1042,12 @@ static int try_delta(struct unpacked *cu
 	 * more space-efficient (deletes don't have to say _what_ they
 	 * delete).
 	 */
+	size = cur_entry->size;
 	max_size = size / 2 - 20;
 	if (cur_entry->delta)
 		max_size = cur_entry->delta_size-1;
+	oldsize = old_entry->size;
+	sizediff = oldsize < size ? size - oldsize : 0;
 	if (sizediff >= max_size)
 		return 0;
 	delta_buf = diff_delta(old->data, oldsize,
@@ -1109,6 +1106,9 @@ static void find_deltas(struct object_en
 			 */
 			continue;

+		if (entry->size < 50)
+			continue;
+
 		free(n->data);
 		n->entry = entry;
 		n->data = read_sha1_file(entry->sha1, type, &size);

^ permalink raw reply related

* Re: [PATCH] send-email: Change from Mail::Sendmail to Net::SMTP
From: Martin Langhoff @ 2006-04-28  1:04 UTC (permalink / raw)
  To: Eric Wong; +Cc: Junio C Hamano, git, Ryan Anderson
In-Reply-To: <20060428002744.GB9146@hand.yhbt.net>

On 4/28/06, Eric Wong <normalperson@yhbt.net> wrote:
> You should be able to just open a pipe to:
>         /usr/sbin/sendmail @recipients
> and just write headers\nbody to that pipe.

Sounds reasonable. I just looked at what Mail::Sendmail does and it
isn't especially interesting. (There used to be a different Perl module
that did smart things, depending on what MTA it found, but I can't
find it now).

> Perhaps allow and detect --smtp-server=/path/to/sendmail ?

Oh, it should just work with sendmail if it's there and we don't
provide --smtp-server ;-)



m

^ permalink raw reply

* Re: [PATCH] send-email: Change from Mail::Sendmail to Net::SMTP
From: Eric Wong @ 2006-04-28  0:27 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Junio C Hamano, git, Ryan Anderson
In-Reply-To: <46a038f90604261324w76f272edp93941d7e8645be8@mail.gmail.com>

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> On 4/27/06, Junio C Hamano <junkio@cox.net> wrote:
> > > system that we don't need an smtp daemon. Net::SMTP doesn't know how
> > > to use /usr/bin/sendmail
> 
> 
> > Wouldn't --smtp-server=that.smtp.server work for you?  Ah, that
> > would not work if your use is to send a local mail.  Hmph...
> 
> Well, the machine knows what the smtp server is (I mean, files in /etc
> have the right values in them), but I don't think often about it. Only
> when I am installing OSs or MTAs...
> 
> I know... I'm a whiner! ;-) I'll probably do something that does an
> eval and tries Mail::Sendmail and post it.

You should be able to just open a pipe to:
	/usr/sbin/sendmail @recipients
and just write headers\nbody to that pipe.

Perhaps allow and detect --smtp-server=/path/to/sendmail ?
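
The pipe approach can be sketched in C with popen(), assuming a POSIX
system.  pipe_message() is a hypothetical helper, not code from
git-send-email (which is Perl); the command string is supplied by the
caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Write headers, a blank separator line, then the body to the stdin of
   command.  headers is expected to end with a newline.
   Returns 0 if the write succeeded and the command exited with status 0,
   -1 otherwise. */
static int pipe_message(const char *command, const char *headers,
			const char *body)
{
	FILE *p = popen(command, "w");
	int ok, status;

	if (!p)
		return -1;

	ok = fprintf(p, "%s\n%s", headers, body) >= 0;
	status = pclose(p);

	return (ok && status == 0) ? 0 : -1;
}
```

A caller would pass something like "/usr/sbin/sendmail" followed by the
recipient addresses as the command.  Since popen() runs the string
through the shell, the addresses would need quoting; a fork/exec
variant avoids that.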

-- 
Eric Wong

^ permalink raw reply

* Re: [PATCH] C version of git-count-objects
From: Junio C Hamano @ 2006-04-28  0:25 UTC (permalink / raw)
  To: Peter Hagervall; +Cc: git
In-Reply-To: <20060428001049.GA28347@brainysmurf.cs.umu.se>

Peter Hagervall <hager@cs.umu.se> writes:

> On Thu, Apr 27, 2006 at 03:07:37PM -0700, Junio C Hamano wrote:
>
> ...
>
>> +int cmd_count_objects(int ac, const char **av, char *ep)
>                                                        ^
> ...
>
>> +extern int cmd_count_objects(int argc, const char **argv, char **envp);
>                                                                   ^^
> Looks like we have a type mismatch here, no?

Interesting.  Lack of #include <builtin.h> was causing the
compiler not to notice X-<.
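
A minimal sketch of why the include matters (the extern line stands in
for what builtin.h provides; the function body is a placeholder):

```c
#include <stddef.h>

/* What the header declares. */
extern int cmd_count_objects(int argc, const char **argv, char **envp);

/* With the prototype above in scope, writing `char *ep` here instead of
   `char **ep` is rejected by the compiler as conflicting types.  Without
   the include, the definition is the only declaration the compiler sees
   in this file, so the mismatch goes unnoticed. */
int cmd_count_objects(int ac, const char **av, char **ep)
{
	(void)av;
	(void)ep;
	return ac;	/* placeholder body for illustration */
}
```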

^ permalink raw reply

* Re: [PATCH] C version of git-count-objects
From: Peter Hagervall @ 2006-04-28  0:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vaca6k6za.fsf@assigned-by-dhcp.cox.net>

On Thu, Apr 27, 2006 at 03:07:37PM -0700, Junio C Hamano wrote:

...

> +int cmd_count_objects(int ac, const char **av, char *ep)
                                                       ^
...

> +extern int cmd_count_objects(int argc, const char **argv, char **envp);
                                                                  ^^
Looks like we have a type mismatch here, no?

	Peter

^ permalink raw reply

* Re: new gitk feature
From: Paul Mackerras @ 2006-04-27 23:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604260802050.3701@g5.osdl.org>

Linus Torvalds writes:

> Any possibility of something like that? I'd _love_ to be able to see the 
> whole tree, but with things that touch certain files or things that are 
> newer highlighted.

That should be quite doable.  How about I show the commits that are in
the highlight view in bold?  That won't conflict with the existing
yellow background for commits that match the find criteria.

> (Btw, the "revision information" is also cool things like "--unpacked". I 
> actually use "gitk --unpacked" every once in a while, just because it's 
> such a cool way to say "show me everything I've added since I packed the 
> repo last").

OK, I didn't know about --unpacked. :)  I plan to add stuff to the
view definition window to allow you to select commits to
include/exclude by reachability from given commits (by head/tag/ID)
and when I do I can add a way to say --unpacked too.

Paul.

^ permalink raw reply
