Git development


* svn to git, N-squared?
From: Jon Smirl @ 2006-06-12  2:02 UTC
  To: git

I have Mozilla CVS in a SVN repository. I've been using git-svnimport
to import it. This time I am letting it run to completion; but the
import has been running for four days now and it is only up to 2004.
The import task is stable at 570MB and it is using about 50% of my
CPU. It is constantly spawning off git write-tree, read-tree,
hash-object, update-index. It is not doing excessive disk activity.

The import seems to be getting n-squared slower. It is still making
forward progress but the progress seems to be getting slower and
slower.

It looks like it is doing write-tree, read-tree, hash-object,
update-index once or more per change set. If these commands are
n-proportional and they are getting run n times, then this is an
n-squared process. Projecting this out, the import may take 10 days or
more to completely finish.
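
A rough sketch of that arithmetic, in Python. The cost model is an
assumption made for illustration, not something measured from
git-svnimport: it supposes that importing changeset k costs an amount
of work proportional to k.

```python
def total_cost(n_changesets, unit_cost=1):
    """Total work to import n changesets under the assumed model.

    Assumption: handling changeset k costs k * unit_cost, i.e. each
    write-tree/read-tree/update-index round gets linearly more
    expensive as the history grows.
    """
    total = 0
    for k in range(1, n_changesets + 1):
        total += k * unit_cost
    return total


def closed_form(n_changesets, unit_cost=1):
    """Same quantity via the formula n * (n + 1) / 2."""
    return unit_cost * n_changesets * (n_changesets + 1) // 2


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        # Doubling the number of changesets roughly quadruples the work.
        print(n, total_cost(n), closed_form(n))
```

Under this model the last stretch of history is by far the most
expensive part, which would match an import that keeps slowing down.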

-- 
Jon Smirl
jonsmirl@gmail.com


* [PATCH] The hash name is SHA-1, use that throughout
From: Horst H. von Brand @ 2006-06-12  2:04 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <11500778593947-git-send-email-vonbrand@inf.utfsm.cl>

Signed-off-by: Horst H. von Brand <vonbrand@inf.utfsm.cl>
---
 Documentation/technical/pack-format.txt     |    8 ++++----
 Documentation/technical/pack-heuristics.txt |    4 ++--
 Documentation/technical/pack-protocol.txt   |   20 ++++++++++----------
 3 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 0e1ffb2..8823dce 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -32,7 +32,7 @@ GIT pack format
      Observation: length of each object is encoded in a variable
      length format and is not constrained to 32-bit or anything.
 
-  - The trailer records 20-byte SHA1 checksum of all of the above.
+  - The trailer records 20-byte SHA-1 checksum of all of the above.
 
 = pack-*.idx file has the following format:
 
@@ -61,10 +61,10 @@ GIT pack format
 
   - The file is concluded with a trailer:
 
-    A copy of the 20-byte SHA1 checksum at the end of
+    A copy of the 20-byte SHA-1 checksum at the end of
     corresponding packfile.
 
-    20-byte SHA1-checksum of all of the above.
+    20-byte SHA-1-checksum of all of the above.
 
 Pack Idx file:
 
@@ -111,6 +111,6 @@ Pack file entry: <+
         If it is not DELTA, then deflated bytes (the size above
 		is the size before compression).
 	If it is DELTA, then
-	  20-byte base object name SHA1 (the size above is the
+	  20-byte base object name SHA-1 (the size above is the
 	  	size of the delta data that follows).
           delta data, deflated.
diff --git a/Documentation/technical/pack-heuristics.txt b/Documentation/technical/pack-heuristics.txt
index 9aadd5c..458677e 100644
--- a/Documentation/technical/pack-heuristics.txt
+++ b/Documentation/technical/pack-heuristics.txt
@@ -77,7 +77,7 @@ And Bable-like confusion flowed.
 
     <njs`> oh, hmm, and I'm not sure what this sliding window means either
 
-    <pasky> iirc, it appeared to me to be just the sha1 of the object
+    <pasky> iirc, it appeared to me to be just the SHA-1 of the object
         when reading the code casually ...
 
         ... which simply doesn't sound as a very good heuristics, though ;)
@@ -89,7 +89,7 @@ Ah, grasshopper!  And thus the enlighten
 
     <linus> The "magic" is actually in theory totally arbitrary.
         ANY order will give you a working pack, but no, it's not
-        ordered by SHA1.
+        ordered by SHA-1.
 
         Before talking about the ordering for the sliding delta
         window, let's talk about the recency order. That's more
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 9cd48b4..9df76e3 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -6,22 +6,22 @@ There are two Pack push-pull protocols.
 upload-pack (S) | fetch/clone-pack (C) protocol:
 
 	# Tell the puller what commits we have and what their names are
-	S: SHA1 name
+	S: SHA-1 name
 	S: ...
-	S: SHA1 name
+	S: SHA-1 name
 	S: # flush -- it's your turn
 	# Tell the pusher what commits we want, and what we have
 	C: want name
 	C: ..
 	C: want name
-	C: have SHA1
-	C: have SHA1
+	C: have SHA-1
+	C: have SHA-1
 	C: ...
 	C: # flush -- occasionally ask "had enough?"
 	S: NAK
-	C: have SHA1
+	C: have SHA-1
 	C: ...
-	C: have SHA1
+	C: have SHA-1
 	S: ACK
 	C: done
 	S: XXXXXXX -- packfile contents.
@@ -29,13 +29,13 @@ upload-pack (S) | fetch/clone-pack (C) p
 send-pack | receive-pack protocol.
 
 	# Tell the pusher what commits we have and what their names are
-	C: SHA1 name
+	C: SHA-1 name
 	C: ...
-	C: SHA1 name
+	C: SHA-1 name
 	C: # flush -- it's your turn
 	# Tell the puller what the pusher has
-	S: old-SHA1 new-SHA1 name
-	S: old-SHA1 new-SHA1 name
+	S: old-SHA-1 new-SHA-1 name
+	S: old-SHA-1 new-SHA-1 name
 	S: ...
 	S: # flush -- done with the list
 	S: XXXXXXX --- packfile contents.
-- 
1.4.0.g1b2d


* [PATCH] The name of the hash is SHA-1, use that throughout
From: Horst H. von Brand @ 2006-06-12  2:04 UTC
  To: Junio C Hamano; +Cc: git

Signed-off-by: Horst H. von Brand <vonbrand@inf.utfsm.cl>
---
 Documentation/config.txt           |    2 +-
 Documentation/core-tutorial.txt    |    8 ++++----
 Documentation/diff-format.txt      |    8 ++++----
 Documentation/diffcore.txt         |    2 +-
 Documentation/git-branch.txt       |    2 +-
 Documentation/git-cat-file.txt     |    2 +-
 Documentation/git-checkout.txt     |    2 +-
 Documentation/git-cherry.txt       |    2 +-
 Documentation/git-diff-index.txt   |    8 ++++----
 Documentation/git-fsck-objects.txt |    8 ++++----
 Documentation/git-init-db.txt      |    2 +-
 Documentation/git-ls-files.txt     |    2 +-
 Documentation/git-merge-index.txt  |    2 +-
 Documentation/git-mktag.txt        |    2 +-
 Documentation/git-pack-objects.txt |    2 +-
 Documentation/git-push.txt         |    2 +-
 Documentation/git-rev-parse.txt    |    8 ++++----
 Documentation/git-show-branch.txt  |    4 ++--
 Documentation/git-unpack-file.txt  |    2 +-
 Documentation/git-update-index.txt |    2 +-
 Documentation/git-verify-pack.txt  |    4 ++--
 Documentation/git-verify-tag.txt   |    2 +-
 Documentation/git.txt              |    2 +-
 Documentation/glossary.txt         |    6 +++---
 Documentation/tutorial-2.txt       |   14 +++++++-------
 25 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index bb93dc5..4195713 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -72,7 +72,7 @@ core.preferSymlinkRefs::
 
 core.logAllRefUpdates::
 	If true, `git-update-ref` will append a line to
-	"$GIT_DIR/logs/<ref>" listing the new SHA1 and the date/time
+	"$GIT_DIR/logs/<ref>" listing the new SHA-1 and the date/time
 	of the update.	If the file does not exist it will be
 	created automatically.	This information can be used to
 	determine what commit was the tip of a branch "2 days ago".
diff --git a/Documentation/core-tutorial.txt b/Documentation/core-tutorial.txt
index b59153e..59f86ee 100644
--- a/Documentation/core-tutorial.txt
+++ b/Documentation/core-tutorial.txt
@@ -100,9 +100,9 @@ branch. A number of the git tools will a
 valid, though.
 
 [NOTE]
-An 'object' is identified by its 160-bit SHA1 hash, aka 'object name',
+An 'object' is identified by its 160-bit SHA-1 hash, aka 'object name',
 and a reference to an object is always the 40-byte hex
-representation of that SHA1 name. The files in the `refs`
+representation of that SHA-1 name. The files in the `refs`
 subdirectory are expected to contain these hex references
 (usually with a final `\'\n\'` at the end), and you should thus
 expect to see a number of 41-byte files containing these
@@ -772,7 +772,7 @@ already discussed, the `HEAD` branch is 
 these object pointers. 
 
 You can at any time create a new branch by just picking an arbitrary
-point in the project history, and just writing the SHA1 name of that
+point in the project history, and just writing the SHA-1 name of that
 object into a file under `.git/refs/heads/`. You can use any filename you
 want (and indeed, subdirectories), but the convention is that the
 "normal" branch is called `master`. That's just a convention, though,
@@ -1260,7 +1260,7 @@ file (the first tree goes to stage 1, th
 etc.).  After reading three trees into three stages, the paths
 that are the same in all three stages are 'collapsed' into stage
 0.  Also paths that are the same in two of three stages are
-collapsed into stage 0, taking the SHA1 from either stage 2 or
+collapsed into stage 0, taking the SHA-1 from either stage 2 or
 stage 3, whichever is different from stage 1 (i.e. only one side
 changed from the common ancestor).
 
diff --git a/Documentation/diff-format.txt b/Documentation/diff-format.txt
index 89607c8..859e527 100644
--- a/Documentation/diff-format.txt
+++ b/Documentation/diff-format.txt
@@ -35,9 +35,9 @@ That is, from the left to the right:
 . a space.
 . mode for "dst"; 000000 if deletion or unmerged.
 . a space.
-. sha1 for "src"; 0\{40\} if creation or unmerged.
+. SHA-1 for "src"; 0\{40\} if creation or unmerged.
 . a space.
-. sha1 for "dst"; 0\{40\} if creation, unmerged or "look at work tree".
+. SHA-1 for "dst"; 0\{40\} if creation, unmerged or "look at work tree".
 . a space.
 . status, followed by optional "score" number.
 . a tab or a NUL when `-z` option is used.
@@ -46,7 +46,7 @@ That is, from the left to the right:
 . path for "dst"; only exists for C or R.
 . an LF or a NUL when `-z` option is used, to terminate the record.
 
-<sha1> is shown as all 0's if a file is new on the filesystem
+<SHA-1> is shown as all 0's if a file is new on the filesystem
 and it is out of sync with the index.
 
 Example:
@@ -97,7 +97,7 @@ where:
 
      <old|new>-file:: are files GIT_EXTERNAL_DIFF can use to read the
 		      contents of <old|new>,
-     <old|new>-hex:: are the 40-hexdigit SHA1 hashes,
+     <old|new>-hex:: are the 40-hexdigit SHA-1 hashes,
      <old|new>-mode:: are the octal representation of the file modes.
 
 + 
diff --git a/Documentation/diffcore.txt b/Documentation/diffcore.txt
index 5492669..2d45ea0 100644
--- a/Documentation/diffcore.txt
+++ b/Documentation/diffcore.txt
@@ -115,7 +115,7 @@ it changes it to:
 For the purpose of breaking a filepair, `diffcore-break` examines
 the extent of changes between the contents of the files before
 and after modification (i.e. the contents that have "bcd1234..."
-and "0123456..." as their SHA1 content ID, in the above
+and "0123456..." as their SHA-1 content ID, in the above
 example).  The amount of deletion of original contents and
 insertion of new material are added together, and if it exceeds
 the "break score", the filepair is broken into two.  The break
diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index 29b102d..36100c0 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -38,7 +38,7 @@ OPTIONS
 -l::
 	Create the branch's ref log.  This activates recording of
 	all changes to made the branch ref, enabling use of date
-	based sha1 expressions such as "<branchname>@{yesterday}".
+	based SHA-1 expressions such as "<branchname>@{yesterday}".
 
 -f::
 	Force the creation of a new branch even if it means deleting
diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 85fb9ae..760d4b3 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -19,7 +19,7 @@ or `-s` is used to find the object size.
 OPTIONS
 -------
 <object>::
-	The sha1 identifier of the object.
+	The SHA-1 identifier of the object.
 
 -t::
 	Instead of the content, show the object type identified by
diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index 90fc318..9b6c719 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -43,7 +43,7 @@ OPTIONS
 -l::
 	Create the new branch's ref log.  This activates recording of
 	all changes to made the branch ref, enabling use of date
-	based sha1 expressions such as "<branchname>@{yesterday}".
+	based SHA-1 expressions such as "<branchname>@{yesterday}".
 
 -m::
 	If you have local modifications to one or more files that
diff --git a/Documentation/git-cherry.txt b/Documentation/git-cherry.txt
index cccb781..35f11cd 100644
--- a/Documentation/git-cherry.txt
+++ b/Documentation/git-cherry.txt
@@ -15,7 +15,7 @@ The changeset (or "diff") of each commit
 is compared against each commit between the fork-point and <upstream>.
 
 Every commit with a changeset that doesn't exist in the other branch
-has its id (sha1) reported, prefixed by a symbol.  Those existing only
+has its id (SHA-1) reported, prefixed by a symbol.  Those existing only
 in the <upstream> branch are prefixed with a minus (-) sign, and those
 that only exist in the <head> branch are prefixed with a plus (+) symbol.
 
diff --git a/Documentation/git-diff-index.txt b/Documentation/git-diff-index.txt
index 39d3b99..cdace11 100644
--- a/Documentation/git-diff-index.txt
+++ b/Documentation/git-diff-index.txt
@@ -93,7 +93,7 @@ you *could* commit. Again, the output ma
 output to a tee, but with a twist.
 
 The twist is that if some file doesn't match the index, we don't have
-a backing store thing for it, and we use the magic "all-zero" sha1 to
+a backing store thing for it, and we use the magic "all-zero" SHA-1 to
 show that. So let's say that you have edited `kernel/sched.c`, but
 have not actually done a `git update-index` on it yet - there is no
 "object" associated with the new state, and you get:
@@ -102,7 +102,7 @@ have not actually done a `git update-ind
   *100644->100664 blob    7476bb......->000000......      kernel/sched.c
 
 i.e., it shows that the tree has changed, and that `kernel/sched.c` has is
-not up-to-date and may contain new stuff. The all-zero sha1 means that to
+not up-to-date and may contain new stuff. The all-zero SHA-1 means that to
 get the real diff, you need to look at the object in the working directory
 directly rather than do an object-to-object diff.
 
@@ -115,8 +115,8 @@ touched it. In either case, it's a note 
 NOTE: You can have a mixture of files show up as "has been updated"
 and "is still dirty in the working directory" together. You can always
 tell which file is in which state, since the "has been updated" ones
-show a valid sha1, and the "not in sync with the index" ones will
-always have the special all-zero sha1.
+show a valid SHA-1, and the "not in sync with the index" ones will
+always have the special all-zero SHA-1.
 
 
 Author
diff --git a/Documentation/git-fsck-objects.txt b/Documentation/git-fsck-objects.txt
index e842bfd..1413f77 100644
--- a/Documentation/git-fsck-objects.txt
+++ b/Documentation/git-fsck-objects.txt
@@ -22,7 +22,7 @@ OPTIONS
 	An object to treat as the head of an unreachability trace.
 +
 If no objects are given, git-fsck-objects defaults to using the
-index file and all SHA1 references in .git/refs/* as heads.
+index file and all SHA-1 references in .git/refs/* as heads.
 
 --unreachable::
 	Print out objects that exist but that aren't readable from any
@@ -55,7 +55,7 @@ index file and all SHA1 references in .g
 	objects that triggers this check, but it is recommended
 	to check new projects with this flag.
 
-It tests SHA1 and general object sanity, and it does full tracking of
+It tests SHA-1 and general object sanity, and it does full tracking of
 the resulting reachability and everything else. It prints out any
 corruption it finds (missing or bad objects), and if you use the
 `--unreachable` flag it will also print out objects that exist but
@@ -87,7 +87,7 @@ expect dangling commits - potential head
 	root nodes.
 
 missing sha1 directory '<dir>'::
-	The directory holding the sha1 objects is missing.
+	The directory holding the SHA-1 objects is missing.
 
 unreachable <type> <object>::
 	The <type> object <object>, isn't actually referred to directly
@@ -109,7 +109,7 @@ warning: git-fsck-objects: tree <tree> h
 	And it shouldn't...
 
 sha1 mismatch <object>::
-	The database has an object who's sha1 doesn't match the
+	The database has an object who's SHA-1 doesn't match the
 	database value.
 	This indicates a serious data integrity problem.
 
diff --git a/Documentation/git-init-db.txt b/Documentation/git-init-db.txt
index 6e32f88..7f83f7e 100644
--- a/Documentation/git-init-db.txt
+++ b/Documentation/git-init-db.txt
@@ -39,7 +39,7 @@ If the `$GIT_DIR` environment variable i
 to use instead of `./.git` for the base of the repository.
 
 If the object storage directory is specified via the `$GIT_OBJECT_DIRECTORY`
-environment variable then the sha1 directories are created underneath -
+environment variable then the SHA-1 directories are created underneath -
 otherwise the default `$GIT_DIR/objects` directory is used.
 
 A shared repository allows users belonging to the same group to push into that
diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 7567323..4bb0f5f 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -123,7 +123,7 @@ which case it outputs:
 "git-ls-files --unmerged" and "git-ls-files --stage" can be used to examine
 detailed information on unmerged paths.
 
-For an unmerged path, instead of recording a single mode/SHA1 pair,
+For an unmerged path, instead of recording a single mode/SHA-1 pair,
 the dircache records up to three such pairs; one from tree O in stage
 1, A in stage 2, and B in stage 3.  This information can be used by
 the user (or the porcelain) to see what should eventually be recorded at the
diff --git a/Documentation/git-merge-index.txt b/Documentation/git-merge-index.txt
index af79688..7348682 100644
--- a/Documentation/git-merge-index.txt
+++ b/Documentation/git-merge-index.txt
@@ -13,7 +13,7 @@ SYNOPSIS
 DESCRIPTION
 -----------
 This looks up the <file>(s) in the index and, if there are any merge
-entries, passes the SHA1 hash for those files as arguments 1, 2, 3 (empty
+entries, passes the SHA-1 hash for those files as arguments 1, 2, 3 (empty
 argument if no file), and <file> as argument 4.  File modes for the three
 files are passed as arguments 5, 6 and 7.
 
diff --git a/Documentation/git-mktag.txt b/Documentation/git-mktag.txt
index ca0f48a..26b2a6e 100644
--- a/Documentation/git-mktag.txt
+++ b/Documentation/git-mktag.txt
@@ -21,7 +21,7 @@ Tag Format
 ----------
 A tag signature file has a very simple fixed format: three lines of
 
-  object <sha1>
+  object <SHA-1>
   type <typename>
   tag <tagname>
 
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index d968afe..bbe2afa 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -46,7 +46,7 @@ base-name::
 	Write into a pair of files (.pack and .idx), using
 	<base-name> to determine the name of the created file.
 	When this option is used, the two files are written in
-	<base-name>-<SHA1>.{pack,idx} files.  <SHA1> is a hash
+	<base-name>-<SHA-1>.{pack,idx} files.  <SHA-1> is a hash
 	of object names (currently in random order so it does
 	not have any useful meaning) to make the resulting
 	filename reasonably unique, and written to the standard
diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt
index 52be715..b299045 100644
--- a/Documentation/git-push.txt
+++ b/Documentation/git-push.txt
@@ -34,7 +34,7 @@ OPTIONS
 	the destination ref.
 +
 The <src> side can be an
-arbitrary "SHA1 expression" that can be used as an
+arbitrary "SHA-1 expression" that can be used as an
 argument to `git-cat-file -t`.  E.g. `master~4` (push
 four parents before the current master head).
 +
diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt
index 5a4e6b5..6235834 100644
--- a/Documentation/git-rev-parse.txt
+++ b/Documentation/git-rev-parse.txt
@@ -59,7 +59,7 @@ OPTIONS
 	one.
 
 --symbolic::
-	Usually the object names are output in SHA1 form (with
+	Usually the object names are output in SHA-1 form (with
 	possible '{caret}' prefix); this option makes them output in a
 	form as close to the original input as possible.
 
@@ -90,7 +90,7 @@ OPTIONS
 	Show `$GIT_DIR` if defined else show the path to the .git directory.
 
 --short, --short=number::
-	Instead of outputting the full SHA1 values of object names try to
+	Instead of outputting the full SHA-1 values of object names try to
 	abbreviate them to a shorter unique name. When no length is specified
 	7 is used. The minimum length is 4.
 
@@ -110,10 +110,10 @@ SPECIFYING REVISIONS
 --------------------
 
 A revision parameter typically, but not necessarily, names a
-commit object.  They use what is called an 'extended SHA1'
+commit object.  They use what is called an 'extended SHA-1'
 syntax.
 
-* The full SHA1 object name (40-byte hexadecimal string), or
+* The full SHA-1 object name (40-byte hexadecimal string), or
   a substring of such that is unique within the repository.
   E.g. dae86e1950b1277e545cee180551750029cfe735 and dae86e both
   name the same commit object if there are no other object in
diff --git a/Documentation/git-show-branch.txt b/Documentation/git-show-branch.txt
index 424b97b..7afdea3 100644
--- a/Documentation/git-show-branch.txt
+++ b/Documentation/git-show-branch.txt
@@ -28,7 +28,7 @@ no <rev> nor <glob> is given on the comm
 OPTIONS
 -------
 <rev>::
-	Arbitrary extended SHA1 expression (see `git-rev-parse`)
+	Arbitrary extended SHA-1 expression (see `git-rev-parse`)
 	that typically names a branch HEAD or a tag.
 
 <glob>::
@@ -97,7 +97,7 @@ displayed, indented N places.  If a comm
 branch, the I-th indentation character shows a `+` sign;
 otherwise it shows a space.  Merge commits are denoted by
 a `-` sign.  Each commit shows a short name that
-can be used as an extended SHA1 to name that commit.
+can be used as an extended SHA-1 to name that commit.
 
 The following example shows three branches, "master", "fixes"
 and "mhf":
diff --git a/Documentation/git-unpack-file.txt b/Documentation/git-unpack-file.txt
index 259df2c..c9da258 100644
--- a/Documentation/git-unpack-file.txt
+++ b/Documentation/git-unpack-file.txt
@@ -13,7 +13,7 @@ SYNOPSIS
 
 DESCRIPTION
 -----------
-Creates a file holding the contents of the blob specified by sha1. It
+Creates a file holding the contents of the blob specified by SHA-1. It
 returns the name of the temporary file in the following format:
 	.merge_file_XXXXX
 
diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index 56a2b15..10eabd7 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -167,7 +167,7 @@ Using --index-info
 multiple entry definitions from the standard input, and designed
 specifically for scripts.  It can take inputs of three formats:
 
-    . mode         SP sha1          TAB path
+    . mode         SP SHA-1          TAB path
 +
 The first format is what "git-apply --index-info"
 reports, and used to reconstruct a partial tree
diff --git a/Documentation/git-verify-pack.txt b/Documentation/git-verify-pack.txt
index d10454c..e0e503f 100644
--- a/Documentation/git-verify-pack.txt
+++ b/Documentation/git-verify-pack.txt
@@ -32,11 +32,11 @@ OUTPUT FORMAT
 -------------
 When specifying the -v option the format used is:
 
-	SHA1 type size offset-in-packfile
+	SHA-1 type size offset-in-packfile
 
 for objects that are not deltified in the pack, and
 
-	SHA1 type size offset-in-packfile depth base-SHA1
+	SHA-1 type size offset-in-packfile depth base-SHA-1
 
 for objects that are deltified.
 
diff --git a/Documentation/git-verify-tag.txt b/Documentation/git-verify-tag.txt
index 1a150f6..7c9835c 100644
--- a/Documentation/git-verify-tag.txt
+++ b/Documentation/git-verify-tag.txt
@@ -16,7 +16,7 @@ Validates the gpg signature created by g
 OPTIONS
 -------
 <tag>::
-	SHA1 identifier of a git tag object.
+	SHA-1 identifier of a git tag object.
 
 Author
 ------
diff --git a/Documentation/git.txt b/Documentation/git.txt
index d4472b5..2d454f8 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -585,7 +585,7 @@ git so take care if using Cogito etc.
 
 'GIT_OBJECT_DIRECTORY'::
 	If the object storage directory is specified via this
-	environment variable then the sha1 directories are created
+	environment variable then the SHA-1 directories are created
 	underneath - otherwise the default `$GIT_DIR/objects`
 	directory is used.
 
diff --git a/Documentation/glossary.txt b/Documentation/glossary.txt
index 116ddb7..3a0215c 100644
--- a/Documentation/glossary.txt
+++ b/Documentation/glossary.txt
@@ -163,7 +163,7 @@ merge::
 
 object::
 	The unit of storage in git. It is uniquely identified by
-	the SHA1 of its contents. Consequently, an object can not
+	the SHA-1 of its contents. Consequently, an object can not
 	be changed.
 
 object database::
@@ -243,7 +243,7 @@ rebase::
 	changes from that branch.
 
 ref::
-	A 40-byte hex representation of a SHA1 or a name that denotes
+	A 40-byte hex representation of a SHA-1 or a name that denotes
 	a particular object. These may be stored in `$GIT_DIR/refs/`.
 
 refspec::
@@ -279,7 +279,7 @@ rewind::
 SCM::
 	Source code management (tool).
 
-SHA1::
+SHA-1::
 	Synonym for object name.
 
 topic branch::
diff --git a/Documentation/tutorial-2.txt b/Documentation/tutorial-2.txt
index 894ca5e..0dc91e7 100644
--- a/Documentation/tutorial-2.txt
+++ b/Documentation/tutorial-2.txt
@@ -32,9 +32,9 @@ with?
 
 We saw in part one of the tutorial that commits have names like this.
 It turns out that every object in the git history is stored under
-such a 40-digit hex name.  That name is the SHA1 hash of the object's
+such a 40-digit hex name.  That name is the SHA-1 hash of the object's
 contents; among other things, this ensures that git will never store
-the same data twice (since identical data is given an identical SHA1
+the same data twice (since identical data is given an identical SHA-1
 name), and that the contents of a git object will never change (since
 that would change the object's name as well).
 
@@ -51,14 +51,14 @@ A tree can refer to one or more "blob" o
 a file.  In addition, a tree can also refer to other tree objects,
 thus creating a directory hierarchy.  You can examine the contents of
 any tree using ls-tree (remember that a long enough initial portion
-of the SHA1 will also work):
+of the SHA-1 will also work):
 
 ------------------------------------------------
 $ git ls-tree 92b8b694
 100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad    file.txt
 ------------------------------------------------
 
-Thus we see that this tree has one file in it.  The SHA1 hash is a
+Thus we see that this tree has one file in it.  The SHA-1 hash is a
 reference to that file's data:
 
 ------------------------------------------------
@@ -77,7 +77,7 @@ Note that this is the old file data; so 
 its response to the initial tree was a tree with a snapshot of the
 directory state that was recorded by the first commit.
 
-All of these objects are stored under their SHA1 names inside the git
+All of these objects are stored under their SHA-1 names inside the git
 directory:
 
 ------------------------------------------------
@@ -114,7 +114,7 @@ ref: refs/heads/master
 
 As you can see, this tells us which branch we're currently on, and it
 tells us this by naming a file under the .git directory, which itself
-contains a SHA1 name referring to a commit object, which we can
+contains a SHA-1 name referring to a commit object, which we can
 examine with cat-file:
 
 ------------------------------------------------
@@ -180,7 +180,7 @@ project's history:
 
 Note, by the way, that lots of commands take a tree as an argument.
 But as we can see above, a tree can be referred to in many different
-ways--by the SHA1 name for that tree, by the name of a commit that
+ways--by the SHA-1 name for that tree, by the name of a commit that
 refers to the tree, by the name of a branch whose head refers to that
 tree, etc.--and most such commands can accept any of these names.
 
-- 
1.4.0.g1b2d


* Re: svn to git, N-squared?
From: Linus Torvalds @ 2006-06-12  3:31 UTC
  To: Jon Smirl; +Cc: git
In-Reply-To: <9e4733910606111902l709c71ccyf45070d55112739e@mail.gmail.com>

On Sun, 11 Jun 2006, Jon Smirl wrote:
>
> I have Mozilla CVS in a SVN repository. I've been using git-svnimport
> to import it. This time I am letting it run to completion; but the
> import has been running for four days now and it is only up to 2004.
> The import task is stable at 570MB and it is using about 50% of my
> CPU. It is constantly spawning off git write-tree, read-tree,
> hash-object, update-index. It is not doing excessive disk activity.

This sounds like _exactly_ what happens if you don't repack occasionally. 
Especially if you are using a filesystem without hashed filename lookup, 
but it's true to some degree even with that - the filesystem tends to end 
up spending tons of time in kernel space, trying to find a place to put 
new objects.

I don't think git-svnimport has the repack logic in it, so that would be 
it.

You can probably stop it with ^Z, do a "git repack -a -d", and then let it 
continue.

(The only reason for stopping it is actually to let "git repack" remove 
most of the object directories - many filesystems, including ext3, don't 
even speed up all that much if the directories are emptied after they've 
grown big, and it's much better if the object directories get totally 
removed and re-created)
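
A minimal sketch of how an import script might apply this advice,
assuming it is driven from Python. The helper names and the 10000
threshold are hypothetical; the only git invocation used is the
"git repack -a -d" command mentioned above, and this is not how
git-svnimport itself works.

```python
import os
import re
import subprocess

# Loose objects live in objects/00 .. objects/ff (two hex digits).
_FANOUT_NAME = re.compile(r"^[0-9a-f]{2}$")


def count_loose_objects(git_dir):
    """Count files in the two-hex-digit fan-out directories of git_dir."""
    objects_dir = os.path.join(git_dir, "objects")
    if not os.path.isdir(objects_dir):
        return 0
    total = 0
    for name in os.listdir(objects_dir):
        path = os.path.join(objects_dir, name)
        # Skip objects/pack, objects/info and anything else that is
        # not a fan-out directory.
        if _FANOUT_NAME.match(name) and os.path.isdir(path):
            total += len(os.listdir(path))
    return total


def repack_if_needed(repo_dir, threshold=10000):
    """Run "git repack -a -d" once the loose object count passes threshold.

    The threshold is an arbitrary illustrative value. Returns True if
    a repack was run.
    """
    git_dir = os.path.join(repo_dir, ".git")
    if count_loose_objects(git_dir) < threshold:
        return False
    subprocess.run(["git", "repack", "-a", "-d"], cwd=repo_dir, check=True)
    return True
```

An importer could call repack_if_needed() between batches of
changesets, while nothing else is writing to the repository.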

			Linus


* Re: svn to git, N-squared?
From: Jon Smirl @ 2006-06-12  3:39 UTC
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0606112028010.5498@g5.osdl.org>

On 6/11/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Sun, 11 Jun 2006, Jon Smirl wrote:
> >
> > I have Mozilla CVS in a SVN repository. I've been using git-svnimport
> > to import it. This time I am letting it run to completion; but the
> > import has been running for four days now and it is only up to 2004.
> > The import task is stable at 570MB and it is using about 50% of my
> > CPU. It is constantly spawning off git write-tree, read-tree,
> > hash-object, update-index. It is not doing excessive disk activity.
>
> This sounds like _exactly_ what happens if you don't repack occasionally.
> Especially if you are using a filesystem without hashed filename lookup,
> but it's true to some degree even with that - the filesystem tends to end
> up spending tons of time in kernel space, trying to find a place to put
> new objects.
>
> I don't think git-svnimport has the repack logic in it, so that would be
> it.
>
> You can probably stop it with ^Z, do a "git repack -a -d", and then let it
> continue.

I have it stopped and I am running the repack.
There are 1.27M files in my .git directory.

I ordered 2GB more RAM which should be here Tuesday.

> (The only reason for stopping it is actually to let "git repack" remove
> most of the object directories - many filesystems, including ext3, don't
> even speed up all that much if the directories are emptied after they've
> grown big, and it's much better if the object directories get totally
> removed and re-created)
>
>                         Linus
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: cvs import error
From: carbonated beverage @ 2006-06-12  3:57 UTC
  To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90606111731q34fe431fn36d751b387ab69a9@mail.gmail.com>

On Mon, Jun 12, 2006 at 12:31:35PM +1200, Martin Langhoff wrote:
> After each attempt, the import leaves a .git file around. rm -fr .git
> before retrying... or just retry in a new directory every time ;-)
> 
> (... we should die with a more helpful message here...)

This error occurs on a fresh import attempt, unfortunately.

rm -rf'ing .git and doing an import again always fails at the exact same
spot with the above message.

-- DN
Daniel

^ permalink raw reply

* Re: cvs import error
From: Martin Langhoff @ 2006-06-12  4:01 UTC (permalink / raw)
  To: carbonated beverage; +Cc: git
In-Reply-To: <20060612035737.GA16580@prophet.net-ronin.org>

On 6/12/06, carbonated beverage <ramune@net-ronin.org> wrote:
> On Mon, Jun 12, 2006 at 12:31:35PM +1200, Martin Langhoff wrote:
> > After each attempt, the import leaves a .git file around. rm -fr .git
> > before retrying... or just retry in a new directory every time ;-)
> >
> > (... we should die with a more helpful message here...)
>
> This error occurs on a fresh import attempt, unfortunately.
>
> rm -rf'ing .git and doing an import again always fails at the exact same
> spot with the above message.

Unsure then. Try with the patches I've posted yesterday that ignore
bogus-looking branches.

cheers,



martin

^ permalink raw reply

* Re: svn to git, N-squared?
From: Linus Torvalds @ 2006-06-12  4:02 UTC (permalink / raw)
  To: Jon Smirl; +Cc: git
In-Reply-To: <9e4733910606112039p7aff60c7w7a074d0e35c7b0f@mail.gmail.com>

On Sun, 11 Jun 2006, Jon Smirl wrote:
> 
> I have it stopped and I am running the repack.
> There are 1.27M files in my .git directory

Yeah, that would do it. That's ~5000 files per object directory, so I 
assume that your directories are 200+kB in size, and for every new object 
added, you'll basically have to traverse the old directory fully in order 
to find an empty place for it (and without hashing, you'll traverse it 
_twice_ - first to look for it, then to look for the empty space).
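
As a rough check on those numbers, here is a small Python sketch. The
256 directories come from the two-hex-digit fan-out under .git/objects.
The entry size is an assumption (an 8-byte header plus the 38-character
object name padded to a multiple of 4 bytes, as on ext2/ext3), and so is
the idea that all 1.27M files are loose objects spread evenly. The
function name is made up for illustration.

```python
def estimate_object_dirs(total_files, fanout=256, name_len=38, entry_header=8):
    """Estimate files per object directory and the directory size in bytes."""
    files_per_dir = total_files // fanout
    # Assumed ext3-style entry: fixed header plus the name padded to 4 bytes.
    padded_name = (name_len + 3) // 4 * 4
    dir_bytes = files_per_dir * (entry_header + padded_name)
    return files_per_dir, dir_bytes


files_per_dir, dir_bytes = estimate_object_dirs(1_270_000)
print(files_per_dir, dir_bytes)
```

Under these assumptions each directory holds just under 5000 entries and
is a bit over 200kB, which is in line with the estimate above.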

Btw, after repacking, if it still has lots of loose objects, and you still 
have several directories that are huge (because there are pending objects 
for a commit that didn't happen yet when you ^Z'd the svnimport), you'll 
literally get better performance if you do something like

	for i in ??
	do
		cp -r $i $i.new
		rm -rf $i
		mv $i.new $i
	done

in your .git/objects/ directory (CAREFUL! Any script that does "rm -rf" 
should be double- and triple-checked for sanity! ;)

That should make sure that you don't still have huge directories.

(And yes, this is a real problem at least with ext3).

The git cvsimporter ends up repacking the archive every thousand commits. 
That's just a random number, but it's indicative of what we did there to 
handle large imports. I don't think anybody has done a large import using 
the git-svnimport before, so you're in new territory which explains some 
of the teething problems.
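
A minimal sketch of that "repack every N commits" idea, in Python rather
than the importer's Perl. This is not the code from git-cvsimport; the
function names and the import_one callback are hypothetical, and only
"git repack -a -d" is taken from this thread.

```python
import subprocess


def run_git_repack():
    """Pack all objects and delete the loose copies that became redundant."""
    subprocess.run(["git", "repack", "-a", "-d"], check=True)


def import_commits(commits, import_one, repack=run_git_repack, repack_every=1000):
    """Import commits in order, repacking after every `repack_every` commits."""
    imported = 0
    for commit in commits:
        import_one(commit)  # hypothetical callback that writes one commit
        imported += 1
        if imported % repack_every == 0:
            repack()
    return imported
```

Passing the repack step in as a parameter keeps the loop easy to try out
without touching a real repository.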

		Linus

^ permalink raw reply

* Re: svn to git, N-squared?
From: Eric Wong @ 2006-06-12  4:29 UTC (permalink / raw)
  To: Jon Smirl; +Cc: git
In-Reply-To: <9e4733910606111902l709c71ccyf45070d55112739e@mail.gmail.com>

Jon Smirl <jonsmirl@gmail.com> wrote:
> I have Mozilla CVS in a SVN repository. I've been using git-svnimport
> to import it. This time I am letting it run to completion; but the
> import has been running for four days now and it is only up to 2004.
> The import task is stable at 570MB and it is using about 50% of my
> CPU. It is constantly spawning off git write-tree, read-tree,
> hash-object, update-index. It is not doing excessive disk activity.

SVN itself seems to get much slower as you get towards newer revisions
in a repository (FSFS) with lots of history.  I've been experimenting a
bit with a local copy of the gcc repo from November and git-svn SUCKED
at importing it (it took over a week and I cancelled it out of
frustration).  I started repacking too, but it didn't help.  Much
of the performance deficiency was the svn subprocess being extremely
slow at updating.

I also tried git-svnimport, of course, but I only had 512M on that
machine and the machine became unusable due to heavy swapping.

> The import seems to be getting n-squared slower. It is still making
> forward progress but the progress seems to be getting slower and
> slower.
> 
> It looks like it is doing write-tree, read-tree, hash-object,
> update-index once or more per change set. If these commands are
> n-proportional and they are getting run n times, then this is a
> n-squared process. Projecting this out, the import may take 10 days or
> more to completely finish.

I'm working on some improvements to git-svn to make it a bit more
spiffy.
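
To make the n-squared estimate quoted above concrete, here is a tiny
Python sketch. It assumes, purely for illustration, that importing
change set k costs k units of work; the real per-commit cost was not
measured in this thread.

```python
def total_cost(n):
    """Total work when importing change set k costs k units (k = 1..n)."""
    return n * (n + 1) // 2


# Doubling the number of change sets roughly quadruples the total work.
print(total_cost(1000), total_cost(2000))
```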

-- 
Eric Wong

^ permalink raw reply

* git-diff --cc broken in 1.4.0?
From: Martin Langhoff @ 2006-06-12  4:32 UTC (permalink / raw)
  To: git

I was looking at some merges in gitk and lamenting the apparent loss
of the nice two-sided diff we get with --cc, and now during a slightly
messy merge I did git-diff --cc only to get...

$ git-ls-files --unmerged
100644 f1d3843b2b2e42ba78adcf37da6440f0d321852e 1       local/version.php
100644 9352efa45cd25d9ad58df12b4ac241ac226a8ad4 2       local/version.php
100644 50da9b47903f6179f55a3f44290e7feaa08342f4 3       local/version.php

$ git-diff --cc
diff --cc local/version.php
index 9352efa,50da9b4..0000000
--- a/local/version.php
+++ b/local/version.php

cheers,


martin

^ permalink raw reply

* Re: svn to git, N-squared?
From: linux @ 2006-06-12  4:39 UTC (permalink / raw)
  To: git, jonsmirl, torvalds

>	for i in ??
>	do
>		cp -r $i $i.new
>		rm -rf $i
>		mv $i.new $i
>	done
>
> in your .git/objects/ directory (CAREFUL! Any script that does "rm -rf" 
> should be double- and triple-checked for sanity! ;)

Insanity is copying the data rather than just the file name.  Git is
good about not reading unnecessary files, and anything necessary should
be cached, so on-disk fragmentation is not a concern.

rmdir --ignore-fail-on-non-empty ??	# Probably unnecessary.
for i in ??
do
	mkdir $i.new
	mv $i/* $i.new
	rmdir $i
	mv $i.new $i
done
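
The same rename-only approach can be sketched in Python. This is an
illustration, not a tested replacement for the loop above: the function
names are made up, and unlike the shell glob "??" it only touches
directories whose names are two hex digits.

```python
import os


def rebuild_dir(path):
    """Recreate `path` as a fresh directory by renaming entries, not copying data."""
    fresh = path + ".new"
    os.mkdir(fresh)
    for name in os.listdir(path):
        os.rename(os.path.join(path, name), os.path.join(fresh, name))
    os.rmdir(path)          # the old, oversized directory is empty now
    os.rename(fresh, path)  # the new directory takes over the old name


def rebuild_object_dirs(objects_dir):
    """Rebuild every two-hex-digit fan-out directory under `objects_dir`."""
    hexdigits = "0123456789abcdef"
    rebuilt = []
    for name in sorted(os.listdir(objects_dir)):
        full = os.path.join(objects_dir, name)
        if len(name) == 2 and all(c in hexdigits for c in name) and os.path.isdir(full):
            rebuild_dir(full)
            rebuilt.append(name)
    return rebuilt
```

As with the shell version, this should only be run while nothing else is
writing to the repository.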

^ permalink raw reply

* Re: git-applymbox broken?
From: Eric W. Biederman @ 2006-06-12  7:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606111735440.5498@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> On Sun, 11 Jun 2006, Eric W. Biederman wrote:
>> 
>> This doesn't look like the From: header was in the middle of the
>> message until it was imported into git so it is probably a small
>> logic error that is easily corrected.  But I need to see what
>> we are parsing so I can understand what is happening.
>
> No, it's at the top of the body, although there might have been an empty 
> line or two (ie whitespace only) before it.

Ok.  I'm not certain why we would not be ignoring blank lines that
we used to skip.  The untested patch below should ensure we always
skip those lines.


>> Even if the header lines are in the middle of the body?
>
> What do you mean by "middle"?
>
> No, it should only look at From: and Subject: lines if they are at the 
> very top, with no other non-whitespace lines above them. But when it looks 
> at them and uses the data from them, it should then remove them from the 
> body - they are "conceptually" just extended header lines that just 
> happened to technically (from an rfc822 standpoint) be in the body of the 
> email.

This is a separate conversation and once the problem of not ignoring leading
blank lines is fixed I will be happy to address it.
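
For reference, the behaviour described in the quoted text (skip leading
blank lines, treat From: and Subject: lines at the very top of the body
as extended headers, and remove them from the body) can be sketched in
Python as follows. This is not the logic in mailinfo.c; the function
name is hypothetical.

```python
def split_inbody_headers(body_lines):
    """Return (headers, remaining_lines) for the lines of a mail body."""
    headers = {}
    i = 0
    # Skip leading whitespace-only lines.
    while i < len(body_lines) and body_lines[i].strip() == "":
        i += 1
    # Consume consecutive in-body header lines at the very top.
    while i < len(body_lines):
        name, sep, value = body_lines[i].partition(":")
        if sep and name in ("From", "Subject"):
            headers[name] = value.strip()
            i += 1
        else:
            break
    # Drop the blank separator after the in-body headers, if there is one.
    if headers:
        while i < len(body_lines) and body_lines[i].strip() == "":
            i += 1
    return headers, body_lines[i:]
```

A From: line further down, after real body text, is left alone because
the header loop stops at the first line that is not a header.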

Eric

diff --git a/mailinfo.c b/mailinfo.c
index 5b6c215..72c5454 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -279,6 +279,14 @@ static void handle_inbody_header(int *se
                        return;
                }
        }
+       /* Ignore leading blank lines */
+       if (!(*seen & SEEN_PREFIX)) {
+               char *ch;
+               for (ch = line; isspace(*ch); ch++)
+                       ;
+               if (*ch == '\0')
+                       return;
+       }
        *seen |= SEEN_PREFIX;
 }

^ permalink raw reply related

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Florian Forster @ 2006-06-12  8:24 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90606111502g607be3cfnf83ce81764a5f909@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1335 bytes --]

Hi Martin,

On Mon, Jun 12, 2006 at 10:02:05AM +1200, Martin Langhoff wrote:
> good! git-blame/git-annotate are quite expensive to run. Do you think
> it would make sense making it conditional on a git-repo-config option
> (gitweb.blame=1)?

sure, that it's a big change and if it helps the kernel.org folks ;)
I'll follow-up with a patch for this in a second..

Would it help to cache `git-annotate's output, e.g. using one of the
`Cache::Cache' modules? Or is browsing of blobs too sparse for this to
result in a performance gain? I'm sure the modules could be integrated
as a weak precondition.

I have two more points regarding gitweb's configuration:
- IMHO it would make sense to move the general gitweb-configuration
  (where are the repositories, where are the binaries, etc) out of the
  script.  As far as I know the Debian maintainer of the `gitweb'
  package has asked for this before but was refused for some reason..
  Possibly a file `gitweb.conf' in the same directory as the script
  could be read and overwrite the builtin defaults..?
- If `GIT_DIR/description' is only used by gitweb it may be more
  consistent to use the git-repo-config option `gitweb.description' in
  the future.

Regards,
-octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* [PATCH] gitweb: Make the availability of the `blame' interface in gitweb configurable.
From: Florian Forster @ 2006-06-12  8:31 UTC (permalink / raw)
  To: git; +Cc: Florian Forster
In-Reply-To: <20060612082448.GA11857@verplant.org>

Since `git-annotate' is an expensive operation to run it may be desirable to
deactivate this functionality. This patch introduces the `gitweb.blame' option
to git-repo-config and disables the blame support by default.

Signed-off-by: Florian Forster <octo@verplant.org>


---

 gitweb/gitweb.cgi |   27 +++++++++++++++++++++++++--
 1 files changed, 25 insertions(+), 2 deletions(-)

3eea23e8d8a13579455cdf8d5088794d33bdcba2
diff --git a/gitweb/gitweb.cgi b/gitweb/gitweb.cgi
index 91c075d..5eabe06 100755
--- a/gitweb/gitweb.cgi
+++ b/gitweb/gitweb.cgi
@@ -837,6 +837,25 @@ sub git_read_projects {
 	return @list;
 }
 
+sub git_get_project_config {
+	my $key = shift;
+
+	return unless ($key);
+	$key =~ s/^gitweb\.//;
+	return if ($key =~ m/\W/);
+
+	my $val = qx(git-repo-config --get gitweb.$key);
+	return ($val);
+}
+
+sub git_get_project_config_bool {
+	my $val = git_get_project_config (@_);
+	if ($val and $val =~ m/true|yes|on/) {
+		return (1);
+	}
+	return; # implicit false
+}
+
 sub git_project_list {
 	my @list = git_read_projects();
 	my @projects;
@@ -1233,6 +1252,7 @@ sub git_tag {
 
 sub git_blame {
 	my $fd;
+	die_error('403 Permission denied', "Permission denied.") if (!git_get_project_config_bool ('blame'));
 	die_error('404 Not Found', "What file will it be, master?") if (!$file_name);
 	$hash_base ||= git_read_head($project);
 	die_error(undef, "Reading commit failed.") unless ($hash_base);
@@ -1468,6 +1488,7 @@ sub git_blob {
 		my $base = $hash_base || git_read_head($project);
 		$hash = git_get_hash_by_path($base, $file_name, "blob") || die_error(undef, "Error lookup file.");
 	}
+	my $have_blame = git_get_project_config_bool ('blame');
 	open my $fd, "-|", "$gitbin/git-cat-file blob $hash" or die_error(undef, "Open failed.");
 	git_header_html();
 	if (defined $hash_base && (my %co = git_read_commit($hash_base))) {
@@ -1479,8 +1500,10 @@ sub git_blob {
 		      " | " . $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=commitdiff;h=$hash_base")}, "commitdiff") .
 		      " | " . $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=tree;h=$co{'tree'};hb=$hash_base")}, "tree") . "<br/>\n";
 		if (defined $file_name) {
-			print $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=blame;h=$hash;hb=$hash_base;f=$file_name")}, "blame") .
-			" | " . $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=blob_plain;h=$hash;f=$file_name")}, "plain") .
+			if ($have_blame) {
+				print $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=blame;h=$hash;hb=$hash_base;f=$file_name")}, "blame") .  " | ";
+			}
+			print $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=blob_plain;h=$hash;f=$file_name")}, "plain") .
 			" | " . $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=blob;hb=HEAD;f=$file_name")}, "head") . "<br/>\n";
 		} else {
 			print $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=blob_plain;h=$hash")}, "plain") . "<br/>\n";
-- 
1.3.3
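
A hedged sketch of the same boolean lookup in Python, for comparison
with the Perl above. It assumes "git config" (the current name of
git-repo-config) is on the PATH; the function names and the set of
accepted values are illustrative. It compares the whole value instead of
using an unanchored pattern, since m/true|yes|on/ would also accept a
value such as "none", and it accepts "1" because gitweb.blame=1 was the
form suggested earlier in the thread.

```python
import subprocess

TRUE_VALUES = {"true", "yes", "on", "1"}


def parse_config_bool(raw):
    """Interpret a config value as a boolean; missing or unknown values are False."""
    if raw is None:
        return False
    return raw.strip().lower() in TRUE_VALUES


def get_gitweb_flag(key, git_dir):
    """Read gitweb.<key> from the repository at `git_dir`."""
    if not key.isalnum():
        return False  # refuse anything that is not a plain word
    result = subprocess.run(
        ["git", "--git-dir=" + git_dir, "config", "--get", "gitweb." + key],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return False  # key not set, or git reported an error
    return parse_config_bool(result.stdout)
```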

^ permalink raw reply related

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Martin Langhoff @ 2006-06-12  8:34 UTC (permalink / raw)
  To: Florian Forster; +Cc: git
In-Reply-To: <20060612082448.GA11857@verplant.org>

On 6/12/06, Florian Forster <octo@verplant.org> wrote:
> On Mon, Jun 12, 2006 at 10:02:05AM +1200, Martin Langhoff wrote:
> > good! git-blame/git-annotate are quite expensive to run. Do you think
> > it would make sense making it conditional on a git-repo-config option
> > (gitweb.blame=1)?
>
> sure, that it's a big change and if it helps the kernel.org folks ;)
> I'll follow-up with a patch for this in a second..

That'd be great. I am looking into integrating other feature patches
too (like tarball downloads) that are useful but costly, making them
conditional too...

> Would it help to cache `git-annotate's output, e.g. using one of the

I think we can rely on proxies doing good caching -- a busy host like
kernel.org will have big reverse proxies in front. A git-blame for a
given file+commitsha doesn't change, so we can give it a long cache
time, like... forever ;-)
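
A minimal sketch of how a page could pick its caching headers along
those lines, assuming the request carries either a full 40-hex commit
SHA-1 or a symbolic name such as HEAD. The function name and the
one-year max-age are illustrative choices, not anything gitweb does
today.

```python
import re

FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def cache_headers(commit_id, max_age=31536000):
    """Return HTTP caching headers for a blame page keyed by `commit_id`."""
    if FULL_SHA1.fullmatch(commit_id):
        # Output for a fixed file + commit SHA-1 never changes.
        return {"Cache-Control": "public, max-age=%d" % max_age}
    # Symbolic names such as HEAD can move, so ask caches to revalidate.
    return {"Cache-Control": "no-cache"}


print(cache_headers("HEAD"))
```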

> I have two more points regarding gitweb's configuration:
> - IMHO it would make sense to move the general gitweb-configuration
>   (where are the repositories, where are the binaries, etc) out of the
>   script.  As far as I know the Debian maintainer of the `gitweb'
>   package has asked for this before but was refused for some reason..

Sounds like a reasonable request. I would make it rely on env vars,
$ENV{GITWEB_CONFIG} can generally point to /etc/gitweb.conf, and that
would override the config values we have.

This is trivial, and it means we buy a lot of flexibility from
apache's httpd.conf being able to point to different config files
depending on arbitrary conditions.
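
Roughly, the lookup could work like the following Python sketch. The
GITWEB_CONFIG variable and the /etc/gitweb.conf fallback are from the
proposal above; the key=value file format, the default keys and the
function name are assumptions made only for this example.

```python
import os

# Hypothetical built-in defaults; the real script has its own variables.
DEFAULTS = {"projectroot": "/pub/git", "gitbin": "/usr/bin"}


def load_config(environ=os.environ, defaults=DEFAULTS):
    """Start from the built-in defaults and let the config file override them."""
    config = dict(defaults)
    path = environ.get("GITWEB_CONFIG", "/etc/gitweb.conf")
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue  # skip blanks, comments and malformed lines
                key, _, value = line.partition("=")
                config[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # no config file: keep the built-in defaults
    return config
```

With this shape, httpd.conf only has to set the environment variable
(for example per virtual host) to select a different file.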

BTW, I haven't seen the debian maintainer's request, was that on the list?

> - If `GIT_DIR/description' is only used by gitweb it may be more
>   consistent to use the git-repo-config option `gitweb.description' in
>   the future.

Not sure how git-repo configurations deal with long entries. Right now
the description may contain html for instance.

martin

^ permalink raw reply

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Shawn Pearce @ 2006-06-12  8:40 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Florian Forster, git
In-Reply-To: <46a038f90606120134n21c269bbj3e8c7e31d4d93a23@mail.gmail.com>

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> >- If `GIT_DIR/description' is only used by gitweb it may be more
> >  consistent to use the git-repo-config option `gitweb.description' in
> >  the future.
> 
> Not sure how git-repo configurations deal with long entries. Right now
> the description may contain html for instance.

It has to be escaped, which could be ugly with HTML.  For example:

  [gitweb]
    description=<div class=\"description\">\n\
This is a chunk of text which describes this repository.  Some\n\
of this text might be rather long, and might need many lines to\n\
really be able to describe the repository in a nice editor such as\n\
vi running in an 80 character wide xterm.\n\
</div>

Forget a \ in front of a double quote (") or an LF and the entry is
corrupt.  So as nice as it sounds it might not be the best way to
obtain a description for gitweb.

-- 
Shawn.

^ permalink raw reply

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Johannes Schindelin @ 2006-06-12  9:08 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Martin Langhoff, Florian Forster, git
In-Reply-To: <20060612084056.GA29220@spearce.org>

Hi,

On Mon, 12 Jun 2006, Shawn Pearce wrote:

>   [gitweb]
>     description=<div class=\"description\">\n\
> This is a chunk of text which describes this repository.  Some\n\
> of this text might be rather long, and might need many lines to\n\
> really be able to describe the repository in a nice editor such as\n\
> vi running in an 80 character wide xterm.\n\
> </div>

AFAIK the trailing "\" will not work.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Shawn Pearce @ 2006-06-12  9:19 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Martin Langhoff, Florian Forster, git
In-Reply-To: <Pine.LNX.4.63.0606121107520.21813@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi,
> 
> On Mon, 12 Jun 2006, Shawn Pearce wrote:
> 
> >   [gitweb]
> >     description=<div class=\"description\">\n\
> > This is a chunk of text which describes this repository.  Some\n\
> > of this text might be rather long, and might need many lines to\n\
> > really be able to describe the repository in a nice editor such as\n\
> > vi running in an 80 character wide xterm.\n\
> > </div>
> 
> AFAIK the trailing "\" will not work.

Actually it does.  I figured out that it works (and why it works)
when I implemented the GIT repository parser in Java for my pure
Java version of GIT...

For example:

  [spearce@spearce-pb15 bob]$ cat .git/config 
  [core]
          repositoryformatversion = 0
          filemode = true
  [gitweb]
          description = This is a very\nlong line to put into GIT\n\
  repo config.\n\
  I hope it works.
          on = true
  [spearce@spearce-pb15 bob]$ git repo-config gitweb.description
  This is a very
  long line to put into GIT
  repo config.
  I hope it works.
  [spearce@spearce-pb15 bob]$ git repo-config gitweb.on
  true

The use of a trailing \ makes sense; the collapsing of multiple
spaces into one space unless quoted inside of "" doesn't.
But whatever...
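
If a tool were to write such a description for the user, the escaping
could look like this Python sketch. It puts the whole value inside
double quotes on one physical line (so runs of spaces survive) instead
of using trailing-backslash continuation lines, and escapes backslash,
double quote, newline and tab. The function name is made up for the
example.

```python
def escape_config_value(text):
    """Quote `text` so it can be written as a single config value."""
    escaped = (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )
    return '"' + escaped + '"'


print("description = " + escape_config_value('<div class="description">\nSome text.\n</div>'))
```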

-- 
Shawn.

^ permalink raw reply

* Re: Collecting cvsps patches
From: Anand Kumria @ 2006-06-12 11:27 UTC (permalink / raw)
  To: git
In-Reply-To: <20060611224205.GF1297@nowhere.earth>

On Mon, 12 Jun 2006 00:42:05 +0200, Yann Dirson wrote:

> http://ydirson.free.fr/soft/git/cvsps.git

I think you need to chmod +x hooks/post-update

and then run 'git-update-server-info'.

Cheers,
Anand

^ permalink raw reply

* bisect and gitk happy together
From: Martin Langhoff @ 2006-06-12 11:41 UTC (permalink / raw)
  To: git

I was using git-bisect earlier today, and at the exact point where it
told me about the bad commit, I opened gitk, which was showing all the
bad and good commits. It is great!

Two "user" notes, however:

 - git-bisect visualize wasn't as useful as just a plain gitk. (This
may be because I was working with ~60 commits in a medium-sized
project).

 - gitk didn't show the bad commit tagged specially, even if
git-bisect had just identified it. Of course I could find it, but I
had all the other good/bad commits well labelled. And not the one I
was looking for. Odd.

In any case, the bisect + gitk combo saved the day. I'm too ashamed to
tell what the bug actually was, though ;-)

martin

^ permalink raw reply

* [PATCH] cvsimport: keep one index per branch during import
From: Martin Langhoff @ 2006-06-12 11:50 UTC (permalink / raw)
  To: junkio, git; +Cc: Martin Langhoff

With this patch we have a speedup and much lower IO when
importing trees with many branches. Instead of forcing
index re-population for each branch switch, we keep
many index files around, one per branch.

Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>

---

This patch should get some review. It is trivial, but not fully tested.
I am testing it on the moz repo (which will take a while) to check that I get
the same result with and without it. 

Performance-wise, it seems to be doing ~15K commits per hour, with
the mozilla repo, up from ~6Kcph on the same hardware. Of course, 
this is only noticeable in projects with lots of concurrent branches.
Linear projects don't get much from this patch.
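
For readers who do not want to read the Perl, the bookkeeping behind
"one index per branch" can be sketched in Python like this. The class
and method names are hypothetical; only GIT_INDEX_FILE and the idea of
populating a new index with read-tree come from the patch.

```python
import os
import tempfile


class BranchIndexes:
    """Keep one temporary index file per branch and switch between them."""

    def __init__(self, environ=os.environ):
        self.environ = environ
        self.paths = {}

    def switch_to(self, branch):
        """Point GIT_INDEX_FILE at the branch's index.

        Returns True when the index file was just created and still has to
        be populated by the caller (the patch runs git-read-tree for that).
        """
        created = branch not in self.paths
        if created:
            fd, path = tempfile.mkstemp(prefix="git", suffix=".idx")
            os.close(fd)
            self.paths[branch] = path
        self.environ["GIT_INDEX_FILE"] = self.paths[branch]
        return created

    def cleanup(self):
        """Remove all temporary index files."""
        for path in self.paths.values():
            os.unlink(path)
        self.paths.clear()
```

Switching back to a branch that was seen before only changes the
environment variable, which is where the saving comes from.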

With this change, we are now truly waiting on cvs to hand over the
files pronto! Running locally, it is apparent that it isn't IO wait
but the latency of the chatty cvs protocol that is making this slow.

Probably forking 2 or 3 processes to prefetch filerevs from cvs
and put them in a queue directory for the main process to pick
up would work wonders. Actually, they could call git-hash-object
and just put some file metadata in the queue directory. 
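
A rough Python sketch of the prefetch idea. It uses a small thread pool
instead of forked processes and an in-memory iterator instead of a queue
directory, so it only illustrates the overlap of fetching and
processing; the fetch callback (one cvs request per file revision) is
hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def prefetch_revisions(revisions, fetch, workers=3):
    """Fetch file revisions concurrently, yielding results in the original order."""
    revisions = list(revisions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() submits every fetch up front, so later revisions are being
        # fetched while the consumer is still handling earlier ones.
        for rev, content in zip(revisions, pool.map(fetch, revisions)):
            yield rev, content
```

Results come back in input order, so the main loop can keep committing
change sets in sequence.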
---
 git-cvsimport.perl |   37 ++++++++++++++++++++++++++++++-------
 1 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/git-cvsimport.perl b/git-cvsimport.perl
old mode 100755
new mode 100644
index 76f6246..9c4588f
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -465,10 +465,15 @@ my $git_dir = $ENV{"GIT_DIR"} || ".git";
 $ENV{"GIT_DIR"} = $git_dir;
 my $orig_git_index;
 $orig_git_index = $ENV{GIT_INDEX_FILE} if exists $ENV{GIT_INDEX_FILE};
-my ($git_ih, $git_index) = tempfile('gitXXXXXX', SUFFIX => '.idx',
-				    DIR => File::Spec->tmpdir());
-close ($git_ih);
-$ENV{GIT_INDEX_FILE} = $git_index;
+
+my %index; # holds filenames of one index per branch
+{   # init with an index for origin
+    my ($fh, $fn) = tempfile('gitXXXXXX', SUFFIX => '.idx',
+			     DIR => File::Spec->tmpdir());
+    close ($fh);
+    $index{$opt_o} = $fn;
+}
+$ENV{GIT_INDEX_FILE} = $index{$opt_o};
 unless(-d $git_dir) {
 	system("git-init-db");
 	die "Cannot init the GIT db at $git_tree: $?\n" if $?;
@@ -496,6 +501,13 @@ unless(-d $git_dir) {
 	$tip_at_start = `git-rev-parse --verify HEAD`;
 
 	# populate index
+	unless ($index{$last_branch}) {
+	    my ($fh, $fn) = tempfile('gitXXXXXX', SUFFIX => '.idx',
+				     DIR => File::Spec->tmpdir());
+	    close ($fh);
+	    $index{$last_branch} = $fn;
+	}
+	$ENV{GIT_INDEX_FILE} = $index{$last_branch};
 	system('git-read-tree', $last_branch);
 	die "read-tree failed: $?\n" if $?;
 
@@ -776,8 +788,17 @@ while(<CVS>) {
 		}
 		if(($ancestor || $branch) ne $last_branch) {
 			print "Switching from $last_branch to $branch\n" if $opt_v;
-			system("git-read-tree", $branch);
-			die "read-tree failed: $?\n" if $?;
+			unless ($index{$branch}) {
+			    my ($fh, $fn) = tempfile('gitXXXXXX', SUFFIX => '.idx',
+						     DIR => File::Spec->tmpdir());
+			    close ($fh);
+			    $index{$branch} = $fn;
+			    $ENV{GIT_INDEX_FILE} = $index{$branch};
+			    system("git-read-tree", $branch);
+			    die "read-tree failed: $?\n" if $?;
+			} else {
+			    $ENV{GIT_INDEX_FILE} = $index{$branch};
+		        }
 		}
 		$last_branch = $branch if $branch ne $last_branch;
 		$state = 9;
@@ -841,7 +862,9 @@ #	VERSION:1.96->1.96.2.1
 }
 commit() if $branch and $state != 11;
 
-unlink($git_index);
+foreach my $git_index (values %index) {
+    unlink($git_index);
+}
 
 if (defined $orig_git_index) {
 	$ENV{GIT_INDEX_FILE} = $orig_git_index;
-- 
1.4.0.g5fba

^ permalink raw reply related

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Linus Torvalds @ 2006-06-12 14:59 UTC (permalink / raw)
  To: Florian Forster; +Cc: Martin Langhoff, git
In-Reply-To: <20060612082448.GA11857@verplant.org>

On Mon, 12 Jun 2006, Florian Forster wrote:
> 
> Would it help to cache `git-annotate's output, e.g. using one of the
> `Cache::Cache' modules? Or is browsing of blobs too sparse for this to
> result in a performance gain? I'm sure the modules could be integrated
> as a weak precondition.

The apache setup at least on kernel.org is already set up to do caching, 
as long as the generated headers for the page allow it in the first place.

So caching inside gitweb is generally pointless, at least when it's at the 
level of one result page. At a higher level, if the internal caching might 
improve performance of _other_ pages because it caches the result of some 
intermediate important thing, it might be a different issue.

		Linus

^ permalink raw reply
