Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Ævar Arnfjörð Bjarmason @ 2011-10-28  9:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn O. Pearce, Dan McGee
In-Reply-To: <7vy5w6xf7n.fsf@alter.siamese.dyndns.org>

On Fri, Oct 28, 2011 at 00:05, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> We recently upgraded to 1.7.7 and I've traced a very large slowdown in
>> packing down to this commit.
>
> Does Dan McGee's series leading to 38d4deb (pack-objects: don't traverse
> objects unnecessarily, 2011-10-18) help?

Yes it does!

When I do:

    git cherry-pick 703f05ad5835cff92b12c29aecf8d724c8c847e2..38d4deb

Here's the time on the linux-2.6.git repack with that series:

    real    0m53.969s
    user    0m47.063s
    sys     0m3.020s

On the repository I was having troubles with this was the result on
master:

    Total 911023 (delta 687744), reused 905500 (delta 683064)

    real    6m0.487s
    user    4m1.751s
    sys     1m56.331s

And with the cherry-pick:

    Total 911023 (delta 687744), reused 911023 (delta 687744)

    real    1m44.192s
    user    0m32.169s
    sys     0m4.383s

And with 1b4bb16b9ec331c91e28d2e3e7dee5070534b6a2 reverted:

    Total 911023 (delta 687744), reused 911023 (delta 687744)

    real    1m23.796s
    user    0m31.931s
    sys     0m3.549s

So it's still slower, but not intolerably slower. I'd be interested to
find out why we have that 20% difference that doesn't show up on
linux-2.6.git though, the repository is mostly source code but there
are some binary blobs in it as well, although not very large, the
overall size of the entire repository is <500MB.

^ permalink raw reply

* Re: git alias and --help
From: Michael J Gruber @ 2011-10-28  9:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Miles Bader, Gelonida N, git
In-Reply-To: <7vvcr9wyje.fsf@alter.siamese.dyndns.org>

Junio C Hamano venit, vidit, dixit 28.10.2011 06:05:
> Miles Bader <miles@gnu.org> writes:
> 
>> Of course, that would be the wrong thing for somebody that just wants
>> to be reminded what an alias expands too, but my intuition is that
>> this is a very tiny minority compared to people that want to examine
>> the options for the underlying command...
> 
> And it is doubly wrong if help backend is configured to be anything but
> manpages, no?
> 
> As I said, you should be able to come up with a patch that detects and
> special cases the no frills case (replacement to single token) to get what
> you want.

But "help" is still too much to type for the OP ;) How about this in
your config:

[alias]
	h = help
	hh = "!sh -c 'a=$(git config --get alias.$1); : ${a:=$1}; git help
${a%% *}' -"

Ugly as hell, I know, and works only for aliases whose first word is the
name of a git command, as well as for non-aliases. Catching "!command"
type aliases is left as an exercise to the reader.

Michael

^ permalink raw reply

* git rebase -p does not find mainline
From: David Kastrup @ 2011-10-28  7:51 UTC (permalink / raw)
  To: git

When doing git rebase -p in order to do a non-trivial rebase tracking
merges, I get a message about a missing -m option for specifying the
mainline.

git rebase does not have the corresponding option.  The fix is to do

git cherry-pick -m 1 offending-merge-commit
git rebase --continue

but that is quite unobvious from the resulting error message that does
not even mention "cherry-pick".  Since git rebase -p is supposed to
"preserve merges", it should just cherry-pick with the same mainline
that the original merge commit had, without asking the user questions or
even failing.

This is the version of git delivered with Ubuntu 11.10, namely 1.7.5.4.

-- 
David Kastrup

^ permalink raw reply

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
From: Junio C Hamano @ 2011-10-28  6:12 UTC (permalink / raw)
  To: git; +Cc: Nicolas Pitre, Shawn O. Pearce, Jeff King
In-Reply-To: <7v62j9veh3.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> I haven't started using type=8 and upwards for anything yet, but because
> we have only one "future expansion" value left, I want us to be extremely
> careful in order to avoid painting us into a corner that we cannot get out
> of, so I am sending this out early for a preliminary review.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  cache.h     |    3 ++-
>  sha1_file.c |   36 ++++++++++++++++++++++++++++++++----
>  2 files changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 2e6ad36..b02139b 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -380,9 +380,10 @@ enum object_type {
>  	OBJ_TREE = 2,
>  	OBJ_BLOB = 3,
>  	OBJ_TAG = 4,
> -	/* 5 for future expansion */
> +	OBJ_EXT = 5, /* 5 for future expansion */
>  	OBJ_OFS_DELTA = 6,
>  	OBJ_REF_DELTA = 7,
> +	OBJ_CAT_TREE = 8,
>  	OBJ_ANY,
>  	OBJ_MAX
>  };

As people may be able to guess from the name, CAT_TREE is envisioned to
encode a large data (primarily of type "blob") by recording the object
name of a tree object and probably the total length, and would represent
the concatenation of all blobs contained in the tree object when the tree
is traversed in some fixed order (e.g. Avery's "bup split"). I am guessing
that the payload for CAT_TREE representation type will be:

 - 20-byte object name for the top-level tree object;

 - type of the basic object (commit, tree, blob, or tag) it represents,
   even though it is unlikely that we would want to record such a large
   commit or tag that needs CAT_TREE representation;

 - the total length of the basic object it represents, even though it is
   redundant (you could traverse and sum the sizes of blobs contained in
   the tree object), it would help sha1_object_info() and friends. This
   will be the "some size" I mentioned in the previous message for this
   representation type.

We would probably add loose object representation for CAT_TREE, which may
look like:

    "cattree" <size of this cat-tree in decimal> NUL
    <basic object type> <size of the basic object> NUL
    <object name of the top-level tree>

and would need to teach unpack_sha1_file() about it. One caveat is that
we would want to keep the "contents name the object" invariant, so even if
a large blob is expressed as a CAT_TREE, its object name must still be
what we would get by hashing '"blob" <size> NUL <payload>'.

A loose object file in "cattree" representation will not hash to the value
a naïve implementation would expect, and fsck_sha1() needs to be aware of
it. I haven't thought things through in this area.

Further work would involve (no way exhaustive, of course):

 - Teach fsck and connectivity tools that objects that are reachable from
   any object (even a blob) that is represented as a CAT_TREE are needed
   and reachable by that object;

 - Teach pack-objects that anything that is represented as a CAT_TREE does
   not need to be deltified (the objects used as its representation would
   go through the usual deltification rules);

 - Teach unpack-objects to expand CAT_TREE representation into a "cattree"
   loose object.

 - Perhaps teach the attributes mechanism to lie to anybody who asks that
   any object in CAT_TREE representation is a binary file to trigger the
   "we do not unnecessarily look at binary" logic in "git diff" machinery.

 - Teach fast-import to write out CAT_TREE representation. This is
   probably the quickest and least-impact way to exploit existing support
   for large files in index_fd().

 - Update grep.c::grep_buffer() to take not <buf,size> but git_istream,
   and rewrite builtin/grep.c::grep_sha1() to use streaming interface.

^ permalink raw reply

* Re: [PATCHv2 3/3] completion: match ctags symbol names in grep patterns
From: Jeff King @ 2011-10-28  6:05 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git
In-Reply-To: <20111023212928.GG22551@goldbirke>

On Sun, Oct 23, 2011 at 11:29:28PM +0200, SZEDER Gábor wrote:

> On Fri, Oct 21, 2011 at 01:30:21PM -0400, Jeff King wrote:
> > This incorporates the suggestions from Gábor's review, with one
> > exception: it still looks only in the current directory for the "tags"
> > files. I think that might have some performance implications, so I'd
> > rather add it separately, if at all.
> 
> I agree that scanning through a whole working tree for tags files
> would cost too much.  But I think that a tags file at the top of the
> working tree is common enough to be supported, and checking its
> existence is fairly cheap.

Actually, it's not too expensive. Asking git for the top of the working
tree means it has to traverse up to there anyway. So the trick is just
doing our search without invoking too many external tools which would
cause unnecessary forks.

The patch is below, but I'm still not sure it's a good idea.

Grep only looks in the current subdirectory for matches. So if I am in
"src/foo/bar/", and the tags file is in "src/", then is it really a good
way to generate completions? Most of what's in the tags file will
probably be in _other_ subdirectories of "src/", and won't be matched by
grep at all. So we will give useless entries to bash to complete.

> So how about something like this for the case arm? (I didn't actually
> tested it.)
> 
> 		local tagsfile
> 		if test -r tags; then
> 			tagsfile=tags
> 		else
> 			local dir="$(__gitdir)"

You don't want __gitdir here, but rather "git rev-parse --show-cdup".

> Btw, there is a bug in the case statement: 'git --no-pager grep <TAB>'
> offers refs instead of symbols, because $cword is not 2 and $prev
> doesn't start with a dash.  But it's not worse than the current
> behavior, so I don't think this bug is a show-stopper for the patch.

Yeah. The intent of the "$cword is 2" thing is to know that we are the
first non-option argument. Arguably, _get_comp_words_by_ref should
somehow tell us which position we are completing relative to the actual
command name.

-Peff

---
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index af283cb..b0ed657 100755
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1429,6 +1429,39 @@ _git_gitk ()
 	_gitk
 }
 
+__git_cdup_dirs() {
+	local prefix=$(git rev-parse --show-cdup 2>/dev/null)
+	local oldifs=$IFS
+	local dots
+	local i
+	IFS=/
+	for i in $prefix; do
+		dots=$dots../
+		echo "$dots"
+	done
+	IFS=$oldifs
+}
+
+__git_find_in_cdup_one() {
+	local dir=$1; shift
+	for i in "$@"; do
+		if test -r "$dir$i"; then
+			echo "$dir$i"
+			return 0
+		fi
+	done
+	return 1
+}
+
+__git_find_in_cdup() {
+	__git_find_in_cdup_one "" "$@" && return
+
+	local dir
+	for dir in $(__git_cdup_dirs); do
+		__git_find_in_cdup_one "$dir" "$@" && return
+	done
+}
+
 __git_match_ctag() {
 	awk "/^${1////\\/}/ { print \$1 }" "$2"
 }
@@ -1457,8 +1490,9 @@ _git_grep ()
 
 	case "$cword,$prev" in
 	2,*|*,-*)
-		if test -r tags; then
-			__gitcomp "$(__git_match_ctag "$cur" tags)"
+		local tags=$(__git_find_in_cdup tags)
+		if test -n "$tags"; then
+			__gitcomp "$(__git_match_ctag "$cur" "$tags")"
 			return
 		fi
 		;;

^ permalink raw reply related

* [RFC/PATCH] define the way new representation types are encoded in the pack
From: Junio C Hamano @ 2011-10-28  6:04 UTC (permalink / raw)
  To: git; +Cc: Nicolas Pitre, Shawn O. Pearce, Jeff King

In addition to four basic types (commit, tree, blob and tag), the pack
stream can encode a few other "representation" types, such as REF_DELTA
and OFS_DELTA. As we allocate 3 bits in the first byte for this purpose,
we do not have much room to add new representation types in place, but we
do have one value reserved for future expansion.

This patch is about defining how that reserved value is used.

The first byte in the pack stream data consists of the following for the
current representation types:

 - Bit 0-3 are used for the low 4-bit of "some" size (not necessarily the
   size of the representation);

 - Bit 4-6 are used for object types 0-7, but we have not used type 5 so
   far and reserved it for future expansion (we could also use type 0
   recorded in the pack stream for future expansion, just like how I
   convert 5 into the real "extended" representation type in this patch);

 - Bit 7 is used to signal if the second byte needs to be read for sizes
   that do not fit in the 4-bit.

When bit 4-6 encodes type 5, the first byte is used this way:

 - Bit 0-3 denotes the real "extended" representation type. Because types
   0-7 can already be encoded without using the extended format, we can
   offset the type by 8 (i.e. if bit 0-3 says 3, it means representation
   type 11 = 3 + 8);

 - Bit 4-6 has the value "5";

 - Bit 7 is used to signal if the _third_ byte needs to be read for larger
   size that cannot be represented with 8-bit.

As it is unlikely for us to pack things that do not need to record any
size, the second byte is always used in full to encode the low 8-bit of
the size.

I haven't started using type=8 and upwards for anything yet, but because
we have only one "future expansion" value left, I want us to be extremely
careful in order to avoid painting us into a corner that we cannot get out
of, so I am sending this out early for a preliminary review.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 cache.h     |    3 ++-
 sha1_file.c |   36 ++++++++++++++++++++++++++++++++----
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 2e6ad36..b02139b 100644
--- a/cache.h
+++ b/cache.h
@@ -380,9 +380,10 @@ enum object_type {
 	OBJ_TREE = 2,
 	OBJ_BLOB = 3,
 	OBJ_TAG = 4,
-	/* 5 for future expansion */
+	OBJ_EXT = 5, /* 5 for future expansion */
 	OBJ_OFS_DELTA = 6,
 	OBJ_REF_DELTA = 7,
+	OBJ_CAT_TREE = 8,
 	OBJ_ANY,
 	OBJ_MAX
 };
diff --git a/sha1_file.c b/sha1_file.c
index 27f3b9b..4dcd023 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1254,16 +1254,43 @@ static int experimental_loose_object(unsigned char *map)
 }
 
 unsigned long unpack_object_header_buffer(const unsigned char *buf,
-		unsigned long len, enum object_type *type, unsigned long *sizep)
+	unsigned long len, enum object_type *typep, unsigned long *sizep)
 {
 	unsigned shift;
 	unsigned long size, c;
 	unsigned long used = 0;
+	enum object_type type;
 
+	/*
+	 * MSB of the first byte is used to tell if the second byte
+	 * needs to be read for the size, so type field is only 3-bit
+	 * wide.
+	 */
 	c = buf[used++];
-	*type = (c >> 4) & 7;
-	size = c & 15;
-	shift = 4;
+	type = (c >> 4) & 7;
+
+	if (type != OBJ_EXT) {
+		/*
+		 * For basic types of object representations, the low
+		 * 4-bit of the first byte is used for the lowermost
+		 * 4-bit of the size. The MSB of the first byte tells
+		 * if the second byte needs to be read for size.
+		 */
+		size = c & 15;
+		shift = 4;
+	} else {
+		/*
+		 * For extended types, the low 4-bit of the first byte
+		 * is used for the representation type (offset by 8),
+		 * and the size begins at the second byte. The MSB of
+		 * the first byte is still used to indicate the next
+		 * byte (i.e. the third byte) needs to be read for the
+		 * size.
+		 */
+		type = (c & 15) + 8;
+		size = buf[used++];
+		shift = 8;
+	}
 	while (c & 0x80) {
 		if (len <= used || bitsizeof(long) <= shift) {
 			error("bad object header");
@@ -1274,6 +1301,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 		shift += 7;
 	}
 	*sizep = size;
+	*typep = type;
 	return used;
 }
 

^ permalink raw reply related

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Junio C Hamano @ 2011-10-28  4:07 UTC (permalink / raw)
  To: Miles Bader
  Cc: Jeff King, git, Nguyen Thai Ngoc Duy,
	Ævar Arnfjörð
In-Reply-To: <buo39edao6r.fsf@dhlpc061.dev.necel.com>

Miles Bader <miles@gnu.org> writes:

> Jeff King <peff@peff.net> writes:
>> We do have people with utf-16 in their repositories on github. I
>> have no idea why they do such a thing, or what kinds of tricks they
>> do to make it usable (because without it, they just get "binary
>> files differ").
>
> Hmm, you could ask them ...  [or, I suppose more diplomatically, post
> a blog entry asking "Hey all you people who use github for utf-16
> encoded files, ..."]

People will hate such a flag day event initially and later thank you for
it ;-) 

I am afraid that that is a wishful thinking, though.

^ permalink raw reply

* Re: git alias and --help
From: Junio C Hamano @ 2011-10-28  4:05 UTC (permalink / raw)
  To: Miles Bader; +Cc: Gelonida N, git
In-Reply-To: <buoty6t9937.fsf@dhlpc061.dev.necel.com>

Miles Bader <miles@gnu.org> writes:

> Of course, that would be the wrong thing for somebody that just wants
> to be reminded what an alias expands too, but my intuition is that
> this is a very tiny minority compared to people that want to examine
> the options for the underlying command...

And it is doubly wrong if help backend is configured to be anything but
manpages, no?

As I said, you should be able to come up with a patch that detects and
special cases the no frills case (replacement to single token) to get what
you want.

^ permalink raw reply

* Re: git alias and --help
From: Miles Bader @ 2011-10-28  1:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Gelonida N, git
In-Reply-To: <7v8vo6xd4u.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:
>>> git branch --help
>>
>> How about "git help branch"?
>
> The reason why we do not do what you seem to be suggesting is because
> giving the same behaviour to "git b --help" as "git branch --help" is
> wrong.

I agree with Gelonida's followup:  although what you say makes sense,
it's still pretty annoying behavior for the very common case of a
simple renaming alias...

E.g., I have "co" aliased to "checkout", and so my fingers are very
very inclined to say "co" when I mean checkout... including when
asking for help.  I actually end up typing "git co --help", grumbling,
and retyping with the full command name, quite reguarly.

What I've often wished is that git's help system would output
something like:

   $ git help co
   `git co' is aliased to `checkout'

   Here's the help entry for `checkout':

   GIT-CHECKOUT(1)                   Git Manual                   GIT-CHECKOUT(1)

   NAME
          git-checkout - Checkout a branch or paths to the working tree
   ...

[with the "`git co' is aliased ..." header included in the pager
output.]

Of course, that would be the wrong thing for somebody that just wants
to be reminded what an alias expands too, but my intuition is that
this is a very tiny minority compared to people that want to examine
the options for the underlying command...

-Miles

-- 
Suburbia: where they tear out the trees and then name streets after them.

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Miles Bader @ 2011-10-28  1:40 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, git, Nguyen Thai Ngoc Duy,
	Ævar Arnfjörð
In-Reply-To: <20111027234429.GA28187@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:
> We do have people with utf-16 in their repositories on github. I
> have no idea why they do such a thing, or what kinds of tricks they
> do to make it usable (because without it, they just get "binary
> files differ").

Hmm, you could ask them ...  [or, I suppose more diplomatically, post
a blog entry asking "Hey all you people who use github for utf-16
encoded files, ..."]

-Miles

-- 
Year, n. A period of three hundred and sixty-five disappointments.

^ permalink raw reply

* Re: git alias and --help
From: Gelonida N @ 2011-10-28  0:24 UTC (permalink / raw)
  To: git
In-Reply-To: <7v8vo6xd4u.fsf@alter.siamese.dyndns.org>

On 10/28/2011 12:50 AM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> Gelonida N <gelonida@gmail.com> writes:
>>
>>> Is there any trick to get the help text of git branch without
>>> having to type
>>>
>>> git branch --help
>>
>> How about "git help branch"?
> 
> It was bad of me to write a tongue-in-cheek answer, get distracted and
> ended up sending it without the real answer.
No issues. At least I got an answer :-)
> 
> The reason why we do not do what you seem to be suggesting is because
> giving the same behaviour to "git b --help" as "git branch --help" is
> wrong.
> 
> To see why, imagine you have configured an alias that is not a simple and
> stupid substitution "b == branch", but something like "bt == branch -t",
> and then want to know what you should write after "git bt".  Giving the
> manpage for branch without giving them any hint that they configured that
> alias to produce customized behaviour that is different from plain vanilla
> "branch" is not quite acceptable.
> 
> I think you _could_ make a patch that special cases a simple and straight
> substitution and skip the "foo is aliased to bar" step, but I doubt it is
> worth it to lose consistency between "git b --help" vs "git bt --help"
> that way.


I understand the reasoning and agree, that as general case it might not
be a good idea to have the behaviour, taht I expected to be the default
behaviour

It's just, that I am lazy so I shortened  my most common commands

For example:
co=checkout
ci=commit
b=branch
sm=submodule

However I'm not smart enough to remember all options for all commands
and so I have to ask git for occasional help.


It feels so 'unlazy' to have to type the full command 'just' to get help


What I wondered though is whether there couldn't be a way to be lazy and
avoid confusion.

1.) prompt for full help
-------------------------

One option would be to display what the command is aliased to and to
prompt whether one wants to see the full help of the base command.

`git b' is aliased to `branch'
pleaset press h if you want to see the full help for git branch
or press q to quit


2.) allow copy paste by displaying the command to be typed for help.
---------------------------------------------------------------------
display sometihing like:

`git b' is aliased to `branch'
for help about the branch command type:  git --help


Then lazy people could copy paste the command instead of typing it.


3.) special alias syntax to allow 'forwarding' of --help
------------------------------------------------------------

some kind of special alias syntax, which explicitely forces that --help
is 'forwarded' to the alias and not treated by git


example:

b = branch   # --help would have the current behaviour

I tried following to see whether '--help' is now being 'forwarded'

b = ! git branch
but also  here --help is treated by git

A possibility could be another 'special' character. For the sake of the
example I used '?'
b = ?branch # 'the question mark would allow 'forwarding' the
             # --help option to the alias'

b = ?! git branch



My workaround
------------

Fort the time being I might do something like

bhlp = branch --help

I would have prefered b_hlp or b_h, but it seems underscore isn't
allowed in an alias.

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Jeff King @ 2011-10-28  0:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <7v1utyx9ri.fsf@alter.siamese.dyndns.org>

On Thu, Oct 27, 2011 at 05:03:29PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > My interest is to make things like bare-repository diff (and everything
> > built on it; i.e., things like github, gitweb, or whatever) do the sane
> > thing for these people, even if I think what they're doing is wrong.
> 
> I do not think we are talking about right or wrong. I was primarily saying
> that textconv may not be the right thing (think github/gitweb showing blob
> contents, nicely formatted inside the chrome the site provides).

But I think it is probably a wrong thing to store utf-16 as the
canonical format inside the git repository. Git simply can't handle it
for diffing. And the right thing, as you suggested, is clean/smudge.

But I'm dealing with repositories on the server side, where it is too
late to do clean/smudge; I just get whatever junk people commited.

> We have in-repository representation that diff and grep and friends work
> on, and output conversion layer that externalizes the result of them in
> the form of "smudge". Another layer above the in-repository representation
> and below operations could convert UTF-16 to UTF-8 when going outward and
> in the opposite when going inward.

I'm not sure that could sanely be done in a backwards compatible way.
Doing it with just textual diffs is a hack, of course, but at least we
know that the damage is limited, and the diff we generate on top doesn't
care that much about the original sha1s[1]. But should read_object_sha1
learn to convert utf-16 into utf-8? I think madness lies that way, as
we are breaking assumptions about sha1 validity.

-Peff

[1] Actually, the text diff does mention the original and resulting
sha1s, which would now either bear no relation to the diff text, or bear
no relation to what's in the repo. Either way, I think we are creating
something that can't necessarily be applied, which is bad. And is why I
thought of textconv, which is basically the same concept (and has the
same problems).

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Junio C Hamano @ 2011-10-28  0:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <20111027234429.GA28187@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> My interest is to make things like bare-repository diff (and everything
> built on it; i.e., things like github, gitweb, or whatever) do the sane
> thing for these people, even if I think what they're doing is wrong.

I do not think we are talking about right or wrong. I was primarily saying
that textconv may not be the right thing (think github/gitweb showing blob
contents, nicely formatted inside the chrome the site provides).

The solution you suggested feels like a gross layering violation, unless
we do it everywhere, in which case I wouldn't mind too much.

We have in-repository representation that diff and grep and friends work
on, and output conversion layer that externalizes the result of them in
the form of "smudge". Another layer above the in-repository representation
and below operations could convert UTF-16 to UTF-8 when going outward and
in the opposite when going inward.

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Jeff King @ 2011-10-27 23:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <7v39eez1ph.fsf@alter.siamese.dyndns.org>

On Thu, Oct 27, 2011 at 12:14:34PM -0700, Junio C Hamano wrote:

> > Minor nit, but this is just for diff, so it is not about clean/smudge
> > but rather about doing something like textconv.
> 
> I can understand if some tools in the Windows land prefer to work with
> these encodings, so clean/smudge to have the checkout in these encodings
> would be a reasonable thing not just diff but things like grep. On the
> other hand, I do doubt the sanity of these people if they want to have
> in-repository representation also in these encodings.

I'm pretty much of the same mind. We do have people with utf-16 in their
repositories on github. I have no idea why they do such a thing, or what
kinds of tricks they do to make it usable (because without it, they just
get "binary files differ").

My interest is to make things like bare-repository diff (and everything
built on it; i.e., things like github, gitweb, or whatever) do the sane
thing for these people, even if I think what they're doing is wrong. And
as always, I try to structure the git portions of that as much as
possible to be general and help everybody, so they can be pushed
upstream (also, then I don't have to worry about managing local changes
:) ).

But it sounds like this is probably just too ugly and should end up as a
github-specific thing.

-Peff

^ permalink raw reply

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
From: Kyle Moffett @ 2011-10-27 23:20 UTC (permalink / raw)
  To: Atsushi Nakagawa
  Cc: kusmabite, Johannes Sixt, msysgit, git, johannes.schindelin
In-Reply-To: <20111028080033.57FC.B013761@chejz.com>

On Thu, Oct 27, 2011 at 19:00, Atsushi Nakagawa <atnak@chejz.com> wrote:
> Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> On Wed, Oct 26, 2011 at 5:44 AM, Kyle Moffett <kyle@moffetthome.net> wrote:
>> > On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> >> Thanks for pointing this out, I completely forgot about write re-ordering.
>> >>
>> >> This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
>> >> 0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
>> >> InterlockedExchange generates a full memory barrier:
>> >> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx
>> >
>> > No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)
>> >
>> > A write barrier in one thread is only effective if it is paired with a
>> > read barrier in the other thread.
>> >
>> > Since there's no read barrier in the "while(mutex->autoinit != 0)",
>> > you don't have any guaranteed ordering.
>
> Out of curiosity, where could re-ordering be a problem here?  I'm
> thinking probably at "EnterCriticalSection(&mutex->cs)" and the contents
> of "mutex->cs" not being propagated to the waiting thread.  However,
> shouldn't that be a non-problem, as far as compiler reordering goes,
> because it's an external function call and only the address of mutex->cs
> is passed?
>
> The only other cause I could think of is if ordering at the CPU was
> somehow different (it could be if there're no special provisions for
> calling external functions) or if "InterlockedExchange(&mutex->autoinit,
> 0)" wasn't atomic in updating autoinit and doing the memory barrier.
>
> Either way, I couldn't vouch for the safety of the above logic without
> a memory barrier so this question is purely of an academical nature. :)
>
>> > I guess if MSVC assumes that volatile reads imply barriers then it might work...
>>
>> OK, so I should probably do something like this instead?
>>
>> while (InterlockedCompareExchange(&mutex->autoinit, 0, 0) != 0)
>>       ; /* wait for other thread */
>
> Technically, assuming only the updating of "mutex->cs" is in question,
> the ICE should only be required once after exiting the loop...
>
> There's a question of the propagation of the value of "mutex->autoinit"
> itself, but my take is that the memory barrier on the writing thread
> will push out the updated value across all CPUs, thus preventing an
> infinite loop.  The other factors, value caching and loop optimization
> by the compiler, should be prevented by the "volatile" keyword even with
> gcc or MSVC 2003.
>
>> I really appreciate getting some extra eyes on this, thanks.
>> Concurrent programming is not my strong-suit (as this exercise has
>> shown) ;)
>
> So would I. :)

Ok, so here's the race condition:

Thread1				Thread2
				/* Speculative prefetch */
				prefetch(*mutex);

if (mutex->autoinit) {
if (ICE(&mutex->autoinit, -1, 1) != -1) {
/* Now mutex->autoinit == -1 */
pthread_mutex_init(mutex, NULL);
/* This forces writes out to memory */
ICE(&mutex->autoinit, 0, -1);

				if (mutex->autoinit) {} /* false */
				/* No read barrier here */
				EnterCriticalSection(&mutex->cs);
				/* Use cached mutex->cs from earlier */

Even though you forced the memory write to be ordered in Thread 1 you
did not ensure that the read of autoinit occurred before the read of
mutex->cs in Thread 2.  If the first thing that EnterCriticalSection
does is follow a pointer or read a mutex key value in mutex->cs then
won't necessarily get the right answer.

The rule of memory barriers is the ALWAYS come in pairs.  This simple
example guarantees that Thread2 will read "tmp_a" == 1 if it
previously read "tmp_b" == 1, although getting "tmp_a" == 1 and
"tmp_b" != 1 is still possible.

Thread1:
a = 1;
write_barrier();
b = 1;

Thread2:
tmp_b = b;
read_barrier();
tmp_a = a;

I think there's a Documentation/memory-barriers.txt file in the kernel
source code with more helpful info.

Cheers,
Kyle Moffett

-- 
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/

^ permalink raw reply

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
From: Atsushi Nakagawa @ 2011-10-27 23:00 UTC (permalink / raw)
  To: kusmabite; +Cc: Kyle Moffett, Johannes Sixt, msysgit, git, johannes.schindelin
In-Reply-To: <CABPQNSY9LdqOK=1nN_61ZRMv-ieXzSDYgNsQe0w21RAOw_D7yA@mail.gmail.com>

Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Wed, Oct 26, 2011 at 5:44 AM, Kyle Moffett <kyle@moffetthome.net> wrote:
> > On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> >> On Tue, Oct 25, 2011 at 10:07 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> >>> Am 25.10.2011 17:42, schrieb Erik Faye-Lund:
> >>>> On Tue, Oct 25, 2011 at 5:28 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> >>>>> Am 10/25/2011 16:55, schrieb Erik Faye-Lund:
> >>>>>> +int pthread_mutex_lock(pthread_mutex_t *mutex)
> >>>>>> +{
> >>>>>> + [snip]
> >>>>>
> >>>>> The double-checked locking idiom. Very suspicious. Can you explain why it
> >>> [snip]
> >>>
> >>>  if (mutex->autoinit) {
> >>>
> >>> Assume two threads enter this block.
> >>>
> >>>     if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
> >>>
> >>> Only one thread, A, say on CPU A, will enter this block.
> >>>
> >>>        InitializeCriticalSection(&mutex->cs);
> >>>
> >>> Thread A writes some values. Note that there are no memory barriers
> >>> involved here. Not that I know of or that they would be documented.
> >>>
> >>>        mutex->autoinit = 0;
> >>>
> >>> And it writes another one. Thread A continues below to contend for the
> >>> mutex it just initialized.
> >>>
> >>>     } else
> >>>
> >>> Meanwhile, thread B, say on CPU B, spins in this loop:
> >>>
> >>>        while (mutex->autoinit != 0)
> >>>           ; /* wait for other thread */
> >>>
> >>> When thread B arrives here, it sees the value of autoinit that thread A
> >>> has written above.
> >>>
> >>> [snip]
> >>>
> >>
> >> Thanks for pointing this out, I completely forgot about write re-ordering.
> >>
> >> This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
> >> 0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
> >> InterlockedExchange generates a full memory barrier:
> >> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx
> >
> > No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)
> >
> > A write barrier in one thread is only effective if it is paired with a
> > read barrier in the other thread.
> >
> > Since there's no read barrier in the "while(mutex->autoinit != 0)",
> > you don't have any guaranteed ordering.

Out of curiosity, where could re-ordering be a problem here?  I'm
thinking probably at "EnterCriticalSection(&mutex->cs)" and the contents
of "mutex->cs" not being propagated to the waiting thread.  However,
shouldn't that be a non-problem, as far as compiler reordering goes,
because it's an external function call and only the address of mutex->cs
is passed?

The only other cause I could think of is if ordering at the CPU was
somehow different (it could be if there're no special provisions for
calling external functions) or if "InterlockedExchange(&mutex->autoinit,
0)" wasn't atomic in updating autoinit and doing the memory barrier.

Either way, I couldn't vouch for the safety of the above logic without
a memory barrier so this question is purely of an academical nature. :)

> > I guess if MSVC assumes that volatile reads imply barriers then it might work...
> 
> OK, so I should probably do something like this instead?
> 
> while (InterlockedCompareExchange(&mutex->autoinit, 0, 0) != 0)
> 	; /* wait for other thread */

Technically, assuming only the updating of "mutex->cs" is in question,
the ICE should only be required once after exiting the loop...

There's a question of the propagation of the value of "mutex->autoinit"
itself, but my take is that the memory barrier on the writing thread
will push out the updated value across all CPUs, thus preventing an
infinite loop.  The other factors, value caching and loop optimization
by the compiler, should be prevented by the "volatile" keyword even with
gcc or MSVC 2003.

> I really appreciate getting some extra eyes on this, thanks.
> Concurrent programming is not my strong-suit (as this exercise has
> shown) ;)

So would I. :)

-- 
Atsushi Nakagawa
<atnak@chejz.com>
Changes are made when there is inconvenience.

^ permalink raw reply

* Re: git alias and --help
From: Junio C Hamano @ 2011-10-27 22:50 UTC (permalink / raw)
  To: Gelonida N; +Cc: git
In-Reply-To: <7vfwiexe6m.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Gelonida N <gelonida@gmail.com> writes:
>
>> Is there any trick to get the help text of git branch without
>> having to type
>>
>> git branch --help
>
> How about "git help branch"?

It was bad of me to write a tongue-in-cheek answer, get distracted and
ended up sending it without the real answer.

The reason why we do not do what you seem to be suggesting is because
giving the same behaviour to "git b --help" as "git branch --help" is
wrong.

To see why, imagine you have configured an alias that is not a simple and
stupid substitution "b == branch", but something like "bt == branch -t",
and then want to know what you should write after "git bt".  Giving the
manpage for branch without giving them any hint that they configured that
alias to produce customized behaviour that is different from plain vanilla
"branch" is not quite acceptable.

I think you _could_ make a patch that special cases a simple and straight
substitution and skip the "foo is aliased to bar" step, but I doubt it is
worth it to lose consistency between "git b --help" vs "git bt --help"
that way.

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Jakub Narebski @ 2011-10-27 22:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, Shawn O. Pearce
In-Reply-To: <CACBZZX7tghoHhxCygEj9DZSxvKyTvybawVA2HwHBkjBaH73Ujg@mail.gmail.com>

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Oct 27, 2011 at 23:49, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> 
> > Actually it just seems slow in general, not just on repositories with
> > a lot of tags, on linux-2.6.git with this patch:
> 
> Here's profiling with gprof for everything with >1% of execution time
> with the patch applied:
> 
>     Each sample counts as 0.01 seconds.
>       %   cumulative   self              self     total
>      time   seconds   seconds    calls   s/call   s/call  name
>      21.07     15.99    15.99  2184059     0.00     0.00  add_descendants_to_write_order
>      20.25     31.35    15.37 1146371554     0.00     0.00  add_to_write_order
[...]
> 
> And without:
> 
>     Each sample counts as 0.01 seconds.
>       %   cumulative   self              self     total
>      time   seconds   seconds    calls   s/call   s/call  name
>      21.29      9.13     9.13 142180385     0.00     0.00  hashcmp
>      10.59     13.67     4.54 90592818     0.00     0.00  lookup_object
[...]

Errr... do or do not gprof results include time spend in libraries?
Though that might not matter for this case.

Can you repeat profiling using "perf events" or something using it
(e.g. via PAPI library like HPCToolkit)?

-- 
Jakub Narębski

^ permalink raw reply

* Re: git alias and --help
From: Junio C Hamano @ 2011-10-27 22:28 UTC (permalink / raw)
  To: Gelonida N; +Cc: git
In-Reply-To: <j8clg9$ldh$1@dough.gmane.org>

Gelonida N <gelonida@gmail.com> writes:

> Is there any trick to get the help text of git branch without
> having to type
>
> git branch --help

How about "git help branch"?

^ permalink raw reply

* Re: [PATCH 3/4] pack-objects: don't traverse objects unnecessarily
From: Junio C Hamano @ 2011-10-27 22:26 UTC (permalink / raw)
  To: Dan McGee; +Cc: git
In-Reply-To: <1318915284-6361-3-git-send-email-dpmcgee@gmail.com>

Dan McGee <dpmcgee@gmail.com> writes:

> Two optimizations take place here- we can start our objects array
> iteration from a known point where we left off before we started trying
> to find our tags,

This I would understand (but I am somewhat curious how much last_untagged
would advance relative to nr_objects for this half of the optimization to
be worth it), but...

> and we don't need to do the deep dives required by
> add_family_to_write_order() if the object has already been marked as
> filled.

I am not sure if this produces the identical result that was benchmarked
in the original series.

For example, if you have a tagged object that is not a commit (say a
blob), you would have written that blob in the second phase (write tagged
objects together), so the family of blobs that share same delta parent as
that blob will not be written in this "Finally all the rest" in the right
place in the original list, no?

I do not think this change would forget to fill an object that needs to be
filled, but it would affect the resulting ordering of the list, so...

> @@ -560,8 +561,13 @@ static struct object_entry **compute_write_order(void)
>  	/*
>  	 * Finally all the rest in really tight order
>  	 */
> -	for (i = 0; i < nr_objects; i++)
> -		add_family_to_write_order(wo, &wo_end, &objects[i]);
> +	for (i = last_untagged; i < nr_objects; i++) {
> +		if (!objects[i].filled)
> +			add_family_to_write_order(wo, &wo_end, &objects[i]);
> +	}
> +
> +	if(wo_end != nr_objects)
> +		die("ordered %u objects, expected %u", wo_end, nr_objects);

^ permalink raw reply

* git alias and --help
From: Gelonida N @ 2011-10-27 22:20 UTC (permalink / raw)
  To: git

I'm having a tiny question about git aliases.

in ~/.gitconfig I defined an alias

b = branch



This works well execp if I type


git b --help

Now I get the output
`git b' is aliased to `branch'


which is nice to know however what I reall wanted is to get the helptext
for the branch command.


Is there any trick to get the help text of git branch without
having to type

git branch --help

^ permalink raw reply

* Re: [PATCH 4/4] pack-objects: rewrite add_descendants_to_write_order() iteratively
From: Junio C Hamano @ 2011-10-27 22:13 UTC (permalink / raw)
  To: Dan McGee; +Cc: git
In-Reply-To: <1318915284-6361-4-git-send-email-dpmcgee@gmail.com>

Dan McGee <dpmcgee@gmail.com> writes:

> This removes the need to call this function recursively, shinking the
> code size slightly and netting a small performance increase.
>
> Signed-off-by: Dan McGee <dpmcgee@gmail.com>

Tricky.

As long as this is done after compute_write_order() populates
delta_sibling vs delta link in a consistent way, the new logic should
produce exactly the same result as the original code.

Thanks.

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Junio C Hamano @ 2011-10-27 22:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Shawn O. Pearce
In-Reply-To: <CACBZZX6Ss4jLtdrDhLUNKCUjEHjHHKzfv-LMkOyTPDhRUXm4sQ@mail.gmail.com>

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> We recently upgraded to 1.7.7 and I've traced a very large slowdown in
> packing down to this commit.

Does Dan McGee's series leading to 38d4deb (pack-objects: don't traverse
objects unnecessarily, 2011-10-18) help?

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Ævar Arnfjörð Bjarmason @ 2011-10-27 22:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn O. Pearce
In-Reply-To: <CACBZZX6ZWOF=j-k8o-4NHmjS2HpyS+PmKjJh_QKevWurBf9pbA@mail.gmail.com>

On Thu, Oct 27, 2011 at 23:49, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> Actually it just seems slow in general, not just on repositories with
> a lot of tags, on linux-2.6.git with this patch:

Here's profiling with gprof for everything with >1% of execution time
with the patch applied:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls   s/call   s/call  name
     21.07     15.99    15.99  2184059     0.00     0.00
add_descendants_to_write_order
     20.25     31.35    15.37 1146371554     0.00     0.00  add_to_write_order
     11.94     40.41     9.06 142180385     0.00     0.00  hashcmp
      5.55     44.62     4.21 90592818     0.00     0.00  lookup_object
      4.64     48.14     3.52 72804470     0.00     0.00  hashcmp
      3.87     51.08     2.94 90007452     0.00     0.00  get_mode
      3.31     53.59     2.51 90007452     0.00     0.00  decode_tree_entry
      1.90     55.03     1.44  2184059     0.00     0.00
add_family_to_write_order
      1.79     56.39     1.36 43247856     0.00     0.00  hashcmp
      1.29     57.37     0.98                             pack_offset_sort
      1.27     58.33     0.96 90007452     0.00     0.00  update_tree_entry
      1.27     59.29     0.96 90592817     0.00     0.00  hashtable_index
      1.20     60.20     0.91  4009188     0.00     0.00  find_pack_revindex
      1.19     61.10     0.90  5899321     0.00     0.00  find_pack_entry_one
      1.12     61.95     0.85   269514     0.00     0.00
commit_list_insert_by_date
      1.08     62.77     0.82  5387773     0.00     0.00  patch_delta

And without:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls   s/call   s/call  name
     21.29      9.13     9.13 142180385     0.00     0.00  hashcmp
     10.59     13.67     4.54 90592818     0.00     0.00  lookup_object
      8.48     17.31     3.64 72638478     0.00     0.00  hashcmp
      6.60     20.14     2.83 90007452     0.00     0.00  decode_tree_entry
      6.15     22.77     2.64 90007452     0.00     0.00  get_mode
      2.99     24.05     1.28 43247182     0.00     0.00  hashcmp
      2.96     25.32     1.27 90592817     0.00     0.00  hashtable_index
      2.47     26.38     1.06 90007452     0.00     0.00  update_tree_entry
      2.26     27.35     0.97  4009188     0.00     0.00  find_pack_revindex
      2.05     28.23     0.88   269245     0.00     0.00  process_tree
      1.96     29.07     0.84   269514     0.00     0.00
commit_list_insert_by_date
      1.94     29.90     0.83                             pack_offset_sort
      1.73     30.64     0.74  5389900     0.00     0.00  patch_delta
      1.70     31.37     0.73  5885588     0.00     0.00  find_pack_entry_one
      1.38     31.96     0.59  8692967     0.00     0.00  hashcmp
      1.24     32.49     0.53  8175096     0.00     0.00
unpack_object_header_buffer
      1.14     32.98     0.49        1     0.49     0.59  write_idx_file
      1.12     33.46     0.48  5885588     0.00     0.00
nth_packed_object_offset
      1.12     33.94     0.48  6051632     0.00     0.00
locate_object_entry_hash

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Ævar Arnfjörð Bjarmason @ 2011-10-27 21:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn O. Pearce
In-Reply-To: <CACBZZX6Ss4jLtdrDhLUNKCUjEHjHHKzfv-LMkOyTPDhRUXm4sQ@mail.gmail.com>

On Thu, Oct 27, 2011 at 23:01, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> I haven't dug into why this is but I'm pretty sure it's because this
> patch makes Git behave pathologically on repositories with a large
> amount of tags. git.git itself has ~27k revs / and ~450 tags, or a tag
> every ~60 revisions.
>
> Try it on e.g. a repository with a couple of hundred thousand revs and
> a tag every 10-20 revisions.

Actually it just seems slow in general, not just on repositories with
a lot of tags, on linux-2.6.git with this patch:

    $ time ~/g/git/git-pack-objects --keep-true-parents
--honor-pack-keep --non-empty --all --reflog
--delta-base-offs</dev/null
/home/avar/g/linux-2.6/.git/objects/pack/.tmp-pack
    Counting objects: 2184059, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (338780/338780), done.
    130605d5b82492a38452b9bc7bddfb4a6ef0e942
    Writing objects: 100% (2184059/2184059), done.
    Total 2184059 (delta 1825129), reused 2184059 (delta 1825129)

    real    1m28.426s
    user    1m19.661s
    sys     0m3.008s

With it reverted:

    $ time ~/g/git/git-pack-objects --keep-true-parents
--honor-pack-keep --non-empty --all --reflog --delta-base-offset
</dev/null /home/avar/g/linux-2.6/.git/objects/pack/.tmp-pack
    Counting objects: 2184059, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (338780/338780), done.
    130605d5b82492a38452b9bc7bddfb4a6ef0e942
    Writing objects: 100% (2184059/2184059), done.
    Total 2184059 (delta 1825129), reused 2184059 (delta 1825129)

    real    0m50.152s
    user    0m45.367s
    sys     0m2.920s

And if you create a lot of tags:

    git log --pretty=%h | perl -lne 'chomp(my $sha = <>); print $sha
if $i++ % 10 == 0' | parallel "git tag test-tag-{} {}"

Resulting in:

    $ git rev-list HEAD | wc -l
    269514
    $ git tag -l | wc -l
    13733

This'll be the time with this patch reverted, i.e. almost exactly the
same as with just ~250 tags:

    real    0m52.860s
    user    0m44.907s
    sys     0m3.172s

And with it applied, i.e. almost exactly the same as with just ~250
tags:

    real    1m29.080s
    user    1m21.825s
    sys     0m3.096s

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox